Web Development
HTML and Semantic Markup
Open any website and press Ctrl+U, and the page's HTML source appears. HTML is the one markup language that every browser understands. But between "working" HTML and correct HTML there is a gap that determines whether the site gets found on Google, whether visually impaired users can navigate it, and how easy it will be to maintain a year from now.
- **Wikipedia** - the semantic structure with `<article>`, `<nav>`, `<section>` allows screen readers to navigate an article in seconds
- **Airbnb** lost 25% of traffic due to a bug with missing Open Graph - links in messengers displayed without a preview
- **Lawsuits:** in 2019, 2,256 website accessibility lawsuits were filed in the USA. That same year the Supreme Court declined to hear Domino's Pizza's appeal, leaving in place a ruling that let a blind customer's lawsuit over its inaccessible website and app proceed
HTML Document Structure
**Every website starts with HTML.** Google, Wikipedia, YouTube - they are all HTML documents that a browser turns into visual pages. HTML (HyperText Markup Language) is not a programming language but a markup language: it describes the *structure* of content.
**`<!DOCTYPE html>`** - the document type declaration. Without it, the browser switches to quirks mode and emulates the non-standard layout behavior of late-1990s browsers. One line prevents dozens of bugs.
An HTML document consists of **nested tags**. Each tag is an instruction for the browser. `<head>` contains meta-information (title, charset, stylesheet links), while `<body>` contains the visible page content.
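The structure described above can be sketched as a minimal document. The file name `styles.css` and the text content are placeholders chosen for illustration, not part of any real site.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Metadata: not rendered in the page itself -->
  <meta charset="utf-8">
  <title>Page title shown in the browser tab</title>
  <!-- Hypothetical stylesheet path -->
  <link rel="stylesheet" href="styles.css">
</head>
<body>
  <!-- Visible content: tags nested inside <body> -->
  <h1>Main Heading</h1>
  <p>Visible page content goes here.</p>
</body>
</html>
```

Saving this as an `.html` file and opening it in a browser should render the heading and paragraph in standards mode.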
| Tag | Purpose | Example |
|---|---|---|
| h1–h6 | Headings by level of importance | `<h1>Main Heading</h1>` |
| p | Paragraph of text | `<p>Paragraph text</p>` |
| a | Hyperlink | `<a href="url">Link</a>` |
| img | Image (void element, no closing tag) | `<img src="photo.jpg" alt="Description">` |
| div | Block container (no meaning) | `<div class="wrapper">...</div>` |
| span | Inline container (no meaning) | `<span class="highlight">text</span>` |
**Attributes** add information to tags. `<a>` has `href` (link address), `<img>` has `src` (path to image) and `alt` (text description). The `class` attribute is used for styling, `id` - for unique element identification.
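A short sketch of these attributes in use. The URL, file name, class names, and the `pricing` id are made up for the example.

```html
<!-- href: link address; class: hook for styling -->
<a href="https://example.com/docs" class="nav-link">Documentation</a>

<!-- src: path to the image; alt: text description -->
<img src="photo.jpg" alt="Kitten playing with a ball">

<!-- id must be unique within the document; class may repeat and may hold several names -->
<section id="pricing" class="card highlight">
  <h2>Pricing</h2>
</section>

<!-- An id also works as a link target inside the same page -->
<a href="#pricing">Jump to pricing</a>
```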
Always specify `lang` on the `<html>` tag and `charset` in `<meta>`. Without `lang`, a screen reader won't know what language to read the page in. Without `charset`, text can turn into garbled characters.
What happens when `<!DOCTYPE html>` is removed from the beginning of an HTML document?
Semantic HTML
**Consider a book without a table of contents, chapters, or subheadings** - just continuous text. It can be read, but finding a specific place is painful. The same thing happens when a site is built from only `<div>` elements - the browser, search engine, and screen reader cannot understand where the navigation is, where the article is, and where the sidebar is.
**Semantic HTML arrived with HTML5 (a W3C Recommendation since 2014).** Before that, developers had no `<header>`, `<nav>`, `<main>` - they had to use `<div id="header">`. HTML5 introduced meaningful tags that convey information about the role of content.
| Tag | Role | When to use |
|---|---|---|
| `<header>` | Page or section header | Logo, navigation, search |
| `<nav>` | Block of major navigation links | Menu, breadcrumbs, pagination |
| `<main>` | Main content (one visible per page) | Unique page content |
| `<article>` | Self-contained content block | Article, post, comment |
| `<section>` | Thematic grouping | Chapters, tabs, sections |
| `<aside>` | Secondary content | Sidebar, ads, widgets |
| `<footer>` | Page or section footer | Copyright, contacts, links |
**The rule for choosing a tag:** ask - "If all styles are removed, will the page structure be clear?" A wall of `<div>` elements means the answer is no. Semantic tags are a contract between the author and everyone who will "read" the HTML: the browser, search engine, screen reader, and other developers.
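**`<main>` should appear only once per page.** The HTML standard permits additional `<main>` elements only when they carry the `hidden` attribute, so at most one is ever visible. `<header>` and `<footer>` can be nested (for example, an `<article>` can have its own `<header>`), but the visible `<main>` is always unique.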
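As an illustration, a page skeleton built from the semantic tags in this section might look like the sketch below. The link targets, headings, and date are placeholders.

```html
<body>
  <header>
    <a href="/">Site name</a>
    <nav>
      <ul>
        <li><a href="/articles/">Articles</a></li>
        <li><a href="/about/">About</a></li>
      </ul>
    </nav>
  </header>

  <!-- Exactly one visible <main> -->
  <main>
    <article>
      <!-- An article may have its own header and footer -->
      <header>
        <h1>Article title</h1>
        <p>Published <time datetime="2024-01-15">15 January 2024</time></p>
      </header>
      <section>
        <h2>First chapter</h2>
        <p>Chapter text.</p>
      </section>
      <footer>
        <p>Written by the author</p>
      </footer>
    </article>
  </main>

  <aside>
    <h2>Related links</h2>
  </aside>

  <footer>
    <p>Copyright and contacts</p>
  </footer>
</body>
```

With all styles removed, the outline is still readable: site header, navigation, one article with chapters, a sidebar, and a page footer.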
How many `<main>` tags are allowed on a single HTML page?
Accessibility (a11y)
**15% of the world's population lives with some form of disability** (WHO data). That is more than a billion people. Blind users use screen readers, people with motor impairments use keyboard navigation, and those with low vision use screen magnifiers. Accessibility (a11y) is not a "nice to have" - it is a required part of development.
**WCAG 2.1** (Web Content Accessibility Guidelines) - the international accessibility standard. Three levels: **A** (minimum), **AA** (recommended, required by law in many countries), **AAA** (maximum). Most laws require level AA.
**The first rule of a11y - use semantic HTML.** The browser automatically builds an accessibility tree from semantic tags. `<button>` gets `role="button"` and keyboard support (Enter/Space). `<div onclick>` does not. The semantic tags covered in the previous section are the foundation of accessibility.
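The difference can be seen side by side in a small sketch. The `save()` function is a hypothetical handler invented for the example.

```html
<!-- Native button: reachable with Tab, activated by Enter and Space,
     exposed to assistive technology with the role "button" -->
<button type="button" onclick="save()">Save</button>

<!-- Clickable div: works with a mouse only. It is not focusable,
     has no keyboard activation, and has no role in the accessibility tree -->
<div onclick="save()">Save</div>

<script>
  // Hypothetical handler shared by both elements above
  function save() {
    console.log("saved");
  }
</script>
```

Pressing Tab on this page should move focus to the first element and skip the second one entirely.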
| Rule | What to do | Why |
|---|---|---|
| alt for images | Describe the content: alt="Kitten playing with a ball" | Screen reader will read the description instead of the image |
| label for input | Link via for/id or nesting | Clicking the label focuses the input; screen reader announces the field's purpose |
| Contrast ratio | Text/background minimum 4.5:1 (AA), 7:1 (AAA) | For low-vision users and bright sunlight |
| Keyboard navigation | Tab to move, Enter/Space for actions | Users without a mouse must have full access |
| ARIA attributes | aria-label, aria-describedby, role | Supplement semantics when HTML tags are not enough |
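A minimal sketch of the first two rows of the table, plus `aria-describedby`. File names, the `email` id, and the hint text are assumptions made for the example.

```html
<!-- Informative image: alt describes the content -->
<img src="kitten.jpg" alt="Kitten playing with a ball">

<!-- Purely decorative image: empty alt so screen readers skip it -->
<img src="divider.png" alt="">

<!-- Explicit association: the label's for matches the input's id -->
<label for="email">Email</label>
<input type="email" id="email" name="email" aria-describedby="email-hint">
<p id="email-hint">We only use this address to send the receipt.</p>

<!-- Implicit association: the input is nested inside the label -->
<label>
  <input type="checkbox" name="subscribe">
  Subscribe to the newsletter
</label>
```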
**The first rule of ARIA: do not use ARIA when a native HTML tag is available.** `<button>` is better than `<div role="button">`. ARIA is a supplement to semantics, not a replacement.
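Two cases where ARIA typically does add something that native HTML cannot express on its own are sketched below. The labels and the inline icon are illustrative choices.

```html
<!-- Native element with visible text: no ARIA needed -->
<button type="button">Delete</button>

<!-- Icon-only button: there is no visible text, so aria-label supplies the accessible name -->
<button type="button" aria-label="Close dialog">
  <svg aria-hidden="true" width="16" height="16" viewBox="0 0 16 16">
    <path d="M2 2 L14 14 M14 2 L2 14" stroke="currentColor" stroke-width="2"/>
  </svg>
</button>

<!-- Two navigation landmarks on one page: labels let screen reader users tell them apart -->
<nav aria-label="Main">
  <a href="/">Home</a>
</nav>
<nav aria-label="Breadcrumb">
  <a href="/lessons/">Lessons</a>
</nav>
```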
Check accessibility with tools: **Lighthouse** (built into Chrome DevTools), **axe DevTools** (extension), **WAVE**. Navigating a site with only the keyboard immediately reveals the problems.
When should ARIA attributes be used?
SEO and Metadata
**Google processes an estimated 8.5 billion search queries per day.** For a site to be found among billions of pages, the search engine must understand: what the page is about, how high-quality it is, and whether it is worth showing to the user. SEO (Search Engine Optimization) starts right in the HTML code.
**`<title>`** - one of the most important on-page SEO elements. It appears in search results, in the browser tab, and when saving to bookmarks. Recommended length - 50-60 characters. **`<meta name="description">`** - the description for the search listing, 120-160 characters.
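In markup, both elements live in `<head>`. The wording below is sample text written to fall roughly within the recommended lengths, not a prescribed formula.

```html
<head>
  <meta charset="utf-8">
  <!-- Shown in search results, the browser tab, and bookmarks -->
  <title>Semantic HTML Basics: Structure, Accessibility, SEO</title>
  <!-- Candidate text for the search listing snippet -->
  <meta name="description" content="Learn how document structure, semantic tags, accessibility rules, and metadata work together to make a page easier to find, use, and maintain.">
</head>
```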
**Open Graph** - a protocol created by Facebook in 2010. When someone shares a link in a messenger or social network, the client reads the og tags and builds a preview: image, title, description. Without og tags, the preview will be empty or random.
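A minimal set of Open Graph tags might look like the sketch below. The `example.com` URLs and the texts are placeholders.

```html
<head>
  <!-- Open Graph uses the property attribute, not name -->
  <meta property="og:type" content="article">
  <meta property="og:title" content="HTML and Semantic Markup">
  <meta property="og:description" content="How correct HTML affects search, accessibility, and maintenance.">
  <!-- Use an absolute URL so any client can fetch the preview image -->
  <meta property="og:image" content="https://example.com/images/preview.png">
  <meta property="og:url" content="https://example.com/lessons/html-semantic-markup">
</head>
```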
| File / mechanism | Purpose | Example |
|---|---|---|
| robots.txt | Tells crawlers which paths they may crawl | Disallow: /admin/ |
| sitemap.xml | Map of all pages on the site | List of URLs with priority and date |
| JSON-LD | Structured data for rich snippets | Recipe, review, FAQ in search results |
| canonical | Primary version of a duplicate | `<link rel="canonical" href="...">` |
| hreflang | Language versions of the page | For multilingual sites |
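The last three rows of the table can be sketched inside `<head>` as follows. All URLs are placeholders, and the JSON-LD block shows only a couple of Schema.org `Article` properties, not a complete rich-snippet setup.

```html
<head>
  <!-- canonical: the primary URL among duplicates -->
  <link rel="canonical" href="https://example.com/lessons/html/">

  <!-- hreflang: one entry per language version, plus a default -->
  <link rel="alternate" hreflang="en" href="https://example.com/lessons/html/">
  <link rel="alternate" hreflang="de" href="https://example.com/de/lessons/html/">
  <link rel="alternate" hreflang="x-default" href="https://example.com/lessons/html/">

  <!-- JSON-LD: structured data embedded as a script that browsers do not execute -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "HTML and Semantic Markup",
    "inLanguage": "en"
  }
  </script>
</head>
```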
Check SEO with tools: **Google Search Console** (free, shows how Google sees the site), **Lighthouse SEO audit**, **Schema.org validator** for checking JSON-LD markup.
**Myth:** SEO means stuffing the meta keywords tag with keywords and adding hidden text to the page.
**Reality:** Google has not used the meta keywords tag for web ranking since at least 2009. Modern SEO is about quality content, semantic HTML structure, page load speed, mobile-friendliness, and structured data.
**Why:** Search algorithms have evolved. Google uses ML models (BERT, MUM) that understand the meaning of text. Keyword manipulation is not only useless - it can result in lower rankings or a manual action (penalty) from Google.
Which meta tag does Google use to determine page relevance?
Key Ideas
- **HTML is structure, not visuals.** `<!DOCTYPE html>`, `<head>` for metadata, `<body>` for content. Each tag is an instruction for the browser
- **Semantics over div-soup.** `<header>`, `<nav>`, `<main>`, `<article>`, `<footer>` - describe the role of content. One `<main>` per page
- **Accessibility is required.** `alt` for img, `label` for input, keyboard navigation. ARIA - a supplement to semantics, not a replacement
- **SEO starts in HTML.** `<title>` and content matter more than meta keywords. Open Graph for social media, JSON-LD for rich snippets
Related Topics
HTML is the foundation on which all other web technologies are built:
- CSS: From Cascade to Grid — CSS styles HTML elements - without HTML there is nothing to style
- JavaScript: Language Fundamentals — JavaScript manipulates HTML via the DOM - Document Object Model
Questions for Reflection
- Open a favorite website, press F12 → Elements. Are semantic tags used there, or is it div-soup?
- Try navigating any website using only the Tab key. Are all interactive elements accessible?
- Paste a link to any site in Telegram. Does the preview look good? If not - which og tags are missing?
Related Lessons
- web-02 — CSS requires HTML structure to apply styles
- web-03 — JavaScript DOM manipulation operates on HTML elements
- alg-01-big-o — Both define contracts: algorithm complexity vs tag semantics
- st-01-feedback-loops — Semantic HTML creates feedback loops between browser, SEO, and accessibility
- sec-01 — Security considerations start at the HTML layer (XSS, CORS, CSP)
- comp-01-intro