Operating Systems
Input-Output
The processor is a million times faster than a hard drive. One random read from HDD = 10 million CPU instructions wasted. The network transmits gigabytes, the graphics card renders 60 frames per second, the disk writes logs - all this is I/O. Effective input-output management separates a sluggish program from a lightning-fast one. Understanding I/O is understanding why systems run fast or slow.
- **Why do games load faster on SSD?** HDD: seek 5ms × 10000 files = 50 seconds just for seeking. SSD: seek 0.1ms → the same operation in 1 second. Understanding I/O explains the 50x difference.
- **How does nginx handle 100,000 connections on one server?** The secret: epoll (asynchronous I/O). One process waits for events from thousands of sockets, the CPU is not blocked. Without epoll, 100,000 threads would be needed - collapse.
- **Why do servers use RAID?** A disk lives on average 4 years. A server with 20 disks experiences a failure every 2 months. RAID 6 allows operation with 2 dead disks - the service does not go down, data is safe.
Цели урока
- Distinguish device controllers, drivers, interrupts, and DMA
- Compare I/O methods: programmed I/O, interrupt-driven, DMA by CPU overhead
- Know disk scheduling: FCFS, SSTF, SCAN, C-SCAN, LOOK
- Understand RAID levels: 0 (stripe), 1 (mirror), 5 (parity), 10 (stripe+mirror) and trade-offs
- Apply buffering: page cache, double buffering, scatter-gather I/O
I/O Hardware
**Input-Output devices (I/O devices)** are the eyes, ears, and hands of a computer. Keyboard, mouse, disk, network card, video card - all these are I/O devices. The processor is 1000 times faster than the slowest I/O device, so effective I/O management is critical for performance.
**Classification of I/O devices:** 1. **Block devices** - work with blocks of data: HDD, SSD, USB flash drives 2. **Character devices** - stream data: keyboard, COM port, terminal 3. **Network devices** - network cards (special case)
**Anatomy of an I/O device.** Each device consists of two parts: 1. **Controller** (hardware) - a chip that manages the device. The controller has **registers** for commands and data. 2. **Driver** (software) - code in the OS that knows how to communicate with a specific controller.
Example: Reading from Disk
When a program calls `read()`, a chain of events occurs: 1. **Syscall** → kernel 2. **VFS** finds the file system driver 3. **Disk driver** writes the READ command to the controller's register 4. **Controller** moves the disk head, reads the sector 5. **Data** is transferred to memory via DMA (Direct Memory Access) 6. **Interrupt** notifies the CPU that the operation is complete
**Memory-Mapped I/O (MMIO).** Modern device controllers are "visible" as memory regions. The driver writes to address 0xFEDC000 - in reality, this is a network card register. The CPU does not distinguish - it's just a store to memory.
**Port-Mapped I/O (PMIO).** An alternative to MMIO: special processor instructions `IN/OUT` (x86). Rarely used, mainly by legacy devices.
How does Memory-Mapped I/O (MMIO) differ from Port-Mapped I/O (PMIO)?
I/O Management Methods
How does the processor know that a device has finished an operation? There are three main methods: **Polling**, **Interrupts**, **DMA (Direct Memory Access)**. Each method is a trade-off between simplicity and efficiency.
**1. Polling.** The simplest method: the CPU checks the device status in a loop. "Ready? No. Ready? No. Ready? Yes!" Simple but wasteful - the CPU is busy with useless waiting.
**2. Interrupts.** The device "knocks" on the CPU via a special signal when it is ready. The CPU is busy with other work in the meantime. An interrupt arrives → the CPU saves the context → processes it → returns. Efficient!
Interrupts as a Doorbell
Analogy: waiting for a courier. **Polling** - every 10 seconds, walk to the door and look through the peephole: "Arrived? No. Arrived? No." Wasted time. **Interrupt** - a doorbell exists. Work continues in the meantime, and when the courier arrives, the bell breaks in. More efficient! Similarly, the CPU: an interrupt allows it to work while the disk spins.
**3. DMA (Direct Memory Access).** Problem: even with interrupts, the CPU must copy data from the device buffer to memory. For large volumes (video, network), this is slow. **DMA controller** copies data directly to memory, the CPU is not involved at all!
DMA in Graphics Cards
When a graphics card renders a game frame, it generates **gigabytes** of data per second (4K @ 60fps ≈ 1.5 GB/s). If the CPU copied every pixel, performance would drop 10 times. **DMA** allows the GPU to write directly to video memory, bypassing the CPU. That's why games run smoothly.
**Choosing an I/O method depending on the device:** - **Polling:** Simple embedded systems, rare events (button pressed?) - **Interrupts:** Most devices (keyboard, mouse, timers) - **DMA:** High-speed devices (disks, network, video)
**Asynchronous I/O in Linux: select/poll/epoll.** When a program works with thousands of sockets (web server), blocking `read()` is not suitable. **epoll** allows waiting for events from multiple file descriptors simultaneously.
Why is DMA (Direct Memory Access) critically important for high-speed devices like 10 Gbit/s network cards?
Disk Scheduling Algorithms
**Hard Drives (HDD)** are mechanical devices. The read head moves over spinning platters. Seeking data (seek) is the slowest operation (~5-10ms). If requests come to read sectors 10, 500, 25 - in what order should they be executed? **Disk scheduling** optimizes the service order.
**HDD performance parameters:** - **Seek time** - moving the head to the required cylinder: 5-10 ms - **Rotational latency** - waiting for the required sector to come under the head: 4 ms (7200 RPM) - **Transfer time** - reading data: 0.1 ms for a 4KB block Seek time dominates! Minimizing head movement = maximum performance.
**FCFS (First-Come, First-Served).** The simplest algorithm: process requests in the order they arrive. Problem: the head moves chaotically across the disk, large seek times.
**SSTF (Shortest Seek Time First).** Choose the nearest request. A greedy algorithm: locally optimal, but can cause **starvation** (distant requests wait indefinitely).
**SCAN (Elevator algorithm).** The head moves in one direction, servicing all requests on the way, reaches the end → turns around. Like an elevator: goes up, stops at all floors, then down.
**C-SCAN (Circular SCAN).** An improvement over SCAN: after reaching the end, the head instantly "teleports" to the beginning, starts a new pass. More uniform wait times.
SCAN vs C-SCAN: Bus Analogy
**SCAN** - the bus goes in a loop, but one route there, another back. A passenger at the terminal station waits longer. **C-SCAN** - the bus goes only in one direction, quickly returns to the start at the terminal (without stops). More fair distribution of wait times among all requests.
**LOOK and C-LOOK.** Optimizations of SCAN/C-SCAN: the head does not reach the physical edge of the disk (0 or 256), but turns around at the last request. Saves movement.
**Disk scheduling in the SSD era:** SSD (flash memory) has no moving parts → seek time is practically zero (0.1 ms). The order of requests does not affect performance! Modern OS use **Deadline** or **BFQ (Budget Fair Queueing)** schedulers - they optimize latency and fairness, not head movement.
Why NVMe is Faster than SATA
**SATA III:** Interface for HDD, maximum 600 MB/s, one command queue (32 requests). Protocol optimized for slow disks. **NVMe (PCIe):** Designed for SSD. Bandwidth 3500 MB/s (PCIe 3.0 x4), **64000 queues** with 64000 commands each. Parallelism + low latency (direct access to CPU via PCIe). Therefore, NVMe SSDs are 5-7 times faster than SATA SSDs.
Which disk scheduling algorithm guarantees no starvation (endless waiting of requests)?
RAID: Fault Tolerance and Performance
**RAID (Redundant Array of Independent Disks)** - a technology for combining multiple disks to increase reliability, performance, or both. The idea: using 4 disks makes it possible to build a system that survives the failure of any disk (RAID 5), or reads data 4 times faster (RAID 0).
**Main RAID levels:** - **RAID 0** (striping) - speed, no fault tolerance - **RAID 1** (mirroring) - full duplication, 2x reliability - **RAID 5** (striping + parity) - performance + protection from 1 failure - **RAID 6** (double parity) - protection from 2 failures - **RAID 10** (1+0) - mirror of stripes, balance
**RAID 0 (Striping): maximum speed, zero protection.** Data is split into blocks and distributed across disks. A 4MB file with 4 disks: each disk gets 1MB. Reading/writing in parallel → speed multiplies by N. But: if **any** disk fails - all data is lost.
RAID 0 for Video Editing
Professional editing of 4K video requires speeds of 500+ MB/s (multiple streams simultaneously). One SSD gives 500 MB/s, but for 8K, more is needed. **RAID 0 of 4 NVMe SSDs** → 2000 MB/s. Data is not critical (working files, backup on the server), speed is important. A typical use case for RAID 0.
**RAID 1 (Mirroring): full duplication.** Every byte is written to 2 disks. One disk dies - data is on the second. Simple and reliable scheme. Drawback: capacity is halved (2TB + 2TB = 2TB usable).
**RAID 5 (Striping + Parity): balance between performance and reliability.** Data is distributed across disks (striping), plus a **parity** checksum is calculated for each stripe, stored on another disk. Data from any failed disk can be recovered using the formula: `D = A ⊕ B ⊕ P` (XOR).
RAID 5 in Servers
Typical file server: 6 disks of 4TB in RAID 5 → 20TB usable capacity (instead of 24TB). Read speed ~500 MB/s (almost like RAID 0), but with protection: any disk can fail, data is safe. While the failed disk is being replaced, the system operates (degraded mode). After replacement - automatic recovery (rebuild).
**RAID 6 (Double Parity): protection from 2 failures.** Like RAID 5, but with **two** parity blocks per stripe (using different algorithms). Can lose any 2 disks. Critical for large arrays: rebuilding RAID 5 can take a day, the probability of a second failure during rebuild is high.
**RAID 10 (1+0): Mirror of Stripes.** First, create mirror pairs (RAID 1), then combine them into a stripe (RAID 0). For example, 4 disks: (Disk1+Disk2) mirror, (Disk3+Disk4) mirror, two mirrors in RAID 0. Read speed 4x, high reliability, but capacity 50%.
**Hardware RAID vs Software RAID:** - **Hardware RAID:** Special controller with processor, cache, battery (for power failure protection). Does not load the CPU, but expensive ($500+). - **Software RAID:** Managed by the OS (Linux mdadm, Windows Storage Spaces). Uses CPU for parity, but modern CPUs handle it. Free, flexible. Trend: Software RAID is winning (ZFS, Btrfs built-in RAID).
Key Ideas
- **I/O devices are a bottleneck.** CPU is 1000+ times faster than a disk. Effective I/O management is critical: DMA frees the processor from data copying, interrupts eliminate useless waiting (polling). Without this, the system slows down.
- **Three I/O management methods: Polling, Interrupts, DMA.** Polling - simple but wasteful (CPU spins in a loop). Interrupts - efficient (CPU works, the device "knocks" when ready). DMA - for large volumes (device writes directly to memory, CPU is free).
- **Disk scheduling optimizes HDD head movement.** FCFS is chaotic, SSTF is greedy (starvation), SCAN is like an elevator (fair, no starvation). For SSD, it's irrelevant (no mechanics), but scheduling algorithms remain for fairness and latency.
- **RAID: trade-off between speed, capacity, reliability.** RAID 0 - maximum speed, zero protection. RAID 1 - simple mirror. RAID 5 - balance (performance + protection from 1 failure). RAID 6 - for large arrays (protection from 2 failures). RAID does not replace backup!
Related Topics
Input-output is closely related to OS architecture and system performance:
- File Systems — FS manages metadata and data placement on the disk. Understanding I/O (blocks, scheduling) is necessary for understanding FS performance: why ext4 is faster than ext3, what journaling is, how copy-on-write works in Btrfs
- Virtual Memory — Page faults cause disk I/O (swap). Swap on HDD kills performance (5ms latency), on SSD it's tolerable. Memory-mapped files (mmap) turn file I/O into memory work - OS manages page loading
- Process Scheduling — A process blocks on I/O (waits for disk/network) → scheduler switches to another task. I/O-bound processes (lots of I/O) require a different scheduling strategy than CPU-bound (lots of computation)
- Network Programming — Network is I/O. The same principles: blocking I/O (simple, slow), non-blocking (select/poll/epoll). Understanding asynchronous I/O is necessary for high-performance servers (nginx, Node.js)
Вопросы для размышления
- Why are databases (PostgreSQL, MySQL) so sensitive to disk speed? How can understanding I/O help optimize database performance?
- Designing a system for streaming video (4K, 60fps): which I/O management methods are appropriate? Why is DMA needed? Why might RAID 0 be useful?
- How does programming for embedded systems (microcontrollers) differ from desktop applications in terms of I/O? Why is polling used on Arduino, while Linux servers use epoll?
Связанные уроки
- os-09-filesystems — The filesystem is built on top of block I/O
- os-04-scheduling — I/O scheduling is the same class of problem as CPU scheduling
- os-18-io-advanced — io_uring, async I/O, vectored I/O are advanced techniques
- db-11-query-optimization