Computer Architecture
CPU Structure: The Brain of a Computer
Lesson objectives
- Understand von Neumann architecture and why it replaced hardwired programming
- Know the purpose of general-purpose and special-purpose registers
- Understand the Control Unit's role in instruction decoding
- Know the three bus types and their functions
- Understand the clock signal and its physical constraints
Prerequisites
- ALU operations
- Logic gates
- Binary number system
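2007. Apple ships the original iPhone with an ARM CPU clocked at 412 MHz, a clock several times lower than that year's desktop Intel Core 2 Duo. Yet the iPhone felt faster than Windows Mobile devices running on 600 MHz chips. The reason: a different microarchitecture and a different instruction set. Clock frequency is not speed. Understanding CPU structure helps explain why an Apple M1 at about 3 GHz can outpace Intel chips clocked near 5 GHz.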
- **Code optimization**: cache-friendly algorithms can run up to ~10x faster because they make fewer round-trips over the bus to main memory
- **Embedded systems**: ARM vs RISC-V selection depends on register file design and instruction set
- **Systems programming**: understanding PC, SP, FLAGS is essential for writing debuggers and OS kernels
- **Reverse engineering**: reading disassembly requires knowing what RAX, RSP, and RFLAGS mean
Historical context
In June 1945, John von Neumann wrote the 'First Draft of a Report on the EDVAC' - 101 pages that changed computing forever. The key idea: store the program in the same memory as the data. Before this, machines like ENIAC (30 tons) required days of physical rewiring to change programs. Von Neumann's stored-program concept made software possible. Nearly every general-purpose CPU built since - from Raspberry Pi to Apple M-series - follows the same basic design described in that 1945 report.
Von Neumann Architecture
A CPU built on this architecture has four core components:
- **Registers** - ultra-fast memory cells inside the CPU
- **ALU** - performs arithmetic and logic operations
- **Control Unit (CU)** - decodes instructions and orchestrates everything
- **Buses** - wires for transferring data, addresses, and control signals
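Harvard Architecture: separate memories for code and data. Used in many microcontrollers (e.g., the AVR chips in Arduino boards; STM32 uses a modified variant). Instruction fetch and data access can proceed in parallel, at the cost of more complex silicon.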
The defining feature of von Neumann architecture, by contrast, is a single memory that holds both program instructions and data.
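To make the stored-program idea concrete, here is a minimal sketch of a toy machine whose instructions and data sit in one Python list. The three opcodes, the two-cell instruction format, and the memory layout are invented for illustration and do not correspond to any real CPU.

```python
# Hypothetical opcodes for a toy accumulator machine (illustration only).
HALT, LOAD, ADD = 0, 1, 2


def run(memory):
    """Execute the program stored in `memory` and return the accumulator."""
    pc = 0   # program counter: address of the next instruction
    acc = 0  # accumulator register
    while True:
        # Fetch: instructions come from the same memory that holds the data.
        opcode, operand = memory[pc], memory[pc + 1]
        pc += 2
        if opcode == LOAD:
            acc = memory[operand]
        elif opcode == ADD:
            acc += memory[operand]
        elif opcode == HALT:
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode}")


# Addresses 0-5 hold code; addresses 6-7 hold data.
program = [LOAD, 6, ADD, 7, HALT, 0, 40, 2]
print(run(program))  # 42
```

Changing the program means changing values in `memory`, not rewiring anything, which is the point of the design.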
Registers: Memory Inside the CPU
A register is a tiny memory cell built directly into the processor die. Access takes 1 clock cycle (~0.3 ns at 3 GHz). A RAM access costs 50-100 cycles. That gap of up to 100x is why register allocation is a core compiler optimization.
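The arithmetic behind those figures can be checked with a short sketch. The helper names are hypothetical; the 3 GHz clock and the 50-100 cycle range are the values quoted above.

```python
def cycle_time_ns(frequency_hz):
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / frequency_hz


def latency_ns(cycles, frequency_hz):
    """Wall-clock latency of an operation that takes `cycles` clock cycles."""
    return cycles * cycle_time_ns(frequency_hz)


FREQ = 3e9  # 3 GHz, as in the text

print(round(cycle_time_ns(FREQ), 2))    # 0.33  (register access, 1 cycle)
print(round(latency_ns(50, FREQ), 1))   # 16.7  (RAM, lower estimate)
print(round(latency_ns(100, FREQ), 1))  # 33.3  (RAM, upper estimate)
```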
| Register (x86-64) | Size | Purpose |
|---|---|---|
| RAX | 64 bits | Accumulator, return value |
| RBX | 64 bits | Base register |
| RCX | 64 bits | Counter (loops) |
| RDX | 64 bits | Data (I/O, multiply/divide) |
| RSI | 64 bits | Source Index (string ops) |
| RDI | 64 bits | Destination Index (string ops) |
| RSP | 64 bits | Stack Pointer |
| RBP | 64 bits | Frame Base Pointer |
| R8-R15 | 64 bits | Additional general-purpose |
Backward compatibility: the lower parts of RAX remain addressable under their older names: EAX (lower 32 bits), AX (lower 16 bits), and AL (lowest 8 bits).
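The nesting of these names can be illustrated with bit masks. This is only a model of how the sub-registers view the same 64 bits; `sub_registers` is a hypothetical helper and the sample value is arbitrary.

```python
def sub_registers(rax):
    """Return the values seen through RAX and its narrower aliases."""
    rax &= 0xFFFF_FFFF_FFFF_FFFF  # keep 64 bits
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFF_FFFF,  # lower 32 bits
        "AX": rax & 0xFFFF,        # lower 16 bits
        "AL": rax & 0xFF,          # lowest 8 bits
    }


for name, value in sub_registers(0x1122334455667788).items():
    print(f"{name}: {value:#x}")
# RAX: 0x1122334455667788
# EAX: 0x55667788
# AX: 0x7788
# AL: 0x88
```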
Why are registers faster than RAM?
Special-Purpose Registers
Some registers serve fixed architectural roles and are managed automatically by the CPU:
| Register | Name | Function |
|---|---|---|
| PC / RIP | Program Counter | Address of the next instruction to fetch |
| IR | Instruction Register | Currently executing instruction |
| FLAGS / RFLAGS | Status Register | Condition flags: Z, N, C, V and more |
| SP / RSP | Stack Pointer | Top of the call stack |
| MAR | Memory Address Register | Address for the next memory read/write |
| MDR | Memory Data Register | Data just read from or about to be written to memory |
FLAGS register bits (x86 names; ARM calls the equivalents Z, N, C, V): ZF (Zero Flag) - result is zero; SF (Sign Flag) - result is negative; CF (Carry Flag) - carry out of the top bit, i.e. unsigned overflow; OF (Overflow Flag) - signed overflow.
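As a rough illustration of how these four flags follow from a single operation, the sketch below models an 8-bit addition. The function `add8` and the 8-bit width are assumptions chosen to keep the numbers small; real x86 additions set the same flags at 8, 16, 32, or 64 bits.

```python
def add8(a, b):
    """Add two 8-bit values; return (result, flags) like a tiny ALU."""
    a &= 0xFF
    b &= 0xFF
    full = a + b          # up to 9 bits wide
    result = full & 0xFF  # what fits in the 8-bit register
    flags = {
        "ZF": result == 0,          # result is zero
        "SF": bool(result & 0x80),  # top bit set: negative if signed
        "CF": full > 0xFF,          # carry out: unsigned overflow
        # Signed overflow: both inputs have a sign bit that differs
        # from the sign bit of the result.
        "OF": bool((a ^ result) & (b ^ result) & 0x80),
    }
    return result, flags


print(add8(0x7F, 0x01))  # 127 + 1: result 0x80, SF and OF set
print(add8(0xFF, 0x01))  # 255 + 1: result 0x00, ZF and CF set
```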
What does the Program Counter (PC) store?
Control Unit
The Control Unit is the conductor of the CPU. It reads each instruction from the IR, decodes the opcode, and fires the exact set of control signals needed to execute it.
| Type | Description | Examples |
|---|---|---|
| Hardwired | Pure logic circuits, fast | MIPS, ARM (RISC) |
| Microprogrammed | Microcode ROM, flexible | x86 (CISC complex instructions) |
x86 Microcode: complex instructions like REP MOVSB or CPUID are expanded internally into a sequence of micro-operations. Because that sequence comes from updatable microcode, some CPU bugs can be fixed or worked around without hardware changes (parts of the Spectre mitigations were delivered as microcode updates).
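The decode step can be pictured as a table lookup from opcode to an ordered list of control signals, in the spirit of a microprogrammed design. The opcodes and signal names below are invented for illustration and are not taken from any real microcode.

```python
# Hypothetical microcode table: opcode -> ordered control signals.
MICROCODE = {
    "ADD": ["reg_read", "alu_add", "reg_write"],
    "LOAD": ["mar_load", "mem_read", "reg_write"],
    "JMP": ["pc_load"],
}


def decode(instruction):
    """Return the control signals for an instruction such as 'add r1, r2'."""
    opcode = instruction.split()[0].upper()
    if opcode not in MICROCODE:
        raise ValueError(f"illegal instruction: {instruction!r}")
    return list(MICROCODE[opcode])  # copy so callers cannot edit the table


print(decode("add r1, r2"))  # ['reg_read', 'alu_add', 'reg_write']
```

In this model, patching a faulty instruction amounts to replacing one entry in `MICROCODE`, which mirrors why microcode updates need no new silicon. A hardwired control unit would implement the same mapping in fixed logic instead.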
What does the Control Unit do?
Buses: The Data Highways
A bus is a shared set of parallel wires carrying signals between CPU components and memory. Three types: Data Bus (the payload), Address Bus (where to send it), Control Bus (read or write and when).
| Bus | Function | Typical Width |
|---|---|---|
| Data Bus | Transfer payload data | 64 bits |
| Address Bus | Specify memory location | 48-64 bits |
| Control Bus | Read/Write/Clock signals | ~20 lines |
Bus bandwidth is often the bottleneck. The old Intel FSB (Front Side Bus) peaked at ~10 GB/s. AMD's modern Infinity Fabric hits 100+ GB/s. Moving data off-chip is expensive - cache exists to hide this cost.
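Two back-of-the-envelope formulas connect bus width to capacity and throughput. The sketch assumes byte-addressable memory and one transfer per bus clock tick; the function names and the 1.6 billion transfers per second figure are illustrative assumptions, not specifications of a particular bus.

```python
def addressable_bytes(address_lines):
    """Each address line doubles the number of distinct addresses."""
    return 2 ** address_lines


def peak_bandwidth_bytes_per_s(data_width_bits, transfers_per_second):
    """Upper bound on throughput: bytes per transfer times transfer rate."""
    return data_width_bits // 8 * transfers_per_second


print(addressable_bytes(16))             # 65536 bytes (64 KiB)
print(addressable_bytes(48) // 2 ** 40)  # 256 (TiB)
print(peak_bandwidth_bytes_per_s(64, 1_600_000_000) / 1e9)  # 12.8 (GB/s)
```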
If the Address Bus has 32 lines, what is the maximum addressable memory?
Clock Signal
The clock is the heartbeat of the CPU. A crystal oscillator generates a steady square wave; every rising edge triggers the next operation. Everything in the CPU marches in lockstep to this signal. Typical steps sequenced by the clock:
- Register read
- ALU operation
- Register write
- PC increment
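Physical limit: power dissipation scales roughly as f^3, because running at a higher frequency also requires a higher supply voltage. At around 4-5 GHz the heat becomes impractical to remove with conventional cooling. Beyond that threshold, the only path to more performance is parallelism - wider pipelines, more cores.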
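The cubic rule of thumb comes from the dynamic power relation P ∝ C·V²·f combined with the simplifying assumption that supply voltage must rise in proportion to frequency. The sketch below encodes exactly that assumption; real chips deviate from it, so treat the output as an order-of-magnitude guide.

```python
def relative_power(freq_ratio):
    """Power multiplier for a clock raised by `freq_ratio`.

    Assumes P ~ C * V^2 * f with voltage scaling linearly with frequency.
    """
    voltage_ratio = freq_ratio  # simplifying assumption, see lead-in
    return voltage_ratio ** 2 * freq_ratio


print(relative_power(1.25))  # 1.953125 -> 25% more clock, ~2x the power
print(relative_power(2.0))   # 8.0      -> double the clock, 8x the power
```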
Turbo Boost: a CPU can temporarily raise its clock above the base frequency, by several hundred MHz or more, when thermal headroom allows, then throttle back. Intel calls it Turbo Boost; AMD calls it Precision Boost.
Myth: CPU clock frequency alone determines its performance.
Reality: performance depends on IPC (instructions per cycle), cache size, pipeline width, memory bandwidth, and more.
The Apple M1 at about 3 GHz matches or beats many Intel chips clocked near 5 GHz in per-core work, thanks to a wider pipeline (it can decode up to 8 instructions per cycle) and a better cache hierarchy. Frequency is just one variable in the equation.
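A first-order way to see this is throughput = IPC × frequency. The IPC values below are hypothetical round numbers chosen for illustration, not measurements of any real chip, and the model ignores cache misses and memory bandwidth.

```python
def instructions_per_second(ipc, frequency_hz):
    """First-order throughput estimate: instructions per cycle times cycles per second."""
    return ipc * frequency_hz


# Hypothetical cores: a wide one at 3 GHz and a narrow one at 5 GHz.
wide_core = instructions_per_second(ipc=4.0, frequency_hz=3.0e9)
narrow_core = instructions_per_second(ipc=2.0, frequency_hz=5.0e9)

print(wide_core / 1e9)    # 12.0 billion instructions per second
print(narrow_core / 1e9)  # 10.0 billion instructions per second
```

Under these assumed numbers the lower-clocked core still finishes more work per second.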
A CPU running at 3 GHz executes how many clock cycles per second (answer in billions)?
Key Ideas
- Von Neumann: code and data share one memory - this enabled software as we know it
- Registers: ~16-32 ultra-fast cells inside the CPU, 1-cycle access vs 50-100 for RAM
- PC tracks the next instruction; FLAGS stores condition bits after each operation
- Control Unit decodes opcodes and fires control signals to every other CPU unit
- Three buses: Data (payload), Address (location), Control (read/write/timing)
- Clock frequency is not speed - IPC, pipeline width, and cache matter just as much
Related Topics
CPU structure is the foundation for understanding performance optimizations:
- Instruction Cycle — Fetch-Decode-Execute - how the CPU uses these components each cycle
- Pipelining — Parallel execution of instruction stages for higher throughput
- Cache — Fast memory hierarchy that bridges the register-to-RAM latency gap
Related lessons
- arch-03-alu — The ALU is one of the four core CPU components studied here
- arch-02-logic-gates — Logic gates are the transistor-level substrate of every register and control circuit
- arch-05-instruction-cycle — CPU components (PC, IR, CU, ALU) are the physical implementation of Fetch-Decode-Execute
- arch-06-pipelining — Pipelining exploits independent CPU stages (CU, ALU, buses) in parallel across instructions
- arch-09-cache — Cache sits between registers and RAM - understanding both ends explains why cache exists
- os-01-intro