Operating Systems

Processes

A production PostgreSQL server under load runs 100+ processes. Chrome runs 30-50. A crashing Chrome tab leaves the rest untouched. A PostgreSQL backend that fails on a bad query ends its own session without stopping the processes that serve other clients. Process isolation is the architectural decision that makes this stability possible, and understanding what a process is explains why it works.

  • **Chrome Site Isolation**: each tab runs as a separate process. When a Flash plugin died in 2015, it crashed the plugin's page but not the browser, a direct consequence of process isolation
  • **Nginx**: one master process manages config; worker processes (one per CPU core) handle requests. The event-driven workers can hold on the order of 1 million concurrent connections with minimal context switches
  • **PostgreSQL**: one backend process per connection. At 1000 connections that means 1000 backend processes, each with its own PCB in kernel memory. This is exactly why connection pooling (for example PgBouncer) is critical at scale
  • **Docker containers** are processes with namespace isolation. When `docker run` starts a container, the runtime creates its first process with clone() or unshare(), passing namespace flags such as CLONE_NEWPID and CLONE_NEWNET
  • **Node.js cluster** module: the primary process calls `cluster.fork()` once per worker, and each call spawns a new Node.js process through the same fork/exec family of system calls that C programs used 40 years ago

Lesson Objectives

  • Define a process: program + execution context + resources + address space
  • List process states (New, Ready, Running, Waiting, Terminated) and transitions between them
  • Know what lives in the PCB: PID, state, PC, registers, memory pointers, open FDs
  • Apply fork/exec/wait; understand copy-on-write and why fork and exec are separate
  • Estimate costs: process create ~1ms, context switch ~5µs, exit ~100µs

Dennis Ritchie and the birth of processes in Unix

In 1969, Ken Thompson and Dennis Ritchie were writing Unix on a PDP-7 at Bell Labs. The process concept was a breakthrough: instead of running one program at a time, the system maintained multiple live programs simultaneously, switching between them. fork() emerged as a compact solution to the process creation problem: duplicate the current process, then replace the copy with the desired program via exec(). This fork+exec combination is more than 50 years old and still underlies how Unix shells launch commands.

Process Concept

**A process** is a program in execution. Not just code sitting on disk - a live entity with its own memory, CPU registers, and state. Chrome on a laptop is not one program. It runs 30-50 processes: a separate one per tab, one for the GPU, one for network operations. A crashing tab leaves the rest untouched precisely because processes are isolated from each other.

Each process has its own **address space**, which includes: **program code** (text section), **data** (data section), **stack** for temporary variables, and **heap** for dynamic memory.

**Key distinction:** A program is a passive entity (a file on disk); a process is active (executing in memory). One program can spawn many processes. Nginx runs one master process and N worker processes - one per CPU core - all from the same binary.

The OS manages processes through a dedicated data structure - the **Process Control Block (PCB)** - which stores all information about a process. In Linux this is the `task_struct`, cited here at approximately 1.7 KB per process; the exact size depends on kernel version and build configuration and runs to several kilobytes on current 64-bit kernels. With tens of thousands of processes on a server, tens of thousands of these structures live in kernel memory.

What is NOT part of a process's address space?

Process States

Throughout its lifetime, a process moves through several **states**. On a typical 4-core server there might be 2000 processes, but at most 4 can be in the Running state at any instant. The rest are waiting: for a CPU slot, for I/O to complete, for a signal.

**Five main states:**

  • **New** - process is being created
  • **Ready** - process is ready to run, waiting for the CPU
  • **Running** - process is executing on a CPU core
  • **Waiting** - process is waiting for an event (I/O, signal)
  • **Terminated** - process has finished execution

**State transitions:**

  • **New -> Ready:** OS finishes initializing the process
  • **Ready -> Running:** Scheduler selects the process for execution
  • **Running -> Ready:** Time slice expired or interrupt arrived
  • **Running -> Waiting:** Process requested an I/O operation
  • **Waiting -> Ready:** I/O completed, process is ready again
  • **Running -> Terminated:** Process finished execution
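The transitions listed above can be written down as a small lookup table. The sketch below is only a model for reasoning about the diagram; the names `State`, `TRANSITIONS`, `can_move`, and `move` are illustrative and do not correspond to any kernel API.

```python
from enum import Enum


class State(Enum):
    NEW = "New"
    READY = "Ready"
    RUNNING = "Running"
    WAITING = "Waiting"
    TERMINATED = "Terminated"


# Allowed transitions, copied from the list in the text.
TRANSITIONS = {
    State.NEW: {State.READY},
    State.READY: {State.RUNNING},
    State.RUNNING: {State.READY, State.WAITING, State.TERMINATED},
    State.WAITING: {State.READY},
    State.TERMINATED: set(),
}


def can_move(current: State, target: State) -> bool:
    """True if the model allows going from `current` to `target`."""
    return target in TRANSITIONS[current]


def move(current: State, target: State) -> State:
    """Return `target` if the transition is legal, otherwise raise ValueError."""
    if not can_move(current, target):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


if __name__ == "__main__":
    s = State.NEW
    # A process is admitted, scheduled, blocks on I/O, then the I/O completes.
    for nxt in (State.READY, State.RUNNING, State.WAITING, State.READY):
        s = move(s, nxt)
    print(s.value)  # prints "Ready"
```

Note that the table has no Waiting -> Running entry: a process whose I/O has completed goes back to Ready and must be selected by the scheduler again.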

A **context switch** occurs when the CPU moves from one process to another. The OS saves the old process state and loads the new one. The direct cost is typically a few microseconds (about 5 µs is a reasonable working estimate); switching address spaces can also invalidate TLB entries and evict cached data, which adds indirect cost afterwards. On a server with thousands of processes and heavy I/O, context switching can consume 5-10% of total CPU. Node.js and Nginx are built around event loops specifically to minimize context switches.
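To see how a per-switch cost turns into a CPU percentage, here is a back-of-the-envelope calculation. It assumes the ~5 µs per-switch estimate from the lesson objectives; the switch rate of 40,000 per second is an invented, illustrative figure, and `switch_overhead` is a hypothetical helper.

```python
def switch_overhead(switches_per_sec: float, cost_us: float, cores: int) -> float:
    """Fraction of total CPU capacity spent performing context switches.

    switches_per_sec: system-wide context switches per second (assumed input)
    cost_us:          direct cost of one switch in microseconds (assumed input)
    cores:            number of CPU cores sharing that work
    """
    cpu_seconds_per_sec = switches_per_sec * cost_us / 1_000_000
    return cpu_seconds_per_sec / cores


if __name__ == "__main__":
    # 40,000 switches/s at 5 us each on a 4-core machine.
    print(f"{switch_overhead(40_000, 5, 4):.1%}")  # prints "5.0%"
```

On Linux, the real switch rate can be read from the `cs` column of `vmstat`, which makes it possible to plug in measured values instead of guesses.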

Many processes can be in the **Ready** state simultaneously, but only one per CPU core can be in **Running** at any given moment.

A process has just finished reading a file from disk. What state does it transition to?

Process Control Block (PCB)

The **Process Control Block (PCB)** is a kernel data structure containing everything the OS knows about a process. Without the PCB, context switching is impossible - the kernel would not know what to save or what to restore.

**Core PCB fields:**

  • **Process ID (PID)** - unique process identifier
  • **Process State** - current state (New, Ready, Running, Waiting, Terminated)
  • **Program Counter** - address of the next instruction
  • **CPU Registers** - values of all CPU registers
  • **Memory Management Info** - page tables, segment boundaries
  • **Accounting Info** - CPU time used, priority, limits
  • **I/O Status** - open file descriptors, devices

During a **context switch**, the OS must:

  1. Save the current process state into its PCB
  2. Load the new process state from its PCB
  3. Switch the address space (page tables)
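The three steps can be mimicked with two plain data structures. This is a toy model, not kernel code: `PCB`, `CPU`, and `context_switch` are hypothetical names, the register set is reduced to a dictionary, and the page-table pointer is represented by a string.

```python
from dataclasses import dataclass, field


@dataclass
class PCB:
    pid: int
    state: str = "Ready"
    program_counter: int = 0
    registers: dict = field(default_factory=dict)
    page_table: str = ""  # stand-in for a pointer to the page tables


@dataclass
class CPU:
    program_counter: int = 0
    registers: dict = field(default_factory=dict)
    page_table: str = ""


def context_switch(cpu: CPU, old: PCB, new: PCB) -> None:
    # 1. Save the current process state into its PCB.
    old.program_counter = cpu.program_counter
    old.registers = dict(cpu.registers)
    old.state = "Ready"
    # 2. Load the new process state from its PCB.
    cpu.program_counter = new.program_counter
    cpu.registers = dict(new.registers)
    new.state = "Running"
    # 3. Switch the address space (page tables).
    cpu.page_table = new.page_table


if __name__ == "__main__":
    a = PCB(pid=1, state="Running", page_table="pt-A")
    b = PCB(pid=2, program_counter=40, registers={"r0": 9}, page_table="pt-B")
    cpu = CPU(program_counter=120, registers={"r0": 7}, page_table="pt-A")
    context_switch(cpu, a, b)   # A is paused at instruction 120, B resumes at 40
    context_switch(cpu, b, a)   # later, A resumes exactly where it stopped
    print(cpu.program_counter, cpu.registers)
```

The point of the model is that process A's progress survives only because it was written into A's PCB before B's values overwrote the CPU state.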

In Linux, the PCB is `task_struct`. At roughly 1.7 KB each, 100,000 processes would need about 170 MB of kernel memory for PCBs alone. The `ps aux` command reports PID, state, CPU time, and memory usage for every running process; on Linux it reads these values from /proc, which the kernel fills in from each process's `task_struct`.

What happens to PCBs during a context switch from process A to process B?

Process Operations

The OS provides mechanisms to **create** and **terminate** processes. Processes form a hierarchy: every process (except init with PID 1) has a parent. On Linux, `pstree` reveals this tree; on systemd-based distributions, systemd (PID 1) is the ancestor of every user-space process.

**Process creation in UNIX/Linux:** A parent creates a child via the **fork()** system call. The new process receives a copy of the parent's address space through copy-on-write - physical pages are not actually copied until a write occurs.

**How fork() works:**

  1. A copy of the parent PCB is created
  2. New address space is allocated (copy-on-write)
  3. The child receives a copy of the parent's data
  4. fork() returns 0 in the child, the child's PID in the parent
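Step 4 is the part a program actually observes. The sketch below uses Python's `os.fork()`, a thin wrapper over the system call available on Unix-like systems, and a pipe so the child can report what it saw. The function name `fork_return_values` is hypothetical.

```python
import os


def fork_return_values():
    """Fork once and report what fork() returned on each side.

    Returns (value_in_parent, value_in_child, child_own_pid).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Report that value and our real PID.
        try:
            os.close(read_fd)
            os.write(write_fd, f"{pid} {os.getpid()}".encode())
            os.close(write_fd)
        finally:
            os._exit(0)  # exit at once, without running the parent's cleanup
    # Parent: fork() returned the child's PID.
    os.close(write_fd)
    data = os.read(read_fd, 64).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)  # collect the child so it does not linger
    value_in_child, child_own_pid = (int(x) for x in data.split())
    return pid, value_in_child, child_own_pid


if __name__ == "__main__":
    in_parent, in_child, child_pid = fork_return_values()
    print("parent saw:", in_parent)
    print("child saw:", in_child, "while its own PID was", child_pid)
```

The value the parent receives equals the PID the child reports for itself, which is how a parent learns whom to wait for.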

**Important:** After exec(), the new program completely replaces the process's code, data, and stack. The PID stays the same, but a different program is now running. This is how a shell runs a command: it calls fork() to create a child, and the child calls exec() to become the command while keeping the PID that fork() assigned.
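The fork-then-exec pattern fits in a few lines. This is a minimal sketch of what a shell does for one command, assuming a Unix-like system; `run_command` is a hypothetical helper, and the exit code 127 for a failed exec follows the common shell convention.

```python
import os
import sys


def run_command(argv):
    """Run argv as a shell would: fork, exec in the child, wait in the parent."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # on success this call never returns
        finally:
            os._exit(127)             # reached only if exec failed
    _, status = os.waitpid(pid, 0)    # parent blocks until the child ends
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)       # child was killed by a signal


if __name__ == "__main__":
    # The child becomes a fresh Python interpreter that exits with code 3.
    code = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("exit code:", code)  # prints "exit code: 3"
```

Keeping fork and exec separate gives the child a window between the two calls in which it can adjust its own environment (redirect file descriptors, change directory) before the new program starts.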

**Process termination:** A process terminates by calling exit() or by returning from main(). The OS reclaims its memory and closes its files, but keeps the PCB until the parent collects the exit status with wait(). Between exit and that wait() call the process is a **zombie**: it no longer executes, yet its PCB still occupies a slot in kernel memory. A parent that never calls wait() accumulates zombies for as long as it runs, which is a real production problem on long-running servers.
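Collecting a child is a single call. The following sketch, assuming a Unix-like system, shows the parent retrieving the exit status and confirms that the kernel entry is gone afterwards; `spawn_and_reap` is a hypothetical name.

```python
import os


def spawn_and_reap(exit_code: int):
    """Fork a child that exits immediately, then collect it.

    Returns (exit code read by the parent, whether the entry was released).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child terminates at once

    # From the moment the child exits until waitpid() returns, the child
    # is a zombie: it holds no memory, only its PCB with the exit status.
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status)

    # A second wait on the same PID fails, because the first one
    # released the entry.
    try:
        os.waitpid(pid, 0)
        released = False
    except ChildProcessError:
        released = True
    return code, released


if __name__ == "__main__":
    print(spawn_and_reap(7))  # prints "(7, True)"
```

A long-running server that forks workers would call `waitpid` (often with `os.WNOHANG` in a loop or a SIGCHLD handler) so that finished children are released promptly.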

**Common misconception:** After fork(), the child process starts execution from the beginning of the program (from main).

**What actually happens:** After fork(), both parent and child continue execution from the point after the fork() call.

fork() does not restart the program - it creates a copy of the process in its current state. Both processes continue from the same instruction, distinguished only by the return value of fork(): 0 in the child, the child's PID in the parent.
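A short experiment makes this visible. In the sketch below (Unix-like systems only; `fork_in_the_middle` is a hypothetical name), a list is filled before the fork, and each side then appends to its own copy. The child sends its view back through a pipe.

```python
import os


def fork_in_the_middle():
    """Return (parent's list, child's list) after both continue past fork()."""
    steps = ["before fork"]  # state built up before the fork
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    # Both processes continue from here, each with the same list contents.
    if pid == 0:
        try:
            os.close(read_fd)
            steps.append("child only")  # changes the child's private copy
            os.write(write_fd, "|".join(steps).encode())
            os.close(write_fd)
        finally:
            os._exit(0)
    os.close(write_fd)
    child_view = os.read(read_fd, 256).decode().split("|")
    os.close(read_fd)
    os.waitpid(pid, 0)
    steps.append("parent only")
    return steps, child_view


if __name__ == "__main__":
    parent_view, child_view = fork_in_the_middle()
    print(parent_view)  # prints "['before fork', 'parent only']"
    print(child_view)   # prints "['before fork', 'child only']"
```

Both lists start with the entry made before the fork, so the child did not rerun the program from the top, and neither side sees the other's later change.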

What does fork() return in the child process?

Key Ideas

  • **Process** - a program in execution with its own memory, registers, and state. A program is a passive file; a process is an active entity
  • **Address space**: text (code) + data (globals) + heap (dynamic) + stack (locals). CPU registers are not part of the address space but are part of the execution context
  • **States**: New -> Ready -> Running -> Waiting -> Terminated. A context switch (a few microseconds of direct cost, plus TLB and cache effects) occurs each time a CPU core moves from one process to another
  • **PCB (task_struct in Linux)** - 1.7 KB per process, stores everything: PID, state, registers, memory/file pointers. Required for context switching
  • **fork()** creates a copy via copy-on-write, **exec()** replaces the program, **wait()** collects the child. Zombie processes are a real production hazard when wait() is missing
  • **Chrome, Nginx, PostgreSQL, Docker** all use process isolation as an architectural primitive for stability

Related Topics

Processes are the foundation on which everything else in operating systems is built:

  • Threads — Lightweight processes within one address space. Share memory but have their own stacks and registers.
  • CPU Scheduling — Algorithms for selecting a process from the Ready queue: FCFS, SJF, Round Robin, CFS (Linux).
  • Synchronization — Mutexes, semaphores, and atomic operations for coordinating concurrent processes and threads.
  • Memory Management — How the OS allocates and isolates process address spaces through virtual memory.

Questions for Reflection

  • Why is context switching expensive? What exactly happens to the TLB when the process changes?
  • What is the performance advantage of copy-on-write in fork() compared to full memory copying?
  • Why do Unix processes use fork() + exec() rather than a single creation system call?
  • What is a zombie process and why is it dangerous on a production server?

Related Lessons

  • os-01-intro — Core OS concepts and the kernel's role in resource management
  • os-03-threads — Threads are lightweight processes sharing an address space
  • os-04-scheduling — The scheduler selects processes from the Ready queue
  • os-05-sync — Synchronization is needed when multiple processes share data
  • arch-04-cpu — Context switching is a register-level CPU operation
  • os-07-memory — Virtual memory isolates process address spaces
  • db-25-connection-pooling