Docker: Fundamentals
2013. dotCloud founder Solomon Hykes demos Docker at PyCon in a 5-minute lightning talk. The audience gives a standing ovation. Within a year the GitHub repo has 14,000 stars and Docker is one of the fastest-growing open-source projects of its time. The idea is simple: if an application works in a container on the developer's laptop, it runs from the same image, with the same dependencies, on the production server. 'Works on my machine' largely disappears as a category of problem.
- **Netflix** runs 700+ types of microservices in Docker - deployed independently of one another, thousands of deployments per day in total
- **GitHub Actions** - a CI/CD job can run inside a container with a pinned, known environment, which removes 'flaky tests due to the Python version'
- **Cloudflare Workers** is a useful contrast - it isolates tenants with V8 isolates rather than Linux namespaces, so it avoids the container and image layer overhead altogether
Historical context
Docker Inc. grew out of dotCloud - a PaaS company Solomon Hykes founded in 2008. Its internal tool for isolating customer applications was presented publicly in March 2013. The idea was not technically revolutionary: Linux containers (LXC) had existed since 2008, namespaces since 2002, and cgroups since 2006 (merged into the mainline kernel in 2008). The revolution was UX: Dockerfile + image registry + a simple CLI turned a complex systems technology into a tool accessible to any developer. By 2015 Docker had become the de-facto containerization standard, which led to the creation of the OCI (Open Container Initiative) that year - a neutral standard for the runtime and image format.
Dockerfile
**A Dockerfile** is a text file with instructions for building a Docker image. Instructions that change the filesystem (RUN, COPY, ADD) each create a separate layer in the union filesystem; the others only record metadata. Layers are cached and reused on subsequent builds. The goal is a reproducible build: the same Dockerfile should produce an equivalent image on any host, provided the base image and dependency versions are pinned.
| Instruction | Creates layer | Purpose |
|---|---|---|
| FROM | Yes | Base image - starting point |
| COPY / ADD | Yes | Copy files into the image |
| RUN | Yes | Execute commands at build time |
| ENV | No (metadata only) | Environment variables |
| EXPOSE | No | Document the port (does not open it!) |
| CMD / ENTRYPOINT | No | Container start command |
**Layer order is critical for cache:** instructions with rarely changing data (installing dependencies) must come before instructions with frequently changing data (copying source code). One changed layer invalidates all subsequent layers.
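The invalidation rule above can be illustrated with a toy model. This is a minimal sketch, not Docker's actual cache implementation: it assumes each step's cache key is a hash of the instruction, a fingerprint of any copied files, and the previous step's key. The `node:20-alpine` base image, the `deps-v1` / `src-v1` fingerprints, and all function names are hypothetical and chosen only for illustration.

```python
import hashlib


def cache_keys(steps):
    """Return one cache key per build step.

    Each step is (instruction, fingerprint_of_copied_files). A key hashes the
    step together with the previous key, so a change at one step also changes
    the key of every step after it.
    """
    keys, parent = [], ""
    for instruction, fingerprint in steps:
        payload = "\n".join([parent, instruction, fingerprint])
        parent = hashlib.sha256(payload.encode()).hexdigest()
        keys.append(parent)
    return keys


def reused_steps(old_build, new_build):
    """Count the leading steps whose cache keys are unchanged."""
    reused = 0
    for old_key, new_key in zip(cache_keys(old_build), cache_keys(new_build)):
        if old_key != new_key:
            break
        reused += 1
    return reused


def dependencies_first(src):
    # Rarely changing dependency manifest is copied before the source code.
    return [
        ("FROM node:20-alpine", ""),
        ("COPY package.json ./", "deps-v1"),
        ("RUN npm install", ""),
        ("COPY . .", src),
    ]


def source_first(src):
    # Source code is copied first, so any edit invalidates the install step.
    return [
        ("FROM node:20-alpine", ""),
        ("COPY . .", src),
        ("RUN npm install", ""),
    ]


# Simulate a rebuild after editing only the source code.
print(reused_steps(dependencies_first("src-v1"), dependencies_first("src-v2")))
print(reused_steps(source_first("src-v1"), source_first("src-v2")))
```

In this model the dependencies-first ordering keeps the expensive `npm install` step cached after a source edit, while the source-first ordering reuses only the base image step.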
Why do Node.js Dockerfiles typically copy package.json before copying the source code?
Images
**A Docker image** is a read-only template for creating containers. Structure: a union filesystem of stacked layers. Each layer is a delta on top of the previous. Identification: content-addressable SHA256 digest; a tag (name:tag) is a mutable pointer to a digest.
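The relationship between digests and tags can be sketched as a small content-addressable store. This is an illustrative model only, not the registry or daemon implementation; `ImageStore`, `push`, and the `app:1.0` / `app:latest` references are hypothetical names.

```python
import hashlib


class ImageStore:
    """Toy content-addressable store: blobs keyed by digest, tags as pointers."""

    def __init__(self):
        self.blobs = {}  # digest -> content
        self.tags = {}   # "name:tag" -> digest

    def push(self, ref, content):
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self.blobs[digest] = content  # identical content maps to the same key
        self.tags[ref] = digest       # pushing an existing tag repoints it
        return digest


store = ImageStore()
first = store.push("app:1.0", b"layer data")
second = store.push("app:latest", b"layer data")
print(first == second, len(store.blobs))  # same digest, stored once

# Moving the tag does not change the content that app:1.0 points to.
store.push("app:latest", b"newer layer data")
print(store.tags["app:1.0"] == first, store.tags["app:latest"] == first)
```

Because the digest is derived from the content, pinning a deployment to a digest is stable, while pinning to a tag follows whatever the tag currently points to.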
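**Base image sizes (approximate, they vary by release):** Ubuntu ~77 MB, Debian Slim ~80 MB, Alpine ~7 MB, Distroless ~2 MB, Scratch 0 MB. Smaller image = faster pull in CI/CD, smaller attack surface. Alpine + musl libc works for most Go and Node applications, but binaries and native extensions built against glibc may need to be rebuilt or may not run.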
| Registry | Purpose | Notes |
|---|---|---|
| Docker Hub | Public official images | Default for docker pull |
| GitHub Container Registry | Images published alongside GitHub repositories | ghcr.io, integrated with GitHub Actions and repository permissions |
| AWS ECR / GCR / ACR | Private cloud registry | IAM integration, geo-replication |
| Self-hosted Harbor | On-premises registry | Vulnerability scanning, RBAC |
Two Docker images have identical content (same layers) but different tags. How is that content stored in Docker?
Containers
**A Docker container** is a running instance of an image. Technically: Linux namespace isolation (pid, net, mnt, uts, ipc, user) + cgroups for resource limits + a thin writable layer on top of read-only image layers. Not a virtual machine - containers share the host kernel, their processes are visible on the host OS via `ps`, and no hypervisor is involved.
**OOM Killer:** when a container exceeds its memory limit, the Linux OOM Killer terminates a process in it with SIGKILL, and the container exits with code 137 (128 + signal 9). Without `--memory`, a container can consume all host memory, at which point the host's OOM Killer may terminate processes in neighboring containers. Always set memory limits in production.
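The 137 figure follows the common convention that a process killed by signal N reports exit status 128 + N. A minimal sketch of decoding such codes, assuming a POSIX host where Python's `signal` module knows the signal numbers; `describe_exit_code` is a hypothetical helper name:

```python
import signal


def describe_exit_code(code):
    """Explain a container exit code using the 128 + N signal convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"application error (status {code})"


for code in (0, 1, 137, 143):
    print(code, describe_exit_code(code))
```

Note that 137 only says the process received SIGKILL. To tell an out-of-memory kill from a manual `docker kill`, the `OOMKilled` field in the `docker inspect` state output is the more direct evidence.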
A container exits with code 137. What happened?
Volumes
**Docker volumes** solve a fundamental problem: the container's writable layer is ephemeral - data disappears when the container is removed. Volumes are a persistent storage mechanism managed by the Docker daemon outside the container's filesystem. Three mount types with different trade-offs.
| Type | Syntax | Stored at | When to use |
|---|---|---|---|
| Named Volume | -v mydata:/data | /var/lib/docker/volumes/ | Production: databases, uploads, any persistent data |
| Bind Mount | -v /host/path:/container/path | Arbitrary host path | Development: hot reload of source files |
| tmpfs Mount | --tmpfs /tmp | RAM only | Secrets, temporary files, tests |
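The three mount types in the table map onto `docker run` flags as follows. This sketch only builds the argument list and does not invoke Docker. The helper name `docker_run_args`, the `postgres:16` image, the `pgdata` volume name, and the container paths are assumptions chosen for illustration.

```python
def docker_run_args(image, volumes=(), binds=(), tmpfs=(), memory=None):
    """Build a `docker run` argument list from the mount types in the table."""
    args = ["docker", "run"]
    if memory:
        args += ["--memory", memory]
    for name, target in volumes:
        # With -v, a source that is not an absolute path is a named volume.
        if name.startswith("/"):
            raise ValueError(f"named volume must not be a path: {name}")
        args += ["-v", f"{name}:{target}"]
    for host_path, target in binds:
        # With -v, an absolute host path makes it a bind mount.
        if not host_path.startswith("/"):
            raise ValueError(f"bind mount needs an absolute path: {host_path}")
        args += ["-v", f"{host_path}:{target}"]
    for target in tmpfs:
        args += ["--tmpfs", target]
    args.append(image)
    return args


print(" ".join(docker_run_args(
    "postgres:16",
    volumes=[("pgdata", "/var/lib/postgresql/data")],
    tmpfs=["/tmp"],
    memory="512m",
)))
```

Returning a list rather than a shell string means the result can be passed to `subprocess.run` without shell quoting concerns.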
**Secrets vs Volumes:** passwords and keys should NOT be passed via `-e` (visible in `docker inspect`). Docker Secrets (Swarm) or Kubernetes Secrets mount as tmpfs - in memory, not on disk, invisible in container metadata.
**Misconception:** Docker volumes are just folders on the host - Docker is an unnecessary middleman.
**Reality:** Named volumes are managed by the Docker daemon: built-in lifecycle management, pluggable volume drivers (NFS, cloud storage), and backups via a helper container started with `docker run --volumes-from`.
Bind mounts are indeed just host folders. Named volumes are an abstraction on top: with the default `local` driver the data still lives under /var/lib/docker/volumes/ on one host, but a driver can instead store it on S3, NFS, or iSCSI. Only in that case can the data follow a container to another host.
PostgreSQL runs in Docker without volumes. What happens to data on `docker rm postgres-container`?
Key ideas
- **Dockerfile** - reproducible build: instructions layer by layer, cache invalidates top-to-bottom - instruction order is critical for build speed
- **Image** - read-only template from a union filesystem of layers, identified by SHA256 digest; tag is a mutable pointer
- **Container** - isolated process via Linux namespaces + cgroups, not a VM. Exit 137 = SIGKILL (OOM or forced kill)
- **Volume** - persistent storage outside the ephemeral container layer: named volumes for production, bind mounts for dev, tmpfs for secrets
Questions for reflection
- A company moves from bare-metal servers to Docker. A developer proposes packaging the entire monolith (nginx + app + PostgreSQL) into one container for simplicity. What specific problems does this create, and how should they be addressed?
Related lessons
- devops-05 — Kubernetes orchestrates Docker containers - understand Docker before Kubernetes
- cloud-04 — EC2 VM vs Docker container - the fundamental virtualization trade-off
- se-04 — SOLID in containerization: one container = one responsibility
- emb-04 — Linux namespace isolation mirrors MPU memory protection in embedded systems
- bt-04-dns-tls — DNS in Docker networking: service discovery via the built-in DNS resolver
- devops-01
- devops-02
- devops-03
- cloud-01
- os-19-containers
- os-12-virtualization