Cloud Computing

Virtualization and Containers

In 2013 Docker released its first version and the industry changed forever. Before Docker, deploying an application looked like: 'it works on my machine, but not on the server.' After Docker: one image works the same everywhere - on a laptop, in CI, in production. How did the industry go from gigabyte VMs to containers that start in milliseconds?

  • **Google** runs 4 billion containers per week - every search query is handled inside a container
  • **Spotify** migrated 1,200 microservices to Kubernetes - deployment time dropped from hours to minutes
  • **AWS Lambda** handles trillions of invocations per month - from IoT sensors to mobile app APIs

From Mainframe to Containers: 60 Years of Isolation

1966, IBM CP/CMS - the first virtualization system for mainframes. One expensive computer was shared among dozens of users, each thinking they were on a dedicated machine. 1979: Unix chroot - the first file system isolation primitive for processes. 2000: FreeBSD jails - full process isolation. 2008: LXC (Linux Containers) - containers built on the Linux kernel's own namespaces and cgroups. 2013: Docker wraps LXC in simple tooling and makes containers accessible to everyone. Sixty years on, the mainframe idea of sharing one machine among many isolated users is unchanged; only the price is different: one container costs about USD 0.0001 per second.

Prerequisites

  • Introduction to Cloud Computing

Hypervisor: the virtual machine manager

2006, Amazon Web Services launches EC2. For the first time, a server can be rented in about a minute and paid for by the hour (per-second billing followed in 2017). The technology that made this possible is the **hypervisor**: a program that allows multiple operating systems to run on a single physical server. Each OS receives virtualized resources and thinks it owns real hardware.

| Characteristic | Type 1 (Bare-metal) | Type 2 (Hosted) |
|---|---|---|
| Runs on | Directly on hardware | On top of the host OS |
| Examples | VMware ESXi, KVM, Xen, Hyper-V | VirtualBox, VMware Workstation, Parallels |
| Performance | High (direct hardware access) | Lower (via host OS) |
| Use case | Data centers, cloud providers | Development, testing |
| Overhead | ~2-5% | ~10-20% |

**KVM (Kernel-based Virtual Machine)** - a hypervisor built directly into the Linux kernel. It is the foundation of Google Cloud and most OpenStack-based clouds. AWS EC2 originally ran on Xen; current-generation EC2 instances run on the Nitro hypervisor, which is based on KVM.

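Hardware-assisted virtualization (Intel VT-x, AMD-V) - CPU extensions that let guest code run directly on the processor while the hypervisor keeps control of privileged operations. Before them, x86 hypervisors had to fall back on binary translation or paravirtualization, which was slower and more complex.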
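On Linux these extensions show up as CPU flags, so their presence can be checked with a few lines of code. The sketch below assumes an x86 Linux host, where `/proc/cpuinfo` lists `vmx` for Intel VT-x and `svm` for AMD-V; the function name `virtualization_support` is made up for this illustration.

```python
from pathlib import Path


def virtualization_support(cpuinfo_text):
    """Return 'Intel VT-x', 'AMD-V', or None, based on x86 CPU flags."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            flags = set(value.split())
            if "vmx" in flags:  # flag reported by Intel CPUs with VT-x
                return "Intel VT-x"
            if "svm" in flags:  # flag reported by AMD CPUs with AMD-V
                return "AMD-V"
    return None


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(virtualization_support(cpuinfo.read_text()))
    else:
        print("No /proc/cpuinfo here (not a Linux host)")
```

Inside a cloud VM the result is often `None`, which usually just means the provider does not expose nested virtualization to guests.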

AWS runs EC2 instances for millions of customers. What type of hypervisor do they use?

Virtual Machines

A startup deploys 20 microservices. Each in its own VM - each with its own Ubuntu kernel. 20 OS kernels at 500 MB RAM each = 10 GB just for operating systems. Disk: 10 GB x 20 = 200 GB wasted. A **Virtual Machine (VM)** is a complete emulation of a computer with its own OS, kernel, drivers, and file system. Full isolation - like a separate physical server.

| Characteristic | Value |
|---|---|
| Isolation | Full - separate OS kernel, own memory |
| Image size | Gigabytes (full OS + application) |
| Start time | Minutes (OS boot, service initialization) |
| Overhead | Each VM is a full OS (kernel ~500 MB RAM minimum) |
| Security | High - hardware isolation via hypervisor |

**AMI (Amazon Machine Image)** - a snapshot of a VM from which a new instance can be launched. It includes the OS, pre-installed software, and configuration. Custom AMIs can be created, or ready-made ones can be taken from the AWS Marketplace.

**The VM problem:** 20 microservices each in their own VM means 20 full operating systems. Each VM needs ~10 GB of disk and ~500 MB of RAM for the operating system alone. Multiplied by 20: 200 GB of disk and 10 GB of RAM are spent before any application code runs.
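The arithmetic behind the VM problem is easy to rerun for other fleet sizes. A minimal sketch, assuming the per-VM figures quoted in this lesson (~500 MB RAM, ~10 GB disk) and 1 GB = 1000 MB; the helper name `os_overhead` is hypothetical.

```python
def os_overhead(instances, ram_mb_each, disk_gb_each):
    """Total RAM and disk (both in GB) consumed by per-instance OS overhead."""
    return {
        "ram_gb": instances * ram_mb_each / 1000,  # assumes 1 GB = 1000 MB
        "disk_gb": instances * disk_gb_each,
    }


# 20 microservices, one VM each, using the lesson's per-VM estimates.
vms = os_overhead(instances=20, ram_mb_each=500, disk_gb_each=10)
print(f"RAM: {vms['ram_gb']} GB, disk: {vms['disk_gb']} GB")
```

Changing `instances` to 10 gives the figures for the ten-service case in the question that follows.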

**When VMs are the right choice:** 1. Full isolation is required (multi-tenant) 2. Different OSes on the same server 3. Compliance requires hardware-level isolation 4. Legacy app with OS-level dependencies.

10 Node.js microservices, each in a separate VM (Ubuntu 22.04). What is the main problem with this approach?

Containers and Docker

2013, PyCon. Solomon Hykes shows a five-minute Docker demo. The audience gives a standing ovation. "It works on my machine" - the problem every developer knows - just got a solution. A **container** is an isolated process that uses the host OS kernel instead of its own. No separate OS, no separate kernel - just the application and its dependencies.

**Docker Image vs Container:** An Image is a blueprint (class), a Container is a running instance (object). From one Image 100 Containers can be launched. An Image is immutable, a Container is a live process.
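The class/object analogy can be taken almost literally. The sketch below is only a model of the relationship, not Docker's API: `Image`, `Container`, and `writable_files` are names invented for this example.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from itertools import count

_ids = count(1)  # simple source of unique container ids


@dataclass(frozen=True)
class Image:
    """Immutable blueprint: a name plus an ordered tuple of layers."""
    name: str
    layers: tuple


@dataclass
class Container:
    """Running instance: shares the image, owns its writable state."""
    image: Image
    container_id: int = field(default_factory=lambda: next(_ids))
    writable_files: dict = field(default_factory=dict)


image = Image("web:1.0", ("base-os", "node-runtime", "app-code"))
containers = [Container(image) for _ in range(100)]

# A write inside one container does not touch the image or its siblings.
containers[0].writable_files["/tmp/cache"] = "only in container 1"

try:
    image.name = "web:2.0"  # images cannot be modified in place
except FrozenInstanceError:
    print("image is immutable; build a new image instead")
```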

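**Layers and caching** - the key optimization in Docker. Each filesystem-changing instruction in a Dockerfile (`RUN`, `COPY`, `ADD`) creates a layer. If an instruction and its inputs haven't changed, Docker reuses the cached layer; once one layer changes, every layer after it is rebuilt. That's why `COPY package.json` and the dependency install come before `COPY src/` - when only the code changes, dependencies aren't reinstalled.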
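The effect of instruction order can be shown with a toy model. This is not Docker's actual cache implementation; it only assumes that a layer's cache key depends on its instruction, its inputs, and the layer before it. The helpers `layer_keys`, `rebuilt_steps`, `good_order`, and `bad_order` are hypothetical.

```python
import hashlib


def layer_keys(steps):
    """One cache key per step; each key also covers all earlier steps."""
    keys, parent = [], ""
    for instruction, content in steps:
        raw = f"{parent}|{instruction}|{content}".encode()
        parent = hashlib.sha256(raw).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(old_steps, new_steps):
    """Instructions whose cache key differs between two builds."""
    pairs = zip(new_steps, layer_keys(old_steps), layer_keys(new_steps))
    return [step[0] for step, old, new in pairs if old != new]


def good_order(pkg, src):
    return [("COPY package.json", pkg), ("RUN npm install", ""), ("COPY src/", src)]


def bad_order(pkg, src):
    return [("COPY src/", src), ("COPY package.json", pkg), ("RUN npm install", "")]


# Only the source code changes between the two builds.
print(rebuilt_steps(good_order("deps", "v1"), good_order("deps", "v2")))
print(rebuilt_steps(bad_order("deps", "v1"), bad_order("deps", "v2")))
```

In the first ordering only the final copy step is rebuilt; in the second, the changed source sits at the bottom of the chain, so the dependency install is rebuilt as well.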

| Parameter | VM | Container |
|---|---|---|
| Image size | 1-10 GB | 50-500 MB |
| Start time | 30-60 seconds | < 1 second |
| RAM overhead | ~500 MB (OS kernel) | ~5-10 MB |
| Isolation | Hardware (hypervisor) | Software (namespaces, cgroups) |
| Density | ~10 VMs per server | ~100+ containers per server |
| Orchestration | VMware vSphere | Kubernetes, Docker Swarm |

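**Kubernetes (K8s)** - the standard for container orchestration. Manages thousands of containers: auto-scaling, self-healing (restarting crashed containers), rolling updates, service discovery. Kubernetes was open-sourced by Google in 2014 and draws on its internal cluster manager, Borg; the billions of containers Google launches each week run on Borg rather than on Kubernetes itself.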
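Self-healing and scaling both come down to comparing a desired state with the observed one and acting on the difference. The sketch below is a toy version of that idea, not the Kubernetes API: the `reconcile` function and the container dictionaries are invented for illustration.

```python
def reconcile(desired_replicas, containers):
    """Compare desired and observed state; return the actions to take."""
    healthy = [c["name"] for c in containers if c["status"] == "running"]
    crashed = [c["name"] for c in containers if c["status"] != "running"]
    return {
        "remove": crashed,                                  # clean up failed containers
        "start": max(desired_replicas - len(healthy), 0),   # how many new ones to launch
        "stop": healthy[desired_replicas:],                 # surplus after a scale-down
    }


observed = [
    {"name": "web-1", "status": "running"},
    {"name": "web-2", "status": "crashed"},
    {"name": "web-3", "status": "running"},
]
print(reconcile(desired_replicas=3, containers=observed))
```

A real orchestrator runs this comparison continuously, so a crash is corrected without anyone issuing a restart command.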

Why does a Dockerfile first copy package.json, install dependencies, and only then copy the source code?

Serverless: Functions as a Service

An image processing service: a photo is uploaded, a thumbnail is needed. Traffic is unpredictable: 0 requests at night, thousands during peak hours. A VM sits idle 20 hours out of 24. Pay for idle time? **Serverless** is the next step in abstraction. Code is written as a function, uploaded to the cloud, and executed in response to an event. Between invocations the function doesn't exist.

| Provider | Serverless service | Limits |
|---|---|---|
| AWS | Lambda | 15 min max, 10 GB RAM, 1000 concurrent invocations (default quota) |
| GCP | Cloud Functions | 60 min max, 32 GB RAM, 3000 concurrent invocations |
| Azure | Functions | 230 sec for HTTP responses; up to 10 min per execution (Consumption), unbounded (Premium) |
| Cloudflare | Workers | 30 sec CPU, 128 MB RAM, edge locations worldwide |

**Event-driven architecture** - a function is triggered by an event: an HTTP request, a file upload to S3, a message in a queue, a schedule (cron). Between invocations the function doesn't exist - payment only for actual execution time.
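For the thumbnail example, an event-triggered function might look roughly like this. The sketch follows the AWS Lambda Python convention of a `handler(event, context)` entry point and the `Records` layout of S3 upload notifications; the `thumbnail_key` naming rule and the bucket name are assumptions, and the actual image resizing is left out.

```python
from urllib.parse import unquote_plus


def thumbnail_key(key):
    """Hypothetical naming rule: uploads/cat.jpg -> thumbnails/cat.jpg."""
    return "thumbnails/" + key.split("/")[-1]


def handler(event, context):
    """Entry point the platform calls once per event."""
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Real code would download the photo, resize it, and upload the result.
        results.append({"bucket": bucket, "source": key, "thumbnail": thumbnail_key(key)})
    return results


# Local call with a hand-written event; in the cloud the platform supplies it.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "photos-bucket"}, "object": {"key": "uploads/my+cat.jpg"}}}
    ]
}
print(handler(sample_event, None))
```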

**Cold Start** - the main downside of Serverless. On the first invocation (or after a period of inactivity) the function wakes up: a container is created, code is loaded, dependencies are initialized. This takes 100ms-3s depending on the language and package size. Java/C# are slower, Node.js/Python are faster.
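A common way to soften cold starts is to do expensive initialization once and keep the result for later invocations that land in the same execution environment. The sketch below simulates this locally; `load_dependencies` and its 50 ms delay are stand-ins, not measurements of any real platform.

```python
import time

_dependencies = None  # survives between invocations in the same environment


def load_dependencies():
    """Stand-in for slow startup work: imports, config, connections."""
    time.sleep(0.05)
    return {"ready": True}


def handler(event, context):
    global _dependencies
    cold = _dependencies is None
    if cold:
        _dependencies = load_dependencies()  # paid only on a cold start
    return {"cold_start": cold}


first = handler({}, None)   # initializes dependencies
second = handler({}, None)  # reuses them
print(first, second)
```

The second call skips the initialization entirely, which is why latency-sensitive services care about how often their functions go cold.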

**When Serverless is not a good fit:** 1. Long-running computations > 15 min 2. Persistent WebSocket connections needed 3. High-frequency calls (cheaper to keep a server running) 4. Latency-critical paths (cold start is unacceptable).
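The point about high-frequency calls can be checked with rough arithmetic. The prices below are assumptions for illustration only (on-demand list prices at the time of writing, free tier ignored): USD 0.20 per million Lambda requests, USD 0.0000166667 per GB-second, and USD 0.0104 per hour for an always-on small instance. Current provider pricing should be checked before relying on the result.

```python
HOURS_PER_MONTH = 730


def lambda_monthly_cost(requests_per_hour, duration_s, memory_gb,
                        price_per_million_requests=0.20,     # assumed price
                        price_per_gb_second=0.0000166667):   # assumed price
    """Pay-per-use cost: a per-request fee plus memory x time."""
    requests = requests_per_hour * HOURS_PER_MONTH
    request_cost = requests / 1_000_000 * price_per_million_requests
    compute_cost = requests * duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost


def server_monthly_cost(price_per_hour=0.0104):  # assumed price
    """An always-on server costs the same whether it is busy or idle."""
    return price_per_hour * HOURS_PER_MONTH


for rate in (100, 100_000):
    cost = lambda_monthly_cost(rate, duration_s=0.2, memory_gb=0.125)
    print(f"{rate} req/h: function ${cost:.2f} vs server ${server_monthly_cost():.2f}")
```

Under these assumptions the function is far cheaper at 100 requests per hour, while at 100,000 requests per hour the always-on server wins; the crossover depends heavily on duration and memory size.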

**Myth:** Serverless means 'no servers' - code executes by itself in the cloud.

**Reality:** Servers exist, they just aren't visible or managed directly. AWS Lambda runs code inside Firecracker, a lightweight microVM built specifically for this purpose.

The name 'serverless' describes the developer experience, not the architecture. Servers require no direct management, but under the hood AWS automatically creates, reuses, and tears down execution environments as invocations arrive.

A service handles 100 requests per hour, each taking 200ms. What's cheaper: EC2 t3.micro or AWS Lambda?

Key Takeaways

  • **Hypervisor** splits a physical server into VMs. Type 1 (bare-metal) - for data centers, Type 2 (hosted) - for development
  • **VM** - a full OS with a kernel. Complete isolation, but heavy (gigabytes) and slow to start (minutes)
  • **Containers (Docker)** - a process with isolation, sharing the host OS kernel. Lightweight (megabytes), instant start
  • **Serverless** - only the function is written, everything else is managed by the cloud. Pay-per-execution
  • Docker 2013: its 'build once, run anywhere' principle is now reality - an image from a laptop deploys to production unchanged

Related Topics

Virtualization and containers are the 'how' of the cloud. Next is the 'where':

  • Regions, Zones, and Availability — Where VMs and containers physically live
  • Introduction to Cloud Computing — IaaS/PaaS/SaaS - abstraction layers over virtualization

Questions for Reflection

  • Does the current project use VMs, containers, or serverless? Does that choice match the workload?
  • When designing architecture for 50 microservices - would VMs, Docker + Kubernetes, or Serverless be chosen? Why?
  • Which tasks in the project are ideal for Serverless (infrequent, event-driven, short-lived)?

Related Lessons

  • cloud-01 — Cloud intro establishes the IaaS/PaaS/SaaS model built on top of virtualization
  • cloud-03 — Regions and zones - where VMs and containers physically live
  • devops-04 — Docker and Kubernetes - practical container management
  • devops-02 — Linux is the foundation: namespaces and cgroups under containers
  • cloud-07 — Kubernetes orchestrates containers at cluster scale
  • sec-05 — Container security: isolation, image scanning, runtime security