Cloud Computing
KMS and Encryption
Dropbox stores 750 petabytes of user files. A single master key would be a single point of failure for the entire infrastructure. In 2019 Capital One lost data on 100M customers when an SSRF attack yielded credentials that could read its S3 buckets. How does encryption scale to exabytes while surviving the compromise of any single key?
- **Dropbox** uses envelope encryption for every file: a DEK per file, a KEK in its own HSM-backed KMS
- **Stripe** rotates CMKs every 365 days automatically and stores secrets only in Vault, never in source code
- **Capital One 2019**: 100M credit applications stolen from S3 after an SSRF attack on an EC2 instance exposed its IAM role credentials - a $190M incident
Envelope Encryption
Dropbox stores 750 petabytes of user files. Encrypting every file directly with a master key is not viable: a leaked key compromises all data, and rotation requires re-encrypting 750 PB. The answer is **envelope encryption**: data is encrypted with a one-time data key (DEK), and the DEK is stored encrypted under a master key (KEK). Two layers, two operations.
On write: a DEK (AES-256) is generated, data is encrypted with the DEK, the DEK is encrypted with the KEK (one KMS call), and the encrypted DEK is stored next to the data. On read: the encrypted DEK is fetched, decrypted in KMS, and the data is decrypted locally. The KEK never leaves the HSM, and KMS load is minimal (only the short DEK).
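The write and read paths above can be sketched in a few lines. This is a minimal illustration, not a production design: `FakeKms` is a hypothetical in-memory stand-in for a real KMS, and `toy_cipher` is a deliberately simple XOR keystream that replaces AES-256 only so the sketch runs with the standard library alone. The point is the key hierarchy: one KMS call per write, one per read, and bulk encryption done locally.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Stand-in for AES-256: XOR with a SHA-256 counter keystream.

    The same call encrypts and decrypts. NOT secure, illustration only.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class FakeKms:
    """Hypothetical in-memory KMS: the KEK never leaves this object."""

    def __init__(self) -> None:
        self._kek = secrets.token_bytes(32)
        self.calls = 0  # counts simulated KMS requests

    def generate_data_key(self) -> tuple[bytes, bytes]:
        """Return (plaintext DEK, DEK encrypted under the KEK)."""
        self.calls += 1
        dek = secrets.token_bytes(32)
        nonce = secrets.token_bytes(12)
        return dek, nonce + toy_cipher(self._kek, nonce, dek)

    def decrypt(self, wrapped_dek: bytes) -> bytes:
        """Unwrap an encrypted DEK; only this short value reaches the KMS."""
        self.calls += 1
        nonce, body = wrapped_dek[:12], wrapped_dek[12:]
        return toy_cipher(self._kek, nonce, body)


def envelope_encrypt(kms: FakeKms, plaintext: bytes) -> dict:
    dek, wrapped_dek = kms.generate_data_key()       # one KMS call
    nonce = secrets.token_bytes(12)
    ciphertext = toy_cipher(dek, nonce, plaintext)   # local bulk encryption
    # The encrypted DEK is stored next to the data; the plaintext DEK is dropped.
    return {"wrapped_dek": wrapped_dek, "nonce": nonce, "ciphertext": ciphertext}


def envelope_decrypt(kms: FakeKms, record: dict) -> bytes:
    dek = kms.decrypt(record["wrapped_dek"])         # one KMS call
    return toy_cipher(dek, record["nonce"], record["ciphertext"])
```

Rotating the KEK in this model only requires re-wrapping the short `wrapped_dek` values, not re-encrypting the stored ciphertext.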
**Why two layers?** Direct encryption of large volumes via KMS is expensive ($0.03 per 10k calls), throttled by request quotas, and limited in size: the KMS `Encrypt` operation accepts at most 4 KB of plaintext per call. DEK encryption is a local operation accelerated by AES-NI, with no network round trip. KMS only has to protect the small DEK.
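The price quoted above translates into a quick estimate. The sketch below assumes one KMS call per DEK. The `objects_per_dek` parameter is a hypothetical knob for reusing one DEK across a batch of objects, which cuts KMS calls but widens the blast radius of a single leaked DEK.

```python
from decimal import Decimal

PRICE_PER_10K_CALLS = Decimal("0.03")  # figure quoted in the text


def kms_call_cost(objects: int, objects_per_dek: int = 1) -> Decimal:
    """Dollar cost of the KMS calls needed to wrap DEKs for `objects` objects."""
    calls = -(-objects // objects_per_dek)  # ceiling division
    return PRICE_PER_10K_CALLS * calls / 10_000


per_object = kms_call_cost(10_000_000)                     # one DEK per object
batched = kms_call_cost(10_000_000, objects_per_dek=1000)  # one DEK per 1000 objects
print(per_object, batched)
```

Under these assumptions, 10 million objects with a DEK each cost about $30 in KMS calls, and batching 1000 objects per DEK brings that down to about 3 cents.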
A service encrypts 10 million small objects in S3 via KMS. Which scheme should be used?
CMK: Customer Master Key
**CMK (Customer Master Key)** is the KEK in envelope encryption. In AWS KMS it lives in an HSM (Hardware Security Module) certified FIPS 140-2 Level 3. The key never gets exported in plaintext; all operations (Encrypt, Decrypt, GenerateDataKey) execute inside the HSM.
Three types of CMK by management model: **AWS-managed** (e.g. `aws/s3` - free but no policy control), **customer-managed** ($1/month plus calls, but custom policy and rotation), **AWS-owned** (invisible to clients, used by internal services). Access is governed by **key policy** (on the key) plus IAM policy (on the user) - both must permit the operation.
**kms:ViaService** restricts key use to a specific AWS service. This is exfiltration protection: even a compromised role cannot decrypt data directly, only through S3/RDS/etc.
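As an illustration, a key policy statement using this condition might look like the following. The role ARN, account ID, `Sid`, and region are placeholders, and the helper name `via_service_statement` is hypothetical. The condition value follows the `service.region.amazonaws.com` form that `kms:ViaService` expects.

```python
import json


def via_service_statement(role_arn: str, service: str, region: str) -> dict:
    """Build a key policy statement that allows key use only via one AWS service."""
    return {
        "Sid": "AllowUseOnlyThroughService",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
        "Resource": "*",  # in a key policy, "*" means the key the policy is attached to
        "Condition": {
            "StringEquals": {"kms:ViaService": f"{service}.{region}.amazonaws.com"}
        },
    }


# Placeholder role and region, for illustration only.
statement = via_service_statement(
    "arn:aws:iam::111122223333:role/app-role", "s3", "us-east-1"
)
print(json.dumps(statement, indent=2))
```

With a statement like this, a direct `kms:Decrypt` call made with the role's credentials does not match the condition, while the same operation performed by S3 on the role's behalf does.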
A compliance audit requires encryption keys to be company-controlled and deletable on request. Which CMK type fits?
Secrets Manager
A PostgreSQL password lives in `.env`, gets baked into a Docker image, ends up in CI logs, and is read by every engineer. A typical startup setup. **AWS Secrets Manager** (or HashiCorp Vault, GCP Secret Manager) is a centralized secret store with KMS encryption, CloudTrail auditing, and an API for applications.
Applications do not store the password locally; they call `GetSecretValue` on startup or on a schedule such as hourly. Secrets Manager decrypts the value via the CMK and returns it only if the caller's IAM permissions allow it. Every fetch is logged in CloudTrail, so it is clear who read the secret and when.
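A minimal sketch of the fetch-and-cache pattern is shown below. `CachedSecret` is a hypothetical helper, not part of any AWS SDK. It takes the fetch function as a parameter so the caching logic can be exercised without AWS access, and the clock is injectable for the same reason.

```python
import time
from typing import Callable, Optional


class CachedSecret:
    """Fetch a secret through `fetch` and reuse it for `ttl_seconds`."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached value, refetching once the TTL has elapsed."""
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self) -> None:
        """Force a refetch on the next get(), e.g. after a failed login."""
        self._value = None


# In production `fetch` might wrap boto3 (the secret name is a placeholder):
#   client = boto3.client("secretsmanager")
#   fetch = lambda: client.get_secret_value(SecretId="prod/db/password")["SecretString"]
```

The TTL chosen here determines how long an application may keep presenting an old password after a rotation, which matters for the overlap window discussed under rotation.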
**Secrets Manager vs Parameter Store** in AWS: Parameter Store is free for the standard tier but has no automatic rotation and a 4 KB limit. Secrets Manager costs $0.40 per secret per month but provides Lambda rotation and native integration with RDS/Redshift.
What is the fundamental advantage of Secrets Manager over storing passwords in Kubernetes Secrets?
Key and Secret Rotation
The production database password has not changed in 3 years - a typical picture. Over those three years, 5 engineers have left (and still remember the password), copies sit in a dozen places, and a leak would mean changing it in 20 services. **Rotation** is regular secret replacement without downtime. For CMKs: AWS KMS can rotate the key material automatically once a year (always on for AWS-managed keys, opt-in for customer-managed ones). For secrets: rotation via Lambda.
Rotation happens in 4 steps (AWS model): **createSecret** (Lambda generates a new password), **setSecret** (applies it to the database), **testSecret** (verifies connection with the new value), **finishSecret** (the new version is promoted to `AWSCURRENT`, the old gets the `AWSPREVIOUS` label). Applications reading the secret hourly switch over automatically.
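The four steps can be simulated in memory to show how the staging labels move. This is a sketch under stated assumptions: `FakeDatabase`, `rotation_step`, and `rotate` are hypothetical names, the secret is modeled as a plain dict from label to password, and the intermediate `AWSPENDING` label (which Secrets Manager attaches to the new version during rotation) is included even though the text above does not mention it. A real rotation Lambda would instead call the Secrets Manager and database APIs at each step.

```python
import secrets

STEPS = ("createSecret", "setSecret", "testSecret", "finishSecret")


class FakeDatabase:
    """Hypothetical database that accepts any password in its valid set."""

    def __init__(self, password: str) -> None:
        self.valid = {password}

    def login(self, password: str) -> bool:
        return password in self.valid


def rotation_step(step: str, labels: dict, db: FakeDatabase) -> None:
    """Apply one rotation step. `labels` maps staging label -> password."""
    if step == "createSecret":
        # Generate the new password and stage it as pending.
        labels["AWSPENDING"] = secrets.token_urlsafe(24)
    elif step == "setSecret":
        # Apply the pending password to the database; the old one still works.
        db.valid.add(labels["AWSPENDING"])
    elif step == "testSecret":
        # Verify that the database accepts the new password.
        if not db.login(labels["AWSPENDING"]):
            raise RuntimeError("new password rejected; rotation aborted")
    elif step == "finishSecret":
        # Promote pending to current and keep the old value as previous.
        labels["AWSPREVIOUS"] = labels["AWSCURRENT"]
        labels["AWSCURRENT"] = labels.pop("AWSPENDING")
    else:
        raise ValueError(f"unknown step: {step}")


def rotate(labels: dict, db: FakeDatabase) -> None:
    for step in STEPS:
        rotation_step(step, labels, db)
```

This model never revokes the old password, so both values keep working after `finishSecret`. Retiring the previous credential once caches have expired is left out of the sketch.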
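The interval between **setSecret** and **finishSecret** is a window with two valid passwords: the old still works (applications cache it), the new already works. This **intentional overlap** prevents rotation from breaking the service. It holds only when the rotation keeps the old credential valid, as in the alternating-users strategy; a single-user rotation replaces the password at **setSecret**, so clients must refetch the secret when authentication fails.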
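How long the old password has to stay valid after `finishSecret` follows from the application's cache lifetime. The sketch below works in minutes and assumes each application refetches only when its cache entry expires. The function names and the 5-minute safety margin are hypothetical.

```python
def old_password_needed_for(fetch_age_min: float, cache_ttl_min: float) -> float:
    """Minutes after finishSecret during which one app still presents the old password.

    fetch_age_min: how long before finishSecret the app last fetched the secret.
    """
    return max(0.0, cache_ttl_min - fetch_age_min)


def min_overlap(cache_ttl_min: float, margin_min: float = 5.0) -> float:
    """Worst case: an app fetched the secret right before finishSecret (age 0)."""
    return old_password_needed_for(0.0, cache_ttl_min) + margin_min


# Example with a 15-minute cache: an app that fetched 10 minutes before the
# rotation needs the old password for 5 more minutes; the worst case needs 15.
print(old_password_needed_for(10.0, 15.0), min_overlap(15.0))
```

The overlap therefore has to cover at least one full cache lifetime, plus whatever margin the team chooses for clock skew and slow refreshes.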
**Myth:** Key rotation is just a compliance checkbox; real leaks rarely involve stale keys.
**Reality:** Rotation bounds the blast radius of any compromise to a time window; without it, a leak from a 5-year-old backup still holds an active key.
Security is not about preventing every attack but limiting the impact. Regular rotation renders most leaks harmless by the time they surface.
An application caches the DB password for 60 minutes (in-memory LRU). What overlap duration is needed between AWSPREVIOUS and AWSCURRENT during rotation?
Key Ideas
- **Envelope encryption** splits keys into two layers: a DEK for local operations and a KEK in an HSM protecting the DEK - solving scale and rotation
- **Customer-managed CMKs** give policy, rotation, and deletion control; AWS-managed keys are unfit for compliance scenarios
- **Secrets Manager** replaces .env-stored passwords with a central service plus CloudTrail audit
- **Automatic rotation** via Lambda with 4 steps and an overlap window makes rotation safe for live applications
- **kms:ViaService and key policies** are exfiltration protection: a key works only through its intended AWS service
Related Topics
Returning to Dropbox's 750 PB: envelope encryption makes encryption scalable, rotation limits the blast radius, and a secrets manager removes passwords from code. This infrastructure connects to:
- IAM and access control — Key policy and IAM policy together decide who can decrypt a secret - without IAM, encryption is useless
- Compliance and logging — CloudTrail logs every KMS Decrypt and every GetSecretValue - the basis for SOC2/PCI auditing and forensic analysis
Questions for Reflection
- A 5-engineer startup chooses between Parameter Store ($0) and Secrets Manager ($0.40 per secret per month). When does saving $100 a month stop justifying the lack of rotation?
- Envelope encryption protects against KEK compromise via rotation. But what if the KMS service itself is compromised (insider threat at AWS)? Which architectural choices exist against this threat?
- 90-day rotation is a compliance standard, yet for long-lived connections (cross-DC Kafka) the overlap window can stretch into hours. Where is the line between security and operational complexity?
Related Lessons
- cloud-13 — IAM controls access to KMS keys
- cloud-15 — KMS audit log is the basis of compliance
- bc-06-digital-signatures — KMS - centralized vs blockchain distributed key management
- bc-07-ecc — ECC keys are used in cloud KMS
- net-23-https-tls — HTTPS private keys are stored in KMS
- bt-04-dns-tls