DevOps

Terraform: Advanced

In 2017 a GitLab engineer accidentally ran `rm -rf` on a production database directory after mixing up terminal windows. A year later HashiCorp noticed that 30% of incidents at large Terraform customers were similar stories: 'wrong workspace', 'wrong state', 'wrong environment variable'. Terraform Enterprise and Sentinel target exactly this class of mistake: they control what can be deployed, from where, and by whom, before `apply` does damage.

**Coinbase**: 200+ engineers, multi-region infrastructure - Terragrunt + state splitting by team, each team owns its own state file
**HashiCorp Sentinel in production**: GitHub Enterprise uses Sentinel to block public S3 buckets and enforce tags on all resources
**Lyft in 2019**: migration from one 8000-resource state to 60+ isolated states - apply time dropped from 45 minutes to 3, blast radius shrank by 99%

Workspaces: one codebase, many environments

In 2021 a GitLab engineer accidentally rolled out a production-database migration to a staging instance because the wrong Terraform workspace was selected. A **workspace** in Terraform is a named, separate state for a single configuration: same code, different state file. The same `main.tf` can manage dev, staging, and prod if a workspace is created for each, for example `terraform workspace new prod`. Inside the configuration the current workspace name is available as `terraform.workspace` (written `${terraform.workspace}` inside a string), so resource names, instance types, and pricing plans can vary per environment.
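To make the per-workspace variation concrete, here is a minimal sketch. It assumes the AWS provider is already configured; the resource name `app`, the `ami_id` variable, and the instance sizes are hypothetical and chosen only for illustration.

```hcl
# Sketch only: the variable, resource name, and sizes are illustrative assumptions.
variable "ami_id" {
  type        = string
  description = "AMI to launch (hypothetical input)."
}

locals {
  # Per-workspace settings, keyed by workspace name.
  instance_types = {
    dev     = "t3.micro"
    staging = "t3.small"
    prod    = "m5.large"
  }

  # Fall back to the smallest size for any workspace not listed above,
  # for example a short-lived feature-branch workspace.
  instance_type = lookup(local.instance_types, terraform.workspace, "t3.micro")
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = local.instance_type

  tags = {
    # Inside a string the workspace name is interpolated with ${...}.
    Name        = "app-${terraform.workspace}"
    Environment = terraform.workspace
  }
}
```

After `terraform workspace select staging`, a plan against this configuration would size the instance as `t3.small` and name it `app-staging`. Nothing in the code itself stops the same plan from running against `prod` if that workspace happens to be selected.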

Workspaces work well for feature branches (short-lived, isolated stacks) but are NOT recommended for long-lived environment separation (dev/staging/prod). The reason: every workspace shares one backend configuration and one state location, so nothing but the currently selected workspace stands between an engineer and prod, and a single mistaken `terraform workspace select` can lead to disaster. Best practice for prod is separate directories or repositories, each with its own backend, managed with Terragrunt or simply laid out as `envs/prod/main.tf` and `envs/dev/main.tf`.
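A directory-per-environment layout might look like the sketch below. The bucket name, state key, region, and the shared `modules/app` module with its `environment` input are all hypothetical; the point is only that each environment directory declares its own backend.

```hcl
# envs/prod/main.tf
# Sketch only: bucket, key, region, and module path are illustrative assumptions.
terraform {
  backend "s3" {
    bucket = "example-tfstate-prod" # prod state lives in its own bucket
    key    = "app/terraform.tfstate"
    region = "us-east-1"
  }
}

# Shared code lives in a module; only the inputs differ per environment.
module "app" {
  source      = "../../modules/app" # hypothetical shared module
  environment = "prod"              # assumes the module declares this variable
}
```

`envs/dev/main.tf` would mirror this file with a different bucket (for example `example-tfstate-dev`) and `environment = "dev"`. Because the target environment is fixed by the directory you run `terraform apply` in, and access to each state bucket can be restricted separately, selecting the wrong workspace is no longer a possible failure mode.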

Remote State and locking

Sentinel: Policy-as-Code

Blast radius and how to shrink it

Key ideas

Related topics

Questions for reflection

Related lessons