Cloud Computing
VPC and Network Isolation
In 2019, a breach at Capital One exposed data on roughly 100 million customers. The attacker exploited a misconfigured WAF running on EC2 in AWS. The vulnerability: the EC2 instance had an IAM role with excessive permissions, and nothing restricted access to the instance metadata endpoint, which hands out that role's temporary credentials. A tighter VPC architecture with NACLs and Security Groups, combined with a least-privilege IAM role, could have significantly limited the blast radius.
- **PCI DSS compliance:** card payment processing requires isolating the cardholder data environment - a separate VPC or subnet with strict NACLs
- **Multi-account AWS:** each account has its own VPC; Transit Gateway connects production, staging, and shared-services
- **Hybrid cloud:** a VPN or Direct Connect joins an on-premises network to a VPC, so that it is reachable much like another subnet
Subnets and CIDR
When AWS launched EC2 in 2006, all instances lived in one shared network - any instance could attempt to reach any other. In 2009 VPC (Virtual Private Cloud) arrived: an isolated virtual network that each account controls completely, including its IP ranges, subnets, and route tables.
**CIDR (Classless Inter-Domain Routing)** is notation for IP ranges. 10.0.0.0/16 means: the first 16 bits are fixed (10.0), the remaining 16 bits are free - that is 65,536 addresses. A subnet divides a VPC into smaller blocks: 10.0.1.0/24 is 256 addresses in one availability zone.
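The CIDR arithmetic above can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the variable names and the sample host address are illustrative only.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnet = ipaddress.ip_network("10.0.1.0/24")

# /16 fixes the first 16 bits and leaves 32 - 16 = 16 free bits.
print(vpc.prefixlen)                 # 16
print(vpc.num_addresses)             # 65536
print(subnet.num_addresses)          # 256

# A subnet must fall entirely inside the VPC range.
print(subnet.subnet_of(vpc))         # True

# Membership test for a single (hypothetical) host address.
host = ipaddress.ip_address("10.0.1.37")
print(host in subnet)                # True

# The /16 can be split into this many non-overlapping /24 blocks.
slash24_blocks = list(vpc.subnets(new_prefix=24))
print(len(slash24_blocks))           # 256
```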
**A subnet is bound to one AZ.** For high availability, the same subnet type (public/private) must exist in each AZ. An ALB requires subnets in at least 2 different AZs; for an internet-facing ALB these must be public subnets.
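One way to picture the "same subnet type in each AZ" rule is to carve the VPC range into one /24 per (tier, AZ) pair. The sketch below does this with `ipaddress`; the AZ names `az-a` and `az-b` and the choice of /24 blocks are assumptions for illustration, not a prescribed layout.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
azs = ["az-a", "az-b"]            # hypothetical AZ names
tiers = ["public", "private"]

# Hand out consecutive /24 blocks, one per (tier, AZ) pair.
blocks = vpc.subnets(new_prefix=24)
layout = {(tier, az): next(blocks) for tier in tiers for az in azs}

for (tier, az), cidr in layout.items():
    print(f"{tier:8} {az}  {cidr}")

# Every tier is present in every AZ, and no two subnets overlap.
cidrs = list(layout.values())
no_overlap = all(
    not a.overlaps(b) for i, a in enumerate(cidrs) for b in cidrs[i + 1:]
)
print(no_overlap)
```

Losing one AZ in this layout still leaves a public and a private subnet in the other.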
Subnet 10.0.5.0/24 is created inside VPC 10.0.0.0/16. How many IP addresses are available for EC2 instances?
Security Groups vs NACLs
AWS provides two firewall layers. A **Security Group** is a stateful firewall at the instance level (EC2, RDS, Lambda). A **Network ACL (NACL)** is a stateless firewall at the subnet level. Most architectures rely on Security Groups as the primary tool, with NACLs as an additional layer.
**Stateful vs stateless** is the critical distinction. A Security Group remembers: if an inbound packet is allowed, the return traffic is automatically permitted without an explicit rule. A NACL does not remember: explicit rules are required in both directions for every connection.
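The difference can be illustrated with a toy model. This is a simplified sketch, not how AWS implements either firewall: the class names, addresses, and ports are made up, and the stateful model tracks only replies to allowed inbound connections.

```python
from dataclasses import dataclass, field
from collections.abc import Container


@dataclass(frozen=True)
class Packet:
    src: str
    src_port: int
    dst: str
    dst_port: int


@dataclass
class StatefulFirewall:
    """Toy Security Group: remembers flows it has allowed inbound."""
    allowed_inbound_ports: Container
    connections: set = field(default_factory=set)

    def inbound(self, p: Packet) -> bool:
        if p.dst_port in self.allowed_inbound_ports:
            # Remember the flow so the reply is let out automatically.
            self.connections.add((p.src, p.src_port, p.dst, p.dst_port))
            return True
        return False

    def outbound(self, p: Packet) -> bool:
        # A reply is the tracked flow with source and destination swapped.
        return (p.dst, p.dst_port, p.src, p.src_port) in self.connections


@dataclass
class StatelessFirewall:
    """Toy NACL: every packet is judged on its own, in each direction."""
    allowed_inbound_ports: Container
    allowed_outbound_ports: Container

    def inbound(self, p: Packet) -> bool:
        return p.dst_port in self.allowed_inbound_ports

    def outbound(self, p: Packet) -> bool:
        return p.dst_port in self.allowed_outbound_ports


request = Packet("203.0.113.7", 50000, "10.0.1.10", 80)
reply = Packet("10.0.1.10", 80, "203.0.113.7", 50000)

sg = StatefulFirewall(allowed_inbound_ports={80})
print(sg.inbound(request), sg.outbound(reply))        # True True

nacl = StatelessFirewall({80}, allowed_outbound_ports=set())
print(nacl.inbound(request), nacl.outbound(reply))    # True False

# The stateless firewall needs an outbound rule covering the client's
# ephemeral port before the reply can leave.
nacl_fixed = StatelessFirewall({80}, range(1024, 65536))
print(nacl_fixed.outbound(reply))                     # True
```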
| Feature | Security Group | NACL |
|---|---|---|
| Level | Instance (ENI) | Subnet |
| Stateful | Yes - return traffic automatic | No - rules required both ways |
| DENY rules | No - only ALLOW | Yes - explicit DENY supported |
| Scope | Each EC2/RDS separately | All instances in the subnet |
| Default behavior | Deny all inbound, allow all outbound | Default NACL allows all; a custom NACL denies all until rules are added |
| SG reference | Can reference another SG | CIDR only |
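The DENY row is where the two layers differ most in practice: blocking one address range while allowing everyone else needs a NACL. NACL rules are evaluated in ascending rule-number order and the first match decides. The sketch below models only that ordering; the `NaclRule` type, rule numbers, and CIDRs are illustrative assumptions.

```python
import ipaddress
from typing import NamedTuple


class NaclRule(NamedTuple):
    number: int
    cidr: str
    port: int
    action: str  # "allow" or "deny"


def evaluate(rules, src_ip, port):
    """Return the action of the lowest-numbered matching rule."""
    ip = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if ip in ipaddress.ip_network(rule.cidr) and port == rule.port:
            return rule.action
    return "deny"  # nothing matched: the implicit catch-all denies


inbound_rules = [
    NaclRule(100, "0.0.0.0/0", 443, "allow"),
    NaclRule(90, "198.51.100.0/24", 443, "deny"),  # lower number wins
]

print(evaluate(inbound_rules, "198.51.100.9", 443))  # deny
print(evaluate(inbound_rules, "203.0.113.7", 443))   # allow
print(evaluate(inbound_rules, "203.0.113.7", 22))    # deny
```

Because the deny rule has the lower number, it is checked before the broad allow.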
An EC2 instance receives a request on port 80. The Security Group allows inbound TCP 80. Is an explicit outbound rule needed for the response?
NAT Gateway and Routing
An instance in a private subnet has no public IP. But it still needs internet access: downloading package updates, calling external APIs, sending logs. This is what a **NAT Gateway** provides: it lives in a public subnet, holds an Elastic IP, and translates addresses for outbound traffic from private instances.
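The address translation described above can be sketched as a small lookup table. This is a toy model under simplifying assumptions: `ToyNat`, the addresses, and the sequential port allocation are all hypothetical, and a real NAT Gateway tracks far more state per connection.

```python
import itertools


class ToyNat:
    """Maps (private IP, port) pairs to ports on one public address."""

    def __init__(self, public_ip, first_port=1024):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._out = {}    # (private_ip, private_port) -> public_port
        self._back = {}   # public_port -> (private_ip, private_port)

    def translate_outbound(self, private_ip, private_port):
        key = (private_ip, private_port)
        if key not in self._out:
            port = next(self._ports)
            self._out[key] = port
            self._back[port] = key
        return (self.public_ip, self._out[key])

    def translate_inbound(self, public_port):
        # Only replies to existing mappings are delivered; unsolicited
        # traffic has no entry and yields None.
        return self._back.get(public_port)


nat = ToyNat("198.51.100.20")  # stands in for the Elastic IP

print(nat.translate_outbound("10.0.2.15", 40000))  # ('198.51.100.20', 1024)
print(nat.translate_inbound(1024))                 # ('10.0.2.15', 40000)
print(nat.translate_inbound(5555))                 # None
```

The last call shows why a private instance behind NAT can start connections outward but cannot be reached by new connections from the internet.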
A **Route Table** determines where packets from a subnet are sent. Each subnet is associated with one route table. Rules: local (traffic within the VPC goes directly), 0.0.0.0/0 via Internet Gateway (for public subnets), 0.0.0.0/0 via NAT Gateway (for private subnets).
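A route lookup picks the most specific matching route (the longest prefix), which is why the local route wins over 0.0.0.0/0 for traffic inside the VPC. A minimal sketch, assuming a VPC range of 10.0.0.0/16 and placeholder target names `igw-example` and `nat-example`:

```python
import ipaddress


def lookup(route_table, dst_ip):
    """Return the target of the most specific route matching dst_ip."""
    ip = ipaddress.ip_address(dst_ip)
    matches = []
    for cidr, target in route_table.items():
        network = ipaddress.ip_network(cidr)
        if ip in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    # Longest prefix = most specific route.
    return max(matches)[1]


public_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}
private_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-example"}

print(lookup(public_rt, "10.0.1.25"))       # local
print(lookup(public_rt, "93.184.216.34"))   # igw-example
print(lookup(private_rt, "93.184.216.34"))  # nat-example
```

The only difference between the two tables is the target of the default route, and that target is what makes a subnet public or private.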
**NAT Gateway vs NAT Instance:** NAT Gateway is an AWS managed service (redundant within its AZ, auto-scaling, up to 100 Gbps per current AWS documentation). A NAT Instance is a regular EC2 with IP forwarding (cheaper for small volumes, requires management). For production, NAT Gateway is the default choice.
A company has two NAT Gateways in one AZ serving two private subnets in different AZs. What is wrong?
VPC Peering and Transit Gateway
A company grows. Three VPCs appear: production, staging, and shared-services (monitoring, CI/CD). How should they communicate? **VPC Peering** creates a direct connection between two VPCs over the AWS backbone, with no internet transit. Traffic stays inside AWS with minimal latency.
The problem with peering: it is **non-transitive**. VPC-A is peered with VPC-B, VPC-B is peered with VPC-C. But VPC-A and VPC-C cannot reach each other through VPC-B. Connecting 10 VPCs in a full mesh takes 10 × 9 / 2 = 45 peering connections. The solution: **Transit Gateway** - a central hub that each VPC connects to through a single attachment.
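Both points, the full-mesh count and non-transitivity, can be checked in a few lines. This is a sketch with made-up VPC names; `can_reach` models only the rule that traffic flows across a direct peering and never through an intermediate VPC.

```python
def full_mesh_peerings(n):
    """Peering connections needed so every pair of n VPCs is linked."""
    return n * (n - 1) // 2


def transit_gateway_attachments(n):
    """With a central hub, each VPC needs a single attachment."""
    return n


# Peering is pairwise: each connection is an unordered pair of VPCs.
peerings = {
    frozenset({"vpc-a", "vpc-b"}),
    frozenset({"vpc-b", "vpc-c"}),
}


def can_reach(src, dst):
    # Only a direct peering counts; there is no hop through a third VPC.
    return frozenset({src, dst}) in peerings


print(full_mesh_peerings(10))            # 45
print(transit_gateway_attachments(10))   # 10
print(can_reach("vpc-a", "vpc-b"))       # True
print(can_reach("vpc-a", "vpc-c"))       # False
```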
**Pricing** (approximate; rates vary by region): VPC Peering - pay only for traffic ($0.01/GB when it crosses AZs). Transit Gateway - $0.05/hour per attachment + $0.02/GB. With few VPCs, peering is cheaper. With 5+ VPCs, Transit Gateway is simpler operationally.
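A rough monthly comparison using the rates quoted above. The 730-hour month, the 3-VPC setup, and the 1,000 GB traffic volume are assumptions chosen for illustration; actual bills depend on region and traffic patterns.

```python
HOURS_PER_MONTH = 730  # assumption: average month length in hours


def peering_monthly_cost(gb, per_gb=0.01):
    """Peering has no hourly charge, only per-GB traffic."""
    return gb * per_gb


def tgw_monthly_cost(attachments, gb, per_hour=0.05, per_gb=0.02):
    """Transit Gateway bills per attachment-hour plus per-GB traffic."""
    return attachments * per_hour * HOURS_PER_MONTH + gb * per_gb


vpcs = 3
traffic_gb = 1000

peering = peering_monthly_cost(traffic_gb)
tgw = tgw_monthly_cost(vpcs, traffic_gb)

print(f"peering: ${peering:.2f}")   # peering: $10.00
print(f"tgw:     ${tgw:.2f}")       # tgw:     $129.50
```

At this scale the fixed attachment fee dominates, which is why peering is cheaper for a handful of VPCs even though it needs more connections.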
VPC-A is peered with VPC-B, and VPC-B is peered with VPC-C. Can an instance in VPC-A reach VPC-C?
VPC and Network Isolation
- A VPC is an isolated virtual network; CIDR defines the IP range; a subnet is a range within one AZ
- Public subnets route to an Internet Gateway; private subnets route outbound only through a NAT Gateway
- Security Groups are stateful instance-level firewalls; NACLs are stateless subnet-level firewalls
- NAT Gateway translates outbound traffic from private subnets; inbound from the internet is blocked
- VPC Peering is a direct non-transitive link between two VPCs; Transit Gateway is a central hub for N VPCs
Related Topics
VPC is the foundation for placing any cloud resource inside an isolated network.
- Managed Databases (RDS, Cloud SQL): RDS runs inside a VPC, and production instances belong in a private subnet; Security Groups control who can connect
- Load Balancing and Auto Scaling: an internet-facing ALB sits in public subnets; EC2 Auto Scaling groups run in private subnets
- Serverless (Lambda, Cloud Functions): a Lambda function attached to a VPC uses an ENI in a private subnet to reach RDS and internal services
Questions for Reflection
- Why should production RDS always live in a private subnet, even when a Security Group restricts access?
- When is a dedicated NAT Gateway per AZ justified despite the additional cost?
- How does VPC architecture affect the ability to meet PCI DSS or HIPAA compliance requirements?