Cloud Computing
Cost Optimization
Airbnb's AWS bill exceeded $110 million in 2021. At that scale a 20% reduction saves $22 million per year. Cloud cost optimization is not optional for product economics - it is the difference between a profitable SaaS and one that burns cash on infrastructure inefficiency.
- Capital One saved over $100M/year by moving from Reserved EC2 to Compute Savings Plans, which apply across EC2, Fargate, and Lambda automatically.
- Spotify saves ~60% on batch processing costs by using Spot (preemptible) instances with job checkpointing to handle interruptions gracefully.
- Slack reduced its AWS bill by 35% in one quarter by right-sizing instances using Compute Optimizer - over 70% ran below 20% average CPU.
Reserved Instances
**Reserved Instances** trade flexibility for price: you commit to a specific instance type and region for 1 or 3 years and, in return, pay up to 72% less than on-demand. The commitment is billed whether or not the capacity is used, so the pattern fits steady, predictable baseline load.
Reserved Instances are a standard topic in AWS Solutions Architect and senior cloud engineering interviews. Understanding the trade-offs and failure modes is more valuable than memorizing the exact API.
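The core trade-off can be checked with a break-even calculation. The sketch below assumes a simplified pricing model with a single hourly rate and a flat discount; the rate and discount used in the comments are hypothetical illustrative numbers, not AWS prices.

```python
HOURS_PER_YEAR = 8760


def break_even_utilization(discount: float) -> float:
    """Fraction of the term an instance must run for a reservation to pay off.

    Simplified model: the reservation costs rate * (1 - discount) for every
    hour of the term, while on-demand costs rate only for hours actually used.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def annual_on_demand_cost(rate: float, utilization: float) -> float:
    """Yearly cost when paying the on-demand rate only for hours used."""
    return rate * utilization * HOURS_PER_YEAR


def annual_reserved_cost(rate: float, discount: float) -> float:
    """Yearly cost of a reservation, billed for every hour regardless of use."""
    return rate * (1 - discount) * HOURS_PER_YEAR


# Hypothetical numbers: $0.10/hour on-demand, 40% reservation discount.
# At 50% utilization on-demand is cheaper; at 90% the reservation wins.
```

Under these assumptions, a 40% discount only pays off for an instance that runs more than 60% of the time, which is why reservations are usually sized to the steady baseline rather than to peak load.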
What is the primary operational benefit of Reserved Instances?
Savings Plans
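**Savings Plans** replace the per-instance commitment with a spend commitment: you agree to a fixed compute spend rate ($/hour) for 1 or 3 years. With a Compute Savings Plan, the discount applies automatically across EC2, Fargate, and Lambda, and usage above the commitment is billed at on-demand rates.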
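How an hourly spend commitment turns into a bill can be sketched with a deliberately simplified model. It assumes one uniform discount rate for all usage; real plans apply different rates per usage type, so treat the function and its numbers as hypothetical.

```python
def hourly_bill(on_demand_usage: float, commitment: float, discount: float) -> float:
    """Bill for one hour under a simplified Savings Plan model.

    on_demand_usage: what the hour's compute would cost at on-demand rates.
    commitment:      committed spend in $/hour, always charged.
    discount:        assumed uniform plan discount (hypothetical).
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    # On-demand-equivalent usage that the commitment can absorb.
    covered_capacity = commitment / (1 - discount)
    # Anything beyond that is billed at on-demand rates.
    overflow = max(0.0, on_demand_usage - covered_capacity)
    return commitment + overflow


def daily_bill(hourly_usage: list[float], commitment: float, discount: float) -> float:
    """Sum of hourly bills; unused commitment in quiet hours is not carried over."""
    return sum(hourly_bill(u, commitment, discount) for u in hourly_usage)
```

With a hypothetical $6/hour commitment at a 40% discount, an hour with $15 of on-demand usage costs $11, while an hour with only $4 of usage still costs $6. Over-committing is the main failure mode this model makes visible.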
What is the primary operational benefit of Savings Plans?
Spot Instances
**Spot Instances** are spare AWS capacity sold at up to 90% below on-demand prices. AWS can reclaim the capacity with a 2-minute interruption notice, so the pattern suits stateless, fault-tolerant, or checkpointable workloads.
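Job checkpointing is what makes interruptions tolerable. The following is a minimal sketch of the idea, not a production pattern: `run_batch`, the checkpoint layout, and the `interruption_pending` callback are hypothetical names, the "work" is just summing numbers, and the checkpoint is written to a local file where a real job would use durable storage that outlives the instance.

```python
import json
import os
import tempfile
from typing import Callable, Sequence


def save_checkpoint(path: str, next_index: int, total: int) -> None:
    """Write progress atomically so a mid-write interruption cannot corrupt it."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"next_index": next_index, "total": total}, f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Return saved progress, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"next_index": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def run_batch(items: Sequence[int], path: str,
              interruption_pending: Callable[[], bool]) -> bool:
    """Process items from the last checkpoint; return True when all are done."""
    state = load_checkpoint(path)
    index, total = state["next_index"], state["total"]
    while index < len(items):
        if interruption_pending():
            return False  # everything before `index` is already checkpointed
        total += items[index]  # stand-in for the real unit of work
        index += 1
        save_checkpoint(path, index, total)
    return True


# Demo: the first run is interrupted after two items, the second run resumes.
items = [1, 2, 3, 4, 5]
with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "checkpoint.json")
    polls = iter([False, False, True])
    first_run_done = run_batch(items, checkpoint, lambda: next(polls))
    after_interrupt = load_checkpoint(checkpoint)
    second_run_done = run_batch(items, checkpoint, lambda: False)
    final_state = load_checkpoint(checkpoint)
```

Checkpointing after every item keeps lost work to at most one unit. Coarser checkpoints are cheaper but should still complete comfortably inside the 2-minute notice window.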
What is the primary operational benefit of Spot Instances?
Right-Sizing
**Right-Sizing** means matching instance size to the actual workload. AWS Compute Optimizer analyzes CloudWatch metrics and recommends downsizing instances that are consistently underused.
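A first-pass right-sizing filter can be sketched in a few lines. The 20% average CPU threshold mirrors the Slack example above; the peak guard, the function name, and the sample data are assumptions added for illustration, and this is not how Compute Optimizer itself works.

```python
from statistics import mean


def downsizing_candidates(cpu_samples: dict[str, list[float]],
                          avg_threshold: float = 20.0,
                          peak_threshold: float = 80.0) -> list[str]:
    """Return instance IDs whose CPU usage suggests they are over-provisioned.

    cpu_samples maps an instance ID to CPU utilization percentages.
    An instance is flagged when its average is below avg_threshold and it
    never peaks above peak_threshold (an assumed guard against bursty
    workloads that look idle on average).
    """
    candidates = []
    for instance_id, samples in cpu_samples.items():
        if not samples:
            continue  # no data: make no recommendation
        if mean(samples) < avg_threshold and max(samples) <= peak_threshold:
            candidates.append(instance_id)
    return sorted(candidates)
```

Average CPU alone can hide bursts, and memory or network pressure does not show up in it at all, so a list like this is a starting point for review rather than an instruction to resize.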
A common misconception is that cost optimization is primarily a theoretical concern, and that real teams just use managed services and ignore architectural patterns.
In practice, managed services reduce operational burden but do not eliminate the need for sound architectural decisions about failure modes, scaling, and cost.
Managed services handle undifferentiated heavy lifting (patching, backups, failover), but the choice between them, their configuration, and their integration patterns still require deep architectural understanding.
What is the primary operational benefit of Right-Sizing?
Key Ideas
- **Reserved Instances:** 1- or 3-year commitment to a specific instance type and region in exchange for a discount of up to 72% vs. on-demand
- **Savings Plans:** flexible 1- or 3-year commitment to a compute spend rate ($/hour) that auto-applies across EC2, Fargate, and Lambda
- **Spot Instances:** spare AWS capacity at up to 90% discount; 2-minute interruption notice; right for stateless, fault-tolerant, or checkpointable workloads
- **Right-sizing:** matching instance size to actual workload; Compute Optimizer analyzes CloudWatch metrics and recommends downsizing
Related Topics
These topics form the broader Cost Optimization ecosystem:
- Multi-Account Strategy: consolidated billing and organization-wide Savings Plans sharing are key multi-account cost levers
- Cloud Architecture Design: Cost Optimization is one of the six pillars of the AWS Well-Architected Framework, meant to be baked into the design rather than bolted on
- Performance Tuning: right-sizing improves cost and performance together, since under-provisioned instances cause latency and over-provisioned ones waste money
Questions for Reflection
- How does a cost optimization strategy change when scaling from 1,000 to 10 million users?
- What are the primary failure modes of a cost optimization effort, and what monitoring detects them before users are affected?
- Which trade-offs would change cost optimization decisions in a regulated industry with strict data residency requirements?