Cloud Computing

Managed Databases: RDS, Cloud SQL

A startup spent 30% of engineering time maintaining PostgreSQL on EC2: monitoring, backups, patching, replication. After migrating to RDS, those tasks took 2% of engineering time. RDS, Aurora, and Cloud SQL are not just "managed PostgreSQL"; they represent a different operational model entirely.

**Netflix:** Aurora for metadata - thousands of RPS with failover in seconds, not minutes
**Airbnb:** RDS with read replicas - separating read/write traffic during booking peaks
**Shopify:** migrated to Aurora Serverless v2 for auto-scaling during Black Friday

Amazon RDS: Managed Databases

In 2009, Amazon launched RDS (Relational Database Service) with one premise: teams should stop spending weeks installing a database, configuring backups, and applying patches. RDS takes over the entire operational layer - instance provisioning, storage, automated backups, patching, and Multi-AZ failover. Engineers deal only with SQL and schema design.

RDS supports MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Amazon's own Aurora engine. Under the hood, a standard RDS database runs on an EC2 instance with EBS storage, but AWS manages it as a service. Clients connect through the engine's standard database port; there is no shell access to the underlying host.

| Feature | Self-managed EC2 | RDS |
|---|---|---|
| DB installation | Manual | Automatic |
| Backups | Scripts / cron | Automatic, point-in-time |
| Multi-AZ failover | HAProxy + Pacemaker | Built-in, ~60-120 sec |
| OS patching | Manual | Managed maintenance window |
| Monitoring | CloudWatch + agent | Enhanced Monitoring built-in |
| Cost | Lower with expertise | Higher, but simpler ops |
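
A built-in failover of roughly 60-120 seconds still means a window in which new connections fail, so client code usually needs to retry rather than give up on the first error. The following is a minimal sketch of such a retry loop with exponential backoff. The function name `connect_with_retry`, its parameters, and the 180-second budget are assumptions made for illustration; `connect` stands in for whatever call your database driver uses to open a connection.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    max_wait_s: float = 180.0,   # assumed budget: a bit above the ~60-120 s failover window
    base_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests do not really sleep
) -> T:
    """Call connect() until it succeeds or the waiting budget is used up."""
    waited = 0.0
    delay = base_delay_s
    while True:
        try:
            return connect()
        except ConnectionError:
            # Give up (re-raise) if the next sleep would exceed the budget.
            if waited + delay > max_wait_s:
                raise
            sleep(delay)
            waited += delay
            # Exponential backoff, capped so retries stay reasonably frequent.
            delay = min(delay * 2, max_delay_s)
```

The retry targets the same endpoint on every attempt: with Multi-AZ, the instance endpoint is a DNS name that AWS repoints to the standby, so the application does not need to know which host is currently primary.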

An RDS instance with Multi-AZ enabled creates a standby replica. What happens when the primary fails?

Aurora: Distributed Storage

In 2014, Amazon Aurora answered the question: what if the MySQL/PostgreSQL storage engine were rewritten from scratch for the cloud? The key idea: separate compute (the SQL engine) from storage. Storage becomes a distributed service; compute nodes sit on top of it.

Aurora keeps 6 copies of data across 3 availability zones (2 copies per zone). A write is acknowledged once 4 of the 6 copies have persisted it; a read quorum is 3 of 6. When one AZ fails (2 copies lost), the 4 remaining copies still satisfy both the write and the read quorum. This is a fundamentally different architecture from classic RDS, where Multi-AZ relies on a single standby replica.
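
The quorum arithmetic can be checked with a few lines of Python. This is a toy model of the 6-copy, 4-write, 3-read scheme described above, not Aurora's actual implementation; the zone names and helper functions are hypothetical.

```python
from itertools import combinations

COPIES_PER_AZ = 2
AZS = ("az-a", "az-b", "az-c")  # hypothetical zone names
WRITE_QUORUM = 4
READ_QUORUM = 3

# Each copy is identified by (zone, index within the zone): 6 copies in total.
ALL_COPIES = [(az, i) for az in AZS for i in range(COPIES_PER_AZ)]


def surviving(failed_azs=(), failed_copies=()):
    """Copies still reachable after whole-zone and single-copy failures."""
    return [
        c for c in ALL_COPIES
        if c[0] not in failed_azs and c not in failed_copies
    ]


def can_write(copies):
    return len(copies) >= WRITE_QUORUM


def can_read(copies):
    return len(copies) >= READ_QUORUM


def quorums_overlap(n, w, r):
    """w + r > n: every read set shares a copy with every write set.
    2 * w > n: any two write sets share a copy, so writes cannot diverge."""
    return w + r > n and 2 * w > n


def reads_always_see_latest_write():
    """Brute-force check: every 3-copy read set intersects every 4-copy write set."""
    return all(
        set(w) & set(r)
        for w in combinations(ALL_COPIES, WRITE_QUORUM)
        for r in combinations(ALL_COPIES, READ_QUORUM)
    )
```

With these numbers, losing a whole zone leaves 4 copies, so both reads and writes continue. Losing a zone plus one more copy leaves 3, which is enough to read but not to acknowledge new writes.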

**Aurora vs RDS:** Aurora costs 20-30% more, but provides failover in seconds instead of minutes, up to 15 read replicas with low replication lag (typically well under 100 ms), and automatic storage scaling. The premium is justified for high-load and critical OLTP systems.

Why do Aurora Read Replicas promote to primary in seconds rather than minutes like standard RDS?

Cloud SQL vs AlloyDB

Google Cloud offers two tiers of managed relational databases. **Cloud SQL** is the RDS equivalent: managed MySQL, PostgreSQL, and SQL Server on standard storage. **AlloyDB** is Google's answer to Aurora: a PostgreSQL-compatible engine with a columnar cache and distributed storage.

Cloud SQL is simpler and cheaper. AlloyDB is positioned as "up to 4x faster than standard PostgreSQL for OLTP and up to 100x for analytical queries" through a built-in columnar engine that sits alongside row storage.

| Feature | Cloud SQL | AlloyDB |
|---|---|---|
| Engines | MySQL, PostgreSQL, SQL Server | PostgreSQL only |
| Storage | Standard (Persistent Disk) | Distributed storage layer, separate from compute |
| Failover | ~60 sec | ~10 sec |
| Analytics | Standard PostgreSQL | Built-in columnar engine |
| Cost | Lower | Higher (~2x) |
| AWS equivalent | RDS | Aurora |

A team is choosing between Cloud SQL and AlloyDB for a high-load PostgreSQL OLTP workload at ~50k RPS. What should they pick?

Read Replicas and Read Scaling

Many OLTP workloads are read-heavy, often around 90% reads and 10% writes. The primary instance handles all writes and some reads. As traffic grows, SELECT queries become the bottleneck. The solution: **Read Replicas** - asynchronous copies of the database that serve only read-only queries.
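
A minimal sketch of read/write splitting at the application layer might look like the following. The class name `ReadWriteRouter` and the endpoint strings are hypothetical, and the classification is deliberately naive: it inspects only the first keyword, so statements such as `SELECT ... FOR UPDATE` or a `SELECT` that calls a writing function would need to be routed to the primary explicitly.

```python
from itertools import cycle
from typing import Sequence


class ReadWriteRouter:
    """Send SELECTs to replicas in round-robin order, everything else to the primary."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # cycle() yields replicas endlessly in order; None means "no replicas configured".
        self._replicas = cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        stripped = sql.strip()
        first_word = stripped.split(None, 1)[0].upper() if stripped else ""
        is_read = first_word == "SELECT"  # naive check, see caveat above
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, unknown statements, and reads without replicas go to the primary.
        return self.primary
```

For example, with `ReadWriteRouter("primary-host", ["replica-1", "replica-2"])`, consecutive SELECTs alternate between the two replicas while any INSERT or UPDATE goes to `primary-host`.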

Asynchronous replication means a small delay: the replica lags behind the primary by milliseconds to seconds. A write to the primary followed immediately by a read from a replica may return stale data. Applications must account for this: non-critical reads go to the replica, critical reads (immediately after a write) go to the primary.

**Handling eventual consistency:** after a write, reading from the primary for a few seconds (or using session tokens) prevents the "vanishing object" problem, where a user creates a record and does not see it in a list right away.
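
The "read from the primary for a few seconds after a write" rule can be sketched as a small per-session router. The class name `StickyPrimaryRouter`, the 5-second default window, and the in-memory dictionary are assumptions for illustration; a real deployment would pick the window from observed replica lag and would likely keep this state in the session or a shared cache.

```python
import time
from typing import Callable, Dict


class StickyPrimaryRouter:
    """Pin a session's reads to the primary for pin_seconds after its last write."""

    def __init__(
        self,
        pin_seconds: float = 5.0,                      # assumed window, tune to replica lag
        clock: Callable[[], float] = time.monotonic,   # injectable for testing
    ):
        self.pin_seconds = pin_seconds
        self._clock = clock
        self._last_write: Dict[str, float] = {}  # session_id -> time of last write

    def record_write(self, session_id: str) -> None:
        """Call this whenever the session performs a write on the primary."""
        self._last_write[session_id] = self._clock()

    def target_for_read(self, session_id: str) -> str:
        """Return 'primary' inside the pin window, otherwise 'replica'."""
        last = self._last_write.get(session_id)
        if last is not None and self._clock() - last < self.pin_seconds:
            return "primary"
        return "replica"
```

Only the session that wrote is pinned: other users keep reading from replicas, so the extra load on the primary stays proportional to the write rate rather than to total read traffic.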

A user creates a post and immediately opens the post list. The post does not appear. What is the most likely cause?