Cloud Computing

Managed Databases: RDS, Cloud SQL

A startup spent 30% of its engineering time maintaining PostgreSQL on EC2: monitoring, backups, patching, replication. After migrating to RDS, those tasks took 2% of the time. RDS, Aurora, and Cloud SQL are not just "managed PostgreSQL" - they represent a different operational model entirely.

  • **Netflix:** Aurora for metadata - thousands of RPS with failover in seconds, not minutes
  • **Airbnb:** RDS with read replicas - separating read/write traffic during booking peaks
  • **Shopify:** migrated to Aurora Serverless v2 for auto-scaling during Black Friday

Amazon RDS: Managed Databases

In 2009, Amazon launched RDS (Relational Database Service) with one premise: teams should stop spending weeks installing a database server, configuring backups, and applying patches. RDS takes over the entire operational layer - instance provisioning, storage, automated backups, patching, and Multi-AZ failover. Engineers deal only with SQL and schema design.

RDS supports MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Amazon's own Aurora engine. Under the hood, a standard (non-Aurora) RDS database runs on an EC2 instance with EBS storage, but AWS manages it as a service. Clients connect to an endpoint on the engine's standard database port; the underlying host is not accessible to the user.

| Feature | Self-managed EC2 | RDS |
|---|---|---|
| DB installation | Manual | Automatic |
| Backups | Scripts / cron | Automatic, point-in-time |
| Multi-AZ failover | HAProxy + Pacemaker | Built-in, ~60-120 sec |
| OS patching | Manual | Managed maintenance window |
| Monitoring | CloudWatch + agent | Enhanced Monitoring built-in |
| Cost | Lower with expertise | Higher, but simpler ops |
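During the ~60-120 sec Multi-AZ failover window listed above, existing connections drop and new ones fail until the endpoint points at the new primary. The following is a minimal sketch of how a client might ride out that window with exponential backoff. The names `TransientDBError` and `connect_with_retry` and all default timings are assumptions for illustration; a real application would catch its driver's own connection exception instead.

```python
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a driver's 'connection failed' exception."""


def connect_with_retry(connect, max_wait_s=180.0, base_delay_s=1.0,
                       max_delay_s=15.0, sleep=time.sleep):
    """Call connect() until it succeeds or the sleep budget would be exceeded.

    connect     -- zero-argument callable that returns a connection or raises
                   TransientDBError
    max_wait_s  -- total time we are willing to spend sleeping between attempts
    """
    waited = 0.0
    delay = base_delay_s
    while True:
        try:
            return connect()
        except TransientDBError:
            # Give up if the next sleep would push us past the budget.
            if waited + delay > max_wait_s:
                raise
            sleep(delay)
            waited += delay
            # Double the delay each time, capped at max_delay_s.
            delay = min(delay * 2, max_delay_s)
```

The sleep budget is deliberately larger than the expected failover time, so that a normal failover is absorbed by retries while a genuine outage still surfaces as an error.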

An RDS instance with Multi-AZ enabled creates a standby replica. What happens when the primary fails?

Aurora: Distributed Storage

In 2014, Amazon Aurora answered the question: what if the MySQL/PostgreSQL storage engine were rewritten from scratch for the cloud? The key idea: separate compute (the SQL engine) from storage. Storage becomes a distributed service; compute nodes sit on top of it.

Aurora keeps 6 copies of data across 3 availability zones (2 copies per zone). A write is acknowledged after being written to 4 of 6 copies. A read requires 3 of 6. When one AZ fails (2 copies lost) the quorum holds. This is a fundamentally different architecture from classic RDS.
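The quorum arithmetic above can be checked with a few lines of code. This is only a sketch of the counting argument, using the numbers from the paragraph (6 copies, write quorum 4, read quorum 3); the function names are illustrative and do not correspond to any Aurora API.

```python
COPIES = 6          # total copies of each piece of data
COPIES_PER_AZ = 2   # 3 availability zones x 2 copies
WRITE_QUORUM = 4    # acknowledgements needed for a write
READ_QUORUM = 3     # copies needed for a read


def quorums_overlap(copies, write_q, read_q):
    """Check the two classic quorum conditions.

    read_q + write_q > copies : every read set intersects every write set,
                                so a read always sees the latest write.
    2 * write_q > copies      : any two write sets intersect, so two
                                conflicting writes cannot both succeed.
    """
    return read_q + write_q > copies and 2 * write_q > copies


def status_after_loss(lost_copies):
    """Report which operations are still possible after losing some copies."""
    surviving = COPIES - lost_copies
    return {
        "surviving": surviving,
        "can_write": surviving >= WRITE_QUORUM,
        "can_read": surviving >= READ_QUORUM,
    }


if __name__ == "__main__":
    print(quorums_overlap(COPIES, WRITE_QUORUM, READ_QUORUM))
    print(status_after_loss(COPIES_PER_AZ))       # one whole AZ lost
    print(status_after_loss(COPIES_PER_AZ + 1))   # one AZ plus one more copy
```

With these numbers, losing a full AZ leaves 4 copies, which still satisfies both quorums. Losing one additional copy leaves 3, which is enough for reads but not for writes.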

**Aurora vs RDS:** Aurora costs roughly 20-30% more, but provides failover in seconds instead of minutes, up to 15 read replicas with minimal replication lag (typically tens of milliseconds, since replicas read from the same storage volume), and automatic storage scaling. The premium is justified for high-load and critical OLTP systems.

Why do Aurora Read Replicas promote to primary in seconds rather than minutes like standard RDS?

Cloud SQL vs AlloyDB

Google Cloud offers two tiers of managed relational databases. **Cloud SQL** is the RDS equivalent: managed MySQL, PostgreSQL, and SQL Server on standard storage. **AlloyDB** is Google's answer to Aurora: a PostgreSQL-compatible engine with a columnar cache and distributed storage.

Cloud SQL is simpler and cheaper. AlloyDB is positioned as "up to 4x faster than standard PostgreSQL for OLTP and up to 100x for analytical queries" through a built-in columnar engine that sits alongside row storage.
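To see why a columnar layout helps analytical queries, it can be useful to compare the two layouts on a toy dataset. The sketch below is a generic illustration of row versus column storage and makes no claim about how AlloyDB implements its columnar engine; the table contents and function names are made up for the example.

```python
# Row layout: each record is stored together. Good for OLTP, where a query
# typically touches all fields of a few rows.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 30},
    {"order_id": 2, "customer": "bob", "amount": 45},
    {"order_id": 3, "customer": "alice", "amount": 25},
]

# Column layout: each field is stored as its own contiguous list. Good for
# analytics, where a query typically touches one or two fields of many rows.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_from_rows(rows):
    # Has to visit every full record even though only "amount" is needed.
    return sum(row["amount"] for row in rows)


def total_from_columns(columns):
    # Reads just the one column the aggregate needs.
    return sum(columns["amount"])


if __name__ == "__main__":
    print(total_from_rows(rows), total_from_columns(columns))
```

Both functions return the same total; the difference is how much data has to be scanned to get there, which is what matters at millions of rows.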

| Feature | Cloud SQL | AlloyDB |
|---|---|---|
| Engines | MySQL, PostgreSQL, SQL Server | PostgreSQL only |
| Storage | Standard (Persistent Disk) | Distributed (6 copies) |
| Failover | ~60 sec | ~10 sec |
| Analytics | Standard PostgreSQL | Built-in columnar engine |
| Cost | Lower | Higher (~2x) |
| AWS equivalent | RDS | Aurora |

A team is choosing between Cloud SQL and AlloyDB for a high-load PostgreSQL OLTP workload at ~50k RPS. What should they pick?

Read Replicas and Read Scaling

A typical OLTP workload is read-heavy, often around 90% reads and 10% writes. The primary instance handles all writes and some reads. As traffic grows, SELECT queries become the bottleneck. The solution: **Read Replicas** - asynchronous copies of the database that accept only SELECT statements.
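Read/write splitting is usually done in the application or in a proxy. As a minimal sketch of the idea, the router below sends SELECT statements to replicas in round-robin order and everything else to the primary. The class name and endpoint strings are hypothetical, and classifying by the first keyword is a simplification: statements such as `SELECT ... FOR UPDATE` take locks and would need to go to the primary.

```python
import itertools


class ReadWriteRouter:
    """Pick an endpoint for a SQL statement: replicas for reads, primary for writes."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # cycle() gives endless round-robin over the replica list.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        stripped = sql.strip()
        first_word = stripped.split(None, 1)[0].upper() if stripped else ""
        # Simplification: only plain SELECTs are treated as reads.
        if first_word == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("primary.db.example", ["replica-1.db.example",
                                                    "replica-2.db.example"])
    print(router.endpoint_for("SELECT * FROM posts"))
    print(router.endpoint_for("INSERT INTO posts (title) VALUES ('hi')"))
```

If no replicas are configured, every statement falls back to the primary, so the same code path works in development and production.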

Asynchronous replication means a small delay: the replica lags behind the primary by milliseconds to seconds. A write to the primary followed immediately by a read from a replica may return stale data. Applications must account for this: non-critical reads go to the replica, critical reads (immediately after a write) go to the primary.

**Handling eventual consistency:** after a write, reading from the primary for a few seconds (or using session tokens) prevents the "vanishing object" problem where a user creates a record and immediately does not see it in a list.
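The "read from the primary for a few seconds after a write" approach can be sketched as a small per-session router. This is an illustrative in-memory version: the class name, the 5-second default, and the `"primary"` / `"replica"` return values are assumptions, and a multi-process deployment would need to keep the timestamps somewhere shared (for example in the user's session).

```python
import time


class StickyPrimaryRouter:
    """Route a session's reads to the primary for a short window after it writes."""

    def __init__(self, pin_seconds=5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self._clock = clock
        self._last_write = {}  # session_id -> time of that session's last write

    def record_write(self, session_id):
        """Call this whenever the session performs a write on the primary."""
        self._last_write[session_id] = self._clock()

    def target_for_read(self, session_id):
        """Return 'primary' while the session is pinned, otherwise 'replica'."""
        last = self._last_write.get(session_id)
        if last is not None and self._clock() - last < self.pin_seconds:
            return "primary"
        return "replica"
```

Only the session that wrote is pinned, so other users keep reading from replicas and the primary takes on just a small amount of extra read traffic. The window should be chosen comfortably above the replica lag actually observed.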

A user creates a post and immediately opens the post list. The post does not appear. What is the most likely cause?

Managed Databases

  • RDS handles OS, patching, backups, Multi-AZ failover (~60-120 sec); the client manages schema and queries
  • Aurora separates compute from storage (6 copies, 3 AZs); failover <30 sec, read replicas share the same storage
  • Cloud SQL is the GCP RDS equivalent; AlloyDB is the Aurora equivalent with a columnar engine
  • Read Replicas scale reads through asynchronous replication; replica lag must be handled in the application

Related Topics

Managed databases are part of the broader cloud storage and networking picture.

  • VPC and Network Isolation — RDS always runs inside a VPC - security groups and subnets control access
  • Auto Scaling — Aurora Serverless v2 auto-scales compute - similar to EC2 Auto Scaling
  • Cloud Storage: S3, GCS, Blob — RDS backups are stored in S3; snapshots can be exported to Parquet for analytics

Questions for Reflection

  • In what scenarios is self-managed PostgreSQL on EC2 preferable to RDS?
  • Why does Aurora's architecture outperform RDS for Multi-AZ failover if both use replication?
  • How should replica lag be handled in an application without significantly complicating the codebase?

Related Lessons

  • db-03-acid