Databases
NewSQL Databases
The 2010s saw a mass migration from PostgreSQL to Cassandra and MongoDB for horizontal scale, sacrificing ACID guarantees. Engineers spent years writing compensating logic, saga patterns, and dual-write synchronization to recover the consistency they lost. NewSQL - CockroachDB, TiDB, Google Spanner - was built to answer: 'What if you could have both horizontal scale and ACID transactions?'
- **Google Ads**: Spanner processes billions of auction transactions per day with global external consistency. Ad impressions in Singapore and billing events in New York are always in correct causal order without a central coordinator.
- **Comcast**: CockroachDB for billing, 10M transactions/day with multi-region automatic failover. Migration from PostgreSQL required only a connection string change due to PostgreSQL wire protocol compatibility.
What Is NewSQL
NewSQL databases combine the horizontal scalability of NoSQL systems with the ACID transaction guarantees of traditional relational databases. The 2010s saw companies abandon PostgreSQL for Cassandra and MongoDB to scale, then rewrite business logic to compensate for lost consistency. NewSQL reclaims ACID at distributed scale.
CockroachDB was inspired by Google Spanner's paper (2012). The founders built it after seeing the operational complexity at Google and wanting to offer the same guarantees as a managed open-source system. The name refers to the ability to survive almost any failure - like a cockroach.
What fundamentally distinguishes NewSQL from NoSQL when scaling?
CockroachDB: Distributed PostgreSQL
CockroachDB distributes data across nodes as key-value ranges (default 512 MB each). Each range is replicated to 3 nodes using Raft consensus. SQL queries are decomposed into range-level key-value operations by the distributed SQL layer. The PostgreSQL wire protocol means existing applications often connect without changes.
Comcast uses CockroachDB for their billing system, processing 10 million transactions per day with multi-region automatic failover. The migration from PostgreSQL required only a connection string change because CockroachDB implements the PostgreSQL wire protocol and most of PostgreSQL's SQL dialect.
CockroachDB stores data ranges on 3 nodes using Raft. One node fails. What happens?
TiDB: HTAP Database
TiDB (PingCAP, 2016) implements the MySQL protocol over a distributed storage layer (TiKV, built on RocksDB + Raft). TiDB's key innovation is HTAP (Hybrid Transactional/Analytical Processing): TiKV stores row-format data for OLTP; TiFlash is a columnar replica for OLAP. One system handles both without separate ETL pipelines.
What is HTAP (Hybrid Transactional/Analytical Processing) in TiDB?
Google Spanner: Global Consistency
Google Spanner (2012, public API 2017) is a globally distributed relational database with external consistency - a stronger guarantee than serializable: every committed transaction has a globally consistent timestamp, enabling reads that are consistent across all Spanner instances worldwide. TrueTime provides bounded clock uncertainty to implement this.
Google Ads processes billions of auction transactions per day using Spanner. The global consistency guarantee means an ad impression logged in Singapore and a billing event in New York are always in correct causal order - without a central coordinator. This enables accurate billing and analytics without reconciliation delays.
Why does Google Spanner use atomic clocks (TrueTime) for transactions?
Trade-offs of Distributed SQL
NewSQL databases solve distributed consistency but introduce new trade-offs: higher write latency for cross-range transactions, cross-region latency bounded by the speed of light, and more complex operational tuning. CAP theorem applies: distributed systems cannot avoid partition tolerance, so NewSQL databases choose CP (consistency over availability).
Multi-region transactions in CockroachDB or Spanner are bounded by the speed of light. A transaction touching data in New York and London takes 150-200ms minimum (100ms round-trip latency). Design data locality carefully: most transactions should touch data in one region to stay within 10-20ms.
A multi-region CockroachDB transaction takes 150-200ms. Why?
Key Ideas
- **NewSQL** = NoSQL horizontal scale + ACID transactions. Achieved via distributed consensus (Raft), distributed transactions (2PC + MVCC), and automatic sharding.
- **CockroachDB**: ranges replicated to 3 nodes via Raft. PostgreSQL wire protocol. Single-region write latency: 5-10ms. Multi-region: 150-200ms (speed of light).
- **TiDB HTAP**: same data in TiKV (row format, OLTP) and TiFlash (columnar, OLAP). Optimizer routes queries automatically. No ETL pipeline needed.
- **Spanner TrueTime**: atomic clocks + GPS provide bounded clock uncertainty (<7ms). Enables globally monotonic commit timestamps without a central coordinator.
- **NewSQL is not always better**: single-region PostgreSQL is faster and simpler for most workloads. Use NewSQL when you need distributed ACID across regions or automated sharding beyond PostgreSQL's vertical scale.
Related Topics
NewSQL combines replication, sharding, and storage engine concepts:
- Replication — NewSQL uses Raft consensus for replication, providing automatic leader election and fault tolerance without manual failover procedures.
- Sharding — NewSQL databases auto-shard data into ranges and automatically rebalance. This eliminates the manual shard key design and resharding operations required with traditional sharding.
- LSM-Tree — CockroachDB (Pebble) and TiDB (RocksDB via TiKV) use LSM-Tree storage engines for their key-value storage layers.
Вопросы для размышления
- CockroachDB charges 150-200ms for cross-region transactions. How would you design the data model and locality settings to ensure 95% of transactions are single-region?
- TiDB's HTAP eliminates the ETL pipeline between OLTP and OLAP. What consistency guarantees does TiFlash provide - can it ever return stale data compared to TiKV?
- Google Spanner's TrueTime adds a 7-14ms commit wait for every transaction. When is this overhead worth it, and when would CockroachDB (no atomic clocks, lower overhead) be preferable?