Databases

Redis: In-Memory Store in Production

In 2012 Instagram processed 1 billion requests per day with a team of 13 employees. Redis was a key part of the stack: session caching, rate limiting, and the activity feed all ran on it. Choosing the right Redis data structure can be the difference between a 10 ms response and a 1 ms one.

  • **Twitter**: Redis Sorted Set for trending topics - real-time ranking updated as tweets arrive.
  • **GitHub**: Redis + Sidekiq for 50M+ background jobs in CI/CD pipelines.
  • **Stack Overflow**: Redis caches the entire site - 95%+ of requests are served from cache with <1 ms latency.

Redis Data Structures

Redis is not just a key-value store - it is a data structure server. Each type comes with specialized commands, most of them O(1) or O(log N), that eliminate application-side processing. Commands that read a whole collection, such as HGETALL, are O(N) in the size of that collection.

  • **String**: INCR, DECR
  • **List**: LPUSH, RPOP
  • **Hash**: HSET, HGETALL
  • **Set**: SADD, SISMEMBER
  • **Sorted Set**: ZADD, ZRANK
  • **HyperLogLog**: PFADD, PFCOUNT

Discord stores the online member list for each server in a Redis Set. On connection, SADD user_id to server:online:{server_id}. On disconnect, SREM. SCARD returns the online count in O(1) - no SQL COUNT needed.
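The presence pattern described above can be sketched in a few lines. This is a minimal illustration, not Discord's code: the helper names are hypothetical, and `client` is assumed to be a redis-py `redis.Redis` instance (or any object exposing `sadd`, `srem`, and `scard` with the same meaning).

```python
def server_key(server_id: int) -> str:
    """Build the Set key used in the text: server:online:{server_id}."""
    return f"server:online:{server_id}"


def mark_online(client, server_id: int, user_id: int) -> None:
    # SADD is idempotent: adding a member that is already present changes nothing.
    client.sadd(server_key(server_id), user_id)


def mark_offline(client, server_id: int, user_id: int) -> None:
    # SREM removes the member; removing an absent member is also harmless.
    client.srem(server_key(server_id), user_id)


def online_count(client, server_id: int) -> int:
    # SCARD returns the stored cardinality without scanning the members.
    return client.scard(server_key(server_id))
```

With redis-py installed and a server running, `client = redis.Redis()` would be passed as the first argument. Because SADD is idempotent, a user who reconnects twice is still counted once.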

Which Redis data structure is optimal for a real-time leaderboard requiring fast rank lookup?

Persistence: RDB and AOF

Redis is an in-memory database but supports two persistence modes. RDB (Redis Database) creates point-in-time snapshots: fast restart and a compact file, but writes made after the last snapshot can be lost. AOF (Append Only File) logs every write command: more durable (how much depends on the fsync policy), but the file is larger and restart is slower.

Pure cache with recoverable data: disable persistence entirely (save "" and appendonly no in redis.conf). This removes snapshot and AOF disk writes and avoids fork() overhead on large datasets. Twitter uses Redis in no-persistence mode for timeline caches.
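For an instance that is already running, the same two settings can be changed with CONFIG SET. The sketch below assumes `client` is a redis-py `redis.Redis` instance; the function names are hypothetical. CONFIG SET only changes the running server, so the settings also need to be placed in redis.conf to survive a restart.

```python
def disable_persistence(client) -> None:
    """Turn off RDB snapshots and the AOF on a running instance."""
    # An empty value clears every snapshot rule, which disables RDB saves.
    client.config_set("save", "")
    # Stop appending write commands to the append-only file.
    client.config_set("appendonly", "no")


def persistence_settings(client) -> dict:
    """Return the current 'save' and 'appendonly' values for verification."""
    settings = {}
    settings.update(client.config_get("save"))
    settings.update(client.config_get("appendonly"))
    return settings
```

Calling `persistence_settings(client)` afterwards is a cheap way to confirm that both values took effect before relying on the instance as a pure cache.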

An application uses Redis to cache sessions. All data can be recovered from PostgreSQL. Which persistence mode is appropriate?

Pub/Sub and Redis Streams

Redis Pub/Sub is fire-and-forget messaging. A publisher sends to a channel; all subscribers receive it. No message history, no acknowledgment, no consumer groups. If a subscriber disconnects, it misses messages permanently.

Redis Streams (added in Redis 5.0) solve these problems. A stream is an append-only log with persistent history, consumer groups (load balancing across multiple workers), and message acknowledgment (XACK). The model is similar to Kafka but simpler, and it runs inside the same Redis server you may already use as a cache.
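A worker in a consumer group typically reads new entries, handles them, and acknowledges each one only after the handler succeeds. The sketch below shows that loop body under some assumptions: `client` is a redis-py `redis.Redis` instance using the default RESP2 reply shape, the function names are hypothetical, and `handler` is whatever callable processes one entry's fields.

```python
def ensure_group(client, stream: str, group: str) -> None:
    """Create the consumer group (and the stream) if it does not exist yet."""
    try:
        client.xgroup_create(stream, group, id="0", mkstream=True)
    except Exception as exc:
        # Redis answers with a BUSYGROUP error when the group already exists;
        # anything else is a real failure and is re-raised.
        if "BUSYGROUP" not in str(exc):
            raise


def process_batch(client, stream, group, consumer, handler, count=10) -> int:
    """Read up to `count` new entries for this consumer and ack each after handling."""
    # The special ID ">" asks for entries never delivered to this group before.
    response = client.xreadgroup(group, consumer, {stream: ">"}, count=count)
    acked = 0
    for _stream_name, entries in response:
        for entry_id, fields in entries:
            handler(fields)  # if this raises, the entry stays pending (unacked)
            client.xack(stream, group, entry_id)
            acked += 1
    return acked
```

An entry whose handler fails is never acknowledged, so it remains in the group's pending list and can be inspected or claimed by another worker later. Pub/Sub has no equivalent of this.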

GitHub uses Redis Pub/Sub for real-time notifications in GitHub Actions. Kafka is overkill when messages have a TTL of seconds and the source of truth is PostgreSQL. Redis Pub/Sub adds <1 ms latency.

How do Redis Streams differ from Pub/Sub in terms of reliability?

Lua Scripts and Atomicity

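Redis executes commands on a single thread, so each individual command is atomic. Between two commands from the same client, however, another client's command can run. Lua scripts execute atomically on the server: no other command runs while the script is executing. This makes them the most direct way to implement read-modify-write without race conditions; the alternative is optimistic locking with WATCH plus MULTI/EXEC, which must be retried when the watched key changes.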
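As an illustration, here is a minimal fixed-window rate limiter that keeps the increment, the expiry, and the limit check inside one script. It is a sketch under stated assumptions: `client` is a redis-py `redis.Redis` instance, and the key prefix `ratelimit:`, the function name, and the default limits are all hypothetical.

```python
# Runs atomically on the server: count the request, start the window on the
# first hit, and report whether the caller is still within the limit.
RATE_LIMIT_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
if current > tonumber(ARGV[2]) then
  return 0
end
return 1
"""


def allow_request(client, user_id: str, limit: int = 100, window_seconds: int = 60) -> bool:
    """Return True if this request fits in the current window for user_id."""
    key = f"ratelimit:{user_id}"
    # EVAL takes the script, the number of keys, then the keys and the arguments.
    result = client.eval(RATE_LIMIT_LUA, 1, key, window_seconds, limit)
    return result == 1
```

If INCR and EXPIRE were sent as separate commands, a client that crashed between them would leave a counter with no expiry, and two clients doing GET then INCR could both pass the check at the limit. Running the steps in one script removes both windows.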

Stripe uses Lua scripts in Redis for idempotency key management: check if a key exists, create if not, return the stored result - all atomically. This prevents duplicate payments on retry even under concurrent requests.

Why use a Lua script for a rate limiter instead of separate GET + INCR commands?

Redis Cluster

Redis Cluster shards data across 16,384 hash slots. A key's slot is CRC16(key) mod 16384, and each master owns a subset of the slots. If a client sends a command to a node that does not own the key's slot, the node replies with a MOVED redirection; cluster-aware clients follow it and cache the slot map. The cluster supports automatic failover: if a master is unreachable, one of its replicas is promoted.

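Cluster limitation: multi-key operations (MSET, MGET, SUNION) only work if all keys are in the same slot; otherwise the command fails with a CROSSSLOT error. Use hash tags {prefix} to colocate related keys: only the text inside the braces is hashed, so {user:42}:profile and {user:42}:sessions land in the same slot. Alternatively, split the operation into per-key commands in the application, or shard with a proxy such as Twemproxy instead of Redis Cluster.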
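The slot calculation is small enough to reproduce locally, which helps when checking whether a set of keys will be colocated. The sketch below follows the published cluster rules (CRC16 in its XMODEM variant, modulo 16,384, with hash-tag extraction); the function names are hypothetical.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0, as used for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_tag_payload(key: bytes) -> bytes:
    """Return the part of the key that is hashed, honoring a {hash tag}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" means the whole key is hashed.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 cluster hash slots."""
    return crc16_xmodem(hash_tag_payload(key.encode())) % 16384


if __name__ == "__main__":
    for name in ["user:1", "user:2", "user:3", "{user}:1", "{user}:2"]:
        print(name, key_slot(name))
```

The keys user:1, user:2, and user:3 fall into three different slots, so a single MGET over them is rejected, while {user}:1 and {user}:2 share the slot of the tag `user`. On a live cluster, `CLUSTER KEYSLOT <key>` gives the authoritative answer.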

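Misconception: "Redis loses all data on restart"

Redis supports RDB snapshots and AOF persistence. With AOF enabled and the default appendfsync everysec policy, data loss on a crash is limited to roughly the last second of writes. In Cluster mode, replicas add redundancy on top of persistence, although replication is asynchronous.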

The misconception comes from Redis's in-memory nature. Persistence is optional but fully supported and widely used in production.

Why can MGET user:1 user:2 user:3 return an error in Redis Cluster?

Key Ideas

  • **Data structures**: String, List, Hash, Set, Sorted Set - each has O(1) or O(log N) core operations that match specific use cases.
  • **RDB vs AOF**: RDB restarts faster and is more compact; AOF is more durable. Pure cache = no persistence needed.
  • **Lua for atomicity**: multiple operations in one atomic script - the most direct way to prevent race conditions in read-modify-write (WATCH with MULTI/EXEC is the alternative).
  • **Cluster**: 16,384 hash slots across nodes. Use hash tags {prefix} to colocate related keys.

Related Topics

Redis integrates with many database patterns.

  • Caching — Redis is the primary caching layer in the majority of production stacks.
  • Connection Pooling — Redis connections also need pooling to avoid per-request overhead.
  • Time-Series Databases — Redis Streams can serve as a lightweight time-series store for recent data.

Questions for Reflection

  • Redis is single-threaded but handles millions of ops/sec. How, and what does this mean for latency-sensitive workloads?
  • When would Redis Streams be the right choice over Kafka for event streaming?
  • A startup uses Redis for everything: cache, sessions, job queue, rate limiting. What are the risks as it scales?

Related Lessons

  • bt-16-redis-streams
  • rt-28