Real-Time Backend
Event-Driven Architecture
2011, LinkedIn: the site is buckling under load. Services call one another directly, with hundreds of HTTP hops in a chain. The fix: Kafka and a move to an event-driven architecture. Today 700+ LinkedIn microservices talk through events, pushing 7 trillion messages per day.
- Netflix stores watch history as an event log (Event Sourcing). The recommendation system replays the log and builds new models without data migrations
- Uber's ETA service subscribes to `driver.location.updated` events from thousands of drivers and reactively recalculates arrival times, with no polling or direct calls between services
- LinkedIn uses Kafka as a central event bus: the jobs service publishes `JobPosted`, while recommendations, analytics, and email notifications are independent consumers of one topic
- Axon Framework implements CQRS+ES in Java: a `PlaceOrderCommand` becomes an `OrderPlacedEvent` stored in EventStore and projected into multiple read models
Event Driven
In Event-Driven Architecture (EDA) components don't call each other directly. They publish **events** and react to events of others. An event is an immutable fact: 'order created', 'payment cleared', 'driver moved 200m'. Nobody waits for a reply. The service publishes and moves on.
Uber's ETA service is built exactly this way: thousands of drivers emit `driver.location.updated` events every second, and the ETA calculator subscribes to that stream and recomputes routes reactively, without polling and without direct calls to the geolocation service.
- **Loose coupling**: the producer does not know who listens or how many listeners there are
- **Scalability**: consumers can be added without touching the producer
- **Resilience**: a failure in one consumer doesn't block others and doesn't bring down the producer
- **Audit trail**: the event stream itself is a change log
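The properties above can be illustrated with a minimal in-process sketch. It is not how Uber or any broker implements delivery: `InProcessBus`, the handler names, and the event payload are all hypothetical, and delivery here is synchronous, whereas a real broker decouples producer and consumers in time as well.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class InProcessBus:
    """Toy event bus (hypothetical): producers publish by topic, consumers subscribe."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.failures: List[str] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Adding a consumer never requires a change in the producer.
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # The producer does not know who listens and gets no reply.
        for handler in self._handlers[topic]:
            try:
                handler(event)
            except Exception as exc:
                # A failing consumer is recorded and does not block the others.
                self.failures.append(f"{handler.__name__}: {exc}")


bus = InProcessBus()
eta_updates: List[str] = []
audit_log: List[Event] = []


def recalc_eta(event: Event) -> None:
    eta_updates.append(f"recalculate ETA for driver {event['driver_id']}")


def broken_analytics(event: Event) -> None:
    raise RuntimeError("analytics is down")


def audit(event: Event) -> None:
    audit_log.append(event)


bus.subscribe("driver.location.updated", recalc_eta)
bus.subscribe("driver.location.updated", broken_analytics)
bus.subscribe("driver.location.updated", audit)

bus.publish("driver.location.updated", {"driver_id": 42, "lat": 52.52, "lon": 13.40})
```

After the single `publish` call, `eta_updates` and `audit_log` each hold one entry, while the exception from `broken_analytics` ends up in `bus.failures` without reaching the producer.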
Uber's ETA service consumes `driver.location.updated` events. What happens if the ETA service goes down for 30 seconds?
Event Sourcing
Event Sourcing is a storage pattern: instead of saving a snapshot of current state in the DB, you save a **log of every event** that led to it. Current state is a projection (fold) over the log. Netflix uses ES for watch history: every `VideoPlayed`, `VideoPaused`, `VideoCompleted` is written to an append-only log. The profile state is the result of replaying these events.
**Axon Framework** (Java) is a production-ready CQRS+ES implementation. Each aggregate is stored as a log of domain events. A `PlaceOrderCommand` becomes an `OrderPlacedEvent` written to the `EventStore` and projected into a read model. LinkedIn uses ES to audit user profile changes.
- **Full history**: can answer 'what was the order's state at 14:32 yesterday'
- **Debugging**: reproduce any bug by replaying events up to the failure
- **Temporal queries**: state of an entity at any point in time
- **Trade-off**: eventual consistency between write and read model, plus projection complexity
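A small sketch of "state is a fold over the log", including a temporal query. The event names, the `OrderState` shape, and the timestamps are assumptions made for illustration; a real event store would also handle persistence, versioning, and snapshots.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from functools import reduce
from typing import List


@dataclass(frozen=True)
class OrderEvent:
    kind: str        # hypothetical event names, see apply_event
    at: datetime
    amount: int = 0  # minor units, only used by PaymentCleared


@dataclass(frozen=True)
class OrderState:
    status: str = "none"
    paid: int = 0


def apply_event(state: OrderState, event: OrderEvent) -> OrderState:
    """Pure transition function: old state + event -> new state."""
    if event.kind == "OrderPlaced":
        return replace(state, status="placed")
    if event.kind == "PaymentCleared":
        return replace(state, status="paid", paid=state.paid + event.amount)
    if event.kind == "OrderShipped":
        return replace(state, status="shipped")
    raise ValueError(f"unknown event kind: {event.kind}")


def state_as_of(log: List[OrderEvent], moment: datetime) -> OrderState:
    """Fold only the events recorded up to and including `moment`."""
    return reduce(apply_event, (e for e in log if e.at <= moment), OrderState())


log = [
    OrderEvent("OrderPlaced", datetime(2024, 3, 1, 14, 0)),
    OrderEvent("PaymentCleared", datetime(2024, 3, 1, 14, 45), amount=4999),
    OrderEvent("OrderShipped", datetime(2024, 3, 2, 9, 10)),
]

at_1432 = state_as_of(log, datetime(2024, 3, 1, 14, 32))
current = state_as_of(log, datetime.max)
```

Here `at_1432` is the order as it stood at 14:32 (placed, nothing paid), and `current` is the result of folding the whole log. Nothing is ever updated in place: both answers come from the same immutable list.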
In an Event Sourcing system, a developer wants the user's account balance as of January 1, 2024. How to get it?
Event Bus
An Event Bus is the infrastructure component that receives events from producers and delivers them to subscribers. It is the central communication backbone in EDA. LinkedIn moved its inter-service communication to Kafka in 2011, and Kafka became their event bus for 700+ microservices. Every service publishes events to topics, others read them.
- **In-process bus**: a library inside one process (NestJS EventEmitter, Spring ApplicationEventPublisher). Does not survive restart
- **Message broker**: external service (Kafka, RabbitMQ, Redis Streams). Durable, cross-service, scalable
- **Cloud event bus**: AWS EventBridge, GCP Pub/Sub. Managed, serverless-friendly
A **consumer group** is a key Kafka concept. Several instances of one service join the same group and split the topic's partitions among themselves (parallel processing). Different services run in different groups, and each group receives every event independently.
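The two behaviors of consumer groups can be simulated in memory without a broker. This is only a sketch: the modulo partitioner and the round-robin `assign` function are simplified stand-ins (Kafka ships several assignment strategies, and none is reproduced here), and the group and member names are made up.

```python
from collections import defaultdict
from typing import Dict, List

NUM_PARTITIONS = 4


def partition_for(key: int) -> int:
    # Simplified stand-in for a key-hash partitioner.
    return key % NUM_PARTITIONS


def assign(members: List[str]) -> Dict[str, List[int]]:
    """Spread partitions round-robin across the members of ONE group."""
    assignment: Dict[str, List[int]] = {m: [] for m in members}
    for p in range(NUM_PARTITIONS):
        assignment[members[p % len(members)]].append(p)
    return assignment


# A topic is modeled as one ordered list of events per partition.
topic: Dict[int, List[str]] = defaultdict(list)
for user_id in range(8):
    topic[partition_for(user_id)].append(f"ProfileUpdated(user={user_id})")

groups = {
    "recommendations": ["rec-1", "rec-2"],  # two instances share the work
    "email": ["email-1"],                   # a separate service, separate group
}


def consume(members: List[str]) -> Dict[str, List[str]]:
    """Each member reads only the partitions assigned to it."""
    return {
        member: [event for p in partitions for event in topic[p]]
        for member, partitions in assign(members).items()
    }


received = {group: consume(members) for group, members in groups.items()}
```

Within `recommendations` the eight events are split between `rec-1` and `rec-2` with no overlap, while the single member of `email` sees all eight. Adding a third group would not change what the first two receive.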
LinkedIn is launching a new `RecommendationService` that should react to `ProfileUpdated` events. What is needed in the already running Kafka setup?
Event Replay
Event Replay means reprocessing historical events from the log. If the log is retained long enough (Kafka's default retention is 7 days; an event store keeps events indefinitely), a new service can rewind to the very beginning and build an up-to-date projection. When Netflix launches a new analytics service, it replays the entire watch history (billions of events) and gets a ready-made model without data migrations.
- New service starts at the earliest available offset (offset 0 if retention has not deleted anything)
- Reads events in batches, builds an in-memory or persistent projection
- After catch-up, switches to real-time mode and reads new events
- Old service keeps running. Zero-downtime migration
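The catch-up steps above can be sketched against an in-memory log. `EventLog` and `WatchCountProjection` are hypothetical names, and the list-backed log only stands in for a topic or event store; a real consumer would also commit its offset durably.

```python
from typing import Dict, List

Event = Dict[str, str]


class EventLog:
    """Append-only in-memory log, standing in for a topic or event store."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def read(self, offset: int, max_batch: int) -> List[Event]:
        return self._events[offset:offset + max_batch]

    def end_offset(self) -> int:
        return len(self._events)


class WatchCountProjection:
    """Counts VideoPlayed events per user, tracking its own read offset."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.offset = 0  # start of the log
        self.counts: Dict[str, int] = {}

    def poll(self, max_batch: int = 2) -> int:
        batch = self.log.read(self.offset, max_batch)
        for event in batch:
            user = event["user"]
            self.counts[user] = self.counts.get(user, 0) + 1
        self.offset += len(batch)
        return len(batch)

    def caught_up(self) -> bool:
        return self.offset == self.log.end_offset()


# Historical events written long before the new service existed.
log = EventLog()
for user in ["a", "b", "a", "c", "a"]:
    log.append({"type": "VideoPlayed", "user": user})

# Phase 1: replay history in batches until the projection has caught up.
projection = WatchCountProjection(log)
batches = 0
while not projection.caught_up():
    projection.poll()
    batches += 1
counts_after_catch_up = dict(projection.counts)

# Phase 2: the same poll loop now picks up new events as they arrive.
log.append({"type": "VideoPlayed", "user": "b"})
projection.poll()
```

The same `poll` method serves both phases: replay and real-time consumption differ only in where the offset currently points.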
**Pitfall**: a replay must not re-trigger side effects (sending emails, charging cards), and the projection has to be **idempotent**: applying an event twice yields the same result as applying it once. Pattern: store each processed `event_id` and skip events that have already been handled.
Misconception: Event Replay and Event Sourcing are the same thing
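A minimal sketch of the `event_id` deduplication pattern. `RevenueProjection` and the event shape are assumptions for illustration; the in-memory `seen` set would, in a real system, need to be persisted together with the projection so that a crash cannot separate the two.

```python
from typing import Dict, List, Set

Event = Dict[str, object]


class RevenueProjection:
    """Sums order amounts, ignoring events it has already applied."""

    def __init__(self) -> None:
        self.total = 0
        self.seen: Set[str] = set()

    def apply(self, event: Event) -> bool:
        event_id = str(event["event_id"])
        if event_id in self.seen:
            return False  # duplicate delivery or replay: skip
        self.seen.add(event_id)
        self.total += int(event["amount"])
        return True


events: List[Event] = [
    {"event_id": "e1", "type": "OrderPlaced", "amount": 100},
    {"event_id": "e2", "type": "OrderPlaced", "amount": 250},
]

projection = RevenueProjection()
# Deliver every event twice, as a replay over already-processed data would.
applied = [projection.apply(e) for e in events + events]
```

Without the `seen` check the total would double on the second pass. With it, the second delivery of each event is a no-op.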
Event Sourcing is a state storage pattern based on an event log. Event Replay is the operation of re-playing that log to build a new projection or to debug.
ES without replay is possible (just keep history). Replay without ES is possible too: a Kafka topic is a log regardless of whether services use ES. Replay is a tool, ES is an architectural pattern.
On replay of `OrderPlaced` events, a new service sends a welcome email on each event. What happens and how to fix it?
Key ideas
- EDA: components talk via events, not direct calls. Loose coupling, independent scaling
- Event Sourcing: state = fold over the event log. Full history, temporal queries, debugging via replay
- Event Bus: delivery infrastructure (Kafka, RabbitMQ, EventBridge). Producer doesn't know about consumers; consumer groups provide parallelism
- Event Replay: replaying the log for new projections or debugging. Projections must be idempotent, and side effects must not fire again
Related topics
EDA overlaps with several architectural patterns of real-time systems:
- CQRS — Command Query Responsibility Segregation, a natural pair with Event Sourcing: commands write events, queries read projections
- Kafka Streams — Stream processing on top of the event bus: stateful operations (join, aggregate, window) over event streams
- Microservices — EDA is the preferred way for microservices to communicate, in place of synchronous REST calls
- Saga Pattern — Distributed transactions via a chain of events and compensating commands, built on top of the event bus
Questions for reflection
- Which service in the project would benefit most from a switch to event-driven communication: where does tight coupling create the biggest pain today?
- If user activity analytics needs to be added tomorrow, how much easier is it in an EDA system vs a synchronous one?
- What data in the application would actually benefit from being stored as an event log (Event Sourcing) rather than as current state?