Cloud Computing
Event-Driven Architecture
- Amazon's own retail platform is a collection of decoupled services communicating via events. When a customer places an order, an OrderPlaced event triggers inventory updates, payment processing, shipping label generation, and email confirmation - all independently, all asynchronously. The checkout service does not wait for all these to complete; it emits an event and moves on. This decoupling is what enables each service to fail, scale, and deploy independently.
- Uber's driver-rider matching processes millions of location events per minute through Kafka (similar to Kinesis). Each driver GPS update is an event; consumers (matching, ETA, map display) subscribe independently and process at their own pace.
- Stripe processes billions of payment events per year through queue-based systems. Failed payments retry with exponential backoff, and messages that exhaust their retries land in an SQS dead-letter queue for inspection. If fraud detection is slow, the queue buffers load without dropping events.
- Netflix uses SNS to fan out encoding job completion events to dozens of downstream services - quality control, CDN warming, catalog update, recommendation engine - triggered by one event from the encoding pipeline.
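The exponential backoff mentioned in the payment-retry example can be sketched as a small delay schedule. This is an illustration only: the function name `backoff_delays`, the 1-second base, and the 60-second cap are assumptions, not values taken from any of the companies above.

```python
import random


def backoff_delays(max_attempts, base_seconds=1.0, cap_seconds=60.0,
                   jitter=False, rng=None):
    """Return the wait before each retry: base * 2**attempt, capped."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_attempts):
        delay = min(cap_seconds, base_seconds * (2 ** attempt))
        if jitter:
            # "Full jitter": choose a random wait in [0, delay] so that many
            # clients retrying at once do not arrive in synchronized waves.
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays


if __name__ == "__main__":
    print(backoff_delays(8))
```

The cap keeps a long outage from pushing waits to unbounded lengths; jitter is usually worth enabling when many producers retry against the same downstream service.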
SQS Message Queues
SQS decouples producers from consumers. A producer sends messages to a queue; consumers poll and process independently. A configurable visibility timeout hides a message from other consumers while it is being processed; if the consumer does not delete the message before the timeout expires, it reappears and is delivered again. Standard queues guarantee at-least-once delivery with best-effort ordering. FIFO queues provide exactly-once processing and strict ordering within each message group, at up to 300 messages per second without batching or 3,000 messages per second with batching (more in high-throughput mode).
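To make the visibility timeout and at-least-once delivery concrete, here is a toy in-memory model, not the SQS API. The class names `InMemoryQueue` and `IdempotentConsumer` are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
import itertools


class InMemoryQueue:
    """Toy model of a queue with a visibility timeout (not the real SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._ids = itertools.count(1)
        self._messages = {}  # message id -> [body, time it becomes visible]

    def send(self, body):
        message_id = next(self._ids)
        self._messages[message_id] = [body, 0.0]
        return message_id

    def receive(self, now):
        """Return (message_id, body) for one visible message, or None."""
        for message_id, entry in self._messages.items():
            body, visible_at = entry
            if visible_at <= now:
                # Hide the message from other consumers until the timeout ends.
                entry[1] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id):
        """Acknowledge successful processing; the message never reappears."""
        self._messages.pop(message_id, None)


class IdempotentConsumer:
    """Applies each message at most once by remembering processed ids."""

    def __init__(self):
        self.processed_ids = set()
        self.applied = []

    def handle(self, message_id, body):
        if message_id in self.processed_ids:
            return False  # duplicate delivery: skip the side effect
        self.processed_ids.add(message_id)
        self.applied.append(body)
        return True


if __name__ == "__main__":
    queue = InMemoryQueue(visibility_timeout=30)
    consumer = IdempotentConsumer()
    queue.send("charge order 42")

    message_id, body = queue.receive(now=0)
    consumer.handle(message_id, body)       # processed, but never deleted
    print(queue.receive(now=10))            # hidden while the timeout runs
    message_id, body = queue.receive(now=31)
    print(consumer.handle(message_id, body))  # redelivery is ignored
    queue.delete(message_id)
```

In a real system the deduplication key is usually a business-level idempotency key stored durably, since a producer that retries a send creates a new message with a new ID.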
Dead-letter queues (DLQs) capture messages that have been received maxReceiveCount times without being deleted. A CloudWatch alarm on the DLQ's message count alerts on-call engineers when consumers are failing. Without a DLQ, a message that always fails keeps reappearing until the queue's retention period expires and is then deleted silently, which wastes consumer capacity and leaves nothing to inspect when debugging.
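The maxReceiveCount behavior can be sketched in two parts: the JSON document SQS expects in a queue's `RedrivePolicy` attribute, and a toy retry loop that moves exhausted messages aside. The ARN, the helper names `redrive_policy` and `drain`, and the sample handler are illustrative assumptions.

```python
import json


def redrive_policy(dlq_arn, max_receive_count):
    """Build the JSON string for a source queue's RedrivePolicy attribute."""
    return json.dumps({
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    })


def drain(messages, handler, max_receive_count):
    """Toy model: deliver each message up to max_receive_count times.

    Returns (processed, dead_letters).
    """
    processed, dead_letters = [], []
    for message in messages:
        for _ in range(max_receive_count):
            try:
                handler(message)
            except Exception:
                continue  # the message becomes visible and is received again
            processed.append(message)
            break
        else:
            # Every allowed receive failed: park the message for inspection.
            dead_letters.append(message)
    return processed, dead_letters


def sample_handler(message):
    """Hypothetical consumer that cannot parse one kind of message."""
    if message == "malformed":
        raise ValueError("cannot parse message")


if __name__ == "__main__":
    # Hypothetical ARN, for illustration only.
    print(redrive_policy("arn:aws:sqs:us-east-1:123456789012:orders-dlq", 3))
    print(drain(["ok-1", "malformed", "ok-2"], sample_handler, 3))
```

A low maxReceiveCount surfaces poison messages quickly; a higher one tolerates transient downstream failures before giving up.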
What does SQS at-least-once delivery mean for consumer logic?
SNS Pub/Sub Topics
SNS implements the publish/subscribe pattern. A producer publishes a message to an SNS topic; all subscribers receive a copy simultaneously. Subscribers can be SQS queues, Lambda functions, HTTP endpoints, email addresses, or SMS. The canonical SNS+SQS fan-out pattern: one SNS topic delivers to multiple SQS queues, letting different consumer services receive the same event and process independently at their own pace.
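The fan-out pattern can be modeled in a few lines. This is an in-process sketch rather than the SNS or SQS API; the `Topic` class and the subscriber names are hypothetical.

```python
from collections import deque


class Topic:
    """Toy SNS-style topic: every subscribed queue receives its own copy."""

    def __init__(self):
        self._queues = {}

    def subscribe(self, name):
        """Create and return a queue that will receive all published events."""
        queue = deque()
        self._queues[name] = queue
        return queue

    def publish(self, event):
        """Deliver an independent copy to each queue; return the copy count."""
        for queue in self._queues.values():
            queue.append(dict(event))
        return len(self._queues)


if __name__ == "__main__":
    orders = Topic()
    inventory = orders.subscribe("inventory")
    email = orders.subscribe("email")

    orders.publish({"type": "OrderPlaced", "order_id": 42})

    # Each consumer drains its own queue at its own pace.
    print(inventory.popleft())
    print(len(email))  # the email queue still holds its copy
```

Because each subscriber owns a queue, a slow email service only grows its own backlog and never delays inventory updates.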
SNS message filtering (FilterPolicy) lets each subscriber receive only the events it cares about - reducing processing cost and simplifying consumer logic. Without filtering, every consumer must receive all messages and discard irrelevant ones. SNS filtering happens before delivery, saving Lambda invocations and SQS storage.
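A simplified view of how a filter policy decides delivery is sketched below. It covers only exact string matching on message attributes (all keys must match, any listed value is accepted); real filter policies support more operators. The function names and the sample subscriptions are assumptions.

```python
def matches_filter_policy(policy, attributes):
    """Every policy key must be present with a value from its allowed list."""
    return all(
        key in attributes and attributes[key] in allowed
        for key, allowed in policy.items()
    )


def recipients(subscriptions, attributes):
    """Names of the subscriptions whose policy accepts the message."""
    return [
        name
        for name, policy in subscriptions.items()
        if matches_filter_policy(policy, attributes)
    ]


if __name__ == "__main__":
    subscriptions = {
        "shipping": {"event_type": ["order_placed"]},
        "refunds": {"event_type": ["order_cancelled", "order_returned"]},
        "audit": {},  # an empty policy accepts everything
    }
    print(recipients(subscriptions, {"event_type": "order_placed"}))
```

Since the check runs before delivery, a subscriber whose policy rejects the message is never invoked and never stores a copy.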
What problem does the SNS fan-out to SQS pattern solve?
EventBridge Event Bus
EventBridge is a serverless event bus that routes events based on content-based filtering rules. Unlike SNS, which delivers to every subscriber of a topic (optionally narrowed by per-subscription filter policies), EventBridge evaluates each event against rules that match event patterns: specific fields, values, or the presence of certain keys. Events come from AWS services (EC2 state changes, S3 events, CloudTrail), custom applications, or SaaS partners (Zendesk, Stripe, GitHub). Targets include Lambda, SQS, Step Functions, API Gateway, and event buses in other accounts.
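Content-based routing can be sketched as a recursive pattern match. The pattern shape below follows the EventBridge convention of nested objects with lists of allowed values, but the matcher is a simplified assumption that handles exact values only; `matches_pattern`, `route`, and the target names are hypothetical.

```python
def matches_pattern(pattern, event):
    """Nested dicts descend into the event; lists hold the allowed values.

    Event fields that the pattern does not mention are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches_pattern(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def route(rules, event):
    """Return the targets of every rule whose pattern matches the event."""
    return [target for pattern, target in rules if matches_pattern(pattern, event)]


if __name__ == "__main__":
    rules = [
        ({"source": ["aws.ec2"], "detail": {"state": ["stopped", "terminated"]}},
         "notify-oncall"),
        ({"source": ["aws.ec2"]}, "audit-log"),
        ({"source": ["custom.orders"]}, "order-workflow"),
    ]
    event = {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
    }
    print(route(rules, event))
```

Routing logic lives in the rules rather than in the consumers, so adding a new target means adding a rule, not changing the producer.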
EventBridge Pipes connect sources (SQS, Kinesis, DynamoDB Streams) to targets with optional filtering and enrichment. A Pipe can filter Kinesis stream records matching specific criteria before forwarding to a Lambda - reducing invocations and cost vs processing all records in Lambda.
How does EventBridge routing differ from SNS fan-out?
Kinesis Data Streams
Kinesis Data Streams provides real-time, ordered, durable event streaming. Unlike SQS, where a message is deleted once a consumer has processed it, Kinesis retains records for 24 hours by default, extendable to 7 days with extended retention and up to 365 days with long-term retention. Multiple consumer applications can independently read the same stream at different positions. Shards are the throughput unit: each shard supports writes of 1 MB/s (or 1,000 records per second) and reads of 2 MB/s. Records with the same partition key go to the same shard and are strictly ordered within that shard.
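The shard mechanics can be illustrated with a short sketch. Kinesis maps a partition key to a 128-bit integer with MD5, and each shard owns a range of that hash space; the code assumes the ranges are equal and contiguous, which need not hold after splits or merges. The sizing helper uses only the 1 MB/s write and 2 MB/s read limits quoted above, and both function names are hypothetical.

```python
import hashlib
import math

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def shard_for_key(partition_key, shard_count):
    """Map a key to a shard index, assuming equal contiguous hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


def shards_needed(write_mb_per_s, read_mb_per_s):
    """Smallest shard count that covers both the write and the read rate."""
    return max(1, math.ceil(write_mb_per_s / 1.0), math.ceil(read_mb_per_s / 2.0))


if __name__ == "__main__":
    # The same key always lands on the same shard, which preserves its order.
    print(shard_for_key("driver-1001", 4) == shard_for_key("driver-1001", 4))
    print(shards_needed(write_mb_per_s=3.5, read_mb_per_s=6.0))
```

The sizing estimate ignores the per-shard record-count limit and any uneven key distribution, so it is a lower bound rather than a capacity plan.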
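Hot shards are the most common Kinesis scaling problem. If one partition key (e.g., one extremely active user ID) receives disproportionate traffic, its shard reaches the 1 MB/s write limit while other shards sit idle. Shard splitting helps only when several busy keys share a shard, because every record for a single key still lands on one shard. For a single hot key, the options are key randomization (appending a suffix to spread the key across shards, at the cost of strict per-key ordering) or redesigning the partition strategy.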
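Key randomization can be sketched as follows. The suffix here cycles deterministically through a fixed number of buckets so the example is reproducible; a random suffix works the same way. The `#` separator, the bucket count of 16, and the helper names are assumptions, and shards are assumed to own equal hash ranges.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def shard_for_key(partition_key, shard_count):
    """Map a key to a shard index, assuming equal contiguous hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") * shard_count // HASH_SPACE


def salted_key(partition_key, sequence_number, salt_buckets):
    """Append a rotating suffix so one logical key spreads over many shards."""
    return f"{partition_key}#{sequence_number % salt_buckets}"


def shard_load(keys, shard_count):
    """Count how many records each shard would receive."""
    return Counter(shard_for_key(key, shard_count) for key in keys)


if __name__ == "__main__":
    plain = ["user-hot"] * 1000
    salted = [salted_key("user-hot", n, salt_buckets=16) for n in range(1000)]
    print(len(shard_load(plain, 4)))   # every record hits a single shard
    print(dict(shard_load(salted, 4)))
```

The trade-off is on the read side: records for the logical key are now ordered only within each suffix, so a consumer that needs the full history must read every suffix and merge, for example by a timestamp carried in the record.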
What is the key advantage of Kinesis over SQS for event streaming workloads?
Key Ideas
- **SQS:** managed message queue with at-least-once delivery and visibility timeout; FIFO variant for exactly-once ordered processing; dead-letter queues for failed message handling
- **SNS fan-out:** publish once, deliver to multiple SQS queues/Lambda simultaneously; content filtering per subscriber eliminates irrelevant message processing
- **EventBridge:** content-based routing rules match event patterns and route to different targets; accepts AWS services, custom apps, and SaaS partner events
- **Kinesis:** ordered, durable streams with replay capability; multiple consumers at different positions; shard-based throughput scaling; right for analytics and event sourcing
Related Topics
Event-driven patterns connect every cloud service:
- Microservices in the Cloud: async messaging decouples microservices, so a slow or failed consumer does not block the services that publish to it
- Data Lakes and Analytics: Kinesis Firehose delivers streaming events directly to an S3 data lake for batch analytics and ML pipelines
- Multi-Account Strategy: EventBridge cross-account event buses enable centralized event monitoring across all organization accounts
Questions for Reflection
- SQS at-least-once delivery requires idempotent consumers. For a payment processing consumer, what does idempotency mean in practice - and how is a legitimate second payment attempt distinguished from a duplicate delivery?
- EventBridge event patterns can filter on nested JSON fields. What security risk exists when user-controlled data flows through EventBridge event detail fields without validation?
- Kinesis hot partition keys (one user generating 90% of events) cannot be solved by adding more shards. What architectural options exist for redistributing hot partition key traffic?