Real-Time Backend
Webhooks
An online store connects to Stripe. A payment goes through, but the store never gets the notification - the endpoint was under maintenance. The user waits for an 'Order paid' email that never arrives. An hour later they call support. Stripe's retry policy could prevent this, but only if the endpoint is implemented correctly.
- **Stripe:** processes >1B webhook events per month, retries delivery for up to 3 days (5 attempts with exponential backoff). HMAC-SHA256 signature includes a timestamp for replay attack protection.
- **GitHub:** webhook timeout is 10 seconds. GitHub does not redeliver failed deliveries automatically; they have to be redelivered manually or through the API. `X-Hub-Signature-256` is an HMAC-SHA256 of the raw body computed with the webhook secret. Used for CI/CD integrations (Jenkins, CircleCI, Vercel).
- **Shopify:** stores webhook delivery attempt history for the last 5 days in the admin panel. Merchants see the status of each event - delivered, error, retry. This cuts support load by 40%.
- **Twilio:** after an SMS is sent, Twilio receives the delivery status (delivered/failed) from the carriers and passes it on to the client's webhook endpoint. If the client endpoint is down, statuses are buffered for up to 24 hours and delivered once it is back.
What is a webhook
A webhook is an HTTP POST request that your server sends to a client URL when an event happens. It is the inverse of polling: instead of the client asking 'any new data?' every N seconds, the server notifies the client the moment an event happens. Stripe notifies on a payment, GitHub on a push to a repo, Twilio on an incoming SMS.
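To make the "HTTP POST on an event" idea concrete, here is a minimal sketch of what the sending side might build for one event. The URL, the payload shape (`type`, `data`) and the function name `build_webhook_request` are illustrative assumptions, not any provider's actual format. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_webhook_request(url: str, event_type: str, data: dict) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST a sender would make for one event."""
    body = json.dumps({"type": event_type, "data": data}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical client URL and payload, for illustration only.
req = build_webhook_request(
    "https://shop.example.com/webhooks/payments",
    "payment_intent.succeeded",
    {"id": "pi_123", "amount": 4200},
)
print(req.get_method(), req.full_url)
```

With polling, the client would run a loop of GET requests on a timer; here the sender makes one request per event, when the event happens.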
Stripe handles >1B webhook events per month. Each transaction triggers many events: payment_intent.created, payment_intent.processing, payment_intent.succeeded, charge.succeeded. Each event is delivered to every registered endpoint of the merchant.
A core constraint of webhooks: the client must have a publicly reachable HTTPS endpoint. That makes webhooks awkward for local development (localhost is not reachable from the internet). The fix is tunneling tools like ngrok or specialized tooling (Stripe CLI, webhook.site).
An e-commerce shop wants instant updates on order status when the courier service changes its delivery status. Which approach should they pick?
Retry Policies for Webhooks
The client endpoint can be unavailable: a deploy, a transient failure, a timeout. A reliable webhook system does not lose events on those failures - it retries delivery according to a defined policy. The main catch: the receiver's handler must be idempotent, otherwise retries create duplicates.
Stripe retries webhook delivery for up to 3 days on failure - 5 attempts with exponential backoff. GitHub's webhook timeout is 10 seconds: if the receiver does not reply within 10 seconds, the delivery counts as failed. Shopify keeps the history of all webhook delivery attempts for the last 5 days in the admin panel, so the merchant can see the status of each delivery.
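A retry policy with exponential backoff can be sketched as a function that returns the wait before each retry. The base delay, growth factor and cap below are assumptions chosen for illustration, not Stripe's real schedule. Treating every non-2xx response as a failed delivery is also an assumption of this sketch.

```python
def backoff_delays(
    attempts: int,
    base_seconds: float = 60.0,
    factor: float = 4.0,
    cap_seconds: float = 24 * 3600,
) -> list:
    """Wait before each retry: base * factor**n, never longer than the cap."""
    return [min(base_seconds * factor ** n, cap_seconds) for n in range(attempts)]


def should_retry(status_code: int) -> bool:
    """Assumption: anything other than a 2xx response counts as a failed delivery."""
    return not (200 <= status_code < 300)


delays = backoff_delays(5)
print(delays)  # [60.0, 240.0, 960.0, 3840.0, 15360.0]
```

Real senders usually add random jitter to these delays so that many failed deliveries do not all retry at the same instant.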
A webhook endpoint returned 500 Internal Server Error. What should the sending system do?
Webhook Signatures (HMAC)
When the server receives a webhook request, how do you know it came from a trusted sender and not an attacker? HMAC signatures solve this: the sender signs the payload with a secret key, the receiver verifies it with the same key. Without this anyone can send a fake 'payment succeeded' event.
Stripe uses HMAC-SHA256 with a timestamp in the signature: `t=timestamp,v1=signature`. The timestamp lets you reject requests older than 5 minutes - replay attack protection (an attacker intercepts a legitimate webhook and re-sends it). GitHub uses HMAC-SHA256 in the `X-Hub-Signature-256` header.
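The timestamped scheme can be sketched with the standard library alone. This sketch assumes the signed string is `timestamp + "." + raw body`, the digest is hex-encoded, and the tolerance is 5 minutes. The names `sign` and `verify` and the secret are hypothetical. In production, prefer the provider's official SDK.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 5 * 60  # reject anything older than 5 minutes


def _digest(payload: bytes, secret: bytes, timestamp: int) -> str:
    signed = str(timestamp).encode() + b"." + payload
    return hmac.new(secret, signed, hashlib.sha256).hexdigest()


def sign(payload: bytes, secret: bytes, timestamp: int) -> str:
    """Sender side: produce a header value of the form t=...,v1=..."""
    return f"t={timestamp},v1={_digest(payload, secret, timestamp)}"


def verify(payload: bytes, header: str, secret: bytes, now: Optional[float] = None) -> bool:
    """Receiver side: check freshness first, then the signature over the raw bytes."""
    parts = dict(item.split("=", 1) for item in header.split(",") if "=" in item)
    try:
        timestamp = int(parts["t"])
        received = parts["v1"]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if abs(current - timestamp) > TOLERANCE_SECONDS:
        return False  # too old (or too far in the future): possible replay
    expected = _digest(payload, secret, timestamp)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

Because the timestamp is part of the signed string, an attacker cannot refresh it without the secret: changing `t` invalidates `v1`.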
An attacker intercepted a legitimate 'payment.succeeded' webhook and re-sent it 10 minutes later. How should the system reject it?
Webhook Security Best Practices
A webhook endpoint is a public URL accepting data from the internet. Without the right defenses it becomes an attack vector: SSRF, payload injection, DDoS from request floods. Several layers of protection turn the endpoint from a weak point into a dependable interface.
- **Always verify the HMAC signature** before any payload processing - the first and most important check
- **Reply 200 OK quickly** (<5 sec), do the work asynchronously - otherwise the sender will assume delivery failed and start retrying
- **Idempotency by event_id** - store processed IDs and skip duplicates, because retries are guaranteed
- **Rate limit the endpoint** - cap requests per IP/source, DoS protection
- **Check Content-Type** - accept only application/json, reject other formats
- **Do not trust payload data for high-risk actions** - re-fetch the object via the provider's API (for example, retrieve the payment through the Stripe API) and act on that response
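Several of the checks above can be combined into one receiver function. This is a minimal sketch under stated assumptions: signature verification happens elsewhere and arrives as the boolean `signature_valid`, the in-memory `processed_ids` set stands in for a shared store such as Redis, and `work_queue` stands in for a real message queue. The event field `id` and all names here are hypothetical.

```python
import json
import queue

processed_ids = set()        # stand-in for Redis or a DB table with a unique key
work_queue = queue.Queue()   # stand-in for a real message queue


def receive(raw_body: bytes, content_type: str, signature_valid: bool) -> int:
    """Return the HTTP status to send back; heavy work is left to a queue consumer."""
    if not signature_valid:                      # 1. signature before anything else
        return 400
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/json":         # 2. accept JSON only
        return 415
    try:
        event = json.loads(raw_body)
    except ValueError:
        return 400
    event_id = event.get("id") if isinstance(event, dict) else None
    if not event_id:
        return 400
    if event_id in processed_ids:                # 3. duplicate from a retry
        return 200                               #    acknowledge, do nothing
    processed_ids.add(event_id)
    work_queue.put(event)                        # 4. defer DB writes, emails, etc.
    return 200
```

A separate worker would drain `work_queue` and do the slow work. Marking the ID as processed at enqueue time is a simplification: if the worker can fail, the ID should be recorded only after the work succeeds.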
A critical mistake when verifying Stripe webhooks: computing HMAC from JSON.parse + JSON.stringify instead of the raw body. JSON serialization does not guarantee an identical string (key order, whitespace). HMAC is verified against the exact bytes of the raw body, which is why Stripe requires you to pass a raw Buffer rather than a parsed object.
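The same pitfall shows up in any language. The sketch below uses Python's `json.loads` + `json.dumps` as the analogue of JSON.parse + JSON.stringify. The secret and payload are made up for illustration.

```python
import hashlib
import hmac
import json

secret = b"whsec_example"                       # hypothetical signing secret
raw_body = b'{"id":"evt_1","amount":4200}'      # exact bytes as sent by the provider

# Signature the sender computed over the raw bytes.
sig_raw = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()

# Parsing and re-serializing changes whitespace, so the bytes differ.
reserialized = json.dumps(json.loads(raw_body)).encode()
sig_reserialized = hmac.new(secret, reserialized, hashlib.sha256).hexdigest()

print(reserialized)                  # b'{"id": "evt_1", "amount": 4200}'
print(sig_raw == sig_reserialized)   # False
```

The parsed data is identical, but one added space is enough to produce a different digest, so verification must run on the body before any framework parses it.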
Myth: securing a webhook endpoint with HTTPS is enough - encryption guarantees security.
Reality: HTTPS encrypts traffic but does not authenticate the sender. Without HMAC verification, any attacker can POST to the endpoint with a fake payload. HTTPS and HMAC are both required.
HTTPS protects against interception in transit. HMAC protects against sender forgery. These are two different threat models: man-in-the-middle vs impersonation. Both are real and require different mechanisms.
The Stripe webhook endpoint suddenly receives thousands of requests per second. Each one takes 2-3 seconds to process (DB write, email send). What happens?
Summary
- **Webhook = push instead of poll**: the server notifies the client on an event (<100 ms) instead of the client polling every N seconds - cuts load and latency at the same time.
- **Retries + idempotency are inseparable**: most webhook systems retry on error, so the receiver must handle duplicate deliveries without side effects (for example, an event_id check in Redis).
- **HMAC signature is over the raw body**: verifying against a JSON.parse object breaks the signature due to non-deterministic serialization - verify the exact bytes of the body, which is how the Stripe SDK does it.
Related Topics
Webhooks combine several core patterns of reliable systems:
- Webhook Delivery Guarantees — At-least-once delivery, ordering, dead letter queue - the next lesson extends the reliability story
- Idempotency — Receiver idempotency is required, because retries are guaranteed on any transient error
- Message Queues — Correct webhook receiver pattern: quick 200, then processing via a queue - the classic async processing pattern
Questions for Reflection
- Stripe sent a payment webhook. Do you still need to re-fetch the data via the Stripe API if the HMAC signature is valid? In which scenarios is that critical?
- How do you test a webhook integration locally when localhost is not reachable from the internet? Which tools and approaches exist?
- A webhook endpoint receives events from 10 different providers (Stripe, GitHub, Twilio...), each with its own signature format. How do you structure the code to avoid duplicating verification logic?