Node.js Internals
Diagnostics Channel: Tracing and Monitoring
How do you find out why a production server is slowing down? Diagnostics Channel lets you observe what the runtime and instrumented libraries are doing (HTTP requests, DB queries, socket activity) without changing the application code.
- **DataDog APM** traces millions of Node.js applications through the Diagnostics Channel, automatically detecting N+1 queries and slow endpoints.
- **AWS X-Ray** uses channels for distributed tracing in Lambda - links requests through microservices into a single trace.
- **Sentry** subscribes to the `http.client.*` channels for automatic breadcrumb logging - all API calls made before the error are visible.
- **Netflix** uses channels for A/B testing - intercepts requests and randomly changes behavior without a deployment.
Why Diagnostics Channel is needed
**Diagnostics Channel** (Node.js 15+) - a built-in pub/sub system for application instrumentation. Unlike EventEmitter, it is intended not for business logic but as an infrastructure layer for monitoring: tracing HTTP requests, database connections, file operations.
**Key Idea:** Diagnostics Channel allows libraries to publish diagnostic events, and APM tools (DataDog, New Relic) to subscribe to them **without changing the application code**.
Differences from EventEmitter
| Criterion | EventEmitter | Diagnostics Channel |
|---|---|---|
| Purpose | Business logic (user.created, order.paid) | Infrastructure (http.request, db.query) |
| Lifecycle | Created explicitly (new EventEmitter) | Global channels (channel('http.server.request.start')) |
| Performance | Overhead with each emit | Near-zero cost if there are no subscribers |
| Use Case | Connection between modules | Monitoring, tracing, profiling |
Zero-cost abstraction
If no one is subscribed to the channel, `channel.publish()` returns immediately without calling anything. Guarding the call with `channel.hasSubscribers` also avoids building the event object and serializing data, so libraries can add diagnostics with negligible performance impact.
**Important:** Diagnostics Channel does not replace logging (Winston, Pino) - it is a low-level mechanism for APM. Loggers write records to files or streams; channels hand events synchronously to in-process subscribers, which turn them into metrics and traces.
What is the main difference between Diagnostics Channel and EventEmitter?
Creating and Publishing Channels
Channels are created via `diagnostics_channel.channel(name)`. This is a global registry: calls with the same name return the same channel. Publish events through `channel.publish(message)` and subscribe via `channel.subscribe(callback)`; recent Node.js documentation recommends the module-level `diagnostics_channel.subscribe(name, callback)` instead.
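A minimal sketch of the create, subscribe, publish cycle. The channel name `app.cache.lookup` and the message fields are hypothetical, and the module-level `subscribe()` assumes a reasonably recent Node.js release.

```javascript
const dc = require('node:diagnostics_channel');

// Hypothetical channel name, used only for this sketch.
const CHANNEL_NAME = 'app.cache.lookup';

// channel() consults a global registry: the same name yields the same object.
const cacheChannel = dc.channel(CHANNEL_NAME);
const sameChannel = dc.channel(CHANNEL_NAME);

const received = [];

// A monitoring subscriber. It receives the message and the channel name.
function onCacheLookup(message, name) {
  received.push({ name, key: message.key, hit: message.hit });
}
dc.subscribe(CHANNEL_NAME, onCacheLookup);

// publish() is synchronous: subscribers have run by the time it returns.
cacheChannel.publish({ key: 'user:42', hit: true });

console.log(cacheChannel === sameChannel); // true
console.log(received.length);              // 1
```

Because delivery is synchronous, a slow subscriber slows down the publishing code path, so subscribers should do as little work as possible.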
Naming convention
Use the namespace pattern: `module.operation.phase` - for example, `http.server.request.start`, `db.postgres.query.end`. This allows APM tools to filter events by prefixes.
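The module does not appear to offer wildcard subscriptions, so prefix filtering is something the subscriber does itself. The sketch below assumes a known list of channel names (the `cache.redis.get.end` name is hypothetical) and uses the channel name passed as the second callback argument.

```javascript
const dc = require('node:diagnostics_channel');

// Channel names following the module.operation.phase pattern.
const names = [
  'db.postgres.query.start',
  'db.postgres.query.end',
  'cache.redis.get.end', // hypothetical
];

const dbEvents = [];

// One handler for every channel; the second argument is the channel name.
function onEvent(message, name) {
  if (!name.startsWith('db.')) return; // keep only the db.* namespace
  dbEvents.push(name);
}

for (const name of names) dc.subscribe(name, onEvent);

// Simulate the libraries publishing one event on each channel.
for (const name of names) dc.channel(name).publish({});

console.log(dbEvents); // [ 'db.postgres.query.start', 'db.postgres.query.end' ]
```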
Message structure
**Best Practice:** Pass objects with a minimal set of fields - subscribers can enrich the data. Avoid serializing large payloads (req.body), only pass metadata.
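As an illustration of a metadata-only message, the sketch below publishes a few small fields and leaves the body out. The channel name `app.upload.received`, the helper `toUploadMessage` and the field names are all hypothetical.

```javascript
const dc = require('node:diagnostics_channel');

const uploadChannel = dc.channel('app.upload.received'); // hypothetical name

// Build a small message from a request-like object.
function toUploadMessage(req) {
  return {
    method: req.method,
    path: req.url,
    contentLength: Number(req.headers['content-length'] ?? 0),
    // req.body is deliberately left out: subscribers only need metadata.
  };
}

const seen = [];
dc.subscribe('app.upload.received', (message) => seen.push(message));

// A stand-in for an incoming request with a body attached.
const fakeReq = {
  method: 'POST',
  url: '/upload',
  headers: { 'content-length': '1048576' },
  body: Buffer.alloc(16),
};

uploadChannel.publish(toUploadMessage(fakeReq));
```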
Subscriber check
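A sketch of guarding `publish()` with `hasSubscribers`, so the message is not even constructed when nobody listens. The helper names `buildMessage` and `reportQuery` are hypothetical; the counter exists only to make the effect visible.

```javascript
const dc = require('node:diagnostics_channel');

const queryChannel = dc.channel('db.postgres.query.end');

let messagesBuilt = 0;

// Stands in for potentially expensive message construction.
function buildMessage(sql, durationMs) {
  messagesBuilt += 1;
  return { sql, durationMs };
}

function reportQuery(sql, durationMs) {
  // Skip building the message entirely when there are no subscribers.
  if (queryChannel.hasSubscribers) {
    queryChannel.publish(buildMessage(sql, durationMs));
  }
}

reportQuery('SELECT 1', 3); // no subscribers yet: nothing is built
const builtBefore = messagesBuilt;

const onQuery = () => {};
dc.subscribe('db.postgres.query.end', onQuery);
reportQuery('SELECT 2', 5); // one subscriber: the message is built and published
const builtAfter = messagesBuilt;

dc.unsubscribe('db.postgres.query.end', onQuery);

console.log(builtBefore, builtAfter); // 0 1
```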
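**Errors in subscribers:** If a callback throws an exception, it does not interrupt `publish()` - the remaining subscribers will still be called. However, the error is not swallowed: Node.js re-raises it as an uncaught exception on the next tick, so without an `uncaughtException` handler it is printed to `stderr` and the process exits.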
Why check channel.hasSubscribers before publish()?
Subscription to system channels
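Node.js publishes events for built-in modules: `http`, `net`, `dns`, `child_process`. The exact set of channels depends on the Node.js version, so check the documentation for the release you run. By subscribing to them, you can collect metrics without changing the application code.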
System HTTP channels
| Channel | When is it published | Data |
|---|---|---|
| http.client.request.start | Start of an outgoing HTTP request | { request } |
| http.client.response.finish | Response received by the client | { request, response } |
| http.server.request.start | Incoming request to the server | { request, response, socket, server } |
| http.server.response.finish | Response sent to the client | { request, response, socket, server } |
Tracing outgoing requests
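A sketch of timing outgoing requests with the two client channels from the table above. Start times live in a `WeakMap` keyed by the request object, so an abandoned request cannot pin memory. The `completed` array and its record shape are hypothetical; a real subscriber would forward records to a metrics backend.

```javascript
const dc = require('node:diagnostics_channel');

// WeakMap keys do not keep the request alive.
const startedAt = new WeakMap();
const completed = [];

function onRequestStart({ request }) {
  startedAt.set(request, process.hrtime.bigint());
}

function onResponseFinish({ request, response }) {
  const start = startedAt.get(request);
  if (start === undefined) return; // request began before we subscribed
  startedAt.delete(request);
  completed.push({
    method: request.method,
    host: request.host,
    path: request.path,
    status: response.statusCode,
    durationMs: Number(process.hrtime.bigint() - start) / 1e6,
  });
}

dc.subscribe('http.client.request.start', onRequestStart);
dc.subscribe('http.client.response.finish', onResponseFinish);
```

With these subscriptions in place, any `http.get()` or `http.request()` call made anywhere in the process should produce a record, including calls made inside third-party libraries.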
net and dns channels
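A sketch of auditing outgoing TCP connections. It assumes a Node.js release that publishes `{ socket }` on the `net.client.socket` channel when a client socket is created; the `auditLog` array and its record shape are hypothetical. No DNS channel is shown because its name is not given in this section.

```javascript
const dc = require('node:diagnostics_channel');

const auditLog = [];

// Assumption: 'net.client.socket' is published with { socket } whenever
// a TCP or pipe client socket is created.
function onClientSocket({ socket }) {
  // The socket is not connected yet, so wait for 'connect'
  // before reading the peer address.
  socket.once('connect', () => {
    auditLog.push({
      remoteAddress: socket.remoteAddress,
      remotePort: socket.remotePort,
      at: new Date().toISOString(),
    });
  });
}

dc.subscribe('net.client.socket', onClientSocket);
```

Only plain values are stored in the log, not the socket itself, so the audit trail does not keep closed sockets alive.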
**Use Case:** Audit logging - record all outgoing connections for compliance (SOC 2, GDPR). Diagnostics Channel allows you to do this centrally, without changing the module code.
**Beware of leaks:** Do not store references to `request`/`response` in a Map without cleanup - this will lead to a memory leak. Use WeakMap or remove them in finish events.
What advantage does subscribing to http.client.request.start provide instead of wrapping over http.request?
Integration with OpenTelemetry
**OpenTelemetry** - a standard for distributed tracing, metrics, and logs. Diagnostics Channel is one of the mechanisms through which OTel instrumentations collect traces from Node.js applications without changing the application code; other instrumentations patch modules as they are loaded.
How OTel auto-instrumentation works
Context Propagation: linking across services
OpenTelemetry transmits the `traceparent` header across services to link requests into a single trace. The instrumentation that observes outgoing requests adds this header automatically; the channel itself only delivers events to subscribers.
**Traceparent format:** `00-{traceId}-{spanId}-{flags}`. For example: `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`.
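The format can be made concrete with a small sketch that builds, parses and propagates a `traceparent` value. The function names are hypothetical and the validation is simplified (for example, all-zero ids are not rejected); a real service would rely on its tracing library's propagator.

```javascript
const { randomBytes } = require('node:crypto');

// version "00", 16-byte trace id, 8-byte span id, 1-byte flags, all lowercase hex.
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function createTraceparent(sampled = true) {
  const traceId = randomBytes(16).toString('hex');
  const spanId = randomBytes(8).toString('hex');
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(header);
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // The lowest bit of the flags byte marks the trace as sampled.
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// A downstream call keeps the trace id and gets a fresh span id.
function childOf(header) {
  const parent = parseTraceparent(header);
  if (!parent) return createTraceparent();
  const spanId = randomBytes(8).toString('hex');
  return `00-${parent.traceId}-${spanId}-${parent.sampled ? '01' : '00'}`;
}

const parsed = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(parsed.traceId); // 4bf92f3577b34da6a3ce929d0e0e4736
console.log(parsed.sampled); // true
```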
Custom spans and attributes
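The OpenTelemetry API itself is not shown here. As a stand-in that uses only built-in modules, the sketch below models a custom span with attributes through `diagnostics_channel.tracingChannel()`, which is experimental and assumes a recent Node.js release. The channel name `app.report`, the context fields `spanName` and `attributes`, and the `spans` array are hypothetical.

```javascript
const dc = require('node:diagnostics_channel');

// Creates the channel family tracing:app.report:start, :end, :error, ...
const reportTracing = dc.tracingChannel('app.report');

const spans = [];

reportTracing.subscribe({
  start(context) {
    // The same context object is passed to every event of one traced call.
    context.startedAt = process.hrtime.bigint();
  },
  end(context) {
    spans.push({
      name: context.spanName,
      attributes: context.attributes,
      failed: context.error !== undefined,
      durationMs: Number(process.hrtime.bigint() - context.startedAt) / 1e6,
    });
  },
  error() {},      // the thrown error is already stored on context.error
  asyncStart() {}, // unused: the traced function is synchronous
  asyncEnd() {},
});

function buildReport(rows) {
  return reportTracing.traceSync(
    () => rows.reduce((sum, row) => sum + row.amount, 0),
    { spanName: 'report.build', attributes: { rowCount: rows.length } },
  );
}

const total = buildReport([{ amount: 2 }, { amount: 3 }]);
console.log(total); // 5
```

A tracing library could subscribe to the same channels and translate each start/end pair into a real span, which keeps the application code free of vendor-specific calls.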
How does OpenTelemetry instrument a Node.js application without changing the code?
APM Instrumentation: DataDog Integration
**APM (Application Performance Monitoring)** - tools for monitoring production: DataDog, New Relic, Dynatrace. They use Diagnostics Channel for automatic collection of metrics, traces, profiling.
DataDog APM: automatic instrumentation
What DataDog collects via Diagnostics Channel
- **HTTP traces:** incoming and outgoing requests (method, URL, status, latency)
- **Database queries:** auto-instrumentation for pg, mysql, mongodb, redis
- **Message queues:** RabbitMQ, Kafka, SQS
- **External APIs:** fetch, axios, node-fetch
- **DNS lookups and TCP connections**
Custom metrics and tags
Profiling: CPU and Memory
DataDog collects CPU profiles (flame graphs) and heap snapshots alongside the data gathered through the Diagnostics Channel. This helps identify bottlenecks in production with minimal impact on performance.
**Low-overhead profiling:** DataDog uses sampling - it takes a stack trace every 10ms, which results in <1% CPU overhead.
Integration with logs
**Production considerations:**
- Do not trace PII (emails, credit cards) - use filters
- Trace sampling: 100% in dev, 1-10% in production
- Limit spans per trace: 1000 (otherwise the trace will be truncated)
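The three considerations above can be sketched as small helpers. Everything here is hypothetical and none of it is a vendor API: the list of sensitive parameter names, the sampling scheme, and the function names are assumptions made for illustration.

```javascript
// Hypothetical list of query parameters that may carry PII.
const SENSITIVE_PARAMS = new Set(['email', 'token', 'card']);

// Mask sensitive query values before a URL is attached to a span.
function redactUrl(rawUrl) {
  // The base is only needed so relative URLs can be parsed.
  const url = new URL(rawUrl, 'http://placeholder.invalid');
  for (const key of [...url.searchParams.keys()]) {
    if (SENSITIVE_PARAMS.has(key.toLowerCase())) {
      url.searchParams.set(key, 'REDACTED');
    }
  }
  return url.pathname + url.search;
}

// Deterministic sampling: the same trace id always gets the same decision,
// so every service in the request path keeps or drops the trace together.
function shouldSample(traceId, rate) {
  const bucket = parseInt(traceId.slice(0, 8), 16) / 0x100000000; // in [0, 1)
  return bucket < rate;
}

// Keep at most `limit` spans per trace.
function capSpans(spans, limit = 1000) {
  return spans.length > limit ? spans.slice(0, limit) : spans;
}

console.log(redactUrl('/users?email=a%40b.c&page=2')); // /users?email=REDACTED&page=2
```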
**Myth:** Diagnostics Channel is only needed for APM tools, it is useless in a regular application.
**Reality:** Diagnostics Channel is useful for any monitoring: log auditing, rate limiting, A/B testing.
Through channels, you can:
1. Log all outgoing API calls for compliance
2. Track rate limits by IP (subscribe to http.server.request.start)
3. A/B tests: intercept requests and randomly change behavior
4. Audit trail: record who accessed which resource and when
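Tracking request rates by IP can be sketched with a fixed-window counter fed by `http.server.request.start`. The window length, the limit and the data structures are hypothetical. The subscriber only observes; rejecting requests would still be done by the server code, for example by consulting `overLimit`.

```javascript
const dc = require('node:diagnostics_channel');

const WINDOW_MS = 60_000;   // hypothetical window length
const LIMIT = 100;          // hypothetical requests allowed per window
const hitsByIp = new Map(); // ip -> { windowStart, count }
const overLimit = new Set();

function onServerRequest({ socket }) {
  const ip = socket.remoteAddress;
  const now = Date.now();
  let entry = hitsByIp.get(ip);
  // Start a fresh window for new clients or when the old window expired.
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    entry = { windowStart: now, count: 0 };
    hitsByIp.set(ip, entry);
  }
  entry.count += 1;
  if (entry.count > LIMIT) overLimit.add(ip);
}

dc.subscribe('http.server.request.start', onServerRequest);
```

The map is keyed by plain strings and grows with the number of distinct clients, so a long-running process would also need to evict stale entries.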
What advantage does continuous profiling through DataDog provide?
Key Ideas
- **Diagnostics Channel** - pub/sub for monitoring, not for business logic (use EventEmitter for the latter)
- **Near-zero overhead:** if there are no subscribers, publish() does nothing - check channel.hasSubscribers to also skip building the message
- **System channels** (http, net, dns) allow monitoring Node.js without changing the code.
- **OpenTelemetry** auto-instrumentation can consume Diagnostics Channel events - automatic distributed tracing
- **APM tools** (DataDog, New Relic) use channels for collecting metrics, profiling, error tracking
Related topics
Diagnostics Channel - part of the observability ecosystem in Node.js:
- Performance Hooks — Performance measurement (performance.now, PerformanceObserver) complements tracing through channels
- Async Hooks — Low-level API for tracing async context - the foundation for AsyncLocalStorage and OTel context propagation
- Worker Threads — Tracing workers requires passing traceparent through workerData or MessagePort
Questions for reflection
- Which parts of your application need monitoring? HTTP API, DB queries, external integrations?
- Can you replace middleware logging with subscription through Diagnostics Channel?
- If you are using microservices, how do you trace requests between services? Do you use traceparent headers?