Node.js Internals

Diagnostics Channel: Tracing and Monitoring

How do you find out why a production server is slowing down? Diagnostics Channel gives you visibility into HTTP requests, DB queries, and other runtime activity - without changing the application code.

**DataDog APM** traces millions of Node.js applications through the Diagnostics Channel, automatically detecting N+1 queries and slow endpoints.
**AWS X-Ray** uses channels for distributed tracing in Lambda - links requests through microservices into a single trace.
**Sentry** subscribes to http.client.request for automatic breadcrumb logging - all API calls before the error are visible.
**Netflix** uses channels for A/B testing - intercepts requests and randomly changes behavior without deployment.

Why is the Diagnostics Channel needed

**Diagnostics Channel** (the `diagnostics_channel` module, Node.js 15+) is a built-in pub/sub system for application instrumentation. Unlike EventEmitter, it is not meant for business logic; it is an infrastructure layer for monitoring: tracing HTTP requests, database connections, file operations.

**Key Idea:** Diagnostics Channel allows libraries to publish diagnostic events, and APM tools (DataDog, New Relic) to subscribe to them **without changing the application code**.

Differences from EventEmitter

| Criterion | EventEmitter | Diagnostics Channel |
| --- | --- | --- |
| Purpose | Business logic (user.created, order.paid) | Infrastructure (http.request, db.query) |
| Lifecycle | Created explicitly (new EventEmitter) | Global channels (channel('http.server.request')) |
| Performance | Overhead with each emit | Zero-cost if there are no subscribers |
| Use Case | Connection between modules | Monitoring, tracing, profiling |

Zero-cost abstraction

If no one is subscribed to the channel, `channel.publish()` returns without doing any work. To also avoid the cost of building the message (object creation, data serialization), check `channel.hasSubscribers` before constructing it. This allows libraries to add diagnostics without a measurable performance impact.
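A minimal sketch of this guard, assuming a made-up channel name (`myapp.db.query.start`) and a stand-in `runQuery` function that are not part of the lesson. The counter only exists to make it visible that the message object is never built while nobody is subscribed.

```javascript
const dc = require('node:diagnostics_channel');

// Hypothetical channel name used only for this illustration.
const queryChannel = dc.channel('myapp.db.query.start');

let payloadsBuilt = 0;

function runQuery(sql) {
  // Build the message only when someone is listening.
  if (queryChannel.hasSubscribers) {
    payloadsBuilt += 1;
    queryChannel.publish({ sql, startedAt: Date.now() });
  }
  return `executed: ${sql}`; // stand-in for real database work
}

runQuery('SELECT 1'); // no subscribers: no message object is created

const seen = [];
const onQuery = (message) => seen.push(message.sql);
dc.subscribe('myapp.db.query.start', onQuery);

runQuery('SELECT 2'); // one subscriber: the message is built and delivered

dc.unsubscribe('myapp.db.query.start', onQuery);
```

The guard matters most when the message is expensive to assemble; for a small literal object the difference is usually negligible.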

**Important:** Diagnostics Channel does not replace logging (Winston, Pino) - it is a low-level mechanism for instrumentation. Loggers write records to files or streams, while channels hand events to in-process subscribers synchronously, at the moment they happen.

What is the main difference between Diagnostics Channel and EventEmitter?

Creating and Publishing Channels

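Channels are created via `diagnostics_channel.channel(name)`. This is a global registry: calls with the same name return the same channel. Publish events with `channel.publish(message)` and subscribe with `channel.subscribe(callback)`; recent Node.js versions recommend `diagnostics_channel.subscribe(name, callback)` instead. Subscribers are called synchronously during `publish()`.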
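The basic cycle might look like the sketch below. The channel name `myapp.cache.miss` and the message shape are assumptions made up for the example; the calls themselves (`channel`, `subscribe`, `unsubscribe`, `publish`) are the standard `diagnostics_channel` API.

```javascript
const dc = require('node:diagnostics_channel');

// 'myapp.cache.miss' is a made-up name for illustration.
const a = dc.channel('myapp.cache.miss');
const b = dc.channel('myapp.cache.miss');
const sameChannel = a === b; // same name -> same channel object

const received = [];

// Subscribers get the published message and the channel name.
function onMiss(message, name) {
  received.push({ name, key: message.key });
}

dc.subscribe('myapp.cache.miss', onMiss);
a.publish({ key: 'user:42' }); // delivered synchronously to onMiss

dc.unsubscribe('myapp.cache.miss', onMiss);
a.publish({ key: 'user:43' }); // nobody is listening any more
```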

Naming convention

Use the namespace pattern: `module.operation.phase` - for example, `http.server.request.start`, `db.postgres.query.end`. This allows APM tools to filter events by prefixes.

Message structure

**Best Practice:** Pass objects with a minimal set of fields - subscribers can enrich the data. Avoid serializing large payloads (req.body), only pass metadata.
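As an illustration of a metadata-only message, the sketch below publishes the method, path, and body size rather than the body itself. The channel name, the `describeUpload` helper, and the shape of `req` are hypothetical and chosen only for this example.

```javascript
const dc = require('node:diagnostics_channel');

// Hypothetical channel following the module.operation.phase pattern.
const uploadStart = dc.channel('myapp.upload.request.start');

// Pick a few cheap fields instead of handing the whole body to subscribers.
function describeUpload(req) {
  return {
    method: req.method,
    path: req.path,
    bodyBytes: Buffer.byteLength(req.body),
  };
}

function handleUpload(req) {
  if (uploadStart.hasSubscribers) {
    uploadStart.publish(describeUpload(req));
  }
}

// Usage: a subscriber sees only the metadata.
const messages = [];
const record = (message) => messages.push(message);
dc.subscribe('myapp.upload.request.start', record);
handleUpload({ method: 'POST', path: '/files', body: 'x'.repeat(1024) });
dc.unsubscribe('myapp.upload.request.start', record);
```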

Subscriber check

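**Errors in subscribers:** If a callback throws an exception, it does not interrupt `publish()` - the remaining subscribers will still be called. The error is not swallowed, though: Node.js rethrows it on the next tick as an uncaught exception, which by default prints to `stderr` and terminates the process.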
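Since a throwing subscriber is still reported by the runtime, monitoring code often guards its own callbacks. The following is one possible sketch; the `guarded` helper and the `myapp.job.done` channel are hypothetical names, not part of Node.js.

```javascript
const dc = require('node:diagnostics_channel');

const failures = [];

// Hypothetical helper: wraps a subscriber so its exceptions are recorded
// locally instead of escaping into publish().
function guarded(fn) {
  return (message, name) => {
    try {
      fn(message, name);
    } catch (err) {
      failures.push(`${name}: ${err.message}`);
    }
  };
}

const delivered = [];
const flaky = guarded(() => {
  throw new Error('boom');
});
const healthy = guarded((message) => delivered.push(message.id));

const jobDone = dc.channel('myapp.job.done');
dc.subscribe('myapp.job.done', flaky);
dc.subscribe('myapp.job.done', healthy);

jobDone.publish({ id: 7 }); // flaky fails, healthy still runs

dc.unsubscribe('myapp.job.done', flaky);
dc.unsubscribe('myapp.job.done', healthy);
```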

Why check channel.hasSubscribers before publish()?

Subscription to system channels

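Node.js publishes events from built-in modules such as `http`, `net`, `dns`, and `child_process`. Which channels exist depends on the Node.js version, so check the `diagnostics_channel` documentation for your release. By subscribing to them, you can collect metrics without changing the application code.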

System HTTP channels

| Channel | When it is published | Data |
| --- | --- | --- |
| http.client.request.start | Start of an outgoing HTTP request | { request } |
| http.client.response.finish | Full response received | { request, response } |
| http.server.request.start | Incoming request to the server | { request, response, socket, server } |
| http.server.response.finish | Response sent to the client | { request, response, socket, server } |

Tracing outgoing requests
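A minimal sketch of timing outgoing requests with the two client channels from the table above. The `timings` array and the field names in each record are assumptions for illustration; a real tracer would export the data instead of keeping it in memory. Start times are stored in a `WeakMap` keyed by the request object, so an entry cannot outlive its request.

```javascript
const dc = require('node:diagnostics_channel');

// WeakMap keyed by the request object: entries vanish with the request.
const startTimes = new WeakMap();
const timings = [];

function onRequestStart({ request }) {
  startTimes.set(request, process.hrtime.bigint());
}

function onResponseFinish({ request, response }) {
  const startedAt = startTimes.get(request);
  if (startedAt === undefined) return; // request began before we subscribed
  startTimes.delete(request);
  const durationMs = Number(process.hrtime.bigint() - startedAt) / 1e6;
  timings.push({
    method: request.method,
    host: request.host,
    path: request.path,
    status: response.statusCode,
    durationMs,
  });
}

dc.subscribe('http.client.request.start', onRequestStart);
dc.subscribe('http.client.response.finish', onResponseFinish);
```

With these subscribers registered once at startup, any call to `http.request` or `http.get` elsewhere in the process is measured, including calls made inside third-party libraries.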

net and dns channels

**Use Case:** Audit logging - record all outgoing connections for compliance (SOC 2, GDPR). Diagnostics Channel allows you to do this centrally, without changing the module code.
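An audit log of outgoing TCP connections could be sketched as follows. It assumes the built-in `net.client.socket` channel, which is available in recent Node.js versions and carries `{ socket }`; the `auditLog` array and its record shape are made up for the example.

```javascript
const dc = require('node:diagnostics_channel');

const auditLog = [];

// The channel fires when a client socket is created. At that point the socket
// is not connected yet, so wait for 'connect' before reading the remote address.
function onClientSocket({ socket }) {
  socket.once('connect', () => {
    auditLog.push({
      at: new Date().toISOString(),
      remoteAddress: socket.remoteAddress,
      remotePort: socket.remotePort,
    });
  });
}

dc.subscribe('net.client.socket', onClientSocket);
```

Because the subscriber holds no reference to the socket after `connect` fires, it does not keep closed sockets alive.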

**Beware of leaks:** Do not store references to `request`/`response` in a Map without cleanup - this will lead to a memory leak. Use WeakMap or remove them in finish events.

What advantage does subscribing to http.client.request.start provide instead of wrapping over http.request?

Integration with OpenTelemetry

**OpenTelemetry** is a standard for distributed tracing, metrics, and logs. Diagnostics Channel is one of the mechanisms (alongside module patching) through which OTel instrumentations collect traces from Node.js applications without changing the application code.

How OTel auto-instrumentation works

Context Propagation: linking across services

OpenTelemetry transmits the `traceparent` header across services to link requests into a single trace. The channel itself does not modify requests; the instrumentation that hooks outgoing requests adds this header.

**Traceparent format:** `00-{traceId}-{spanId}-{flags}`. For example: `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`.
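To make the format concrete, here is a small sketch that parses a `traceparent` value and derives the header for a child span. The function names are hypothetical; real applications would normally rely on the OpenTelemetry propagator instead of hand-written parsing.

```javascript
const { randomBytes } = require('node:crypto');

// version(2 hex) - traceId(32 hex) - spanId(16 hex) - flags(2 hex)
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(header);
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // All-zero ids are invalid in the W3C Trace Context format.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  // The lowest bit of the flags byte is the "sampled" flag.
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// A child span keeps the trace id and gets a fresh 8-byte span id.
function childTraceparent(parent) {
  const spanId = randomBytes(8).toString('hex');
  const flags = parent.sampled ? '01' : '00';
  return `00-${parent.traceId}-${spanId}-${flags}`;
}

const incoming = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
const outgoing = childTraceparent(incoming);
```

Keeping the trace id unchanged while replacing the span id is what lets a backend stitch the spans from different services into one trace.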

Custom spans and attributes

How does OpenTelemetry instrument a Node.js application without changing the code?

APM Instrumentation: DataDog Integration

**APM (Application Performance Monitoring)** - tools for monitoring production: DataDog, New Relic, Dynatrace. They use Diagnostics Channel for automatic collection of metrics, traces, profiling.

DataDog APM: automatic instrumentation

What DataDog collects via Diagnostics Channel

**HTTP traces:** incoming and outgoing requests (method, URL, status, latency)
**Database queries:** auto-instrumentation for pg, mysql, mongodb, redis
**Message queues:** RabbitMQ, Kafka, SQS
**External APIs:** fetch, axios, node-fetch
**DNS lookups and TCP connections**

Custom metrics and tags

Profiling: CPU and Memory

Alongside the traces gathered through channels, DataDog's continuous profiler collects CPU profiles (flamegraphs) and heap profiles by sampling. This helps identify bottlenecks in production with little impact on performance.

**Low-overhead profiling:** DataDog uses sampling - it takes a stack trace about every 10 ms, which keeps CPU overhead low (on the order of <1%).

Integration with logs

**Production considerations:**
- Do not trace PII (emails, credit cards) - use filters
- Trace sampling: 100% in dev, 1-10% in production
- Limit spans per trace: 1000 (otherwise the trace will be truncated)
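The sampling rates above can be applied with a deterministic decision based on the trace id, so every service that sees the same trace makes the same choice. This is a sketch under that assumption, not how any particular APM agent implements it; `shouldSample` is a hypothetical name.

```javascript
// Decide whether to keep a trace, given its 32-hex-character trace id and a
// sampling rate between 0 and 1.
function shouldSample(traceId, rate) {
  if (rate >= 1) return true;
  if (rate <= 0) return false;
  // Map the last 8 hex digits (32 bits) to a number in [0, 1).
  const bucket = parseInt(traceId.slice(-8), 16) / 0x100000000;
  return bucket < rate;
}

const devDecision = shouldSample('4bf92f3577b34da6a3ce929d0e0e4736', 1);
const prodLow = shouldSample('4bf92f3577b34da6a3ce929d00000000', 0.1);
const prodHigh = shouldSample('4bf92f3577b34da6a3ce929dffffffff', 0.1);
```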

**Misconception:** Diagnostics Channel is only needed for APM tools; it is useless in a regular application.

**In fact:** Diagnostics Channel is useful for any monitoring: audit logging, rate limiting, A/B testing.

Through channels, you can:
1. Log all outgoing API calls for compliance
2. Track rate limits by IP (subscribe to `http.server.request.start`)
3. A/B tests: intercept requests and randomly change behavior
4. Audit trail: record who accessed which resource and when
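Tracking request rates by IP, for instance, might be sketched like this. The window size, the limit, and the `hits`/`overLimit` structures are assumptions for illustration. The subscriber only observes traffic; actually rejecting a request would still happen in the request handler, which can consult `overLimit`.

```javascript
const dc = require('node:diagnostics_channel');

const WINDOW_MS = 60_000; // assumed window: one minute
const LIMIT = 100;        // assumed limit: 100 requests per window

const hits = new Map();   // ip -> { windowStart, count }; prune it in real code
const overLimit = new Set();

function onServerRequest({ socket }) {
  const ip = socket.remoteAddress;
  const now = Date.now();
  let entry = hits.get(ip);
  // Start a new window for unknown IPs or when the old window has expired.
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    entry = { windowStart: now, count: 0 };
    hits.set(ip, entry);
    overLimit.delete(ip);
  }
  entry.count += 1;
  if (entry.count > LIMIT) overLimit.add(ip);
}

dc.subscribe('http.server.request.start', onServerRequest);
```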

What advantage does continuous profiling through DataDog provide?

Key Ideas

**Diagnostics Channel** - pub/sub for monitoring, not for business logic (use EventEmitter for the latter)
**Near-zero overhead:** if there are no subscribers, publish() does nothing - check channel.hasSubscribers to also skip building the message
**System channels** (http, net, dns) allow monitoring Node.js without changing the code
**OpenTelemetry** auto-instrumentation can build on the Diagnostics Channel - automatic distributed tracing
**APM tools** (DataDog, New Relic) use channels for collecting metrics, traces, error tracking

Questions for Reflection

Which parts of your application need monitoring? HTTP API, DB queries, external integrations?
Can you replace middleware logging with subscription through Diagnostics Channel?
If you are using microservices, how do you trace requests between services? Do you use traceparent headers?

Related Lessons

arch-09-cache
