Node.js Internals
Event Loop: The Heart of Node.js
Why do companies such as Netflix, LinkedIn, and PayPal put Node.js in front of millions of users? How can a single process keep tens of thousands of connections open with minimal latency? The secret is not in the power of the servers, but in understanding the Event Loop: the mechanism that makes Node.js one of the most efficient solutions for I/O-intensive applications.
- **LinkedIn** migrated from Ruby on Rails to Node.js and reduced the number of servers from 30 to 3 while handling the same traffic. Reason: Event Loop efficiently uses a single thread instead of creating a new thread for each connection.
- **PayPal** after switching to Node.js doubled requests/sec while reducing response time by 35%. The Event Loop allowed handling API calls to banks in parallel, instead of sequentially waiting for each response.
- **Walmart** processes 500 million pageviews/month on Node.js, saving millions on infrastructure. The Event Loop allows a single server to maintain hundreds of thousands of WebSocket connections for real-time cart updates.
Loop Overview
Consider a waiter in a busy restaurant. Instead of standing at each table waiting for the customer to decide, the waiter moves continuously: taking an order here, delivering a dish there, picking up a check elsewhere. One person, dozens of tables served simultaneously. This is exactly how Node.js works.
**Event Loop** is a loop that checks task queues and executes their callbacks one by one for as long as there is pending work. Node.js runs JavaScript on a single thread, but thanks to the asynchronous model, it handles thousands of connections concurrently. The secret is that input-output (I/O) operations are performed in the background by the operating system or the libuv library, and JavaScript code is called only when the result is ready.
**Why is Node.js faster than traditional multithreaded servers for I/O-intensive tasks?** In the classic thread-per-connection model (for example, Apache or older Tomcat configurations), each connection occupies its own thread. 1000 connections = 1000 threads = up to gigabytes of memory for stacks + expensive context switches. Node.js uses a single thread for JS code and delegates I/O to the operating system. Result: a single server can keep tens or even hundreds of thousands of mostly idle WebSocket connections open.
Real example: API server processes 1000 requests
**Multithreaded Server (Apache):**
- 1000 requests = 1000 threads
- Each thread: ~1MB stack = ~1GB memory
- Context switch between threads: thousands of switches/sec
- CPU spends time managing threads rather than doing useful work

**Node.js:**
- 1000 requests = 1 JavaScript thread + libuv thread pool (4 threads by default, used for file system, DNS lookup, crypto, and compression work; network sockets do not go through it)
- Memory: ~50MB for JS heap + small overhead for callbacks
- While one request waits for the database, the Event Loop handles others
- CPU is busy only when there is real work (executing JS code)

That's why Node.js has become a popular choice for microservices and real-time applications.
**Main rule:** Never block the Event Loop! Any synchronous operation that lasts more than a few milliseconds (complex calculations, synchronous reading of large files, `JSON.parse()` on megabytes of data) will freeze the entire server. For CPU-intensive tasks, use Worker Threads or offload them to separate microservices.
An API server on Node.js handles requests to a database. The average request takes: 5ms CPU + 45ms waiting for a database response. How many requests per second can one process theoretically handle?
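One way to reason about the question above is a back-of-the-envelope calculation. The sketch below assumes that the 45ms of database waiting overlaps perfectly with other requests, so only the 5ms of CPU time occupies the JavaScript thread; it ignores garbage collection, connection limits, and database capacity. The function names are hypothetical.

```typescript
// Upper bound for one process when waiting time is fully overlapped:
// only CPU time keeps the single JavaScript thread busy.
function maxRequestsPerSecond(cpuMsPerRequest: number): number {
  return 1000 / cpuMsPerRequest;
}

// For comparison: throughput if requests were handled strictly one
// after another, with the thread idle during every wait.
function sequentialRequestsPerSecond(cpuMs: number, waitMs: number): number {
  return 1000 / (cpuMs + waitMs);
}

const overlapped = maxRequestsPerSecond(5);            // 1000 / 5
const sequential = sequentialRequestsPerSecond(5, 45); // 1000 / 50
console.log({ overlapped, sequential });
```

Under these assumptions the theoretical ceiling is 200 requests per second, ten times the 20 requests per second of a strictly sequential handler.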
Loop Phases
Event Loop is not just `while(true)`. It is a strictly ordered cycle of **6 phases**, each of which processes its own type of tasks. Understanding the phases is critical for debugging: why `setImmediate()` sometimes executes before `setTimeout(0)`, why `process.nextTick()` can freeze the server, how the poll phase works, which takes up most of the time.
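After **each phase** (and, since Node.js 11, after each individual callback within a phase), **microtasks** are executed: first the entire `process.nextTick()` queue, then the Promise microtask queue (`Promise.then()` / `queueMicrotask()`).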
**Details of each phase:**

**1. Timers** - executes `setTimeout()` and `setInterval()` callbacks whose timers have expired. Important: timers do not guarantee exact execution time. `setTimeout(fn, 100)` means "execute no earlier than after 100ms," but it may be later if the Event Loop is busy.

**2. Pending callbacks** - executes I/O callbacks deferred from the previous cycle (e.g., TCP errors).

**3. Idle, prepare** - an internal libuv phase used for preparation for poll.

**4. Poll** - THE MOST IMPORTANT phase. Here the Event Loop receives new I/O events (incoming HTTP requests, database responses, data from sockets) and executes their callbacks. If the queue is empty, the Event Loop **blocks** here and waits for new events (but not longer than the nearest timer).

**5. Check** - executes `setImmediate()` callbacks. This phase exists to allow code execution immediately after the poll phase.

**6. Close callbacks** - executes connection closure callbacks (`socket.on('close')`, `server.close()`).
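The phase order can be observed directly. The following is a minimal sketch, not a benchmark: it schedules a zero-delay timer and a `setImmediate()` from inside an I/O callback. Because that callback runs in the poll phase and the check phase comes next, the immediate is expected to run before the timer. The names `order` and `phaseOrder` are illustrative.

```typescript
import { stat } from "node:fs";

const order: string[] = [];

// Resolves with the observed order once both callbacks have run.
const phaseOrder = new Promise<string[]>((resolve) => {
  // The stat() callback is an I/O callback, so it runs in the poll phase.
  stat(".", () => {
    setTimeout(() => {
      order.push("timeout");
      resolve(order);
    }, 0);
    // The check phase follows poll, so this runs before the timer above.
    setImmediate(() => order.push("immediate"));
  });
});

phaseOrder.then((result) => console.log(result.join(" -> ")));
```

When the same two calls are made from the main module instead of an I/O callback, the order is not deterministic, which is why the experiment is wrapped in `stat()`.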
Real Case: Slow Timers
You set `setInterval(() => sendMetrics(), 5000)` to send metrics every 5 seconds. But in production, metrics arrive every 10-15 seconds. Why?

**Reason:** The Event Loop is blocked by a CPU-intensive task. For example, `JSON.stringify()` on a large object takes 8 seconds. During this time:
- The timer expired after 5 seconds
- But the Event Loop is stuck in another callback (JSON serialization)
- Only after 8 seconds will the Event Loop reach the timers phase
- The timer will execute with a 3-second delay

**Solution:** Break heavy operations into chunks or use Worker Threads.
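The effect is easy to reproduce on a small scale. This sketch uses a busy-wait loop as a stand-in for the heavy operation; the durations (50ms requested, 200ms blocked) and the helper `blockFor` are assumptions chosen only to keep the demonstration short.

```typescript
// Blocks the JavaScript thread for roughly `ms` milliseconds (busy wait).
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else can run on this thread in the meantime
  }
}

const requestedMs = 50;
const blockedMs = 200;
const scheduledAt = Date.now();

// Resolves with the real time between scheduling and execution.
const actualDelay = new Promise<number>((resolve) => {
  setTimeout(() => resolve(Date.now() - scheduledAt), requestedMs);
});

blockFor(blockedMs); // the timer expires during this call but cannot fire

actualDelay.then((ms) => {
  console.log(`requested ${requestedMs} ms, fired after ${ms} ms`);
});
```

The callback cannot run before the blocking call returns, so the measured delay is at least as long as the blocked interval, not the requested one.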
**Danger of blocking operations:** A single `fs.readFileSync()` on a 100MB file will block the Event Loop for seconds. During this time:
- All new HTTP requests are queued by the OS (or receive ECONNREFUSED)
- All timers execute with a delay
- WebSocket connections may time out

In production, this means complete downtime of the service. ALWAYS use asynchronous versions: `fs.readFile()`, `crypto.pbkdf2()`, etc.
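As an illustration of the asynchronous variant, here is a minimal sketch of password hashing with `crypto.pbkdf2()` wrapped in a Promise. The helper name `hashPassword` and the parameters (100,000 iterations, 32-byte key, SHA-256) are illustrative assumptions, not security recommendations.

```typescript
import { pbkdf2, randomBytes } from "node:crypto";
import { promisify } from "node:util";

const pbkdf2Async = promisify(pbkdf2);

// Derives the key on the libuv thread pool; the Event Loop stays free
// to run other callbacks while the hash is being computed.
async function hashPassword(password: string, salt: Buffer): Promise<string> {
  // Iteration count, key length, and digest are example values only.
  const key = await pbkdf2Async(password, salt, 100_000, 32, "sha256");
  return key.toString("hex");
}

const salt = randomBytes(16);
const hashed = hashPassword("correct horse battery staple", salt);
hashed.then((hex) => console.log(`derived ${hex.length / 2} bytes`));
```

The work itself is not cheaper than with `pbkdf2Sync()`; the difference is that it runs off the JavaScript thread, so other requests keep being served.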
An HTTP server is created. In each request handler, `crypto.pbkdf2Sync()` (a CPU-intensive password hash) runs and takes 500ms. The server receives 10 requests simultaneously. How many seconds does it take to process the last request?
Microtasks
Microtasks are a special queue that executes **between phases of the Event Loop** (and even between individual callbacks within a phase). In Node.js, there are two types of microtasks with different priorities: **`process.nextTick()`** (highest priority) and **Promise microtasks** (`Promise.then()`, `queueMicrotask()`).
Picture the Event Loop as a mailman visiting houses (phases) in sequence. Microtasks are urgent letters that **must** be delivered before moving to the next house. `process.nextTick()` letters are marked "open immediately" - they are processed BEFORE regular microtasks.
**Critical difference from macrotasks:** setTimeout, setImmediate, I/O callbacks are macrotasks. They are executed in their phases of the Event Loop. Microtasks are executed **between** phases and have priority. Even if 100 setTimeouts are waiting in the timers phase, ALL microtasks will be executed first.
**DANGER: process.nextTick() can freeze the Event Loop!** If each nextTick callback creates a new nextTick, an infinite chain is formed. The Event Loop will never reach the next phase because the nextTick queue is constantly being replenished. This is called **nextTick starvation**.
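Starvation can be shown safely with a bounded chain. In this sketch a `setImmediate()` is scheduled first, yet it has to wait until a chain of 1000 self-rescheduling `nextTick` callbacks has drained. The chain length and the names `tick` and `starved` are arbitrary; an unbounded chain would keep the immediate waiting forever.

```typescript
const events: string[] = [];
const chainLength = 1000; // bounded on purpose; an unbounded chain never ends

// Scheduled first, but it must wait for the check phase.
const starved = new Promise<string[]>((resolve) => {
  setImmediate(() => {
    events.push("immediate");
    resolve(events);
  });
});

// Each callback re-queues itself, so the nextTick queue is never empty
// until the counter runs out, and the loop cannot move to the next phase.
function tick(remaining: number): void {
  if (remaining === 0) {
    events.push("chain finished");
    return;
  }
  process.nextTick(tick, remaining - 1);
}
process.nextTick(tick, chainLength);

starved.then((log) => console.log(log.join(" -> ")));
```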
Real bug: race condition due to a deferred connection
```typescript
import { EventEmitter } from 'node:events';

class Database extends EventEmitter {
  private connected = false;

  // Solution 1 is built in: connect() returns a Promise
  connect(): Promise<void> {
    return new Promise<void>(resolve => {
      // Emulation of async connection
      setImmediate(() => {
        this.connected = true;
        this.emit('ready');
        resolve();
      });
    });
  }

  query(sql: string): void {
    if (!this.connected) throw new Error('Not connected!');
    // ...
  }
}

// Problem: query() executes synchronously,
// while connect() will complete only in a later Event Loop phase.
const broken = new Database();
broken.connect();
try {
  broken.query('SELECT * FROM users'); // ERROR!
} catch (err) {
  console.error((err as Error).message);
}

// Solution 1: Promise-based API
const db = new Database();
db.connect().then(() => {
  db.query('SELECT * FROM users'); // OK
});

// Solution 2: callback on the 'ready' event
const db2 = new Database();
db2.once('ready', () => {
  db2.query('SELECT * FROM users'); // OK
});
db2.connect();
```

This is a classic example of why it's important to understand asynchrony at the Event Loop level, not just "async/await magic".
**When to use each type:**
- **`process.nextTick()`** - for critical logic that must be executed BEFORE any I/O operations. For example, emitting an 'error' event before the function completes. Use VERY cautiously!
- **`Promise.then()` / `queueMicrotask()`** - the standard way for asynchronous logic. More predictable, less risk of starvation.
- **`setImmediate()`** - for deferring work to the next Event Loop cycle. Ideal for breaking heavy tasks into chunks.
- **`setTimeout(fn, 0)`** - similar to setImmediate, but with a guarantee of "not earlier than in 1ms". Almost never needed in Node.js (there is setImmediate).
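The `setImmediate()` chunking pattern mentioned in the list can be sketched as follows. The function `sumInChunks` and the chunk size of 10,000 are hypothetical; a suitable chunk size depends on how expensive each item is to process.

```typescript
// Sums a large array in slices, yielding to the Event Loop between slices
// so that pending I/O callbacks and timers get a chance to run.
function sumInChunks(values: number[], chunkSize = 10_000): Promise<number> {
  return new Promise<number>((resolve) => {
    let index = 0;
    let total = 0;

    function processChunk(): void {
      const end = Math.min(index + chunkSize, values.length);
      for (; index < end; index++) {
        total += values[index];
      }
      if (index < values.length) {
        setImmediate(processChunk); // continue on the next loop iteration
      } else {
        resolve(total);
      }
    }

    processChunk();
  });
}

const numbers = Array.from({ length: 100_000 }, (_, i) => i + 1);
const chunkedTotal = sumInChunks(numbers);
chunkedTotal.then((total) => console.log(total));
```

Using `setImmediate()` here rather than `process.nextTick()` matters: each continuation waits for the check phase, so the poll phase can serve I/O between chunks.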
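Given this code:

```typescript
setTimeout(() => console.log('A'), 0);

Promise.resolve().then(() => {
  console.log('B');
  process.nextTick(() => console.log('C'));
});

process.nextTick(() => console.log('D'));
```

What is the order of output when this runs as a CommonJS script? (In an ES module the top-level code itself runs inside the microtask queue, so the relative order of `nextTick` and Promise callbacks differs.)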
Poll Phase
**Poll phase** is the heart of the Event Loop, the place where all the magic of Node.js asynchrony happens. It is here that the Event Loop receives new events from the operating system: incoming HTTP requests, responses from the database, data from files, socket events. The Poll phase is the only phase where the Event Loop can **block** and wait for new events.
Picture a waiter who has visited all tables (completed all Event Loop phases) and now stands at the entrance waiting for new customers. The wait is not indefinite - if an order is being prepared in the kitchen (a pending timer exists), the waiter will check the kitchen (return to the timers phase). The Poll phase works the same way: it blocks and waits for I/O events, but no longer than the nearest timer.
**How the poll phase works under the hood:** Node.js uses system calls like `epoll` (Linux), `kqueue` (macOS/BSD), `IOCP` (Windows) - these are OS kernel mechanisms for efficiently monitoring multiple file descriptors. Instead of polling each socket in a loop, the OS notifies Node.js when data appears on a descriptor. This operates at the kernel level without creating threads.
Real Case: Why the Server "Sleeps" When Idle
You launched an Express server and are looking at htop - the Node.js process shows 0% CPU. This is not a bug, it's a **feature**!

**What's happening:**
1. The server has processed all requests
2. The Event Loop reached the poll phase
3. The poll queue is empty, no pending timers
4. The Event Loop called `epoll_wait()` with timeout = ∞
5. The OS put the process into SLEEP state
6. The process does not consume CPU until an event arrives

**A new HTTP request arrives:**
1. The TCP packet hits the network card
2. The Linux kernel processes the TCP handshake
3. Data goes into the socket buffer
4. `epoll_wait()` returns control with event information
5. Node.js wakes up and processes the request
6. All this takes microseconds

That's why Node.js can handle thousands of connections with minimal resource consumption - most of the time it just sleeps, waiting for events from the OS.
**Danger: CPU-intensive task in I/O callback blocks poll phase**

```typescript
import { createServer } from 'node:http';

const server = createServer();

server.on('request', (req, res) => {
  // Parsing a huge JSON (hugeString is a large payload defined elsewhere)
  const data = JSON.parse(hugeString); // 2 seconds
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify({ ok: true })); // plain http responses have no res.json()
});
```

While the first request is parsing JSON:
- The Event Loop is stuck in the poll phase callback
- New HTTP requests accumulate in the OS queue
- Other I/O events are not processed
- The server appears "frozen" to new clients

**Solution:** Break into chunks or use Worker Threads for heavy operations.
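A minimal sketch of the Worker Threads approach: the JSON string is parsed in a worker and only a small summary is sent back. To keep the example in one file, the worker body is passed as a JavaScript string with `eval: true`; a real project would more likely keep it in its own module. The names `workerSource` and `countItemsInWorker` are hypothetical.

```typescript
import { Worker } from "node:worker_threads";

// Worker body as plain JavaScript; it runs on a separate thread.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const parsed = JSON.parse(workerData);
  // Send back only a small summary to keep the message cheap to copy.
  parentPort.postMessage(Array.isArray(parsed) ? parsed.length : 1);
`;

// Parses `json` off the main thread and resolves with the item count.
function countItemsInWorker(json: string): Promise<number> {
  return new Promise<number>((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: json });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

const hugeString = JSON.stringify(
  Array.from({ length: 50_000 }, (_, i) => ({ id: i }))
);
const itemCount = countItemsInWorker(hugeString);
itemCount.then((n) => console.log(`parsed ${n} items in a worker`));
```

Passing data to and from a worker has a copying cost of its own, so this pays off only when the parsing time clearly exceeds the transfer time.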
**Optimization of the poll phase for high-load applications:**

1. **UV_THREADPOOL_SIZE** - size of the libuv thread pool (default is 4). Increase it to the number of CPU cores for fs/crypto operations:

   ```bash
   UV_THREADPOOL_SIZE=16 node server.js
   ```

2. **Use streams** instead of buffering the entire file in memory:

   ```typescript
   fs.createReadStream('huge.json')
     .pipe(parser)
     .pipe(res);
   ```

   The Event Loop will process chunks without blocking on the entire file.

3. **Worker Threads** for CPU-intensive tasks - offload parsing, cryptography, compression to separate threads:

   ```typescript
   const { Worker } = require('worker_threads');
   const worker = new Worker('./heavy-task.js', { workerData: data });
   ```

4. **Graceful degradation** - if the Event Loop lag exceeds the threshold, reject new requests with 503:

   ```typescript
   const toobusy = require('toobusy-js');

   app.use((req, res, next) => {
     if (toobusy()) return res.status(503).send('Server too busy');
     next();
   });
   ```
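Event Loop lag can also be measured without a third-party package. The sketch below uses `monitorEventLoopDelay()` from the built-in `perf_hooks` module and simulates a blocking callback so the monitor has something to report. The 100ms threshold, the 300ms busy wait, and the helper `isOverloaded` are assumptions for illustration.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Histogram of Event Loop delay, sampled every 10 ms; values are nanoseconds.
const histogram = monitorEventLoopDelay({ resolution: 10 });
histogram.enable();

const lagThresholdMs = 100; // hypothetical threshold

// True when the worst delay recorded so far exceeds the threshold.
function isOverloaded(): boolean {
  return histogram.max / 1e6 > lagThresholdMs;
}

// Simulate a blocking callback, then read the monitor.
const lagCheck = new Promise<boolean>((resolve) => {
  setTimeout(() => {
    const end = Date.now() + 300;
    while (Date.now() < end) {
      // busy wait: the loop cannot turn during this time
    }
    // Give the monitor one more turn of the loop to record the delay.
    setTimeout(() => {
      const overloaded = isOverloaded();
      histogram.disable();
      resolve(overloaded);
    }, 50);
  }, 50);
});

lagCheck.then((overloaded) => console.log({ overloaded }));
```

In a server, a check like `isOverloaded()` could sit in a middleware that answers 503, with `histogram.reset()` called periodically so that one old spike does not keep the server in the overloaded state.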
Key Ideas
- **Event Loop consists of 6 phases:** timers, pending callbacks, idle/prepare, poll, check, close. Each phase processes its type of tasks in a strict order. Understanding the phases is critical for the predictable behavior of asynchronous code.
- **Microtasks are executed between phases:** process.nextTick has the highest priority, then Promise.then/queueMicrotask, followed by macrotasks (setTimeout, setImmediate). Microtasks can cause starvation by blocking the Event Loop.
- **Poll phase - the heart of asynchronicity:** this is where the Event Loop receives I/O events from the OS through epoll/kqueue and can block while waiting for new events. This allows Node.js to handle millions of connections with minimal CPU consumption.
- **NEVER block the Event Loop:** any synchronous operation >10ms blocks ALL connections. Use asynchronous APIs, break heavy tasks into chunks, offload CPU-intensive work to Worker Threads.
Related topics
Event Loop is the foundation of Node.js's asynchronous nature. For a complete understanding, study the related concepts:
- libuv and Thread Pool - libuv is the C library that implements the Event Loop. Understanding the thread pool (for fs/crypto operations) and async I/O (for network) explains why some operations are parallel and others are not.
- Streams and Backpressure - Streams use the Event Loop to process data in chunks without holding the whole payload in memory. The backpressure mechanism prevents memory overflow with a slow consumer.
- Worker Threads - For CPU-intensive tasks, the Event Loop is not sufficient; separate threads are needed. Worker Threads allow executing JS code in parallel without blocking the main Event Loop.
- Memory Management and Garbage Collection - GC pauses run on the main thread and block the Event Loop. Understanding V8 heap and GC patterns is critical for high-load applications.
Questions for reflection
- A server processes 1000 req/sec with an avg latency of 50ms. After deploying a new feature, latency jumps to 500ms while CPU stays at 30%. How can Event Loop monitoring help diagnose the root cause?
- When setImmediate is used to break a heavy task into chunks, HTTP requests still time out under load. Which phase of the Event Loop is being blocked and why?
- In which scenarios is process.nextTick preferable to Promise.then, despite the risk of queue starvation? Provide a real-world example from a Node.js library (e.g., EventEmitter).