Node.js Internals
libuv Deep Dive: Internals of the Asynchronous Engine
Do you think you know libuv? Open the Node.js source code: a large part of its C++ layer is a wrapper around libuv. Understanding `uv_loop_t`, ref/unref, and platform-specific I/O is the difference between "I use Node.js" and "I understand how it works".
- **Production: the process does not terminate after a deploy**: The new code calls fs.watch() but never calls .close(). process._getActiveHandles() shows 500+ FSWatcher handles, and memory leaks at 2 GB per day. wtfnode shows the stack trace where the watchers are created, and the fix takes 5 minutes.
- **High-throughput API: thread pool bottleneck**: The service handles 10k req/sec, and each request calls bcrypt.hash() (CPU-bound, runs on the thread pool). With UV_THREADPOOL_SIZE=4 this is a disaster. After switching to a worker_threads pool, latency dropped from 500 ms to 50 ms.
- **Cross-platform native addon**: Your C++ code works on Linux (epoll) but crashes on macOS (kqueue). You read the libuv source: edge-triggered semantics are different. Understanding platform-io saves a week of debugging.
Loop Internals: uv_loop_t and Execution Modes
The **uv_loop_t** structure is the heart of libuv. It contains a min-heap of timers, a queue of pending callbacks, a file descriptor for epoll/kqueue, and a counter of active handles. Each Node.js process has one default loop, but you can create a custom one.
**uv_run()** accepts three modes: **UV_RUN_DEFAULT** (runs while there are active handles), **UV_RUN_ONCE** (one iteration, useful for embedding), **UV_RUN_NOWAIT** (poll without blocking, for integration with other event loops).
**Embedding libuv** in your event loop: use UV_RUN_NOWAIT in combination with your polling mechanism. For example, a GUI application can alternate uv_run(UV_RUN_NOWAIT) with handling UI events.
**UV_RUN_DEFAULT** - the standard mode in Node.js. The loop continues as long as `uv_loop_alive()` returns true (there are active handles or pending requests). When all handles are closed, the event loop terminates.
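**UV_RUN_ONCE** is useful for embedding: you call it manually from your main loop, handing control to libuv for one iteration, then return to your logic (for example, UI processing in Qt/GTK). Keep in mind that it blocks in the poll phase when no callbacks are pending, so it fits only when your own loop can afford to wait for libuv.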
**uv_backend_timeout()** calculates how many milliseconds the loop may block in epoll_wait/kevent. If there are pending callbacks or queued setImmediate callbacks, the timeout is 0 (non-blocking poll). If the nearest timer fires in 500 ms, the timeout is 500 ms. If there are no timers at all, it is -1 (block until I/O arrives).
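The following simplified JavaScript model makes the decision concrete. It mimics the rules libuv applies when it chooses the poll timeout, but it is not libuv's code. The `loop` object and its field names are hypothetical stand-ins for `uv_loop_t` state, and some conditions (such as child-process reaping) are left out.

```javascript
'use strict';

// Simplified model of libuv's poll-timeout decision (not libuv's source).
// `loop` is a hypothetical plain object standing in for uv_loop_t state.
function backendTimeout(loop) {
  const hasWork = loop.activeHandles > 0 || loop.activeReqs > 0;
  const mustNotBlock =
    loop.stopFlag ||
    loop.pendingCallbacks > 0 || // I/O callbacks deferred to the pending phase
    loop.idleHandles > 0 ||      // Node starts an idle handle while setImmediate work is queued
    loop.closingHandles > 0;     // close callbacks are waiting to run
  if (!hasWork || mustNotBlock) return 0;     // poll without blocking
  if (loop.timers.length === 0) return -1;    // no timers: block until I/O arrives
  const next = Math.min(...loop.timers);      // earliest timer due time, in ms
  return Math.max(0, next - loop.now);        // an overdue timer also yields 0
}

const base = {
  stopFlag: false, activeHandles: 1, activeReqs: 0,
  pendingCallbacks: 0, idleHandles: 0, closingHandles: 0,
  now: 1000, timers: [],
};

console.log(backendTimeout({ ...base, timers: [1500] }));                 // 500
console.log(backendTimeout({ ...base, idleHandles: 1, timers: [1500] })); // 0
console.log(backendTimeout(base));                                        // -1
```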
What is the difference between UV_RUN_ONCE and UV_RUN_NOWAIT?
Lifecycle of handles and requests: ref/unref, uv_close()
A **handle** is long-lived: it stays open until you explicitly call `.close()`. A **request** is short-lived: it represents a single operation and is done once its callback runs. Memory leaks in Node.js very often come down to unclosed handles: sockets, timers, fs watchers.
**ref/unref** controls whether a handle should keep the event loop alive. By default, the handle is ref=1 (the event loop will not terminate). `.unref()` sets it to ref=0 (the event loop can terminate even if the handle is active).
**uv_loop_alive()** returns true if `active_handles > 0 || active_reqs > 0 || closing_handles > 0`. Unref'd handles are not counted in active_handles. Therefore, the process can terminate even if there is an unref'd timer.
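You can watch the ref count at work from JavaScript by running two tiny scripts in fresh Node.js processes. One uses a referenced timer and the other an unreferenced one. This sketch uses only `child_process.spawnSync`, and the 5-second `timeout` is just a safety net.

```javascript
'use strict';
const { spawnSync } = require('node:child_process');

// Run a snippet in a fresh Node.js process and report what it printed.
function runChild(source) {
  const result = spawnSync(process.execPath, ['-e', source], {
    encoding: 'utf8',
    timeout: 5000, // safety net in case the child never exits
  });
  return { status: result.status, stdout: result.stdout.trim() };
}

// Referenced timer: the handle keeps the loop alive, so the callback runs.
const refd = runChild("setTimeout(() => console.log('fired'), 50);");

// Unreferenced timer: nothing keeps the loop alive, so the loop stops and
// the process exits before the callback can run.
const unrefd = runChild("setTimeout(() => console.log('fired'), 50).unref();");

console.log(refd);   // { status: 0, stdout: 'fired' }
console.log(unrefd); // { status: 0, stdout: '' }
```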
**Four types of special handles**: **idle** (runs once per loop iteration; while any idle handle is active, the poll phase does not block), **prepare** (runs right before the poll phase), **check** (runs right after poll; Node uses it for setImmediate), **async** (thread-safe wakeup from another thread).
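The position of the check phase has a visible consequence in Node.js. Inside an I/O callback the loop is in the poll phase. A `setImmediate` callback runs in the check phase of the same iteration, while a `setTimeout(..., 0)` callback waits for the timers phase of the next one, so the immediate always runs first. The sketch uses `fs.stat` on the Node.js binary only as a convenient I/O operation.

```javascript
'use strict';
const fs = require('node:fs');

// Resolve with the order in which a 0 ms timer and an immediate fire
// when both are scheduled from inside an I/O callback (the poll phase).
function orderInsideIoCallback() {
  return new Promise((resolve, reject) => {
    fs.stat(process.execPath, (err) => {
      if (err) return reject(err);
      const order = [];
      const record = (label) => {
        order.push(label);
        if (order.length === 2) resolve(order);
      };
      setTimeout(() => record('timeout'), 0);  // timers phase, next iteration
      setImmediate(() => record('immediate')); // check phase, this iteration
    });
  });
}

orderInsideIoCallback().then((order) => console.log(order)); // [ 'immediate', 'timeout' ]
```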
**uv_close() is asynchronous!** You call `uv_close(handle, close_cb)`, but the handle is not released immediately. It is first marked as closing. `close_cb` runs later, in the close callbacks phase, and only after that is it safe to free the handle's memory (libuv never frees it for you).
**Callback hell** in libuv: to properly close a handle, you need to call `uv_close()` and wait for `close_cb`. If you have 10 handles, you end up with 10 nested callbacks. Solution: a counter for pending closes + a single cleanup callback.
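The counter pattern translates directly to JavaScript. This sketch works with any object whose `close(callback)` method reports completion asynchronously, such as `net.Server`. The names `closeAll` and `demo` are illustrative, not a Node.js API. The demo also shows that closing is asynchronous, because `closeAll()` returns before any close callback runs.

```javascript
'use strict';
const net = require('node:net');

// Close every handle and call done(err) exactly once, after the last
// close callback has run. `handles` are objects with close(callback).
function closeAll(handles, done) {
  let pending = handles.length;
  let firstError = null;
  if (pending === 0) {
    process.nextTick(done, null); // stay asynchronous even with nothing to close
    return;
  }
  for (const handle of handles) {
    handle.close((err) => {
      if (err && firstError === null) firstError = err;
      pending -= 1;
      if (pending === 0) done(firstError);
    });
  }
}

// Usage: start three TCP servers, then shut them all down with one callback.
function demo() {
  return new Promise((resolve, reject) => {
    const servers = [net.createServer(), net.createServer(), net.createServer()];
    let listening = 0;
    for (const server of servers) {
      server.on('error', reject);
      server.listen(0, '127.0.0.1', () => {
        listening += 1;
        if (listening < servers.length) return;
        const events = [];
        closeAll(servers, (err) => {
          if (err) return reject(err);
          events.push('all closed');
          resolve(events);
        });
        events.push('closeAll() returned'); // recorded before any close callback
      });
    }
  });
}

demo().then((events) => console.log(events)); // [ 'closeAll() returned', 'all closed' ]
```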
What happens if you call `setInterval().unref()` and there are no other active handles?
Cross-platform abstractions: epoll, kqueue, IOCP
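**epoll (Linux)** - a readiness-based mechanism for monitoring file descriptors. It is level-triggered by default and edge-triggered when a descriptor is registered with EPOLLET. You add a fd to epoll via `epoll_ctl(EPOLL_CTL_ADD)`, then `epoll_wait()` blocks until an event (readable/writable/error) or the timeout.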
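**kqueue (macOS/BSD)** - a more versatile counterpart to epoll. Besides sockets, it supports file system events (EVFILT_VNODE), signals (EVFILT_SIGNAL), and timers (EVFILT_TIMER). libuv's I/O watchers rely on EVFILT_READ/EVFILT_WRITE. Timers and signals go through libuv's own portable mechanisms (a timer heap and a signal pipe), so they behave the same as on Linux.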
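**IOCP (Windows)** - completion-based model: you do not wait for readiness ("socket ready to read"). Instead, you start an operation and receive a notification "operation completed". A different approach at its core. libuv's stream API is completion-style as well: read callbacks deliver data, and write callbacks fire when the write is done. On Windows this API is built on IOCP, while on Unix libuv performs the read/write itself once epoll/kqueue reports readiness. Only uv_poll_t, which exposes raw readiness, has to be emulated on Windows.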
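**Edge-triggered vs level-triggered**: in level-triggered mode (the default for both epoll and kqueue), a descriptor is reported as ready on every wait as long as unread data remains. In edge-triggered mode (EPOLLET on epoll, EV_CLEAR on kqueue), you are notified only when the state changes. If you have not read all the data, the next notification will not come until new data appears, so the code must keep reading until EAGAIN. libuv's Unix backends use level-triggered registration.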
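A toy model makes the difference easy to see. This sketch does not touch epoll or kqueue. `ToyFd`, `waitLevel`, `waitEdge` and `scenario` are hypothetical names that simulate one descriptor's buffer and the two notification policies, with a reader that drains only part of the data.

```javascript
'use strict';

// Toy model of one file descriptor's kernel buffer (not a real fd).
class ToyFd {
  constructor() {
    this.buffered = 0;             // bytes waiting to be read
    this.newDataSinceWait = false; // set on arrival, cleared by an edge-triggered wait
  }
  arrive(bytes) {
    this.buffered += bytes;
    this.newDataSinceWait = true;
  }
  read(max) {
    const n = Math.min(max, this.buffered);
    this.buffered -= n;
    return n;
  }
}

// Level-triggered: readable whenever any data is buffered.
function waitLevel(fd) {
  return fd.buffered > 0;
}

// Edge-triggered: readable only if new data arrived since the last wait.
function waitEdge(fd) {
  const ready = fd.newDataSinceWait;
  fd.newDataSinceWait = false;
  return ready;
}

// Data arrives, the reader consumes only part of it, then more data arrives.
function scenario(wait) {
  const fd = new ToyFd();
  const seen = [];
  fd.arrive(100);
  seen.push(wait(fd)); // first wait after data arrives
  fd.read(10);         // partial read: 90 bytes remain
  seen.push(wait(fd)); // wait again, no new data
  fd.arrive(5);
  seen.push(wait(fd)); // wait after new data arrives
  return seen;
}

console.log(scenario(waitLevel)); // [ true, true, true ]
console.log(scenario(waitEdge));  // [ true, false, true ]
```

The `false` in the edge-triggered run shows why edge-triggered code loops on `read()` until it gets EAGAIN. The 90 bytes left after the partial read are not announced again.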
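**Why do files use a thread pool?** epoll and kqueue are useful only for descriptors that can actually be non-blocking (sockets, pipes, terminals). Regular files have no non-blocking mode in POSIX. poll()/select() always report them as ready, epoll refuses to register them (EPERM), and `read()` can still block on disk I/O even with O_NONBLOCK.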
**IOCP on Windows** supports true asynchronous file I/O! `ReadFile()` with an OVERLAPPED structure does not block. Even so, libuv runs its `uv_fs_*` operations on the thread pool on Windows as well. This keeps file behavior consistent with the POSIX platforms (Node.js must work the same on all OS).
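**Thundering herd**: when several processes or threads wait on the same listening socket, one incoming connection can wake all of them, even though only one wins `accept()`. This happens with classic `select()`/`poll()` and with separate epoll instances watching a shared socket. Linux 4.5 added EPOLLEXCLUSIVE, which wakes only one (or a few) of the waiters. SO_REUSEPORT is another approach that gives each process its own accept queue. Node's cluster module avoids the problem differently: by default (except on Windows), the primary process accepts connections and distributes them to workers round-robin.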
Why can Windows do asynchronous file I/O through IOCP, while libuv on Linux needs a thread pool for files?
Thread Pool Tuning: UV_THREADPOOL_SIZE, Profiling, Alternatives
**UV_THREADPOOL_SIZE** (default 4) controls the number of threads for blocking operations: fs, dns.lookup, async crypto (pbkdf2, scrypt, randomBytes) and async zlib. The maximum is 1024. For I/O-heavy workloads, 2-4x the number of CPU cores is usually optimal.
**When to increase?** If your application runs many parallel operations that use the thread pool (for example, image processing with sharp, archiving, password hashing), the default 4 threads become a bottleneck, especially on machines with more than 4 cores.
**CRITICAL**: set UV_THREADPOOL_SIZE as an environment variable when starting the process. libuv reads it only once, when it creates the pool for the first piece of work, and the size is fixed after that. Setting it via `process.env.UV_THREADPOOL_SIZE = '16'` in the code is NOT reliable: it has no effect if anything has already used the pool.
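The following sketch sets the size the reliable way, as an environment variable at launch. The parent starts a child Node.js process with a given `UV_THREADPOOL_SIZE`. The child queues one slow `crypto.pbkdf2` job and then one quick `crypto.randomBytes` job, both of which run on the pool, and reports which callback fired first. The iteration count is arbitrary and only makes the first job noticeably slow.

```javascript
'use strict';
const { spawnSync } = require('node:child_process');

// Child script: queue a slow CPU-bound pool job, then a quick pool job,
// and print the order in which their callbacks run.
const childSource = `
const crypto = require('node:crypto');
const order = [];
const record = (label) => {
  order.push(label);
  if (order.length === 2) console.log(JSON.stringify(order));
};
crypto.pbkdf2('secret', 'salt', 200000, 64, 'sha512', () => record('pbkdf2'));
crypto.randomBytes(16, () => record('randomBytes'));
`;

// The pool size is passed through the child's environment at launch,
// not assigned inside the child's code.
function completionOrder(poolSize) {
  const result = spawnSync(process.execPath, ['-e', childSource], {
    env: { ...process.env, UV_THREADPOOL_SIZE: String(poolSize) },
    encoding: 'utf8',
  });
  if (result.status !== 0) throw new Error(result.stderr);
  return JSON.parse(result.stdout);
}

console.log(completionOrder(1)); // [ 'pbkdf2', 'randomBytes' ]
console.log(completionOrder(4));
```

With a single thread, the quick job has to wait behind the slow one. With four threads it normally finishes first. The same queueing makes fs calls slow when the pool is busy with CPU-heavy work.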
**When NOT to increase**: for CPU-bound tasks like image processing, once the pool already has as many threads as CPU cores, more threads won't help. They only add context switching, which hurts performance. Use worker_threads for CPU-bound tasks.
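**Alternative: worker_threads** for CPU-intensive tasks. The libuv thread pool is meant for blocking operations (fs, dns.lookup). Node also runs async crypto and zlib there, but the pool is shared by the whole process, so long CPU-bound jobs make fs and dns.lookup calls wait. worker_threads give you full control over the threads, each with its own V8 isolate and event loop.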
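The following is a minimal sketch of moving a CPU-bound computation off the main thread with `worker_threads`. The worker code is inlined with `eval: true` only to keep the example in one file. The arithmetic loop is a stand-in for real CPU-heavy work such as image processing, and `runInWorker` is an illustrative name. A production service would normally keep a fixed pool of roughly one worker per CPU core instead of starting a worker per task.

```javascript
'use strict';
const { Worker } = require('node:worker_threads');

// Worker body: a CPU-bound loop that runs on its own thread and event loop.
const workerSource = `
const { parentPort, workerData } = require('node:worker_threads');
let sum = 0;
for (let i = 1; i <= workerData.n; i++) sum += i % 7;
parentPort.postMessage(sum);
`;

// Start a worker for one task and resolve with the value it posts back.
function runInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// The main thread stays free for I/O while the workers compute.
Promise.all([runInWorker(1e6), runInWorker(7)]).then(console.log); // [ 2999998, 21 ]
```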
Your application processes 100 uploaded images in parallel (sharp library, CPU-bound). What is the optimal solution?
Debugging libuv: handle leaks, uv_print_all_handles(), dtrace
**Handle leaks** are a common cause of memory leaks in Node.js. Forgot to call `.close()` on a socket? The handle lives forever and holds a reference to a callback whose closure can keep tens of megabytes of data reachable.
**process._getActiveHandles()** (unofficial API) returns an array of all active handles. Useful for diagnosing "why the process is not terminating."
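The following is a small diagnostic sketch. Since Node.js 17.3 (backported to 16.14), there is also `process.getActiveResourcesInfo()`. The docs mark it as experimental, and it returns the type names of the resources that keep the loop alive. The illustrative helper `activeResourceTypes` prefers that API and falls back to the undocumented `process._getActiveHandles()`. The run shows that a referenced timer is counted as a `Timeout`, while an unref'd one is not.

```javascript
'use strict';

// Type names of the resources currently keeping the event loop alive.
// Prefers the documented (experimental) API, falling back to the
// undocumented process._getActiveHandles() on older Node.js versions.
function activeResourceTypes() {
  if (typeof process.getActiveResourcesInfo === 'function') {
    return process.getActiveResourcesInfo();
  }
  return process._getActiveHandles().map((handle) => handle.constructor.name);
}

const countTimeouts = () =>
  activeResourceTypes().filter((type) => type === 'Timeout').length;

const before = countTimeouts();

const refd = setTimeout(() => {}, 10000);           // referenced: keeps the loop alive
const withRefd = countTimeouts();

const unrefd = setTimeout(() => {}, 10000).unref(); // unreferenced: not counted
const withUnrefd = countTimeouts();

clearTimeout(refd);
clearTimeout(unrefd);
const after = countTimeouts();

console.log({ before, withRefd, withUnrefd, after });
```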
**uv_print_all_handles()** (C API) outputs a list of all handles with type and address. For Node.js, a native addon is needed. Alternative: wtfnode module (npm install wtfnode).
**wtfnode** - npm module for automatic diagnostics of active handles. Shows the stacktrace where the handle was created.
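**uv_print_active_handles()** (C API) works like uv_print_all_handles() but outputs only active handles, while uv_print_all_handles() also lists inactive ones. Each entry is prefixed with flags: `R` (referenced), `A` (active) and `I` (internal). These flags make unref'd handles easy to spot.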
**DTrace/SystemTap probes** in libuv (Linux/macOS). You can trace: handle creation, uv_run calls, I/O operations. Requires compiling Node.js with the --with-dtrace flag.
**async_hooks for advanced diagnostics**: tracking the lifecycle of all async operations (fs, timers, promises).
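The following minimal sketch uses async_hooks for diagnostics. It records the type of every async resource created while the hook is enabled, and `traceResourceTypes` is an illustrative helper. The Node.js documentation marks `createHook` as experimental and discourages it for new code, but it is still handy for one-off investigations. The hook avoids `console.log` on purpose: printing is itself asynchronous, so it would trigger the hook again and recurse.

```javascript
'use strict';
const async_hooks = require('node:async_hooks');

// Types of async resources initialized while the hook is enabled.
const createdTypes = [];
const hook = async_hooks.createHook({
  // init runs synchronously when a resource is created; no console.log here.
  init(asyncId, type) {
    createdTypes.push(type);
  },
});

// Run fn with the hook enabled and return the distinct resource types seen.
function traceResourceTypes(fn) {
  createdTypes.length = 0;
  hook.enable();
  try {
    fn();
  } finally {
    hook.disable();
  }
  return [...new Set(createdTypes)];
}

const types = traceResourceTypes(() => {
  clearTimeout(setTimeout(() => {}, 1000)); // a Timeout resource
  setImmediate(() => {});                   // an Immediate resource
  Promise.resolve().then(() => {});         // PROMISE resources
});
console.log(types);
```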
True or false: if you increase UV_THREADPOOL_SIZE to 128, all async operations will execute faster.
False. Network I/O and timers do not use the thread pool at all, so most async operations are unaffected. The thread pool is meant for blocking operations (fs, dns.lookup, plus Node's async crypto and zlib). Heavy CPU-bound work (image processing, large computations) should run in worker_threads. An excessively large thread pool hurts performance through context switching.
The libuv thread pool is not a universal thread pool for any tasks. It is used exclusively for blocking system calls (read, write, getaddrinfo), where the thread spends most of its time sleeping, waiting for I/O. If you load the thread pool with CPU-bound tasks (e.g., 128 parallel crypto.pbkdf2), context switching between threads will kill CPU cache locality, and performance will drop. Optimal: UV_THREADPOOL_SIZE = 2-4x CPU cores for I/O, worker_threads pool = CPU cores for CPU-bound.
Your Node.js process does not terminate after the main code finishes, and you do not want to mask the problem with process.exit(). What is the first step in diagnostics?
Summary
- **uv_loop_t** - the central structure: min-heap of timers, fd for epoll/kqueue, request queue. uv_run modes: DEFAULT (until completion), ONCE (one iteration), NOWAIT (non-blocking poll for embedding).
- **ref/unref** controls whether the handle keeps the event loop alive. uv_close() is an asynchronous operation, the callback is called in the close phase. Handle leaks are diagnosed through process._getActiveHandles() or wtfnode.
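- **Cross-platform abstractions**: epoll (Linux), kqueue (macOS), IOCP (Windows). epoll/kqueue are readiness-based (level-triggered by default, edge-triggered on request) and do not help with regular files. IOCP is completion-based and covers all I/O, including files. Regular files use a thread pool, because POSIX read() on a file can block on disk.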
- **UV_THREADPOOL_SIZE** for I/O-blocking operations (fs, dns.lookup, crypto). Optimal is 2-4x CPU cores. CPU-bound tasks (image processing) belong in worker_threads, not in the thread pool. DTrace/async_hooks for advanced diagnostics.
Related topics
Advanced understanding of libuv opens doors to professional work with Node.js:
- Event Loop Internals - `uv_run()` implements the phases of the event loop (timers, pending callbacks, idle/prepare, poll, check, close callbacks). Understanding `uv_backend_timeout()` explains why `setTimeout(0)` and `setImmediate` can execute in a different order.
- Worker Threads - `worker_threads` create a separate `uv_loop_t` for each worker. Inter-thread communication is through `uv_async_t`. An alternative to the thread pool for CPU-bound tasks.
- Native Addons (N-API) - Native addons directly use the libuv API: uv_queue_work for the thread pool, uv_async_send for thread-safe callbacks. Understanding handles/requests is critical for C++ bindings.
Questions for reflection
- How would you implement a graceful shutdown if you have 1000+ active WebSocket connections (handles)? Do you need to call `uv_close` for each one, or can you forcefully kill the process?
- Why can io_uring (the new Linux API for async I/O) replace a thread pool for files, but epoll/kqueue cannot? What distinguishes io_uring from epoll at its core?
- If you create 10 worker_threads and start the process with UV_THREADPOOL_SIZE=8, how many threads will there be in total in the process? Consider the main thread, the worker threads, the libuv thread pool (one per worker, or one per process?) and V8 background threads.