Mastering Node.js: Architecture, the Event Loop, and Best Practices

I've shipped a lot of Node.js services over the years, and the single most common source of confusion I see — from juniors and seniors alike — is the runtime itself. People know async/await, they know callbacks, but ask them why a for loop crunching numbers freezes their entire API and the answers get fuzzy.

Node.js isn't magic. It's a small, well-defined machine: a single-threaded JavaScript runtime sitting on top of a C library called libuv that does the heavy lifting for I/O. Once you understand how those two pieces talk to each other, a whole class of bugs and performance problems becomes obvious. This post is the mental model I wish I'd had earlier.

The single-threaded model (and why it's not really single-threaded)

People say "Node is single-threaded." That's true and misleading at the same time.

Your JavaScript runs on one thread. There is exactly one call stack executing your code, and only one piece of your code runs at any given instant. There's no shared-memory data race in your business logic, which is genuinely freeing.

But Node itself is not single-threaded. Under the hood, libuv maintains a thread pool (4 threads by default) for operations that can't be done asynchronously at the OS level, and the operating system handles network I/O via efficient event notification mechanisms (epoll on Linux, kqueue on macOS, IOCP on Windows). So the right way to say it:

Node runs your JavaScript on a single thread, and offloads I/O and a few CPU-bound built-ins to libuv, which uses the OS and a thread pool to do the actual work.

This is why Node is fantastic for I/O-heavy workloads (APIs, proxies, real-time apps) and a poor fit, out of the box, for CPU-heavy ones.

libuv: the engine room

V8 executes your JavaScript. libuv is the C library that gives Node its asynchronous, event-driven nature. It provides:

The event loop itself.
Asynchronous TCP/UDP sockets and DNS resolution.
Asynchronous file system operations (via the thread pool).
The thread pool for fs, crypto (some operations), zlib, and DNS lookups using getaddrinfo.

When you call fs.readFile, Node doesn't block. It hands the request to libuv, which dispatches it to a thread pool worker. When the read completes, libuv queues your callback to run back on the main thread during the appropriate event loop phase.

The event loop phases

The event loop is not one big queue. It's a set of phases that run in a fixed order, and each phase has its own callback queue. Understanding the order is what lets you reason about when a given callback fires.

The phases, in order:

Timers — executes callbacks scheduled by setTimeout and setInterval.
Pending callbacks — executes certain system-level callbacks deferred from the previous iteration (e.g. some TCP errors).
Idle, prepare — internal use only.
Poll — retrieves new I/O events and executes their callbacks. This is where Node spends most of its time waiting. If there are no timers due, it can block here waiting for I/O.
Check — executes setImmediate callbacks.
Close callbacks — executes close events like socket.on('close', ...).

A simple way to see the ordering quirk between setTimeout and setImmediate:

const fs = require('fs');

fs.readFile(__filename, () => {
  setTimeout(() => console.log('timeout'), 0);
  setImmediate(() => console.log('immediate'));
});

Inside an I/O callback, setImmediate always fires before setTimeout(fn, 0), because after the poll phase the loop goes straight to the check phase. Outside of I/O, the order between those two is non-deterministic — it depends on process timing. This trips people up constantly.

Microtasks vs macrotasks

The phases above process macrotasks (timers, I/O callbacks, setImmediate). But there's another layer that runs between every macrotask: the microtask queue.

Microtasks come from two sources:

process.nextTick() — Node-specific, has its own queue with the highest priority.
Promise callbacks (.then, .catch, .finally, and the continuation after await).

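After each macrotask callback completes, Node drains the process.nextTick queue in full and then the promise microtask queue before moving on, so process.nextTick callbacks run before promise callbacks. (Before Node 11 this draining happened only between phases, not after every callback.)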

console.log('1: sync start');

setTimeout(() => console.log('2: setTimeout'), 0);

Promise.resolve().then(() => console.log('3: promise'));

process.nextTick(() => console.log('4: nextTick'));

console.log('5: sync end');

// Output:
// 1: sync start
// 5: sync end
// 4: nextTick
// 3: promise
// 2: setTimeout

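The synchronous code runs first. Then, before the event loop even gets to the timers phase, it drains microtasks: nextTick first, then the promise. Only then does setTimeout fire. This ordering holds for a CommonJS script; in an ES module the module body itself runs as part of a promise job, so the promise callback logs before nextTick.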

A practical warning: recursively scheduling process.nextTick can starve the event loop entirely — the loop never advances to the I/O phases because the microtask queue never empties. If you need to defer work without starving I/O, reach for setImmediate instead.
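Here is a minimal sketch of the setImmediate side of that advice. The function name spinUntilTimerFires and the 5 ms delay are arbitrary choices for the demo. The loop keeps rescheduling itself until a timer has had a chance to run, which only works because setImmediate lets the loop advance through its phases.

```javascript
// Keeps rescheduling itself with setImmediate until the timer has fired.
function spinUntilTimerFires() {
  return new Promise((resolve) => {
    let timerFired = false;
    let iterations = 0;

    setTimeout(() => {
      timerFired = true;
    }, 5);

    function spin() {
      iterations++;
      if (timerFired) return resolve(iterations);
      // Yield to the event loop. With process.nextTick(spin) here, the
      // timer would never run and this loop would never end.
      setImmediate(spin);
    }

    spin();
  });
}

spinUntilTimerFires().then((n) => {
  console.log(`timer fired after ${n} setImmediate iterations`);
});
```

The iteration count varies from run to run. What matters is that the promise resolves at all.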

The thread pool and non-blocking I/O

There's an important distinction between two kinds of "async" in Node:

Network I/O (sockets, HTTP) is handled by the OS event mechanism — it does not use the thread pool. It scales to tens of thousands of connections cheaply.
File system and some crypto/compression operations do use the libuv thread pool, which defaults to 4 threads.

This matters under load. If you fire off many concurrent crypto.pbkdf2 or fs operations, you can saturate those 4 threads and create a queue, even though your CPU has more cores available. You can bump the pool size:

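// Must be set before any async work that uses the pool.
// Environment values are strings; setting the variable when launching the
// process (UV_THREADPOOL_SIZE=8 node <script>) is the more reliable option.
process.env.UV_THREADPOOL_SIZE = '8';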

Set it based on your workload and core count — not blindly. More threads than cores for CPU-bound pool work just adds contention.
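If you want to see the queueing for yourself, a rough sketch like the following can help. The function name timePbkdf2Batch, the password and salt, and the iteration count are all placeholders I picked for the demo. On a machine with at least four cores and the default pool size, I'd typically expect the jobs to finish in waves of four, but the exact timings depend on the hardware.

```javascript
const crypto = require('node:crypto');

// Runs `count` pbkdf2 jobs concurrently and reports when each one finished,
// in milliseconds since the batch started.
function timePbkdf2Batch(count, iterations = 100_000) {
  const start = process.hrtime.bigint();
  const jobs = Array.from({ length: count }, (_, id) =>
    new Promise((resolve, reject) => {
      crypto.pbkdf2('password', 'salt', iterations, 64, 'sha512', (err) => {
        if (err) return reject(err);
        const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
        resolve({ id, elapsedMs });
      });
    }),
  );
  return Promise.all(jobs);
}

timePbkdf2Batch(8).then((results) => {
  for (const { id, elapsedMs } of results) {
    console.log(`job ${id} finished after ${elapsedMs.toFixed(0)} ms`);
  }
});
```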

Streams and backpressure

Streams are one of Node's best features and one of the most underused. Instead of loading an entire file or response into memory, you process it in chunks. This keeps memory flat regardless of payload size.

The naive way to copy a large file:

// Bad: loads the whole file into memory
const data = await fs.promises.readFile('huge.log');
await fs.promises.writeFile('copy.log', data);

The streaming way, with backpressure handled for you:

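const { pipeline } = require('node:stream/promises');
const fs = require('node:fs');

// Top-level await is not available in CommonJS, so wrap the call
async function copyFile() {
  await pipeline(
    fs.createReadStream('huge.log'),
    fs.createWriteStream('copy.log'),
  );
}

copyFile().catch(console.error);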

Backpressure is the mechanism that prevents a fast producer from overwhelming a slow consumer. If you write to a stream faster than it can flush, the internal buffer grows without bound and your memory balloons. pipeline (like the older .pipe()) respects the return value of .write(): when it returns false, the source pauses until the destination emits a drain event. Prefer pipeline: it forwards errors and destroys every stream in the chain on failure, which a manual .pipe() chain does not do, so a failed .pipe() chain can leave streams and file descriptors open.
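For the rare case where you have to call .write() yourself, the contract looks roughly like this. It's a sketch: createSlowSink and writeAll are hypothetical helpers, and the tiny highWaterMark exists only to trigger backpressure quickly.

```javascript
const { Writable } = require('node:stream');
const { once } = require('node:events');

// Hypothetical slow sink: acknowledges each chunk on a later loop turn.
function createSlowSink(received) {
  return new Writable({
    highWaterMark: 16, // tiny buffer so backpressure kicks in quickly
    write(chunk, _encoding, callback) {
      received.push(chunk.toString());
      setImmediate(callback);
    },
  });
}

// Writes every chunk, pausing whenever write() signals a full buffer.
// Resolves with the number of times it had to wait for 'drain'.
async function writeAll(sink, chunks) {
  let pauses = 0;
  for (const chunk of chunks) {
    if (!sink.write(chunk)) {
      pauses++;
      await once(sink, 'drain');
    }
  }
  sink.end();
  await once(sink, 'finish');
  return pauses;
}

const received = [];
writeAll(createSlowSink(received), ['a'.repeat(32), 'b'.repeat(32)])
  .then((pauses) => console.log(`wrote ${received.length} chunks, paused ${pauses} times`))
  .catch(console.error);
```

Ignoring the false return value would still work for two chunks. The difference shows up when the loop runs millions of times against a slow destination.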

Worker threads and clustering

When you genuinely have CPU-bound work — image processing, parsing huge payloads, cryptographic hashing in a tight loop — the answer is not to optimize the event loop. The answer is to get the work off the main thread.

Two tools:

Worker threads run JavaScript in a separate thread with its own V8 instance and event loop. Use them for CPU-bound tasks.

const { Worker } = require('node:worker_threads');

function runHeavyTask(input) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./heavy-task.js', { workerData: input });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with code ${code}`));
    });
  });
}
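The worker side of that exchange isn't shown above, so here is one self-contained way it could look. To keep it in a single file, this sketch passes the worker's source as a string with eval: true rather than loading a separate file. The summing loop is a stand-in for real CPU-bound work, and sumInWorker is a name I made up.

```javascript
const { Worker } = require('node:worker_threads');

// Source for the worker thread. It reads its input from workerData and
// posts the result back to the parent.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let total = 0;
  for (let i = 0; i < workerData; i++) total += i;
  parentPort.postMessage(total);
`;

function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with code ${code}`));
    });
  });
}

sumInWorker(1000).then((total) => console.log('sum from worker:', total));
// prints: sum from worker: 499500
```

The main thread stays free to serve requests while the loop runs, and that is the whole point.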

Clustering (or running multiple processes behind a load balancer / process manager) lets you use all CPU cores by spawning multiple Node processes, each with its own event loop. This is how you scale an HTTP server horizontally on a single machine.

const cluster = require('node:cluster');
const http = require('node:http');
const { availableParallelism } = require('node:os');

if (cluster.isPrimary) {
  for (let i = 0; i < availableParallelism(); i++) {
    cluster.fork();
  }
} else {
  http.createServer((req, res) => res.end('handled')).listen(3000);
}

In modern deployments I usually let the orchestrator (Kubernetes, a process manager) handle multi-process scaling rather than cluster directly, but the principle is identical: one event loop per core.

Common pitfalls: blocking the loop

The cardinal sin in Node is blocking the event loop. Because all your JS runs on one thread, any synchronous work that takes a long time freezes everything — every pending request, every timer, every I/O callback.

Things that block:

Long synchronous loops over large datasets.
JSON.parse / JSON.stringify on very large objects.
Synchronous fs calls (readFileSync) in a request handler.
Heavy regular expressions (catastrophic backtracking — "ReDoS").

// This freezes the entire process for the duration of the loop
app.get('/report', (req, res) => {
  let total = 0;
  for (let i = 0; i < 5_000_000_000; i++) total += i; // blocks!
  res.json({ total });
});

The fix is one of: chunk the work and yield with setImmediate, move it to a worker thread, or precompute it. Rule of thumb: if a synchronous operation might take more than a few milliseconds, it doesn't belong on the main thread of a request handler.
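As a sketch of the first option, here is the same kind of summing loop split into slices that yield between runs. The name sumInChunks and the default chunk size are assumptions; tune the slice so each one stays within a few milliseconds.

```javascript
// Sums 0..n-1 in slices, yielding to the event loop between slices.
function sumInChunks(n, chunkSize = 1_000_000) {
  return new Promise((resolve) => {
    let total = 0;
    let i = 0;

    function runChunk() {
      const end = Math.min(i + chunkSize, n);
      for (; i < end; i++) total += i;

      if (i < n) {
        // Let pending I/O, timers, and other requests run before continuing.
        setImmediate(runChunk);
      } else {
        resolve(total);
      }
    }

    runChunk();
  });
}

sumInChunks(10, 3).then((total) => console.log(total)); // prints: 45
```

In a request handler you would await sumInChunks(...) before responding. The total work is the same, but other requests get served in the gaps. For truly heavy computation a worker thread is still the better tool.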

Error handling

Async error handling in Node has sharp edges. A few rules I treat as non-negotiable:

Always await inside try/catch for promise-based code. Since Node 15, an unhandled promise rejection terminates the process by default.
Listen for error events on streams and emitters — an unhandled error event throws.
Don't swallow errors. Log them with context and let them propagate to a central handler.
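In code, the first two rules might look something like this. It's a sketch: loadUser and its fetchUser argument are hypothetical, standing in for whatever async call your handler makes.

```javascript
const { EventEmitter } = require('node:events');

// Rule 1: await inside try/catch, add context, and rethrow.
// `fetchUser` is a hypothetical async function supplied by the caller.
async function loadUser(fetchUser, id) {
  try {
    return await fetchUser(id);
  } catch (err) {
    // Keep the original error reachable via `cause` instead of swallowing it.
    throw new Error(`loadUser(${id}) failed`, { cause: err });
  }
}

// Rule 2: register an 'error' listener before anything can emit.
const source = new EventEmitter();
source.on('error', (err) => console.error('source failed:', err.message));
source.emit('error', new Error('boom')); // logged by the listener, not thrown
```

Without that listener, the emit call on the last line would throw the error synchronously.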

process.on('unhandledRejection', (reason) => {
  console.error('Unhandled rejection:', reason);
  // Log, alert, and shut down gracefully — do not pretend it didn't happen
  process.exit(1);
});

process.on('uncaughtException', (err) => {
  console.error('Uncaught exception:', err);
  process.exit(1);
});

After an uncaughtException, the process is in an undefined state — log it and exit. Let your orchestrator restart a clean instance. Trying to "recover" and keep serving is how you get corrupted state.

Observability

You can't fix what you can't see. For Node specifically, the metric I watch first is event loop lag — how long the loop is delayed beyond when it should have run. Rising lag is the canary for blocking work.

const { monitorEventLoopDelay } = require('node:perf_hooks');

const h = monitorEventLoopDelay({ resolution: 20 });
h.enable();

setInterval(() => {
  // p99 lag in milliseconds
  console.log('event loop p99 lag (ms):', h.percentile(99) / 1e6);
  h.reset();
}, 5000);

Beyond that: track active handles/requests, heap usage, and GC pauses. Export these to your metrics backend (Prometheus, OpenTelemetry) and alert on event loop lag crossing a threshold. In production I'd also wire up the diagnostics_channel and structured logging so I can correlate a latency spike with the request that caused it.
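A point-in-time snapshot of some of those numbers can be collected with nothing but built-ins. The function name and field names below are my own, and process.getActiveResourcesInfo() is marked experimental in the Node docs, so the sketch guards against its absence. GC pauses need a separate observer and are left out here.

```javascript
// Collects a snapshot suitable for exporting as gauges.
function collectRuntimeMetrics() {
  const { heapUsed, heapTotal, rss } = process.memoryUsage();

  // Experimental API: report null if this Node version does not have it.
  const activeResources =
    typeof process.getActiveResourcesInfo === 'function'
      ? process.getActiveResourcesInfo().length
      : null;

  return {
    heapUsedBytes: heapUsed,
    heapTotalBytes: heapTotal,
    rssBytes: rss,
    activeResources,
  };
}

console.log(collectRuntimeMetrics());
```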

Final thoughts

Everything about Node.js performance flows from one fact: your JavaScript runs on a single thread, and that thread must stay free to keep dispatching I/O. The event loop, microtasks, the thread pool, streams, worker threads — they're all mechanisms for keeping that one thread unblocked.

Internalize the phase order, respect backpressure, never block the loop, and get CPU work onto workers. Do those four things and Node will handle staggering amounts of concurrency on modest hardware. Skip them and you'll be debugging mysterious latency spikes at 2am — I've been there, and the cause was always one of the items above.
