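Many people know Node.js's built-in cluster module and benefit from it: with just a few lines of code, we can put all available CPU cores to work running a server program. It has one drawback, however: it consumes a lot of system memory, because every worker process starts its own Node.js runtime.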

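Many people have also heard of the Node.js worker_threads module, but few actually benefit from it, because they don't know how to put it to use. As suggested, worker threads are meant for running CPU-intensive tasks, or for distributing multiple tasks across several threads to be processed in parallel. But given the asynchronous nature of Node.js and its community packages, we are not left with many CPU-intensive tasks.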

But what if I told you there is another way to use worker_threads, one that provides what the cluster module provides? Interesting, isn't it? Yes, we can do that.

Cluster version

// cluster.mjs
import * as http from "node:http";
import { availableParallelism } from "node:os";
import cluster from "node:cluster";

if (cluster.isPrimary) {
    const numCPUs = availableParallelism();

    for (let i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
} else {
    http.createServer((req, res) => {
        res.writeHead(200);
        res.end(`Hello World! (processId: ${process.pid})\n`);
    }).listen(8000, "localhost", () => {
        console.log(`Listening on http://localhost:${8000}/ (processId: ${process.pid})`);
    });
}

Now run node cluster.mjs, and the terminal shows output like the following:

Listening on http://localhost:8000/ (processId: 66660)
Listening on http://localhost:8000/ (processId: 66661)
Listening on http://localhost:8000/ (processId: 66655)
Listening on http://localhost:8000/ (processId: 66656)
Listening on http://localhost:8000/ (processId: 66658)
Listening on http://localhost:8000/ (processId: 66657)
Listening on http://localhost:8000/ (processId: 66659)
Listening on http://localhost:8000/ (processId: 66662)

If we then run curl http://localhost:8000 several times, we see output like this:

Hello World! (processId: 66895)
Hello World! (processId: 66897)
Hello World! (processId: 66896)
Hello World! (processId: 66893)

This shows that the load is distributed across different processes, as expected. Now let's look at the real beast.

worker_threads version

// threads.mjs
import * as http from "node:http";
import { availableParallelism } from "node:os";
import { isMainThread, Worker, workerData, threadId } from "node:worker_threads";
import { fileURLToPath } from "node:url";

/** @type {http.RequestListener} */
const listener = (req, res) => {
    res.writeHead(200);
    res.end(`Hello World! (threadId: ${threadId})\n`);
};

if (isMainThread) {
    const server = http.createServer(listener);
    server.listen(8000, () => {
        console.log(`Listening on http://localhost:${8000}/ (threadId: ${threadId})`);

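        // The main thread also serves requests, so start one fewer worker.
        const maxWorkers = availableParallelism() - 1;

        for (let i = 0; i < maxWorkers; i++) {
            // Re-run this same file as a worker and share the listening socket's
            // file descriptor. `_handle` is an undocumented internal property.
            new Worker(fileURLToPath(import.meta.url), {
                workerData: { handle: { fd: server._handle.fd } }
            });
        }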
    });
} else {
    http.createServer(listener).listen(workerData.handle, () => {
        console.log(`Listening on http://localhost:${8000}/ (threadId: ${threadId})`);
    });
}

Now run node threads.mjs, and the terminal shows output like the following:

Listening on http://localhost:8000/ (threadId: 0)
Listening on http://localhost:8000/ (threadId: 4)
Listening on http://localhost:8000/ (threadId: 6)
Listening on http://localhost:8000/ (threadId: 2)
Listening on http://localhost:8000/ (threadId: 7)
Listening on http://localhost:8000/ (threadId: 3)
Listening on http://localhost:8000/ (threadId: 5)
Listening on http://localhost:8000/ (threadId: 1)

If we run curl http://localhost:8000 several times, we see output like this:

Hello World! (threadId: 7)
Hello World! (threadId: 4)
Hello World! (threadId: 4)
Hello World! (threadId: 6)

This shows that the load is distributed across different threads, as expected.

Comparison

Listening on the port

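With cluster, we listen on the same port in every worker process, just as we would when writing a single-threaded server. This is an illusion, though: we cannot ordinarily listen on the same port in several processes. Behind the scenes, Node.js listens on the port in the primary process and hands incoming connections to the child processes using a round-robin algorithm (the default on every platform except Windows).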
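To make the round-robin idea concrete, here is a toy dispatcher. It is only an illustration of the scheduling pattern, not the code Node.js runs internally; `createRoundRobin` and the worker labels are hypothetical names invented for this sketch.

```javascript
// Illustrative only: a toy round-robin dispatcher, not Node.js internals.
// `createRoundRobin` and the worker labels are hypothetical.
function createRoundRobin(workers) {
    if (workers.length === 0) {
        throw new Error("at least one worker is required");
    }
    let next = 0;
    return function pick() {
        const worker = workers[next];
        // Advance the cursor and wrap around at the end of the list.
        next = (next + 1) % workers.length;
        return worker;
    };
}

const pick = createRoundRobin(["worker-1", "worker-2", "worker-3"]);
const assigned = [];
for (let i = 0; i < 5; i++) {
    assigned.push(pick());
}
console.log(assigned.join(", "));
// worker-1, worker-2, worker-3, worker-1, worker-2
```

In real cluster code the policy is chosen through `cluster.schedulingPolicy` (`cluster.SCHED_RR` or `cluster.SCHED_NONE`) or the `NODE_CLUSTER_SCHED_POLICY` environment variable, rather than implemented by hand.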

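With worker_threads, we have to listen on the port explicitly in the main thread. Then, instead of listening on the same port again in each worker thread, we pass the server's file descriptor (server._handle.fd) to the worker threads and have them listen on that same file descriptor.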
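Because `server._handle` is an undocumented internal property, it may be worth validating the descriptor before handing it to a worker. The following is a minimal sketch under that assumption; `getSharedHandle` is a hypothetical helper, not part of Node.js, and the fake server objects exist only to exercise it.

```javascript
// Hypothetical helper: build a listen handle from an already-listening server.
// `server._handle` is internal and undocumented, so check it defensively.
function getSharedHandle(server) {
    const fd = server?._handle?.fd;
    if (!Number.isInteger(fd) || fd < 0) {
        throw new Error("No usable file descriptor on this server");
    }
    // server.listen() accepts an object with an `fd` member as its handle.
    return { fd };
}

// Exercise the helper with stand-in objects instead of a real server.
console.log(getSharedHandle({ _handle: { fd: 21 } }));

try {
    getSharedHandle({ _handle: { fd: -1 } });
} catch (err) {
    console.log(err.message);
}
```

In threads.mjs the result could be passed as `workerData: { handle: getSharedHandle(server) }`. Note that the Node.js documentation states that listening on a file descriptor is not supported on Windows, so this approach is limited to POSIX-like platforms.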

Accepting requests

With cluster, Node.js balances the load using round-robin; as the example above shows, each request was handled by a different child process.

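With worker_threads, the load is not distributed as evenly: some requests are handled by the same thread that handled the previous one. This doesn't matter much, though; when requests arrive frequently, all of the threads end up being used. Also note that, unlike in cluster mode, the main thread handles requests too when using worker_threads, so we only need to create `cpus - 1` worker threads.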
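One way to see how requests are spread across threads is to tally the responses by thread ID. The sketch below does this for the four sample responses shown earlier; `tallyByThread` is a hypothetical helper that assumes the response format used by threads.mjs.

```javascript
// Hypothetical helper: count responses per thread, assuming each line
// contains "(threadId: N)" as produced by the listener in threads.mjs.
function tallyByThread(lines) {
    const counts = new Map();
    for (const line of lines) {
        const match = /\(threadId: (\d+)\)/.exec(line);
        if (!match) continue; // Skip lines that are not server responses.
        const id = Number(match[1]);
        counts.set(id, (counts.get(id) ?? 0) + 1);
    }
    return counts;
}

const sample = [
    "Hello World! (threadId: 7)",
    "Hello World! (threadId: 4)",
    "Hello World! (threadId: 4)",
    "Hello World! (threadId: 6)",
];

for (const [id, count] of tallyByThread(sample)) {
    console.log(`thread ${id}: ${count} request(s)`);
}
```

For this small sample, thread 4 served two of the four requests, which matches the uneven distribution described above.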

Memory usage

With cluster, every worker has its own process, which means it starts its own Node.js runtime. This costs extra memory, and the cost grows in proportion to the number of processes.

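With worker_threads, there is only one Node.js process, so the per-process runtime overhead is not duplicated. Each worker thread still has its own V8 isolate and event loop, but that typically costs less than a whole separate process.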
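To check the memory claim on your own machine, you could log the figures reported by `process.memoryUsage()` in each version. This is a rough sketch rather than a rigorous measurement; `formatMB` and `reportMemory` are hypothetical helper names.

```javascript
// Hypothetical helpers for a rough memory report; not a rigorous benchmark.
function formatMB(bytes) {
    return `${(bytes / 1024 / 1024).toFixed(1)} MB`;
}

function reportMemory(label) {
    // rss: resident set size of the whole process.
    // heapUsed: V8 heap in use by the current thread.
    const { rss, heapUsed } = process.memoryUsage();
    return `${label}: rss=${formatMB(rss)}, heapUsed=${formatMB(heapUsed)}`;
}

console.log(reportMemory(`pid ${process.pid}`));
```

With cluster, each process reports its own `rss`, so the values would need to be added up across processes. With worker_threads, `rss` describes the entire process regardless of which thread reads it, while the other fields refer only to the current thread.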

Benchmark

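In my tests, the two versions performed almost identically, handling about the same number of requests per second. I ran the benchmark on my laptop with the following command: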

autocannon -c 1000 -d 10 http://localhost:8000

The results are similar:

With cluster, the average throughput was 69819 req/sec, with an average latency of 14 ms.

With worker_threads, the average throughput was 69323 req/sec, also with an average latency of 14 ms.
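To put the two throughput figures in perspective, the relative difference can be computed directly. The sketch below uses only the numbers reported above; `relativeDiffPercent` is a hypothetical helper name.

```javascript
// Hypothetical helper: relative difference between two measurements,
// expressed as a percentage of the larger one.
function relativeDiffPercent(a, b) {
    return (Math.abs(a - b) / Math.max(a, b)) * 100;
}

const clusterRps = 69819; // average req/sec reported for cluster
const threadsRps = 69323; // average req/sec reported for worker_threads

console.log(`${relativeDiffPercent(clusterRps, threadsRps).toFixed(2)}%`);
// 0.71%
```

A gap of well under one percent is small enough that it could plausibly come from run-to-run variation on a laptop rather than from a real difference between the two approaches.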

Conclusion

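Yes, we can use worker threads to get cluster-like behavior in Node.js. Doing so saves a lot of system resources, which in turn means we can reduce our server budget while still getting the same performance as the traditional cluster approach.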
