Preface
First of all, don’t get me wrong: JavaScript is pretty fast on its own; with the right strategy and algorithm, a JS program can even outperform Rust in some cases. How did I get this impression? From the process of writing the program described in this article.
I was trying to implement a search function that uses Levenshtein distance to search posts among a massive number of candidates. The process is simple: given a new post with a title and content, search the existing records (about 12k entries) for similar ones, in order to check whether a similar post already exists. This is a typical CPU-intensive task.
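For context, the Levenshtein distance between two strings is the minimum number of single-character insertions, deletions and substitutions needed to turn one into the other. A minimal TypeScript sketch of the classic dynamic-programming version (my own illustrative helper, not the @std/text implementation):

```typescript
// Classic two-row dynamic-programming Levenshtein distance.
// O(a.length * b.length) time, O(b.length) space.
function levenshtein(a: string, b: string): number {
  const m = a.length
  const n = b.length
  let prev = Array.from({ length: n + 1 }, (_, j) => j)
  for (let i = 1; i <= m; i++) {
    const curr = [i, ...new Array(n).fill(0)]
    for (let j = 1; j <= n; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution
      )
    }
    prev = curr
  }
  return prev[n]
}

console.log(levenshtein("kitten", "sitting")) // 3
```

Computing this for every candidate against a long post body is what makes the task CPU-bound.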
I know Rust is fast, and is perfect for this job. But how fast? And how many times faster than the JS version?
You may be surprised, as I was, that most Rust implementations of the Levenshtein distance function are slower than the JS one from @std/text, at least when comparing large strings. For example, the strsim crate, which has 61m recent downloads, is 4 times slower than the JS version: the JS version took about 4.5 seconds to find the match, while the Rust (strsim) version took 17 seconds. That’s absurd.
I used the keyword levenshtein to search packages on crates.io; most packages were even worse than strsim, ranging from 4 to 9 times slower than the JS version. So now you get the idea: JS is pretty fast on its own, given a proper algorithm.
Finally, I was able to find a crate that matches the JS implementation, called rapidfuzz, which has only 41k recent downloads, so sad… However, the Python version of RapidFuzz is very popular (according to Google AI), so the Rust version should be just as good. And it is.
Initialize the project
As of today, integrating Rust code into Node.js/Deno projects is pretty easy with napi-rs, way easier than using N-API with C++. Napi-rs literally packages everything for us, from compiling the Rust code to generating the JavaScript and TypeScript files; all we have to do is write some Rust functions, and the rest is taken care of.
To initialize a napi-rs package, we first need to install its CLI tool @napi-rs/cli, along with yarn, which the CLI tool uses internally to install JS dependencies. We can remove yarn afterwards if we don’t want to use it, which is what I did; I use Deno instead.
npm i -g @napi-rs/cli yarn # install the CLI tools
napi new # initialize a new Node.js addon package
The CLI program will prompt us to name the new package and choose which targets/platforms we need to support (since we are writing a native package). The default targets are x86_64-apple-darwin, x86_64-pc-windows-msvc and x86_64-unknown-linux-gnu. Since I’m using an M1 MacBook, I also selected aarch64-apple-darwin.
The CLI tool then generates all the necessary files and folders for us, along with some JS dependencies such as ava for testing. Again, we can remove them if we don’t want to use them, which I did; I use Deno instead.
Create the Rust search function
The major work lives in the src/lib.rs file. Since this program isn’t complicated, I will place all the code there. The CLI tool already generated an example sum function in this file; as it isn’t used, I deleted it and kept only three lines of the generated code.
// src/lib.rs
#![deny(clippy::all)]
#[macro_use]
extern crate napi_derive;
Next, I defined some data structures for the search function.
#[napi(object)]
#[derive(Debug, Clone)]
pub struct PostData {
pub title: String,
pub content: String,
}
#[napi(object)]
#[derive(Debug, Clone)]
pub struct Match {
pub target: PostData,
pub score: f64,
}
#[napi(object)]
#[derive(Debug, Clone)]
pub struct FindTopNResult {
pub matches: Vec<Match>,
pub process_time: i64, // how much time (in milliseconds) was spent processing
}
Then a function to compute the weight of each field in the PostData:
use napi::{Error, Result};
// ...
fn get_weights(source: &PostData) -> Result<(f64, f64)> {
let title_chars = source.title.chars().count();
let content_chars = source.content.chars().count();
let total_chars = title_chars + content_chars;
if total_chars == 0 {
Err(Error::from_reason("source is invalid"))
} else {
Ok((
title_chars as f64 / total_chars as f64,
content_chars as f64 / total_chars as f64,
))
}
}
Finally, the search function:
use napi::{Error, Result};
use rapidfuzz::distance::levenshtein::normalized_similarity;
use std::time::Instant;
// ...
#[napi]
pub fn find_similar_posts_native(
source: PostData,
candidates: Vec<PostData>,
top_n: u32,
) -> Result<FindTopNResult> {
let start = Instant::now();
let (title_weight, content_weight) = get_weights(&source)?;
let mut matches = vec![];
for candidate in candidates.iter() {
let title_score =
normalized_similarity(source.title.chars(), candidate.title.chars()) * title_weight;
let content_score =
normalized_similarity(source.content.chars(), candidate.content.chars())
* content_weight;
let score = title_score + content_score;
// 0.5 is the threshold to consider a match
if score > 0.5 {
matches.push(Match {
target: candidate.clone(),
score,
});
}
}
matches.sort_by(|a, b| {
let order = b.score.partial_cmp(&a.score);
order.unwrap_or(std::cmp::Ordering::Equal)
});
let top_n_matches = if matches.len() <= top_n as usize {
matches
} else {
matches.into_iter().take(top_n as usize).collect()
};
let duration = start.elapsed();
Ok(FindTopNResult {
matches: top_n_matches,
process_time: duration.as_millis() as i64,
})
}
Then run deno task build to compile the Rust code into a Node.js addon, which generates an index.js and an index.d.ts file, along with a platform-specific .node file. Napi-rs still emits CommonJS files, so I need to modify the package.json file and add `"type": "commonjs"`, so that Deno won’t treat the JS files as ES modules.
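The change (a minimal package.json excerpt; the other fields generated by the CLI are omitted here) looks like:

```json
{
  "type": "commonjs"
}
```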
Create the JS counterpart
The Rust function has a suffix _native
because I’ve also created a TS version of it, in order to compare how much faster the Rust version is compared to the TS version.
// ts/mod.ts
import { levenshteinDistance } from "@std/text"
import { chars } from "@ayonli/jsext/string"
// @deno-types="../index.d.ts"
import { Match, PostData, FindTopNResult } from "../index.js"
// @std/text doesn't provide a normalized function, we implement our own
function normalizedSimilarity(a: string, b: string): number {
if (a === b) {
return 1
}
const aChars = chars(a).length
const bChars = chars(b).length
return 1 - levenshteinDistance(a, b) / Math.max(aChars, bChars)
}
function getWeights(source: PostData): [number, number] {
const titleChars = chars(source.title).length
const contentChars = chars(source.content).length
const totalChars = titleChars + contentChars
if (totalChars === 0) {
throw new Error("Source is invalid")
}
return [
titleChars > 0 ? titleChars / totalChars : 0,
contentChars > 0 ? contentChars / totalChars : 0,
]
}
export function findSimilarPosts(source: PostData, candidates: PostData[], topN: number): FindTopNResult {
const start = Date.now()
const [titleWeight, contentWeight] = getWeights(source)
const matches: Match[] = []
for (const candidate of candidates) {
const titleScore = normalizedSimilarity(source.title, candidate.title) * titleWeight
const contentScore = normalizedSimilarity(source.content, candidate.content) * contentWeight
const score = titleScore + contentScore
// 0.5 is the threshold to consider a match
if (score > 0.5) {
matches.push({
target: candidate,
score,
})
}
}
matches.sort((a, b) => b.score - a.score)
return {
matches: matches.length <= topN ? matches : matches.slice(0, topN),
processTime: Date.now() - start,
}
}
Execute and compare
Now that I have two versions of the search function, it’s time to execute them and compare the performance.
// main.mts (we have to use .mts since this is a CommonJS package)
import { deepStrictEqual, strictEqual } from "node:assert"
import { parse } from "@std/csv"
import { findSimilarPosts } from "./ts/mod.ts"
// @ts-types="./index.d.ts"
import {
findSimilarPostsNative,
Match,
PostData,
} from "./index.js"
const filename = Deno.cwd() + "/assets/blogs.csv" // the archive of blog posts
const contents = await Deno.readTextFile(filename)
const records = parse(contents, { skipFirstRow: true }) as { title: string, text: string }[]
const posts: PostData[] = records.map(r => ({ title: r.title, content: r.text }))
console.log("Existing posts:", posts.length)
const newPost: PostData = {
title: "The economic theories of MLA",
// The archived version contains some non-Unicode characters and this one doesn't, but don't
// worry, the search function will still be able to match it.
content: `
A resolution calling for full-and-part-time faculty members to be eligible for tenure and expressing
the view that all higher education employees should have appropriate forms of job security, due
process, a living wage and access to health care benefits passed in a 81-15 vote, but not without
concerns from delegates that the wording went too far or not far enough.
Ian Barnard, an associate professor of English at California State University-Northridge, said he
wanted to see the resolution extended to include a call for all faculty to be eligible not only for
tenure but also for full-time employment. Simply voicing support for a lecturer to continue to be
guaranteed one course per semester was, he said, really weak a way for us to cop out, for
departments to avoid paying for health benefits and for adjunct faculty to continue bouncing around
among many jobs just to make ends meet.
The full story is here. Why don't journalists demand something similar? You can pinch yourself, but
it really is 2010. By the way, here are some facts:
In 1960, 75 percent of college instructors were full-time tenured or tenure-track professors; today
only 27 percent are.
`
}
interface FunctionResult {
fn_name: string;
call_time_ms: number;
process_time_ms: number;
data_passing_ms: number;
}
const results: FunctionResult[] = []
let match: Match | undefined
{
const start = Date.now()
const { matches, processTime } = findSimilarPosts(newPost, posts, 3)
const callTime = Date.now() - start
results.push({
fn_name: "findSimilarPosts",
call_time_ms: callTime,
process_time_ms: processTime,
data_passing_ms: callTime - processTime,
})
strictEqual(matches.length, 1)
strictEqual(matches[0].target.title, "The economic theories of the MLA")
strictEqual(matches[0].score > 0.95, true)
match = matches[0]
}
{
const start = Date.now()
const { matches, processTime } = findSimilarPostsNative(newPost, posts, 3)
const callTime = Date.now() - start
results.push({
fn_name: "findSimilarPostsNative",
call_time_ms: callTime,
process_time_ms: processTime,
data_passing_ms: callTime - processTime,
})
strictEqual(matches.length, 1)
deepStrictEqual(matches[0], match)
}
console.table(results)
After running the program (with deno run -A main.mts), I got these results:
| fn_name                | call_time_ms | process_time_ms | data_passing_ms |
| ---------------------- | ------------ | --------------- | --------------- |
| findSimilarPosts       | 4584         | 4584            | 0               |
| findSimilarPostsNative | 1196         | 1149            | 47              |
Pretty good: the Rust version is about 4 times faster than the JS version thanks to its native performance, although there is some overhead in passing data from the JS side to Rust.
Parallel processing in Rust
But this is a CPU-intensive task; it would be pointless to use Rust if we didn’t use its parallel processing ability. So I’ve also created another search function in Rust that utilizes parallel processing with multiple threads. Parallel processing in Rust is very easy: all I have to do is use the rayon crate and call the par_iter function.
use rayon::iter::{IntoParallelRefIterator, ParallelIterator};
// ...
#[napi]
pub fn find_similar_posts_native_parallel(
source: PostData,
candidates: Vec<PostData>,
top_n: u32,
) -> Result<FindTopNResult> {
let start = Instant::now();
let (title_weight, content_weight) = get_weights(&source)?;
let matches: Vec<Match> = candidates
.par_iter() // Parallel is as simple as just calling a function
.filter_map(|candidate| {
let title_score =
normalized_similarity(source.title.chars(), candidate.title.chars()) * title_weight;
let content_score =
normalized_similarity(source.content.chars(), candidate.content.chars())
* content_weight;
let score = title_score + content_score;
// 0.5 is the threshold to consider a match
if score > 0.5 {
Some(Match {
target: candidate.clone(),
score,
})
} else {
None
}
})
.collect();
let mut matches = matches;
matches.sort_by(|a, b| {
let order = b.score.partial_cmp(&a.score);
order.unwrap_or(std::cmp::Ordering::Equal)
});
let top_n_matches = if matches.len() <= top_n as usize {
matches
} else {
matches.into_iter().take(top_n as usize).collect()
};
let duration = start.elapsed();
Ok(FindTopNResult {
matches: top_n_matches,
process_time: duration.as_millis() as i64,
})
}
Parallel processing in JS
Some people may be familiar with Web Workers (or worker_threads), which give JS the ability to run another JS program in a separate thread. We can use them to process multiple tasks in parallel as well.
In order to compare with the Rust version, I’ve also created a TS version of the parallel search function:
// ts/parallel.ts
import { chunk } from "@ayonli/jsext/array"
import { avg } from "@ayonli/jsext/math"
import parallel from "@ayonli/jsext/parallel"
// @deno-types="../index.d.ts"
import { FindTopNResult, PostData } from "../index.js"
const { findSimilarPosts } = parallel(() => import("./mod.ts"))
export async function findSimilarPostsParallel(
source: PostData,
candidates: PostData[],
topN: number
): Promise<FindTopNResult> {
const chunks = chunk(candidates, Math.ceil(candidates.length / navigator.hardwareConcurrency))
const processTimes: number[] = []
let results = (await Promise.all(chunks.map(chunk => findSimilarPosts(source, chunk, topN))))
.flatMap(e => {
processTimes.push(e.processTime)
return e.matches
})
.toSorted((a, b) => b.score - a.score)
if (results.length > topN) {
results = results.slice(0, topN)
}
return {
matches: results,
processTime: Math.round(avg(...processTimes)),
}
}
This function calls the original findSimilarPosts under the hood, which may introduce a little duplicated work, but it’s so tiny that we can ignore it.
After rebuilding the Rust code, I added another two tests for the new functions in main.mts:
import { findSimilarPostsParallel } from "./ts/parallel.ts"
// @ts-types="./index.d.ts"
import {
findSimilarPostsNativeParallel,
// ...
} from "./index.js"
// ...
{
const start = Date.now()
const { matches, processTime } = await findSimilarPostsParallel(newPost, posts, 3)
const callTime = Date.now() - start
results.push({
fn_name: "findSimilarPostsParallel",
call_time_ms: callTime,
process_time_ms: processTime,
data_passing_ms: callTime - processTime,
})
strictEqual(matches.length, 1)
deepStrictEqual(matches[0], match)
}
{
const start = Date.now()
const { matches, processTime } = findSimilarPostsNativeParallel(newPost, posts, 3)
const callTime = Date.now() - start
results.push({
fn_name: "findSimilarPostsNativeParallel",
call_time_ms: callTime,
process_time_ms: processTime,
data_passing_ms: callTime - processTime,
})
strictEqual(matches.length, 1)
deepStrictEqual(matches[0], match)
}
console.table(results)
After rerunning the program, the new results are printed:
| fn_name                        | call_time_ms | process_time_ms | data_passing_ms |
| ------------------------------ | ------------ | --------------- | --------------- |
| findSimilarPosts               | 4669         | 4669            | 0               |
| findSimilarPostsNative         | 1188         | 1141            | 47              |
| findSimilarPostsParallel       | 1214         | 1009            | 205             |
| findSimilarPostsNativeParallel | 277          | 229             | 48              |
This is exciting: the search time dropped from about 4.6 seconds to just 277 milliseconds; native parallelism in Rust makes the program about 17 times faster. And with Web Workers, we’re also able to make processing about 4 times faster than the single-threaded JS version, reaching roughly the same speed as the single-threaded Rust version.
However, it’s important to point out that Web Workers (or worker_threads) are not like regular native threads: they are isolated V8 instances, each acting like another Node.js instance and consuming a lot of memory, and data passing between the main thread and worker threads is very slow due to the need to deep-clone and serialize/deserialize the data.
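The copying cost is easy to observe even without spinning up a worker: structuredClone uses the same structured serialization algorithm as postMessage, so everything crossing a thread boundary arrives as a deep copy. A small illustrative sketch (the timing is machine-dependent):

```typescript
// structuredClone performs the same structured serialization that
// postMessage uses when sending data to a worker thread.
const posts = Array.from({ length: 1000 }, (_, i) => ({
  title: `Post ${i}`,
  content: "lorem ipsum ".repeat(100),
}))

const start = Date.now()
const copy = structuredClone(posts)
console.log(`cloned ${posts.length} posts in ${Date.now() - start} ms`)

// The clone holds equal data but entirely different objects:
console.log(copy[0].title === posts[0].title) // true
console.log(copy[0] === posts[0])             // false
```

Every worker call pays this serialization cost twice: once for the request and once for the result.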
AsyncTask
Although the Rust parallel search function is pretty fast, it has a flaw: it blocks the main thread. Even though we’re using multithreading, while waiting for the threads to complete, the main thread is stuck and no other code can run. This is not idiomatic JS; we need to fix it.
Luckily, Napi-rs provides an AsyncTask
API, which allows us to run blocking code in the libuv thread pool, and frees us from blocking the main thread.
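To see the difference, here is a toy sketch (not the napi-rs API; a busy loop stands in for a blocking synchronous native call, and setTimeout for an AsyncTask-backed one) showing that timers starve while the synchronous call runs:

```typescript
// Stand-in for a blocking synchronous native call: spins the main thread.
function blockingWork(ms: number): void {
  const end = Date.now() + ms
  while (Date.now() < end) { /* busy wait */ }
}

// Stand-in for an AsyncTask-backed call: the work happens off the main thread.
function nonBlockingWork(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

async function main() {
  let ticks = 0
  const timer = setInterval(() => ticks++, 10)

  blockingWork(100)
  const ticksAfterBlocking = ticks // still 0: the interval never got to run

  await nonBlockingWork(100)
  const ticksAfterAsync = ticks // > 0: the event loop stayed responsive

  clearInterval(timer)
  console.log({ ticksAfterBlocking, ticksAfterAsync })
}

main()
```

The AsyncTask version behaves like nonBlockingWork here: the main thread is free while the computation runs on the libuv thread pool.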
use napi::{bindgen_prelude::AsyncTask, Env, Error, Result, Task};
// ...
pub struct AsyncFindSimilarPosts {
source: PostData,
candidates: Vec<PostData>,
top_n: u32,
}
impl Task for AsyncFindSimilarPosts {
type Output = FindTopNResult;
type JsValue = FindTopNResult;
fn compute(&mut self) -> Result<Self::Output> {
find_similar_posts_native_parallel(self.source.clone(), self.candidates.clone(), self.top_n)
}
fn resolve(&mut self, _env: Env, output: Self::Output) -> Result<Self::JsValue> {
Ok(output)
}
}
#[napi(ts_return_type = "Promise<FindTopNResult>")]
pub fn find_similar_posts_native_async(
source: PostData,
candidates: Vec<PostData>,
top_n: u32,
) -> AsyncTask<AsyncFindSimilarPosts> {
AsyncTask::new(AsyncFindSimilarPosts {
source,
candidates,
top_n,
})
}
Then update main.mts, adding one more test for the new function:
// @ts-types="./index.d.ts"
import {
findSimilarPostsNativeAsync,
// ...
} from "./index.js"
// ...
{
const start = Date.now()
const { matches, processTime } = await findSimilarPostsNativeAsync(newPost, posts, 3)
const callTime = Date.now() - start
results.push({
fn_name: "findSimilarPostsNativeAsync",
call_time_ms: callTime,
process_time_ms: processTime,
data_passing_ms: callTime - processTime,
})
strictEqual(matches.length, 1)
deepStrictEqual(matches[0], match)
}
console.table(results)
Rerun the program, and the new results are printed:
| fn_name                        | call_time_ms | process_time_ms | data_passing_ms |
| ------------------------------ | ------------ | --------------- | --------------- |
| findSimilarPosts               | 4597         | 4597            | 0               |
| findSimilarPostsNative         | 1191         | 1144            | 47              |
| findSimilarPostsParallel       | 1171         | 1008            | 163             |
| findSimilarPostsNativeParallel | 271          | 223             | 48              |
| findSimilarPostsNativeAsync    | 276          | 223             | 53              |
This is very good: we now have a search function that is native, parallel, asynchronous, and 17 times faster than the original JS version.
Conclusion
Node.js is still a great choice (and Deno even more so) for IO-intensive tasks thanks to its asynchronous design at heart, and the JS/TS language is very flexible, especially for prototyping and fast-paced development. However, when it comes to CPU-intensive tasks, JavaScript’s options are very limited.
Although we can use Web Workers (or worker_threads) for parallel processing, they are insufficient: workers cannot easily share memory, and passing messages between threads is very inefficient, making it almost impossible to use Web Workers for serious CPU-intensive tasks.
Rust is a modern, safe, and expressive programming language with an intelligent type system, and from a JS/TS perspective we can find many familiar designs in it, such as property shorthands, destructuring and code blocks. JS/TS developers will definitely appreciate the similarities between the two languages.
That said, Rust is still a very hard language to master, which makes it impractical to rewrite everything in Rust or even start a new project fully in Rust. However, with napi-rs, we can integrate Rust into our Node.js and Deno workflows, striking a good balance between the flexibility of JavaScript and the performance of Rust.
The full example of the program can be found here.