kuma

front-end developer / a lying-flat 🐻

Tinkered with WAMR

Preface#

Recently I've been planning to write a programming language (a flag I set at the end of the year), and I needed to pick a compilation backend. My original plan was to compile to C and then use clang to produce the binary. The advantage is that the C toolchain is already very mature: being built on LLVM, it compiles easily to all kinds of targets and guarantees execution speed (off-screen voice: why not just compile straight to LLVM IR = =). Then I thought it over again (rebellious 🐻) and looked at Cranelift, a backend implemented in Rust. It seemed like a good choice: Cranelift focuses on JIT compilation (it can also emit binaries directly through its object module), so its code generation is much faster than LLVM's (of course, it has far fewer optimization passes) while still delivering very good performance. But since Cranelift provides no bindings for other languages, I studied Rust for a while and then gave up (the nth time I've failed to learn Rust, I'm hopeless 😭). Finally, in a flash of inspiration, I thought of using WebAssembly as the compilation backend. So I started researching wasm runtimes and eventually settled on WAMR.

Why Choose WAMR#

  • Extremely fast execution (very close to native performance)
  • Ships with an interpreter mode (which satisfies the fast-startup requirement of dev mode)
  • Supports WASI libc
  • Very small binary size

Start tinkering#

At first, I only planned to give WAMR a quick try, so I set embedding aside for now and directly used the two officially provided CLI tools for testing. Since the project only provides x86_64 prebuilt binaries for macOS, and I happen to use an ARM chip, I had to compile them manually. First, download the WAMR source code to your local machine. We need two CLI tools: iwasm and wamrc. Let's start by compiling iwasm.
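Fetching the source is just a clone of the upstream Bytecode Alliance repository (the folder paths mentioned below follow that repo's layout); roughly:

```shell
# Clone the WAMR source tree from the upstream repository
git clone https://github.com/bytecodealliance/wasm-micro-runtime.git
cd wasm-micro-runtime
```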

Compile iwasm#

The build files for iwasm live in the product-mini/platforms/darwin folder, where you can find a CMakeLists.txt file. If you are interested, open it to see the options that can be set at compile time. Based on my use case, I created a make.sh file in product-mini/platforms/darwin to drive the build. Here is its content.

#!/bin/sh

mkdir build && cd build
# Pass compilation options
cmake .. -DWAMR_BUILD_TARGET=AARCH64 -DWAMR_BUILD_JIT=0
# Compile
make
cd .. 

In this shell script the key part is the cmake invocation, where we pass in two compile options. Let's interpret what they mean.

  • WAMR_BUILD_TARGET=AARCH64 — compile for the ARM64 (AArch64) instruction set.
  • WAMR_BUILD_JIT=0 — do not build the JIT. (I had actually hoped to use the JIT in dev mode so that dev-mode speed wouldn't lag too far behind the final build mode. WAMR currently has two JIT modes: Fast JIT and LLVM JIT. The LLVM JIT is too large, so I never planned to build it; after all, it would only be used in dev mode, and there's no need. Fast JIT, on the other hand, is lightweight and adds only a very small amount of binary size; according to the official docs its performance can reach about 50% of the LLVM JIT, which is plenty for dev mode. Unfortunately, it failed to compile on my machine. I'll try again later.)
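For reference, the Fast JIT variant I attempted looked roughly like this (WAMR_BUILD_FAST_JIT is the option name in WAMR's build documentation; this is the configuration that failed to build for me, so treat it as a sketch rather than a known-good recipe):

```shell
# Sketch: same build script, but with Fast JIT enabled instead of disabled
mkdir build && cd build
cmake .. -DWAMR_BUILD_TARGET=AARCH64 -DWAMR_BUILD_FAST_JIT=1
make
```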
After it runs, you will find the iwasm executable in the build folder. On AArch64, the pure-interpreter binary is only 426 KB, which is very lightweight. Next, let's generate a WebAssembly file and try it out. Here, I chose to compile to the wasm32-wasi target using Rust. First, add the wasm32-wasi target via rustup.

rustup target add wasm32-wasi

Then, let's create a new Rust project using cargo.

cargo new --bin hello_wasm

Next, let's write a program to calculate the Fibonacci sequence.

use std::io;


fn fib_recursive(n: usize) -> usize {
    match n {
        0 | 1 => 1,
        _ => fib_recursive(n - 2) + fib_recursive(n - 1),
    }
}

fn main() {
    println!("Please enter a number to calculate the Fibonacci sequence:");

    let mut input = String::new();
    io::stdin().read_line(&mut input).expect("Failed to read input");

    let n: usize = input.trim().parse().expect("Please enter a valid number");

    // Calculate the Fibonacci sequence and measure the time
    let start_time = std::time::Instant::now();
    let result = fib_recursive(n);
    let elapsed_time = start_time.elapsed();

    println!("The value of the {}th item in the Fibonacci sequence is: {}", n, result);
    println!("Calculation time: {:?}", elapsed_time);
}

Compile to the wasm32-wasi target:

cargo build --target wasm32-wasi --release

After compilation, you can find the compiled hello_wasm.wasm file in the target/wasm32-wasi/release directory. Let's use the iwasm we just built to execute this wasm file.

iwasm --interp hello_wasm.wasm
You can see that the program executes successfully. On my Mac mini (M1 chip), fib(40) took about 3.7 seconds, while the native Rust program took 337 milliseconds, so interpreted WebAssembly runs at roughly 1/10 the speed of native code. (That is already a good result, because WAMR ships a fast-interpreter implementation that translates WebAssembly's stack-machine instructions into an internal IR before executing them.)

Compile wamrc#

Next, let's work on performance: compile wamrc, use it to convert the wasm file into an AOT file, and then execute that with the iwasm we built earlier for much faster execution. Because wamrc relies on LLVM for compilation and optimization, we need to build LLVM first. On macOS, start by installing the dependencies LLVM's build needs (skip this step if you already have cmake and ninja installed).

brew install cmake && brew install ninja

Execute build_llvm.sh in the wamr-compiler directory.

./build_llvm.sh

Then we hit an error = =. Let's follow the prompt and fix it. It seems the LLVM version I downloaded doesn't support the LLVM_CCACHE_BUILD option, so I modified the compile options in the build-scripts/build_llvm.py file to disable ccache.

LLVM_COMPILE_OPTIONS.append("-DLLVM_CCACHE_BUILD:BOOL=OFF")

After that change, build LLVM again, and then compile wamrc itself. This part isn't much different from building iwasm; just follow the compilation steps in the official README.

mkdir build && cd build
cmake .. -DWAMR_BUILD_PLATFORM=darwin
make

After it finishes, the wamrc executable appears in the build directory. Let's use wamrc to compile the wasm file into an AOT file.

./wamrc --size-level=3 -o hello_wasm.aot hello_wasm.wasm

Here, because I am using an ARM64 chip, I had to add the --size-level=3 option, otherwise compilation fails (it's related to the generated file's size).
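As an aside: if I remember wamrc's help output correctly, it also accepts a --target option for cross-compiling AOT files for a different architecture. The flag below is from memory, so verify it against ./wamrc --help before relying on it:

```shell
# Sketch (flag name from memory; check ./wamrc --help):
# cross-compile the same module into an AOT file for an x86_64 host
./wamrc --target=x86_64 --size-level=3 -o hello_wasm_x64.aot hello_wasm.wasm
```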

Execute wasm with aot#

Let's use the aot artifact compiled above to execute it with iwasm.

./iwasm hello_wasm.aot

Let's try fib(40) again. This time it took only 337 milliseconds on my machine, the same as the native Rust program. One simple example like this can't fully capture the performance gap between AOT-compiled code and native Rust, but it does show that, after LLVM's optimizations, WebAssembly can reach near-native execution speed.

Anecdote#

Node.js uses V8, which is also a highly optimized JIT compiler. Unlike AOT, though, V8 first compiles JavaScript to bytecode and interprets it, and only JIT-compiles hot functions. Also, because JavaScript is dynamically typed, it is harder to optimize than a statically typed target like WebAssembly. So how long does the pinnacle of dynamic languages take on the fib function above? On my machine, fib(40) took about 959 milliseconds, roughly a third of native Rust speed. This shows that V8 really is powerful.
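For completeness, the Node.js version I ran was essentially a direct port of the Rust program (this port is my own reconstruction, not code from the original benchmark):

```javascript
// Direct port of the Rust fib_recursive above; fib(0) = fib(1) = 1.
function fibRecursive(n) {
  return n <= 1 ? 1 : fibRecursive(n - 2) + fibRecursive(n - 1);
}

const n = 40;
console.time("fib");                  // wall-clock timing, like Instant::now() in Rust
const result = fibRecursive(n);
console.timeEnd("fib");               // prints the elapsed time (machine-dependent)
console.log(`fib(${n}) = ${result}`); // fib(40) = 165580141 with this indexing
```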
