NanoOs

31-Jan-2025 - WASM VM

Let’s first talk about the goal and why I even have it. The Arduino Nano Every - as well as some other models of Arduino and other systems - is a Harvard architecture. That means that it’s impossible to load executable code into RAM and run it. Translation: There is absolutely no way for me to load code compiled for the Nano Every’s processor (an ATMEGA4809) and run it natively. If I’m going to run arbitrary commands that are read from the filesystem, my only option is to have something that can translate something in the command file to something that executes. So, we’re talking about a VM here.

One option would be to make a VM for the ATMEGA and compile binaries to that. But, why? There’s nothing inherently great about that for what I’m doing. These SD cards are enormous. Once we’re talking about accessing data on the card, the binaries can be basically any size and we don’t care. That’s why getting to the point of having a usable filesystem was so important.

A friend of mine suggested that I look into being able to run Web Assembly (WASM). That has the potential to be a universal assembly language. If I make a VM for that, anything that is compiled to WASM becomes something that my OS can run. So, that’s what I set out to do.

First thing’s first: Delete all the existing command infrastructure. There were two reasons for this. First: I needed the space. By the time I got done with the SD card and FAT filesystem infrastructure, I only had a little less than 800 bytes of program flash left. Deleting the existing infrastructure gave me about 8.5 KB to work with. Second: It will no longer be relevant. All of the processes will have to run through the VM and the existing commands just flat out won’t even work anymore. OK, get rid of it.

Space recovered, I set about trying to make something that can run WASM-compiled code. One issue with this is that there are multiple WASM formats. The most popular one right now is called “Emscripten”. Emscripten aims to provide a POSIX-compliant environment that will allow most existing C code to be compiled to WASM with little or no modification. That would be ideal, however that puts a pretty high burden on the VM developer for providing the necessary environmental functionality. A competing format is called “WASI” which aims to be the official OS support definition for WASM. It has simpler environmental requirements than Emscripten does.

Given the limited resources I have to work with, deciding which format to go with was not a super hard decision. I need to eliminate all the complexity I can in order to save flash space. I didn’t know going into this how much flash was going to be consumed by either format, but I didn’t want to get part way through implementing the larger one and then find out that only the smaller one would fit. WASI seemed like the safest bet, so that’s what I went with.

Now, irrespective of which format is used, there are several things that any WASM VM requires being setup before a program is even started. In addition to the memory for the code itself, there are five (5) memory segments required by a WASM program:

Linear Memory
The Global Stack
The Call Stack
Global Storage
Table Space

Each of these things would take up way more RAM than I have available. So, the first thing I had to do was develop a virtual memory system. Mind you, this is not virtual memory in the traditional sense of the term. This is just a way for the VM to access data. From the standpoint of the VM process, it’s virtual memory because the system will fetch whatever data is referenced by the addresses the process references, but there’s no page fault or TLB or anything else going on here. The system looks up data in a file according a provided address and returns whatever is there if it’s a read or sets the data if it’s a write.

Just having five pieces of virtual memory wasn’t enough, though. They had to be initialized as well. First, the linear memory had to be allocated. That was pretty straightforward: Just write 64 KB of zeros to the virtual memory and you’re done. The stacks were also pretty straightforward: Just write the value of what would be the stack register to the first 32-bits of the stack and you’re done.

There were two areas, however, that required pretty extensive logic: Function imports and function exports. WASM doesn’t have interrupts. The way that WASM interacts with the host OS is by listing the functions it will use and then context switching to them via a CALL instruction. The instructions reference an index into a table of imported functions that’s listed near the beginning of the binary. When the VM starts, it scans the binary for the host functions that the program needs. It then looks up the inforamtion for those functions within host space and stores its own metadata in the Table Space virtual memory segment.

Function exports serve the inverse purpose: They allow an external entity to call code in the WASM binary. This is mostly for libraries, however there’s obviously one export that’s critical for an application: The function that serves as the main entry point to the application. So, the WASM VM has to scan the exports to find the _start export and identify where to start the application.

Once I had all the logic in place to do all that virtual memory management, setup, scanning, and configuration, I was within about 1.5 KB of the end of the available program flash. Not good. I figured at that point that my odds of being able to complete the rest of the infrastructure to run a WASI application and have it be within the storage limit were pretty low in the best case.

Well, only one way to find out. I had already put in enums for the possible WASM instructions and implementing the logic to handle them was the last thing I needed to do. Nothing to do but start implementing the logic and see what happened to the binary size. There are close to 200 single-byte opcodes defined in WASM. That’s a heck of a lot to try and support.

I broke the instruction set down into sections and started with the group of control instructions. There are 13 instructions in this group. I put in an implementation for this and compiled it. Bad news: I was 1,900 bytes over the limit. Just implementing support for the first 13 of almost 200 instructions took up about 3.5 KB. No way was I going to be able to support any kind of WASM.

So, here lays in rest the dream of building the first universal operating system… at least on an Arduino. I’d have to have at least the resources of a Raspberry Pi to pull this off. That would be an entirely different beast. On the Arduino, I can get away with using the Arduino libraries for basic communication. On a Pi, I’d have to learn ARM assembly to do that or maybe find a bootloader that could set up I/O before loading my OS. Either way, it would be a very different effort. Maybe I’ll look into that one day, but today isn’t that day.

I’m not completely giving up on the idea of being able to run commands from the filesystem, though. I’ve started looking into simpler architectures that might be better candidates. Right now, I’m looking at RSIC-V’s RV32I architecture. (Side note: That’s a captial letter ‘I’, not a number ‘1’. I screwed that up initially.) It only has 47 instructions, one of which is an interrupt. That would probably be more doable. Maintaining an interrupt table would probably be much simpler and space efficient than trying to jump through all the hoops requried to manage WASM imports.

The work on a WASI VM has been archived under the wasi-vm branch. It was interesting and valuable work and I definitely learned a lot, so I don’t want to lose it. If I decide to take it further on a bigger system, I’ll come back to it. For now, I’ll have to consider a different route. To be continued…

Table of Contents

This site is open source. Improve this page.