Machine Learning One

Bridging Two Worlds: Linear Memory and the Wire Protocol

Crafting a binary wire protocol for passing rich data between Rust and WASM.

We have an engine that compiles WAT to runnable modules. But running a module that does anything useful — fetching a URL, querying an AI, storing data — requires passing data between the host (Rust) and the guest (WASM). This is harder than it sounds.

WASM programs live in a sealed box. Across the host-guest boundary they can only pass plain numbers (i32, i64, f32, f64) — in our case, mostly i32 pointers and lengths. The host has strings, structs, vectors, and maps. We need to bridge this gap with two things:

  1. A wire protocol that serializes Rust types into bytes that both sides understand
  2. A bump allocator that manages the guest's linear memory

The Wire Protocol

Why Not JSON?

JSON is text-based, which means parsing requires string manipulation — something WASM is terrible at. The guest would need to walk a byte array character by character, handle escape sequences, manage nested structures, and deal with Unicode. That's hundreds of lines of WAT just for parsing.

We need a format where the guest can extract data by reading fixed offsets. Our choice: length-prefixed binary encoding.

The Format

Every value is encoded as:

Primitives:   raw little-endian bytes (no prefix)
  bool:       [u8]           (0 or 1)
  i32:        [4 bytes LE]
  i64:        [8 bytes LE]

Strings:      [u32 LE length][utf-8 bytes]
  "hello" →   [5, 0, 0, 0, 104, 101, 108, 108, 111]

Bytes:        [u32 LE length][raw bytes]

Sequences:    [u32 LE total_byte_length][element][element]...
  vec![1,2,3] → [12, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0]
              (12 bytes of payload: three i32s)

Structs:      [u32 LE total_byte_length][field][field]...
              (fields serialized in declaration order, names ignored)

Options:      [u32 LE total_byte_length][u8 discriminant][value if Some]
  None →      [1, 0, 0, 0, 0]
  Some(1u8) → [2, 0, 0, 0, 1, 1]
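
To build intuition for the struct rule, here's a hand-encoding of a hypothetical two-field struct using only the std library — no serde involved; `encode_point` is an illustrative name, not part of the crate:

```rust
// Hand-encoding `struct Point { x: i32, y: i32 }` per the rules above:
// [u32 LE total_byte_length][x as 4 LE bytes][y as 4 LE bytes]
fn encode_point(x: i32, y: i32) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend(&8u32.to_le_bytes()); // payload = two i32s = 8 bytes
    out.extend(&x.to_le_bytes());
    out.extend(&y.to_le_bytes());
    out
}

fn main() {
    // Point { x: 1, y: 2 }
    assert_eq!(
        encode_point(1, 2),
        vec![8, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0]
    );
}
```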

The key insight: strings and byte sequences start with a 4-byte length prefix. This means the guest can read a string by: (1) reading 4 bytes at the pointer to get the length, (2) reading length bytes starting at pointer + 4. No parsing, no scanning for delimiters.
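In Rust terms, the guest's read pattern boils down to the following std-only sketch (`read_wire_str` is an illustrative name, not the crate's API):

```rust
// Read a length-prefixed string at `ptr`: 4-byte LE length, then UTF-8 bytes.
fn read_wire_str(buf: &[u8], ptr: usize) -> Option<&str> {
    let len_bytes: [u8; 4] = buf.get(ptr..ptr + 4)?.try_into().ok()?;
    let len = u32::from_le_bytes(len_bytes) as usize;
    let data = buf.get(ptr + 4..ptr + 4 + len)?;
    std::str::from_utf8(data).ok()
}

fn main() {
    let wire = [5, 0, 0, 0, 104, 101, 108, 108, 111]; // "hello"
    assert_eq!(read_wire_str(&wire, 0), Some("hello"));
    assert_eq!(read_wire_str(&wire, 6), None); // out of bounds → no panic
}
```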

Building the Protocol Crate

The protocol is a standalone crate that implements serde's Serializer and Deserializer traits. This means any Rust type that derives Serialize and Deserialize automatically works with our wire format.

protocol/
├── Cargo.toml
└── src/
    ├── lib.rs
    ├── ser.rs
    ├── de.rs
    └── error.rs

The Serializer

The core idea is a byte buffer with an offset stack for nested length-prefix encoding:

// protocol/src/ser.rs

pub struct BytesSerializer {
    buffer: RefCell<Vec<u8>>,
    offsets: RefCell<Vec<usize>>,
}

When serializing a length-prefixed type (sequence, struct, option), we:

  1. Push the current buffer position onto the offset stack
  2. Write a placeholder 4-byte zero (the length we don't know yet)
  3. Serialize all the inner data
  4. Pop the offset, calculate current_position - offset - 4, and overwrite the placeholder

fn start_bytelen_encoding(&self) -> Result<&Self> {
    self.offsets.borrow_mut().push(self.buffer.borrow().len());
    self.buffer.borrow_mut().extend(&0u32.to_le_bytes());
    Ok(self)
}

fn end_bytelen_encoding(&self) -> Result<()> {
    let buffer_len = self.buffer.borrow().len();
    let offset = self.offsets.borrow_mut().pop()
        .ok_or(Error::UnresolvedPrefix)?;
    let len = (buffer_len - offset - 4) as u32;
    self.buffer.borrow_mut()[offset..offset + 4]
        .copy_from_slice(&len.to_le_bytes());
    Ok(())
}
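
Because the offsets form a stack, nesting falls out naturally: each `end` patches the most recent placeholder. A standalone trace of `vec![vec![1i32], vec![2, 3]]` (illustrative `Buf` type, no serde):

```rust
// Minimal offset-stack buffer mirroring start/end_bytelen_encoding.
struct Buf { bytes: Vec<u8>, offsets: Vec<usize> }

impl Buf {
    fn start(&mut self) {
        self.offsets.push(self.bytes.len());
        self.bytes.extend(&0u32.to_le_bytes()); // placeholder length
    }
    fn end(&mut self) {
        let off = self.offsets.pop().expect("unbalanced start/end");
        let len = (self.bytes.len() - off - 4) as u32;
        self.bytes[off..off + 4].copy_from_slice(&len.to_le_bytes());
    }
    fn i32(&mut self, v: i32) { self.bytes.extend(&v.to_le_bytes()); }
}

fn main() {
    let mut b = Buf { bytes: vec![], offsets: vec![] };
    b.start();                              // outer vec
    b.start(); b.i32(1); b.end();           // vec![1]    → 8 bytes
    b.start(); b.i32(2); b.i32(3); b.end(); // vec![2, 3] → 12 bytes
    b.end();                                // outer payload = 8 + 12 = 20
    assert_eq!(&b.bytes[0..4], &[20, 0, 0, 0]);
    assert_eq!(b.bytes.len(), 24);
}
```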

Primitives are straightforward — push their little-endian bytes:

fn serialize_i32(self, v: i32) -> Result<()> {
    self.buffer.borrow_mut().extend(&v.to_le_bytes());
    Ok(())
}

fn serialize_str(self, v: &str) -> Result<()> {
    let vb = v.as_bytes();
    self.serialize_u32(vb.len() as u32)?;
    self.buffer.borrow_mut().extend(vb);
    Ok(())
}

Complex types use start_bytelen_encoding/end_bytelen_encoding:

fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq> {
    self.start_bytelen_encoding()
}

fn serialize_struct(self, _name: &'static str, _len: usize) -> Result<Self::SerializeStruct> {
    self.start_bytelen_encoding()
}

fn serialize_none(self) -> Result<()> {
    self.start_bytelen_encoding()?;
    self.serialize_u8(0)?;    // discriminant: None
    self.end_bytelen_encoding()
}

fn serialize_some<T: Serialize>(self, v: &T) -> Result<()> {
    self.start_bytelen_encoding()?;
    self.serialize_u8(1)?;    // discriminant: Some
    v.serialize(self)?;
    self.end_bytelen_encoding()
}
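
Tracing `Some(5i32)` by hand against these rules (std-only sketch; `encode_some_i32` is an illustrative name):

```rust
// Some(v: i32) → [u32 LE payload_len][discriminant = 1][v as 4 LE bytes]
fn encode_some_i32(v: i32) -> Vec<u8> {
    let payload_len: u32 = 1 + 4; // discriminant byte + i32
    let mut out = Vec::new();
    out.extend(&payload_len.to_le_bytes());
    out.push(1); // Some
    out.extend(&v.to_le_bytes());
    out
}

fn main() {
    assert_eq!(encode_some_i32(5), vec![5, 0, 0, 0, 1, 5, 0, 0, 0]);
}
```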

The public API is a single function:

pub fn to_bytes<T: Serialize>(value: &T) -> Result<Vec<u8>> {
    let ser = BytesSerializer::default();
    ser.to_bytes(value)
}

The Deserializer

The deserializer walks a byte slice with a position cursor:

// protocol/src/de.rs

pub struct BytesDeserializer<'de> {
    buffer: &'de [u8],
    position: RefCell<usize>,
}

Reading primitives advances the position:

fn read_u32(&self) -> Result<u32> {
    let bytes = self.read_bytes(4)?;
    Ok(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

Strings use zero-copy deserialization when possible:

fn deserialize_str<V: de::Visitor<'de>>(self, visitor: V) -> Result<V::Value> {
    let len = self.read_u32()? as usize;
    let start = self.peek_position();
    let end = start + len;
    if end > self.buffer.len() {
        return Err(Error::InvalidData);
    }
    let s = std::str::from_utf8(&self.buffer[start..end])
        .map_err(|_| Error::InvalidData)?;
    *self.position.borrow_mut() = end;
    visitor.visit_borrowed_str(s)
}

Sequences use a byte-counting approach — the length prefix tells us how many bytes remain, not how many elements:

struct SeqAccess<'a, 'de> {
    de: &'a BytesDeserializer<'de>,
    remaining: usize,  // remaining BYTES, not elements
}

impl<'de> de::SeqAccess<'de> for SeqAccess<'_, 'de> {
    fn next_element_seed<T>(&mut self, seed: T) -> Result<Option<T::Value>>
    where T: de::DeserializeSeed<'de> {
        if self.remaining == 0 {
            return Ok(None);
        }
        let before = self.de.peek_position();
        let val = seed.deserialize(self.de)?;
        let consumed = self.de.peek_position() - before;
        self.remaining -= consumed;
        Ok(Some(val))
    }
}

This is a deliberate design choice. Byte-counting means we don't need to know the element count upfront, and it works for heterogeneous sequences (tuples, struct fields) where elements have different sizes.
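
The same byte-counting idea, sketched standalone for the homogeneous case (`read_i32_seq` is illustrative, not the crate's API):

```rust
// Walk a wire-encoded sequence of i32s by bytes consumed, not element count.
fn read_i32_seq(buf: &[u8]) -> Vec<i32> {
    // First 4 bytes: total payload length in BYTES.
    let total = u32::from_le_bytes(buf[0..4].try_into().unwrap()) as usize;
    let mut pos = 4;
    let mut out = Vec::new();
    while pos < 4 + total {
        out.push(i32::from_le_bytes(buf[pos..pos + 4].try_into().unwrap()));
        pos += 4; // this element consumed 4 bytes
    }
    out
}

fn main() {
    let wire = [12, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0];
    assert_eq!(read_i32_seq(&wire), vec![1, 2, 3]);
}
```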

The public API mirrors the serializer:

pub fn from_bytes<'a, T: Deserialize<'a>>(bytes: &'a [u8]) -> Result<T> {
    let de = BytesDeserializer::new(bytes);
    T::deserialize(&de)
}

Verifying the Format

The protocol crate has roundtrip tests for every type. Here are a few to build intuition:

// "hello" serializes to [length=5][h][e][l][l][o]
let bytes = to_bytes(&"hello").unwrap();
assert_eq!(bytes, vec![5, 0, 0, 0, 104, 101, 108, 108, 111]);
let decoded: &str = from_bytes(&bytes).unwrap();
assert_eq!(decoded, "hello");

// vec![1i32, 2, 3] serializes to [length=12][1][2][3] (each i32 is 4 bytes)
let bytes = to_bytes(&vec![1i32, 2, 3]).unwrap();
assert_eq!(bytes, vec![12, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0]);

// None::<i32> serializes to [length=1][discriminant=0]
let bytes = to_bytes(&None::<i32>).unwrap();
assert_eq!(bytes, vec![1, 0, 0, 0, 0]);

// Some(1u8) serializes to [length=2][discriminant=1][1]
let bytes = to_bytes(&Some(1u8)).unwrap();
assert_eq!(bytes, vec![2, 0, 0, 0, 1, 1]);

The full serializer and deserializer implementations live in the crate, and the code is mechanical — each serde method either writes raw bytes (primitives) or wraps its content in start_bytelen_encoding/end_bytelen_encoding (complex types). Implementing the remaining trait methods (SerializeSeq, SerializeTuple, SerializeMap, SerializeStruct, and friends) is left as a reader exercise — follow the pattern from the examples above.

The Bump Allocator

Now that we can serialize data, we need somewhere in WASM memory to put it. WASM gives us linear memory — a contiguous byte array starting at offset 0, growing in 64KB pages. We manage it with the simplest possible allocator: a bump allocator.

// rt/src/memory.rs

pub struct WasmMemory {
    top: u32,         // Next available offset (high-water mark)
    pub(crate) max: u64,  // Maximum allowed memory in bytes
}

Allocation

impl WasmMemory {
    pub fn new(max: u64, initial_top: u32) -> Self {
        WasmMemory { top: initial_top, max }
    }

    pub fn alloc(
        &mut self,
        mem: &Memory,
        store: &mut impl AsContextMut,
        size: u32,
    ) -> Result<u32> {
        let new_top = self.top.checked_add(size).ok_or(Error::MemoryExhausted {
            requested: size,
            limit: self.max,
        })?;
        if new_top as u64 > self.max {
            return Err(Error::MemoryExhausted {
                requested: size,
                limit: self.max,
            });
        }
        // Grow pages if needed (each page = 64KB)
        let needed_pages = (new_top as u64).div_ceil(65536);
        let current_pages = mem.size(store.as_context());
        if needed_pages > current_pages {
            let delta = needed_pages - current_pages;
            mem.grow(store.as_context_mut(), delta).map_err(|_| {
                Error::MemoryExhausted {
                    requested: size,
                    limit: self.max,
                }
            })?;
        }
        let ptr = self.top;
        self.top = new_top;
        Ok(ptr)
    }
}

The algorithm:

  1. Compute new_top = top + size (with overflow check)
  2. Verify new_top <= max (the configured memory limit)
  3. Compute how many 64KB pages are needed for new_top
  4. If more pages are needed than currently allocated, grow the memory
  5. Return the old top as the pointer, set top = new_top

There is no free(). This is intentional. WASM programs in our system are short-lived — they run, produce a result, and the entire instance is discarded. A bump allocator is the fastest possible allocator for this use case: O(1) allocation, zero fragmentation overhead, zero bookkeeping.
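
Stripped of the wasmtime types, the core policy is just a few lines (illustrative `Bump` type, not the crate's API — the real `alloc` also grows pages):

```rust
// A bump allocator over a fixed byte budget: O(1), no bookkeeping, no free().
struct Bump { top: u32, max: u64 }

impl Bump {
    fn alloc(&mut self, size: u32) -> Option<u32> {
        let new_top = self.top.checked_add(size)?;    // overflow check
        if new_top as u64 > self.max { return None; } // budget check
        let ptr = self.top;                           // old top is the pointer
        self.top = new_top;
        Some(ptr)
    }
}

fn main() {
    let mut b = Bump { top: 0, max: 1024 };
    assert_eq!(b.alloc(100), Some(0));
    assert_eq!(b.alloc(50), Some(100));
    assert_eq!(b.alloc(2048), None); // over the 1KB budget
}
```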

Writing Serialized Data

pub fn write_bytes(
    &mut self,
    mem: &Memory,
    store: &mut impl AsContextMut,
    data: &[u8],
) -> Result<u32> {
    let serialized = protocol::to_bytes(&data.to_vec())?;
    let ptr = self.alloc(mem, store, serialized.len() as u32)?;
    mem.write(store.as_context_mut(), ptr as usize, &serialized)
        .map_err(|e| Error::Other(e.into()))?;
    Ok(ptr)
}

This serializes the data using our wire protocol (which adds a length prefix), allocates space, writes the bytes, and returns the pointer. The guest can then read the data by: reading 4 bytes at ptr to get the length, then reading length bytes at ptr + 4.

Reading Data Back

pub fn read_bytes(
    &self,
    mem: &Memory,
    store: &impl AsContext,
    ptr: u32,
) -> Result<Vec<u8>> {
    let data = mem.data(store);
    let size = data.len() as u32;
    let len_end = ptr.checked_add(4)
        .ok_or(Error::InvalidPointer { ptr, size })?;
    if len_end > size {
        return Err(Error::InvalidPointer { ptr, size });
    }
    let len = u32::from_le_bytes(
        data[ptr as usize..len_end as usize].try_into().unwrap()
    );
    let data_end = len_end.checked_add(len)
        .ok_or(Error::InvalidPointer { ptr, size })?;
    if data_end > size {
        return Err(Error::InvalidPointer { ptr, size });
    }
    Ok(data[ptr as usize..data_end as usize].to_vec())
}

Bounds checking at every step: ptr + 4 overflow, ptr + 4 > memory_size, ptr + 4 + len overflow, ptr + 4 + len > memory_size. A malicious or buggy guest program cannot trick us into reading out of bounds.

The Caller Helper Functions

Inside wasmtime's func_wrap closures (which implement host functions), we face a borrow-splitting problem. We need to access the WASM memory through the Caller, but we also need to access our State through the same Caller. Rust's borrow checker won't let us hold &mut caller and &mut caller.data_mut().memory simultaneously.

The solution is three helper functions that perform manual borrow splitting:

/// Allocate `size` bytes via the Caller, solving the borrow-split problem.
pub fn caller_alloc(caller: &mut Caller<'_, State>, size: u32) -> Result<u32> {
    let mem = caller
        .get_export("mem.tape")
        .and_then(|e| e.into_memory())
        .ok_or_else(|| Error::Other(anyhow::anyhow!("mem.tape not found")))?;
    let top = caller.data().memory.top;
    let max = caller.data().memory.max;
    let new_top = top.checked_add(size).ok_or(Error::MemoryExhausted {
        requested: size, limit: max,
    })?;
    if new_top as u64 > max {
        return Err(Error::MemoryExhausted { requested: size, limit: max });
    }
    let needed = (new_top as u64).div_ceil(65536);
    let current = mem.size(&*caller);
    if needed > current {
        mem.grow(&mut *caller, needed - current)
            .map_err(|_| Error::MemoryExhausted { requested: size, limit: max })?;
    }
    caller.data_mut().memory.top = new_top;
    Ok(top)
}

/// Allocate and write raw bytes (already in wire format).
pub fn caller_write_raw(caller: &mut Caller<'_, State>, data: &[u8]) -> Result<u32> {
    let ptr = caller_alloc(caller, data.len() as u32)?;
    let mem = caller.get_export("mem.tape")
        .and_then(|e| e.into_memory())
        .ok_or_else(|| Error::Other(anyhow::anyhow!("mem.tape not found")))?;
    mem.write(&mut *caller, ptr as usize, data)
        .map_err(|e| Error::Other(e.into()))?;
    Ok(ptr)
}

/// Read length-prefixed data from WASM memory.
pub fn caller_read_bytes(caller: &mut Caller<'_, State>, ptr: u32) -> Result<Vec<u8>> {
    let mem = caller.get_export("mem.tape")
        .and_then(|e| e.into_memory())
        .ok_or_else(|| Error::Other(anyhow::anyhow!("mem.tape not found")))?;
    let data = mem.data(&*caller);
    let size = data.len() as u32;
    // ... same bounds-checking logic as read_bytes() ...
    Ok(data[ptr as usize..data_end as usize].to_vec())
}

These functions will be used in every capability implementation. They're the low-level plumbing that makes host-guest data exchange work.

Memory Layout

Here's what linear memory looks like during execution:

Offset 0                                              max_memory_bytes
│                                                              │
▼                                                              ▼
┌──────────────┬───────────────┬──────────────┬────────────────┐
│ Static Data  │  arg0 bytes   │  arg1 bytes  │   (free)       │
│ (strings)    │  [len][data]  │  [len][data] │                │
└──────────────┴───────────────┴──────────────┴────────────────┘
               ▲                              ▲
               │                              │
          initial_top                        top (high-water mark)

  • Offset 0 to initial_top: Static data sections (string literals placed by the preprocessor — we'll build this in Article 8)
  • initial_top onward: Dynamic allocations (function arguments, host function results)
  • top: The next available offset — all allocations happen here
  • Beyond top: Unused memory (pages may or may not be allocated yet)
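
A quick arithmetic check of this layout, with hypothetical values (initial_top = 1024 and two string args; none of these numbers come from the real system):

```rust
// A wire-encoded string occupies 4 (length prefix) + len bytes.
fn wire_len(s: &str) -> u32 {
    4 + s.len() as u32
}

fn main() {
    let initial_top = 1024u32;                 // assumed static-data size
    let arg0_ptr = initial_top;                // "hi" lands here
    let arg1_ptr = arg0_ptr + wire_len("hi");  // 1024 + 6 = 1030
    let top = arg1_ptr + wire_len("world");    // 1030 + 9 = 1039
    assert_eq!((arg0_ptr, arg1_ptr, top), (1024, 1030, 1039));
}
```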

Testing

#[test]
fn alloc_basic() {
    let (mut store, mem) = test_env();
    let mut wm = WasmMemory::new(65536, 0);
    let ptr = wm.alloc(&mem, &mut store, 100).unwrap();
    assert_eq!(ptr, 0);
    assert_eq!(wm.top(), 100);

    let ptr2 = wm.alloc(&mem, &mut store, 50).unwrap();
    assert_eq!(ptr2, 100);
    assert_eq!(wm.top(), 150);
}

#[test]
fn alloc_grows_pages() {
    let (mut store, mem) = test_env();
    let mut wm = WasmMemory::new(256 * 1024, 0);
    let ptr = wm.alloc(&mem, &mut store, 70000).unwrap();
    assert_eq!(ptr, 0);
    assert!(mem.size(&store) >= 2); // Grew past 1 page (64KB)
}

#[test]
fn alloc_exceeds_max() {
    let (mut store, mem) = test_env();
    let mut wm = WasmMemory::new(1024, 0);
    let result = wm.alloc(&mem, &mut store, 2048);
    assert!(result.is_err());
}

#[test]
fn write_read_roundtrip() {
    let (mut store, mem) = test_env();
    let mut wm = WasmMemory::new(65536, 0);
    let data = b"hello world";
    let ptr = wm.write_bytes(&mem, &mut store, data).unwrap();
    let read_back = wm.read_bytes(&mem, &store, ptr).unwrap();
    let decoded: Vec<u8> = protocol::from_bytes(&read_back).unwrap();
    assert_eq!(decoded, data);
}

What We Have So Far

After this article, our system can:

  • Serialize any Rust type to a compact binary format
  • Deserialize binary data back to Rust types (with zero-copy for strings and bytes)
  • Manage WASM linear memory with a bump allocator
  • Grow memory pages on demand, up to a configured limit
  • Read and write length-prefixed data across the WASM boundary
  • Handle borrow-splitting inside wasmtime host function closures

What it can't do yet: register host functions that the guest can call. For that, we need the Capability trait — the plugin system that defines how the host exposes functionality to the sandbox.
