Making WAT Livable: Macros, String Inlining, and Local Hoisting
Extending WAT with higher-level language features through a multipass preprocessor.
In the previous article, we felt the pain of raw WAT: 5 lines for every argument load, no inline strings, locals forced to the function top. This article builds the cure — a three-pass preprocessor that transforms an extended WAT dialect into standard WAT.
The Three Passes
The preprocessor takes a WAT function body and produces:
- A transformed body (standard WAT)
- A list of data sections (static data to write into memory before execution)
- An initial memory top (past the static data zone)
pub struct PreprocessResult {
    pub body: String,
    pub data_sections: Vec<DataSection>,
    pub initial_top: u32,
}
pub struct DataSection {
    pub offset: u32,
    pub bytes: Vec<u8>,
}
The three passes are:
- Pass 0 (macro expansion): (argv N $var), (check $err), (resv $ptr) → standard WAT
- Pass 1 (string inlining): "hello" and """multi-line""" → (i32.const offset) + data sections
- Pass 2 (local hoisting): (local ...) declarations moved to function top, deduped
Pass 0: Macro Expansion
The tree-sitter-wat-plus Grammar
The preprocessor needs to parse WAT-plus source into an AST — standard WAT plus our macro extensions. We started from the existing tree-sitter-wasm grammar, which provides a full tree-sitter grammar for the WebAssembly text format. We then stripped out everything irrelevant to our i32-only subset (floating-point instructions, SIMD, reference types, table operations) and extended the grammar with three new node types: argv_macro, check_macro, and resv_macro. The result is tree-sitter-wat-plus, a minimal grammar that parses exactly the language our LLM generates.
Starting from an established WAT grammar instead of writing one from scratch was important — tree-sitter grammars for S-expression languages have subtle corner cases around nested parentheses, comments, and string escaping. The wasm-lsp grammar had already solved these.
The preprocessor uses this grammar to parse the body into an AST, walks the AST to find macro nodes, expands each one, and splices the expansions into the source text.
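To make the "walk the AST to find macro nodes" step concrete, here is a minimal sketch of a non-recursive tree walk. The Node struct is a toy stand-in invented for this example (tree-sitter's real node type has a different API), and find_macros is a hypothetical helper name; only the three node kinds come from the grammar described above.

```rust
/// Toy stand-in for a syntax-tree node (hypothetical; not tree-sitter's Node type).
struct Node {
    kind: &'static str,
    start: usize, // byte offset where the node begins
    end: usize,   // byte offset just past the node
    children: Vec<Node>,
}

/// Collects the byte ranges of all macro nodes using an explicit stack
/// instead of recursion.
fn find_macros(root: &Node) -> Vec<(usize, usize)> {
    let mut found = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        if matches!(node.kind, "argv_macro" | "check_macro" | "resv_macro") {
            found.push((node.start, node.end));
            continue; // assumption of this sketch: macros do not nest
        }
        // Push children in reverse so they are popped in source order.
        for child in node.children.iter().rev() {
            stack.push(child);
        }
    }
    found
}
```

Because children are pushed in reverse, the ranges come back in source order, which makes the later sort by position cheap.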
The argv Macro
Input:
(argv 0 $url_ptr)
Expansion:
(local $url_ptr i32)
(local $url_err i32)
(call $sys.argv (i32.const 0))
(local.set $url_err)
(local.set $url_ptr)
(if (i32.ne (local.get $url_err) (i32.const 0))
  (then (return (local.get $url_err)))
)
One line becomes eight. The error local name is derived automatically: $url_ptr → $url_err (strip the _ptr suffix if present, then append _err):
fn derive_err_name(name: &str) -> String {
    let base = name.strip_prefix('$').unwrap_or(name);
    let base = base.strip_suffix("_ptr").unwrap_or(base);
    format!("${}_err", base)
}
So $query_ptr → $query_err, $seed → $seed_err, $body_ptr → $body_err.
The check Macro
Input:
(check $body_err)
Expansion:
(if (i32.ne (local.get $body_err) (i32.const 0))
  (then (return (local.get $body_err)))
)
One line becomes three. This is the error guard pattern: if the error code is non-zero, return it immediately.
The resv Macro
Input:
(resv $result_ptr)
Expansion:
(call $sys.resv (local.get $result_ptr))
Implementation
The macro expander walks the tree-sitter AST with an iterative DFS, collecting MacroHit structs:
struct MacroHit {
    start: usize, // byte offset in source
    end: usize,   // byte offset in source
    expansion: String,
}
For each macro node, it extracts the arguments (index and variable name for argv, variable name for check/resv) and formats the expansion string. After collecting all hits, it sorts them by descending start position and splices the expansions back-to-front, so that each splice leaves the byte offsets of the hits still to be applied unchanged:
hits.sort_by(|a, b| b.start.cmp(&a.start));
let mut result = source.to_vec();
for hit in &hits {
    result.splice(hit.start..hit.end, hit.expansion.bytes());
}
If no macros are found, the function returns None (a fast path that avoids re-parsing overhead).
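The article shows the splice loop but not the code that formats each expansion. The following is a minimal sketch of how the two could fit together, assuming hypothetical helper names (expand_argv, expand_check, expand_resv, splice_hits) and taking the hits as already collected. The expansion text follows the three macro definitions above; everything else is illustrative.

```rust
/// One macro occurrence: the byte range it covers and the text that replaces it.
struct MacroHit {
    start: usize,
    end: usize,
    expansion: String,
}

/// Same rule as derive_err_name above: "$url_ptr" becomes "$url_err".
fn derive_err_name(name: &str) -> String {
    let base = name.strip_prefix('$').unwrap_or(name);
    let base = base.strip_suffix("_ptr").unwrap_or(base);
    format!("${}_err", base)
}

/// Error guard, shared by the check and argv expansions.
fn expand_check(err: &str) -> String {
    format!(
        "(if (i32.ne (local.get {e}) (i32.const 0))\n  (then (return (local.get {e})))\n)",
        e = err
    )
}

/// Declares both locals, calls $sys.argv, stores the results, then guards on the error.
fn expand_argv(index: u32, var: &str) -> String {
    let err = derive_err_name(var);
    format!(
        "(local {v} i32)\n(local {e} i32)\n(call $sys.argv (i32.const {i}))\n(local.set {e})\n(local.set {v})\n{guard}",
        v = var,
        e = err,
        i = index,
        guard = expand_check(&err)
    )
}

fn expand_resv(ptr: &str) -> String {
    format!("(call $sys.resv (local.get {}))", ptr)
}

/// Applies all hits back-to-front. Returns None when there is nothing to expand.
fn splice_hits(source: &str, mut hits: Vec<MacroHit>) -> Option<String> {
    if hits.is_empty() {
        return None; // fast path: the caller keeps the original source
    }
    // Descending start order: later ranges are replaced first, so earlier
    // offsets stay valid.
    hits.sort_by(|a, b| b.start.cmp(&a.start));
    let mut result = source.as_bytes().to_vec();
    for hit in &hits {
        result.splice(hit.start..hit.end, hit.expansion.bytes());
    }
    String::from_utf8(result).ok()
}
```

Splicing front-to-back would require adjusting every later offset by the length difference of each replacement; processing in descending order sidesteps that bookkeeping.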
Pass 1: String Inlining
After macro expansion, the preprocessor re-parses with tree-sitter and walks the AST again, this time collecting string_literal nodes.
String Formats
Two formats are supported:
Single-line strings: "hello world" — standard WAT escape sequences apply:
- \n, \t, \r: whitespace escapes
- \", \', \\: literal characters
- \HH: two hex digits → one byte
- \u{hex...}: Unicode codepoint (UTF-8 encoded)
Triple-quoted strings: """multi-line text""" — raw strings with leading/trailing newline stripped. No escape processing.
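As an illustration of the two formats, here is a minimal decoding sketch. The function names decode_escapes and strip_triple_quoted are hypothetical, and the error handling (returning None on a malformed escape) is an assumption; the article does not show how the real decoder reports errors.

```rust
/// Decodes the contents of a single-line string literal (quotes already removed).
/// Returns None on a malformed escape sequence.
fn decode_escapes(s: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf = [0u8; 4];
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            continue;
        }
        match chars.next()? {
            'n' => out.push(b'\n'),
            't' => out.push(b'\t'),
            'r' => out.push(b'\r'),
            '"' => out.push(b'"'),
            '\'' => out.push(b'\''),
            '\\' => out.push(b'\\'),
            'u' => {
                // \u{hex...}: read up to the closing brace, then UTF-8 encode.
                if chars.next()? != '{' {
                    return None;
                }
                let mut hex = String::new();
                loop {
                    match chars.next()? {
                        '}' => break,
                        h => hex.push(h),
                    }
                }
                let ch = char::from_u32(u32::from_str_radix(&hex, 16).ok()?)?;
                out.extend_from_slice(ch.encode_utf8(&mut buf).as_bytes());
            }
            hi => {
                // \HH: exactly two hex digits produce one raw byte.
                let lo = chars.next()?;
                out.push((hi.to_digit(16)? * 16 + lo.to_digit(16)?) as u8);
            }
        }
    }
    Some(out)
}

/// Triple-quoted strings: drop one leading and one trailing newline, no escapes.
fn strip_triple_quoted(raw: &str) -> &str {
    let s = raw.strip_prefix('\n').unwrap_or(raw);
    s.strip_suffix('\n').unwrap_or(s)
}
```

Decoding to bytes rather than to a String matters here: a \HH escape can produce a byte sequence that is not valid UTF-8, and the data section stores raw bytes anyway.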
Deduplication
If the same string appears multiple times, only one copy is stored in memory. The deduplication uses an ordered vector as a map (small N makes this efficient):
let mut string_map: Vec<(Vec<u8>, u32)> = Vec::new();
let mut current_offset: u32 = 0;
for hit in &string_hits {
    if !string_map.iter().any(|(k, _)| k == &hit.decoded) {
        let entry_size = align_up(4 + hit.decoded.len() as u32, 4);
        string_map.push((hit.decoded.clone(), current_offset));
        current_offset += entry_size;
    }
}
let initial_top = current_offset;
Memory Layout
Each string entry is 4-byte aligned:
[u32 LE length][content bytes][zero padding to 4-byte boundary]
Example: "hello" (5 bytes) → [05 00 00 00][68 65 6c 6c 6f][00 00 00] = 12 bytes.
fn align_up(x: u32, align: u32) -> u32 {
    // Bit trick: only valid when align is a power of two.
    (x + align - 1) & !(align - 1)
}
fn build_data_section(offset: u32, content: &[u8]) -> DataSection {
    let len = content.len() as u32;
    let entry_size = align_up(4 + len, 4) as usize;
    let mut bytes = Vec::with_capacity(entry_size);
    bytes.extend_from_slice(&len.to_le_bytes()); // u32 LE length prefix
    bytes.extend_from_slice(content);
    while bytes.len() < entry_size {
        bytes.push(0); // zero padding up to the 4-byte boundary
    }
    DataSection { offset, bytes }
}
Replacement
Each string literal in the source is replaced with (i32.const <offset>):
for hit in &string_hits {
    let offset = string_map.iter()
        .find(|(k, _)| k == &hit.decoded)
        .map(|(_, o)| *o)
        .unwrap_or(0);
    edits.push(Edit {
        start: hit.start,
        end: hit.end,
        replacement: format!("(i32.const {})", offset),
    });
}
Before: (call $http.get "https://example.com")
After: (call $http.get (i32.const 0))
Data section at offset 0: [13 00 00 00]https://example.com[00] (length prefix 0x13 = 19, then 19 content bytes and 1 padding byte, 24 bytes in total)
The data sections are written into linear memory by LinkedModule.instantiate() before the program runs. The bump allocator's initial_top is set past the static data zone, so dynamic allocations don't overwrite the strings.
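To see the layout end to end, the sketch below lays out a few strings, copies the sections into a byte buffer, and reads one back. It is illustrative only: a plain Vec<u8> stands in for linear memory, and layout, write_sections, read_string, and the Bump struct are hypothetical names, not the runtime's API. The 4-byte rounding inside Bump::alloc is also an assumption.

```rust
struct DataSection {
    offset: u32,
    bytes: Vec<u8>,
}

fn align_up(x: u32, align: u32) -> u32 {
    (x + align - 1) & !(align - 1)
}

/// Lays out each distinct string once; returns the sections and the first free offset.
fn layout(strings: &[&str]) -> (Vec<DataSection>, u32) {
    let mut sections = Vec::new();
    let mut seen: Vec<&str> = Vec::new();
    let mut top = 0u32;
    for s in strings {
        if seen.contains(s) {
            continue; // duplicate: reuse the earlier copy
        }
        seen.push(*s);
        let len = s.len() as u32;
        let size = align_up(4 + len, 4);
        let mut bytes = Vec::with_capacity(size as usize);
        bytes.extend_from_slice(&len.to_le_bytes());
        bytes.extend_from_slice(s.as_bytes());
        bytes.resize(size as usize, 0); // zero padding
        sections.push(DataSection { offset: top, bytes });
        top += size;
    }
    (sections, top)
}

/// Copies every section into `mem` at its offset.
fn write_sections(mem: &mut [u8], sections: &[DataSection]) {
    for s in sections {
        let start = s.offset as usize;
        mem[start..start + s.bytes.len()].copy_from_slice(&s.bytes);
    }
}

/// Reads a length-prefixed string back from `mem`; None if out of bounds.
fn read_string(mem: &[u8], offset: u32) -> Option<&[u8]> {
    let start = offset as usize;
    let b = mem.get(start..start + 4)?;
    let len = u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize;
    mem.get(start + 4..start + 4 + len)
}

/// Minimal bump allocator that starts handing out offsets at `top`.
struct Bump {
    top: u32,
}

impl Bump {
    fn alloc(&mut self, size: u32) -> u32 {
        let at = self.top;
        self.top = align_up(at + size, 4);
        at
    }
}
```

With the inputs "hello", "hi", "hello", the second "hello" reuses offset 0, "hi" lands at offset 12, and the first dynamic allocation starts at 20, past the static zone.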
Pass 2: Local Hoisting
The final pass collects all (local ...) declarations from the body, removes them from their original positions, and places them at the function top.
Deduplication Rules
- Named locals ((local $x i32)): first occurrence kept, duplicates removed
- Anonymous locals ((local i32)): all kept (they represent distinct stack slots)
let mut seen_names: Vec<String> = Vec::new();
let mut unique_locals: Vec<String> = Vec::new();
for hit in &local_hits {
    locals_to_remove.push((hit.start, hit.end));
    match &hit.name {
        Some(name) => {
            if !seen_names.contains(name) {
                seen_names.push(name.clone());
                unique_locals.push(hit.text.trim().to_string());
            }
        }
        None => {
            unique_locals.push(hit.text.trim().to_string());
        }
    }
}
This is important because the argv macro generates (local $name i32) declarations. If the LLM also declares (local $name i32) manually, we get a duplicate. WAT doesn't allow duplicate named locals, so deduplication prevents compilation errors.
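For readers who want to try the hoisting rules without a parser, here is a self-contained sketch. It swaps the tree-sitter walk for a plain text scan, which is a deliberate simplification: it assumes no string literal or comment contains the text "(local ", and that a declaration never contains nested parentheses. The function name hoist_locals is hypothetical.

```rust
/// Moves every `(local ...)` declaration to the top of `body`.
/// Named locals are kept once; anonymous locals are all kept.
fn hoist_locals(body: &str) -> String {
    const OPEN: &str = "(local ";
    let mut seen_names: Vec<String> = Vec::new();
    let mut unique_locals: Vec<String> = Vec::new();
    let mut rest = String::new();
    let mut pos = 0;

    // "(local.set" and "(local.get" never match because OPEN ends in a space.
    while let Some(rel) = body[pos..].find(OPEN) {
        let start = pos + rel;
        // No nested parentheses in a declaration, so the next ')' closes it.
        let end = match body[start..].find(')') {
            Some(close) => start + close + 1,
            None => break, // unterminated declaration: leave the tail untouched
        };
        rest.push_str(&body[pos..start]);
        let decl = &body[start..end];
        // The first token after "(local " is the name if it starts with '$'.
        let name = decl[OPEN.len()..]
            .split_whitespace()
            .next()
            .filter(|tok| tok.starts_with('$'));
        match name {
            Some(n) if seen_names.iter().any(|s| s == n) => {} // duplicate: drop it
            Some(n) => {
                seen_names.push(n.to_string());
                unique_locals.push(decl.to_string());
            }
            None => unique_locals.push(decl.to_string()), // anonymous: always keep
        }
        pos = end;
    }
    rest.push_str(&body[pos..]);

    if unique_locals.is_empty() {
        rest
    } else {
        format!("{}\n{}", unique_locals.join("\n"), rest)
    }
}
```

The sketch leaves blank lines where declarations were removed; WAT ignores that whitespace, so the output still assembles.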
Applying All Edits
Passes 1 and 2 generate their edits against the same macro-expanded source, so they are applied together. All edits (string replacements + local removals) are sorted by descending start position and applied back-to-front:
edits.sort_by(|a, b| b.start.cmp(&a.start));
let mut result_bytes = source.to_vec();
for edit in &edits {
    result_bytes.splice(edit.start..edit.end, edit.replacement.bytes());
}
// cleaned_body: presumably result_bytes decoded back to text (that conversion is not shown in this excerpt)
let final_body = if !unique_locals.is_empty() {
    let preamble = unique_locals.join("\n");
    format!("{}\n{}", preamble, cleaned_body)
} else {
    cleaned_body.to_string()
};
Before and After
The ugly KV program from Article 7 (35 lines):
(local $key_ptr i32)
(local $key_err i32)
(local $val_ptr i32)
(local $val_err i32)
(local $set_err i32)
(local $get_ptr i32)
(local $get_err i32)
(call $sys.argv (i32.const 0))
(local.set $key_err)
(local.set $key_ptr)
(if (i32.ne (local.get $key_err) (i32.const 0))
(then (return (local.get $key_err)))
)
(call $sys.argv (i32.const 1))
(local.set $val_err)
(local.set $val_ptr)
(if (i32.ne (local.get $val_err) (i32.const 0))
(then (return (local.get $val_err)))
)
(call $kv.set (local.get $key_ptr) (local.get $val_ptr))
(local.set $set_err)
(if (i32.ne (local.get $set_err) (i32.const 0))
(then (return (local.get $set_err)))
)
(call $kv.get (local.get $key_ptr))
(local.set $get_err)
(local.set $get_ptr)
(if (i32.ne (local.get $get_err) (i32.const 0))
(then (return (local.get $get_err)))
)
(call $sys.resv (local.get $get_ptr))
(i32.const 0)
With the preprocessor (13 lines):
(argv 0 $key_ptr)
(argv 1 $val_ptr)
(call $kv.set (local.get $key_ptr) (local.get $val_ptr))
(local $set_err i32)
(local.set $set_err)
(check $set_err)
(call $kv.get (local.get $key_ptr))
(local $get_err i32) (local $get_ptr i32)
(local.set $get_err)
(local.set $get_ptr)
(check $get_err)
(resv $get_ptr)
(i32.const 0)
The fetch-and-summarize program that was impossible in raw WAT:
(argv 0 $url_ptr)
(call $http.get (local.get $url_ptr))
(local $body_err i32) (local $body_ptr i32)
(local.set $body_err)
(local.set $body_ptr)
(check $body_err)
(call $ai.assist "Summarize in one paragraph." (local.get $body_ptr) (call $sys.nil))
(local $sum_err i32) (local $sum_ptr i32)
(local.set $sum_err)
(local.set $sum_ptr)
(check $sum_err)
(resv $sum_ptr)
(i32.const 0)
The string "Summarize in one paragraph." is automatically inlined into memory. The (argv 0 $url_ptr) macro handles all the argument loading boilerplate. The (local ...) declarations appear next to the code that uses them; the preprocessor hoists them to the top automatically.
Tests
The preprocessor's tests cover all three passes. Here are the key cases:
#[test]
fn single_string_literal() {
    let body = r#"(call $f "hello")"#;
    let result = preprocess(body).unwrap();
    assert_eq!(result.initial_top, 12); // align_up(4+5, 4) = 12
    assert_eq!(result.data_sections.len(), 1);
    assert!(result.body.contains("(i32.const 0)"));
}
#[test]
fn duplicate_string_literals_dedup() {
    let body = r#"(call $f "hello")
(call $g "hello")"#;
    let result = preprocess(body).unwrap();
    assert_eq!(result.data_sections.len(), 1); // Only one copy
    let count = result.body.matches("(i32.const 0)").count();
    assert_eq!(count, 2); // Both point to same offset
}
#[test]
fn argv_macro_basic() {
    let result = preprocess("(argv 0 $query_ptr)").unwrap();
    assert!(result.body.contains("(local $query_ptr i32)"));
    assert!(result.body.contains("(local $query_err i32)"));
    assert!(result.body.contains("(call $sys.argv (i32.const 0))"));
}
#[test]
fn mid_body_local_hoisted() {
    let body = "(call $sys.argv (i32.const 0))\n(local $x i32)\n(local.set $x)";
    let result = preprocess(body).unwrap();
    assert!(result.body.starts_with("(local $x i32)"));
}
#[test]
fn duplicate_named_local_deduped() {
    let body = "(local $x i32)\n(call $f)\n(local $x i32)";
    let result = preprocess(body).unwrap();
    let count = result.body.matches("(local $x i32)").count();
    assert_eq!(count, 1);
}
Run the full test suite with cargo test -p rt.
How the Template Uses the Preprocessor
The Template.assemble() method calls the preprocessor first, then wraps the result in a complete module:
pub fn assemble(&self, body: &str) -> Result<AssembleResult> {
    let pre = crate::preprocessor::preprocess(body)?;
    let mut wat = String::from("(module\n");
    for import in &self.wat_imports {
        wat.push_str(import);
        wat.push('\n');
    }
    wat.push_str(" (memory $mem.tape 1)\n");
    wat.push_str(" (export \"mem.tape\" (memory $mem.tape))\n\n");
    wat.push_str(" (func $run (result i32)\n");
    wat.push_str(&pre.body);
    wat.push_str("\n )\n");
    wat.push_str(" (export \"run\" (func $run))\n");
    wat.push_str(")\n");
    Ok(AssembleResult {
        wat,
        data_sections: pre.data_sections,
        initial_top: pre.initial_top,
    })
}
The data sections and initial_top are passed through to LinkedModule.instantiate(), which writes the static data into memory and sets the bump allocator's starting offset.
The Full Processing Pipeline
LLM generates WAT-plus body
  │
  ▼
Pass 0: expand_macros()
    (argv N $var) → full argv sequence
    (check $err) → error guard
    (resv $ptr) → sys.resv call
  │
  ▼
Pass 1: String inlining
    "hello" → (i32.const 0)
    Data sections: [{offset: 0, bytes: [5,0,0,0,h,e,l,l,o,0,0,0]}]
  │
  ▼
Pass 2: Local hoisting
    Move (local ...) to function top
    Deduplicate named locals
  │
  ▼
Template.assemble()
    Wrap in (module ... imports ... (func $run ...) ...)
  │
  ▼
Engine.compile_wat()
    Compile to wasmtime::Module
  │
  ▼
LinkedModule.instantiate()
    Write data sections → set initial_top → create Instance
  │
  ▼
Instance.run()
With the preprocessor in place, the LLM can generate concise, readable WAT-plus code, and the runtime handles all the mechanical transformations.