The Brain: Building a ReAct Agent That Writes Its Own Programs
The orchestration layer that ties LLM reasoning to sandboxed code execution.
We have an engine that compiles and runs WASM, a memory system that serializes data across the boundary, a preprocessor that makes WAT bearable, and nine capabilities that give programs access to HTTP, AI, files, strings, and more. But nothing ties these together. Nothing decides what program to write, what arguments to pass, or what to do with the results.
This article builds the agent — the orchestration layer that takes a user message, gives it to an LLM, receives WAT code (or a catalog program reference, or a final text response), compiles and executes it, feeds the observation back to the LLM, and repeats until the task is done. This is the ReAct loop: Think, Act, Observe.
The agent crate introduces several new dependencies. Here is agent/Cargo.toml:
[package]
name = "agent"
version = "0.1.0"
edition = "2021"
[dependencies]
rt = { path = "../rt" }
capability = { path = "../capability" }
protocol = { workspace = true }
reqwest = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
tokio = { workspace = true }
anyhow = { workspace = true }
toml = "0.8"
regex = "1"
tracing = { workspace = true }

The key additions: toml for parsing WAT library frontmatter, regex for response-parsing helpers, and tracing for structured logging. The crate depends on both rt (for the engine, linker, and template) and capability (for all the host function implementations).
The System Prompt
The LLM needs to know what tools it has, how WASM memory works, what syntax to use for responses, and how multi-step reasoning works. All of this is assembled into a system prompt at runtime. The prompt is built from live data — the actual registered capabilities and loaded catalog programs — so it is always in sync with the runtime.
// agent/src/prompt.rs
pub fn build_system_prompt(template: &Template, library: &WatLibrary) -> String {
let imports = template.describe_imports();
let catalog = library.catalog();
format!(
r#"You are an agent that processes user requests by writing WebAssembly (WAT) programs
that execute in a sandboxed runtime.
## Available Host Functions
{imports}
## Memory Model
Wire format: `[u32 LE length][payload bytes]` for all pointers.
Your function body runs inside `(func $run (result i32) ...)` — return 0 for success.
## Library Programs
{catalog}
Only the programs listed above are available. Do not use `ToolCall::Catalog` with
any other name.
## Response Format
Use the ToolCall syntax for ALL responses. Each response must contain exactly one ToolCall.
### 1. Inline WAT — execute generated code
...
### 2. Catalog program — use a pre-built program
...
### 3. Direct text response — no execution
...
## Multi-Step Reasoning (ReAct)
...
## Important Notes
- Multi-value returns: after `(call $fn)` returning `(ptr: i32, err: i32)`, pop in
reverse order: `(local.set $err)` then `(local.set $ptr)`.
- Error codes: 0=SUCCESS, 1=ETRFM, 2=ENOMEM, 3=EACCESS, 4=ENOENT, 5=EBOUND,
6=EREMOTE, 7=EPARSE
- Always check error codes from host function calls before using returned pointers.
- NEVER output raw WAT code outside of a ToolCall."#
)
}

The full prompt is about 136 lines. The key sections:
Available Host Functions is generated by template.describe_imports(), which calls describe() on every registered capability and concatenates the results. This means the LLM sees every function signature and its documentation — if you add a new capability, the prompt updates automatically.
Library Programs is generated by library.catalog(), which formats each pre-built program with its name, description, and argument specs. The LLM uses this to decide whether to reference a catalog program or write WAT from scratch.
Response Format defines three ToolCall variants. This is the interface contract between the agent and the LLM:
ToolCall::Wat(```wat
(your WAT body)
```, "arg1", "arg2")
ToolCall::Catalog("program_name", "arg1", "arg2")
ToolCall::Response("""
Your final text answer here.
""")Multi-Step Reasoning tells the LLM about the ReAct protocol: wrap reasoning in <reasoning> tags, emit one ToolCall per turn, observe execution results, repeat until done. The LLM is explicitly told it can take multiple steps and should break complex tasks into smaller pieces.
Argument macros are documented so the LLM uses the preprocessor extensions: (argv N $name) instead of the five-line raw pattern, (resv $ptr) instead of (call $sys.resv ...), and (check $err) for error propagation. String literals are explained — "hello" and """multi-line""" are placed in static memory by the preprocessor.
ToolCall Parsing
The LLM outputs free-form text containing one ToolCall. The parser must extract it. Since LLMs are unreliable formatters, the parser is deliberately fault-tolerant.
// agent/src/agent.rs
fn parse_response(response: &str) -> Parsed {
if let Some(parsed) = parse_tool_call(response) {
return parsed;
}
// Fallback: no ToolCall found, treat entire response as Response
Parsed::Response(response.to_string())
}

If no ToolCall syntax is found at all, the entire response becomes a Parsed::Response. This means the agent never crashes on malformed output — it just treats it as a text answer.
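The Parsed type these branches produce can be sketched as a small enum. Field names match the surrounding code; the exact types (e.g. Option<String> for the thought) are assumptions:

```rust
// Hedged sketch of the parser's output type. The real definition may carry
// additional fields or derive other traits.
#[derive(Debug, PartialEq)]
enum Parsed {
    Wat { thought: Option<String>, body: String, args: Vec<String> },
    Catalog { thought: Option<String>, name: String, args: Vec<String> },
    Response(String),
}
```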
The parse_tool_call function searches for each variant in order:
fn parse_tool_call(text: &str) -> Option<Parsed> {
// Extract thought: prefer <reasoning>...</reasoning>, fall back to raw text
let thought = text.find("ToolCall::").map(|idx| {
let before = text[..idx].trim();
if let Some(start) = before.find("<reasoning>") {
let content_start = start + "<reasoning>".len();
if let Some(end) = before[content_start..].find("</reasoning>") {
let content = before[content_start..content_start + end].trim();
if !content.is_empty() {
return Some(content.to_string());
}
}
}
if before.is_empty() { None } else { Some(before.to_string()) }
})?;
// 1. Try ToolCall::Response("""...""")
if let Some(idx) = text.find("ToolCall::Response(") {
let after = &text[idx + "ToolCall::Response(".len()..];
if let Some(start) = after.find("\"\"\"") {
let content_start = start + 3;
if let Some(end) = after[content_start..].find("\"\"\"") {
let content = after[content_start..content_start + end].to_string();
return Some(Parsed::Response(content.trim().to_string()));
}
// Opening """ but no closing — take rest as content
let content = &after[content_start..];
let content = content.trim_end().trim_end_matches(')').trim_end_matches('"');
return Some(Parsed::Response(content.trim().to_string()));
}
// No triple-quotes, take everything between parens
if let Some(end) = find_closing_paren(after) {
return Some(Parsed::Response(after[..end].to_string()));
}
// No closing paren either — take everything
let content = after.trim_end().trim_end_matches(')').trim_end_matches('"');
return Some(Parsed::Response(content.trim().to_string()));
}
// 2. Try ToolCall::Wat(```wat...```, "arg1", ...)
if let Some(idx) = text.find("ToolCall::Wat(") {
let after = &text[idx + "ToolCall::Wat(".len()..];
if let Some(body_and_args) = extract_wat_body_and_args(after) {
return Some(Parsed::Wat {
thought: thought.clone(),
body: body_and_args.0,
args: body_and_args.1,
});
}
}
// 3. Try ToolCall::Catalog("name", "arg1", ...)
if let Some(idx) = text.find("ToolCall::Catalog(") {
let after = &text[idx + "ToolCall::Catalog(".len()..];
let raw = if let Some(end) = find_closing_paren(after) {
&after[..end]
} else {
after.trim_end().trim_end_matches(')')
};
let all_args = parse_quoted_args(raw);
if let Some((name, args)) = all_args.split_first() {
return Some(Parsed::Catalog {
thought: thought.clone(),
name: name.clone(),
args: args.to_vec(),
});
}
}
None
}

The fault tolerance is in the fallbacks. For ToolCall::Response: try triple-quoted string first, then balanced parens, then take everything remaining. For ToolCall::Wat: if the closing ``` is missing, take the rest of the text as the WAT body. For ToolCall::Catalog: if the closing ) is missing, strip trailing characters and parse what's there.
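The parse_quoted_args helper used in the Catalog branch is not shown above; here is a hedged sketch of what it might look like, extracting double-quoted arguments while honoring backslash escapes (the real implementation may differ):

```rust
// Extract double-quoted arguments from a raw argument list such as
// `"http_get", "https://example.com"`. Backslash escapes inside quotes are
// honored; text between quoted arguments (commas, whitespace) is skipped.
fn parse_quoted_args(raw: &str) -> Vec<String> {
    let mut args = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut escape = false;
    for ch in raw.chars() {
        if escape {
            current.push(ch);
            escape = false;
            continue;
        }
        match ch {
            '\\' if in_quotes => escape = true,
            '"' => {
                if in_quotes {
                    // Closing quote: emit the accumulated argument.
                    args.push(std::mem::take(&mut current));
                }
                in_quotes = !in_quotes;
            }
            _ if in_quotes => current.push(ch),
            _ => {} // separator characters outside quotes
        }
    }
    args
}
```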
The find_closing_paren helper respects quoted strings so that parentheses inside string arguments don't terminate the parse early:
fn find_closing_paren(text: &str) -> Option<usize> {
let mut in_quotes = false;
let mut escape = false;
for (i, ch) in text.char_indices() {
if escape { escape = false; continue; }
match ch {
'\\' if in_quotes => escape = true,
'"' => in_quotes = !in_quotes,
')' if !in_quotes => return Some(i),
_ => {}
}
}
None
}

The thought extraction also has two modes: it prefers <reasoning> tags (the format we ask the LLM to use), but falls back to treating any text before ToolCall:: as the thought. This backward-compatible approach means the agent works whether the LLM follows our tag convention or not.
The ReAct Loop
The core loop lives in two methods: process (returns final result) and process_stream (sends events via channel for SSE streaming). The logic is identical; only the output mechanism differs. Here is the structure from process_inner:
// agent/src/agent.rs (simplified)
async fn process_inner(&self, sid: &str, message: &str, history: Vec<Message>,
kv_state: Arc<RwLock<KvState>>) -> Result<AgentResponse> {
// 1. Guard check (optional content safety)
if let Some(ref guard) = self.guard {
if let GuardVerdict::Blocked { category, .. } = guard.check(message).await? {
return Ok(AgentResponse { text: safety_message(&category).to_string(), .. });
}
}
// 2. Build system prompt from live capabilities + catalog
let template = self.build_template();
let system = build_system_prompt(&template, &self.catalog);
let mut messages = vec![Message::system(&system)];
messages.extend(history);
// 3. ReAct loop
let mut all_executions: Vec<Execution> = Vec::new();
for step in 0..self.config.max_steps {
// Think: call LLM
let response = self.client.complete(&messages).await?;
match parse_response(&response) {
// Terminal: final text answer
Parsed::Response(text) => {
self.record_assistant(sid, &response);
return Ok(AgentResponse {
text, executions: all_executions, steps: step, ..
});
}
// Act: inline WAT
Parsed::Wat { body, args, .. } => {
messages.push(Message::assistant(&response));
let obs = self.execute_step(
sid, &template, &body, args, &kv_state,
ExecutionSource::Inline, &mut all_executions, &mut messages,
).await;
let obs_msg = format_observation(&obs, step);
messages.push(Message::user(&obs_msg));
}
// Act: catalog program
Parsed::Catalog { name, args, .. } => {
messages.push(Message::assistant(&response));
let program = match self.catalog.get(&name) {
Some(p) => p,
None => {
messages.push(Message::user(&format!(
"[Observation (step {})] Catalog program '{}' not found.", step + 1, name
)));
continue;
}
};
// Fill in default arguments, then execute
let obs = self.execute_step(
sid, &template, &program.body, final_args, &kv_state,
ExecutionSource::Catalog { name }, &mut all_executions, &mut messages,
).await;
let obs_msg = format_observation(&obs, step);
messages.push(Message::user(&obs_msg));
}
}
}
// Max steps exhausted — force final response
messages.push(Message::user(
"[System] Maximum reasoning steps reached. You MUST respond with \
ToolCall::Response(\"\"\"...\"\"\") now, summarizing what you accomplished.",
));
let final_response = self.client.complete(&messages).await?;
// ... extract text, return ...
}

The loop is simple: call the LLM, parse the response, and branch:
- Response: done. Return the text.
- Wat: compile and execute, format the result as an observation, push it as a user message, continue the loop.
- Catalog: look up the program body, fill in default arguments, then same as Wat.
Each step appends the LLM's response as an assistant message and the observation as a user message. This builds up a conversation that the LLM can use for context in subsequent steps.
When the loop exhausts max_steps (default: 10), it forces one final LLM call with an explicit instruction to respond. This prevents infinite loops.
Observation Format
The observation message follows a consistent pattern:
fn format_observation(result: &StepResult, step: usize) -> String {
match result {
StepResult::Ok(text) => {
format!("[Observation (step {})] Execution result:\n{}", step + 1, text)
}
StepResult::Err(err) => {
format!("[Observation (step {})] Execution FAILED:\n{}", step + 1, err)
}
}
}

The [Observation (step N)] prefix gives the LLM a clear signal that this is an execution result, not user input. Numbering the steps helps the LLM track where it is in a multi-step task.
Execution with Retry
When the LLM generates WAT that fails to compile — and it will, frequently — the agent does not immediately give up. It sends the compilation error back to the LLM and asks it to fix the code:
async fn execute_step(&self, session_id: &str, template: &Template, body: &str,
args: Vec<String>, kv_state: &Arc<RwLock<KvState>>, source: ExecutionSource,
executions: &mut Vec<Execution>, messages: &mut Vec<Message>) -> StepResult
{
let mut current_body = body.to_string();
let mut retries = 0;
loop {
match self.execute(template, &current_body, &args, kv_state).await {
Ok(exec) => {
let text = format_execution_result(&exec);
executions.push(Execution {
source, exit_code: exec.exit_code,
results: exec.results, fuel_consumed: exec.fuel_consumed,
});
return StepResult::Ok(text);
}
Err(e) => {
let error_msg = e.to_string();
if retries < self.config.max_retries
&& (error_msg.contains("compilation failed")
|| error_msg.contains("Compilation"))
{
retries += 1;
messages.push(Message::assistant(&format!(
"```wat\n{}\n```", current_body
)));
messages.push(Message::system(&format!(
"Your WAT failed to compile: {}. Fix it and respond with \
the corrected ToolCall::Wat(```wat ... ```) block.", error_msg
)));
if let Ok(retry_response) = self.client.complete(messages).await {
if let Parsed::Wat { body: new_body, .. } =
parse_response(&retry_response)
{
current_body = new_body;
continue;
}
}
}
return StepResult::Err(format!("{:?}", e));
}
}
}
}

The retry is selective: only compilation failures trigger a retry, not runtime errors. A program that compiles but exits with a non-zero error code is a valid execution — the observation is sent back to the LLM in the normal flow. But a syntax error in the WAT output is a formatting mistake that the LLM can fix if shown the error message. Up to max_retries (default: 2) attempts are made.
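The classification that gates the retry can be isolated into a tiny predicate; this sketch mirrors the substring checks in execute_step above:

```rust
// Only compile-time failures are retried; runtime errors flow back to the
// LLM as ordinary observations. Mirrors the checks in execute_step.
fn is_compile_error(msg: &str) -> bool {
    msg.contains("compilation failed") || msg.contains("Compilation")
}
```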
Pre-Cache
Compiling WAT to machine code is not free. For catalog programs that get used repeatedly, we cache the compiled result:
type PreCacheEntry = (LinkedModule, Vec<DataSection>, u32);
pub struct Agent {
// ...
pre_cache: RwLock<HashMap<u64, PreCacheEntry>>,
// ...
}

The cache key is a content hash of the assembled WAT text. At startup, all catalog programs are pre-warmed:
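The content_hash function is not shown in this article; a plausible sketch using the standard library's hasher looks like this (the real implementation may use a different, stable hash function):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash the assembled WAT bytes down to a u64 cache key. DefaultHasher is not
// stable across processes, which is fine here: the pre-cache is in-memory and
// per-process, so keys never need to survive a restart.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}
```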
let pre_cache: HashMap<u64, PreCacheEntry> = {
let template = Self::make_template(&config);
let mut cache = HashMap::new();
for program in catalog.programs() {
match template.assemble(&program.body) {
Ok(ar) => {
let hash = content_hash(ar.wat.as_bytes());
match engine.compile_wat(&ar.wat) {
Ok(module) => match linker.pre_link(&module) {
Ok(linked) => {
cache.insert(hash, (linked, ar.data_sections, ar.initial_top));
}
Err(e) => tracing::warn!("Failed to pre-link: {}", e),
},
Err(e) => tracing::warn!("Failed to pre-compile: {}", e),
}
}
Err(e) => tracing::warn!("Failed to preprocess: {}", e),
}
}
cache
};

During execution, the hot path checks the cache before compiling:
async fn execute(&self, template: &Template, body: &str, args: &[String],
kv_arc: &Arc<RwLock<KvState>>) -> Result<ExecutionResult> {
let ar = template.assemble(body)?;
let hash = content_hash(ar.wat.as_bytes());
let cached = {
let cache = self.pre_cache.read().unwrap();
cache.get(&hash).cloned()
};
let (linked, data_sections, initial_top) = match cached {
Some(entry) => entry,
None => {
let module = self.engine.compile_wat(&ar.wat)?;
let linked = self.linker.pre_link(&module)?;
let entry = (linked, ar.data_sections, ar.initial_top);
self.pre_cache.write().unwrap().insert(hash, entry.clone());
entry
}
};
// Build per-execution state, instantiate, run
let mut extensions = ExtensionMap::new();
// ... init capability state ...
let serialized: Vec<Vec<u8>> = args.iter()
.map(rt::serialize_arg)
.collect::<rt::Result<_>>()?;
let mut instance = linked
.instantiate(&self.engine, extensions, serialized, data_sections, initial_top)
.await?;
let exit_code = instance.run().await?;
let results = instance.raw_results().to_vec();
Ok(ExecutionResult { exit_code, results, fuel_consumed: None })
}

The RwLock allows concurrent reads (the common case — cache hits) while serializing writes (cache misses). Inline WAT programs are also cached after first compilation, so if the LLM generates the same code twice (which happens with simple tasks), the second execution skips compilation entirely.
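The read-then-write pattern in execute can be reduced to a small generic helper; a sketch with a String standing in for the compiled entry:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Take a cheap read lock for the common hit case; only take the write lock
// on a miss. Two threads racing on the same key may both compile, and the
// second insert harmlessly overwrites the first with an identical value.
fn get_or_insert(
    cache: &RwLock<HashMap<u64, String>>,
    key: u64,
    compile: impl FnOnce() -> String,
) -> String {
    if let Some(v) = cache.read().unwrap().get(&key) {
        return v.clone();
    }
    let v = compile();
    cache.write().unwrap().insert(key, v.clone());
    v
}
```

Note the read guard is dropped before the write lock is taken, so the helper never deadlocks on itself.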
The Inference Client
The agent talks to the LLM through a trait, not a concrete type:
// agent/src/inference.rs
pub trait InferenceClient: Send + Sync {
fn complete<'a>(
&'a self,
messages: &'a [Message],
) -> Pin<Box<dyn Future<Output = Result<String>> + Send + 'a>>;
fn model(&self) -> &str;
}

The concrete implementation wraps the Cerebras API (OpenAI-compatible):
const DEFAULT_URL: &str = "https://api.cerebras.ai/v1/chat/completions";
pub struct CerebrasClient {
http: reqwest::Client,
api_key: String,
base_url: String,
model: String,
}
impl CerebrasClient {
pub fn new(api_key: String) -> Self {
Self {
http: reqwest::Client::new(),
api_key,
base_url: DEFAULT_URL.to_string(),
model: "gpt-oss-120b".to_string(),
}
}
pub fn with_model(api_key: String, model: String) -> Self {
Self { http: reqwest::Client::new(), api_key,
base_url: DEFAULT_URL.to_string(), model }
}
pub fn with_config(api_key: String, model: String, base_url: String) -> Self {
Self { http: reqwest::Client::new(), api_key, base_url, model }
}
}
impl InferenceClient for CerebrasClient {
fn complete<'a>(&'a self, messages: &'a [Message])
-> Pin<Box<dyn Future<Output = Result<String>> + Send + 'a>>
{
Box::pin(self.do_complete(messages))
}
fn model(&self) -> &str { &self.model }
}

The do_complete method constructs the JSON body, sends the POST request, and extracts the content from the first choice:
async fn do_complete(&self, messages: &[Message]) -> Result<String> {
let msgs: Vec<serde_json::Value> = messages.iter().map(|m| {
serde_json::json!({ "role": m.role_str(), "content": &m.content })
}).collect();
let body = serde_json::json!({
"model": &self.model,
"messages": msgs,
"max_tokens": 4096,
"temperature": 0.3,
});
let resp = self.http.post(&self.base_url)
.header("Authorization", format!("Bearer {}", self.api_key))
.header("Content-Type", "application/json")
.json(&body)
.send().await?;
if !resp.status().is_success() {
let text = resp.text().await.unwrap_or_default();
anyhow::bail!("API error {}: {}", resp.status(), text);
}
let data: CompletionResponse = resp.json().await?;
Ok(data.choices.first()
.map(|c| c.message.content.clone())
.unwrap_or_default())
}

The trait abstraction means you can swap in any OpenAI-compatible provider, or a mock client for testing. The agent does not care — it calls self.client.complete(&messages) and gets back a string.
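To make the swap concrete, here is a hedged sketch of a mock client. The Message shape and error type are assumptions (a plain String error stands in for anyhow::Error so the sketch stays dependency-free), and a minimal poll loop replaces a tokio runtime:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Assumed shape; the real Message lives in the agent crate.
#[allow(dead_code)]
struct Message { role: String, content: String }

trait InferenceClient: Send + Sync {
    fn complete<'a>(&'a self, messages: &'a [Message])
        -> Pin<Box<dyn Future<Output = Result<String, String>> + Send + 'a>>;
    fn model(&self) -> &str;
}

// Mock: pops canned responses in order, letting tests drive the ReAct loop
// without network access.
struct MockClient { responses: Mutex<Vec<String>> }

impl InferenceClient for MockClient {
    fn complete<'a>(&'a self, _messages: &'a [Message])
        -> Pin<Box<dyn Future<Output = Result<String, String>> + Send + 'a>> {
        Box::pin(async move {
            self.responses.lock().unwrap().pop()
                .ok_or_else(|| "no canned response left".to_string())
        })
    }
    fn model(&self) -> &str { "mock" }
}

// Minimal executor, sufficient because the mock's futures are ready at once.
struct NoopWake;
impl Wake for NoopWake { fn wake(self: Arc<Self>) {} }

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) { return v; }
    }
}
```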
The WAT Library / Catalog
Pre-built programs live as .wat files with TOML frontmatter prefixed by ;;;:
;;; name = "http_get"
;;; description = "Fetch a URL via HTTP GET and return the response body"
;;; [[args]]
;;; name = "url"
;;; type_hint = "string"
;;; description = "The URL to fetch"
(argv 0 $url_ptr)
(call $http.get (local.get $url_ptr))
(local $body_err i32) (local $body_ptr i32)
(local.set $body_err)
(local.set $body_ptr)
(check $body_err)
(resv $body_ptr)
(i32.const 0)

The parser is straightforward:
// agent/src/catalog.rs
fn parse_wat_program(content: &str) -> Result<WatProgram> {
let mut header_lines = Vec::new();
let mut body_lines = Vec::new();
let mut in_header = true;
for line in content.lines() {
if in_header && line.starts_with(";;;") {
let stripped = line.strip_prefix(";;; ")
.unwrap_or(line.strip_prefix(";;;").unwrap_or(""));
header_lines.push(stripped);
} else {
in_header = false;
body_lines.push(line);
}
}
let header_text = header_lines.join("\n");
#[derive(Deserialize)]
struct Header {
name: String,
description: String,
#[serde(default)]
args: Vec<ArgSpec>,
}
let header: Header = toml::from_str(&header_text)
.context("parsing front-matter TOML")?;
let body = body_lines.join("\n").trim().to_string();
Ok(WatProgram {
name: header.name,
description: header.description,
args: header.args,
body,
})
}

Lines starting with ;;; are collected, stripped of the prefix, and parsed as TOML. Everything after the header is the WAT function body. The WatLibrary loads a directory of these files and indexes them by name:
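The header/body split can be exercised in isolation; this reduction mirrors the loop in parse_wat_program with the TOML deserialization stubbed out so it runs without the toml crate:

```rust
// Split a catalog .wat file into (frontmatter text, WAT body).
// Dependency-free reduction of parse_wat_program's header loop.
fn split_frontmatter(content: &str) -> (String, String) {
    let mut header = Vec::new();
    let mut body = Vec::new();
    let mut in_header = true;
    for line in content.lines() {
        if in_header && line.starts_with(";;;") {
            // Strip the `;;;` marker and any following space.
            header.push(line.trim_start_matches(";;;").trim_start());
        } else {
            in_header = false;
            body.push(line);
        }
    }
    (header.join("\n"), body.join("\n").trim().to_string())
}
```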
pub struct WatLibrary {
programs: HashMap<String, WatProgram>,
}
impl WatLibrary {
pub fn load_dir(path: &Path) -> Result<Self> {
let mut lib = Self::new();
for entry in std::fs::read_dir(path)? {
let entry = entry?;
let file_path = entry.path();
if file_path.extension().and_then(|e| e.to_str()) != Some("wat") {
continue;
}
let content = std::fs::read_to_string(&file_path)?;
let program = parse_wat_program(&content)?;
lib.programs.insert(program.name.clone(), program);
}
Ok(lib)
}
pub fn catalog(&self) -> String {
if self.programs.is_empty() {
return "No library programs available.".to_string();
}
let mut out = String::new();
let mut programs: Vec<&WatProgram> = self.programs.values().collect();
programs.sort_by(|a, b| a.name.cmp(&b.name));
for prog in programs {
out.push_str(&format!("- **{}**: {}\n", prog.name, prog.description));
if !prog.args.is_empty() {
out.push_str(" Arguments:\n");
for arg in &prog.args {
// ... format each argument with optional default ...
}
}
}
out
}
}

The catalog() method generates the markdown that goes into the system prompt. When the LLM sees `- **http_get**: Fetch a URL via HTTP GET` along with its argument docs, it can emit ToolCall::Catalog("http_get", "https://example.com") instead of writing the WAT from scratch.
Arguments support defaults — if the catalog program defines default = "some value" on an argument, the agent fills it in when the LLM omits it. This allows catalog programs to have optional parameters.
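The default-filling step might be sketched as follows; the ArgSpec shape here is a hypothetical reduction (the real struct also carries type_hint and description):

```rust
// Hypothetical reduced ArgSpec; only the fields this sketch needs.
struct ArgSpec { name: String, default: Option<String> }

// Merge the LLM-provided positional args with catalog defaults: a provided
// value wins, otherwise the default (or empty string) fills the slot.
fn fill_defaults(specs: &[ArgSpec], provided: &[String]) -> Vec<String> {
    specs.iter().enumerate()
        .map(|(i, spec)| {
            provided.get(i).cloned()
                .or_else(|| spec.default.clone())
                .unwrap_or_default()
        })
        .collect()
}
```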
Session Management
Each conversation gets a session that persists conversation history and KV state across ReAct steps:
// agent/src/session.rs
pub struct Session {
pub id: String,
pub history: Vec<Message>,
pub kv_state: Arc<RwLock<KvState>>,
pub created_at: Instant,
pub last_active: Instant,
}
pub struct SessionStore {
sessions: HashMap<String, Session>,
ttl: Duration,
}
impl SessionStore {
pub fn new(ttl: Duration) -> Self {
Self { sessions: HashMap::new(), ttl }
}
pub fn get_or_create(&mut self, id: Option<&str>) -> &mut Session {
self.cleanup_expired();
let id = id.map(String::from).unwrap_or_else(generate_id);
self.sessions.entry(id.clone()).or_insert_with(|| {
let now = Instant::now();
Session {
id, history: Vec::new(),
kv_state: Arc::new(RwLock::new(KvState::default())),
created_at: now, last_active: now,
}
})
}
pub fn cleanup_expired(&mut self) {
let now = Instant::now();
self.sessions.retain(|_, s| now.duration_since(s.last_active) < self.ttl);
}
}

The KV state is wrapped in Arc<RwLock<KvState>>. This is critical: when a WASM program calls kv.set("key", value), it modifies the state through the Arc. The next program in the same ReAct loop — or a subsequent user message in the same session — sees the updated value. No explicit state copying or merging is needed; the Arc provides cheap shared ownership.
Session IDs can be provided by the caller (for named sessions in the CLI or HTTP API) or auto-generated from the current timestamp. The TTL cleanup runs lazily — expired sessions are removed whenever get_or_create is called, not on a background timer.
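A timestamp-based generate_id might be sketched like this; the real implementation may add a random component to avoid collisions:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Derive a session ID from the current time in nanoseconds. Collisions are
// unlikely at nanosecond granularity but not impossible under heavy
// concurrency, which is why a random suffix is a reasonable addition.
fn generate_id() -> String {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before UNIX epoch")
        .as_nanos();
    format!("session-{nanos:x}")
}
```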
Putting It All Together
Here is how a single user request flows through the system:
User: "Fetch https://httpbin.org/get and summarize the response"
1. Agent receives message, creates/retrieves session
2. Builds system prompt with all capabilities + catalog
3. Sends [system prompt, history, user message] to LLM
4. LLM responds:
<reasoning>
The user wants to fetch a URL and summarize it. I'll first fetch the URL.
</reasoning>
ToolCall::Catalog("http_get", "https://httpbin.org/get")
5. Agent parses → Catalog { name: "http_get", args: ["https://httpbin.org/get"] }
6. Looks up catalog program, fills in body
7. Preprocesses → assembles WAT → checks cache → compiles if needed
8. Executes WASM: http.get is called, response body returned
9. Formats observation: "[Observation (step 1)] Execution result:\n{...json...}"
10. Appends assistant message + observation to conversation
11. LLM receives observation, responds:
<reasoning>
I have the HTTP response. Now I'll use ai.assist to summarize it.
</reasoning>
ToolCall::Wat(```wat
(argv 0 $body_ptr)
(call $ai.assist (local.get $body_ptr) "Summarize in one sentence." (i32.const 0))
(local $sum_err i32) (local $sum_ptr i32)
(local.set $sum_err) (local.set $sum_ptr)
(check $sum_err)
(resv $sum_ptr)
(i32.const 0)
```, "{...the JSON response...}")
12. Agent compiles + executes. ai.assist makes async LLM call inside WASM.
13. Observation: "[Observation (step 2)] Execution result:\nThe httpbin..."
14. LLM responds:
ToolCall::Response("""
The httpbin.org /get endpoint returns a JSON object containing the request
headers, origin IP, and URL that was accessed.
""")
15. Agent returns final text to user. Session history updated.

Three steps, three LLM calls, two WASM executions. The agent decomposed the task, used a catalog program for the first step, wrote custom WAT for the second, and synthesized a response from both observations.
The Agent Struct
The full agent holds all shared infrastructure:
pub struct Agent {
engine: Engine,
linker: Linker,
pre_cache: RwLock<HashMap<u64, PreCacheEntry>>,
client: Arc<dyn InferenceClient>,
catalog: WatLibrary,
sessions: std::sync::Mutex<SessionStore>,
guard: Option<ContentGuard>,
config: AgentConfig,
}

The engine and linker are created once at startup with all capabilities registered. The pre_cache starts pre-warmed with catalog programs. The client is behind Arc<dyn InferenceClient> so it can be shared across concurrent requests. The sessions use a std::sync::Mutex (not tokio) because session operations are fast pointer manipulations, not async I/O. The optional guard provides content safety filtering.
The AgentConfig controls all tunable parameters:
pub struct AgentConfig {
pub api_key: String,
pub model: Option<String>,
pub engine_config: EngineConfig,
pub http_allowed_domains: Option<Vec<String>>,
pub http_timeout: Duration,
pub fs_root: Option<PathBuf>,
pub session_ttl: Duration,
pub max_retries: usize, // compilation retry limit (default: 2)
pub max_steps: usize, // ReAct loop limit (default: 10)
pub catalog_path: Option<PathBuf>,
pub guard_enabled: bool,
}

Streaming
The process_stream variant sends events through a tokio::sync::mpsc::Sender<AgentEvent> instead of collecting results:
#[derive(Debug, Clone, Serialize)]
#[serde(tag = "type")]
pub enum AgentEvent {
Thinking { step: usize, content: String },
ToolStart { step: usize, label: String, source: Option<String>,
description: Option<String> },
ToolResult { step: usize, source: String, success: bool, output: String },
Retry { step: usize, attempt: usize, error: String },
Response { text: String, session_id: String, steps: usize },
Error { message: String },
}

The HTTP server serializes these events as SSE (Server-Sent Events), giving the frontend real-time visibility into the agent's reasoning and execution. The frontend can show which step the agent is on, what code it generated, whether execution succeeded, and what it observed — all before the final response arrives.
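With #[serde(tag = "type")], each event serializes with the variant name inlined as a "type" field, e.g. {"type":"Thinking","step":0,"content":"..."}. The SSE framing itself is simple; a std-only sketch (the real server presumably uses serde_json plus its web framework's SSE support):

```rust
// Frame an already-serialized JSON event as a Server-Sent Events message:
// a `data:` line terminated by a blank line, per the text/event-stream format.
fn sse_frame(json: &str) -> String {
    format!("data: {json}\n\n")
}
```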
What We Have
After this article, the system is complete as a pipeline: user message in, text response out, with arbitrary WASM execution in between. The agent can:
- Write WAT programs on the fly to accomplish tasks
- Reference pre-built catalog programs for common operations
- Retry compilation failures by showing the LLM its mistakes
- Execute multiple steps with observations feeding back into reasoning
- Cache compiled modules for performance
- Maintain per-session conversation history and KV state
- Stream real-time events for UI consumption
The full agent.rs is approximately 1500 lines — the process loop, parsing, execution, retry logic, streaming, session management, and tests. Reading it end to end is a worthwhile exercise, but the core loop described here is the beating heart.
What remains is the user-facing shell: the CLI entry point, the REPL, the HTTP server, and the frontend that ties it all together.