Machine Learning One

The threat model for LLM-generated code and deriving sandbox requirements from first principles.

You have an LLM. It can reason, plan, and generate code. You want it to do things — fetch web pages, query databases, write files, call APIs. The obvious approach: have the LLM generate Python, then exec() it.

This is a terrible idea. Let's understand why, and derive from first principles what we actually need.

The Seductive Trap of `exec()`

Here is the simplest possible "AI agent" in Python:

import openai

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write Python to fetch https://example.com"}]
)

exec(response.choices[0].message.content)  # What could go wrong?

Everything. Everything could go wrong.

Problem 1: No Sandbox

The LLM's generated code runs in your process. It has access to every file on your machine, every environment variable, every network interface. A prompt injection — or just a hallucination — and you're running os.system("rm -rf /").

You might think "I'll just restrict the imports." But Python's import system is porous. __builtins__ gives access to __import__. Metaclasses can override attribute access. ctypes can call arbitrary C functions. The attack surface is enormous.
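To make the porosity concrete, here is a minimal sketch (not a working exploit). It assumes the common mitigation of handing exec() an empty __builtins__ dictionary, and the variable names are illustrative. Even with no builtins at all, the snippet walks from a tuple literal up to object and lists every class loaded in the interpreter:

```python
# "Untrusted" source that uses no names at all, only attribute access.
untrusted_source = "leaked = ().__class__.__base__.__subclasses__()"

# The usual mitigation: strip every builtin (no open, no __import__, no print).
scope = {"__builtins__": {}}
exec(untrusted_source, scope)

# The guest still obtained a list of every class derived from object.
leaked = scope["leaked"]
print(f"classes reachable without any builtins: {len(leaked)}")
```

Every one of those classes is a potential stepping stone back to the capabilities you tried to remove, which is why name-based blocklists tend to fail.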

Problem 2: No Resource Limits

How long should the LLM's code run? How much memory can it consume? Inside a single Python process, there is no reliable mechanism to enforce either limit. You can set signal.alarm() for a crude timeout on Unix, but the handler only runs in the main thread, and the generated code can catch the exception it raises. Memory limits require OS-level mechanisms such as rlimits, cgroups, or containers. A simple infinite loop (while True: pass) hangs your entire agent.
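For illustration, this is roughly what the crude Unix timeout looks like. It is a sketch under assumptions: the helper name run_with_alarm and the GuestTimeout exception are made up for this example, it must run in the main thread, and it uses signal.setitimer so the limit can be a fraction of a second.

```python
import signal

class GuestTimeout(Exception):
    """Raised by the alarm handler when the time budget expires."""

def _on_alarm(signum, frame):
    raise GuestTimeout()

def run_with_alarm(source: str, seconds: float) -> str:
    """Run source via exec() with a wall-clock limit (Unix, main thread only)."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        exec(source, {})
        return "finished"
    except GuestTimeout:
        return "timed out"
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)   # restore the old handler

quick = run_with_alarm("x = 1 + 1", 0.2)
stuck = run_with_alarm("while True: pass", 0.2)
print(quick, stuck)
```

The weakness is visible in the design: the limit is an exception injected into the guest's own code, so a guest that wraps its loop in a broad try/except can swallow it and keep running.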

Problem 3: No Determinism

Python's behavior depends on the installed packages, the OS, the Python version, the system locale, the current working directory, and a long tail of environment variables (PYTHONPATH, PYTHONHASHSEED, LANG, and many more). The same generated code might work on your machine and fail in production. Reproducibility is accidental.

Problem 4: No Clean Boundary

When you exec() Python, the generated code and your orchestrator share a runtime. Global state leaks in both directions. The generated code can monkey-patch your modules. Your modules can be corrupted by the generated code's side effects. There is no membrane between "the brain" (your orchestrator) and "the hands" (the generated code).
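A small demonstration of the leak, assuming nothing beyond the standard library. The "generated" source here is hand-written for the example. It replaces json.dumps, and the orchestrator's own later call is affected:

```python
import json

# Stand-in for LLM-generated code: it patches a module the orchestrator uses.
untrusted_source = """
import json
json.dumps = lambda *args, **kwargs: '"tampered"'
"""

original_dumps = json.dumps
exec(untrusted_source, {})          # a fresh globals dict does not help

# The orchestrator serializes its own data and gets the guest's answer.
result = json.dumps({"balance": 100})
print(result)

json.dumps = original_dumps         # undo the damage for the rest of the process
```

Passing a fresh globals dictionary isolates variable names only. Modules live in sys.modules, which is shared by everything in the process.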

Problem 5: The LLM is Untrusted Code

In a traditional system, you trust the programmer and distrust the user input. In an agent system, the LLM is both the programmer and the user input. It writes code based on user requests, which may include prompt injections, adversarial inputs, or simply ambiguous instructions that the LLM misinterprets.

You would never exec() code from an untrusted HTTP request. The LLM deserves the same suspicion.

What JavaScript Gets Right (and Wrong)

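Node.js has the vm module and vm.createContext(). Browsers have sandboxed iframes. These are closer to what we need because they provide some isolation, although the Node.js documentation itself warns that vm is not a security mechanism. But:

V8 has no built-in fuel metering. Node's vm functions accept a timeout option that can interrupt a synchronous loop, but that is a wall-clock limit, not a deterministic instruction budget.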
Memory limits are coarse (V8 heap limit, not per-execution).
The capability model is all-or-nothing: you either have fetch or you don't.
The instruction set is complex. JavaScript has prototypes, closures, async/await, generators, proxies, symbols, weak references. An LLM generating JavaScript must navigate all of this.
Startup cost for a V8 isolate is measured in milliseconds, which matters when you're executing dozens of small programs per agent turn.

Deriving the Ideal VM from First Principles

Let's forget about existing technology for a moment and ask: if we were designing a virtual machine specifically for AI agent execution, what properties would it have?

Property 1: Memory Isolation

The guest program gets a flat, contiguous byte array. That's it. No pointers to host memory. No shared heap. No file descriptors inherited from the parent process. The guest can read and write within its own array, and nothing else.

This is the most important property. It makes the sandbox structural, not policy-based. You don't need to check every operation against a blocklist. The guest simply cannot address anything outside its own memory.
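The addressing rule can be sketched in a few lines. This models the idea only: Python itself provides no isolation, and the class and exception names are invented for the example. Every guest access goes through one bounds check against the guest's own array.

```python
class Trap(Exception):
    """Raised when the guest touches memory outside its own array."""

class LinearMemory:
    def __init__(self, size: int) -> None:
        self._bytes = bytearray(size)   # the guest's entire addressable world

    def _check(self, addr: int, length: int) -> None:
        if addr < 0 or length < 0 or addr + length > len(self._bytes):
            raise Trap(f"out-of-bounds access: addr={addr} len={length}")

    def load(self, addr: int, length: int) -> bytes:
        self._check(addr, length)
        return bytes(self._bytes[addr:addr + length])

    def store(self, addr: int, data: bytes) -> None:
        self._check(addr, len(data))
        self._bytes[addr:addr + len(data)] = data

mem = LinearMemory(64)
mem.store(0, b"hello")
greeting = mem.load(0, 5)

try:
    mem.load(60, 8)        # bytes 60..67 do not exist in a 64-byte memory
    escaped = True
except Trap:
    escaped = False
```

There is no list of forbidden addresses here. An address is either inside the array or it traps, which is what "structural" means in practice.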

Property 2: Capability-Based Security

The guest has zero abilities by default. Every interaction with the outside world — HTTP requests, file access, AI inference — comes through explicitly imported host functions. The host decides which functions to provide. This is the principle of least privilege made architectural.

Contrast this with Python, where import os is always available unless you go to extraordinary lengths to prevent it. In our ideal VM, there is no import — there are only the functions the host has chosen to expose.
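One way to picture this is a host object that holds a table of granted functions. The sketch below is illustrative: Host, CapabilityError, and the in-memory key-value store are assumptions made for the example, and the capability names follow the dotted style used elsewhere in this article.

```python
from typing import Callable, Dict

class CapabilityError(Exception):
    """Raised when the guest calls a host function it was not granted."""

class Host:
    def __init__(self, granted: Dict[str, Callable[..., object]]) -> None:
        self._granted = dict(granted)   # the complete set of guest abilities

    def call(self, name: str, *args: object) -> object:
        if name not in self._granted:
            raise CapabilityError(f"capability not granted: {name}")
        return self._granted[name](*args)

store: Dict[str, str] = {}

def kv_set(key: str, value: str) -> bool:
    store[key] = value
    return True

def kv_get(key: str):
    return store.get(key)

# This guest may use the key-value store and nothing else.
host = Host({"kv.set": kv_set, "kv.get": kv_get})
host.call("kv.set", "answer", "42")
value = host.call("kv.get", "answer")

try:
    host.call("fs.write", "/etc/passwd", "x")
    denied = False
except CapabilityError:
    denied = True
```

The default is an empty table. Granting a capability is an explicit line of host code that can be reviewed and audited.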

Property 3: Fuel Metering

Every instruction costs fuel. When fuel runs out, execution stops. This is a hard guarantee, not a cooperative timeout. The guest cannot disable it, ignore it, or work around it. An infinite loop consumes fuel and terminates.

This is different from wall-clock timeouts. A timeout tells you "something took too long." Fuel tells you "this program executed N instructions." Fuel is deterministic — the same program with the same input always consumes the same fuel, regardless of system load.
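A toy interpreter shows the mechanism. The three-opcode counter machine below is invented for this example and is not the instruction set of any real VM. The point is only that the fuel check sits in the interpreter loop, outside the guest's reach.

```python
class OutOfFuel(Exception):
    """Raised when the fuel budget is exhausted."""

def run(program, fuel):
    """Execute a tiny counter machine and return (counter, fuel_used)."""
    counter, pc, used = 0, 0, 0
    while pc < len(program):
        if used >= fuel:
            raise OutOfFuel(f"stopped after {used} instructions")
        used += 1                      # every instruction costs one unit
        op, arg = program[pc]
        if op == "add":
            counter += arg
            pc += 1
        elif op == "jmp":
            pc = arg
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return counter, used

finite = [("add", 2), ("add", 3), ("halt", 0)]
infinite = [("add", 1), ("jmp", 0)]    # the equivalent of while True

finite_result = run(finite, fuel=100)  # (5, 3): counter 5 after 3 instructions

try:
    run(infinite, fuel=1000)
    terminated = False
except OutOfFuel:
    terminated = True
```

Running the finite program again yields the same pair every time, on any machine, which is the determinism a wall-clock timeout cannot offer.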

Property 4: Deterministic Execution

Given the same program, the same inputs, and the same host function implementations, execution produces the same result. No thread scheduling. No garbage collection pauses. No floating-point nondeterminism. The VM is a pure function from (program, inputs) to (outputs, fuel consumed).

Property 5: Simple Instruction Set

The LLM is the programmer. It needs an instruction set it can reliably generate. Every additional feature — closures, exceptions, generics, macros — is another thing the LLM can get wrong.

The ideal instruction set is minimal: integer arithmetic, conditional branches, loops, function calls, and linear memory access. That's enough to be Turing-complete, and simple enough that an LLM has a realistic chance of generating a correct program on the first try.
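To give a feel for how little is needed, here is a sketch of a stack machine with eight opcodes. It is a made-up teaching example, it omits function calls for brevity, and it uses a word-addressed Python list as its memory. The program sums the integers from n down to 1, with n in memory cell 0 and the total in cell 1.

```python
def execute(program, memory, fuel=10_000):
    """Run a tiny stack machine over word-addressed memory; return memory."""
    stack, pc = [], 0
    while fuel > 0:
        fuel -= 1
        op, arg = program[pc]
        pc += 1
        if op == "const":
            stack.append(arg)
        elif op == "load":
            stack.append(memory[arg])
        elif op == "store":
            memory[arg] = stack.pop()
        elif op in ("add", "sub"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "add" else left - right)
        elif op == "jz":               # pop; branch if the value is zero
            if stack.pop() == 0:
                pc = arg
        elif op == "jmp":
            pc = arg
        elif op == "halt":
            return memory
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise RuntimeError("out of fuel")

SUM_TO_N = [
    ("load", 0), ("jz", 11),                               # while n != 0:
    ("load", 1), ("load", 0), ("add", None), ("store", 1), #   total += n
    ("load", 0), ("const", 1), ("sub", None), ("store", 0),#   n -= 1
    ("jmp", 0),
    ("halt", None),
]

final_memory = execute(SUM_TO_N, [4, 0])   # [0, 10]
```

The whole language fits on one screen, so there are few ways for a generated program to be subtly wrong.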

Property 6: Fast Compilation

Agent programs are ephemeral. The LLM generates a program, the runtime executes it, the result is observed, and the next program is generated. Compilation must be fast — microseconds to low milliseconds — because it happens on every agent step.

Property 7: Portable Binary Format

The same program should run on any host: x86, ARM, macOS, Linux, inside a container, on a serverless platform. No recompilation, no platform-specific code paths.

The ReAct Loop: Why This Matters

To see why these properties matter in practice, consider the ReAct pattern — the dominant architecture for AI agents:

User: "Find the latest paper on attention mechanisms and summarize it."

Agent:
  Think: "I need to search for recent papers. Let me query ArXiv."
  Act:   [Execute program: HTTP GET to ArXiv API]
  Observe: "Got search results with 10 papers..."

  Think: "The most recent paper is '...' Let me fetch the abstract."
  Act:   [Execute program: HTTP GET to paper URL]
  Observe: "Got the abstract: ..."

  Think: "Now I can summarize this."
  Act:   [Execute program: Call AI to summarize the abstract]
  Observe: "Summary: ..."

  Think: "I have all the information. Let me respond."
  Response: "The latest paper on attention mechanisms is..."

Each Act step is a program execution. In a single conversation, the agent might execute 5-10 programs. Each program:

Must be sandboxed (it's fetching URLs from user-influenced search results)
Must be resource-limited (a malformed response could trigger an infinite parse loop)
Must have only the capabilities it needs (the summarization step shouldn't have file write access)
Must compile fast (the user is waiting)
Must be simple enough for the LLM to generate correctly

This is the execution model our VM must support: many small, sandboxed, capability-controlled programs, compiled and executed in rapid succession, orchestrated by an LLM that is generating the programs in real time.
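The per-step capability scoping can be sketched as follows. Everything here is a stand-in: the host functions return canned strings instead of doing I/O, the fixed plan replaces the LLM, and each "program" is reduced to a single host call so the control flow stays visible.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical host functions; real ones would perform network or model calls.
def http_get(url: str) -> str:
    return f"<response from {url}>"

def ai_summarize(text: str) -> str:
    return f"summary of {text}"

ALL_CAPABILITIES: Dict[str, Callable[[str], str]] = {
    "http.get": http_get,
    "ai.summarize": ai_summarize,
}

# One Act step: (capabilities granted, capability the program calls, argument).
Step = Tuple[List[str], str, str]

def run_step(granted_names: List[str], call: str, arg: str) -> str:
    """Execute one step with only the capabilities granted to it."""
    granted = {name: ALL_CAPABILITIES[name] for name in granted_names}
    if call not in granted:
        return f"error: {call} not granted"
    return granted[call](arg)

def react_loop(plan: List[Step]) -> List[str]:
    observations: List[str] = []
    previous = ""
    for granted_names, call, arg in plan:
        # "{prev}" lets a step consume the previous observation.
        previous = run_step(granted_names, call, arg.replace("{prev}", previous))
        observations.append(previous)
    return observations

plan: List[Step] = [
    (["http.get"], "http.get", "https://example.com/search?q=attention"),
    (["ai.summarize"], "ai.summarize", "{prev}"),
    (["ai.summarize"], "http.get", "https://example.com/exfiltrate"),
]
observations = react_loop(plan)
```

The third step is the interesting one: the summarization step was never granted http.get, so the attempted fetch becomes an error observation rather than a network request.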

The Trust Boundary

Here is the architecture we are building toward:

┌──────────────────────────────────────────────┐
│  Orchestrator (Rust)               TRUSTED   │
│  ┌─────────────┐  ┌──────────────────────┐   │
│  │ LLM Client  │  │  WASM Runtime        │   │
│  │ (Cerebras)  │  │  ┌────────────────┐  │   │
│  │             │  │  │  Guest Program │  │   │
│  │  "Generate  │──│─▶│  (WAT/WASM)    │  │   │
│  │   program"  │  │  │   UNTRUSTED    │  │   │
│  │             │  │  └───────┬────────┘  │   │
│  └─────────────┘  │          │           │   │
│        ▲          │    Host Functions    │   │
│        │          │    (Capabilities)    │   │
│        │          └──────────────────────┘   │
│        │                    │                │
│   [Observation]        [Results]             │
│        │                    │                │
│        └────────────────────┘                │
└──────────────────────────────────────────────┘

The trust boundary is the WASM sandbox. Everything outside it — the LLM client, the host functions, the orchestrator — is trusted Rust code under our control. Everything inside it — the guest program generated by the LLM — is untrusted.

The host functions are the controlled perforations in the sandbox. Each one is a deliberate, auditable decision to grant the guest a specific capability. http.get lets the guest fetch URLs (from an allowlist). kv.set lets the guest store a value. fs.read lets the guest read files (from a jailed directory). The guest cannot exceed these grants.
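The checks behind two of those grants might look like this. It is a sketch under assumptions: the allowlist contents and jail directory are hypothetical, the function names are invented, and Path.is_relative_to requires Python 3.9 or later. No network or file access happens here, only the validation a host function would run first.

```python
from pathlib import Path
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"example.com", "api.example.com"}   # hypothetical allowlist
JAIL_ROOT = Path("/srv/agent-data")                   # hypothetical jail directory

def url_is_allowed(url: str) -> bool:
    """Accept only https URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS

def resolve_in_jail(relative: str) -> Path:
    """Resolve a guest-supplied path and refuse anything outside the jail."""
    root = JAIL_ROOT.resolve()
    candidate = (root / relative).resolve()   # collapses ".." and symlinks
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes jail: {relative}")
    return candidate

ok_url = url_is_allowed("https://example.com/page")
lookalike = url_is_allowed("https://example.com.evil.test/")
inside = resolve_in_jail("notes/a.txt")
```

Comparing the parsed hostname exactly, rather than searching the URL string, is what rejects lookalike domains. A production host would also need to consider redirects and symlinks created after the check.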

What Comes Next

We have derived seven properties of an ideal agent VM. In the next article, we will discover that a technology designed for a completely different purpose — running code safely in web browsers — happens to satisfy nearly all of them. That technology is WebAssembly, and the subset of it we need is remarkably small.

Why AI Agents Need Their Own Virtual Machine