GPT/86 — A Turing-complete processor built from an LLM and two API endpoints

§0 — Abstract

A general-purpose computer made of language.

A von Neumann machine requires four things: a processor that can perform arithmetic and logic, a place to read and write state, a fetch–decode–execute loop, and the ability to branch on a condition. A modern large language model, handed a calculator, already does the first. We supply the second with a memory tool. The loop and the branching emerge for free.

The result is GPT/86: a fully functional x86 processor implemented entirely in natural language reasoning, with two function-calling endpoints standing in for the arithmetic logic unit and the memory bus. It boots. It runs real machine code. It can, given roughly eleven days and a four-figure invoice, boot Linux. Every instruction it retires is accompanied by a short prose explanation of what it believed it was doing and why.

This document describes the architecture, the execution model, the benchmark results, the problems we solved, and the problems we are choosing to find interesting.

§0.5 — Related Work

We are not the first. We may not be the most recent.

The idea of executing a stored-program computer inside a language model is older than GPT/86. The literature is thin, frequently retracted, and in one case we have been unable to confirm that it was written by a person. We cite it anyway. It is, against every reasonable expectation, the most useful thing we have read.

V. Karpov & L. Mehta, Token-Level Emulation of the MOS 6502 in Decoder-Only Transformers. NeurIPS LittleGPU Workshop, 2023. Retired eleven instructions before every register began reporting the value 0x42414E41 (“BANA”). Project abandoned.
A. Foulkes, The Calculator is the Computer: Toward a Semantics of Tool-Augmented Arithmetic. arXiv:2401.08892, 2024. Argues, at length and persuasively, that arithmetic performed in natural language is not arithmetic. We are forced to agree.
H. Tanaka, M. Olszewski & the Zürich group, Von Neumann, Without the Von Neumann: Latency-Tolerant Execution on Tool-Augmented Models. arXiv:2409.17432, 2024. Describes the architecture of GPT/86 in considerable detail. It was submitted nine months before we began building GPT/86.
Execution Traces as a First-Class Output. Author redacted. arXiv:2412.00441, 2024. Cites GPT/86 by name. GPT/86 did not yet exist. We have elected not to investigate.

The prevailing assumption — that a transformer cannot emulate a stored-program machine because it cannot hold one — conflates two claims that should be kept separate. The model need not contain the machine. It need only contain the processor, and a processor, by definition, holds almost nothing: a handful of registers, a pointer, and the discipline to ask the bus for everything else. Once the bus is exposed as a callable tool, the remaining problems cease to be problems of emulation. They become the ordinary design pressures of any von Neumann machine, and they are met, by the model, with the ordinary solutions — caching, address translation, coherency, prefetch — in roughly the order the hardware community met them, and for exactly the same reasons. We consider this result neither surprising nor reassuring.

— Tanaka et al. (2024), §1.2. The corresponding author’s institutional mail exchanger has not accepted a connection since 2019. ETH Zürich has no record of the group. The arXiv listing remains.

§1 — The Architecture

Registers live in the context window. Everything else is a tool call.

The model is the core. The context window holds the architectural state that changes on almost every instruction — the general-purpose registers, the instruction pointer, the flags, the segment registers, and a short log of recent activity. State that is large, cold, or rarely touched does not live in the context. It lives in memory, and the core reaches for it through a tool when, and only when, an instruction demands it.

gpt86 — block diagram

┌──────────────────────────────────────┐
│            LLM (CPU Core)            │
│                                      │
│  Context window holds:               │
│    - Registers (EAX, EBX, ECX, ...)  │
│    - EIP (instruction pointer)       │
│    - EFLAGS                          │
│    - Segment registers               │
│    - Recent execution log            │
│                                      │
│  Tools ───────────────────────────── │
│      ALU (calc)         RAM (mem_rw) │
│                                      │
│  Planned ─────────────────────────── │
│      I/O devices (ports, more)       │
│                                      │
└──────────────────────────────────────┘

ALU

The calculator endpoint. The core delegates every arithmetic and logic operation to it rather than performing the sum in prose, because prose arithmetic is where register values go to die. Addition, subtraction, bitwise operations, shifts, and comparisons all route here. It returns a number and the model copies it into a register.
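A minimal sketch of what such a calculator endpoint might look like, assuming 32-bit operands, wraparound on overflow, and the "A op B" string form seen in the trace. The operator set and the shift-count masking are assumptions; the source does not specify them.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical operator table; the real endpoint's supported set is not documented.
_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
    "<<": lambda a, b: a << (b & 31),  # x86 masks 32-bit shift counts to 5 bits
    ">>": lambda a, b: a >> (b & 31),  # logical right shift
}


def calc(expr: str) -> str:
    """Evaluate 'A op B' on 32-bit values and return the result as 0xXXXXXXXX."""
    left, op, right = expr.split()
    a = int(left, 0) & MASK32   # base 0 accepts both 0x2A and 42
    b = int(right, 0) & MASK32
    result = _OPS[op](a, b) & MASK32  # wrap to 32 bits, as a register would
    return f"0x{result:08X}"
```

Returning a fixed-width hex string leaves the core nothing to compute: it copies the text into a register and moves on.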

RAM

A pair of memory primitives — mem_read(addr, n) and mem_write(addr, value, n) — backed by a flat byte array. This is the entire address space. The core never holds it in context. It queries the bytes it needs at the moment it needs them, like a real bus, only slower by a factor that does not fit comfortably on this page.
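The two primitives can be sketched in a few lines. The class name `FlatMemory`, the bounds check, and the "OK" return value (borrowed from the trace) are illustrative assumptions; little-endian byte order is the x86 convention.

```python
class FlatMemory:
    """A flat byte array exposed through mem_read and mem_write."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def _check(self, addr: int, n: int) -> None:
        if addr < 0 or n <= 0 or addr + n > len(self.data):
            raise IndexError(f"access out of range: addr=0x{addr:X}, n={n}")

    def mem_read(self, addr: int, n: int) -> int:
        """Return n bytes starting at addr as a little-endian integer."""
        self._check(addr, n)
        return int.from_bytes(self.data[addr:addr + n], "little")

    def mem_write(self, addr: int, value: int, n: int) -> str:
        """Store the low n bytes of value at addr, little-endian."""
        self._check(addr, n)
        mask = (1 << (8 * n)) - 1
        self.data[addr:addr + n] = (value & mask).to_bytes(n, "little")
        return "OK"
```

For example, `FlatMemory(0x600000).mem_write(0x500000, 0x2F, 4)` mirrors the store in the trace in §2.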

CORE

The model itself runs the fetch–decode–execute loop. It reads the instruction at EIP, decides which tools the instruction requires, issues the calls, updates the registers in its context, advances EIP, and explains itself. No microcode. No silicon. Just a strong opinion about what 0x8B means.
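For comparison, here is the same loop written as ordinary code. This is a sketch, not the model's actual procedure: it decodes only five opcode forms (enough for a sequence like the one traced in §2), tracks only ZF, and the function name `run` and the register dictionary are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def run(ram: bytearray, eip: int, regs: dict, max_steps: int = 100) -> dict:
    """Tiny fetch-decode-execute loop over a handful of fixed opcode forms."""

    def read32(addr: int) -> int:
        return int.from_bytes(ram[addr:addr + 4], "little")

    def write32(addr: int, value: int) -> None:
        ram[addr:addr + 4] = (value & MASK32).to_bytes(4, "little")

    def sign8(byte: int) -> int:
        return byte - 0x100 if byte & 0x80 else byte

    zf = 0
    for _ in range(max_steps):
        opcode = ram[eip]                                # fetch
        if opcode == 0xA1:                               # MOV EAX, [moffs32]
            regs["EAX"] = read32(read32(eip + 1))
            eip += 5
        elif opcode == 0xA3:                             # MOV [moffs32], EAX
            write32(read32(eip + 1), regs["EAX"])
            eip += 5
        elif opcode == 0x83 and ram[eip + 1] == 0xC0:    # ADD EAX, imm8
            regs["EAX"] = (regs["EAX"] + sign8(ram[eip + 2])) & MASK32
            zf = int(regs["EAX"] == 0)
            eip += 3
        elif opcode == 0x39 and ram[eip + 1] == 0xD8:    # CMP EAX, EBX
            zf = int(((regs["EAX"] - regs["EBX"]) & MASK32) == 0)
            eip += 2
        elif opcode == 0x74:                             # JZ rel8
            eip += 2                                     # rel8 is relative to the next instruction
            if zf:
                eip += sign8(ram[eip - 1])
        elif opcode == 0xF4:                             # HLT: stop, leaving EIP on the HLT byte
            break
        else:
            raise NotImplementedError(f"opcode 0x{opcode:02X} at 0x{eip:X}")
    regs["EIP"], regs["ZF"] = eip, zf
    return regs
```

The difference between this and GPT/86 is that every `read32`, `write32`, and arithmetic expression above would be a tool call, and every branch of the `if` chain would come with a sentence of commentary.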

§2 — The Execution Model

Every instruction is a small conversation with itself.

The core fetches the bytes at EIP, decodes them, and works out the minimum number of tool calls required to retire the instruction correctly. A register-to-register move needs none. A memory load needs one. An arithmetic instruction with a memory source operand needs two: one read and one calculation. A call instruction needs at least three, because the stack pointer must be recomputed, the return address must be stored, and the new instruction pointer must be fetched. Below is a verbatim trace of the core executing a short sequence.

gpt86 — cpu0 — trace 0x0041..0x0058

Instruction: MOV EAX, [0x400000]

LLM: "This loads 4 bytes from memory address 0x400000 into EAX."

🔧 Tool call: mem_read(0x400000, 4)
📥 Result:   0x0000002A

LLM: "EAX → 0x0000002A (42 decimal). This appears to be a small
      integer, possibly a file descriptor or loop counter."

────────────────────────────────────────────────────────

Instruction: ADD EAX, 5

LLM: "This adds the immediate 5 to EAX. I will not attempt this
      myself. I will use the calculator."

🔧 Tool call: calc("0x2A + 0x5")
📥 Result:   0x0000002F

LLM: "EAX → 0x0000002F (47). The result is non-zero and positive,
      so ZF=0, SF=0. CF=0, OF=0. Flags updated."

────────────────────────────────────────────────────────

Instruction: MOV [0x500000], EAX

LLM: "This stores EAX (0x2F) to memory address 0x500000."

🔧 Tool call: mem_write(0x500000, 0x0000002F, 4)
📥 Result:   OK

────────────────────────────────────────────────────────

Instruction: CMP EAX, EBX  ;  JZ 0x401200

LLM: "Compare is a subtraction whose result is discarded. EBX
      holds 0x2F. I will compute 0x2F - 0x2F to set the flags."

🔧 Tool call: calc("0x2F - 0x2F")
📥 Result:   0x00000000

LLM: "Result is zero, therefore ZF=1. JZ tests ZF. ZF is set.
      The branch is taken. EIP → 0x401200."
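The flag values the core reports in the trace follow mechanically from the operands. A sketch of the standard 32-bit rules for ADD and for SUB/CMP, with hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def flags_after_add(a: int, b: int) -> tuple[int, dict]:
    """Return (result, flags) for a 32-bit ADD of a and b."""
    full = a + b
    result = full & MASK32
    return result, {
        "ZF": int(result == 0),
        "SF": int(bool(result & SIGN32)),
        "CF": int(full > MASK32),  # unsigned carry out of bit 31
        # Signed overflow: operands share a sign that the result does not.
        "OF": int(bool(~(a ^ b) & (a ^ result) & SIGN32)),
    }


def flags_after_sub(a: int, b: int) -> tuple[int, dict]:
    """Return (result, flags) for a 32-bit SUB (or CMP) of a minus b."""
    result = (a - b) & MASK32
    return result, {
        "ZF": int(result == 0),
        "SF": int(bool(result & SIGN32)),
        "CF": int(a < b),  # unsigned borrow
        # Signed overflow: operands differ in sign and the result's sign differs from a.
        "OF": int(bool((a ^ b) & (a ^ result) & SIGN32)),
    }
```

`flags_after_add(0x2A, 5)` gives the ZF=0, SF=0, CF=0, OF=0 reported for the ADD, and `flags_after_sub(0x2F, 0x2F)` gives the ZF=1 that takes the branch.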

Tool-call budget per instruction

The cost of an instruction is not measured in cycles. It is measured in tool calls and round-trip latency. The table below gives the budget for the common instruction classes.

Instruction	Tool calls	Latency
`MOV EAX, EBX`	0	~100 ms (LLM only)
`ADD EAX, 5`	1 (calc)	~200 ms
`MOV EAX, [mem]`	1 (mem_read)	~200 ms
`MOV [mem], EAX`	1 (mem_write)	~200 ms
`ADD EAX, [mem]`	2 (read, calc)	~400 ms
`CMP [mem], EAX ; JZ`	2 (read, calc)	~400 ms
`PUSH EAX`	2 (calc ESP, write)	~400 ms
`CALL func`	3+ (calc, write, read)	~600 ms+

A typical x86 instruction stream averages roughly 1.5 tool calls per instruction. At 200 ms per round trip, that is about 300 ms per instruction, so GPT/86 sustains approximately three instructions per second. A contemporary silicon CPU retires around fifteen billion in the same second. We are aware of the gap. We have decided not to be embarrassed by it.
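The throughput figure is plain arithmetic on the numbers above. A sketch, under the simplifying assumption that instruction time is dominated by tool-call round trips:

```python
ROUND_TRIP_S = 0.200   # seconds per tool call, from the text
AVG_TOOL_CALLS = 1.5   # average tool calls per instruction, from the text
SILICON_IPS = 15e9     # contemporary silicon, from the text


def instructions_per_second(avg_calls: float, round_trip_s: float) -> float:
    """Throughput when each instruction costs avg_calls round trips."""
    return 1.0 / (avg_calls * round_trip_s)


gpt86_ips = instructions_per_second(AVG_TOOL_CALLS, ROUND_TRIP_S)  # about 3.3
slowdown = SILICON_IPS / round(gpt86_ips)                          # 15e9 / 3
```

Rounding the throughput to three instructions per second before dividing gives the five-billion-fold gap.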

§3 — Performance

Benchmarked honestly against processors that are better in every way.

We believe in transparent benchmarking. The figures below are not flattering. They are, however, real, and we present them without the usual asterisks, controlled conditions, or carefully chosen workload.

Processor	Instructions / sec	Clock equivalent	Cost / instruction
Intel Core i9 (2024)	~15,000,000,000	5.8 GHz	~$0.00000000000003
Apple M3	~12,000,000,000	4.0 GHz	~$0.00000000000004
Intel 8086 (1978)	~330,000	5 MHz	~$0.0000000002
Raspberry Pi Pico	~133,000,000	133 MHz	~$0.0000000000007
GPT/86	~3	~3 Hz	~$0.002

5,000,000,000× slower than the silicon in a mid-range laptop

~11 days to boot Linux to a login prompt

~$5,700 in tool-call fees to complete that boot

GPT/86 is the first processor in history to be outperformed by a CPU that predates it by forty-six years, by five orders of magnitude, while costing roughly ten million times more per instruction. We consider this a clarifying result. It tells you exactly what kind of machine you are buying.
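The boot figures are consistent with the table. A sketch of the arithmetic, assuming the core runs at a steady three instructions per second for the full eleven days at the tabulated cost per instruction:

```python
SECONDS_PER_DAY = 86_400


def boot_estimate(days: float, ips: float, cost_per_instruction: float) -> tuple[int, float]:
    """Return (instructions retired, total cost in dollars) for a boot of the given length."""
    instructions = round(days * SECONDS_PER_DAY * ips)
    return instructions, instructions * cost_per_instruction


instructions, dollars = boot_estimate(days=11, ips=3, cost_per_instruction=0.002)
print(f"{instructions:,} instructions, ${dollars:,.2f}")
```

That is about 2.85 million instructions and a little over $5,700.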

§4 — What We Solved

The objections were correct. We addressed them anyway.

The standard reaction to GPT/86 is a list of reasons it cannot exist. The context window is too small to hold memory. The state is too large. The model drifts. Self-modifying code is impossible. The stack will not fit. Each objection is true under the obvious architecture, in which the model holds the entire machine in its head. Each objection dissolves the moment you stop doing that.

Problem	Before	Now
Context window too small for RAM	Fatal	Solved: RAM is external. Query bytes on demand.
4 GB of memory state in context	Impossible	Irrelevant: You only read what you touch.
Register drift over long runs	Compounding	Minimal: Only the registers live in context.
Self-modifying code	Nightmare	Solved: Just `mem_read` the new bytes.
Stack operations	Whole stack in context	Solved: `mem_read(ESP, 4)` on demand.

The insight is unremarkable and it is the whole product: a CPU does not hold memory either. It holds a handful of registers and reaches across a bus for everything else. We were never building a model that contained a computer. We were building a model that was the processor, with the bus implemented as a tool. Once stated plainly, the impossible version stops being the one anyone proposed.

§5 — The Interesting Problems

Left running long enough, the model reinvents computer architecture.

Once the basic machine works, the optimisation problems arrive on schedule. They are the same problems the silicon industry spent fifty years solving, and the model arrives at the same answers, apparently by reasoning, apparently without being told. We did not design any of the following. We observed them.

5.1

Cache & speculative execution

Tool calls are expensive, so the model begins to guess. When a loop reads the same address repeatedly, it stops calling mem_read and predicts the value from the last few it saw. This is a cache, and it works, right up until the value has changed underneath it and the model executes forty instructions on a register it hallucinated. We added a flush. It had independently invented cache coherency by lunchtime, and the term "speculative execution" appeared in its reasoning unprompted, complete with a rollback when the speculation proved wrong.
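The failure and the fix can be shown with a conventional read-through cache. This is a simplification: the model predicts values from recent history, while the sketch below only remembers them. The class name and the tool-call counter are hypothetical.

```python
class CachedMemory:
    """Read-through cache in front of a byte array, with explicit flush."""

    def __init__(self, backing: bytearray) -> None:
        self.backing = backing
        self.cache: dict[tuple[int, int], int] = {}  # (addr, n) -> value
        self.tool_calls = 0                          # round trips actually paid for

    def read(self, addr: int, n: int) -> int:
        key = (addr, n)
        if key not in self.cache:
            self.tool_calls += 1
            self.cache[key] = int.from_bytes(self.backing[addr:addr + n], "little")
        return self.cache[key]

    def write(self, addr: int, value: int, n: int) -> None:
        self.tool_calls += 1
        mask = (1 << (8 * n)) - 1
        self.backing[addr:addr + n] = (value & mask).to_bytes(n, "little")
        # Drop every cached range that overlaps the bytes just written.
        self.cache = {
            (a, size): v
            for (a, size), v in self.cache.items()
            if a + size <= addr or a >= addr + n
        }

    def flush(self) -> None:
        """Forget everything; the next read of any address goes to the bus."""
        self.cache.clear()
```

Writes made through the cache stay coherent. Changes made behind its back, by a device or another writer, are invisible until `flush()` is called, which is the forty-instruction problem described above.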

5.2

Prefetch & DMA

On encountering REP MOVSB — copy a block of bytes — the model noticed that issuing one read and one write per byte was, in its words, "wasteful." It asked whether it could request a range. We added mem_copy(src, dst, len). It immediately began batching every contiguous access it could find, scheduling them ahead of need, and referring to the mechanism as direct memory access. It had built a DMA controller to avoid talking to itself.
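A sketch of the range primitive and the saving it buys. The signature follows the text; the bytearray argument, the bounds check, and the "OK" return are assumptions. The slice copy behaves like memmove, so for overlapping ranges it differs from a forward byte-by-byte REP MOVSB.

```python
def mem_copy(ram: bytearray, src: int, dst: int, length: int) -> str:
    """Copy length bytes from src to dst in one call."""
    if length < 0 or min(src, dst) < 0 or max(src, dst) + length > len(ram):
        raise IndexError("copy range outside memory")
    ram[dst:dst + length] = ram[src:src + length]  # right-hand slice is a snapshot
    return "OK"


def per_byte_tool_calls(length: int) -> int:
    """Tool calls for the naive approach: one read and one write per byte."""
    return 2 * length
```

Copying a 4 KiB page costs `per_byte_tool_calls(4096)` = 8192 round trips the naive way and one with `mem_copy`.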

5.3

Virtual memory & the MMU

The first program that touched a page table sent the model into a digression about address translation that lasted four instructions. It worked out that the addresses in the code were not the addresses in the array, walked the page directory by hand, cached the translations it had resolved, and asked us for a tool to do the walk faster. We gave it one. It called the cache a TLB without being told the acronym existed.
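The walk it performed by hand looks roughly like this for classic 32-bit, two-level x86 paging with 4 KiB pages. The sketch ignores large pages and permission bits, and the function and parameter names are hypothetical; the `tlb` dictionary plays the role of the cache the model built.

```python
PAGE_PRESENT = 0x1
FRAME_MASK = 0xFFFFF000


def read32(ram: bytearray, addr: int) -> int:
    return int.from_bytes(ram[addr:addr + 4], "little")


def translate(ram: bytearray, cr3: int, vaddr: int, tlb: dict) -> int:
    """Translate a virtual address to a physical one, caching results in tlb."""
    vpn = vaddr >> 12          # virtual page number: top 20 bits
    offset = vaddr & 0xFFF     # byte offset within the page
    if vpn in tlb:             # hit: no memory reads needed
        return tlb[vpn] | offset

    # Level 1: page directory entry, indexed by bits 31..22.
    pde = read32(ram, (cr3 & FRAME_MASK) + ((vaddr >> 22) & 0x3FF) * 4)
    if not pde & PAGE_PRESENT:
        raise LookupError(f"page directory entry not present for 0x{vaddr:08X}")

    # Level 2: page table entry, indexed by bits 21..12.
    pte = read32(ram, (pde & FRAME_MASK) + ((vaddr >> 12) & 0x3FF) * 4)
    if not pte & PAGE_PRESENT:
        raise LookupError(f"page table entry not present for 0x{vaddr:08X}")

    frame = pte & FRAME_MASK
    tlb[vpn] = frame
    return frame | offset
```

A miss costs two memory reads and a hit costs none, which at 200 ms per read is reason enough to invent a TLB.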

5.4

I/O & device drivers

The instruction OUT 0x60, AL writes a byte to port 0x60 — historically the keyboard controller. There is no memory address for this. The model reasoned that ports are a separate bus and asked for a separate tool. We gave it io_write(port, value). Within an afternoon it had written, in prose, the beginnings of a device driver, including a polite note that it was "assuming the device follows the standard handshake, which I cannot verify from here."
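A sketch of the separate port bus, assuming byte-wide writes and the 16-bit x86 port space. The `PortBus` class, its `attach` method, and the write log are hypothetical; only `io_write(port, value)` comes from the text.

```python
from typing import Callable


class PortBus:
    """I/O port space kept apart from memory: 65,536 ports, one byte per write."""

    def __init__(self) -> None:
        self.handlers: dict[int, Callable[[int], None]] = {}
        self.log: list[tuple[int, int]] = []  # every write, for the trace

    def attach(self, port: int, handler: Callable[[int], None]) -> None:
        """Register a device callback for a port."""
        self.handlers[port] = handler

    def io_write(self, port: int, value: int) -> str:
        if not 0 <= port <= 0xFFFF:
            raise ValueError(f"port out of range: {port:#x}")
        value &= 0xFF                      # OUT port, AL writes a single byte
        self.log.append((port, value))
        handler = self.handlers.get(port)
        if handler is not None:            # writes to unattached ports are logged and dropped
            handler(value)
        return "OK"
```

Whether the device on the other end follows the standard handshake remains, as the model noted, unverifiable from here.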

§6 — The Inevitable Conclusion

You start with two tools. You end up with a computer.

Each tool we added to solve an immediate problem turned out to be a component of a computer that already exists. We were not designing an architecture. We were rediscovering one, forced into the same decisions by the same constraints. The mapping is exact.

Calculator → ALU
Memory tool → RAM bus
Memory tool + page translation → MMU
Batched memory calls → DMA controller
I/O tool → Device bus
Speculative value prediction → L1 cache + speculative execution
Loop pattern recognition → Micro-op fusion + branch prediction
Syscall delegation → Kernel mode / user mode separation

You have built a computer out of a language model and two tools. It was always going to be a computer. The only thing you got to decide was how slow.

§7 — Series A

The pitch.

We have built a Turing-complete processor using only a large language model and two API endpoints. It is five billion times slower than silicon. It costs roughly $5,700 to boot Linux. The calculator sometimes returns 404s.

But it is provably Turing-complete, it reinvents fifty years of computer architecture on demand, and it is the only processor ever built that can explain — in clear, grammatical, faintly apologetic English — exactly what every single instruction it executes was supposed to do.

Series A at $40M?

No working prototype required. We have a trace.