GPT/86 · Technical Whitepaper · Rev. 1.4

GPT/86

A Turing-complete processor built from a large language model and two API endpoints.

We give the model a calculator and a memory tool. It executes x86. It is approximately five billion times slower than the silicon in your phone. We believe this is a feature.

gpt86 — cpu0 — execution trace
; first instruction after reset
Instruction: MOV EAX, [0x400000]

LLM: "This loads 4 bytes from memory address 0x400000 into EAX."

🔧 Tool call: mem_read(0x400000, 4)
📥 Result:   0x0000002A

LLM: "EAX → 0x0000002A (42 decimal). This appears to be a small
      integer, possibly a file descriptor or loop counter."
3 instructions / second
$0.002 per instruction
100% Turing-complete
§0 — Abstract

A general-purpose computer made of language.

A von Neumann machine requires four things: a processor that can perform arithmetic and logic, a place to read and write state, a fetch–decode–execute loop, and the ability to branch on a condition. A modern large language model already does the first one. We supply the second with a memory tool. The loop and the branching emerge for free.

The result is GPT/86: a fully functional x86 processor implemented entirely in natural language reasoning, with two function-calling endpoints standing in for the arithmetic logic unit and the memory bus. It boots. It runs real machine code. It can, given roughly eleven days and a four-figure invoice, boot Linux. Every instruction it retires is accompanied by a short prose explanation of what it believed it was doing and why.

This document describes the architecture, the execution model, the benchmark results, the problems we solved, and the problems we are choosing to find interesting.

§1 — The Architecture

Registers live in the context window. Everything else is a tool call.

The model is the core. The context window holds the architectural state that changes on almost every instruction — the general-purpose registers, the instruction pointer, the flags, the segment registers, and a short log of recent activity. State that is large, cold, or rarely touched does not live in the context. It lives in memory, and the core reaches for it through a tool when, and only when, an instruction demands it.

gpt86 — block diagram
┌──────────────────────────────────┐
│         LLM (CPU Core)           │
│                                  │
│  Context holds:                  │
│  - Registers (EAX, EBX, etc.)    │
│  - EIP (instruction pointer)     │
│  - EFLAGS                        │
│  - Segment registers             │
│  - Recent execution log          │
│                                  │
│         ┌─── Tools ───┐          │
│         │             │          │
│    ┌────▼────┐  ┌────▼────┐      │
│    │  ALU    │  │  RAM    │      │
│    │ (calc)  │  │(mem_rw) │      │
│    └─────────┘  └─────────┘      │
│                                  │
│         ┌─── Future ───┐         │
│    ┌────▼────────────┐ │         │
│    │  I/O Devices    │ │         │
│    │  (more tools)   │ │         │
│    └─────────────────┘ │         │
└──────────────────────────────────┘
ALU

The calculator endpoint. The core delegates every arithmetic and logic operation to it rather than performing the sum in prose, because prose arithmetic is where register values go to die. Addition, subtraction, bitwise operations, shifts, and comparisons all route here. It returns a number and the model copies it into a register.
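
A minimal sketch of what such a calculator endpoint might look like, assuming it accepts a single binary expression in the form seen in the traces ("0x2A + 0x5") and wraps the result to 32 bits. Only the name calc comes from this document; the operator table and the parsing are assumptions.

```python
import operator

MASK32 = 0xFFFFFFFF

# Hypothetical operator table. The text only says the ALU handles addition,
# subtraction, bitwise operations, shifts, and comparisons.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "&": operator.and_,
    "|": operator.or_,
    "^": operator.xor,
    "<<": operator.lshift,
    ">>": operator.rshift,
}

def calc(expr: str) -> int:
    """Evaluate '<a> <op> <b>' and wrap the result to 32 bits."""
    left, op, right = expr.split()
    a = int(left, 0) & MASK32    # base 0 accepts both 0x2A and 42
    b = int(right, 0) & MASK32
    if op in ("<<", ">>"):
        b &= 31                  # x86 masks 32-bit shift counts to 5 bits
    return OPS[op](a, b) & MASK32

print(hex(calc("0x2A + 0x5")))
print(hex(calc("0x2F - 0x2F")))
```

Wrapping to 32 bits inside the tool means the model never has to reason about overflow in prose, which is the point of delegating.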

RAM

A pair of memory primitives — mem_read(addr, n) and mem_write(addr, value, n) — backed by a flat byte array. This is the entire address space. The core never holds it in context. It queries the bytes it needs at the moment it needs them, like a real bus, only slower by a factor that does not fit comfortably on this page.
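
The two primitives can be sketched in a few lines, assuming a zero-filled bytearray as the backing store and little-endian byte order, as on x86. The class name FlatMemory and the "OK" return value (taken from the trace output) are illustrative; the real tool's behaviour on out-of-range addresses is not specified here and is not handled.

```python
class FlatMemory:
    """Hypothetical backing store: one flat, zero-filled byte array."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def mem_read(self, addr: int, n: int) -> int:
        # x86 is little-endian: the lowest address holds the low byte.
        return int.from_bytes(self.data[addr:addr + n], "little")

    def mem_write(self, addr: int, value: int, n: int) -> str:
        mask = (1 << (8 * n)) - 1            # keep only the low n bytes
        self.data[addr:addr + n] = (value & mask).to_bytes(n, "little")
        return "OK"

ram = FlatMemory(0x600000)
ram.mem_write(0x400000, 0x2A, 4)
print(hex(ram.mem_read(0x400000, 4)))
```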

CORE

The model itself runs the fetch–decode–execute loop. It reads the instruction at EIP, decides which tools the instruction requires, issues the calls, updates the registers in its context, advances EIP, and explains itself. No microcode. No silicon. Just a strong opinion about what 0x8B means.
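
The loop itself is easy to caricature in code. The sketch below is a toy under loud assumptions: instructions arrive as pre-decoded tuples rather than bytes, EIP indexes a list rather than an address, and a dict stands in for the memory tool. The class and opcode names are invented for illustration. What it does show is the shape of the loop and how tool calls accumulate.

```python
from dataclasses import dataclass, field

@dataclass
class Core:
    regs: dict = field(default_factory=lambda: {"EAX": 0, "EBX": 0})
    eip: int = 0
    tool_calls: int = 0
    ram: dict = field(default_factory=dict)   # stand-in for the memory tool

    def mem_read(self, addr):
        self.tool_calls += 1
        return self.ram.get(addr, 0)

    def mem_write(self, addr, value):
        self.tool_calls += 1
        self.ram[addr] = value

    def calc_add(self, a, b):
        self.tool_calls += 1
        return (a + b) & 0xFFFFFFFF

    def step(self, program):
        op, dst, src = program[self.eip]      # fetch a pre-decoded tuple
        if op == "load":                      # MOV reg, [mem]
            self.regs[dst] = self.mem_read(src)
        elif op == "addi":                    # ADD reg, imm
            self.regs[dst] = self.calc_add(self.regs[dst], src)
        elif op == "store":                   # MOV [mem], reg
            self.mem_write(dst, self.regs[src])
        self.eip += 1                         # advance to the next instruction

program = [
    ("load", "EAX", 0x400000),
    ("addi", "EAX", 5),
    ("store", 0x500000, "EAX"),
]
core = Core(ram={0x400000: 0x2A})
while core.eip < len(program):
    core.step(program)
print(hex(core.ram[0x500000]), core.tool_calls)
```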

§2 — The Execution Model

Every instruction is a small conversation with itself.

The core fetches the bytes at EIP, decodes them, and works out the minimum number of tool calls required to retire the instruction correctly. A register-to-register move needs none. A memory load needs one. A load-and-operate such as ADD EAX, [mem] needs two: one read and one calculation. A call instruction needs at least three, because the stack pointer must be recomputed, the return address must be stored, and the new instruction pointer must be fetched. Below is a verbatim trace of the core executing a short sequence.

gpt86 — cpu0 — trace 0x0041..0x0058
Instruction: MOV EAX, [0x400000]

LLM: "This loads 4 bytes from memory address 0x400000 into EAX."

🔧 Tool call: mem_read(0x400000, 4)
📥 Result:   0x0000002A

LLM: "EAX → 0x0000002A (42 decimal). This appears to be a small
      integer, possibly a file descriptor or loop counter."

────────────────────────────────────────────────────────

Instruction: ADD EAX, 5

LLM: "This adds the immediate 5 to EAX. I will not attempt this
      myself. I will use the calculator."

🔧 Tool call: calc("0x2A + 0x5")
📥 Result:   0x0000002F

LLM: "EAX → 0x0000002F (47). The result is non-zero and positive,
      so ZF=0, SF=0. CF=0, OF=0. Flags updated."

────────────────────────────────────────────────────────

Instruction: MOV [0x500000], EAX

LLM: "This stores EAX (0x2F) to memory address 0x500000."

🔧 Tool call: mem_write(0x500000, 0x0000002F, 4)
📥 Result:   OK

────────────────────────────────────────────────────────

Instruction: CMP EAX, EBX  ;  JZ 0x401200

LLM: "Compare is a subtraction whose result is discarded. EBX
      holds 0x2F. I will compute 0x2F - 0x2F to set the flags."

🔧 Tool call: calc("0x2F - 0x2F")
📥 Result:   0x00000000

LLM: "Result is zero, therefore ZF=1. JZ tests ZF. ZF is set.
      The branch is taken. EIP → 0x401200."
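
The flag reasoning in the trace can be checked mechanically. The following sketch derives ZF, SF, CF and OF for 32-bit ADD and SUB (CMP sets flags exactly as SUB does) using the standard definitions. The function names are illustrative, and PF and AF are omitted for brevity.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

def flags_add(a: int, b: int) -> dict:
    """ZF, SF, CF, OF after a 32-bit ADD."""
    a &= MASK32
    b &= MASK32
    full = a + b
    res = full & MASK32
    return {
        "ZF": int(res == 0),
        "SF": int(bool(res & SIGN32)),
        "CF": int(full > MASK32),                         # unsigned carry out
        # Signed overflow: operands share a sign that the result lacks.
        "OF": int(bool(~(a ^ b) & (a ^ res) & SIGN32)),
    }

def flags_sub(a: int, b: int) -> dict:
    """ZF, SF, CF, OF after a 32-bit SUB or CMP."""
    a &= MASK32
    b &= MASK32
    res = (a - b) & MASK32
    return {
        "ZF": int(res == 0),
        "SF": int(bool(res & SIGN32)),
        "CF": int(a < b),                                 # unsigned borrow
        # Signed overflow: operands differ in sign and the result flips a's.
        "OF": int(bool((a ^ b) & (a ^ res) & SIGN32)),
    }

print(flags_add(0x2A, 0x5))    # the ADD EAX, 5 step
print(flags_sub(0x2F, 0x2F))   # the CMP EAX, EBX step; ZF=1 means JZ is taken
```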

Tool-call budget per instruction

The cost of an instruction is not measured in cycles. It is measured in tool calls and round-trip latency. The table below gives the budget for the common instruction classes.

Instruction            Tool calls                Latency
MOV EAX, EBX           0                         ~100 ms (LLM only)
ADD EAX, 5             1   calculator            ~200 ms
MOV EAX, [mem]         1   mem_read              ~200 ms
MOV [mem], EAX         1   mem_write             ~200 ms
ADD EAX, [mem]         2   read + calc           ~400 ms
CMP [mem], EAX ; JZ    2   read + calc           ~400 ms
PUSH EAX               2   calc ESP + write      ~400 ms
CALL func              3+  calc + write + read   ~600 ms+

A typical x86 instruction stream averages roughly 1.5 tool calls per instruction. At 200 ms per round trip, that is about 300 ms per instruction, so GPT/86 sustains approximately three instructions per second. A contemporary silicon CPU retires around fifteen billion in the same second. We are aware of the gap. We have decided not to be embarrassed by it.
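
The arithmetic behind those figures, written out as a short script. All three constants are the numbers quoted in the paragraph above; nothing here is measured.

```python
ROUND_TRIP_S = 0.200          # seconds per tool call, as quoted
CALLS_PER_INSTRUCTION = 1.5   # average tool calls per instruction, as quoted
SILICON_IPS = 15e9            # instructions per second for a silicon CPU, as quoted

seconds_per_instruction = ROUND_TRIP_S * CALLS_PER_INSTRUCTION
gpt86_ips = 1 / seconds_per_instruction
slowdown = SILICON_IPS / round(gpt86_ips)   # uses the rounded "three per second"

print(f"{seconds_per_instruction * 1000:.0f} ms per instruction")
print(f"{gpt86_ips:.2f} instructions per second")
print(f"{slowdown:.1e}x slower than silicon")
```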

§3 — Performance

Benchmarked honestly against processors that are better in every way.

We believe in transparent benchmarking. The figures below are not flattering. They are, however, real, and we present them without the usual asterisks, controlled conditions, or carefully chosen workloads.

Processor              Instructions / sec   Clock equivalent   Cost / instruction
Intel Core i9 (2024)   ~15,000,000,000      5.8 GHz            ~$0.00000000000003
Apple M3               ~12,000,000,000      4.0 GHz            ~$0.00000000000004
Intel 8086 (1978)      ~330,000             5 MHz              ~$0.0000000002
Raspberry Pi Pico      ~133,000,000         133 MHz            ~$0.0000000000007
GPT/86                 ~3                   ~3 Hz              ~$0.002

5,000,000,000× slower than the silicon in a mid-range laptop
~11 days to boot Linux to a login prompt
~$800 in tool-call fees to complete that boot

GPT/86 is the first processor in history to be outperformed by a CPU that predates it by forty-six years, by five orders of magnitude, while costing more per instruction than the 8086 cost to manufacture in its entirety. We consider this a clarifying result. It tells you exactly what kind of machine you are buying.
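
For readers who prefer their humiliation in logarithms, the gap to each processor in the table can be computed directly. The throughput figures are copied from the table; the helper name is illustrative.

```python
import math

# Instructions per second, copied from the benchmark table.
IPS = {
    "Intel Core i9 (2024)": 15e9,
    "Apple M3": 12e9,
    "Intel 8086 (1978)": 330_000,
    "Raspberry Pi Pico": 133e6,
}
GPT86_IPS = 3

def orders_of_magnitude(name: str) -> float:
    """log10 of how many times faster the named processor is than GPT/86."""
    return math.log10(IPS[name] / GPT86_IPS)

for name in IPS:
    ratio = IPS[name] / GPT86_IPS
    print(f"{name:22s} {ratio:>16,.0f}x  (10^{orders_of_magnitude(name):.1f})")
```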

§4 — What We Solved

The objections were correct. We addressed them anyway.

The standard reaction to GPT/86 is a list of reasons it cannot exist. The context window is too small to hold memory. The state is too large. The model drifts. Self-modifying code is impossible. The stack will not fit. Each objection is true under the obvious architecture, in which the model holds the entire machine in its head. Each objection dissolves the moment you stop doing that.

Problem                            Before                   Now
Context window too small for RAM   Fatal                    Solved: RAM is external. Query bytes on demand.
32 GB of memory state in context   Impossible               Irrelevant: You only read what you touch.
Register drift over long runs      Compounding              Minimal: Only the registers live in context.
Self-modifying code                Nightmare                Solved: Just mem_read the new bytes.
Stack operations                   Whole stack in context   Solved: mem_read(ESP) on demand.

The insight is unremarkable and it is the whole product: a CPU does not hold memory either. It holds a handful of registers and reaches across a bus for everything else. We were never building a model that contained a computer. We were building a model that was the processor, with the bus implemented as a tool. Once stated plainly, the impossible version stops being the one anyone proposed.
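
The stack row of the table is a good illustration of the point. A sketch of PUSH and POP follows, assuming a 4-byte operand, a small bytearray standing in for the address space, and a hypothetical calc32 helper in place of the string-based calculator. Only ESP lives in the register file; the stack contents are touched through the bus.

```python
MASK32 = 0xFFFFFFFF
ram = bytearray(0x10000)               # small stand-in for the flat address space
regs = {"ESP": 0x8000, "EAX": 0x2F, "EBX": 0}
calls = []                             # names of the tool calls issued, in order

def calc32(a, op, b):
    # Hypothetical simplified calculator: only + and -, wrapped to 32 bits.
    calls.append("calc")
    return (a + b if op == "+" else a - b) & MASK32

def mem_write(addr, value, n):
    calls.append("mem_write")
    ram[addr:addr + n] = (value & MASK32).to_bytes(n, "little")

def mem_read(addr, n):
    calls.append("mem_read")
    return int.from_bytes(ram[addr:addr + n], "little")

def push(reg):
    regs["ESP"] = calc32(regs["ESP"], "-", 4)   # the stack grows downward
    mem_write(regs["ESP"], regs[reg], 4)

def pop(reg):
    regs[reg] = mem_read(regs["ESP"], 4)
    regs["ESP"] = calc32(regs["ESP"], "+", 4)

push("EAX")
pop("EBX")
print(hex(regs["EBX"]), hex(regs["ESP"]), calls)
```

PUSH costs two calls here, one calculation and one write, which matches the budget table in §2.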

§5 — The Interesting Problems

Left running long enough, the model reinvents computer architecture.

Once the basic machine works, the optimisation problems arrive on schedule. They are the same problems the silicon industry spent fifty years solving, and the model arrives at the same answers, apparently by reasoning, apparently without being told. We did not design any of the following. We observed them.

5.1 Cache & speculative execution

Tool calls are expensive, so the model begins to guess. When a loop reads the same address repeatedly, it stops calling mem_read and predicts the value from the last few it saw. This is a cache, and it works, right up until the value has changed underneath it and the model executes forty instructions on a register it hallucinated. We added a flush. It had independently invented cache coherency by lunchtime, and the term "speculative execution" appeared in its reasoning unprompted, complete with a rollback when the speculation proved wrong.
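
The failure mode and the fix can be reproduced with a deliberately naive cache. This sketch memoises reads by (address, length) rather than predicting values as the model did, and the class name is invented; flush is the one operation named above.

```python
class CachedMemory:
    """Naive read cache in front of a byte array, with an explicit flush."""

    def __init__(self, backing: bytearray) -> None:
        self.backing = backing
        self.cache = {}          # (addr, n) -> value seen on the last real read
        self.tool_calls = 0      # real mem_read round trips issued

    def mem_read(self, addr: int, n: int) -> int:
        key = (addr, n)
        if key not in self.cache:
            self.tool_calls += 1
            self.cache[key] = int.from_bytes(self.backing[addr:addr + n], "little")
        return self.cache[key]

    def flush(self) -> None:
        self.cache.clear()

backing = bytearray(16)
backing[0:4] = (0x2A).to_bytes(4, "little")
mem = CachedMemory(backing)

first = mem.mem_read(0, 4)
backing[0:4] = (0x99).to_bytes(4, "little")   # the value changes underneath
stale = mem.mem_read(0, 4)                    # served from cache: wrong
mem.flush()
fresh = mem.mem_read(0, 4)                    # real read again: right
print(hex(first), hex(stale), hex(fresh), mem.tool_calls)
```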

5.2 Prefetch & DMA

On encountering REP MOVSB — copy a block of bytes — the model noticed that issuing one read and one write per byte was, in its words, "wasteful." It asked whether it could request a range. We added mem_copy(src, dst, len). It immediately began batching every contiguous access it could find, scheduling them ahead of need, and referring to the mechanism as direct memory access. It had built a DMA controller to avoid talking to itself.
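
The saving is easy to quantify. Below is a sketch comparing the byte-at-a-time copy with a single mem_copy(src, dst, len) call, assuming non-overlapping ranges and a bytearray backing store. The Bus class and the call counter are illustrative, not part of the product.

```python
class Bus:
    def __init__(self, size: int) -> None:
        self.ram = bytearray(size)
        self.tool_calls = 0

    def mem_read(self, addr: int, n: int) -> int:
        self.tool_calls += 1
        return int.from_bytes(self.ram[addr:addr + n], "little")

    def mem_write(self, addr: int, value: int, n: int) -> None:
        self.tool_calls += 1
        self.ram[addr:addr + n] = value.to_bytes(n, "little")

    def mem_copy(self, src: int, dst: int, length: int) -> None:
        # One round trip for the whole block. Assumes the ranges do not overlap.
        self.tool_calls += 1
        self.ram[dst:dst + length] = self.ram[src:src + length]

def rep_movsb_bytewise(bus: Bus, src: int, dst: int, count: int) -> None:
    """The original behaviour: one read and one write per byte."""
    for i in range(count):
        bus.mem_write(dst + i, bus.mem_read(src + i, 1), 1)

slow = Bus(64)
slow.ram[0:16] = bytes(range(16))
rep_movsb_bytewise(slow, 0, 32, 16)

fast = Bus(64)
fast.ram[0:16] = bytes(range(16))
fast.mem_copy(0, 32, 16)

print(slow.tool_calls, fast.tool_calls)
```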

5.3 Virtual memory & the MMU

The first program that touched a page table sent the model into a digression about address translation that lasted four instructions. It worked out that the addresses in the code were not the addresses in the array, walked the page directory by hand, cached the translations it had resolved, and asked us for a tool to do the walk faster. We gave it one. It called the cache a TLB without being told the acronym existed.
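
A sketch of the walk the model was doing by hand, assuming classic 32-bit two-level paging with 4 KB pages: ten bits of directory index, ten bits of table index, twelve bits of offset, and bit 0 of each entry as the present bit. Large pages and permission bits are ignored. The Mmu class and the read32 callback are hypothetical stand-ins for the tool we added.

```python
PRESENT = 0x1
FRAME_MASK = 0xFFFFF000

class Mmu:
    def __init__(self, read32, cr3: int) -> None:
        self.read32 = read32    # callable(addr) -> 32-bit value, e.g. a mem_read wrapper
        self.cr3 = cr3          # physical address of the page directory
        self.tlb = {}           # virtual page number -> physical frame base
        self.walks = 0

    def translate(self, vaddr: int) -> int:
        vpn, offset = vaddr >> 12, vaddr & 0xFFF
        if vpn not in self.tlb:
            self.walks += 1
            pde = self.read32((self.cr3 & FRAME_MASK) + 4 * (vaddr >> 22))
            if not pde & PRESENT:
                raise LookupError("page directory entry not present")
            pte = self.read32((pde & FRAME_MASK) + 4 * ((vaddr >> 12) & 0x3FF))
            if not pte & PRESENT:
                raise LookupError("page not present")
            self.tlb[vpn] = pte & FRAME_MASK
        return self.tlb[vpn] | offset

# Toy page tables: directory at 0x1000, one page table at 0x2000,
# mapping virtual page 0x00401 to physical frame 0x9000.
tables = {0x1004: 0x2000 | PRESENT, 0x2004: 0x9000 | PRESENT}
mmu = Mmu(lambda addr: tables.get(addr, 0), cr3=0x1000)

print(hex(mmu.translate(0x00401234)), hex(mmu.translate(0x00401238)), mmu.walks)
```

The second translation hits the cached entry, so only one walk is performed for both addresses.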

5.4 I/O & device drivers

The instruction OUT 0x60, AL writes a byte to port 0x60 — historically the keyboard controller. There is no memory address for this. The model reasoned that ports are a separate bus and asked for a separate tool. We gave it io_write(port, value). Within an afternoon it had written, in prose, the beginnings of a device driver, including a polite note that it was "assuming the device follows the standard handshake, which I cannot verify from here."
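
A port bus of this kind can be sketched as a table of handlers kept entirely separate from RAM. Only io_write(port, value) is named above; the PortBus class, the attach method and the return strings are assumptions made for illustration.

```python
class PortBus:
    """Hypothetical I/O port space: handlers keyed by port number, no RAM involved."""

    def __init__(self) -> None:
        self.devices = {}                  # port -> callable(value)

    def attach(self, port: int, handler) -> None:
        self.devices[port] = handler

    def io_write(self, port: int, value: int) -> str:
        handler = self.devices.get(port)
        if handler is None:
            return "NO DEVICE"             # assumed response for an unclaimed port
        handler(value & 0xFF)              # OUT with AL transfers one byte
        return "OK"

keyboard_log = []
ports = PortBus()
ports.attach(0x60, keyboard_log.append)

print(ports.io_write(0x60, 0xED), ports.io_write(0x61, 0x01), keyboard_log)
```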

§6 — The Inevitable Conclusion

You start with two tools. You end up with a computer.

Each tool we added to solve an immediate problem turned out to be a component of a computer that already exists. We were not designing an architecture. We were rediscovering one, forced into the same decisions by the same constraints. The mapping is exact.
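
Gathering the tools introduced in §1 and §5, the correspondence can be written out as a small lookup table. The entries are read off those sections; the page-walk tool is never named in the text, so its key below is a placeholder.

```python
# Tool (as named in this document) -> the hardware component it turned out to be.
TOOL_TO_COMPONENT = {
    "calc": "ALU",
    "mem_read / mem_write": "RAM and memory bus",
    "predicted reads + flush": "cache with coherency",
    "mem_copy": "DMA controller",
    "page-walk tool (unnamed)": "MMU with TLB",   # placeholder key
    "io_write": "I/O port bus",
}

for tool, component in TOOL_TO_COMPONENT.items():
    print(f"{tool:26s} -> {component}")
```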

You have built a computer out of a language model and two tools. It was always going to be a computer. The only thing you got to decide was how slow.

§7 — Series A

The pitch.

We have built a Turing-complete processor using only a large language model and two API endpoints. It is five billion times slower than silicon. It costs $800 to boot Linux. The calculator sometimes returns 404s.

But it is provably Turing-complete, it reinvents fifty years of computer architecture on demand, and it is the only processor ever built that can explain — in clear, grammatical, faintly apologetic English — exactly what every single instruction it executes was supposed to do.

Series A at $40M?

No working prototype required. We have a trace.