github.com/octoflow-lang/octoflow
The idea: the GPU is the computer, the CPU is the BIOS.
You boot a VM, program a dispatch chain of kernel instances, submit once with vkQueueSubmit, and everything — layer execution, inter-layer communication, self-regulation, compression, database queries — happens on the GPU without CPU round-trips. The CPU just provides I/O.
let vm = vm_boot()                        // boot the GPU VM
let prog = vm_program(vm, kernels, 4)     // build a dispatch chain across 4 VM instances
vm_write_register(vm, 0, 0, input)        // write input to VM 0, register 0
vm_execute(prog)                          // one vkQueueSubmit; the whole chain runs on-GPU
let result = vm_read_register(vm, 3, 30)  // read the result from VM 3, register 30
4 VM instances, one submit, no CPU involvement between stages.

The memory model is 5 SSBOs:
- Registers: per-VM working memory
- Metrics: regulator signals
- Globals: shared mutable state (KV cache, DB tables)
- Control: indirect dispatch parameters
- Heap: immutable bulk data (quantized weights)
What makes it interesting:
- Homeostasis regulator: each VM instance has a kernel that monitors activation norms, memory pressure, throughput. The GPU self-regulates without asking the CPU.
- GPU self-programming: a kernel writes workgroup counts to the Control buffer, the next vkCmdDispatchIndirect reads them. The GPU decides its own workload.
- Compression as computation: Q4_K dequantization, delta encoding, dictionary lookup — these are just kernels in the dispatch chain, not a special subsystem. Adding a new codec = writing an emitter. No Rust changes.
- CPU polling: Metrics and Control are HOST_VISIBLE. CPU can poll GPU state and activate dormant VMs without rebuilding the command buffer. The GPU broadcasts needs, the CPU fulfills them.
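To make the polling model concrete, here is a minimal sketch of what a CPU-side loop could look like. The names vm_poll_metrics and vm_wake are hypothetical (only vm_boot and the HOST_VISIBLE Metrics/Control behavior come from the post), and the conditional syntax is assumed:

```
// Hypothetical sketch: vm_poll_metrics and vm_wake are illustrative
// names, not confirmed OctoFlow API.
let vm = vm_boot()
let pressure = vm_poll_metrics(vm, 1)  // Metrics is HOST_VISIBLE: read VM 1's signals without a readback copy
if pressure > 0.9 {
    vm_wake(vm, 2)                     // flip a flag in the Control buffer; no command-buffer rebuild
}
```

The point of the sketch is the direction of control: the GPU publishes its state continuously, and the CPU reacts only when a threshold trips.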
The VM is workload-agnostic. Same architecture handles LLM inference, database queries, physics sims, graph neural networks, DSP pipelines, and game AI. We've validated all six. The dispatch chain is the universal primitive.
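As an illustration of that workload-agnosticism, a hedged sketch of a database-style chain built on the same API. Only vm_boot, vm_program, vm_write_register, vm_execute, and vm_read_register appear in the post; the kernel names, the list syntax, and the register indices here are illustrative:

```
// Hypothetical: a three-stage query chain reusing the same VM primitives.
let vm = vm_boot()
let kernels = [scan_kernel, filter_kernel, aggregate_kernel]  // illustrative kernel names
let prog = vm_program(vm, kernels, 1)       // one VM instance, three-stage chain
vm_write_register(vm, 0, 0, query_params)   // query parameters in, same as any other input
vm_execute(prog)                            // still a single vkQueueSubmit
let rows = vm_read_register(vm, 0, 1)       // aggregated result out
```

Only the kernel list changes between workloads; the boot/program/execute/read shape stays identical.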
What's new in v1.0.0 beyond the GPU VM:
- 247 stdlib modules (up from 51)
- Native media codecs (PNG, JPEG, GIF, MP4/H.264; no ffmpeg)
- GUI toolkit with 15+ widgets
- Terminal graphics (Kitty/Sixel)
- 1,169 tests passing
- Still 2.3 MB, still zero external dependencies
The zero-dep thing is real — zero Rust crates. The binary links against vulkan-1 and system libs, nothing else. cargo audit has nothing to audit.
Landing page: https://octoflow-lang.github.io/octoflow/
GPU VM details: https://octoflow-lang.github.io/octoflow/gpu-vm.html
GitHub: https://github.com/octoflow-lang/octoflow
Download: https://github.com/octoflow-lang/octoflow/releases/latest
I'm one developer. This is early. The GPU VM works and tests pass bit-exact, but there's a lot of road ahead — real LLM inference at scale, multi-agent orchestration, the full database engine. I'd love feedback from anyone who works with GPU compute, Vulkan, or language design.
Most languages treat the GPU as "write a kernel, dispatch it, copy results back." OctoFlow flips it: data lives on the GPU by default, and the CPU handles I/O and nothing else.
let a = gpu_fill(1.0, 10000000)  // allocate 10M floats in VRAM, filled with 1.0
let b = gpu_scale(a, 2.0)        // b = a * 2.0, computed on the GPU
let c = gpu_add(a, b)            // c = a + b, still in VRAM
print("sum: {gpu_sum(c)}")       // only the final scalar crosses back to the CPU
10 million elements. Data never leaves VRAM between operations.
It's early — there's a lot to improve — but it works today and I'd love feedback from people who try it.
What you can do right now:
- GPU compute with arrays of 10M+ elements
- Statistical analysis, ML (regression, clustering, neural net primitives)
- CSV/JSON data processing, HTTP client
- Stream pipelines for image processing
- Interactive REPL with GPU access
- Import from 51 stdlib modules across 11 domains
What you need: any GPU with a Vulkan driver and the 2.2 MB binary. That's it.
I've been working on this solo and would genuinely appreciate people kicking the tires. What works, what breaks,
what's missing — all useful.
https://github.com/octoflow-lang/octoflow