github.com/octoflow-lang/octoflow
The idea: the GPU is the computer, the CPU is the BIOS.
You boot a VM, program a dispatch chain of kernel instances, submit once with vkQueueSubmit, and everything — layer execution, inter-layer communication, self-regulation, compression, database queries — happens on the GPU without CPU round-trips. The CPU just provides I/O.
let vm = vm_boot()                        // boot the GPU VM
let prog = vm_program(vm, kernels, 4)     // build a dispatch chain across 4 VM instances
vm_write_register(vm, 0, 0, input)        // write input to VM 0, register 0
vm_execute(prog)                          // one vkQueueSubmit; the whole chain runs on-GPU
let result = vm_read_register(vm, 3, 30)  // read the result from VM 3, register 30
4 VM instances, one submit, no CPU involvement between stages.

The memory model is 5 SSBOs:
- Registers: per-VM working memory
- Metrics: regulator signals
- Globals: shared mutable state (KV cache, DB tables)
- Control: indirect dispatch parameters
- Heap: immutable bulk data (quantized weights)
What makes it interesting:
- Homeostasis regulator: each VM instance has a kernel that monitors activation norms, memory pressure, throughput. The GPU self-regulates without asking the CPU.
- GPU self-programming: a kernel writes workgroup counts to the Control buffer, the next vkCmdDispatchIndirect reads them. The GPU decides its own workload.
- Compression as computation: Q4_K dequantization, delta encoding, dictionary lookup — these are just kernels in the dispatch chain, not a special subsystem. Adding a new codec = writing an emitter. No Rust changes.
- CPU polling: Metrics and Control are HOST_VISIBLE. CPU can poll GPU state and activate dormant VMs without rebuilding the command buffer. The GPU broadcasts needs, the CPU fulfills them.
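To make the polling model concrete, here is a minimal sketch of what a CPU-side loop could look like. The names vm_poll_metrics and vm_wake are hypothetical (only vm_boot and the HOST_VISIBLE Metrics/Control behavior come from the post), and the conditional syntax is assumed:

```
// Hypothetical sketch: vm_poll_metrics and vm_wake are illustrative
// names, not confirmed OctoFlow API.
let vm = vm_boot()
let pressure = vm_poll_metrics(vm, 1)  // Metrics is HOST_VISIBLE: read VM 1's signals without a readback copy
if pressure > 0.9 {
    vm_wake(vm, 2)                     // flip a flag in the Control buffer; no command-buffer rebuild
}
```

The point of the sketch is the direction of control: the GPU publishes its state continuously, and the CPU reacts only when a threshold trips.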
The VM is workload-agnostic. Same architecture handles LLM inference, database queries, physics sims, graph neural networks, DSP pipelines, and game AI. We've validated all six. The dispatch chain is the universal primitive.
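As an illustration of that workload-agnosticism, a hedged sketch of a database-style chain built on the same API. Only vm_boot, vm_program, vm_write_register, vm_execute, and vm_read_register appear in the post; the kernel names, the list syntax, and the register indices here are illustrative:

```
// Hypothetical: a three-stage query chain reusing the same VM primitives.
let vm = vm_boot()
let kernels = [scan_kernel, filter_kernel, aggregate_kernel]  // illustrative kernel names
let prog = vm_program(vm, kernels, 1)       // one VM instance, three-stage chain
vm_write_register(vm, 0, 0, query_params)   // query parameters in, same as any other input
vm_execute(prog)                            // still a single vkQueueSubmit
let rows = vm_read_register(vm, 0, 1)       // aggregated result out
```

Only the kernel list changes between workloads; the boot/program/execute/read shape stays identical.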
What's new in v1.0.0 beyond the GPU VM:
- 247 stdlib modules (up from 51)
- Native media codecs (PNG, JPEG, GIF, MP4/H.264; no ffmpeg)
- GUI toolkit with 15+ widgets
- Terminal graphics (Kitty/Sixel)
- 1,169 tests passing
- Still 2.3 MB, still zero external dependencies
The zero-dep thing is real — zero Rust crates. The binary links against vulkan-1 and system libs, nothing else. cargo audit has nothing to audit.
Landing page: https://octoflow-lang.github.io/octoflow/
GPU VM details: https://octoflow-lang.github.io/octoflow/gpu-vm.html
GitHub: https://github.com/octoflow-lang/octoflow
Download: https://github.com/octoflow-lang/octoflow/releases/latest
I'm one developer. This is early. The GPU VM works and tests pass bit-exact, but there's a lot of road ahead — real LLM inference at scale, multi-agent orchestration, the full database engine. I'd love feedback from anyone who works with GPU compute, Vulkan, or language design.
Most languages treat the GPU as "write a kernel, dispatch it, copy results back." OctoFlow flips it: data lives on the GPU by default, and the CPU handles I/O and nothing else.
let a = gpu_fill(1.0, 10000000)  // allocate 10M floats in VRAM, filled with 1.0
let b = gpu_scale(a, 2.0)        // b = a * 2.0, computed on the GPU
let c = gpu_add(a, b)            // c = a + b, still in VRAM
print("sum: {gpu_sum(c)}")       // only the final scalar crosses back to the CPU
10 million elements. Data never leaves VRAM between operations.
It's early — there's a lot to improve — but it works today and I'd love feedback from people who try it.
What you can do right now:
- GPU compute with arrays of 10M+ elements
- Statistical analysis, ML (regression, clustering, neural net primitives)
- CSV/JSON data processing, HTTP client
- Stream pipelines for image processing
- Interactive REPL with GPU access
- Import from 51 stdlib modules across 11 domains
What you need: any GPU with a Vulkan driver and the 2.2 MB binary. That's it.
I've been working on this solo and would genuinely appreciate people kicking the tires. What works, what breaks,
what's missing — all useful.
https://github.com/octoflow-lang/octoflow