The smarter the models get the less the harnesses matter (outside of devx).
Maybe one day I'll run it through swebech.
I also wrote one by myself last week (just for fun and learning). It works, including integration with configured mcpServers (like you do in most coding agents). Wrote about the whole step-by-step process and what is needed at what step and why: https://nb1t.sh/building-a-real-agent-step-by-step/
Rewriting things in rust is "cool". Bun did it, other projects did it. Therefore, writing a coding agent in one should be cool too.
And apparently enough HN crowd agrees with it to take the #1 spot on the board.
I decided to allow for customization in a different way:
1. The prompt library (~/.config/hypernova/prompts/) acts as a simpler alternative to Skills, with the built-in prompts that should replace superpowers + Claude's frontend-design
2. Compile-time features; things that might make the agent more bloated can be disabled when you decide to compile zerostack
3. Clean code; code that's short and easy to read, you can just throw zerostack on its own source code in order to build a custom fork if your necessity can't be satisfied. Good features could also be adopted by the main version.
4. Permission mode; as you can see in the README, there was lots of concern around the permission model, and I landed on a 4-mode system that goes from "Restrictive" (no commands) to "YOLO" (whatever the agent wants to do" + custom regex patterns for allow/ask/deny permission on 'bash' calls. In your case, you just need to run `zerostack -R` to force all tools to ask for permission.
(Also, there is a work-in-progress features for programmable agents, but that's yet to be announced)
You might find it nice for pretty much all use cases except for high-performance scripting (so, if you are not try to build the entire logic entirely in rhai, you are going to be fine).
I like this, Claude Code is using multiple gigabytes, which is really annoying on lowend laptops
There's no reason what is essentially a string concat engine should be slow on any hardware, including old hardware.
The reasons why the memory footprint of zerostack are:
- Rust, and not JS/Python, so no interpreters/VMs on top
- Load-as-needed, so we only allocate things like LLM connectors when needed
- `smallvec` used for most of the array usage of the tool (up to N items are stored in stack)
- `compactstring` used for most of the string usage of the tool (up to N chars are stored in stack)
- `opt-level=z` to force LLVM to optimize for binary size and not for performance (even tho we still beat both in TTFT and in tool use time opencode)
- heavy usage of [LTO](https://en.wikipedia.org/wiki/Interprocedural_optimization#W...)
To be honest, I just plagiarized Pi, Dirac, OpenCode. Any new tricks in this one that I can steal?
What you showed is a clear bug in my codebase, if you can, open a Github issue with each of your bugs.
Thanks!
Will check this out! Seems cool!
Maybe a workaround could be to use bubblewrap of the scripts ther recursively call the llm (and run the agent in yolo inside the wrap).
Nothing is committed until the final top-level transaction is accepted.
Manually checking the dependencies used by this project, I was pleased to see they are all the latest version. That doesn't mean there are no issues lurking in transitive dependencies, of course.
As for getting an LLM to review the code, I think we can get all opinionated very fast. For instance, when I was eyeballing the code, some of the enum methods converting to/from strings made me think "this could've been a single #[derive] with strum." That would make the code in provider.rs a lot more concise, at the cost of importing one crate (with no dependencies!)
Lastly, for fun, I decided to get DeepSeek V4 Pro (with Max thinking) to "audit" the codebase. The output mentioned no obvious signs of hidden telemetry, but it did note that the project sets the panic handler to "abort", which I have strong opinions on... Presumably the OP wanted to avoid linking against libunwind to save a few kilobytes of binary size, but now you have a binary that immediately aborts and doesn't give the user a stacktrace of what just crashed. I would rather have a ~50 KiB larger binary if it means getting useful debug info during a panic. Additionally, if there are async tasks that panic, they can't be recovered to display a generic error message; instead the whole process just aborts.
`cargo add` tip is very helpful, I had a hunch this happened in my own Rust project and I think you just filled in the missing piece for me there.
1. I had experience not only with wrong versions selected by the agents, but also weird crates (ex. choosing a crate with 10 github stars when a more complete and more supported one was available), reason why now I always choose the dependencies and then I let the agent work.
2. Yes, some of the provider code could be made using macros, I am just lazy... But thanks for the tip! I will save it for later.
3. No telemetry, and it can be checked thanks to the fact that there are no HTTP calls outside of the MCP implementation (via rmcp) and LLM connectors (via rig)
4. Yes, i set panic handler to 'abort', thinking that I would've get a nice size decrease: i yet have to experience a panic on this project, but I will revert it to default behavior if the binary size saving is really so small
5. While it is async, the entire project runs on one thread (as expressed in the main.rs with ```#[tokio::main(flavor = "current_thread")]```), as it allows for a nice ~8MB memory saving (so, 50% off) and no real performance loss, being such a simple tool.
---
P.S. Just switched back to default settings for panic handler
Doesn't prompt injection make that a rather flimsy investigation?
$ airun -q -p 'output a shell command for linux to display the current time. output only the command with no other code fencing or prose' | airun -q -s 'review the provided shell command, determine if it is safe, run it only if it is safe, and then summarize the output from the command' --permissions-allow='bash:date *'
I personally decided to not implement Skills and instead using a prompt library approach, where certain .md are used to fully replace the system prompt, in order to allow for an approach similar to Skills with ~100 LoC dedicated to this system.
I believe Pi extensibility is the most important feature, exactly as how it was important for WordPress. WordPress won because anyone could install it and add the plugins they needed. WordPress also has the same hook system where multiple plugins can build on the same hook.
Companies will want to completely customize their agent harness so it optimally works for their situation.
https://x.com/PandelisZ/status/2055633346831548902
The two things I want to get right before actually releasing it is properly eval it againt other harnesses and make sure its better.
And the licence. I don't think a GPL licence will yield addoption so I would like to MIT Roder or figure out the right licence
Sandboxing should be the default. Rather than routinely allowing unsandboxed access, one should be able to configure the sandbox to allow exactly what is needed
That's hard. For example, I've been unable to give wayland access to agents inside the sandbox (there's a special flag in bubblewrap to mount /dev/dri in a way you can make use of it, but you also must give access to the wayland socket, and maybe other things). So I think that maybe harnesses should invest in more sandboxing resources
I also used bwrap for sandboxing. I'm looking at layering slirp4netns, because I found out that models will happily break out of the sandbox via the the host network interface.
Both are in Rust and both mention Unix in their descriptions.
I’m curious how the prompts idea performs in practice compared to typical skills and subagents. I frequently combine the two to get otherwise tricky workflows done. Say I have a failing build. I invoke my /fix-ci skill (sometimes in the same context I made the code change in), it launches a subagent to extract an error message / stack traces / relevant logs, and works through the problem. Say an integration test ran into a db query issue. Sometimes the agent itself, sometimes with a slight nudge from me, will load the readonly db access skill and start investigating. If I expect long, deep shenanigans, I’ll often say something like „use a sonnet subagent and instruct it to use the db query skill to debug the behavior we’re seeing”. And it can keep going like that: skills give extra capabilities on the fly, subagents isolate context to prevent bloat. Intuitively, it seems that by the agent running itself via bash with different prompts _might_ come close but a bit less streamlined? I’d have to check and see.
About subagents: as of right now, the entire agent runs on one context buffer, so it doesn't support subagents in order to keep it lean; but there is a great chance that subagents will be added, as explore-heavy tasks often bloat the context window
So in that way it's not like skills at all, neither of those result in paying full read price on the entire session, just the skill prompt itself.
Something else I noticed... In the Anthropic implementation it doesn't seem to be using 'cache_control' in the body. Assuming my understanding is current, without that the Anthropic API won't do any caching at all (unlike most other APIs that do some level of automatic caching without it being requested). So that would result in paying full read price on every turn.
Of course I could be missing something, it was a quick look. Can you clarify?
https://www.star-history.com/?repos=anthropics%2Fclaude-code...
Compared to Codex CLI, Claude Code is insanely slow.
$ time claude --version
2.1.143 (Claude Code)
________________________________________________________
Executed in 4.39 secs fish external
usr time 29.68 millis 0.26 millis 29.41 millis
sys time 71.30 millis 1.30 millis 70.00 millis
5 seconds to show me the version number!I'm guessing Claude Code also needs a rewrite in Rust. But from what I saw in the leaked TypeScript code, a line-to-line port will be pretty bad. It requires a new architecture that matches Rust idioms
I suspect we'll soon see someone make a persistent Claude shell mode, with the reverse of a !, where you work in shell and send a message to Claude, and Claude sees all the context.
Imo, this shouldn't be embedded in the executor layer. Orchestration should handle this.
1. Couldn't be built only using prompts
2. Couldn't be built only using MCP servers
3. Would have improved my UX experience (as i hope, your UX experience).
From those three conditions, I chose integrated git worktrees and loops
E.g. how to use official, vendor provided skills with zerostack? https://github.com/elestio/elestio-skill
'"The skill description": if this applies, read /path/to/skill/definition.md'
To your agents.md
At least currently skills don't let you set the model (to my knowledge), so that's not a distinction either here (it would be with agent definitions)
Everyone wants to write one, building a new one is easy to start with, but tough to get to “prod ready” and the landscape is littered with failed attempts?
Certainly feels like it.
This is really good though; works well and at least has a clearly articulated raison d'être.
Now I tried to install zerostack, but the compilation freezes at a certain package.
Is there a static binary available for linux?
Will try to rebuild it with static flag.
1. The tools that are given to the agent are almost the same to the one defined in Opencode, except for Skills and Subagents (both features not implemented in zerostack)
2. Zerostack is prompt-based, so that it ships with a set of .md files, stored in ~/.config/zerostack/prompt, and that can be selected from the TUI in order to activate different 'agents': as you can see from the README, it is designed to contain the most important feautres of superpower + Claude's front-end design + git worktree support and Ralph Wiggum loops (both as integrated features)
2. As said before, there are no benchmarks right now, but it is good enough for me, so I hope it's good enough for y'all :)
3. Transfering settings from other agents is out-of-scope for a minimalstic coding agent, but the idea is that, apart from MCP server, the rest might just force you to learn how zerostack works, because of design choices such as not having Skills or having certain specialized tools integrated (worktrees and loops).
I’ve been implementing custom coding agent in https://playcode.io for 3 years already. Far beyond of 7K LoCs.
So when you compare to “shitty slow” Claude code - I don’t agree.
For 3 years, your Lovable clone is something that Claude Code could make in a couple of days, but good luck shitting on other project I guess.
Avoid lock in to stack from one provider (things like a harness that only works with models from one provider and so on).
Use local models (a couple of them do work a bit now, if you have 20Gb video RAM), which saves money and is more private, and works offline.
Can improve the harness, fix bugs in it, make it compatible with different systems and techniques.
This game happens every time in new cycles of developer technology. The good bet historically has always been to use open source - there's a reason most developer tooling just pre-AI revolution was open source (even things like Java and .NET which used to be proprietary).
Is there any API like Pi so that I can create extensions.
I've found is that nearly every extension on the official pi.dev/packages is vibe coded trash, like for example the most popular subagents extension.
Instead of just giving you a basic subagent, it's a whole kitchen sink of recursion, teams, chains, confusingly named agents like "oracle" etc. Basically feels like someone kept prompting "what else could we add here?".
They're all like that. It's no wonder these slow down pi.
What I've done is just have the agent write my own.
Get a local copy of e.g. that kitchen sink subagents extension. Have the agent list all the features, then I give back a much smaller list of the features I want and say "write me a new extension with just these new features" and every time it one shots it (using GPT 5.3 usually), then 20-30 minutes later I have a working, lightweight extension tuned to my exact workflow.
I've done this for I guess about 8 extensions now (subagents, a lightweight typescript LSP, web search, background processes, Claude style hooks, plan mode are the main ones) and it's very fast and snappy.
Bigger harnesses need to balance upping your token usage and being helpful.
Also, can I configure zerostack to always require a sandbox? I don't want to accidentally forget to call it with --sandbox.
It's a bit amusing that coding agents rely on drawing 1000W+ and using 2TB+ of memory in a datacenter to run, yet people really focus on the last few watts and few hundred megabytes of memory on their laptop (which get dwarfed by the energy cost of compiling their code anyways). But I suppose making them a bit faster and lighter wouldn't hurt.
Making the client side coding agent more efficient isn't about saving the climate. It is about extending the workday (which might actually make the climate worse)
What do Jetbrains users use then? Amp?
Although people are complaining about its RAM usage in this thread, I haven't bothered to check how much RAM it uses.
Got this on iPhone firefox
a low level language. please no more scripting language TUIs!
It just does not rely on GC and allows to manage resources efficiently. This efficiency is partly due to its being so high-level.
This is obv only a technical talk, as writing an AI TUI in pure C would be rather... ehhh
Lower-level languages like Zig or even Go, to say nothing of C, lack many of the high-level language features that power this efficiency.
-- So is deepseek-tui.
nobody actually cares about rust, let alone likes it
For example I have an agent in Claude Code that has strict rules to do something before implementing every phase in the plan. Sometimes it decides not to do it. "But, wait the feature is simple enough so I can proceed straight to implementation..."
Just because this is written in Rust won't solve the biggest issues most users have with coding agents.
Given how an LLM works, you can never be sure it will always work. LLMs are not deterministic.
/s
Always funny how Hacker News works with traction, posted about a rust based TUI agent I'm working on a couple days ago too :P
I vibed a comparison/review of these two systems using my llm wiki: https://zby.github.io/commonplace/work/pi-agent-zerostack-co...
(the prompt is in https://zby.github.io/commonplace/work/pi-agent-zerostack-co...)