We know that a lack of control over their environment makes animals, including humans, depressed.
The software we use gives us so little of that control. It's their way, their branding, their ads, their app. You're a guest on your own device.
It's no wonder everyone hates technology.
A world with software that is malleable, personal, and cheap - this could do a lot of good. Real ownership.
The nerds could always make a home with their Linux desktops. Now everyone can. It'll change the equation.
I'm quite optimistic for this future.
Building it exactly to my design specs, giving it only the tool calls I need, owning all the data it stores about me for RAG, integrating it to the exact services/pipelines I care about... It's nothing short of invigorating to have this degree of control over something so powerful.
In a couple of days work, I have a discord bot that's about as useful as chatgpt, using open models, running on a VPS I manage, for less than $20/mo (including inference). And I have full control over what capabilities I add to it in the future. Truly wild.
Problem is, to be able to do what you're describing, you still need the source code and the permission to modify it. So you will need to switch to the FOSS tools the nerds are using.
Strip away the ads, strip away the data harvesting, add back the power features, and we'll be happy again. I'm more willing than ever to pay a one-time fee for good software. I've started donating to all the free apps I use on a regular basis.
I don't want to own my own slop. That doesn't help me. Use your AI tools to build out the software if you want, but make sure it does a good job. Don't make me fiddle with nondeterministic, flavor-of-the-month AI agents.
Yes, which is why this model of development is basically dead-in-the-water in terms of institutional adoption. No large firm or government is going to allow that.
Yet, the first impact on FOSS seems to be quite the opposite: maintainers complaining about PRs and vulnerability disclosures that turn out to be AI hallucinations, wasting their time. It seems to be so bad that now GitHub is offering the possibility of turning off pull requests for repositories. What you present here is an optimistic view, and I would be happy for it to be correct, but what we've seen so far unfortunately seems to point in a different direction.
With that said, we are all dealing with AI that still convincingly writes code that doesn't work despite passing tests, or that introduces hard-to-find bugs. It will be some time until we iron that out for more reliable output, I suspect.
Unfortunately, now that the abstraction language is human language, we won't be able to stop people from thinking they are software engineers when they are not, so guarding against spam will be more important than ever.
I use a lot of my own software. Most of it is strictly worse, in both features and bugs, than more intentional, planned projects. The reason I do it is that each of those tools solves my specific pain points in ways that make my life better.
A concrete example: I have a personal dashboard. It was written by Claude in its entirety. I've skimmed the code, but no more than that. I don't review individual changes. It works for me. It pulls in my calendar, my fitbit data, my TODO list, various custom reminders to work around my tendency to procrastinate, it surfaces data from my coding agents, it provides a nice interface for me to browse various documentation I keep to hand, and a lot more.
I could write a "proper" dashboard system with cleanly pluggable modules. If I were to write it manually I probably would because I'd want something I could easily dip in and out of working on. But when I've started doing stuff like that in the past I quickly put it aside because it cost more effort than I got out of it. The benefit it provides is low enough that even a team effort would be difficult to make pay off.
Now that equation has fundamentally changed. If there's something I don't like, I tell Claude, and a few minutes - or more - later, I reload the dashboard and 90% of the time it's improved.
I have no illusions that code is generic enough to be usable for others, and that's fine, because the cost of maintaining it in my time is so low that I have no need to share that burden with others.
I think this will change how a lot of software is written. A "dashboard toolkit", for example, would still have value to my "project", but as something for my agent to pull in and use to put together my dashboard faster.
A lot of "finished products" will be a lot less valuable because it'll become easier to get exactly what you want by having your agent assemble what is out there, and write what isn't out there from scratch.
I built my own, inspired by Beads. It's not quite as you're describing, but I store todos in a SQLite database (Beads uses SQLite AND git hooks; I didn't want to be married to git), and I let them sync to and from GitHub Issues. So in theory I can fork a GitHub repo and have my tool pull down issues from the original repo (I haven't tried it when it's a fork, so that's a new task for the task pile).
https://github.com/Giancarlos/guardrails/issues
You can see me dogfooding my tool on its own codebase and keeping my issues on GitHub for anyone to see, including the closed ones. I do think we will see an increase in local dev tooling that is tried and tested by its own creators, which will yield better purpose-driven tooling that is generic enough to be useful to others.
I used to use Beads for all my Claude Code projects, now I just use GuardRails because it has safety nets and works without git which is what I wanted.
I could have forked Beads, but Beads is a behemoth of code; it was much easier to start from nothing but a very detailed spec and Claude Code ;)
Maintainers won’t have to deal with an endless stream of PRs. Now people will just clone your library the second it has traction and make it perfect for their specific use case.
Cherry pick the best features and build something perfect for them. They’ll be able to do things your product can’t, and individual users will probably find a better fit in these spinoffs than in the original app.
Is it possible that software is not like anything else, that it is meant to be discarded: that the whole point is to always see it as a soap bubble?
-- Alan Perlis

It's because you put 2x4s in place of the shocks, you absolute muppet. And then they either give them a massive bill to fix it properly or politely show them out.
Same will happen in self-modifying software. Some people are self-aware enough to know that "I made this, it's my problem to fix", some will complain to the maker of the harness they used and will be summarily shown the door.
It's very likely that once something is patched, it has to be considered diverged and becomes very hard to upgrade.
I’m still looking for a generic agent interaction protocol (to make it worth building around) and thought ACP might be it. But (and this is from a cursory look) it seems that even OpenCode, which does support ACP, doesn’t use it for its own UI. So what’s wrong with it and are there better options to hopefully take its place?
I've started using OpenCode for some things in a big window because its side-by-side diff is great.
The best option will always be in-memory exchanges. Right now I am still using the pi RPC, and that also involves a bit of conversion, but it’s much lighter.
Model + prompt + function calls.
There are many such wrappers, and they differ largely on UI deployment/integration. Harness feels like a decent term, though "coding harness" feels a bit vague.
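The "model + prompt + function calls" core can be sketched as a loop. This is schematic: the type shapes and the `callModel`/`tools` wiring below are my own placeholders for illustration, not any particular harness's or provider's API.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelReply = { text: string; toolCalls: ToolCall[] };

// The essential harness loop: send context to the model, execute any tool
// calls it requests, feed results back, repeat until it answers in text.
export async function agentLoop(
  prompt: string,
  callModel: (messages: string[]) => Promise<ModelReply>,
  tools: Record<string, (args: Record<string, unknown>) => Promise<string>>,
): Promise<string> {
  const messages = [prompt];
  for (;;) {
    const reply = await callModel(messages);
    if (reply.toolCalls.length === 0) return reply.text; // no more work
    for (const call of reply.toolCalls) {
      const result = await tools[call.name](call.args);
      messages.push(`tool ${call.name} -> ${result}`); // feed result back
    }
  }
}
```

Everything else a harness does (UI, permissions, sandboxing, extensions) hangs off this loop.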
and you can build cool stuff on top of it too!
Pleased to meet you!
For me, it just didn’t compare in quality with Claude CLI and OpenCode. It didn’t finish the job. Interesting for extending, certainly, but not where my productivity gains lie.
I dislike many things about Claude Code, but I'll pick subagents as one example. Don't want to use them? Tough luck. (Caveat: it's been a while since I used CC; maybe it's configurable now, or always was and I never discovered it.)
With Pi, I just didn't install an extension for that. I suspect one exists, but I have the option of never finding out.
Can't do that with Claude =)
With Emacs modes like agent-shell.el available and growing, why not invest in learning a tool that is likely to survive and have mindshare beyond the next few months?
And if there's any feature Codex has that you want, just have pi run Codex in a tmux session, interrogate it about how said feature works, and recreate it in pi.
The extensibility is really nice. It was easy to get it using my preferred issue tracker; and I've recently overridden the built-in `read` and `write` commands to use Emacs buffers instead. I'd like to override `edit` next, but haven't figured out an approach that would play to the strengths of LLMs (i.e. not matching exact text) and Emacs (maybe using tree-sitter queries for matches?). I also gave it a general-purpose `emacs_eval`, which it has used to browse documentation with EWW.
Let me also drop a link to the Pi Emacs mode here for anyone who wants to check it out: https://github.com/dnouri/pi-coding-agent -- or use: M-x package-install pi-coding-agent
We've been building some fun integrations in there like having RET on the output of `read`, `write`, `edit` tool calls open the corresponding file and location at point in an Emacs buffer. Parity with Pi's fantastic session and tree browsing is hopefully landing soon, too. Also: Magit :-)
The implementation is pretty terrible: a giant string of vibe-coded Emacs Lisp is sent to emacsclient, which performs the actions and sends back a string of JSON.
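That pattern can be sketched as a shell-out to emacsclient. The tool object shape and the `emacs_eval` wiring below are assumptions for illustration, not pi's actual extension API; wrapping the form in `json-encode` is one way to get JSON back, matching the approach described above.

```typescript
import { execFileSync } from "node:child_process";

// Build the argv for emacsclient: evaluate the given form and have Emacs
// serialize the result as JSON so the agent side can parse it.
export function emacsEvalArgs(elisp: string): string[] {
  return ["--eval", `(json-encode ${elisp})`];
}

// Hypothetical custom tool definition (shape is an assumption).
export const emacsEvalTool = {
  name: "emacs_eval",
  description: "Evaluate Emacs Lisp in the running Emacs, returning JSON.",
  async execute(input: { elisp: string }): Promise<string> {
    // Requires a running Emacs server (M-x server-start).
    return execFileSync("emacsclient", emacsEvalArgs(input.elisp), {
      encoding: "utf8",
    });
  },
};
```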
It's been interesting to iterate on the approach: watching the LLM (in my case Claude) attempting to use the tools; noticing when it struggles or makes incorrect assumptions; and updating the tool, documentation and defaults to better match those expectations.
I've also written some Emacs Lisp which opens Pi and tells it to "Action the request/issue/problem at point in buffer '<current-buffer>'" https://github.com/Warbo/warbo-emacs-d/blob/a13a1e02f5203476...
It feels similar to the file-watching provided by Aider (which uses inotify to spot files containing `# AI!` or `# AI?`), which I've previously used with FIXME and TODO comments in code; but it also works well in non-file things, e.g. error messages and test failures in `shell-mode`, and issues listed in the Emacs UI I wrote for the Artemis bug tracker (Claude just gets the issue number from the current line, and plugs that into a Pi extension I made for Artemis :-) )
https://github.com/can1357/oh-my-pi
More of a batteries-included version of pi.
I am building my own assistant as an AI harness - that is definitely getting sandboxed, running only in a VM on my Mac.
I have this in my shell rc.
# bun
export BUN_INSTALL="$HOME/.bun"
export PATH="$BUN_INSTALL/bin:$PATH"
alias my-omp="bun /Users/aravindhsampathkumar/ai_playground/oh-my-pi/packages/coding-agent/src/cli.ts"
and then run:
1. git pull origin main
2. bun install
3. bun run build:native
every time I pull changes from upstream.
Until yesterday, this process was pure bliss: my own minimal custom system prompt, minimal AGENTS.md, and self-curated skills.md. One thing I was wary of when switching from pi to oh-my-pi was its use of Rust tools via pi-native using NAPI. Over the last couple of days, the changes I pulled from upstream have been causing the models to get confused about which tool to use, and how, while editing/patching files. They get extremely annoyed: I saw 11 iterations of a tool call to edit a single file before the model resorted to rewriting the whole file from memory, and we all know how that goes. This may not be a bug in oh-my-pi per se. My guess is that the agent developed its memory based on prior usage of the tools, and my updating oh-my-pi changed how they're used. It might be okay if I could wipe the agent's memory and begin again, but I don't want to.
I'm going to be more diligent about pulling upstream changes from now on, and only do so when I can afford a full session memory wipe.
Otherwise, the integrations work like a charm: Exa for search, local LSP servers, syntax highlighting, steering prompts, and custom tools (trafilatura to fetch the contents of any URL as markdown, a calculator instead of making the LLM do arithmetic). I haven't used the IPython integration, nor do I plan to.
Pi appears to have a smaller, less “pre-made” ecosystem, but with more flexibility, enthusiasm and extensibility.
Is this correct? Should I look towards Pi over OpenCode? What are the UI options?
Honestly, it's been a dream. I have it running in a Docker sandbox with access to a single git repo (not hosted) that I'm using for various things in my business.
Try it out; it's super easy to set up. If you use Docker sandbox, you can just follow what's necessary for Claude, spin up the sandbox, exit out, exec into it with bash, and switch to Pi.
I looked a bit into the reasoning for Pi's design (https://mariozechner.at/posts/2025-11-30-pi-coding-agent/#to...) and, while it does seem to do a lot of things very well around extensibility, I do miss support for permissions, MCP, and perhaps todos and a server mode. OpenCode seems a lot more complete in that regard, but I can well imagine that people have adapted Pi for these use cases (OpenClaw seems to have all of these). So it's definitely not out of the race yet, but I still appreciate OpenCode's seemingly greater completeness in comparison.
Went from codex/claude code -> opencode -> pi -> oh-my-pi
It's just an opinionated fork, either you like it or you don't. I personally really like it.
This is quite nice. I do think there's a version of pi's design choices which could live in a static harness, but fully covering the same capabilities as pi without a dynamic language would be difficult. (You could imagine specifying a programmable UI and various other ways to extend the system's behavior; you'd likely end up with an interpreter in the harness.)
At least, you’d like to have a way to hot reload code (Elixir / Erlang could be interesting)
This is my intuition, at least.
Don't hate me aha and no, there is no reason other than I can
pi plugins support adding hooks at every stage, from tool calls to compaction, and let you customize the TUI as well. I use it for my multi-tenant Openclaw alternative https://github.com/lobu-ai/lobu
If you're building an agent, please don't use proprietary SDKs from model providers. Just stick to ai-sdk or pi agent.
("Limited experimentation" = a few months ago I threw $10 into the Anthropic console and did a bit of vibe coding and found my $10 disappeared within a couple of hours).
If so, that would support your concern, it does kinda sound like they're selling marginal Claude Code / Gemini CLI tokens at a loss. Which definitely smells like an aggressive lockin strategy.
The Anthropic docs are intentionally unclear about how 3P tools are defined: is it calling the Claude app, or the Anthropic API with the OAuth tokens?
https://yepanywhere.com/subscription-access-approaches/
Captures the ai-sdk and pi-mono.
In an ideal world we would have a pi-cli-mono or similar: something not as powerful as pi, but giving a least-common-denominator interface to access at least claude/codex.
ACP is also something interesting in this space, though I don't honestly know how that fits into this story.
You can open a grid of windows inside vscode too and it comes back up exactly as it was on reload.
Think of it more like directing a coworker or subcontractor via text chat. You tell them what you want and get a result; then you test whether it's what you wanted and give more instructions if needed.
I literally just fixed a maintenance program on my own server while working my $dayjob. ssh to server, start up claude and tell it what's wrong, tab away. Then I came back some time later, read what it had done, tested the script and immediately got a few improvement ideas. Gave them to Claude, tabbed out, etc.
Took me maybe 15 minutes of active work while chatting on Slack and managing my other tasks. I never needed to look at the code at any point. If it works and tests pass, why do I care what it looks like?
In my own experience, I cannot blindly accept code without even looking at it for a few moments, because I've had many situations where the code was simply doing the wrong things, including tests that were completely wrong and tested the wrong assumptions.
So yeah, even when I review trivial changes, I still look at the diff view to see if it makes sense. And IDEs make code review a lot easier than a raw diff.
Btw, this experience is not from lack of trying. We use coding agents extensively (I would assume more than the typical org, looking at our bill), and while they are certainly very, very helpful and I cannot describe how much effort they are really saving us, there is absolutely zero chance of pushing something out without reviewing it first. The same applies to code written by an AI agent or a coworker.
Because then you need to make an extension for every IDE. Isn't it better to make a CLI tool with a server, and let people make IDE extensions to communicate with it?
Claude Code has an update every few days. Imagine now propagating those changes to 20+ IDEs.
I agree. I tried Gemini CLI for a while, and didn't like how separate I felt from the underlying files: rather than doing minor cleanup myself, the activation energy of switching to a separate editor and opening the same files was too high, so I'd prompt the LLM to do it instead. Which was often an exercise in frustration, as it would take many rounds of explanation for such tiny payoffs; maybe even fiddling with system prompts and markdown files, to try and avoid wasting so much time in the future...
I've been using Pi for a few weeks now, and have managed to integrate it quite deeply into Emacs. I run it entirely via RPC mode (JSON over stdio), so I don't really know (or care) about its terminal UI :-)
CLI within VSCode is workable but most of my VSCode envs are within a docker container. This is a pattern that I'm moving more and more away from as agents within a container kind of suck.
EDIT: Thank you to both responders. I'll just try the two options out then.
Fwiw, the janky `claude -p` approach you described is actually pretty solid once you stop fighting it; the simplicity is the feature, I think.
That's how the pi-coding-agent Emacs package interacts with pi; and it's how I write automated tests for my own pi extensions (along with a dummy LLM that emits canned responses).
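As a sketch of what "JSON over stdio" framing can look like, here's a newline-delimited variant. Whether pi's RPC mode uses exactly this framing is an assumption, and the message fields in the usage below are illustrative, not pi's real schema.

```typescript
// Encode one message as a single line of JSON (newline-delimited JSON).
export function encodeMessage(msg: object): string {
  return JSON.stringify(msg) + "\n";
}

// Feed raw stdio data in as it arrives: complete lines become parsed
// messages, and the trailing partial line is carried over to the next read.
export function decodeChunk(buffer: string): { messages: object[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? ""; // incomplete tail, keep for next chunk
  return {
    messages: lines.filter((l) => l.trim() !== "").map((l) => JSON.parse(l)),
    rest,
  };
}
```

This framing also makes the "dummy LLM with canned responses" trick easy: a test harness just writes pre-encoded lines to the agent's stdin and asserts on what comes back.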
Wondering: if you wanted a similar interface (though a GUI, not just a CLI) that's not for coding, what would you call it?
Same idea: cycle through models, ask questions, drag-and-drop images, etc...
"harness" fits pretty nicely IMO. It can be used as a single word, and it's not too semantically overloaded to be useful in this context.
For me, I'm looking to stick with Python, so I'll whip something up with Tkinter later for the desktop GUI aspect, although I still like Electron/JS primarily.
https://github.com/elyase/awesome-personal-ai-assistants?tab...
Pass for me.
Does anyone have an idea as to why this would be a feature? Don't you want to have a discussion with your agent to iron out the details before moving on to the implementation (build) phase?
In any case, looks cool :)
EDIT 1: Formatting EDIT 2: Thanks everyone for your input. I was not aware of the extensibility model that pi had in mind or that you can also iterate your plan on a PLAN.md file. Very interesting approach. I'll have a look and give it a go.
As for subagents, Pi has sessions. And it has a full session tree & forking. This is one of my favorite features of any harness: build the thing with half the context, then keep using that as a checkpoint, doing new work from that same branch point. It means still having a very usable, lengthy context window while having good fundamental project knowledge loaded.
There are already multiple implementations of everything.
With a powerful and extensible core, you don't need everything prepackaged.
https://github.com/badlogic/pi-mono/tree/main/packages/codin...
Qwen3.5 was released a couple of days ago, but I'm not that RAM-rich.
ChatGPT $20/month is alright but I got locked out for a day after a couple hours. Considering the GitHub pro plus plan.
Nearly all of its value is facilitating your interaction with the LLM, the tools it can use, and how it uses them.
No plan mode. Write plans to files, or build it with extensions, or install a package. No built-in to-dos. Use a TODO.md file, or build your own with extensions. No background bash. Use tmux. Full observability, direct interaction.
This is very important for control and ownership.
Pi is not for everyone, but it's for those who eventually want tools like read, bash, edit, write, grep, find, and ls as building blocks.
Open source: https://github.com/isagawa-co/isagawa-kernel
If you want bags of features, rather clone oh-my-pi somewhere, and get your agent to bring in bits of it at a time, checking, reviewing, and customising as you go.
Pi is what it looks like when you treat your plugin SDK as the golden path.
It also seems strange that even Steinberger in his interviews is not giving pi the proper attribution.
Here’s an example config: https://github.com/earendil-works/gondolin/blob/main/host/ex...
Note that if you sandbox to literally just the working directory, pi itself won't run, since pretty much every Linux application needs to be able to read from /usr and /etc.
I think I published it. Check the pi package page.
E.g. some form of comprehensive planning/spec workflow is best modeled as an extension vs. natively built in. And the extension still ends up feeling "native" in use.
I really like the customization aspect of it: you can build tools on the fly and even switch models mid-session.
There's another project here called oh-my-pi. Has anyone here tried it?
They are all open source though, so you can just find out what's going on if you want, right?
pi.dev domain graciously donated by exe.dev
Notwithstanding the donation, this domain must have cost $$$$.

There's something in the default Codex harness that makes it fight with both arms tied behind its back; maybe the sandboxing is overly paranoid or something.
With Pi I can one-shot many features faster and more accurately than with Codex-cli.
It seems pretty well done: when I added permission requests to the `bash` tool, the "Are you sure y/N" requests started appearing just like they were native to Emacs.
The prompt shown is
"Who's your daddy and what does he do?"
Is this a joke or tech? Is the author a dev or a clown?
This coding agent certainly couldn't give a fuck.
After my Max sub expired I decided to try Kimi on a more open harness, and it ended up being one of the worst (and most eye-opening) experiences I've had with the agentic world so far.
It was completely alienating and so much 'not for me', that afterwards I went back and immediately renewed my claude sub.
Where did you get this perspective from?
> I thought pi and its tools were supposed to be minimal and extensible. So why is a subagent extension bundling six agents I never asked for that I can’t disable or remove?
Why do you think a random subagents extension is under the same philosophy as pi?
Your blog post says little about pi proper, it's essentially concerned with issues you had with the ecosystem of extensions, often made by random people who either do or do not get the philosophy? Why would that be up to pi to enforce?
Here's the problem with Claude Code: it acts like it's got security, but it's the equivalent of a "do not walk on grass" sign. There's no technical restrictions at play, and the agent can (maliciously or accidentally) bypass the "restrictions".
That's why Pi doesn't have restrictions by default. The logic is: no matter what agent you are using, you should be using it in a real sandbox (container, VM, whatever).
In the meantime, the codex/cc defaults are better for me.
Yep. This is why I've been going "Hell, no!" and will probably keep doing so.
Anyway, even if you give your agent permission, there's no secure way to know whether what they're asking to do is what they'll actually do, etc.
Pi supports permission popups, but doesn't use them by default. Their example extensions show how to do it (add an event listener for `tool_call` events; to block the call put `block: true` in its result).
> there's no secure way to know whether what they're asking to is what they'll actually do
What do you mean? `tool_call` event listeners are given the parameters of the tool call; so e.g. a call to the `bash` tool will show the exact command that will execute (unless we block it, of course).
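Based on that description, a permission-gating extension might look roughly like this. The event and registration shapes below are assumptions extrapolated from the thread (the `tool_call` listener and `block: true` result are described above; everything else is illustrative), so check pi's example extensions for the real signatures.

```typescript
// Pure policy helper: which bash commands need manual approval.
export function shouldBlock(command: string): boolean {
  const dangerous = [
    /\brm\s+-rf\b/,                 // recursive force delete
    /\bgit\s+push\s+--force\b/,     // history rewrites on remotes
    /\bcurl\b[^|]*\|\s*(ba)?sh\b/,  // piping downloads into a shell
  ];
  return dangerous.some((re) => re.test(command));
}

// Hypothetical wiring: listen for `tool_call` events; returning
// `block: true` in the result vetoes the call before it executes.
export default function activate(pi: {
  on: (event: string, handler: (e: unknown) => unknown) => void;
}) {
  pi.on("tool_call", (e) => {
    const event = e as { toolName: string; input: { command?: string } };
    if (event.toolName !== "bash") return;
    if (event.input.command && shouldBlock(event.input.command)) {
      return { block: true, reason: "Command requires manual approval" };
    }
  });
}
```

Because the listener sees the exact command string, the gate is enforced on what will actually run, not on what the agent claims it will do.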