When they don't merge cleanly, it is time for human intervention, and the integration step would leave a record of which branches failed to merge.
Finally, when you do need to debug individual agents:
- Because mngr is, at the low level, just managed tmux sessions (local and remote), it's very easy to just attach to those sessions (`mngr connect`). It works even if the agent has been stopped, because mngr remembers enough about an agent to resurrect it.
- `mngr message` also allows you to batch-message a bunch of agents. So if you do need to resume a lot of agents, you can experiment on one agent, figure out a good prompt, and then batch-message every other agent.
In this testing scenario, most agents don't actually require human intervention, and we've found that just connecting to a few individual agents to resolve problems is smooth and easy enough.
Bloggers: Here's how we use 3,000 parallel agents to write, test, and ship a new feature to production every 17 minutes in an 8M-LOC codebase (all agent-generated!).
... I'm doing something wrong, or other people are doing something wrong?
I think this is the difference. These toy examples of using parallel agents are *not* running against large codebases, allowing them to iterate more effectively. Once you are in real codebases (>1M LoC), these systems break down.
But our reaction to it has been to say "ok, well the best practice in software engineering is to make small, well-isolated components anyway, so what if we did that?"
We've been trying to really break things apart into smaller pieces (and that's even evident in mngr, where much of the code is split out into separate plugins), and have been having a ton of success with it.
I realize that that might not be an option for more brownfield / existing / legacy projects, but when making something new, I've really been enjoying this way of building things.
I understand that the natural instinct is to correct the output when you see your agent doing something wrong.
That is not productive.
The instinct should be to tweak the agent to do it right.
At this point I am almost not writing any code in an enterprise code base.
I'm extremely doubtful of this. It doesn't save time to tell it "you have an error on line 19", because that's (often) just as much work as fixing the error. Likewise, saying "be careful and don't make mistakes" is not going to achieve anything. So how can you possibly tweak the agent to "do it right" reliably without human intervention? That's not even a solved problem for working with _humans_ who don't have the context window limitations, let alone an LLM that deletes everything past 30k tokens.
Ah, yes; must always remember to add "And don't make any mistakes" into the prompt /s
I believe we can use these types of tools to make software more understandable, and mngr is an example of how to do that.
In our case study, we're using AI to increase our test coverage, and if you look at it, I would argue that we are making it more understandable--now instead of just having hundreds of tests, we simply have a document that describes how the software is supposed to work, and the tests are linked to that document, and checked to ensure that they conform.
That means that anyone--not just the author of the software--is now able to read through the high level tutorial description of how the commands work in order to understand what the program should do!
And as for the tests themselves, we've been able to make nice testing infrastructure--like the transcripts and recordings that were highlighted in the post--to make it even easier for us to verify the behavior of the software.
We also have an incredibly detailed style guide and set of tests and guidelines to ensure that the entire code base is consistent and high quality. You can drop into any of the code and pretty quickly understand what is happening. And if not, Claude will do an excellent job of describing how any given component works, and how it relates to the others.
Finally, mngr itself is designed to be fully transparent when it is running--you can literally attach to the coding agent you are running and see exactly what is happening, and the program makes extensive log outputs for everything it does (feel free to open a PR if you'd like to see more!)
It's not perfect formal verification, but it does feel like we're making meaningful progress on making it easier to understand software--not harder.
And it is great! Really! Reading your post, I was wondering whether I could do the same to write tests in an automated way in the project I am working on. It would be awesome!
On the other hand, we are living in a corporate, capitalistic, and quite inhumane economic system. If this kind of automation works and consistently delivers working software for 2 or 3 years, how long will it take the C-level suits to figure out that it is far better to have 2 or 3 Product Owners and maybe one Designer write a description of the entire program and then just feed it to one of those automation pipelines? If the tech giants price a product like that reasonably and it actually works, how long until it causes the entire industry to collapse, so that the only way to produce software is by paying those tech giants? And there will be only about 5 of them in the entire world, because nobody else will have enough GPUs. How soon until they come to an agreement and split the world into areas of monopoly:
- if your company is in Asia you can either buy your application from Google or Alibaba.
In a world where everything is done on a computer via software, such a concentration of power would be bad for everyone.
Of course I doubt it will come to that, simply because this would be very hard to achieve with our level of technology and some human involvement will be necessary. But maybe I am kidding myself and I will lose my job entirely in a few years, along with tens of thousands of other Software Engineers.
I think we'll be fine.
This feels more like Y2K panic than grounded in truth. Senior software engineers guide these systems effectively today without creating a mess. I'm sure in some years agents will fill the role of maintainability engineer too. We are not special or irreplaceable.
It's not like we won't be spending an incredible amount of energy to overcome issues with understandability and maintenance. The sheer economic forces will absolutely will this problem solved. It must be solved, because trillions of dollars urgently want it to be solved. That's evolutionary pressure if I've ever seen it.
Also, we ceremoniously ascribe too much value to the software we create. With the exception of a few places, almost all of it gets replaced before our careers are over. At the end of the day, business automation is value creation. It's not sacred. It has a finite life, and then it too dies.
The software artifact just needs to facilitate economic/interest flux long enough to be useful, then it can be replaced with something better or more relevant.
Thinking about that always makes me think of Foundation, The Merchant Princes. Mallow travels to the edge of the Empire to see how things are on one of those worlds. He learns that there is a caste of tech priests, and those people have absolutely no idea how the devices actually work.
He said:
> The machines work from generation to generation automatically, and the caretakers are a hereditary caste who would be helpless if a single D-tube in all that vast structure burned out
It was a sign of the severe decline of the entire empire. People had no idea how the devices worked, and they would not be able to reproduce them or even repair one if it broke.
It was a recurring premise of civilisational decline in the series: no proper maintenance, and people losing interest in and knowledge of how things are done and how they work.
I am just wondering whether the same thing is starting to happen now with our civilisation.
And evolution? Evolution means mass extinction of species, and it's normal. I am not sure about you, but I would rather avoid any mass extinction involving humanity.
The agent orchestration library (mngr) is open source, so we aren't selling anything. There is literally no way for us to make money on it.
We shipped it this way instead of trying to monetize because we believe open agents must win over closed / verticalized platforms in order for humans to live freely in our AI future. We have plenty of money and runway as a company, and this feels much more important to work on.
what the hell?
each agent run against a real codebase probably spends 20-50k tokens just on context: repo structure, relevant files, recent changes. multiply that by 100 agents running every hour across 10-20 repos, and you're already hitting millions of tokens a day before any actual work happens. add in re-runs for failures or retries, and the cost curve gets steep quickly.
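To put rough numbers on that, here is a back-of-the-envelope calculation using the figures above. It assumes the "100 agents running every hour" is a total across all repos, not per repo, and it ignores retries and the tokens spent on actual work, so treat the result as a floor and not an estimate.

```python
# Rough daily context cost from the figures in the comment above.
# Assumption: 100 agent runs per hour in total (not per repo), 24 hours a day.
TOKENS_PER_RUN = (20_000, 50_000)  # context tokens per run (low, high)
RUNS_PER_HOUR = 100
HOURS_PER_DAY = 24


def daily_context_tokens(tokens_per_run: int,
                         runs_per_hour: int = RUNS_PER_HOUR,
                         hours: int = HOURS_PER_DAY) -> int:
    """Tokens spent per day on context alone, before retries or real work."""
    return tokens_per_run * runs_per_hour * hours


low, high = (daily_context_tokens(t) for t in TOKENS_PER_RUN)
print(f"{low:,} to {high:,} context tokens per day")
# prints "48,000,000 to 120,000,000 context tokens per day"
```

If the 100 agents were meant per repo, the same function with `runs_per_hour` scaled by the repo count gives the larger figure.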
the harder problem is observability. with one agent you can read logs and understand what went wrong. with 100 agents you need aggregation, pattern detection, alerting on the common failure modes. if 3 agents fail silently but identically, was that a real issue or just rate limiting? if 40 agents all timeout at the same step, was it a dependency problem or infrastructure saturation? at scale you're debugging distributions, not individual runs.
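A minimal sketch of what "debugging distributions" could look like: bucket failed runs by the step they stopped at and the error they reported, so that 40 identical timeouts show up as one line instead of 40 log files. The `RunResult` shape, the step names, and the error strings are all hypothetical here and not taken from any particular tool.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class RunResult:
    agent_id: str
    step: str              # hypothetical: the step the run stopped at
    error: Optional[str]   # None means the run succeeded


def failure_distribution(results: Iterable[RunResult]) -> Counter:
    """Count failed runs by (step, error) so identical failures share a bucket."""
    return Counter((r.step, r.error) for r in results if r.error is not None)


def common_failures(results: Iterable[RunResult],
                    min_count: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Return buckets with at least min_count failures, most frequent first."""
    dist = failure_distribution(results)
    return [(key, n) for key, n in dist.most_common() if n >= min_count]


# Made-up sample: 40 timeouts at one step, 3 rate-limit errors, 57 successes.
sample = (
    [RunResult(f"a{i}", "install_deps", "timeout") for i in range(40)]
    + [RunResult(f"b{i}", "llm_call", "rate_limited") for i in range(3)]
    + [RunResult(f"c{i}", "done", None) for i in range(57)]
)

for (step, error), count in common_failures(sample):
    print(f"{count:3d} runs failed at {step!r} with {error!r}")
```

Alerting can then key off bucket size instead of individual runs. Whether a bucket means "real issue" or "infrastructure saturation" still takes a human or a further signal to decide.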
also helps to be ruthless about concurrency. the async pattern isn't "run as many as possible at once"—it's "run exactly as many as the API and your budget can support without making the failure modes harder to diagnose." for claude api work that's usually smaller than people expect.
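One common way to enforce that kind of cap in Python is an `asyncio.Semaphore` around each run. This is only a sketch: `run_agent` is a stand-in for whatever the real work is, and `max_concurrent=4` is an arbitrary illustrative value, not a recommendation.

```python
import asyncio


async def run_agent(agent_id: int) -> str:
    # Placeholder for real work (an API call, a test run, ...).
    await asyncio.sleep(0.01)
    return f"agent-{agent_id}: ok"


async def run_all(agent_ids, max_concurrent: int):
    """Run every agent, but never more than max_concurrent at the same time."""
    sem = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0  # highest concurrency actually observed, kept for diagnostics

    async def bounded(agent_id):
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await run_agent(agent_id)
            finally:
                in_flight -= 1

    # return_exceptions=True keeps one failed run from cancelling the rest.
    results = await asyncio.gather(
        *(bounded(a) for a in agent_ids), return_exceptions=True
    )
    return results, peak


results, peak = asyncio.run(run_all(range(20), max_concurrent=4))
print(f"{len(results)} runs finished, peak concurrency {peak}")
```

Keeping the cap as an explicit parameter makes it easy to lower when failures start to look like rate limiting instead of real bugs.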
Are people just not going to open source anything anymore since licenses don't matter? Might as well just keep the code secret, right?
I'm also not sure that the current precedent on the matter is _quite_ as strong as you're thinking. The high-profile case you're most likely thinking of was from a guy named Stephen Thaler, who was seeking not just to claim copyright on AI-generated content but to specify the AI as the sole author. (IIUC, he planned to still own the copyright on the theory that it was a work-for-hire.)