I kind of wonder what would happen if you added a "lead dev" AI that wrote up bugs, assigned them out, and "reviewed" the work. Then you'd add a "boss" AI that made new feature demands of the lead dev AI. Maybe the boss AI could run the program and inspect the experience in some way so it could demand more specific changes. I wonder what would happen if you just let that run for a while. Presumably it'd devolve into some sort of crazed noise, but it'd be interesting to watch. You could package the whole thing up as a startup simulator, and you could watch it like a little ant farm to see how their little note-taking app was coming along.
A complex system requires tons of iterations, and the confidence level of each iteration drops unless there is a good recalibration system between them. Compounding means even a trivial per-iteration degradation quickly turns into chaos.
A typical collaboration across a group of people on a meaningfully complex project requires tons of anti-entropy to course-correct when it goes off the rails. Those corrections are not in docs: some are experience (been there, done that), some are common sense, some are collective intelligence.
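The compounding point can be made concrete with a toy calculation (a hypothetical sketch with a made-up per-iteration fidelity, not a model of any real agent):

```python
# Toy illustration: a trivial per-iteration degradation compounds quickly.
# The 0.99 fidelity figure is an assumption chosen for illustration.
FIDELITY_PER_ITERATION = 0.99

def quality_after(iterations: int, fidelity: float = FIDELITY_PER_ITERATION) -> float:
    """Fraction of original quality left after n uncorrected iterations."""
    return fidelity ** iterations

for n in (10, 100, 500):
    print(f"after {n:3d} iterations: {quality_after(n):.3f}")
```

Even at 99% fidelity per step, only roughly a third of the original quality survives 100 iterations, which is why recalibration between iterations matters so much.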
I am pretty convinced that a useful skill set for the next few years is being capable of managing[2] these AI tools in their various guises.
[2] - like literally leading your AIs, performance-evaluating them, the whole shebang - just being good at making AI work toward business outcomes
This seems like a more plausible one. Robots don't care about your feelings, so they can make decisions without any moral issues
ChatDev: Communicative Agents for Software Development - https://arxiv.org/abs/2307.07924
That's the issue with AI: it doesn't give you any competitive advantage, because everyone has it == no one has it. The entry bar is so low kids can do it.
> Is Jules free of charge?
> Yes, for now, Jules is free of charge. Jules is in beta and available without payment while we learn from usage. In the future, we expect to introduce pricing, but our focus right now is improving the developer experience.
Haven't tried Jules myself yet, still playing around with Codex, but personally I don't really care if it's free or not. If it solves my problems better than the others, then I'll use it, otherwise I'll use other things.
I'm sure I'm not alone in focusing on how well it works, rather than what it costs (until a certain point).
EDIT: legal link doesn't work here (https://jules-documentation.web.app/faq#does-jules-train-on-...)
> No. Jules does not train on private repository content. Privacy is a core principle for Jules, and we do not use your private repositories to train models. Learn more about how your data is used to improve Jules.
It's hard to tell what the data collection will be, but it's most likely similar to Gemini where your conversation can become part of the training data. Unclear if that includes context like the repository contents.
Google products have had a net positive impact on my life over, what is it, 20 years now. If I had had to pay subscription fees over that span of time, for all the services that I use, that would have been a lot of very real money that I would not have right now.
Is there a next step where it all gets worse? When?
> 2 concurrent tasks
> 5 total tasks per day
I am cool with all of that, but it feels like they're suggesting that coding is a chore to be avoided, rather than a creative and enjoyable activity.
If all of these tools really do make people 20-100% more productive like they say (I doubt it) the value is going to accrue to ownership, not to labor.
Seriously though, this kind of tech-assisted work output improvement has happened many times in the past, and by now we should all have been working 4-hour weeks, but we all know how it has actually worked out.
There is one clock you should be watching regardless, which is the clock of your life. Your code will not come see you in the hospital, or cheer you up when you're having a rough day. You won't be sitting around at 70 wishing you had spent more 3am nights debugging something. When your back gives out from 18 hours a day of grinding at a desk to get something out, and you can barely walk from the sciatica, you won't be thinking about that great new feature you shipped. There are far more important things in life; once you come to terms with that, you will learn that the whole point of the work is enabling them.
It's been a little addictive using Cursor recently - creating new features and fixing bugs in minutes is pretty amazing.
A new backlog will start to fill up and the cycle repeats.
That might be true for hobbyists or side projects, but employees definitely won't get to work less (or earn more). All the financial value of increased productivity goes to the companies. That's the nature of capitalism.
If you work at a company where there's a byzantine process to do anything, this pitch might speak to you. Especially if leadership is hungry for AI but has little appetite for more meaningful changes.
I occasionally code for fun, but usually I don't. I treat programming as a last-resort tool, something I use only when it's the best way to achieve my goal. If I can achieve something either without coding or with it, I usually opt for the former unless the tradeoffs are really shit.
"We're not replacing jobs, we're freeing up people's time so they can focus on more important tasks!"
Maybe helps them sleep at night and feel their work is important.
> More time for the code you want to write, and everything else.
now.
- Less access required means lower risk of disaster
- Structured tasks mean more data for better RL
- Low stakes mean improvements in task- and process-level reliability, which is a prerequisite for meaningful end-to-end results on senior-level assignments
- Even junior-level tasks require getting interface and integration right, which is also required for a scalable data and training pipeline
Seems like we're finally getting to the deployment stage of agentic coding, which means a blessed relief from the pontification that inevitably results from a visible outline without a concrete product.
It appears that the AI space moves so quickly that this was completely forgotten, or that almost no one wanted to pay the original prices.
Here's the timeline:
1. Devin was $200 - $500.
2. Then Lovable, Bolt, GitHub Copilot and Replit reduced their AI agent prices to $20 - $40.
3. Devin was then reduced to $20.
4. Then Cursor and Windsurf AI agents started at $18 - $20.
5. Afterwards, we also got Claude Code and OpenAI Codex agents starting at around $20.
6. Then we got GitHub Copilot agents embedded directly into GitHub and VS Code for just $0 - $10.
Now we have Jules from Google, which is... $0 (free), just like Google search is free. The race to zero is only going to accelerate, and it was a trap to begin with: only the large big-tech incumbents will be able to sustain these price cuts for a very long time.
Dev: I don't think we need a paid solution- I think we can even use an in-memory solution...
Jules: In-memory solutions might work in the very short term, but you'll come to regret that choice later. Pinecone prevents those painful 2AM crashes when your data scales. You'll thank me later, trust me.
Please insert your PINECONE_API_KEY here
https://github.blog/changelog/2025-05-19-github-copilot-codi...
This is an unusual angle. Of course Google can do this because they have the tech behind NotebookLM, but I'm not sure what value there is in telling you how your prompt was implemented.
More of a tool for managers, or at least a manager-style tool. You could get a morning report while heading to the office, for example.
(I'm not saying anyone reading this should want this, only that it fits a use case for many people)
The projects I work on have lots of bespoke build scripts and other stuff that is specific to my machine and environment. Making that work in Google's cloud VM would be a significant undertaking in itself.
For example, how is Google's "Jules" different than JetBrains' "Junie" as they both sort of read the same (and based on my experience with Junie, Jules seems to offer a similar experience) https://www.jetbrains.com/junie/
The loop is: it identifies which files need to change, creates an action plan, then proceeds with a prompt per file for codegen.
In my experience, the parts up to the codegen are how these tools differ, with Junie being insanely good at identifying which parts of a codebase need change (at least for Java, on a ~250k loc project that I tried it on).
But the actual codegen part is as horrible as when you do it yourself.
Of course I'm not talking about hello world usages of codegen.
I suppose these tools would allow moving the goalpost a bit further down the line for small "from scratch" ideas, compared to not using them.
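A hedged sketch of that identify -> plan -> per-file loop (the function names and the `llm` stub are hypothetical illustrations, not Junie's or Jules' actual internals):

```python
# Hypothetical sketch of the loop described above: identify files,
# make an action plan, then run one codegen prompt per file.
def llm(prompt: str) -> str:
    # Stub standing in for whatever model call the real tool makes;
    # it just echoes the prompt so the sketch is runnable.
    return f"MODEL RESPONSE to: {prompt}"

def agent_loop(task: str, repo_files: list[str]) -> dict[str, str]:
    # 1. Identify which files need to change.
    answer = llm(f"Which files does '{task}' touch? Options: {repo_files}")
    relevant = [f for f in repo_files if f in answer]
    # 2. Create an action plan for the task.
    plan = llm(f"Write an action plan for: {task}")
    # 3. One codegen prompt per relevant file.
    return {f: llm(f"Given the plan ({plan}), rewrite {f} for: {task}")
            for f in relevant}
```

As the comment notes, the per-file codegen step is usually the weak link; the identification and planning steps are where the tools actually differ.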
Then, who is testing the change? Even for a dependency update with good test coverage, I would still test the change. What takes time when updating dependencies is not the number of lines typed but the time it takes to review the new version and test the output.
I'm worried that agents like that will promote bad practices.
Will this promote bad practice? Probably up to the individual practitioner or organization.
proceeds to list ALL coding tasks.
There are a million places to do dev that aren’t Microsoft, but you’d never know it from looking at app launches.
It’s almost like people who don’t use GitHub and Gmail and Instagram are becoming second class citizens on the web.
That’s the trajectory. Let’s stay sharp.
And now we have agents which are going to multiply the pace of development even more.
We can stay sharp, but I'm not sure there's really much we can do to stop our jobs, or all jobs, from disappearing. Not that this is a bad thing, if it's done right.
Why would I ever want this over Cursor? The sync thing is kinda cool, but I basically already do this with Cursor.
Codex and Codex CLI are the best of what I have tested so far. Codex is really neat, as I can use it from the ChatGPT app.
Have you tried Claude Code / aider / cursor?
What did you need to do differently to get it to work functionally? I feel like the common experience has been universally poor.
As for the use case of "give a simple or detailed prompt plus the entire project and let the model do its stuff", Codex has done much better than Claude Code. Claude Code assumes a lot of things and often ends up doing a lot more, making the code very complex, so I have to redo it later with Cursor. With Codex I have not seen this issue.
I also feel that Codex CLI is much better as a CLI tool, mainly due to its OSS nature, which lets me choose a different model. Claude really missed this big time, IMHO.
Well here's to hoping it's better than Cursor. I doubt it considering my experiences with Gemini have been awful, but I'm willing to give it a shot!
Jules encountered an unexpected error. To continue, respond to Jules below or start a new task.
And it appears you're limited to 5 tasks per day.
My normal development workflow of ticket -> assignment -> review -> feedback -> more feedback -> approval -> merging is asynchronous, but it'd be better synchronous. It's only asynchronous because the people I'm assigning the work to don't complete the work in seconds.
From a security use-case perspective, it would be great if it could bump libs that fix most of the vulnerabilities without breaking my app. Something no tool does today, i.e. being code- and breaking-change aware.
When it gets priced, it's usually cheaper (for the same capability)
Wait a year or two, evaluating this stuff at the peak of the hype cycle is pointless.