Starting back in 2022/2023:
- (~2022) It can auto-complete one line, but it can't write a full function.
- (~2023) Ok, it can write a full function, but it can't write a full feature.
- (~2024) Ok, it can write a full feature, but it can't write a simple application.
- (~2025) Ok, it can write a simple application, but it can't create a full application that is actually a valuable product.
- (~2025+) Ok, it can write a full application that is actually a valuable product, but it can't create a long-lived complex codebase for a product that is extensible and scalable over the long term.
It's pretty clear to me where this is going. The only question is how long it takes to get there.
I don't think it's a guarantee. All of the things on that list are greenfield; they just have increasing complexity. The problem is that even in agentic mode, these models do not (and, I would argue, cannot) understand code or how it works; they just see patterns and generate a plausible-sounding explanation or solution. Agentic mode means they can try/fail/try/fail/try/fail until something works, but without understanding the code, especially in a large, complex, long-lived codebase, they can unwittingly break something without realising it - just like an intern or newbie on the project, which is the most common analogy for LLMs, with good reason.
What if we get to the point where all software is basically created 'on the fly' as greenfield projects as needed? And you never need to have a complex, large, long-lived codebase?
It is probably incredibly wasteful, but ignoring that, could it work?
Sure, create a one-off app to post things to your Facebook page. But a one-off app for the OS it's running on? Freshly generating the code for your bank transaction rules? Generating an authorization service that gates access to your email?
The only reason it's quick to create green-field projects is because of all these complex, large, long-lived codebases that it's gluing together. There's ample training data out there for how to use the Firebase API, the Facebook API, OS calls, etc. Without those long-lived abstraction layers, you can't vibe out anything that matters.
But then maybe this changes what a "codebase" is. If a codebase is just a structured set of specs that compile to code, à la TypeScript -> JavaScript, sure - but then it's still a long-lived <blank>
But maybe you would have to elaborate on what "creating software on the fly" looks like, because I'm sure there's a definition where the answer is yes.
Case in point: Self driving cars.
Also, consider that we needed to pirate the whole internet to be able to do this, so these models are not creative. They are just directed blenders.
Of course, the tech will be useful and ethical if these problems are solved - or if we decide to solve them the right way.
This is clear from the fact that you can distill the logic ability from a 700b parameter model into a 14b model and maintain almost all of it.
You just lose knowledge, which can be provided externally, and which is the actual "pirated" part.
The logic is _learned_
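For what it's worth, the mechanism behind that claim is standard knowledge distillation: the small "student" model is trained to match the big "teacher" model's output distribution rather than its weights. Here's a minimal numpy sketch of the classic temperature-softened objective - function names and the example logits are mine, purely illustrative:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z -= z.max()                     # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Training the student to minimise this transfers the teacher's
    *behaviour* (its output distribution) without copying its weights,
    which is why reasoning ability can survive a 700b -> 14b shrink
    while most memorised knowledge does not.
    """
    p = softmax(teacher_logits, T)   # teacher's softened distribution
    q = softmax(student_logits, T)   # student's softened distribution
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T

# Identical logits -> zero loss; diverging logits -> positive loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # ~0.0
print(distillation_loss([0.1, 1.0, 2.0], [2.0, 1.0, 0.1]))  # > 0
```

The temperature T softens both distributions so the student also learns the teacher's "dark knowledge" - the relative probabilities it assigns to wrong answers.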
You'd need to prove that this assertion applies here. I understand that you can't deduce the future gains rate from the past, but you also can't state this as universal truth.
Knowledge engineering has a notion called "covered/invisible knowledge", which points to the small things we do unknowingly but that change the whole outcome. None of the models (nor AI in general) can capture this. We could say it's the essence of being human, or the tribal knowledge that makes experienced workers who they are, or makes mom's rice taste that good.
Considering these are highly individualized and unique behaviors, a model based on averaging everything can't capture this essence easily, if it ever can, without extensive fine-tuning for/with that particular person.
I've heard this same thing repeated dozens of times, and for different domains/industries.
It's really just a variation of the 80/20 rule.
We've been seeing the same progression with self-driving cars, and they have been stuck on the last 10% for the last 5 years.
As long as it can do the thing on a faster overall timeline and with less human attention than a human doing it fully manually, it's going to win. And it will only continue to get better.
And I don't know why people always jump to self-driving cars as the analogy as a negative. We already have self-driving cars. Try a Waymo if you're in a city that has them. Yes, there are still long-tail problems being solved there, and limitations. But they basically work and they're amazing. I feel similarly about agentic development, plus in most cases the failure modes of SWE agents don't involve sudden life and death, so they can be more readily worked around.
Say I ask an image model for 50 variations and pick the best one. Does it matter that 49 of them "failed"? It cost me fractions of a cent, so not really.
If every one of the 50 variants was drawn by a human and iterated over days, there would've been a major cost attached to every image and I most likely wouldn't have asked for 50 variations anyway.
It's the same with code. The agent can iterate over dozens of possible solutions in minutes or a few hours. Codex Web even has a 4x mode that gives you 4 alternate solutions to the same issue. Complete waste of time and money with humans, but with LLMs you can just do it.
And this isn't a pessimistic take! I love this period of time where the models themselves are unbelievably useful, and people are also focusing on the user experience of using those amazing models to do useful things. It's an exciting time!
But I'm still pretty skeptical of "these things are about to not require human operators in the loop at all!".
The question in my mind is where we are on the s-curve. Are we just now entering hyper-growth? Or are we starting to level out toward maturity?
It seems like it must still be hyper-growth, but it feels less that way to me than it did a year ago. I think in large part my sense is that there are two curves happening simultaneously, but at different rates. There is the growth in capabilities, and then there is the growth in adoption. It's the first curve that seems to me to have slowed a bit. Model improvements seem both amazing and also less revolutionary to me than they did a year or two ago.
But the other curve is adoption, and I think that one is way further from maturity. The providers are focusing more on the tooling now that the models are good enough. I'm seeing "normies" (that is, non-programmers) starting to realize the power of Claude Code in their own workflows. I think that's gonna be huge and is just getting started.
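Part of why "where are we on the s-curve" is so hard to answer from the inside: on a logistic curve, the growth rate you *feel* looks the same shortly before and shortly after the knee. A toy sketch (parameters are mine, just for illustration):

```python
import math

def logistic(t, k=1.0, t0=0.0):
    # Classic s-curve: slow start, steep middle ("hyper-growth"),
    # then leveling out toward a ceiling of 1.0 ("maturity").
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))

# The growth *rate* peaks at the midpoint t0 and is symmetric around it,
# so a given slowdown is consistent with being either early or late.
def rate(t, dt=0.01):
    return logistic(t + dt) - logistic(t)

print(rate(-3.0) < rate(0.0) > rate(3.0))  # True: rate peaks mid-curve
```

None of which settles the question, of course - it just says that "it feels slower than last year" underdetermines which side of the knee we're on.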
The trend is definitely here, but even today, heavily depends on the feature.
While extremely useful, it still requires intense iteration and human insight for >90% of our backlog. We develop a cybersecurity product.
> The only question is how long it takes to get there.
This is the question, and I would temper expectations with the fact that we are likely to hit diminishing returns from real gains in intelligence as task difficulty increases. Real-world tasks probably fit into a complexity hierarchy similar to computational complexity. One of the reasons the AI predictions made in the 1950s for the 1960s did not come to be was that we assumed problem difficulty scaled linearly: double the computing speed, get twice as good at chess, or twice as good at planning an economy. The separation of P and NP upended those predictions. It is likely that current predictions will run into similar separations.
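The chess example can be made concrete. Brute-force game-tree search to depth d costs roughly b^d nodes for branching factor b, so a k-times speedup buys only log_b(k) extra plies - a quick sketch (the branching factor of ~35 for chess is a commonly cited approximation):

```python
import math

def extra_depth_from_speedup(branching_factor, speedup):
    # Searching to depth d costs ~ branching_factor**d nodes, so a
    # k-times speedup only buys log_base_b(k) additional plies.
    return math.log(speedup, branching_factor)

# Chess has a branching factor of roughly 35 moves per position.
print(extra_depth_from_speedup(35, 2))     # ~0.19 plies for 2x compute
print(extra_depth_from_speedup(35, 1000))  # ~1.9 plies for 1000x compute
```

Doubling compute gets you about a fifth of one extra move of lookahead, not twice the playing strength - which is the shape of the diminishing returns being described above.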
It is probably the case that if you made a human 10x as smart, they would only be 1.25x more productive at software engineering. The reason we have 10x engineers is less about raw intelligence - they are not 10x more intelligent - rather, they have more knowledge and wisdom.
By your logic, does it mean that engineers will never get replaced?
- (~2022) "It's so over for developers". 2022 ends with more professional developers than 2021.
- (~2023) "Ok, now it's really over for developers". 2023 ends with more professional developers than 2022.
- (~2024) "Ok, now it's really, really over for developers". 2024 ends with more professional developers than 2023.
- (~2025) "Ok, now it's really, really, absolutely over for developers". 2025 ends with more professional developers than 2024.
- (~2025+) etc.
Sources: https://www.jetbrains.com/lp/devecosystem-data-playground/#g...
I suspect that the timeline from autocomplete-one-line to autocomplete-one-app, which was basically a matter of scaling and RL, may in retrospect turn out to have been a lot faster than the next step, from LLM to AGI, where it becomes capable of using human-level judgement and reasoning, etc., to become a developer, not just a coding tool.
They're definitely better now, but it's not like ChatGPT 3.5 couldn't write a full simple todo list app in 2023. There were a billion blog posts talking about that and how it meant the death of the software industry.
Plus I'd actually argue more of the improvements have come from tooling around the models rather than what's in the models themselves.
There's a lot of non-ergonomic copying and pasting, but it's definitely using an LLM to build a full application.