I’ve been pretty active in the open model space, and 2 years ago you would have had to pay 20k to run models that were nowhere near as powerful. It wouldn’t surprise me if in two more years we see even more powerful open models on even cheaper hardware.
Have you tried 4.6 as a comparison to Kimi K2.5?
Are you just using the API mode?
I've increasingly come to see this as the wrong way to understand how LLMs are changing things.
I fully agree that LLMs are not suitable for creating production code. But the bigger question you need to ask is 'why do we need production code?' (and to be clear, there are and always will be cases where we do, just increasingly fewer of them)
The entire paradigm of modern software engineering is fairly new. I mean it wasn't until the invention of the programmable microprocessor that we even had the concept of software and that was less than 100 years ago. Even if you go back to the 80s, a lot of software didn't need to be distributed or serve an endless variety of users. I've been reading a lot of old Common Lisp books recently and it's fascinating how often you're really programming Lisp for you and your experiments. But since the advent of the web and scaling software to many users with diverse needs we've increasingly needed to maintain systems that have all the assumed properties of "production" software.
Scalable, robust, adaptable software is only a requirement because it was previously infeasible for individuals to build non-trivial systems for solving more than one or two personal problems. Even software engineers couldn't write their own text editor and still have enough time to also write software.
All of the standard requirements of good software exist for reasons that are increasingly becoming less relevant. You shouldn't rely on agents/LLMs to write production code, but you also should increasingly question "do I need production code?"
Consider design patterns, or clean code, or patterns for software development, or any other system that people use to write their code, and reviewers use to review the code. What are they actually for? This question is going to seem bizarre to most programmers at first, because it is so ingrained in us that we almost forget why we have those patterns.
The entire point is to ensure the code is maintainable. In order to maintain it, we must easily understand it, and be sure we're not breaking something when we do. That is what design patterns solve: they make code easier to understand and more maintainable.
So, I can imagine a future where the definition of "production code" changes.
That's a wild assumption. I personally know engineers who _alone_ wrote things like compilers, emulators, editors, complex games and management systems for factories and robots. That was before the internet was widely available and they had to use physical books to learn.
So above and beyond frontier models? Because they certainly aren't "flawless" yet, or we have a very different understanding of that word.
During the day I am working on building systems that move lots of data around where context and understanding of the business problem is everything. I largely use LLMs for assistance. This is because I need the system to be robust, scalable, maintainable by other people and adaptable to a large range of future needs. LLMs will never be flawless in a meaningful sense in this space (at least in my opinion).
When I'm using Kimi I'm using it for purely vibe coded projects where I don't look at the code (and if I do I consider this a sign I'm not thinking about the problem correctly). Are these programs robust, scalable, generalizable, adaptable to future use cases? No, not at all. But they don't need to be; they need to serve a single user for exactly the purpose I have. There are tasks that used to take me hours that now run in the background while I'm at work.
In this latter sense I say "flawless" because 90% of my requests solve the problem on the first pass, and the 10% of the time where there is some error, it is resolved in a single request, and I don't have to ever look at the code. For me that "don't have to look at the code" is a big part of my definition of "flawless".