Fewer 9s are a reasonable tradeoff for the ability to ship AI to everyone, I suppose. That's one way to prove the technology isn't reliable enough to be shipped into autonomous kill chains just yet lol.
FWIW I use AI daily to help me code...
And apparently the output of LLMs is normalizing single 9s too, which may or may not be sufficient.
Between all the security SNAFUs, performance issues, the gigantic amount of kitchen-sinky boilerplate generated (which will require maintenance, and that has always been the killer) and now uptime issues, I'm realizing we all need to use more of our brains, not less, to use these AI tools. And that's not even counting the times when the generated code simply doesn't do what it should.
For a start, if you don't know jack shit about infra, it looks like you're already in for a whole world of hurt: when that agent rm -rf's your entire Git repo and FUBARs your OS because you had no idea how to compartmentalize it, you'll feel bad. Same once all your secrets get publicly exposed.
It looks like now you won't just need a strong basis in coding: you'll also need to be at ease with the entire stack. Learning to be a "prompt engineer" definitely sounds like it's the very easy part. Trivial even.
I'm not saying that it should excuse straight up bad engineering practices, but I'd rather have them iterate on the core product than pursue near-perfect uptime. Maybe even make their Electron app more usable: switching conversations shouldn't take 2-4 seconds sometimes when those should be stored locally, and there should be the bare minimum of some sort of indicator when something is happening, instead of "Let me write a plan" followed by nothing else to distinguish progress from a silently dropped connection.
Sorry about the usability rant, but my point is that I'd expect medical systems and planes to have amazing uptime, whereas I wouldn't be so demanding of most other things that have lower stakes. The context I've not mentioned so far is that I've seen whole systems get developed poorly because they overengineered the architecture and crippled their ability to iterate, sometimes thinking they'd need scale when a simpler but better-developed architecture would have sufficed!
Ofc there's a difference between sometimes having to wait in a queue for a request to be serviced or having a few requests get dropped here and there and needing to retry them vs your system just having a cascading failure that it can't automatically recover from and that brings it down for hours. Having not enough cards feels like it should result in the former, not the latter.
Also remember, using Claude to code might make the company you're working for richer. But you are forgetting your skills (seen it first hand), and you're not learning anything new. Professionally you are downgrading. Your next interview won't be testing your AI skills.
You are living under quite a big rock.
C'mon let's be real here, there's "testing AI skills" versus "using AI agents like you would on the daily".
The signal you get from leetcode is already a dubious measure of proficiency, and it's mostly used as a filter for "Are you willing to cram useless knowledge and write code under pressure to get the job?", just like system design is. You won't be doing any system design for "scale" anywhere in big tech because you have architects for that, nor do you need to "know" anything. It's mostly gatekeeping, but the truth is, LLMs democratized both leetcode and system design anyway. Anyone with the right prompting skills can now get to an output that's good for 99% of the cases, and the other 1% are reserved for architects/staff engineers to "design" for you.
The crux of the matter is, companies do not want to shift how they approach interviews for the new era because we have collectively believed that the current process is good enough as-is. Again, I'd argue this is questionable given how sometimes these services break with every new product launch or "under load" (where YO SYSTEM DESIGN SKILLZ AT).
There's a massive gap between just using an LLM and using it optimally, e.g. with a proper harness, customised to your workflows, with sub-agents etc.
It's a different skill-set, and if you're going to go into another job that requires manual coding without any AI tools, by all means, then you need to focus on keeping those skills sharp.
Meanwhile, my last interview already did test my AI skills.
Curious to hear more about this.
Depends on what you consider your "skills". You can always relearn syntax, but you're certainly not going to forget your experience building architectures and developing a maintainable codebase. LLMs only do the what for you, not the why (or you're using it wrong).
For the people who started before the LLM craze, they won't lose their skills if they are just focusing on their original roles. The truth is people are being assigned more than their original roles in most companies. Backend developers are being tasked with frontend, devops and QA roles, and then the others are let go. This is happening right now. https://www.reddit.com/r/developersIndia/comments/1rinv3z/ju... When this happens, they don't care, or don't have the mental capacity to care, about a codebase in a language they've never worked in before. People here talk about guiding the LLMs, but at most places they are too exhausted to carry that context and let Claude review its own code.
For the people who are starting right now, they're discouraged from all sides from writing code themselves. They'll never understand why an architecture is designed a certain way. Sure, ask the LLM to explain, but it's like learning to swim by reading a book. They have to blindly trust the code and keep hitting it like a casino machine (forgot the name, excuse me), burning tokens, which makes these companies more money.
For the people who are yet to begin, sorry for having to start in a world where a few companies hold everyone's skills hostage.
That is a very real concern, I've had to chase engineers to ensure that they are not blindly accepting everything that the LLM is saying, encouraging them to first form some sense of what the solution could be and then use the LLM to refine it further.
As more and more thinking is offloaded to LLMs, people lose their gut instinct about how their systems are designed.
Not that I disagree with your overall point, but have you interviewed recently? 90% of companies I interacted with required (!) AI skills, and me telling them how exactly I "leverage" it to increase my productivity.
Huge disagree. Or likely more "depends on how you use it". I've learned a lot since I started using AI to help me with my projects, as I prompt it in such a way that if I'm going about something the "wrong" way, it'll tell me and suggest a better approach. Or just generally help me fill out my knowledge whenever I'm vague in my planning.
This is just false. I may forget how to write code by hand, but I'm playing with things I never imagined I would have time and ability to, and getting engineering experience that 15 years of hands on engineering couldn't give me.
> Your next interview won't be testing your AI skills.
Which will be a very good signal to me that it's not a good match. If my next interview is leetcode-style, I will fail catastrophically, but then again, I no longer have any desire to be a code writer - AI does it better than me. I want to be a problem solver.
This is the equivalent of how watching someone climb Mount Everest in a TV show or on YouTube makes you feel like you did it too. You never did; your brain got the feeling that you did, and it'll never motivate you to do it yourself.
It is the contrary!
You learn by using a very powerful tool. This is a tool, like a text editor or a compiler.
But you focus more on the logic and function instead of syntax details and the whims of the computer languages used in concert.
The analogy from construction is being elevated from a bricklayer to an engineer. Or using various shaped shovels with a wheelbarrow versus mechanized tools like excavators and dumpers for earthworks.
... of course, for those whose focus is on being a master bricklayer, which is noble (no pun intended, and I say it with a straight face: bricklaying is a fine skill with beautiful outputs in its area of use), AI is really unnecessary. An existential threat, but unnecessary.
> But you focus on the logic and function more instead of syntax details and whims of the computer languages used in concert.
This is exactly my point. I learned about logic mistakes when my first if/else broke. The only reason you or I can guide these into good logic is because we dealt with bad ones before all this. I use Claude myself a lot because it saves me time. But we're building a culture where no one ever reads the code; instead we're building black boxes.
Again you could see it as the next step in abstraction but not when everyone's this dependent on a few companies prepared to strip the world of its skills so they can sell it back to them.
I cannot imagine how you can properly supervise an LLM agent if you can't effectively do the work yourself, maybe slightly slower. If the agent is going a significant amount faster than you could do it, you're probably not actually supervising it, and all kinds of weird crap could sneak in.
Like, I can see how it can be a bit quicker for generating some boilerplate, or iterating on some uninteresting API weirdness that's tedious to do by hand. But if you're fundamentally going so much faster with the agent than by hand, you're not properly supervising it.
So yeah, just go back to coding by hand. You should probably be doing that ~20% of the time anyhow just to keep in practice.
The ways that agents actually make me "faster" are typically:
1. more fun to slog through tedious/annoying parts
2. fast code review iterations
3. parallel agents
Patterns that have helped in production:
1. Multi-provider fallback. For conversational systems, route to Claude by default, fall back to GPT-4 on 5xx errors. The response quality difference is usually acceptable for the 2-3% of requests that hit the fallback. This turns a hard outage into a slight quality degradation.
2. Async queuing for non-real-time workflows. If you're processing documents, generating reports, or running batch analysis — don't call the API synchronously. Queue the work, retry with exponential backoff, and let the system self-heal when the API recovers. Most of our automation pipelines run with a 15-minute SLA, not a 500ms one.
3. Graceful degradation in real-time systems. For chatbots and voice agents, have a scripted fallback path. "I'm having trouble processing that right now — let me transfer you to a human" is infinitely better than a hung connection or error message.
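For what it's worth, patterns 1 and 3 above can be sketched together as one fallback chain. This is only a minimal sketch, not anyone's production code: `answer`, `ProviderError` and the two stub providers are hypothetical names made up for illustration, and real provider callables would wrap whatever SDK or HTTP client you actually use and raise `ProviderError` on a 5xx.

```python
from typing import Callable, Sequence

# Scripted last-resort reply, adapted from pattern 3 above.
SCRIPTED_FALLBACK = (
    "I'm having trouble processing that right now. "
    "Let me transfer you to a human."
)


class ProviderError(Exception):
    """Hypothetical error a provider callable raises when the upstream API fails."""


def answer(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order; use the scripted reply if all of them fail."""
    for call in providers:
        try:
            return call(prompt)
        except ProviderError:
            continue  # upstream failure: move on to the next provider
    return SCRIPTED_FALLBACK


# Stub providers standing in for real API clients (assumptions, not real SDK calls).
def flaky_primary(prompt: str) -> str:
    raise ProviderError("simulated 503 from the primary provider")


def stub_secondary(prompt: str) -> str:
    return f"secondary: {prompt}"


if __name__ == "__main__":
    print(answer("hi", [flaky_primary, stub_secondary]))  # prints "secondary: hi"
    print(answer("hi", [flaky_primary]))  # prints the scripted fallback
```

The point of keeping the scripted reply as the last link in the same chain is that the caller never sees an exception or a hung request, only a degraded answer.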
The broader issue: we're all building on infrastructure where "four nines" isn't even on the roadmap yet. That's fine if you architect for it — treat LLM APIs like any other unreliable external dependency, not like a database query.
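The retry-with-exponential-backoff part of pattern 2 above might look roughly like this. It's a sketch under assumptions: `backoff_delays` and `retry_with_backoff` are hypothetical helpers, the defaults (1 second base, 60 second cap, 5 attempts) are arbitrary, and the random jitter is my addition rather than something the pattern above specifies.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Upper bound for each retry delay: base * 2**n, capped at `cap` seconds."""
    return [min(cap, base * 2 ** n) for n in range(retries)]


def retry_with_backoff(
    task: Callable[[], T],
    attempts: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task` up to `attempts` times, sleeping with jittered backoff between tries."""
    for delay in backoff_delays(attempts - 1, base, cap):
        try:
            return task()
        except Exception:
            # Jitter: sleep a random amount up to the current delay so that
            # many workers retrying at once don't hammer the API in lockstep.
            sleep(random.uniform(0, delay))
    return task()  # final attempt: let the exception propagate to the caller
```

A queue worker would wrap each job in `retry_with_backoff` and only mark the job as failed once the final attempt raises. The `sleep` parameter is there so the delay can be swapped out in tests.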
> Have a verification code instead?
> Enter the code generated from the link sent to [...]
> We are experiencing delivery issues with some email providers and are working to resolve this.
> Check your junk/spam and quarantine folders and ensure that support@mail.anthropic.com is on your allowed senders list.
I'm still waiting for a code from one hour ago. Meanwhile I managed to fix my source code alone, like twelve months ago.
I just mentioned to one of my friends yesterday that you cannot do this properly anymore with new things. I’ve started a new project with some Android libraries that are a few years old, and if I encounter a problem, there is a high chance that there is nothing about it on the public internet anymore. And yesterday I suffered greatly because of this. I tried to fix a problem, and after several hours I had a clearly suboptimal solution of my own that I hated, but I couldn’t find any good information about it (multi-library AndroidManifest merging in the case of instrumented tests). Then I hit Claude Code with a clear example where it fails. It solved it, perfectly. Then I asked in a separate session how this merging works, and why its own solution works. It answered well, then I asked for sources, and it couldn’t provide anything. I tried Google and Kagi, and I couldn’t find anything, even after I knew the solution. The information existed only hidden from the public (or rather deep in AGP’s source code), in the LLM. And I’m quite sure that I wasn’t the only one who had this problem before, yet there is no proper example of how to solve this on the internet at all, or even anything to suggest how the merging works. The existing information is about a completely separate procedure without instrumented tests.
So, you cannot be sure anymore, that you can solve it by yourself. Because people don’t share that much anymore. Just look at StackOverflow.
https://www.cs.ucdavis.edu/~koehl/Teaching/ECS188/PDF_files/...
We build systems that can fail in unpredictable ways, and without deeply knowing the system we built, it's hard to understand what's going on.
More datacenters? I thought it was just one.
(More seriously I wonder if they'd consider using Openai or Gemini for this purpose)
They decohere much faster as the context grows. Which is fine, or not, depending on whether you consider yourself a software engineer amplifying your output by automating the boilerplate, or an LLM cornac.
Models in the 700B+ category (GLM5, Kimi K2.5) are decent, but running those on your own hardware is a six-figure investment. That's realistic for a company; as a private person, instead pick someone you like from openrouter's list of inference providers.
If you really want local on a realistic budget, Qwen 3.5 35B is ok. But it's not anywhere near Claude Opus.
New hardware keeps on coming with large gains in performance.
"Do this"
"User wants me to [do complete opposite]"
Seems not to be as capable as a month ago.
I haven't been using the service long enough to comment on the quality of the responses/code generation, although the outages are really quite impactful.
I feel like half of my attempts to use Claude have been met with an Error or Outage, and meanwhile the usage limits seem quite intense on Claude Code. I asked Claude to make a website to search a database. It took about 6 minutes for Claude to make it, and it used 60% of my 4h quota window. I wasn't able to refine it beyond asking for some basic font changes before I became limited. Under 30 minutes and my entire 4 hour window was used up.
Meanwhile with ChatGPT Codex, a multi-hour coding session would still have 20%+ available at the end of the 4/5 hour window.
I pay about $1500 per month on personal api use fyi.
Switched to Claude Max just because I can combine both. I can say that since the weekend I have only had problems. When it works it’s great. But I am seriously thinking of just cancelling this experiment.
And yeah, any serious use completely assumes a Max sub.
Only one 9 of availability means you are seriously unreliable.
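To put numbers on that, here's the back-of-the-envelope arithmetic for how much downtime per year each count of nines allows. The helper names are made up for illustration, and it assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def availability(nines: int) -> float:
    """Fraction of time up: 1 nine -> 0.9, 2 nines -> 0.99, and so on."""
    return 1 - 10 ** -nines


def downtime_minutes_per_year(nines: int) -> float:
    """Minutes of allowed downtime per year at the given number of nines."""
    return MINUTES_PER_YEAR * 10 ** -nines


if __name__ == "__main__":
    for n in range(1, 5):
        hours = downtime_minutes_per_year(n) / 60
        print(f"{n} nine(s): {availability(n):.2%} up, about {hours:.1f} hours down per year")
```

One nine (90%) works out to about 876 hours, or 36.5 days, of downtime a year; four nines (99.99%) is under 53 minutes.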