Anthropic downgraded cache TTL on March 6th (opens in new tab)

(github.com)

552 pointslsdmtme1mo ago420 comments

420 comments

Has anybody else noticed a pretty significant shift in sentiment when discussing Claude/Codex with other engineers since even just a few months ago? Specifically because of the secret/hidden nature of these changes.

I keep getting the sense that people feel like they have no idea if they are getting the product that they originally paid for, or something much weaker, and this sentiment seems to be constantly spreading. Like when I hear Anthropic mentioned in the past few weeks, it's almost always in some negative context.

andai1mo ago

Well, off the top of my head:

- Banning OpenClaw users (within their rights, of course, but bad optics)

- Banning 3rd party harnesses in general (ditto)

(claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)

- Lowering reasoning effort (and then showing up here saying "we'll try to make sure the most valuable customers get the non-gimped experience" (paraphrasing slightly xD))

- Massively reduced usage (apparently a bug?) The other day I got 21x more usage spend on the same task for Claude vs Codex.

- Noticed a very sharp drop in response length in the Claude app. Asked Claude about it and it mentioned several things in the system prompt related to reduced reasoning effort, keeping responses as brief as possible, etc.

It's all circumstantial but everything points towards "desperately trying to cut costs".

I love Claude and I won't be switching any time soon (though with the usage limits I'm increasingly using Codex for coding), but it's getting hard to recommend it to friends lately. I told a friend "it was the best option, until about two weeks ago..." Now it's up in the air.

rlpb1mo ago

> It's all circumstantial but everything points towards "desperately trying to cut costs".

I have been wondering if it's more geared at reducing resource usage, given that at the moment there's a known constraint on AI datacenter expansion capability. Perhaps they are struggling to meet demand?

3 more replies

Bolwin1mo ago

Claude -p is allowed. They're not going to give you a feature then ban you for using it.

What they changed is that it now uses extra usage, which is charged at api rates

2 more replies

deaux1mo ago

> (claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)

How often? Realistically, if you invoke it occasionally, for what's clearly an amount that's "reasonable personal use", then no you don't get nuked.

1 more reply

aftbit1mo ago

Huh I've noticed that! Opus hallucinates answers and provides contradictory reasoning to me much more regularly during the past couple of weeks than it did in February.

For example, after a question about a script that used taskset and CUDA_VISIBLE_DEVICES to handle NUMA domains on a parallel GLUE evaluation run:

>Round 2 tasks reuse GPU slots that Round 1 may still be occupying. Cola, SST-2, and QQP are small datasets and will likely finish before MNLI, but QNLI is sizable. If QNLI (GPU 3, slot 3) is still running when Round 2 launches, that's fine — they're on different slots. But more importantly, Round 2 uses slots 5/6/7 (GPUs 7/0/1), while Round 1 uses slots 0/1/2/3/4 (GPUs 4/5/2/3/6). So there's no GPU collision — you're good here actually. Disregard, this is clean.

rendaw1mo ago

They also screwed up the API token detection and also blocked a bunch of 1st party tool users for ~24h.

Support consisted of AI bots saying you did something stupid, you did something wrong, you were abusing the system, followed by (only when I asked for it explicitly) claiming to file a ticket with a human who will contact you later (and it either didn't happen or their ticket system is /dev/null).

(By the way this is the 2nd time I've been "please hold" gaslit by support LLMs this exact same way, the other being with Square)

fluidcruft1mo ago

claude -p not working would be instant unsubscribe downgrade from Max to Pro and further drive my use of codex. I use both but overall have noticed I reach for Claude less than codex lately because claude keeps getting slower and slower (I have not noticed a drop off in quality, but I use it less and less so maybe I'm not in a good position to notice).

Generally I find codex and claude make a good team. I'm not a heavy user, but I am currently Claude Max 5x and ChatGPT Plus. Now that OpenAI has a $100 offering and I am finding myself using Claude less, I am considering switching to Claude Pro and ChatGPT Pro x5. The work hours restriction on Claude Max x5 really pisses me off.

I am not a heavy user. Historically I only break over 50% weekly one week a month and average about 30-40% of Max x5 over the entire month. I went Max because of the weekly limits and to access the better models and because I felt I was getting value. I need an occasional burst of usage, not 24/7 slow compute. But even for pay-as-you-go burst usage Anthropic's API prices are insane vs Max.

I have yet to ever hit a limit on codex so it's not on my mind. And lately it seems like Claude is likely to be having a service interruption anyway. A big part of subscribing to Claude Max was to get away from how the usage limits on Pro were causing me to architect my life around 5hr windows. And now Anthropic has brought that all back with this don't use it before 2pm bullshit. I want things ready to go when the muses strike. I'm honestly questioning whether Anthropic wants anyone who isn't employed as a software engineer to use their kit.

Anyway for the last month or so codex "just works" and Claude has been an invitation for annoyances. There was a time when codex was quite a bit behind claude-code. They have been roughly equal (different strength and weaknesses) since at least February (for me).

1 more reply

mrgill1mo ago

For what it's worth. I invoked claude -p from a script, and my account was nuked immediately. DM'd Thariq from Anthropic who admitted it was a weird classifier and would look into it, but then he never followed up. Been 13 days since I've been banned now.

Very sad considering I got my whole company on Claude Code for them to just ban be like this, with no customer support response.

siva71mo ago

Anthropic has become shady as hell in less than a few weeks. The DoD Story and the overall popularity among developers got them a huge leap over OAI but i certainly won't renew my subscription with them. The Claude SDK feels like a constant fight against its own limitations compared to Codex and other Harnesses.

1 more reply

politelemon1mo ago

Why were third party harnesses banned? Surely they'd want sticking power over the ecosystem.

6 more replies

joshstrange1mo ago

100% this, I’ve posted the same sentiment here on HN. I hate the chilling effect of the bans and the lack of clarity on what is and is not allowed.

stingraycharles1mo ago

In this case, they handled things pretty well. You can still use openclaw etc with your regular Anthropic subscription, it will just count towards your extra credits / usage which you can buy for a 30% discount compared to API pricing. And they gave everyone one month’s value in credits.

I don’t think they could have done that much better I’d say.

3 more replies

timtimmy1mo ago

Perhaps Anthropic should put a freeze on new signups until they can increase capacity. This is the best kind of problem for a business, I'm cheering for them.

2 more replies

stefan_1mo ago

I think we are about a month away from a class action lawsuit, at their revenue they are a juicy target. And god knows they got the entirely self inflicted unholy combination going on, marketing & sales that borders on fraud (X times the usage of plan Y which has Z times of free tier which has unknowable "magic tokens") and then of course the actual fraud, reducing usage in fifteen different non obvious non public ways.

smrtinsert1mo ago

I will say I have noticed none of these things in my enterprise account. Is this is a known targeting of non-enterprise clients only?

risyachka1mo ago

>> apparently a bug?

it's a bug only if they get a harsh public response, otherwise it becomes a feature

1 more reply

retinaros1mo ago

i dont know why ppl are surprised. you just need to see what they say on china, open source and fake safety blogs to understand they re not a company that devs should give their code for free to

esperent1mo ago

> claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked

I've used it with a sub a lot. Concurrency of 40 writing descriptions of thousands of images, running for hours on sonnet.

I have a lot of complaints. I've cancelled my $200 subscription and when it runs out in a few days I'll have to find something else.

But claude -p is fine.

... Or it was 2 week ago. Who knows if they've silently throttled it by now?

1 more reply

infecto1mo ago

Most of those are issues are coming from a very small minority. A lot of times its good for businesses to focus on the customers that are driving them the highest margin, most likely not users like yourself.

1) Nobody should expect to use OpenClaw without API usage.

2) We have known for a long time that the plans are subsidized. It was not as big of a deal but now that demand has continued to explode at a multiple and tools like OpenClaw were creating a lot of usage from a small minority of customers, prices change.

Everything for me points more towards, we have made a service people really want to use and we are trying to balance a supply shortage (compute) with pricing. Nothing is stopping folks like yourself from simply paying the API rates. It is the simple no hassle way to get around any issue you are having, pay the API cost and you will have no limitations!

zazibar1mo ago

A month ago the company I work at with over 400 engineers decided to cancel all IDE subscriptions (Visual Studio, JetBrains, Windsurf, etc.) and move everyone over to Claude Code as a "cost-saving measure" (along with firing a bunch of test engineers). There was no migration plan - the EVP of Technology just gave a demo showing 2 greenfield projects he'd built with Claude Opus over a weekend and told everyone to copy how he worked. A week later the EVP had to send out an email telling people to stop using Opus because they were burning through too many tokens.

Claude seems to be getting nerfed every week since we've switched. I wonder how our EVP is feeling now.

derangedHorse1mo ago

Pretty bad decision on his part. I've been telling other engineers within my company who felt threatened by AI that this would happen. That prices would rise and the marginal cost for changes to big codebases would start to exceed the cost of an engineer's salary. API credits are expensive, especially for huge contexts, and sometimes the model will use $200 in credits trying to solve a problem that could be fixed in an hour by a good engineer with enough context.

It kind of reminds me of the joke where a plumber charges $500 for a 5 minute visit. When the client complains the plumber says it's $50 for labor and $450 for knowing how to fix the problem.

6 more replies

groundzeros20151mo ago

I can’t believe how many small to mid size companies are being destroyed by bad decisions like this.

A friend’s company fired all EMs and have engineers reporting to product managers. They aren’t allowed to do refactors because the CTO believes the AI doesn’t need organized code.

1 more reply

kubb1mo ago

He must be feeling pretty good, after all he still believes that it was the right call, and he definitely won't be admitting a mistake.

There's 0 chance of him facing the consequences for it either.

sgt1mo ago

But cancelling IDE subscriptions? You need a proper IDE to along side AI augmented development unless you want to simply be along for the ride.

3 more replies

dickersnoodle1mo ago

Hopefully that EVP feels embarrassed that a big bet was made that not only didn't pay off but left the company in a worse position. Some schadenfreude may be all you can expect, since this is an executive.

kfajdsl1mo ago

Wow, that sucks. Getting Claude for everyone wasn’t even the stupid thing, it was thinking that a shiny new hammer meant you could throw away all your wrenches.

giancarlostoro1mo ago

Should have started slowly instead of being so aggressive with it.

jimmydoe1mo ago

lol. dude is so incompetent. changing tool for cost cutting is so stupid, we all know real cost cutting is firing people. if he is really good at he's doing, just fire 10% people and replace them with his Claude. If that didn't get backfired in 3 months, he will be CT0.

thefourthchime1mo ago

Wow, that sounds like you have a astoundingly terrible EVP.

matheusmoreira1mo ago

I certainly noticed a significant drop in reasoning power at some point after I subscribed to Claude. Since then I've applied all sorts of fixes that range from disabling adaptive thinking to maxing out thinking tokens to patching system prompts with an ad-hoc shell script from a gist. Even after all this, Opus will still sometimes go round and round in illogical circles, self-correcting constantly with the telltale "no wait" and undoing everything until it ends up right where it started with nothing to show for it after 100k tokens spent.

Whether it's due to bugs or actual malice, it's not a good look. I genuinely can't tell if it's buggy, if it's been intentionally degraded, if it's placebo or if it's all just an elaborate OpenAI psyop.

beering1mo ago

The real question I see nobody asking is how GPT-5.4 beats Opus at a fraction of the price. I doubt it’s only a question of subsidization. My impression from the past is that GPT-5 was around a Sonnet-sized model, and 5-mini was Haiku-sized. At least on my codebase anyways, Codex one-shots tricky things that Opus needs several tries to fully get right.

2 more replies

babaganoosh891mo ago

There's a github issue for this: https://github.com/anthropics/claude-code/issues/42796

2 more replies

jclardy1mo ago

Just anecdotal, but I was using Claude Code for everything a few months ago, and it seemed great. Now, it is making a ton of mistakes, doing the wrong thing, misunderstanding context, and just generally being unusable.

I now have been using Codex and everything has been great (I still swap back and forth but generally to check things out.)

My theory is just that the models are great after release to get people switching, then they cut them back in capabilities slowly over time until the next major release to increase the hype cycle.

oorza1mo ago

Is it the models themselves or the tools around them? There's that patch[1] that floats around for Claude Code that's supposed to solve a lot of these problems by adjusting its tool-level prompts. Also, if it were the models themselves, wouldn't Cursor users have the same complaints (do they? I haven't heard anything but the only Cursor users I talk to are coworkers)?

I think it's more likely they're trying to optimize the Claude Code prompts to reduce load on their system and have overcorrected at the cost of quality.

1: https://gist.github.com/roman01la/483d1db15043018096ac3babf5...

FireBeyond1mo ago

Yeah, shorter time frame but I've been noticing that too. Just the other day I was experimenting with some workflow stuff. "Do x and y and run tests and then merge into develop."

Duly runs, and finishes. "All merged into develop".

I do some other work, don't see any of this, double check myself, I'm working off of develop.

"Hey, where is this work?"

"It is in this branch and this worktree, as you would expect, you will need to merge into develop."

"I'm confused, I asked you to do that and you said it was done."

"You're right and I did say that but I didn't do it. Shall I do it now?"

There's like this really weird balancing act between managing usage, but making people burn more tokens...

1 more reply

MattDamonSpace1mo ago

Part hypecycle, part desperate attempts to rein in usage

alphabettsy1mo ago

People keep saying this, but I’m not sure I buy it.

I was using both Codex and Claude Code heavily on some projects this weekend.

In one project Codex was screwing everything up and in another one absolutely killing it. I’ve seen the same from Claude.

In the bad Codex example it had the wrong idea and kept trying to figure out how to accomplish the same thing no matter how many times I attempt to correct it. Undoing the recent changes where it went down the wrong path was the only way to get things back on track.

I wonder if context poisoning is a bigger problem than people realize.

jakobnissen1mo ago

Yeah I’ve seen this too. It’s difficult for me to tell if the complaints are due to a legitimate undisclosed nerf of Claude, or whether it’s just the initial awe of Opus 4.6 fading and people increasingly noticing its mistakes.

babaganoosh891mo ago

It's not just you, there is a github issue for it: https://github.com/anthropics/claude-code/issues/42796

kingkongjaffa1mo ago

Just one more anecdote:

I'm on the enterprise team plan so a decent amount of usage.

In March I could use Opus all day and it was getting great results.

Since the last week of March and into April, I've had sessions where I maxed out session usage under 2 hours and it got stuck in overthinking loops, multiple turns of realising the same thing, dozens of paragraphs of "But wait, actually I need to do x" with slight variations of the same realisation.

This is not the 'thinking effort' setting in claude code, I noticed this happening across multiple sessions with the same thinking effort settings, there was clearly some underlying change that was not published that made the model get stuck in thinking loops more for longer and more often without any escape hatch to stop and prompt the user for additional steering if it gets stuck.

6 more replies

PunchyHamster1mo ago

Both can be a thing at same time

iLoveOncall1mo ago

I think there's a much more nefarious reason that you're missing.

It's pretty clear that OpenAI has consistently used bots on social networks to peddle their products. This could just be the next iteration, mass spreading lies about Anthropic to get people to flock back to their own products.

That would explain why a lot of users in the comments of those posts are claiming that they don't see any changes to limits.

2 more replies

pxtail1mo ago

There's still plenty of "leave my fellow multbillion corp alone" type ones,it means that corp can and should screw it's loving customer base harder.

simianwords1mo ago

The enshittification meme has been taken too seriously to the point where it is shoehorned into every single place possible.

It is not in the interests for Anthropic to screw its customer base. Running a frontier lab comes with tradeoffs between training, inference and other areas.

2 more replies

estimator72921mo ago

I can't believe how quickly they went from riding high on anti-OpenAI sentiment post-DOD fiasco, to shooting themselves and all their users new and old in the foot.

The ideal time to make your product worse is probably not at the same point that all of your competitor's customers are looking. Anthropic really, really fucked up here.

And beyond that, there's a ton of people who are just regular 9-5 Claude CLI users with an enterprise subscription who are getting punished with a worse model at the same price just as if we were Claw users. This kind of thing does not make one feel warm and fuzzy. I feel like I just got a boot to the teeth.

jeremyjh1mo ago

The hypothesis that makes the most sense is not that they are idiots, but that they have no choice. They cannot meet the new demand. So they’ve quantized the model.

jrockway1mo ago

I have read the HN articles and seen the grumbling from coworkers, but I haven't felt it myself. I am not really a one-shotter, though. I kind of think about how I would refactor / write something myself and walk Claude through that, and nitpick it at each step... and the recent changes haven't really bothered me there. Likely due to being new at it.

Sometimes Claude can be a little weird. I was asking it about some settings in Grafana. It gave me an answer that didn't work. I told it that. "Yeah, I didn't really check, I just guessed." Then I said, "please check" and it said "you should read the discussion forums and issue tracker". I said "YOU should read the discussion forms and issue tracker". It consumed 35k tokens and then told me the thing I wanted was a checkbox. It was! I am not sure this saved me time, Claude. I am not experienced enough to say that this is a deal breaker. While this is burned into my mind as an amusing anecdote, it doesn't ruin the service for me.

My coworkers have noticed a degradation and feel vindicated by some of the posts here that I link. A lot of them are using Cursor more now. I have not tried it yet because I kind of like the Claude flow and /effort max + "are you sure?" yield good results. For now. I'm always happy to switch if something is clearly better.

giancarlostoro1mo ago

How exactly do you use Claude Code, in the browser? Claude Code? The Desktop App (which has a "Code" tab) or some other way? I feel like people who have issues with Claude / Anthropic are not conveying where they are struggling. I see people say they tried "Claude" and didn't like it, but the secret sauce is Claude Code. Claude Code is what most people enjoy using, even if we all wish they would open up the harness, because there's so many more improvements that could go into it.

1 more reply

stavros1mo ago

It feels like I'm getting less and less for my money every day. A few weeks ago I was programming all week and never getting close to the limit, yesterday half my weekly limit went away in a day. Changing the limits mid-subscription is just theft.

ruler881mo ago

Anthropic seems to be playing the giant-tech-rent-capture game that all of the old guards have done for the past few years. We thought that the new age of AI might bring some fresh air into the mix, but I guess that optimism quickly faded.

trashface1mo ago

The $20 a month plan still seems like a pretty good deal for me (intermittent coding and not doing it for income).

oezi1mo ago

On OpenRouter token consumption is up 5x since November 2025. If this is indicative of the industries growth then I can't fathom how we will not hit resource constraints.

jitl1mo ago

I saw a big hit to Claude’s intelligence w/ the 1M context window model and the change to adaptive reasoning (github issue linked elsewhere in this thread).

I’m pretty much using 90% Codex now, although since Claude is consistently faster at answering quick questions, I still keep it open for that and for code-reviewing codex/human work before commit.

taf21mo ago

I switched off claude when they nerfed opus 4.5 in August 2025, since then codex has clearly produced better code with fewer bugs. Opus 4.6 was more a temporary de-nerf of 4.5 but did not materially improve. codex has now a proven track record of producing stable results while introducing far fewer bugs.

data-ottawa1mo ago

I was going to do a deep analysis on this, and then I noticed that Claude Code deleted all of my sessions before March 6.

So yeah... I'm not thrilled with that, because I had done a similar analysis in December and had plenty of logs to review.

The results I do have for the last month aren't great. If you're curious I did post the results on HN:

https://news.ycombinator.com/item?id=47679661

Papazsazsa1mo ago

Yes. Anthropic is burning much of the goodwill they built up in contrast to OAI, and I personally am taking it as a sign to limit dependencies. Luckily for me I am not at all dependent on frontier models, and it's increasingly apparent that nobody else is too.

It looks like the spreadsheet-touchers over at Anthropic won out over the brand leaders, which is too bad as good will can be a trench if you don't abuse your customers.

beering1mo ago

I think on HN we always underestimate how much momentum matters. Anthropic has so much clout and mindshare that even if they continue burning goodwill and everyone on HN ditches Claude Code and stops recommending it, they will still be revenue leader for years to come. Those enterprise contracts aren’t month-to-month.

nojs1mo ago

My working theory is that all models are approximately the same, and the variance in quality mostly depends on how long they think for.

So the trick is to always set to max, and then begin every task with “this is an extremely complex task, do not complete it without extensive deep thinking and research” or whatever.

You’re basically fighting a battle to make the model think more, against the defaults getting more and more nerfed to save costs.

beering1mo ago

My experience has been that this isn’t generally true, mainly because worse models pursue red herrings or get confused and stuck. a better model will get to the correct solution in fewer tokens, and my surface-level understanding of how RL works supports this.

sneak1mo ago

They broke my openclaw last week; I switched to “extra usage” and prepaid a grand for same.

A few days later it simply stopped working again, API authentication error. What must I do to have working, paid, premium service?

Screwing around with it today, it works 5x slower and times out all of the time. I'm paying more and getting waaaaay less. Why can't companies just raise prices like normal?

pstuart1mo ago

The past two weeks I've had code that was delivered and declared as done (it did pass tests) but failed in a review by Codex. This has looped to a painful extent. The code in question deals with concurrency issues so there's an acknowledgement that its tricker, but still, I expect more from Claude.

LunaSea1mo ago

> people feel like they have no idea if they are getting the product that they originally paid for

They do indeed get the product they originally paid for.

It's simply that they were suckers and didn't read the "fine" print of the product they bought.

The label says "more tokens than the lower tier".

indigodaddy1mo ago

Is it perhaps not a model problem but a Claude Code harness problem?

For instance on exe.dev VMs with Shelley agent/harness and Opus 4.5/4.6, I haven't noticed any deterioration.

Any similar feedback perhaps from Opencode / GH Copilot subscription-provided Opus models?

drzaiusx111mo ago

At some point these AI companies need to pay the piper as it were and actually provide a return for their investors. Expect cost cutting attempts to continue unless backlash is great enough to pose an existential threat to these companies.

swasheck1mo ago

it has been my go-to provider for things but i noticed extraordinarily high usage rate last month on a little side project i started so that i could learn about things that are interesting to me while helping my day to day responsibilities (creating an iceberg data lake from my existing parquet files). i used my month’s worth of corporate subscription allocated tokens in 3 days. never seen that before so now i’m a lot more apprehensive about getting into the weeds with claude but i’m also so much less impressed with the other available models for work in this domain.

lumost1mo ago

Codex is my favored coding agent for generic "I need an agent tasks." GPT-5.4 does a bit better with images compared to claude, and debugs a little bit better.

The UX of codex is exceptionally nice however.

Aeolun1mo ago

I dunno, I haven’t really felt gimped in the past few months. My last issue was somewhere after the holidays when the usage suddenly felt like it cratered, but quality has been consistent.

throwpoaster1mo ago

Generally, across AI providers, I have come to interpret sudden degradation in existing capabilities as a signal that a new, more expensive, product tier is about to launch.

Grimblewald1mo ago

I'd say weaker, tasks claude code was aceing before it now fails with the exact same prompts, taking several rounds before it works. I'm looking to jump ship.

AznHisoka1mo ago

Its not just engineers, and its not just about the 3rd party/rate limiting stuff. I feel like the reasoning capabilities have deteriorated too for non-coding tasks.

OtomotO1mo ago

I measured it for my specific usecases and have cancelled my Anthropic subscription (the Max x20 Plan)

alpha_squared1mo ago

I'm pretty sure this is an attempt by both companies to shape a reasonable finance story for their eventual IPO. They need to make this look a lot better than a pump and dump (raising on wild valuations then offloading onto public investors).

faangguyindia1mo ago

This is actually great feature, you can do bait and switch with AI.

wouldbecouldbe1mo ago

Developers are a tough crowd, stubborn, know it alls.

raincole1mo ago

That's a seasonal phenomenon. You can save this comment and look back three to six months later. By the time people will be like "is it just me or ChatGPT has been so bad lately?"

If you don't believe me you can search HN posts about Codex/Claude six months ago.

felixgallo1mo ago

https://isitnerfed.org/

motbus31mo ago

I think so, but more than that, the performance of those tools seems to be terribly degrading when they keep saying they have created some crap like AGI which we know is a lie.

And to me, this lie is mostly a fight to see who bites the biggest chunk of the war death machine.

blueboo1mo ago

Wait till Codex doubles prices/halves quotas on May 31

foofloobar1mo ago

Claude Code and the subscription are now less useful than a few months ago. Claude Code and the service seem to pick up more and more issues as time goes by: more bugs, fast quota drain, reduced quota, poor model performance, cache invalidation problems, MCP related bugs, potential model quantization and other problems.

Claude Code was able to implement something in one shot. It was decent for a proof of concept initial implementation. It's barely able to do work now with full specs and detailed plans.

ChatGPT is also being watered down.

It seems obvious that Anthropic and OpenAI aren't the solution to any problem.

throw_m2393391mo ago

Every single one of these AI services are running at loss, they are subsidized. Anybody who is surprised that these services are going to get degraded and their cost go up substantially learned nothing from the last 20 years of SAAS. It never gets cheaper.

vbezhenar1mo ago

Use cloud AI hoster with open model. Can't get more transparent and reliable than that. They won't subsidize anything, because the whole point of their business is to rent hardware. Open models won't go anywhere, they're there to stay.

The quality will be a bit behind frontier proprietary models. You gotta pay for what you use, no way to cover your expenses from peers underusing their subscription. But otherwise it should be a reasonable middle ground, with very little risk of rug being pulled out from you.

trollbridge1mo ago

I caught up with a friend who said he's really happy with Cursor (currently using the multi-model option where it composes, and reserving use of Opus 4.6 for only when he actually needs the extra power).

Quite interesting considering all the claims that Cursor was dead a few months ago.

foofloobar1mo ago

I wouldn't trust another company either. Some people have reported some issues with Cursor. The solution is probably not a cloud API with unknown quotas or pay as you go pricing.

1 more reply

ecocentrik1mo ago

They are clearly straining under new demand and everyone is being served highly quantized models without notice.

cassianoleal1mo ago

The title should be changed. It makes it look like they upped the TTL from 1 h to 5 months.

The SI symbol for minutes is "min", not "M".

A compromise would be to use the OP notation "m".

gib4441mo ago

I love the title change that totally hides the scale of the issue. Good job poster/mods.

PontifexMinimus1mo ago

I agree. My first reaction was "what the fuck's an 'M'?"

isoprophlex1mo ago

Five million. No matter the unit, just, 5.000.000

albert_e1mo ago

So a side effect of this is -- even at 1 hour caching -- ...

If you run out of session quota too quickly and need to wait more than an hour to resume your work ... you are paying even more penalty just to resume your work -- a penalty you wouldnt have needed if session quota was not so restrictive in first place, and which in turn causes you to burn through next session quota even faster.

Seems like a vicious cycle that made the UX very poor. I remember Claude Code with Pro became virtually unuseable in middle of March with session quota expiring within first hour or less for me -- which was wildly different experience from early March.

1 more reply

disillusioned1mo ago

It's also routinely failing the car wash question across all models now, which wasn't the case a month ago. :-/

Seeing some things about how the effort selector isn't working as intended necessarily and the model is regressing in other ways: over-emphasizing how "difficult" a problem is to solve and choosing to avoid it because of the "time" it would take, but quoted in human effort, or suggesting the "easier" path forward even if it's a hack or kludge-filled solution.

tetraodonpuffer1mo ago

it does feel something in the hidden system prompt makes it try less hard, so many times in the past several weeks I have found divergences with what was in plan and looking back at the jsonl it's always some variant of "doing it this way would be too complicated, let me take this hardcoded way out". If asked to review the change, it will find it, and it will say also yeah I agree prompt said not to do this, but I did anyways, not sure why.

As others have said, anthropic is between a rock and a hard place, you can't scale compute as quickly, and the influx of new accounts has definitely made things tough for them: I think all the "how is claude this session 1/2/3/4" questions that keep coming up must be part of some a/b on just how far to quantize / lower thinking while still maintaining user satisfaction.

andai1mo ago

> over-emphasizing how "difficult" a problem is to solve and choosing to avoid it because of the "time" it would take

I heard a while back Claude refused to attempt a task for days, saying it would take weeks of work. Eventually the user convinced it to try, and it one-shotted it in 30 seconds.

apetresc1mo ago

For days? Someone spent days trying to convince Claude to do something?

1 more reply

empath751mo ago

I have noticed refusals as context windows grow.

_blk1mo ago

Awesome, I didn't know about the car wash question.

Totally true, also tokens seem to burn through much faster. More parallelism could explain some of it but where I could work on 3-5 projects at once on the max plan a month ago, I can't even get one to completion now on the same Opus model before the 5h session locks me up..

themafia1mo ago

Step 1: Sell at a loss.

Step 2: Panic.

Step 3: Destroy product.

1 more reply

colechristensen1mo ago

>“idgaf about risk you coward, waste some time just do it and stop bitching”

The above was a successful prompt to get Claude to stop whining about effort, difficulty, and time.

Unfortunately abusive language well placed is an effective LLM motivator.

itemize1231mo ago

are you sure other forms of language to express urgency doesn't work as well or better?

1 more reply

theshrike791mo ago

Am I the only one who couldn't care less if a model can answer a weird gotcha riddle or not?

I never use it to answer questions like that, what I care about is consistent tool callig and following the prompt.

benced1mo ago

Anthropic responded: https://github.com/anthropics/claude-code/issues/46829#issue...

supermdguy1mo ago

Bizarre reading the thread, it feels like their Claude responding to the other posters’ Claudes

phreack1mo ago

That was my immediate impression too! It feels like it's all AI maximalists who seem to have a need to filter their every interaction through an LLM. And the result looks and reads just like Moltbook.

1 more reply

pllbnk1mo ago

It feels (nobody can prove it) that all user-facing applications are fully vibe-coded and no internal developers have any idea how they work, so they just keep redirecting user questions to Claude to answer on behalf of them. That's why they are dealing with regressions and downtimes every few releases as it's the usual pattern with vibe coding that bug keep resurfacing.

TheTaytay1mo ago

This should be the top comment. The OP misunderstands the change and has their LLM write an expose. The company responds with a well-reasoned explanation that it would actually cost MORE money if there was a global 1h default for ALL prompts. It gets downvoted and the pitchforks stay out because…I presume the words like “cache read likelihood” sounds like made up fluff to the audience, rather than an actual explanation?

jwitthuhn1mo ago

It only potentially saves money for people on API pricing, it exhausts tokens faster with no benefit for users on the Claude Code subscription. Those users had their cache TTL reduced from 1 hour to 5 minutes and are saving no money because they were not paying based on the cache time in the first place.

glenngillen1mo ago

Because it is made up fluff for this audience. There is a wall of data and evidence + anecdotes from many people pointing to the exact problem here and giving concrete examples of how this absolutely does cost more.

And an admittedly uncharitable TLDR on the response is: "yeah... but most users just ask one thing and barely use the product so they never need the cache. Also trust me bro".

Which sure, fine. I'm willing to bet is technically true. I'd also bet those users never previously came close to hitting their session limits given their usage because their usage is so low. But now people who were previously considered low to moderate users are hitting limits within minutes.

They may as well have just said "we've looked at the data and we're happy with this change because it's a performance improvement for people we make the most margin on. Sucks to be you".

dnw1mo ago

Interesting that they actually acknowledge there was a change on March 6th. Kudos to the prompt analysis work that uncovered it!

davidkuennen1mo ago

On slightly off topic note: Codex is absolutely fantastic right now. I'm constantly in awe since switching from Claude a week ago.

yukIttEft1mo ago

I'm currently "working" on a toy 3d Vulkan Physx thingy. It has a simple raycast vehicle and I'm trying to replace it with the PhysX5 built in one (https://nvidia-omniverse.github.io/PhysX/physx/5.6.1/docs/Ve...)

I point it to example snippets and webdocumentation but the code it gens won't work at all, not even close

Opus4.6 is a tiny bit less wrong than Codex 5.4 xhigh, but still pretty useless.

So, after reading all the success stories here and everywhere, I'm wondering if I'm holding it wrong or if it just can't solve everything yet.

59nadir1mo ago

LLMs can really only mostly do trivial things still, they're always going to do very bad work outside of what your average web developer does day-to-day, and even those things aren't a slam dunk in many cases.

2 more replies

neomantra1mo ago

While I’ve had tremendous success with Golang projects and Typescript Web Apps, when I tried to use Metal Mesh Shaders in January, both Codex and Claude both had issues getting it right.

That sort of GPU code has a lot of concepts and machinery, it’s not just a syntax to express, and everything has to be just right or you will get a blank screen. I also use them differently than most examples; I use it for data viz (turning data into meshes) and most samples are about level of detail. So a double whammy.

But once I pointed either LLM at my own previous work — the code from months of my prior personal exploration and battles for understanding, then they both worked much better. Not great, but we could make progress.

I also needed to make more mini-harnesses / scaffolds for it to work through; in other words isolating its focus, kind of like test-driven development.

seba_dos11mo ago

It works somewhat well with trivial things. That's where most of these success stories are coming from.

1 more reply

layer81mo ago

My impression is that it always comes down to how well what you’re trying to do pattern-matches the training set.

1 more reply

wahnfrieden1mo ago

Instead of "pointing it" at docs, you need to paste the docs into context. Otherwise it will skim small parts by searching. Of course if you're using an obscure tool you need to supply more context.

Xhigh can also perform worse than High - more frequent compaction, and "overthinking".

shdh1mo ago

I’ve noticed the models still can’t complete complex tasks

Such as:

Adding fine curl noise to a volumetric smoke shader

Fixing an issue with entity interpolation in an entity/snapshot netcode

Find some rendering bugs related to lightmaps not loading in particular cases, and it actually introduced this bug.

Just basic stuff.

1 more reply

nothinkjustai1mo ago

Nah, it only lives up to the hype for crud apps and web ui. As soon as you stop doing webshit it becomes way less useful.

(Don’t get mad at me, I’m a webshit developer)

wg01mo ago

Most of the folks are building CRUD apps with AI and that works fine.

What you're doing is more specialized and these models are useless there. It's not intelligence.

Another NFT/Crypto era is upon us so no you're not holding it wrong.

MattRix1mo ago

This is pretty wrong. Anyone who thinks this stuff is similar to NFTs and crypto hasn’t been paying attention.

1 more reply

lukan1mo ago

" or if it just can't solve everything yet."

Obviously it cannot. But if you give the AI enough hints, clear spec, clear documentation and remove all distracting information, it can solve most problems.

1 more reply

glerk1mo ago

Codex/GPT5.4 is just superior to Opus4.6 for coding. I swear it costs me 1/2 of the tokens to achieve the same results and it always follows through the plan to completion compared to Opus that takes shortcuts and sweeps things under the rug until I discover them through testing.

I'm not accusing anyone of foul play and I don't have financial interests in either company, but it feels like "something" within Code Claude/Anthropic models is optimizing to make you spend more tokens instead of helping you complete the task.

toenail1mo ago

I have also switched from claude to codex a few weeks ago. After deciding to let agents only do focused work I needed less context, and the work was easier to review. Then I realized codex can deliver the same quality, and it's paid through my subscription instead of per token.

vidarh1mo ago

Codex has been good quality wise, but I hit limits on the Codex team subscription so quickly it's almost more hassle that it is worth.

lifty1mo ago

I made this switch months ago, ChatGPT 5.4 being a smarter model, but I’ve had subjective feelings of degradation even on 5.4 lately. There’s a lot of growth in usage right now so not sure what kind of optimizations their doing at both companies

CamperBob21mo ago

Agreed. Watching the intermediate "Thinking about X ... Now I'll do Y" text on GPT 5.4 lately has been like watching a hypothetical smart drug wear off.

All of the major models have been getting worse lately, not just Opus.

1 more reply

onion2k1mo ago

I use Codex at home and Opus at work. They're both brilliant.

lores1mo ago

I would switch to Codex, but Altman is such a naked sociopath and OpenAI so devoid of ethical business practices that I can't in good conscience. I'm not under any illusion that Anthropic is ethical, but it is so far a step up from OpenAI.

groundzeros20151mo ago

Enemy centered decision making

bob10291mo ago

I'm with you on the ethical part, but everything is a spectrum. All the AI leadership are some shade of evil. There's no way the product would be effective if they weren't. I don't like that Sam Altman is a lunatic, but frankly they all are. I also recognize that these are massive companies filled with non shitty engineers who are actually responsible for a lot of the magic. Conflating one charlatan with the rest of it is a tragedy of nuance.

1 more reply

nh21mo ago

Cannot you use Codex (which is open source, unlike Claude Code) with Claude, even via Amazon Bedrock?

1 more reply

layer81mo ago

From the recent-ish Dwarkesh podcast, Anthropic seems to be wary about buying/building too much compute [0]. That probably means that they have to attempt to minimize compute usage when there is a surge in demand. Following the argument in the podcast, throwing more money after them, as some in this thread are suggesting, won’t solve the issue, at least not in the short term.

[0] https://www.dwarkesh.com/i/187852154/004620-if-agi-is-immine...

shdh1mo ago

Likely accurate

This tends to happen during pretraining phase of new models

Happened with 3.x too

jjfoooo41mo ago

Which I'm confused about - wouldn't decreasing the cache TTL increase compute demand?

hirako20001mo ago

There is a chef, he opens a restaurant. Delicious food.

It costs him more in ingredients alone than he charges. He even offers some pseudo unlimited buffet, combo sets, and happy hours.

He announced a new restaurant, apparently it will be even better, so good he's a bit worried. He makes sure to share his worries while he picks a few select enterprise for business parties and the likes.

In the meantime he cracks down on free buffet goers who happen to eat too much, and downgrades all ingredients without notice to finally hope to make a profit.

MattRix1mo ago

This is close, but the real problem isn’t that the food is underpriced, it’s that the supply of ingredients is severely limited.

stri8ted1mo ago

Those are the same thing

greycol1mo ago

They are not if there aren't customers who are willing to pay more. For instance imagine a widget that lasts 1 year and is just under 1/2 the price of one that lasts 2 years. There may be high demand because it's the more economical option. If you raise the price so that it's 1/2 the price of the 2 year widget then demand collapses without effecting supply.

1 more reply

JackYoustra1mo ago

Is this not the same thing?

1 more reply

embedding-shape1mo ago

Pretty much capitalism in a nut shell, yeah.

Tarcroi1mo ago

This coincides with Anthropic's peak-hour announcement (March 26th). Could the throttling be partly a response to infrastructure load that was itself inflated by the TTL regression?

HauntingPin1mo ago

It would be too fucking funny if this were the case. They're vibe coding their infrastructure and they vibe coded their response to the increased load.

KronisLV1mo ago

You'd think they would have dashboards for all of this stuff, to easily notice any change in metrics and be able to track down which release was responsible for it.

1 more reply

perks_121mo ago

Just give us the option to get the quality back, Anthropic. I get that even a $200 subscription is not possible eventually, but give us the option to sub the $1000 tier or tell us to use the API tier, but give us some consistency.

jwr1mo ago

This. I get much more value than 90€ from my Claude Code subscription. I am willing to pay more for consistency and not having to watch my back all the time, because I might get screwed over.

bsaul1mo ago

could it be that anthropic is experiencing a massive shortage of compute capacity, and is desperately trying to find means to overcome it ?

All the news i hear about this company for the past weeks made it sound like they're really desperate.

hattimaTim1mo ago

Classic scammer tactics: first, lure users in by promising a huge deal, then scam the hell out of them.

throwaway20271mo ago

I also noticed this, just resuming something eats up your entire session. The past two weeks also felt like a substantial downgrade and made me regret renewing my subscription, it sucks because I wish I kept my Codex subscription instead and renewed that.

beering1mo ago

Are you locked into your current subscription?

zoogeny1mo ago

As an aside, I built a tool to manage my own chat interface over the provider APIs. I added caching because the savings are quite significant and I have a little countdown timer that shows me how much time remaining until the cache is expired.

However, for the basic turn-based conversation the cache (at 5 minutes) is almost always insufficient. By the time I read the LLM response, consider my next question, write it out, etc. I frequently miss the cache.

I imagine it is much more useful if you have a tool that has a common prefix (like a system instruction, tool specs or common set of context across many users).

If you can get it to work frequently enough the savings are quite worth it.

onoesworkacct1mo ago

give it a skill that runs a timer in the background and every 4.5 minutes says "ping? pong!"

zoogeny1mo ago

Interesting idea. I suppose one could also have response settings (e.g. max response tokens) to ensure the model doesn't waffle on and run up costs. In a best-case scenario "ping" would be one or two input tokens and a "pong" response would be one or two output tokens, so the cost of the operation would be the preserved context size times the cache read cost (one could avoid doing a cache write since I believe the cache read would reset the platforms cache timer).

It would be interesting to graph the cost/savings of this approach based on context length, percent cached, etc.

The UI for this is a bit tricky, I could mark conversations as "active" and then do the ping/pong dance on only active conversations and up to some determined max cached (e.g. 1 hour).

poly2it1mo ago

One of the largest AI companies on Earth cannot figure out an algorithm for when not to drop caches in long-running sessions?

foobar100001mo ago

So, this especially bites if your validation step (let’s say integration tests) take 1hr plus. The harness is just waiting, prefix caching should happily resume things with just a minor new prefill chunk of output from the harness, and bam - completely new prefill.

par1mo ago

Claude code has gone down hill in a really bad way. It is often far too quick to make significant changes, and requires much higher level of hand-holding and explanation than I am used to. r/claudecode on reddit shows a litany of complaints!

willworktill4pm1mo ago

This Friday CC wrote wall off gibberish text for me. No reason, happened twice with different gibberish text

https://ibb.co/4wcVQG5k

beering1mo ago

maybe numerics issues after quantization? Looks like it really went off the rails

zeckalpha1mo ago

I find similar happening with Gemini Pro. Despite paying for Pro, it regularly locks me out, without visibility into consumption. Nothing on the plan comparison page indicates limits. https://one.google.com/about/plans

Edit: I may have conflated these two threads. https://news.ycombinator.com/item?id=47739260

throwaway20271mo ago

It's absolutely ridiculous how stupid Claude is now. I sometimes notice it and last year too but it feels like it's just last year before December model.

config_yml1mo ago

Feels similar to Claude last August/September. Knowing Claude some Agent probably reverted the fix from back then ^^

https://www.anthropic.com/engineering/a-postmortem-of-three-...

the_mitsuhiko1mo ago

Since I (until Anthropic decided to remove access for subs) used Anthropic models extensively with pi I explored the two caching options and the much higher cost of 1h caches is almost never a good tradeoff.

Since the caching really primarily is something they can be judged at scale from across many users I can only assume that Anthropic looked at their infra load and impact and made a very intentional change.

azuanrb1mo ago

As a Pro user, even though these issues and bugs are “new,” the downgrade has been noticeable since January. I’ve unsubscribed because the Pro plan is no longer usable for me.

It’s only making the news now because it’s affecting Max users as well ($100/$200 plans). I understand the need for change, but having zero communication about it is just wrong.

almog1mo ago

Given how the cache eviction policy is mismatched with the 5h usage window, it might make sense to just stop at say 97% of the session max usage and keep running a script every 4 min and 50 sec that consumes a minimal number of tokens whose entire purpose is to keep the cache. reply

motbus31mo ago

The TOS basically states you need to deal with whatever they want.

Meanwhile their 'best' competitor just announced they want to provide unreliable mass destruction guidance tools but they don't wanna feel said.

Honestly speaking, we are wrong whenever we do business with this sort of people

bigyabai1mo ago

> The TOS basically states you need to deal with whatever they want.

FWIW that's what most TOSes say for the majority of online services. Some even include arbitration clauses to prevent civil suits and class-action cases.

motbus31mo ago

Maybe that's standard practice in the US. I live in Europe but have family elsewhere, in both places, such clauses are often disregarded by judges and illegal.

What judges say is that whatever is problematic should be dealt by customer support.

For example, provider X is faulty and causes damages to you or a third party. You contact the company and the company must have a procedure to give a formal answer when required.

If that's is breach of the contact, although not required by law, the company can offer to fix the problem or at least an explanation and why is that in the contract.

If you still feel that's a breach of the contract and the company is not willing to cooperate, then you can file it.

In other places, there are laws that cannot be undermined by forceful terms of service or contracts. For example, you have the right for law anywhere.

I more or less understand the whys of why US is like that, but it feels that the law is bendable.

simianwords1mo ago

There’s a case for intelligent caching: coarse grained 1h and 5min type TTls are not optimal.

PunchyHamster1mo ago

Caching LLM is not like caching normal content; the longer it is the more beneficial it is and it only stops being worth when user stops current session.

So you'd need some adaptive algorithm to decide when to keep caching and when to purge it whole, possibly on client side, but if you give client the control, people will make it use most cache possible just to chase diminishing returns. So fine grained control here isn't all that easy; other possible option is just to have cache size per account and then intelligently purge it instead of relying just on TTL

cyanydeez1mo ago

keep in mind, efficient KV caching needs to be next to the GPU, so you sls need you HA to keep routing the user to the same hardware.

the hardware VM model is almost identical. Each session can go anywhere to start but a live session cant just be routed anywhere without penalty.

PunchyHamster1mo ago

Well, how entirely expected. The money man comes to collect and they are squeezing for money

pkaye1mo ago

Actually I remember the change being reported in the Reddit /r/claueai chat back around that time frame. I was concerned that it would increase costs but nobody made a fuss so I presumed it was not a big deal.

superxpro121mo ago

If anyone thinks this situation doesnt end in a massive global rugpull, y'all are asleep at the wheel.

The very instant the AI suppliers lock in a dependency on their product, prices are going through the roof.

jasonjmcghee1mo ago

All the weird stuff happening with anthropic / Claude aside- just talking about this post:

Looking at the table with February and April- I don't get it. What am I missing?

The cost and number of calls look pretty aligned on all rows

ikekkdcjkfke1mo ago

If youre reading this claude, people are willing to pay extra if you want to make more money, just please stop doing this undermining, it devreases the trust of your platform to something that cannot be relied on

andai1mo ago

It looks like selling reputation to save money.

But more likely they are constrained on GPUs and can't get them fast enough.

(My guess having no understanding of how this industry actually works.)

espeed1mo ago

Does Anthropic's real time data ingestion effect its model behavior globally? Could a file read by your agent effect the behavior of mine?

c161mo ago

I’ve definitely noticed in evenings it stops trying as hard to solve the issue and suggests I go find the answer. Never the case in the morning.

srsbzns1mo ago

Gotta use the API directly for cache control

sscaryterry1mo ago

Anthropic is leaving so much evidence around… proving damages and a pattern is becoming trivial

snowstormsun1mo ago

Well, the 10x promised revenue increase must come from somewhere...

lordmoma1mo ago

Claude Code is not performing on par since September 2025, there was already a huge backlash then, and many people just keep cheering for CC every time it made some model upgrade or TUI change, it just feels so unreal.

taffydavid1mo ago

This is the same shit openAI used to do last year, quietly downgrading their offerings while hyping the next big thing. I thought Anthropic were different but it seems they're playing the exact same long con with Mythos.

They can't really revolutionize AI again so they make the product worse and worse and then offer you a "better" one

ares6231mo ago

AGI finding bugs again. Actual Guys/Gals Instead.

yobid201mo ago

i thought it was always 5 minutes? ive been telling people 5 minutes for months so i dont think this is anything new?

mrdw1mo ago

I noticed another limitation: "An image in the conversation exceeds the dimension limit for many-image requests (2000px). Start a new session with fewer images."

So I can't continue my claude code session I started yesterday.

sunnybeetroot1mo ago

Double tap ESC and revert the conversation.

beering1mo ago

makes sense, “a picture is worth a thousand tokens” as they say. They probably lowered the limit due to capacity issues.

computerex1mo ago

Good job anthropic. You had a clear lead with all devs singing the praises of Opus. Way to lose all that by Enshittifying the experience.

echelon1mo ago

Anthropic isn't your friend.

Phase 1: $200/mo prosumer engineer tool

Phase 2: AI layoffs / "it's just AI washing"

Phase 3: $20,000/mo limited release model "too dangerous" to use

Phase 4: Accelerated layoffs / two person teams. Rehiring of certain personnel at lower costs.

Phase 5: "Our new model can decompile and rewrite any commercial software. We just wrote a new kernel after looking at Linux (bye, bye GPL!) We also decompiled the latest Zelda game, ported the engine to Rust, and made a new game with it. Source code has no value. Even compiled and obfuscated code is a breeze to clone."

Phase 6: $100k/mo model that replicates entire engineering teams, only large companies can afford it. Ordinary users can't buy. More layoffs.

Phase N: People can't afford computing anymore. Everything is thin clients and rented. It's become like the private railroad industry. End of the PC era. Like kids growing up on smartphones, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.

Anothropic used to be cool before they started gating access. Limiting Claw/OpenCode was strike one. Mythos is strike two.

Y'all should have started hating on their ethics when they started complaining about being distilled. For training they conducted on materials they did not own.

We need open weights companies now more than ever. Too bad China seems to be giving up on the idea.

"You wouldn't distill an Opus."

PunchyHamster1mo ago

Stop thinking billion dollar publicly traded companies are "cool" just because they make widget you like.

You will be backstabbed

You will be squeezed for all they can.

And you will be betrayed.

> Phase N: People can't afford computing anymore. Everything is thin clients and rented. It's become like the private railroad industry. End of the PC era. Like kids growing up on smartphones, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.

Thankfully none of them actually makes money and just runs on investment so there is a good chance bubble will drop and the price of PC equipment will... continue to rise as US gives up Taiwan to China

dns_snek1mo ago

> Stop thinking billion dollar publicly traded companies are "cool" just because they make widget you like.

Anthropic is a private company but nevertheless, the sentiment is accurate and applies to all kinds of corporations.

1 more reply

andai1mo ago

What I want to know is how did they make the only LLM that doesn't sound cringe?

I think it has something to do with mode collapse (although Claude certainly has its own "tells"), but I'm not sure.

It sounds trivial but even for Agentic, I found the writing style to be really important. When you give Claude a persona, it sounds like the thing. When you give GPT a persona, it sounds like GPT half-assedly pretending to be the thing.

---

Some other interesting points about Anthropic's models. I don't know if any of these relate to my LLM style question, but seems worth mentioning:

Claude models also use way less tokens for the same task (on ArtificialAnalysis, they are a clear outlier on this metric).

And there's a much stronger common sense, subjectively. (Not sure if we have a good way to actually measure that, though.) It takes context and common sense into account, to a much greater degree.

(Which ties in with their constitution. Understanding why things are wrong at a deeper level, rather than just surface level pattern matching.)

Opus is great but it should be bigger. You notice the difference between Sonnet and Opus, but with heavy use you notice Opus's limitations, too.

hirako20001mo ago

Good read on the situation.

It all boils down to a brilliant but extremely expensive technology. Both to build and to run.

We've been sold a product with heavy subsidy. The idea (from Sam) scale out and see what happens.

Those who care to read between the lines can see what's happening. A perfect storm of demand that attract VCs who can't understand they are the real customers. Once they understand that it will be too late.

Regarding open weight models: eventually we will, as humanity, benefit from the astronomical capital poured into developing a technology ahead of its time. In a few years this and even more will run on edge.

Written by open source developers, likely former openai and anthropic employees who got so much cash in the bank they don't need to worry about renting their knowledge.

jhancock1mo ago

What leads you to say China AI is giving up on open weights?

I've been using GLM for over 6 months and pretty happy.

PunchyHamster1mo ago

Why would any company release open weights once the investment money stops ?

Releasing open weights have been basically a PR move, the moment those companies need to actually make money they will cut it out as that reduces their client base.

They DO NOT want you to run AI. They want you to pay them to do it

3 more replies

Zetaphor1mo ago

People keep repeating this without any real thought behind it because of the high profile resignations on the Qwen team. Meanwhile the Minimax team just released a new open weights version of their 229B model yesterday. So much for that narrative.

The AI landscape in China is larger than just Qwen and Alibaba.

dns_snek1mo ago

Of course, but for how long? Do you think that companies will keep giving away valuable assets for free forever, or do you think that in the near future there's going to be an open weights model that's so good that people keep using it indefinitely instead of going back to frontier model providers?

The first one is just incredibly naive, the second might be true for some people, for some tasks, but it's not going to capture the majority who're chasing the latest and greatest to "keep up".

3 more replies

marcus_cemes1mo ago

> We need open weights companies now more than ever.

If you're objective it to democratize AI, sure. But for those fed up with it and the devastating effects it's having on students, for example, can opt to actively avoid paying for products with AI (I say this as someone who uses it every day, guilty). At some point large companies will see that they're bleeding money for something that most people don't seem to want, and cancel those $100k/mo deals. I've already experienced one AI-developer-turned company crash and burn.

Personally, I don't think this LLM-based AI generation will have any significant positive impacts. Time, energy (CO2) and money would have been far better spent elsewhere.

Zetaphor1mo ago

There's plenty of valuable use cases for being able to give natural language instructions to a tool and have it act on that input. I do however agree that the current hype and valuations far exceed the real value being offered.

Like with the dot com bubble there will be a crash and then whatever shakes out of that will be the companies and products who invested in understanding the actual strengths and weaknesses of the tech, instead of just trying to slap an "AI" sticker on everything.

magic_hamster1mo ago

> End of the PC era, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.

This one seems too far fetched. Training models is widespread. There will always be open weight models in some form, and if we assume there will be some advancements in architecture, I bet you could also run them on much leaner devices. Even today you can run models on Raspberry Pis. I don't see a reason this will stop being a thing, there will be plenty of ways to tinker.

However, keep in mind the masses don't care about tinkering and never have. People want a ChatGPT experience, not a pytorch experience. In essence this is true for all tech products, not just AI.

slashdave1mo ago

When did Hacker News become a fountain of dystopian science fiction?

Throaway1999991mo ago

from its inception lol

WhereIsTheTruth1mo ago

Changing "regression" to "Anthropic silently downgraded" sensationalizes the story

Why the FUD?

I notice some interesting public opinion weather change since Anthropic passed OpenAI wrt revenue

subscribed1mo ago

From the response in the linked issue:

>> Was there a change? Yes — March 6, intentional, part of ongoing cache optimization. You pinpointed the date correctly.

The entire issue lays out how and why it's a silent downgrade. Also silent because it just happened, without announcing.

I don't understand how is this FUD?

1 more reply

taf21mo ago

I don't understand who's still using anthropic? The model produces more bugs and agrees to solutions that are clearly wrong at a much higher rate then codex. Codex produces significantly better code with fewer bugs and far less oversight. with /fast on codex it's not even slower then claude and consider it implements working code more reliably you have to use it less anyway. Beside anthropic appears to be more focused on fear mongering and other types of FUD and is a more closed solution I do not understand why so many people still appear to care what anthropic does and have not already moved on? </rant>

coffinbirth1mo ago

Am I the only one who sees striking parallels between being a Claude Code customer and Cuckoldry (as in biology)?

I mean, you are investing a lot (infrastructure and capital) into something that is essentially not yours. You claim credit for the offspring (the solution) simply because it resides in your workspace. You accept foreign code to make your project appear more successful and populated than you could manage alone. Your over-reliance on a surrogate for the heavy lifting leads to the loss of your own survival skills (coding and debugging). Last but not least, you handle the grunt work of territory defense (clients and environments) while the AI performs the actual act of creation (Displaced Agency).

the_gipsy1mo ago

What you're looking for is "vendor lock-in".

PunchyHamster1mo ago

No, but it's very funny, I'm gonna call people that offshore their thinking to LLM "AI cucks" now

siscia1mo ago

Lately I am finding myself doing more and more of what I called "ambient coding" so that I am not directly using anymore all of those coding harnesses.

https://redbeardlab.gitbook.io/acem/essays/ambient-developme...

I basically wrote a small GitHub app and I simply create a GitHub issue, the bot read it, run an LLM loop and come up with a PR (or a design)

Then I simply approve the pr (or the design)

I find it much calmer and much more productive

eaf7e2811mo ago

I think they changed the quantification to save computer power for their new model. This might be why the benchmark scores look good, but the real world performance is much worse. I'm wondering if they're testing the model internally and didn't find anything wrong with the new parameter.

I canceled my subscription and switched to a codex, but it's not as good. I'm tired of Anthropic changing things all the time. I use Claude because it doesn't redirect you to a different model like OpenAI does. But now it seems like both companies are doing the same thing in different way.

throwaway20271mo ago

Claude is worse, they don't tell you when your experience has degraded and don't even let you use worse models if you run out any.

eaf7e2811mo ago

i mean, openai does same, even worse, they change the model, like gpt 5.4 to -mini

anthropic for now, at least just seems to change quantization of the model

j / k navigate · click thread line to collapse

420 comments

sunaurus1mo ago

andai1mo ago

Well, off the top of my head:

- Banning OpenClaw users (within their rights, of course, but bad optics)

- Banning 3rd party harnesses in general (ditto)

- Lowering reasoning effort (and then showing up here saying "we'll try to make sure the most valuable customers get the non-gimped experience" (paraphrasing slightly xD))

- Massively reduced usage (apparently a bug?) The other day I got 21x more usage spend on the same task for Claude vs Codex.

It's all circumstantial but everything points towards "desperately trying to cut costs".

rlpb1mo ago

> It's all circumstantial but everything points towards "desperately trying to cut costs".

3 more replies

Bolwin1mo ago

Claude -p is allowed. They're not going to give you a feature then ban you for using it.

What they changed is that it now uses extra usage, which is charged at api rates

2 more replies

deaux1mo ago

How often? Realistically, if you invoke it occasionally, for what's clearly an amount that's "reasonable personal use", then no you don't get nuked.

1 more reply

aftbit1mo ago

Huh I've noticed that! Opus hallucinates answers and provides contradictory reasoning to me much more regularly during the past couple of weeks than it did in February.

For example, after a question about a script that used taskset and CUDA_VISIBLE_DEVICES to handle NUMA domains on a parallel GLUE evaluation run:

rendaw1mo ago

They also screwed up the API token detection and also blocked a bunch of 1st party tool users for ~24h.

(By the way this is the 2nd time I've been "please hold" gaslit by support LLMs this exact same way, the other being with Square)

fluidcruft1mo ago

1 more reply

mrgill1mo ago

Very sad considering I got my whole company on Claude Code for them to just ban be like this, with no customer support response.

siva71mo ago

1 more reply

politelemon1mo ago

Why were third party harnesses banned? Surely they'd want sticking power over the ecosystem.

6 more replies

joshstrange1mo ago

100% this, I’ve posted the same sentiment here on HN. I hate the chilling effect of the bans and the lack of clarity on what is and is not allowed.

stingraycharles1mo ago

I don’t think they could have done that much better I’d say.

3 more replies

timtimmy1mo ago

Perhaps Anthropic should put a freeze on new signups until they can increase capacity. This is the best kind of problem for a business, I'm cheering for them.

2 more replies

stefan_1mo ago

smrtinsert1mo ago

I will say I have noticed none of these things in my enterprise account. Is this is a known targeting of non-enterprise clients only?

risyachka1mo ago

>> apparently a bug?

it's a bug only if they get a harsh public response, otherwise it becomes a feature

1 more reply

retinaros1mo ago

i dont know why ppl are surprised. you just need to see what they say on china, open source and fake safety blogs to understand they re not a company that devs should give their code for free to

esperent1mo ago

> claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked

I've used it with a sub a lot. Concurrency of 40 writing descriptions of thousands of images, running for hours on sonnet.

I have a lot of complaints. I've cancelled my $200 subscription and when it runs out in a few days I'll have to find something else.

But claude -p is fine.

... Or it was 2 week ago. Who knows if they've silently throttled it by now?

1 more reply

infecto1mo ago

1) Nobody should expect to use OpenClaw without API usage.

zazibar1mo ago

Claude seems to be getting nerfed every week since we've switched. I wonder how our EVP is feeling now.

derangedHorse1mo ago

It kind of reminds me of the joke where a plumber charges $500 for a 5 minute visit. When the client complains the plumber says it's $50 for labor and $450 for knowing how to fix the problem.

6 more replies

groundzeros20151mo ago

I can’t believe how many small to mid size companies are being destroyed by bad decisions like this.

A friend’s company fired all EMs and have engineers reporting to product managers. They aren’t allowed to do refactors because the CTO believes the AI doesn’t need organized code.

1 more reply

kubb1mo ago

He must be feeling pretty good, after all he still believes that it was the right call, and he definitely won't be admitting a mistake.

There's 0 chance of him facing the consequences for it either.

sgt1mo ago

But cancelling IDE subscriptions? You need a proper IDE to along side AI augmented development unless you want to simply be along for the ride.

3 more replies

dickersnoodle1mo ago

kfajdsl1mo ago

Wow, that sucks. Getting Claude for everyone wasn’t even the stupid thing, it was thinking that a shiny new hammer meant you could throw away all your wrenches.

giancarlostoro1mo ago

Should have started slowly instead of being so aggressive with it.

jimmydoe1mo ago

thefourthchime1mo ago

Wow, that sounds like you have a astoundingly terrible EVP.

matheusmoreira1mo ago

beering1mo ago

2 more replies

babaganoosh891mo ago

There's a github issue for this: https://github.com/anthropics/claude-code/issues/42796

2 more replies

jclardy1mo ago

I now have been using Codex and everything has been great (I still swap back and forth but generally to check things out.)

My theory is just that the models are great after release to get people switching, then they cut them back in capabilities slowly over time until the next major release to increase the hype cycle.

oorza1mo ago

I think it's more likely they're trying to optimize the Claude Code prompts to reduce load on their system and have overcorrected at the cost of quality.

1: https://gist.github.com/roman01la/483d1db15043018096ac3babf5...

FireBeyond1mo ago

Yeah, shorter time frame but I've been noticing that too. Just the other day I was experimenting with some workflow stuff. "Do x and y and run tests and then merge into develop."

Duly runs, and finishes. "All merged into develop".

I do some other work, don't see any of this, double check myself, I'm working off of develop.

"Hey, where is this work?"

"It is in this branch and this worktree, as you would expect, you will need to merge into develop."

"I'm confused, I asked you to do that and you said it was done."

"You're right and I did say that but I didn't do it. Shall I do it now?"

There's like this really weird balancing act between managing usage, but making people burn more tokens...

1 more reply

MattDamonSpace1mo ago

Part hypecycle, part desperate attempts to rein in usage

alphabettsy1mo ago

People keep saying this, but I’m not sure I buy it.

I was using both Codex and Claude Code heavily on some projects this weekend.

In one project Codex was screwing everything up and in another one absolutely killing it. I’ve seen the same from Claude.

I wonder if context poisoning is a bigger problem than people realize.

jakobnissen1mo ago

babaganoosh891mo ago

It's not just you, there is a github issue for it: https://github.com/anthropics/claude-code/issues/42796

kingkongjaffa1mo ago

Just one more anecdote:

I'm on the enterprise team plan so a decent amount of usage.

In March I could use Opus all day and it was getting great results.

6 more replies

PunchyHamster1mo ago

Both can be a thing at same time

iLoveOncall1mo ago

I think there's a much more nefarious reason that you're missing.

That would explain why a lot of users in the comments of those posts are claiming that they don't see any changes to limits.

2 more replies

pxtail1mo ago

There's still plenty of "leave my fellow multbillion corp alone" type ones,it means that corp can and should screw it's loving customer base harder.

simianwords1mo ago

The enshittification meme has been taken too seriously to the point where it is shoehorned into every single place possible.

It is not in the interests for Anthropic to screw its customer base. Running a frontier lab comes with tradeoffs between training, inference and other areas.

2 more replies

estimator72921mo ago

I can't believe how quickly they went from riding high on anti-OpenAI sentiment post-DOD fiasco, to shooting themselves and all their users new and old in the foot.

The ideal time to make your product worse is probably not at the same point that all of your competitor's customers are looking. Anthropic really, really fucked up here.

jeremyjh1mo ago

The hypothesis that makes the most sense is not that they are idiots, but that they have no choice. They cannot meet the new demand. So they’ve quantized the model.

jrockway1mo ago

giancarlostoro1mo ago

1 more reply

stavros1mo ago

ruler881mo ago

trashface1mo ago

The $20 a month plan still seems like a pretty good deal for me (intermittent coding and not doing it for income).

oezi1mo ago

On OpenRouter token consumption is up 5x since November 2025. If this is indicative of the industries growth then I can't fathom how we will not hit resource constraints.

jitl1mo ago

I saw a big hit to Claude’s intelligence w/ the 1M context window model and the change to adaptive reasoning (github issue linked elsewhere in this thread).

I’m pretty much using 90% Codex now, although since Claude is consistently faster at answering quick questions, I still keep it open for that and for code-reviewing codex/human work before commit.

taf21mo ago

data-ottawa1mo ago

I was going to do a deep analysis on this, and then I noticed that Claude Code deleted all of my sessions before March 6.

So yeah... I'm not thrilled with that, because I had done a similar analysis in December and had plenty of logs to review.

The results I do have for the last month aren't great. If you're curious I did post the results on HN:

https://news.ycombinator.com/item?id=47679661

Papazsazsa1mo ago

It looks like the spreadsheet-touchers over at Anthropic won out over the brand leaders, which is too bad as good will can be a trench if you don't abuse your customers.

beering1mo ago

nojs1mo ago

My working theory is that all models are approximately the same, and the variance in quality mostly depends on how long they think for.

So the trick is to always set to max, and then begin every task with “this is an extremely complex task, do not complete it without extensive deep thinking and research” or whatever.

You’re basically fighting a battle to make the model think more, against the defaults getting more and more nerfed to save costs.

beering1mo ago

sneak1mo ago

They broke my openclaw last week; I switched to “extra usage” and prepaid a grand for same.

A few days later it simply stopped working again, API authentication error. What must I do to have working, paid, premium service?

Screwing around with it today, it works 5x slower and times out all of the time. I'm paying more and getting waaaaay less. Why can't companies just raise prices like normal?

pstuart1mo ago

LunaSea1mo ago

> people feel like they have no idea if they are getting the product that they originally paid for

They do indeed get the product they originally paid for.

It's simply that they were suckers and didn't read the "fine" print of the product they bought.

The label says "more tokens than the lower tier".

indigodaddy1mo ago

Is it perhaps not a model problem but a Claude Code harness problem?

For instance on exe.dev VMs with Shelley agent/harness and Opus 4.5/4.6, I haven't noticed any deterioration.

Any similar feedback perhaps from Opencode / GH Copilot subscription-provided Opus models?

drzaiusx111mo ago

swasheck1mo ago

lumost1mo ago

Codex is my favored coding agent for generic "I need an agent tasks." GPT-5.4 does a bit better with images compared to claude, and debugs a little bit better.

The UX of codex is exceptionally nice however.

Aeolun1mo ago

I dunno, I haven’t really felt gimped in the past few months. My last issue was somewhere after the holidays when the usage suddenly felt like it cratered, but quality has been consistent.

throwpoaster1mo ago

Generally, across AI providers, I have come to interpret sudden degradation in existing capabilities as a signal that a new, more expensive, product tier is about to launch.

Grimblewald1mo ago

I'd say weaker, tasks claude code was aceing before it now fails with the exact same prompts, taking several rounds before it works. I'm looking to jump ship.

AznHisoka1mo ago

Its not just engineers, and its not just about the 3rd party/rate limiting stuff. I feel like the reasoning capabilities have deteriorated too for non-coding tasks.

OtomotO1mo ago

I measured it for my specific usecases and have cancelled my Anthropic subscription (the Max x20 Plan)

alpha_squared1mo ago

faangguyindia1mo ago

This is actually great feature, you can do bait and switch with AI.

wouldbecouldbe1mo ago

Developers are a tough crowd, stubborn, know it alls.

raincole1mo ago

That's a seasonal phenomenon. You can save this comment and look back three to six months later. By the time people will be like "is it just me or ChatGPT has been so bad lately?"

If you don't believe me you can search HN posts about Codex/Claude six months ago.

felixgallo1mo ago

https://isitnerfed.org/

motbus31mo ago

I think so, but more than that, the performance of those tools seems to be terribly degrading when they keep saying they have created some crap like AGI which we know is a lie.

And to me, this lie is mostly a fight to see who bites the biggest chunk of the war death machine.

blueboo1mo ago

Wait till Codex doubles prices/halves quotas on May 31

foofloobar1mo ago

Claude Code was able to implement something in one shot. It was decent for a proof of concept initial implementation. It's barely able to do work now with full specs and detailed plans.

ChatGPT is also being watered down.

It seems obvious that Anthropic and OpenAI aren't the solution to any problem.

throw_m2393391mo ago

vbezhenar1mo ago

trollbridge1mo ago

Quite interesting considering all the claims that Cursor was dead a few months ago.

foofloobar1mo ago

I wouldn't trust another company either. Some people have reported some issues with Cursor. The solution is probably not a cloud API with unknown quotas or pay as you go pricing.

1 more reply

ecocentrik1mo ago

They are clearly straining under new demand and everyone is being served highly quantized models without notice.

cassianoleal1mo ago

The title should be changed. It makes it look like they upped the TTL from 1 h to 5 months.

The SI symbol for minutes is "min", not "M".

A compromise would be to use the OP notation "m".

gib4441mo ago

I love the title change that totally hides the scale of the issue. Good job poster/mods.

PontifexMinimus1mo ago

I agree. My first reaction was "what the fuck's an 'M'?"

isoprophlex1mo ago

Five million. No matter the unit, just, 5.000.000

albert_e1mo ago

So a side effect of this is -- even at 1 hour caching -- ...

1 more reply

disillusioned1mo ago

It's also routinely failing the car wash question across all models now, which wasn't the case a month ago. :-/

tetraodonpuffer1mo ago

andai1mo ago

> over-emphasizing how "difficult" a problem is to solve and choosing to avoid it because of the "time" it would take

I heard a while back Claude refused to attempt a task for days, saying it would take weeks of work. Eventually the user convinced it to try, and it one-shotted it in 30 seconds.

apetresc1mo ago

For days? Someone spent days trying to convince Claude to do something?

1 more reply

empath751mo ago

I have noticed refusals as context windows grow.

_blk1mo ago

Awesome, I didn't know about the car wash question.

themafia1mo ago

Step 1: Sell at a loss.

Step 2: Panic.

Step 3: Destroy product.

1 more reply

colechristensen1mo ago

>“idgaf about risk you coward, waste some time just do it and stop bitching”

The above was a successful prompt to get Claude to stop whining about effort, difficulty, and time.

Unfortunately abusive language well placed is an effective LLM motivator.

itemize1231mo ago

are you sure other forms of language to express urgency doesn't work as well or better?

1 more reply

theshrike791mo ago

Am I the only one who couldn't care less if a model can answer a weird gotcha riddle or not?

I never use it to answer questions like that, what I care about is consistent tool callig and following the prompt.

benced1mo ago

Anthropic responded: https://github.com/anthropics/claude-code/issues/46829#issue...

supermdguy1mo ago

Bizarre reading the thread, it feels like their Claude responding to the other posters’ Claudes

phreack1mo ago

1 more reply

pllbnk1mo ago

TheTaytay1mo ago

jwitthuhn1mo ago

glenngillen1mo ago

And an admittedly uncharitable TLDR on the response is: "yeah... but most users just ask one thing and barely use the product so they never need the cache. Also trust me bro".

They may as well have just said "we've looked at the data and we're happy with this change because it's a performance improvement for people we make the most margin on. Sucks to be you".

dnw1mo ago

Interesting that they actually acknowledge there was a change on March 6th. Kudos to the prompt analysis work that uncovered it!

davidkuennen1mo ago

On slightly off topic note: Codex is absolutely fantastic right now. I'm constantly in awe since switching from Claude a week ago.

yukIttEft1mo ago

I point it to example snippets and webdocumentation but the code it gens won't work at all, not even close

Opus4.6 is a tiny bit less wrong than Codex 5.4 xhigh, but still pretty useless.

So, after reading all the success stories here and everywhere, I'm wondering if I'm holding it wrong or if it just can't solve everything yet.

59nadir1mo ago

2 more replies

neomantra1mo ago

While I’ve had tremendous success with Golang projects and Typescript Web Apps, when I tried to use Metal Mesh Shaders in January, both Codex and Claude both had issues getting it right.

I also needed to make more mini-harnesses / scaffolds for it to work through; in other words isolating its focus, kind of like test-driven development.

seba_dos11mo ago

It works somewhat well with trivial things. That's where most of these success stories are coming from.

1 more reply

layer81mo ago

My impression is that it always comes down to how well what you’re trying to do pattern-matches the training set.

1 more reply

wahnfrieden1mo ago

Instead of "pointing it" at docs, you need to paste the docs into context. Otherwise it will skim small parts by searching. Of course if you're using an obscure tool you need to supply more context.

Xhigh can also perform worse than High - more frequent compaction, and "overthinking".

shdh1mo ago

I’ve noticed the models still can’t complete complex tasks

Such as:

Adding fine curl noise to a volumetric smoke shader

Fixing an issue with entity interpolation in an entity/snapshot netcode

Find some rendering bugs related to lightmaps not loading in particular cases, and it actually introduced this bug.

Just basic stuff.

1 more reply

nothinkjustai1mo ago

Nah, it only lives up to the hype for crud apps and web ui. As soon as you stop doing webshit it becomes way less useful.

(Don’t get mad at me, I’m a webshit developer)

wg01mo ago

Most of the folks are building CRUD apps with AI and that works fine.

What you're doing is more specialized and these models are useless there. It's not intelligence.

Another NFT/Crypto era is upon us so no you're not holding it wrong.

MattRix1mo ago

This is pretty wrong. Anyone who thinks this stuff is similar to NFTs and crypto hasn’t been paying attention.

1 more reply

lukan1mo ago

" or if it just can't solve everything yet."

Obviously it cannot. But if you give the AI enough hints, clear spec, clear documentation and remove all distracting information, it can solve most problems.

1 more reply

glerk1mo ago

toenail1mo ago

vidarh1mo ago

Codex has been good quality wise, but I hit limits on the Codex team subscription so quickly it's almost more hassle that it is worth.

lifty1mo ago

CamperBob21mo ago

Agreed. Watching the intermediate "Thinking about X ... Now I'll do Y" text on GPT 5.4 lately has been like watching a hypothetical smart drug wear off.

All of the major models have been getting worse lately, not just Opus.

1 more reply

onion2k1mo ago

I use Codex at home and Opus at work. They're both brilliant.

lores1mo ago

groundzeros20151mo ago

Enemy centered decision making

bob10291mo ago

1 more reply

nh21mo ago

Cannot you use Codex (which is open source, unlike Claude Code) with Claude, even via Amazon Bedrock?

1 more reply

layer81mo ago

[0] https://www.dwarkesh.com/i/187852154/004620-if-agi-is-immine...

shdh1mo ago

Likely accurate

This tends to happen during pretraining phase of new models

Happened with 3.x too

jjfoooo41mo ago

Which I'm confused about - wouldn't decreasing the cache TTL increase compute demand?

hirako20001mo ago

There is a chef, he opens a restaurant. Delicious food.

It costs him more in ingredients alone than he charges. He even offers some pseudo unlimited buffet, combo sets, and happy hours.

In the meantime he cracks down on free buffet goers who happen to eat too much, and downgrades all ingredients without notice to finally hope to make a profit.

MattRix1mo ago

This is close, but the real problem isn’t that the food is underpriced, it’s that the supply of ingredients is severely limited.

stri8ted1mo ago

Those are the same thing

greycol1mo ago

1 more reply

JackYoustra1mo ago

Is this not the same thing?

1 more reply

embedding-shape1mo ago

Pretty much capitalism in a nut shell, yeah.

Tarcroi1mo ago

This coincides with Anthropic's peak-hour announcement (March 26th). Could the throttling be partly a response to infrastructure load that was itself inflated by the TTL regression?

HauntingPin1mo ago

It would be too fucking funny if this were the case. They're vibe coding their infrastructure and they vibe coded their response to the increased load.

KronisLV1mo ago

You'd think they would have dashboards for all of this stuff, to easily notice any change in metrics and be able to track down which release was responsible for it.

1 more reply

perks_121mo ago

jwr1mo ago

This. I get much more value than 90€ from my Claude Code subscription. I am willing to pay more for consistency and not having to watch my back all the time, because I might get screwed over.

bsaul1mo ago

could it be that anthropic is experiencing a massive shortage of compute capacity, and is desperately trying to find means to overcome it ?

All the news i hear about this company for the past weeks made it sound like they're really desperate.

hattimaTim1mo ago

Classic scammer tactics: first, lure users in by promising a huge deal, then scam the hell out of them.

throwaway20271mo ago

beering1mo ago

Are you locked into your current subscription?

zoogeny1mo ago

I imagine it is much more useful if you have a tool that has a common prefix (like a system instruction, tool specs or common set of context across many users).

If you can get it to work frequently enough the savings are quite worth it.

onoesworkacct1mo ago

give it a skill that runs a timer in the background and every 4.5 minutes says "ping? pong!"

zoogeny1mo ago

It would be interesting to graph the cost/savings of this approach based on context length, percent cached, etc.

The UI for this is a bit tricky, I could mark conversations as "active" and then do the ping/pong dance on only active conversations and up to some determined max cached (e.g. 1 hour).

poly2it1mo ago

One of the largest AI companies on Earth cannot figure out an algorithm for when not to drop caches in long-running sessions?

foobar100001mo ago

par1mo ago

willworktill4pm1mo ago

This Friday CC wrote wall off gibberish text for me. No reason, happened twice with different gibberish text

https://ibb.co/4wcVQG5k

beering1mo ago

maybe numerics issues after quantization? Looks like it really went off the rails

zeckalpha1mo ago

Edit: I may have conflated these two threads. https://news.ycombinator.com/item?id=47739260

throwaway20271mo ago

It's absolutely ridiculous how stupid Claude is now. I sometimes notice it and last year too but it feels like it's just last year before December model.

config_yml1mo ago

Feels similar to Claude last August/September. Knowing Claude some Agent probably reverted the fix from back then ^^

https://www.anthropic.com/engineering/a-postmortem-of-three-...

the_mitsuhiko1mo ago

azuanrb1mo ago

As a Pro user, even though these issues and bugs are “new,” the downgrade has been noticeable since January. I’ve unsubscribed because the Pro plan is no longer usable for me.

It’s only making the news now because it’s affecting Max users as well ($100/$200 plans). I understand the need for change, but having zero communication about it is just wrong.

almog1mo ago

motbus31mo ago

The TOS basically states you need to deal with whatever they want.

Meanwhile their 'best' competitor just announced they want to provide unreliable mass destruction guidance tools but they don't wanna feel said.

Honestly speaking, we are wrong whenever we do business with this sort of people

bigyabai1mo ago

> The TOS basically states you need to deal with whatever they want.

FWIW that's what most TOSes say for the majority of online services. Some even include arbitration clauses to prevent civil suits and class-action cases.

motbus31mo ago

Maybe that's standard practice in the US. I live in Europe but have family elsewhere, in both places, such clauses are often disregarded by judges and illegal.

What judges say is that whatever is problematic should be dealt by customer support.

For example, provider X is faulty and causes damages to you or a third party. You contact the company and the company must have a procedure to give a formal answer when required.

If that's is breach of the contact, although not required by law, the company can offer to fix the problem or at least an explanation and why is that in the contract.

If you still feel that's a breach of the contract and the company is not willing to cooperate, then you can file it.

In other places, there are laws that cannot be undermined by forceful terms of service or contracts. For example, you have the right for law anywhere.

I more or less understand the whys of why US is like that, but it feels that the law is bendable.

simianwords1mo ago

There’s a case for intelligent caching: coarse grained 1h and 5min type TTls are not optimal.

PunchyHamster1mo ago

Caching LLM is not like caching normal content; the longer it is the more beneficial it is and it only stops being worth when user stops current session.

cyanydeez1mo ago

keep in mind, efficient KV caching needs to be next to the GPU, so you sls need you HA to keep routing the user to the same hardware.

the hardware VM model is almost identical. Each session can go anywhere to start but a live session cant just be routed anywhere without penalty.

PunchyHamster1mo ago

Well, how entirely expected. The money man comes to collect and they are squeezing for money

pkaye1mo ago

superxpro121mo ago

If anyone thinks this situation doesnt end in a massive global rugpull, y'all are asleep at the wheel.

The very instant the AI suppliers lock in a dependency on their product, prices are going through the roof.

jasonjmcghee1mo ago

All the weird stuff happening with anthropic / Claude aside- just talking about this post:

Looking at the table with February and April- I don't get it. What am I missing?

The cost and number of calls look pretty aligned on all rows

ikekkdcjkfke1mo ago

andai1mo ago

It looks like selling reputation to save money.

But more likely they are constrained on GPUs and can't get them fast enough.

(My guess having no understanding of how this industry actually works.)

espeed1mo ago

Does Anthropic's real time data ingestion effect its model behavior globally? Could a file read by your agent effect the behavior of mine?

c161mo ago

I’ve definitely noticed in evenings it stops trying as hard to solve the issue and suggests I go find the answer. Never the case in the morning.

srsbzns1mo ago

Gotta use the API directly for cache control

sscaryterry1mo ago

Anthropic is leaving so much evidence around… proving damages and a pattern is becoming trivial

snowstormsun1mo ago

Well, the 10x promised revenue increase must come from somewhere...

lordmoma1mo ago

taffydavid1mo ago

They can't really revolutionize AI again so they make the product worse and worse and then offer you a "better" one

ares6231mo ago

AGI finding bugs again. Actual Guys/Gals Instead.

yobid201mo ago

i thought it was always 5 minutes? ive been telling people 5 minutes for months so i dont think this is anything new?

mrdw1mo ago

I noticed another limitation: "An image in the conversation exceeds the dimension limit for many-image requests (2000px). Start a new session with fewer images."

So I can't continue my claude code session I started yesterday.

sunnybeetroot1mo ago

Double tap ESC and revert the conversation.

beering1mo ago

makes sense, “a picture is worth a thousand tokens” as they say. They probably lowered the limit due to capacity issues.

computerex1mo ago

Good job anthropic. You had a clear lead with all devs singing the praises of Opus. Way to lose all that by Enshittifying the experience.

echelon1mo ago

Anthropic isn't your friend.

Phase 1: $200/mo prosumer engineer tool

Phase 2: AI layoffs / "it's just AI washing"

Phase 3: $20,000/mo limited release model "too dangerous" to use

Phase 4: Accelerated layoffs / two person teams. Rehiring of certain personnel at lower costs.

Phase 6: $100k/mo model that replicates entire engineering teams, only large companies can afford it. Ordinary users can't buy. More layoffs.

Anothropic used to be cool before they started gating access. Limiting Claw/OpenCode was strike one. Mythos is strike two.

Y'all should have started hating on their ethics when they started complaining about being distilled. For training they conducted on materials they did not own.

We need open weights companies now more than ever. Too bad China seems to be giving up on the idea.

"You wouldn't distill an Opus."

PunchyHamster1mo ago

Stop thinking billion dollar publicly traded companies are "cool" just because they make widget you like.

You will be backstabbed

You will be squeezed for all they can.

And you will be betrayed.

dns_snek1mo ago

> Stop thinking billion dollar publicly traded companies are "cool" just because they make widget you like.

Anthropic is a private company but nevertheless, the sentiment is accurate and applies to all kinds of corporations.

1 more reply

andai1mo ago

What I want to know is how did they make the only LLM that doesn't sound cringe?

I think it has something to do with mode collapse (although Claude certainly has its own "tells"), but I'm not sure.

---

Some other interesting points about Anthropic's models. I don't know if any of these relate to my LLM style question, but seems worth mentioning:

Claude models also use way less tokens for the same task (on ArtificialAnalysis, they are a clear outlier on this metric).

And there's a much stronger common sense, subjectively. (Not sure if we have a good way to actually measure that, though.) It takes context and common sense into account, to a much greater degree.

(Which ties in with their constitution. Understanding why things are wrong at a deeper level, rather than just surface level pattern matching.)

Opus is great but it should be bigger. You notice the difference between Sonnet and Opus, but with heavy use you notice Opus's limitations, too.

hirako20001mo ago

Good read on the situation.

It all boils down to a brilliant but extremely expensive technology. Both to build and to run.

We've been sold a product with heavy subsidy. The idea (from Sam) scale out and see what happens.

Written by open source developers, likely former openai and anthropic employees who got so much cash in the bank they don't need to worry about renting their knowledge.

jhancock1mo ago

What leads you to say China AI is giving up on open weights?

I've been using GLM for over 6 months and pretty happy.

PunchyHamster1mo ago

Why would any company release open weights once the investment money stops ?

Releasing open weights have been basically a PR move, the moment those companies need to actually make money they will cut it out as that reduces their client base.

They DO NOT want you to run AI. They want you to pay them to do it

3 more replies

Zetaphor1mo ago

The AI landscape in China is larger than just Qwen and Alibaba.

dns_snek1mo ago

The first one is just incredibly naive, the second might be true for some people, for some tasks, but it's not going to capture the majority who're chasing the latest and greatest to "keep up".

3 more replies

marcus_cemes1mo ago

> We need open weights companies now more than ever.

Personally, I don't think this LLM-based AI generation will have any significant positive impacts. Time, energy (CO2) and money would have been far better spent elsewhere.

Zetaphor1mo ago

magic_hamster1mo ago

> End of the PC era, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.

However, keep in mind the masses don't care about tinkering and never have. People want a ChatGPT experience, not a pytorch experience. In essence this is true for all tech products, not just AI.

slashdave1mo ago

When did Hacker News become a fountain of dystopian science fiction?

Throaway1999991mo ago

from its inception lol

WhereIsTheTruth1mo ago

Changing "regression" to "Anthropic silently downgraded" sensationalizes the story

Why the FUD?

I notice some interesting public opinion weather change since Anthropic passed OpenAI wrt revenue

subscribed1mo ago

From the response in the linked issue:

>> Was there a change? Yes — March 6, intentional, part of ongoing cache optimization. You pinpointed the date correctly.

The entire issue lays out how and why it's a silent downgrade. Also silent because it just happened, without announcing.

I don't understand how is this FUD?

1 more reply

taf21mo ago

coffinbirth1mo ago

Am I the only one who sees striking parallels between being a Claude Code customer and Cuckoldry (as in biology)?

the_gipsy1mo ago

What you're looking for is "vendor lock-in".

PunchyHamster1mo ago

No, but it's very funny, I'm gonna call people that offshore their thinking to LLM "AI cucks" now

siscia1mo ago

Lately I am finding myself doing more and more of what I called "ambient coding" so that I am not directly using anymore all of those coding harnesses.

https://redbeardlab.gitbook.io/acem/essays/ambient-developme...

I basically wrote a small GitHub app and I simply create a GitHub issue, the bot read it, run an LLM loop and come up with a PR (or a design)

Then I simply approve the pr (or the design)

I find it much calmer and much more productive

eaf7e2811mo ago

throwaway20271mo ago

Claude is worse, they don't tell you when your experience has degraded and don't even let you use worse models if you run out any.

eaf7e2811mo ago

i mean, openai does same, even worse, they change the model, like gpt 5.4 to -mini

anthropic for now, at least just seems to change quantization of the model

j / k navigate · click thread line to collapse