Claude Opus 4.6 (opens in new tab)

(anthropic.com)

2346 pointsHellsMaddy3mo ago1031 comments

1031 comments

data-ottawa3mo ago

I wonder if I’ve been in A/B test with this.

Claude figured out zig’s ArrayList and io changes a couple weeks ago.

It felt like it got better then very dumb again the last few days.

1 more reply

mattacular3mo ago

It's hard to tell with these releases if Anthropic's astroturfing campaign has come to HN or not but I feel like it probably has

g-mork3mo ago

the top 5 comments on this thread are from accounts that are around 10 years old each. What gives you any reason to believe this is an astroturfing campaign?

2 more replies

hatkid953mo ago

It would be height of foolishness to believe it didn’t

timcobb3mo ago

Anthropic's models are really good!

ramesh313mo ago

Am I alone in finding no use for Opus? Token costs are like 10x yet I see no difference at all vs. Sonnet with Claude Code.

mnicky3mo ago

On my tasks (mostly data science), Opus has significantly lower probability of making stupid mistakes than Sonnet.

I'd still appreciate more intelligence than Opus 4.5 so I'm looking forward to trying 4.6.

jpcompartir3mo ago

4.6 is a beast.

Everything in plan mode first + AskUserQuestionTool, review all plans, get it to write its own CLAUDE.md for coding standards and edit where necessary and away you go.

Seems noticeably better than 4.5 at keeping the codebase slim. Obviously it still needs to be kept an eye on, but it's a step up from 4.5.

nwienert3mo ago

Not clearly a step up for me, it's way more hesitant it seems and I don't notice context being larger at all it seems to compact just as often.

sega_sai3mo ago

Based on these news it seems that Google is losing this game. I like Gemini and their CLI has been getting better, but not enough to catch up. I don't know if it is lack of dedicated models that is problem (my understanding Google's CLI just relies on regular Gemini) or something else.

laxk3mo ago

Google knows how to wait. Let's give them a chance.

jonatron3mo ago

Can someone ask: "what is the current carrying capacity of 25mm multicore armoured thermoplastic insulated cables with aluminium conductors, on perforated cable tray?" just to see how well it can look up information in BS 7671?

jaggederest3mo ago

> what is the current carrying capacity of 25mm multicore armoured thermoplastic insulated cables with aluminium conductors, on perforated cable tray?

  This is an electrical engineering question rather than a software engineering one, but let me look it up for you.

  WEB SEARCH ("current carrying capacity 25mm multicore armoured thermoplastic insulated cable aluminium conductors perforated cable tray BS 7671")
   ↳ Found 10 results.

   Based on BS 7671 Table 4D4B (multicore armoured thermoplastic insulated cables with aluminium conductors), the current-carrying capacity for a 25mm cable on a perforated cable tray (Reference Method E) is approximately 80A for a 3- or 4-core cable at 30C ambient temperature and 70C conductor operating temperature.

    Note: The exact value depends on the number of cores (2-core vs 3/4-core) and whether any rating/correction factors (grouping, ambient temperature, thermal insulation) apply. I'd recommend confirming against the actual BS 7671:2018 Table 4D4B for your specific configuration, as the aluminium conductor ratings are lower than the copper equivalents (~102A for copper in the same conditions).

    This is outside my core domain of software engineering -- for critical electrical installations, always verify with the relevant standards documentation or a qualified electrical engineer.

jonatron3mo ago

That's useless, unexpectedly just Google gives the correct answer.

esafak3mo ago

Great. So it did not waste its memory on trivia. Don't memorize what you can look up.

osti3mo ago

Somehow regresses on SWE bench?

lkbm3mo ago

I don't know how these benchmarks work (do you do a hundred runs? A thousand runs?), but 0.1% seems like noise.

SubiculumCode3mo ago

That benchmark is pretty saturated, tbh. A "regression" of such small magnitude could mean many different things or nothing at all.

usaar3333mo ago

i'd interpret that as rounding error. that is unchanged

swe-bench seems really hard once you are above 80%

Squarex3mo ago

it's not a great benchmark anymore... starting with it being python / django primarily... the industry should move to something more representative

usaar3333mo ago

Openai has; they don't even mention score on gpt-5.3-codex.

On the other hand, it is their own verified benchmark, which is telling.

m-hodges3mo ago

> In Claude Code, you can now assemble agent teams to work on tasks together.

nprz3mo ago

I was just reading about Steve Yegge's Gas Town[0], it sounds like agent orchestration is now integrated into Claude Code?

[0]https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...

jorl173mo ago

This is the first model to which I send my collection of nearly 900 poems and an extremely simple prompt (in Portuguese), and it manages to produce an impeccable analysis of the poems, as a (barely) cohesive whole, which span 15 years.

It does not make a single mistake, it identifies neologisms, hidden meaning, 7 distinct poetic phases, recurring themes, fragments/heteronyms, related authors. It has left me completely speechless.

Speechless. I am speechless.

Perhaps Opus 4.5 could do it too — I don't know because I needed the 1M context window for this.

I cannot put into words how shocked I am at this. I use LLMs daily, I code with agents, I am extremely bullish on AI and, still, I am shocked.

I have used my poetry and an analysis of it as a personal metric for how good models are. Gemini 2.5 pro was the first time a model could keep track of the breadth of the work without getting lost, but Opus 4.6 straight up does not get anything wrong and goes beyond that to identify things (key poems, key motifs, and many other things) that I would always have to kind of trick the models into producing. I would always feel like I was leading the models on. But this — this — this is unbelievable. Unbelievable. Insane.

This "key poem" thing is particularly surreal to me. Out of 900 poems, while analyzing the collection, it picked 12 "key poems, and I do agree that 11 of those would be on my 30-or-so "key poem list". What's amazing is that whenever I explicitly asked any model, to this date, to do it, they would get maybe 2 or 3, but mostly fail completely.

What is this sorcery?

emp173443mo ago

This sounds wayyyy over the top for a mode that released 10 mins ago. At least wait an hour or so before spewing breathless hype.

pb73mo ago

He just explained a specific personal example why he is hyped up, did you read a word of it?

emp173443mo ago

Yeah, I read it.

“Speechless, shocked, unbelievable, insane, speechless”, etc.

Not a lot of real substance there.

1 more reply

euph0ria3mo ago

Could you please post the key poems? Would love to read them.

jorl173mo ago

I am way too self-conscious to do that :) Plus they are almost all in Portuguese!

wartywhoa233mo ago

> What is this sorcery?

The one you'll be seeking counter-spells against pretty soon.

scrollop3mo ago

Can you compare the result to using 5.2 thinking and gemini 3 pro?

jorl173mo ago

I can run the comparison again, and also include OpenAI's new release (if the context is long enough), but, last time I did it, they weren't even in the same league.

When I last did it, 5.X thinking (can't remember which it was) had this terrible habit of code-switching between english and portuguese that made it sound like a robot (an agent to do things, rather than a human writing an essay), and it just didn't really "reason" effectively over the poems.

I can't explain it in any other way other than: "5.X thinking interprets this body of work in a way that is plausible, but I know, as the author, to be wrong; and I expect most people would also eventually find it to be wrong, as if it is being only very superficially looked at, or looked at by a high-schooler".

Gemini 3, at the time, was the worst of them, with some hallucinations, date mix ups (mixing poems from 2023 with poems from 2019), and overall just feeling quite lost and making very outlandish interpretations of the work. To be honest it sort of feels like Gemini hasn't been able to progress on this task since 2.5 pro (it has definitely improved on other things — I've recently switched to Gemini 3 on a product that was using 2.5 before)

Last time I did this test, Sonnet 4.5 was better than 5.X Thinking and Gemini 3 pro, but not exceedingly so. It's all so subjective, but the best I can say is it "felt like the analysis of the work I could agree with the most". I felt more seen and understood, if that makes sense (it is poetry, after all). Plus when I got each LLM to try to tell me everything it "knew" about me from the poems, Sonnet 4.5 got the most things right (though they were all very close).

Will bring back results soon.

Edit:

I (re-)tested:

- Gemini 3 (Pro)

- Gemini 3 (Flash)

- GPT 5.2

- Sonnet 4.5

Having seen Opus 4.5, they all seem very similar, and I can't really distinguish them in terms of depth and accuracy of analysis. They obviously have differences, especially stylistic ones, but, when compared with Opus 4.5 they're all on the same ballpark.

These models produce rather superficial analyses (when compared with Opus 4.5), missing out on several key things that Opus 4.5 got, such as specific and recurring neologisms and expressions, accurate connections to authors that serve as inspiration (Claude 4.5 gets them right, the other models get _close_, but not quite), and the meaning of some specific symbols in my poetry (Opus 4.5 identifies the symbols and the meaning; the other models identify most of the symbols, but fail to grasp the meaning sometimes).

Most of what these models say is true, but it really feels incomplete. Like half-truths or only a surface-level inquiry into truth.

As another example, Opus 4.5 identifies 7 distinct poetic phases, whereas Gemini 3 (Pro) identifies 4 which are technically correct, but miss out on key form and content transitions. When I look back, I personally agree with the 7 (maybe 6), but definitely not 4.

These models also clearly get some facts mixed up which Opus 4.5 did not (such as inferred timelines for some personal events). After having posted my comment to HN, I've been engaging with Opus4.5 and have managed to get it to also slip up on some dates, but not nearly as much as other models.

The other models also seem to produce shorter analyses, with a tendency to hyperfocus on some specific aspects of my poetry, missing a bunch of them.

To be fair, all of these models produce very good analyses which would take someone a lot of patience and probably weeks or months of work (which of course will never happen, it's a thought experiment).

It is entirely possible that the extremely simple prompt I used is just better with Claude Opus 4.5/4.6. But I will note that I have used very long and detailed prompts in the past with the other models and they've never really given me this level of....fidelity...about how I view my own work.

vinhnx3mo ago

Just used Opus 4.6 via GitHub Copilot. It feels very different. Inference seems slow for now. I guess Opus 4.6 has adaptive thinking activated by default.

vinhnx3mo ago

Confirm by PM lead at VS Code team

> "We have high thinking as default + adaptive thinking, first time we’ve run with these settings..."

> https://x.com/pierceboggan/status/2019645801769689486

christophilus3mo ago

It dos seem noticeably slower. I may stick with 4.5 which was good enough for me for most tasks.

vinhnx3mo ago

VS Code confirms that they are experimenting with the new adaptive thinking and high reasoning effort params. https://x.com/pierceboggan/status/2019645801769689486

paxys3mo ago

Hmm all leaks had said this would be Claude 5. Wonder if it was a last minute demotion due to performance. Would explain the few days' delay as well.

trash_cat3mo ago

I think the naming schemes are quite arbitrary at this point. Going to 5 would come with massive expectations that wouldn't meet reality.

mrandish3mo ago

After the negative reactions to GPT 5, we may see model versioning that asymptotically approaches the next whole number without ever reaching it. "New for 2030: Claude 4.9.2!"

esafak3mo ago

Or approaching a magic number like e (Metafont) or π (TeX).

Squarex3mo ago

the standard used to be that major version means a new base model / full retrain... but now it is arbitrary i guess

cornedor3mo ago

Leaks were mentioning Sonnet 5 and I guess later (a combination of) Opus 4.6

scrollop3mo ago

Sonnet 5 was mentioned initially.

ra3mo ago

Why are Anthropic such a horrible company to deal with?

danielbln3mo ago

Care to elaborate?

ra3mo ago

obscure billing, unreachable customer support gatekeeped by an overzealous chatbot, no transparency about inclusions, or changes to inclusions over time... just from recent experience.

niobe3mo ago

Is there a good technical breakdown of all these benchmarks that get used to market the latest greatest LLMs somewhere? Preferably impartial.

Aztar3mo ago

I just ask claude and ask for sources for each one.

niobe3mo ago

Reminds me of how if you make a complaint against a lawyer or a judge it's evaluated by lawyers and judges.

small_model3mo ago

I have the max subscription wondering if this gives access to the new 1M context, or is it just the API that gets it?

joshstrange3mo ago

For now it's just API, but hopefully that's just their way of easing in and they open it up later.

small_model3mo ago

Ok thanks, hopefully, its annoying to lose or have context compacted in the middle of a large coding session

watson3mo ago

I've heard rumors this might be Sonnet 5 rebranded as Opus 4.6. But why? Profit? WDYT?

spruce_tips3mo ago

Opus is a superior brand line to Sonnet because historically it’s been a more powerful model. I think the thinking behind a rebrand is that people wouldn’t have as willingly switched their usage over from opus 4.5 since that model has been so popular since December 2025.

Calling it part of the Sonnet line would not provide the same level of blind buy in as calling it part of the Opus line does

1970-01-013mo ago

Here's one I've been using for awhile. The 'smarter' LLMs will overconfidently spit out 7. The dumber ones ask for more info. Opus 4.6 fails.

     A round drink coaster with a diameter of 9 sits between a beer glass and a wood table. The glass has a wall thickness of 1. What is the inner diameter of the glass?

raincole3mo ago

Gemini 3 Pro gives me this:

> Based on the information provided, it is impossible to determine the inner diameter of the glass.

Here is why:

The Coaster Dimension: Knowing the coaster has a diameter of 9 only tells us the size of the object under the glass. It does not mean the glass is the same size; the glass could be smaller (fitting entirely on the coaster) or larger (overhanging the coaster). The Glass Dimensions: While you provided the wall thickness (1), you did not provide the outer diameter of the glass. To find the inner diameter, the formula is: Inner Diameter = Outer Diameter - (2 × Wall Thickness)

Without knowing the Outer Diameter of the glass, the calculation cannot be completed.

raincole3mo ago

GPT 5.2 Chat:

> There isn’t enough information to determine the inner diameter *for sure*.

All we’re told is: - The coaster has a diameter of 9. - The glass sits on the coaster. - The glass wall thickness is 1.

Nothing explicitly states that the *outer diameter of the glass equals the coaster’s diameter*. The glass could be smaller than the coaster, which is very common.

*However*, if we make the usual implicit assumption in this kind of problem—that the coaster’s diameter matches the *outer diameter of the glass base*—then:

- Outer diameter of glass = 9 - Wall thickness = 1 on each side

So: \[ \text{Inner diameter} = 9 - 2(1) = 7 \]

*Inner diameter = 7 (same units as the coaster)*

Without that assumption, the problem is underdetermined.

tototrains3mo ago

Opus 4.6 Extended thinking:

```

Finding the Inner Diameter The coaster sits beneath the glass on the table, meaning the glass rests on top of it. Assuming the coaster matches the outer diameter of the glass base:

Outer diameter of glass = diameter of coaster = 9 Wall thickness = 1 (on each side, so subtract twice)

dinner=9−2(1)=7d_{\text{inner}} = 9 - 2(1) = 7dinner =9−2(1)=7 The inner diameter of the glass is 7.

```

Makes its assumption clear, seems reasonable?

1 more reply

mikalauskas3mo ago

Minimax M2.1:

The inner diameter of the glass is *7*.

Here's the reasoning: - The coaster (diameter 9) sits between the glass and table, meaning the glass sits directly on the coaster - This means the *outer diameter of the glass equals the coaster diameter = 9* - The glass has a wall thickness of 1 on each side - *Inner diameter = Outer diameter - 2 × wall thickness* - Inner diameter = 9 - 2(1) = 9 - 2 = *7*

blibble3mo ago

> We build Claude with Claude. Our engineers write code with Claude Code every day

well that explains quite a bit

jsheard3mo ago

CC has >6000 open issues, despite their bot auto-culling them after 60 days of inactivity. It was ~5800 when I looked just a few days ago so they seem to be accelerating towards some kind of bug singularity.

dkersten3mo ago

Just anecdotally, each release seems to be buggier than the last.

To me, their claim that they are vibe coding Claude code isn’t the flex they think it is.

I find it harder and harder to trust anthropic for business related use and not just hobby tinkering. Between buggy releases, opaque and often seemingly glitches rate limits and usage limits, and the model quality inconsistency, it’s just not something I’d want to bet a business on.

csomar3mo ago

Since version 2.1.9, performance has degraded significantly after extended use. After 30-40 prompts with substantial responses, memory usage climbs above 25GB, making the tool nearly unusable. I'm updating again to see if it improves.

Unlike what another commenter suggested, this is a complex tool. I'm curious whether the codebase might eventually reach a point where it becomes unfixable; even with human assistance. That would be an interesting development. We'll see.

zahlman3mo ago

I think I would be much more frightened if it were working well.

1 more reply

marcd353mo ago

Doesn’t this just exacerbate the “black box” conundrum if they just keep piling on more and more features without fully comprehending what’s being implemented

tgtweak3mo ago

plot twist, it's all claude code instances submitting bug reports on behalf of end users.

trescenzi3mo ago

I literally hit a claude code bug today, tried to use claude desktop to debug it which didn't help and it offered to open a bug report for me. So yes 100%. Some of the titles also make it pretty clear they are auto submitted. This is my favorite which was around the top when I was creating my bug report 3 hours ago and is now 3 pages back lol.

> Unable to process - no bug report provided. Please share the issue details you'd like me to convert into a GitHub issue title

https://github.com/anthropics/claude-code/issues/23459

accrual3mo ago

It's Claude, all the way down.

elAhmo3mo ago

Insane to think that a relatively simple CLI tool has so many open issues...

emilsedgh3mo ago

It's not really a simple CLI tool though it's really interactive.

trymas3mo ago

What’s so simple about it?

1 more reply

bjackman3mo ago

Well part of the issue is that it isn't actually a CLI tool. It takes control of the whole terminal and then badly reimplements a CLI...

luckydata3mo ago

It's far from simple

dwaltrip3mo ago

sips coffee… ahh yes, let me find that classic Dropbox rsync comment

1 more reply

ericrallen3mo ago

The rate of Issues opened on a popular repo is at least one order of magnitude beyond the number of Issues whoever is able to deal with them can handle.

paxys3mo ago

Half of them were probably opened yesterday during the Claude outage.

anematode3mo ago

Nah, it was at like 5500 before.

8note3mo ago

its really popular, and having claude code makes it really easy to sumbit issues and make wild guesses about issues.

i wonder how many are real and not duplicates. i doubt github is well setup to handle said volume

gjsman-10003mo ago

Also explains why Claude Code is a React app outputting to a Terminal. (Seriously.)

krystofbe3mo ago

I did some debugging on this today. The results are... sobering.

Memory comparison of AI coding CLIs (single session, idle):

  | Tool        | Footprint | Peak   | Language      |
  |-------------|-----------|--------|---------------|
  | Codex       | 15 MB     | 15 MB  | Rust          |
  | OpenCode    | 130 MB    | 130 MB | Go            |
  | Claude Code | 360 MB    | 746 MB | Node.js/React |

That's a 24x to 50x difference for tools that do the same thing: send text to an API.

vmmap shows Claude Code reserves 32.8 GB virtual memory just for the V8 heap, has 45% malloc fragmentation, and a peak footprint of 746 MB that never gets released, classic leak pattern.

On my 16 GB Mac, a "normal" workload (2 Claude sessions + browser + terminal) pushes me into 9.5 GB swap within hours. My laptop genuinely runs slower with Claude Code than when I'm running local LLMs.

I get that shipping fast matters, but building a CLI with React and a full Node.js runtime is an architectural choice with consequences. Codex proves this can be done in 15 MB. Every Claude Code session costs me 360+ MB, and with MCP servers spawning per session, it multiplies fast.

atonse3mo ago

Jarred Sumner (bun creator, bun was recently acquired by Anthropic) has been working exclusively on bringing down memory leaks and improving performance in CC the last couple weeks. He's been tweeting his progress.

This is just regular tech debt that happens from building something to $1bn in revenue as fast as you possibly can, optimize later.

They're optimizing now. I'm sure they'll have it under control in no time.

CC is an incredible product (so is codex but I use CC more). Yes, lately it's gotten bloated, but the value it provides makes it bearable until they fix it in short time.

3 more replies

badlogic3mo ago

OpenCode is not written in Go. It's TS on Bun, with OpenTUI underneath which is written in Zig.

Weryj3mo ago

I believe they use https://bun.com/ Not Node.js

slopusila3mo ago

why do you care about uncommitted virtual memory? that's practically infinite

krona3mo ago

Sounds like a web developer defined the solution a year before they knew what the problem was.

jama2113mo ago

Nah. It’s just web development languages are a better fit for agentic coding presently. They weighed the pros and cons, they’re not stupid.

3 more replies

jama2113mo ago

There’s nothing wrong with that, except it lets ai skeptics feel superior

everforward3mo ago

There are absolutely things wrong with that, because React was designed to solve problems that don't exist in a TUI.

React fixes issues with the DOM being too slow to fully re-render the entire webpage every time a piece of state changes. That doesn't apply in a TUI, you can re-render TUIs faster than the monitor can refresh. There's no need to selectively re-render parts of the UI, you can just re-render the entire thing every time something changes without even stressing out the CPU.

It brings in a bunch of complexity that doesn't solve any real issues beyond the devs being more familiar with React than a TUI library.

1 more reply

RohMin3mo ago

https://www.youtube.com/watch?v=LvW1HTSLPEk

I thought this was a solid take

1 more reply

overgard3mo ago

I haven't looked at it directly, so I can speak on quality, but it's a pretty weird way to write a terminal app

1 more reply

38362936483mo ago

Oh come on. It's massively wrong. It is always wrong. It's not always wrong enough to be important, but it doesn't stop being wrong

2 more replies

exe343mo ago

I use AI and I can call AI slop shit if it smells like shit.

1 more reply

thehamkercat3mo ago

Same with opencode and gemini, it's disgusting

Codex (by openai ironically) seems to be the fastest/most-responsive, opens instantly and is written in rust but doesn't contain that many features

Claude opens in around 3-4 seconds

Opencode opens in 2 seconds

Gemini-cli is an abomination which opens in around 16 second for me right now, and in 8 seconds on a fresh install

Codex takes 50ms for reference...

If their models are so good, why are they not rewriting their own react in cli bs to c++ or rust for 100x performance improvement (not kidding, it really is that much)

g947o3mo ago

Great question, and my guess:

If you build React in C++ and Rust, even if the framework is there, you'll likely need to write your components in C++/Rust. That is a difficult problem. There are actually libraries out there that allow you to build web UI with Rust, although they are for web (+ HTML/CSS) and not specifically CLI stuff.

So someone needs to create such a library that is properly maintained and such. And you'll likely develop slower in Rust compared to JS.

These companies don't see a point in doing that. So they just use whatever already exists.

3 more replies

azinman23mo ago

Why does it matter if Claude Code opens in 3-4 seconds if everything you do with it can take many seconds to minutes? Seems irrelevant to me.

4 more replies

wahnfrieden3mo ago

Codex team made the right call to rewrite its TypeScript to Rust early on

shoeb00m3mo ago

codex cli is missing a bunch of ux features like resizing on terminal size change.

Opencode's core is actually written in zig, only ui orchestration is in solidjs. It's only slightly slower to load than neo-vim on my system.

https://github.com/anomalyco/opentui

bdangubic3mo ago

50ms to open and then 2hrs to solve a simple problem vs 4s to open and then 5m to solve a problem, eh?

2 more replies

flashgordon3mo ago

I am really flabbergasted. How are they thinking using React for a TUI is a flex? Having 5 sessions open - and all idea - is taking up 98% of CPU. Is this another case of - "When all you is hammer, everything looks like nails"?

CooCooCaCha3mo ago

It’s really not that crazy.

React itself is a frontend-agnostic library. People primarily use it for writing websites but web support is actually a layer on top of base react and can be swapped out for whatever.

So they’re really just using react as a way to organize their terminal UI into components. For the same reason it’s handy to organize web ui into components.

dreamteam13mo ago

And some companies use it to write start menus.

tayo423mo ago

Is this a react feature or did they build something to translate react to text for display in the terminal?

sbarre3mo ago

React, the framework, is separate from react-dom, the browser rendering library. Most people think of those two as one thing because they're the most popular combo.

But there are many different rendering libraries you can use with React, including Ink, which is designed for building CLI TUIs..

1 more reply

pkkim3mo ago

They used Ink: https://github.com/vadimdemedes/ink

I've used it myself. It has some rough edges in terms of rendering performance but it's nice overall.

1 more reply

embedding-shape3mo ago

Not a built-in React feature. The idea been around for quite some time, I came across it initially with https://github.com/vadimdemedes/ink back in 2022 sometime.

tayo423mo ago

i had claude make a snake clone and fix all the flickering in like 20 minutes with the library mentioned lol

CamperBob23mo ago

Also explains why Claude Code is a React app outputting to a Terminal. (Seriously.)

Who cares, and why?

All of the major providers' CLI harnesses use Ink: https://github.com/vadimdemedes/ink

sweetheart3mo ago

React's core is agnostic when it comes to the actual rendering interface. It's just all the fancy algos for diffing and updating the underlying tree. Using it for rendering a TUI is a very reasonable application of the technology.

skydhash3mo ago

The terminal UI is not a tree structure that you can diff. It’s a 2D cells of characters, where every manipulation is a stream of texts. Refreshing or diffing that makes no sense.

4 more replies

raincole3mo ago

It explains how important dogfooding is if you want to make an extremely successful product.

jama2113mo ago

It’s extremely successful, not sure what it explains other than your biases

blibble3mo ago

Microsoft's products are also extremely successful

they're also total garbage

simianwords3mo ago

but they have the advantage of already being a big company. Anthropic is new and there's no reason for people to use it

2 more replies

jama2113mo ago

Well there you have it, proof you’re not being reasonable. Microsoft’s products annoy HN users but they are absolutely not total garbage. They’re highly functional and valuable and if they weren’t they truely wouldn’t be used, they’re just flawed.

1 more reply

holoduke3mo ago

Claude is by far the most popular and best assistant currently available for a developer.

3 more replies

acedTrex3mo ago

Something being successful and something being a high quality product with good engineering are two completely different questions.

mvdtnz3mo ago

Anthropic has perhaps the most embarrassing status page history I have ever seen. They are famous for downtime.

https://status.claude.com/

ronsor3mo ago

As opposed to other companies which are smart enough not to report outages.

1 more reply

Computer03mo ago

The competition doesn't currently have all 99's - https://status.openai.com/

djeastm3mo ago

The best way to use Claude's models seems to be some other inference provider (either OpenRouter or directly)

derwiki3mo ago

Shades of Fail Whale

dimgl3mo ago

And yet people still use them.

cedws3mo ago

The sandboxing in CC is an absolute joke, it's no wonder there's an explosion of sandbox wrappers at the moment. There's going to be a security catastrophe at some point, no doubt about it.

quietsegfault3mo ago

What does it explain, oh snark master supreme?

exe343mo ago

Can Claude fix the flicker in Claude yet?

1 more reply

spruce_tips3mo ago

Ah yes, explains why it takes 3 seconds for a new chat to load after I click new chat in the macOS app.

EcommerceFlow3mo ago

Anecdotal, but it 1 shot fixed a UI bug that neither Opus 4.5/Codex 5.2-high could fix.

epolanski3mo ago

+1, same experience, switched model as I've read the news thinking "let's try".

But it spent lots and lots of time thinking more than 4.5, did you had the same impression.

EcommerceFlow3mo ago

I didn't compare to that level, just had it create a plan first then implemented it.

fergie3mo ago

Say I am just an average coder doing a days work with Claude. How much will that cost?

joelmanner3mo ago

I've only barely hit the 5h limit when working intensively with plan mode on the $100/mo plan. Never had a problem with the weekly limit.

swalsh3mo ago

What I’d love is some small model specializing in reading long web pages, and extracting the key info. Search fills the context very quickly, but if a cheap subagent could extract the important bits that problem might be reduced.

danielbln3mo ago

So send off haiku subtasks and have them come back with the results.

casey23mo ago

Google already won the AI race. It's very silly to try and make AGI by hyperfocusing on outdated programming paradigms. You NEED multimodal to do anything remotely interesting with these systems.

esafak3mo ago

Coding, maths, writing, and science are not interesting??

j / k navigate · click thread line to collapse

1031 comments

data-ottawa3mo ago

I wonder if I’ve been in A/B test with this.

Claude figured out zig’s ArrayList and io changes a couple weeks ago.

It felt like it got better then very dumb again the last few days.

1 more reply

mattacular3mo ago

It's hard to tell with these releases if Anthropic's astroturfing campaign has come to HN or not but I feel like it probably has

g-mork3mo ago

the top 5 comments on this thread are from accounts that are around 10 years old each. What gives you any reason to believe this is an astroturfing campaign?

2 more replies

hatkid953mo ago

It would be height of foolishness to believe it didn’t

timcobb3mo ago

Anthropic's models are really good!

ramesh313mo ago

Am I alone in finding no use for Opus? Token costs are like 10x yet I see no difference at all vs. Sonnet with Claude Code.

mnicky3mo ago

On my tasks (mostly data science), Opus has significantly lower probability of making stupid mistakes than Sonnet.

I'd still appreciate more intelligence than Opus 4.5 so I'm looking forward to trying 4.6.

jpcompartir3mo ago

4.6 is a beast.

Everything in plan mode first + AskUserQuestionTool, review all plans, get it to write its own CLAUDE.md for coding standards and edit where necessary and away you go.

Seems noticeably better than 4.5 at keeping the codebase slim. Obviously it still needs to be kept an eye on, but it's a step up from 4.5.

nwienert3mo ago

Not clearly a step up for me, it's way more hesitant it seems and I don't notice context being larger at all it seems to compact just as often.

sega_sai3mo ago

laxk3mo ago

Google knows how to wait. Let's give them a chance.

jonatron3mo ago

jaggederest3mo ago

> what is the current carrying capacity of 25mm multicore armoured thermoplastic insulated cables with aluminium conductors, on perforated cable tray?

  This is an electrical engineering question rather than a software engineering one, but let me look it up for you.

  WEB SEARCH ("current carrying capacity 25mm multicore armoured thermoplastic insulated cable aluminium conductors perforated cable tray BS 7671")
   ↳ Found 10 results.

   Based on BS 7671 Table 4D4B (multicore armoured thermoplastic insulated cables with aluminium conductors), the current-carrying capacity for a 25mm cable on a perforated cable tray (Reference Method E) is approximately 80A for a 3- or 4-core cable at 30C ambient temperature and 70C conductor operating temperature.

    Note: The exact value depends on the number of cores (2-core vs 3/4-core) and whether any rating/correction factors (grouping, ambient temperature, thermal insulation) apply. I'd recommend confirming against the actual BS 7671:2018 Table 4D4B for your specific configuration, as the aluminium conductor ratings are lower than the copper equivalents (~102A for copper in the same conditions).

    This is outside my core domain of software engineering -- for critical electrical installations, always verify with the relevant standards documentation or a qualified electrical engineer.

jonatron3mo ago

That's useless, unexpectedly just Google gives the correct answer.

esafak3mo ago

Great. So it did not waste its memory on trivia. Don't memorize what you can look up.

osti3mo ago

Somehow regresses on SWE bench?

lkbm3mo ago

I don't know how these benchmarks work (do you do a hundred runs? A thousand runs?), but 0.1% seems like noise.

SubiculumCode3mo ago

That benchmark is pretty saturated, tbh. A "regression" of such small magnitude could mean many different things or nothing at all.

usaar3333mo ago

i'd interpret that as rounding error. that is unchanged

swe-bench seems really hard once you are above 80%

Squarex3mo ago

it's not a great benchmark anymore... starting with it being python / django primarily... the industry should move to something more representative

usaar3333mo ago

Openai has; they don't even mention score on gpt-5.3-codex.

On the other hand, it is their own verified benchmark, which is telling.

m-hodges3mo ago

> In Claude Code, you can now assemble agent teams to work on tasks together.

nprz3mo ago

I was just reading about Steve Yegge's Gas Town[0], it sounds like agent orchestration is now integrated into Claude Code?

[0]https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...

jorl173mo ago

It does not make a single mistake, it identifies neologisms, hidden meaning, 7 distinct poetic phases, recurring themes, fragments/heteronyms, related authors. It has left me completely speechless.

Speechless. I am speechless.

Perhaps Opus 4.5 could do it too — I don't know because I needed the 1M context window for this.

I cannot put into words how shocked I am at this. I use LLMs daily, I code with agents, I am extremely bullish on AI and, still, I am shocked.

What is this sorcery?

emp173443mo ago

This sounds wayyyy over the top for a mode that released 10 mins ago. At least wait an hour or so before spewing breathless hype.

pb73mo ago

He just explained a specific personal example why he is hyped up, did you read a word of it?

emp173443mo ago

Yeah, I read it.

“Speechless, shocked, unbelievable, insane, speechless”, etc.

Not a lot of real substance there.

1 more reply

euph0ria3mo ago

Could you please post the key poems? Would love to read them.

jorl173mo ago

I am way too self-conscious to do that :) Plus they are almost all in Portuguese!

wartywhoa233mo ago

> What is this sorcery?

The one you'll be seeking counter-spells against pretty soon.

scrollop3mo ago

Can you compare the result to using 5.2 thinking and gemini 3 pro?

jorl173mo ago

I can run the comparison again, and also include OpenAI's new release (if the context is long enough), but, last time I did it, they weren't even in the same league.

Will bring back results soon.

Edit:

I (re-)tested:

- Gemini 3 (Pro)

- Gemini 3 (Flash)

- GPT 5.2

- Sonnet 4.5

Most of what these models say is true, but it really feels incomplete. Like half-truths or only a surface-level inquiry into truth.

The other models also seem to produce shorter analyses, with a tendency to hyperfocus on some specific aspects of my poetry, missing a bunch of them.

vinhnx3mo ago

Just used Opus 4.6 via GitHub Copilot. It feels very different. Inference seems slow for now. I guess Opus 4.6 has adaptive thinking activated by default.

vinhnx3mo ago

Confirm by PM lead at VS Code team

> "We have high thinking as default + adaptive thinking, first time we’ve run with these settings..."

> https://x.com/pierceboggan/status/2019645801769689486

christophilus3mo ago

It dos seem noticeably slower. I may stick with 4.5 which was good enough for me for most tasks.

vinhnx3mo ago

VS Code confirms that they are experimenting with the new adaptive thinking and high reasoning effort params. https://x.com/pierceboggan/status/2019645801769689486

paxys3mo ago

Hmm all leaks had said this would be Claude 5. Wonder if it was a last minute demotion due to performance. Would explain the few days' delay as well.

trash_cat3mo ago

I think the naming schemes are quite arbitrary at this point. Going to 5 would come with massive expectations that wouldn't meet reality.

mrandish3mo ago

After the negative reactions to GPT 5, we may see model versioning that asymptotically approaches the next whole number without ever reaching it. "New for 2030: Claude 4.9.2!"

esafak3mo ago

Or approaching a magic number like e (Metafont) or π (TeX).

Squarex3mo ago

the standard used to be that major version means a new base model / full retrain... but now it is arbitrary i guess

cornedor3mo ago

Leaks were mentioning Sonnet 5 and I guess later (a combination of) Opus 4.6

scrollop3mo ago

Sonnet 5 was mentioned initially.

ra3mo ago

Why are Anthropic such a horrible company to deal with?

danielbln3mo ago

Care to elaborate?

ra3mo ago

obscure billing, unreachable customer support gatekeeped by an overzealous chatbot, no transparency about inclusions, or changes to inclusions over time... just from recent experience.

niobe3mo ago

Is there a good technical breakdown of all these benchmarks that get used to market the latest greatest LLMs somewhere? Preferably impartial.

Aztar3mo ago

I just ask claude and ask for sources for each one.

niobe3mo ago

Reminds me of how if you make a complaint against a lawyer or a judge it's evaluated by lawyers and judges.

small_model3mo ago

I have the max subscription wondering if this gives access to the new 1M context, or is it just the API that gets it?

joshstrange3mo ago

For now it's just API, but hopefully that's just their way of easing in and they open it up later.

small_model3mo ago

Ok thanks, hopefully, its annoying to lose or have context compacted in the middle of a large coding session

watson3mo ago

I've heard rumors this might be Sonnet 5 rebranded as Opus 4.6. But why? Profit? WDYT?

spruce_tips3mo ago

Calling it part of the Sonnet line would not provide the same level of blind buy in as calling it part of the Opus line does

1970-01-013mo ago

Here's one I've been using for awhile. The 'smarter' LLMs will overconfidently spit out 7. The dumber ones ask for more info. Opus 4.6 fails.

     A round drink coaster with a diameter of 9 sits between a beer glass and a wood table. The glass has a wall thickness of 1. What is the inner diameter of the glass?

raincole3mo ago

Gemini 3 Pro gives me this:

> Based on the information provided, it is impossible to determine the inner diameter of the glass.

Here is why:

Without knowing the Outer Diameter of the glass, the calculation cannot be completed.

raincole3mo ago

GPT 5.2 Chat:

> There isn’t enough information to determine the inner diameter *for sure*.

All we’re told is: - The coaster has a diameter of 9. - The glass sits on the coaster. - The glass wall thickness is 1.

Nothing explicitly states that the *outer diameter of the glass equals the coaster’s diameter*. The glass could be smaller than the coaster, which is very common.

*However*, if we make the usual implicit assumption in this kind of problem—that the coaster’s diameter matches the *outer diameter of the glass base*—then:

- Outer diameter of glass = 9 - Wall thickness = 1 on each side

So: \[ \text{Inner diameter} = 9 - 2(1) = 7 \]

*Inner diameter = 7 (same units as the coaster)*

Without that assumption, the problem is underdetermined.

tototrains3mo ago

Opus 4.6 Extended thinking:

```

Finding the Inner Diameter The coaster sits beneath the glass on the table, meaning the glass rests on top of it. Assuming the coaster matches the outer diameter of the glass base:

Outer diameter of glass = diameter of coaster = 9 Wall thickness = 1 (on each side, so subtract twice)

dinner=9−2(1)=7d_{\text{inner}} = 9 - 2(1) = 7dinner =9−2(1)=7 The inner diameter of the glass is 7.

```

Makes its assumption clear, seems reasonable?

1 more reply

mikalauskas3mo ago

Minimax M2.1:

The inner diameter of the glass is *7*.

blibble3mo ago

> We build Claude with Claude. Our engineers write code with Claude Code every day

well that explains quite a bit

jsheard3mo ago

dkersten3mo ago

Just anecdotally, each release seems to be buggier than the last.

To me, their claim that they are vibe coding Claude code isn’t the flex they think it is.

csomar3mo ago

zahlman3mo ago

I think I would be much more frightened if it were working well.

1 more reply

marcd353mo ago

Doesn’t this just exacerbate the “black box” conundrum if they just keep piling on more and more features without fully comprehending what’s being implemented

tgtweak3mo ago

plot twist, it's all claude code instances submitting bug reports on behalf of end users.

trescenzi3mo ago

> Unable to process - no bug report provided. Please share the issue details you'd like me to convert into a GitHub issue title

https://github.com/anthropics/claude-code/issues/23459

accrual3mo ago

It's Claude, all the way down.

elAhmo3mo ago

Insane to think that a relatively simple CLI tool has so many open issues...

emilsedgh3mo ago

It's not really a simple CLI tool though it's really interactive.

trymas3mo ago

What’s so simple about it?

1 more reply

bjackman3mo ago

Well part of the issue is that it isn't actually a CLI tool. It takes control of the whole terminal and then badly reimplements a CLI...

luckydata3mo ago

It's far from simple

dwaltrip3mo ago

sips coffee… ahh yes, let me find that classic Dropbox rsync comment

1 more reply

ericrallen3mo ago

The rate of Issues opened on a popular repo is at least one order of magnitude beyond the number of Issues whoever is able to deal with them can handle.

paxys3mo ago

Half of them were probably opened yesterday during the Claude outage.

anematode3mo ago

Nah, it was at like 5500 before.

8note3mo ago

its really popular, and having claude code makes it really easy to sumbit issues and make wild guesses about issues.

i wonder how many are real and not duplicates. i doubt github is well setup to handle said volume

gjsman-10003mo ago

Also explains why Claude Code is a React app outputting to a Terminal. (Seriously.)

krystofbe3mo ago

I did some debugging on this today. The results are... sobering.

Memory comparison of AI coding CLIs (single session, idle):

  | Tool        | Footprint | Peak   | Language      |
  |-------------|-----------|--------|---------------|
  | Codex       | 15 MB     | 15 MB  | Rust          |
  | OpenCode    | 130 MB    | 130 MB | Go            |
  | Claude Code | 360 MB    | 746 MB | Node.js/React |

That's a 24x to 50x difference for tools that do the same thing: send text to an API.

vmmap shows Claude Code reserves 32.8 GB virtual memory just for the V8 heap, has 45% malloc fragmentation, and a peak footprint of 746 MB that never gets released, classic leak pattern.

atonse3mo ago

This is just regular tech debt that happens from building something to $1bn in revenue as fast as you possibly can, optimize later.

They're optimizing now. I'm sure they'll have it under control in no time.

CC is an incredible product (so is codex but I use CC more). Yes, lately it's gotten bloated, but the value it provides makes it bearable until they fix it in short time.

3 more replies

badlogic3mo ago

OpenCode is not written in Go. It's TS on Bun, with OpenTUI underneath which is written in Zig.

Weryj3mo ago

I believe they use https://bun.com/ Not Node.js

slopusila3mo ago

why do you care about uncommitted virtual memory? that's practically infinite

krona3mo ago

Sounds like a web developer defined the solution a year before they knew what the problem was.

jama2113mo ago

Nah. It’s just web development languages are a better fit for agentic coding presently. They weighed the pros and cons, they’re not stupid.

3 more replies

jama2113mo ago

There’s nothing wrong with that, except it lets ai skeptics feel superior

everforward3mo ago

There are absolutely things wrong with that, because React was designed to solve problems that don't exist in a TUI.

It brings in a bunch of complexity that doesn't solve any real issues beyond the devs being more familiar with React than a TUI library.

1 more reply

RohMin3mo ago

https://www.youtube.com/watch?v=LvW1HTSLPEk

I thought this was a solid take

1 more reply

overgard3mo ago

I haven't looked at it directly, so I can speak on quality, but it's a pretty weird way to write a terminal app

1 more reply

38362936483mo ago

Oh come on. It's massively wrong. It is always wrong. It's not always wrong enough to be important, but it doesn't stop being wrong

2 more replies

exe343mo ago

I use AI and I can call AI slop shit if it smells like shit.

1 more reply

thehamkercat3mo ago

Same with opencode and gemini, it's disgusting

Codex (by openai ironically) seems to be the fastest/most-responsive, opens instantly and is written in rust but doesn't contain that many features

Claude opens in around 3-4 seconds

Opencode opens in 2 seconds

Gemini-cli is an abomination which opens in around 16 second for me right now, and in 8 seconds on a fresh install

Codex takes 50ms for reference...

If their models are so good, why are they not rewriting their own react in cli bs to c++ or rust for 100x performance improvement (not kidding, it really is that much)

g947o3mo ago

Great question, and my guess:

So someone needs to create such a library that is properly maintained and such. And you'll likely develop slower in Rust compared to JS.

These companies don't see a point in doing that. So they just use whatever already exists.

3 more replies

azinman23mo ago

Why does it matter if Claude Code opens in 3-4 seconds if everything you do with it can take many seconds to minutes? Seems irrelevant to me.

4 more replies

wahnfrieden3mo ago

Codex team made the right call to rewrite its TypeScript to Rust early on

shoeb00m3mo ago

codex cli is missing a bunch of ux features like resizing on terminal size change.

Opencode's core is actually written in zig, only ui orchestration is in solidjs. It's only slightly slower to load than neo-vim on my system.

https://github.com/anomalyco/opentui

bdangubic3mo ago

50ms to open and then 2hrs to solve a simple problem vs 4s to open and then 5m to solve a problem, eh?

2 more replies

flashgordon3mo ago

CooCooCaCha3mo ago

It’s really not that crazy.

React itself is a frontend-agnostic library. People primarily use it for writing websites but web support is actually a layer on top of base react and can be swapped out for whatever.

So they’re really just using react as a way to organize their terminal UI into components. For the same reason it’s handy to organize web ui into components.

dreamteam13mo ago

And some companies use it to write start menus.

tayo423mo ago

Is this a react feature or did they build something to translate react to text for display in the terminal?

sbarre3mo ago

React, the framework, is separate from react-dom, the browser rendering library. Most people think of those two as one thing because they're the most popular combo.

But there are many different rendering libraries you can use with React, including Ink, which is designed for building CLI TUIs..

1 more reply

pkkim3mo ago

They used Ink: https://github.com/vadimdemedes/ink

I've used it myself. It has some rough edges in terms of rendering performance but it's nice overall.

1 more reply

embedding-shape3mo ago

Not a built-in React feature. The idea been around for quite some time, I came across it initially with https://github.com/vadimdemedes/ink back in 2022 sometime.

tayo423mo ago

i had claude make a snake clone and fix all the flickering in like 20 minutes with the library mentioned lol

CamperBob23mo ago

Also explains why Claude Code is a React app outputting to a Terminal. (Seriously.)

Who cares, and why?

All of the major providers' CLI harnesses use Ink: https://github.com/vadimdemedes/ink

sweetheart3mo ago

skydhash3mo ago

The terminal UI is not a tree structure that you can diff. It’s a 2D cells of characters, where every manipulation is a stream of texts. Refreshing or diffing that makes no sense.

4 more replies

raincole3mo ago

It explains how important dogfooding is if you want to make an extremely successful product.

jama2113mo ago

It’s extremely successful, not sure what it explains other than your biases

blibble3mo ago

Microsoft's products are also extremely successful

they're also total garbage

simianwords3mo ago

but they have the advantage of already being a big company. Anthropic is new and there's no reason for people to use it

2 more replies

jama2113mo ago

1 more reply

holoduke3mo ago

Claude is by far the most popular and best assistant currently available for a developer.

3 more replies

acedTrex3mo ago

Something being successful and something being a high quality product with good engineering are two completely different questions.

mvdtnz3mo ago

Anthropic has perhaps the most embarrassing status page history I have ever seen. They are famous for downtime.

https://status.claude.com/

ronsor3mo ago

As opposed to other companies which are smart enough not to report outages.

1 more reply

Computer03mo ago

The competition doesn't currently have all 99's - https://status.openai.com/

djeastm3mo ago

The best way to use Claude's models seems to be some other inference provider (either OpenRouter or directly)

derwiki3mo ago

Shades of Fail Whale

dimgl3mo ago

And yet people still use them.

cedws3mo ago

The sandboxing in CC is an absolute joke, it's no wonder there's an explosion of sandbox wrappers at the moment. There's going to be a security catastrophe at some point, no doubt about it.

quietsegfault3mo ago

What does it explain, oh snark master supreme?

exe343mo ago

Can Claude fix the flicker in Claude yet?

1 more reply

spruce_tips3mo ago

Ah yes, explains why it takes 3 seconds for a new chat to load after I click new chat in the macOS app.

EcommerceFlow3mo ago

Anecdotal, but it 1 shot fixed a UI bug that neither Opus 4.5/Codex 5.2-high could fix.

epolanski3mo ago

+1, same experience, switched model as I've read the news thinking "let's try".

But it spent lots and lots of time thinking more than 4.5, did you had the same impression.

EcommerceFlow3mo ago

I didn't compare to that level, just had it create a plan first then implemented it.

fergie3mo ago

Say I am just an average coder doing a days work with Claude. How much will that cost?

joelmanner3mo ago

I've only barely hit the 5h limit when working intensively with plan mode on the $100/mo plan. Never had a problem with the weekly limit.

swalsh3mo ago

danielbln3mo ago

So send off haiku subtasks and have them come back with the results.

casey23mo ago

Google already won the AI race. It's very silly to try and make AGI by hyperfocusing on outdated programming paradigms. You NEED multimodal to do anything remotely interesting with these systems.

esafak3mo ago

Coding, maths, writing, and science are not interesting??

j / k navigate · click thread line to collapse