Claude figured out zig’s ArrayList and io changes a couple weeks ago.
It felt like it got better then very dumb again the last few days.
I'd still appreciate more intelligence than Opus 4.5 so I'm looking forward to trying 4.6.
Everything in plan mode first + AskUserQuestionTool, review all plans, get it to write its own CLAUDE.md for coding standards and edit where necessary and away you go.
Seems noticeably better than 4.5 at keeping the codebase slim. Obviously it still needs to be kept an eye on, but it's a step up from 4.5.
This is an electrical engineering question rather than a software engineering one, but let me look it up for you.
WEB SEARCH ("current carrying capacity 25mm multicore armoured thermoplastic insulated cable aluminium conductors perforated cable tray BS 7671")
↳ Found 10 results.
Based on BS 7671 Table 4D4B (multicore armoured thermoplastic insulated cables with aluminium conductors), the current-carrying capacity for a 25mm cable on a perforated cable tray (Reference Method E) is approximately 80A for a 3- or 4-core cable at 30C ambient temperature and 70C conductor operating temperature.
Note: The exact value depends on the number of cores (2-core vs 3/4-core) and whether any rating/correction factors (grouping, ambient temperature, thermal insulation) apply. I'd recommend confirming against the actual BS 7671:2018 Table 4D4B for your specific configuration, as the aluminium conductor ratings are lower than the copper equivalents (~102A for copper in the same conditions).
This is outside my core domain of software engineering -- for critical electrical installations, always verify with the relevant standards documentation or a qualified electrical engineer.
SWE-bench seems really hard once you are above 80%.
On the other hand, it is their own verified benchmark, which is telling.
[0]https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
It does not make a single mistake, it identifies neologisms, hidden meaning, 7 distinct poetic phases, recurring themes, fragments/heteronyms, related authors. It has left me completely speechless.
Speechless. I am speechless.
Perhaps Opus 4.5 could do it too — I don't know because I needed the 1M context window for this.
I cannot put into words how shocked I am at this. I use LLMs daily, I code with agents, I am extremely bullish on AI and, still, I am shocked.
I have used my poetry and an analysis of it as a personal metric for how good models are. Gemini 2.5 pro was the first time a model could keep track of the breadth of the work without getting lost, but Opus 4.6 straight up does not get anything wrong and goes beyond that to identify things (key poems, key motifs, and many other things) that I would always have to kind of trick the models into producing. I would always feel like I was leading the models on. But this — this — this is unbelievable. Unbelievable. Insane.
This "key poem" thing is particularly surreal to me. Out of 900 poems, while analyzing the collection, it picked 12 "key poems", and I do agree that 11 of those would be on my 30-or-so "key poem" list. What's amazing is that whenever I explicitly asked any model, to this date, to do it, they would get maybe 2 or 3, but mostly fail completely.
What is this sorcery?
The one you'll be seeking counter-spells against pretty soon.
When I last did it, 5.X thinking (can't remember which it was) had this terrible habit of code-switching between english and portuguese that made it sound like a robot (an agent to do things, rather than a human writing an essay), and it just didn't really "reason" effectively over the poems.
I can't explain it in any other way other than: "5.X thinking interprets this body of work in a way that is plausible, but I know, as the author, to be wrong; and I expect most people would also eventually find it to be wrong, as if it is being only very superficially looked at, or looked at by a high-schooler".
Gemini 3, at the time, was the worst of them, with some hallucinations, date mix ups (mixing poems from 2023 with poems from 2019), and overall just feeling quite lost and making very outlandish interpretations of the work. To be honest it sort of feels like Gemini hasn't been able to progress on this task since 2.5 pro (it has definitely improved on other things — I've recently switched to Gemini 3 on a product that was using 2.5 before)
Last time I did this test, Sonnet 4.5 was better than 5.X Thinking and Gemini 3 pro, but not exceedingly so. It's all so subjective, but the best I can say is it "felt like the analysis of the work I could agree with the most". I felt more seen and understood, if that makes sense (it is poetry, after all). Plus when I got each LLM to try to tell me everything it "knew" about me from the poems, Sonnet 4.5 got the most things right (though they were all very close).
Will bring back results soon.
Edit:
I (re-)tested:
- Gemini 3 (Pro)
- Gemini 3 (Flash)
- GPT 5.2
- Sonnet 4.5
Having seen Opus 4.5, they all seem very similar, and I can't really distinguish them in terms of depth and accuracy of analysis. They obviously have differences, especially stylistic ones, but when compared with Opus 4.5 they're all in the same ballpark.
These models produce rather superficial analyses (when compared with Opus 4.5), missing out on several key things that Opus 4.5 got, such as specific and recurring neologisms and expressions, accurate connections to authors that serve as inspiration (Claude 4.5 gets them right, the other models get _close_, but not quite), and the meaning of some specific symbols in my poetry (Opus 4.5 identifies the symbols and the meaning; the other models identify most of the symbols, but fail to grasp the meaning sometimes).
Most of what these models say is true, but it really feels incomplete. Like half-truths or only a surface-level inquiry into truth.
As another example, Opus 4.5 identifies 7 distinct poetic phases, whereas Gemini 3 (Pro) identifies 4 which are technically correct, but miss out on key form and content transitions. When I look back, I personally agree with the 7 (maybe 6), but definitely not 4.
These models also clearly get some facts mixed up which Opus 4.5 did not (such as inferred timelines for some personal events). After having posted my comment to HN, I've been engaging with Opus 4.5 and have managed to get it to also slip up on some dates, but not nearly as much as the other models.
The other models also seem to produce shorter analyses, with a tendency to hyperfocus on some specific aspects of my poetry, missing a bunch of them.
--
To be fair, all of these models produce very good analyses which would take someone a lot of patience and probably weeks or months of work (which of course will never happen, it's a thought experiment).
It is entirely possible that the extremely simple prompt I used is just better with Claude Opus 4.5/4.6. But I will note that I have used very long and detailed prompts in the past with the other models and they've never really given me this level of....fidelity...about how I view my own work.
> "We have high thinking as default + adaptive thinking, first time we’ve run with these settings..."
Calling it part of the Sonnet line would not provide the same level of blind buy-in as calling it part of the Opus line does.
A round drink coaster with a diameter of 9 sits between a beer glass and a wood table. The glass has a wall thickness of 1. What is the inner diameter of the glass?
> Based on the information provided, it is impossible to determine the inner diameter of the glass.
Here is why:
- The Coaster Dimension: Knowing the coaster has a diameter of 9 only tells us the size of the object under the glass. It does not mean the glass is the same size; the glass could be smaller (fitting entirely on the coaster) or larger (overhanging the coaster).
- The Glass Dimensions: While you provided the wall thickness (1), you did not provide the outer diameter of the glass. To find the inner diameter, the formula is:
Inner Diameter = Outer Diameter - (2 × Wall Thickness)
Without knowing the Outer Diameter of the glass, the calculation cannot be completed.
> There isn’t enough information to determine the inner diameter *for sure*.
All we’re told is:
- The coaster has a diameter of 9.
- The glass sits on the coaster.
- The glass wall thickness is 1.
Nothing explicitly states that the *outer diameter of the glass equals the coaster’s diameter*. The glass could be smaller than the coaster, which is very common.
*However*, if we make the usual implicit assumption in this kind of problem—that the coaster’s diameter matches the *outer diameter of the glass base*—then:
- Outer diameter of glass = 9
- Wall thickness = 1 on each side
So: \[ \text{Inner diameter} = 9 - 2(1) = 7 \]
*Inner diameter = 7 (same units as the coaster)*
Without that assumption, the problem is underdetermined.
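For anyone who wants to poke at the numbers, the arithmetic in these answers can be sketched in a few lines of Python. The function name `inner_diameter` and the constant `COASTER_DIAMETER` are made up for illustration, and the whole thing rests on the assumption the puzzle never states: that the glass is exactly as wide as the coaster.

```python
def inner_diameter(outer_diameter: float, wall_thickness: float) -> float:
    """Inner diameter of a round glass: the wall is subtracted once per side."""
    inner = outer_diameter - 2 * wall_thickness
    if inner <= 0:
        raise ValueError("wall thickness leaves no interior")
    return inner


# ASSUMPTION (not stated in the puzzle): outer diameter of glass == coaster diameter.
COASTER_DIAMETER = 9
print(inner_diameter(COASTER_DIAMETER, 1))  # prints 7
```

Change `COASTER_DIAMETER` to any other guess for the outer diameter and the answer moves with it, which is the sense in which the problem is underdetermined.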
```
Finding the Inner Diameter
The coaster sits beneath the glass on the table, meaning the glass rests on top of it. Assuming the coaster matches the outer diameter of the glass base:
Outer diameter of glass = diameter of coaster = 9
Wall thickness = 1 (on each side, so subtract twice)
d_{\text{inner}} = 9 - 2(1) = 7
The inner diameter of the glass is 7.
```
Makes its assumption clear, seems reasonable?
The inner diameter of the glass is *7*.
Here's the reasoning:
- The coaster (diameter 9) sits between the glass and table, meaning the glass sits directly on the coaster
- This means the *outer diameter of the glass equals the coaster diameter = 9*
- The glass has a wall thickness of 1 on each side
- *Inner diameter = Outer diameter - 2 × wall thickness*
- Inner diameter = 9 - 2(1) = 9 - 2 = *7*
well that explains quite a bit
To me, their claim that they are vibe coding Claude code isn’t the flex they think it is.
I find it harder and harder to trust Anthropic for business-related use and not just hobby tinkering. Between buggy releases, opaque and often seemingly glitchy rate limits and usage limits, and the model quality inconsistency, it’s just not something I’d want to bet a business on.
Unlike what another commenter suggested, this is a complex tool. I'm curious whether the codebase might eventually reach a point where it becomes unfixable, even with human assistance. That would be an interesting development. We'll see.
> Unable to process - no bug report provided. Please share the issue details you'd like me to convert into a GitHub issue title
i wonder how many are real and not duplicates. i doubt github is well set up to handle said volume
Memory comparison of AI coding CLIs (single session, idle):
| Tool | Footprint | Peak | Language |
|-------------|-----------|--------|---------------|
| Codex | 15 MB | 15 MB | Rust |
| OpenCode | 130 MB | 130 MB | Go |
| Claude Code | 360 MB | 746 MB | Node.js/React |
That's a 24x to 50x difference for tools that do the same thing: send text to an API.
vmmap shows Claude Code reserves 32.8 GB of virtual memory just for the V8 heap, has 45% malloc fragmentation, and has a peak footprint of 746 MB that never gets released, which is a classic leak pattern.
On my 16 GB Mac, a "normal" workload (2 Claude sessions + browser + terminal) pushes me into 9.5 GB swap within hours. My laptop genuinely runs slower with Claude Code than when I'm running local LLMs.
I get that shipping fast matters, but building a CLI with React and a full Node.js runtime is an architectural choice with consequences. Codex proves this can be done in 15 MB. Every Claude Code session costs me 360+ MB, and with MCP servers spawning per session, it multiplies fast.
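For reference, the 24x and 50x figures fall straight out of the table. Here is a rough Python sketch of that arithmetic; the numbers are copied from the table, while `total_footprint_mb` is a hypothetical helper that assumes memory scales linearly with session count, which ignores shared pages and any MCP server processes.

```python
# Figures copied from the table in this comment (MB, single idle session).
footprint_mb = {"Codex": 15, "OpenCode": 130, "Claude Code": 360}
peak_mb = {"Codex": 15, "OpenCode": 130, "Claude Code": 746}

baseline = footprint_mb["Codex"]
footprint_ratio = footprint_mb["Claude Code"] / baseline  # 360 / 15
peak_ratio = peak_mb["Claude Code"] / baseline            # 746 / 15


def total_footprint_mb(sessions: int, per_session_mb: float = 360) -> float:
    """Naive linear estimate (ASSUMPTION: no sharing between sessions)."""
    return sessions * per_session_mb


print(f"footprint: {footprint_ratio:.0f}x, peak: {peak_ratio:.1f}x")
print(f"2 sessions: roughly {total_footprint_mb(2):.0f} MB")
```

So "24x" is the steady-state footprint ratio against Codex and "50x" is the peak ratio rounded up from about 49.7.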
This is just regular tech debt that happens from building something to $1bn in revenue as fast as you possibly can, optimize later.
They're optimizing now. I'm sure they'll have it under control in no time.
CC is an incredible product (so is Codex, but I use CC more). Yes, lately it's gotten bloated, but the value it provides makes it bearable until they fix it, which should be soon.
React fixes issues with the DOM being too slow to fully re-render the entire webpage every time a piece of state changes. That doesn't apply in a TUI, you can re-render TUIs faster than the monitor can refresh. There's no need to selectively re-render parts of the UI, you can just re-render the entire thing every time something changes without even stressing out the CPU.
It brings in a bunch of complexity that doesn't solve any real issues beyond the devs being more familiar with React than a TUI library.
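To make the "just re-render everything" approach concrete, here is a minimal Python sketch. It is not how Claude Code or any real TUI library works; `render`, `draw`, and the shape of `state` are invented for illustration, and it assumes a terminal that understands standard ANSI escape sequences.

```python
import sys

CLEAR_AND_HOME = "\x1b[2J\x1b[H"  # ANSI: clear screen, move cursor to top-left


def render(state: dict) -> str:
    """Build the whole frame from scratch; no diffing against the previous frame."""
    lines = [f"== {state['title']} =="]
    for i, item in enumerate(state["items"]):
        marker = ">" if i == state["selected"] else " "
        lines.append(f"{marker} {item}")
    return "\n".join(lines)


def draw(state: dict, out=sys.stdout) -> None:
    """Repaint the entire screen on every state change."""
    out.write(CLEAR_AND_HOME + render(state) + "\n")
    out.flush()


state = {"title": "sessions", "items": ["alpha", "beta"], "selected": 0}
draw(state)
state["selected"] = 1  # any state change...
draw(state)            # ...is followed by a full redraw
```

The frame is a pure function of the state, which is the same organizing idea React offers, just without a reconciler in between.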
Codex (by openai ironically) seems to be the fastest/most-responsive, opens instantly and is written in rust but doesn't contain that many features
Claude opens in around 3-4 seconds
Opencode opens in 2 seconds
Gemini-cli is an abomination which opens in around 16 seconds for me right now, and in 8 seconds on a fresh install
Codex takes 50ms for reference...
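If anyone wants to reproduce numbers like these, something along these lines is a rough way to do it in Python. `startup_seconds` is a hypothetical helper, and it measures launch-to-exit time for a command that quits immediately, which is only a proxy for how long a TUI takes to become interactive. The command passed in below is a stand-in, since the right invocation differs per tool.

```python
import subprocess
import sys
import time


def startup_seconds(cmd: list[str], runs: int = 5) -> float:
    """Best-of-N wall-clock time for a command to start and exit."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in command so the sketch runs anywhere; swap in whatever
# invocation makes your CLI start and exit immediately.
print(f"{startup_seconds([sys.executable, '-c', 'pass']):.3f}s")
```

Taking the best of several runs filters out cold-cache noise, so expect first-launch times to look worse than what this reports.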
--
If their models are so good, why are they not rewriting their own react in cli bs to c++ or rust for 100x performance improvement (not kidding, it really is that much)
If you build React in C++ and Rust, even if the framework is there, you'll likely need to write your components in C++/Rust. That is a difficult problem. There are actually libraries out there that allow you to build web UI with Rust, although they are for web (+ HTML/CSS) and not specifically CLI stuff.
So someone needs to create such a library that is properly maintained and such. And you'll likely develop slower in Rust compared to JS.
These companies don't see a point in doing that. So they just use whatever already exists.
Opencode's core is actually written in zig, only ui orchestration is in solidjs. It's only slightly slower to load than neo-vim on my system.
React itself is a frontend-agnostic library. People primarily use it for writing websites but web support is actually a layer on top of base react and can be swapped out for whatever.
So they’re really just using react as a way to organize their terminal UI into components. For the same reason it’s handy to organize web ui into components.
But there are many different rendering libraries you can use with React, including Ink, which is designed for building CLI TUIs.
I've used it myself. It has some rough edges in terms of rendering performance but it's nice overall.
Who cares, and why?
All of the major providers' CLI harnesses use Ink: https://github.com/vadimdemedes/ink
they're also total garbage
But it spent lots and lots of time thinking, more than 4.5. Did you have the same impression?