undefined | Better HN

0 pointsklaussilveira4mo ago0 comments

I made a similar comment on a different thread, but I think it also fits here: I think the disconnect between engineers is due to their own context. If you work with frontend applications, specially React/React Native/HTML/Mobile, your experience with LLMs is completely different than the experience of someone working with OpenGL, io_uring, libev and other lower level stuff. Sure, Opus 4.5 can one shot Windows utilities and full stack apps, but can't implement a simple shadowing algorithm from a 2003 paper in C++, GLFW, GLAD: https://www.cse.chalmers.se/~uffe/soft_gfxhw2003.pdf

Codex/Claude Code are terrible with C++. It also can't do Rust really well, once you get to the meat of it. Not sure why that is, but they just spit out nonsense that creates more work than it helps me. It also can't one shot anything complete, even though I might feed him the entire paper that explains what the algorithm is supposed to do.

Try to do some OpenGL or Vulkan with it, without using WebGPU or three.js. Try it with real code, that all of us have to deal with every day. SDL, Vulkan RHI, NVRHI. Very frustrating.

Try it with boost, or cmake, or taskflow. It loses itself constantly, hallucinates which version it is working on and ignores you when you provide actual pointers to documentation on the repo.

I've also recently tried to get Opus 4.5 to move the Job system from Doom 3 BFG to the original codebase. Clean clone of dhewm3, pointed Opus to the BFG Job system codebase, and explained how it works. I have also fed it the Fabien Sanglard code review of the job system: https://fabiensanglard.net/doom3_bfg/threading.php

We are not sleeping on it, we are actually waiting for it to get actually useful. Sure, it can generate a full stack admin control panel in JS for my PostgreSQL tables, but is that really "not normal"? That's basic.

0 comments

JDye4mo ago

We have an in-house, Rust-based proxy server. Claude is unable to contribute to it meaningfully outside of grunt work like minor refactors across many files. It doesn't seem to understand proxying and how it works on both a protocol level and business logic level.

With some entirely novel work we're doing, it's actually a hindrance as it consistently tells us the approach isn't valid/won't work (it will) and then enters "absolutely right" loops when corrected.

I still believe those who rave about it are not writing anything I would consider "engineering". Or perhaps it's a skill issue and I'm using it wrong, but I haven't yet met someone I respect who tells me it's the future in the way those running AI-based companies tell me.

dpc_012344mo ago

> We have an in-house, Rust-based proxy server. Claude is unable to contribute to it meaningfully outside

I have a great time using Claude Code in Rust projects, so I know it's not about the language exactly.

My working model is is that since LLM are basically inference/correlation based, the more you deviate from the mainstream corpus of training data, the more confused LLM gets. Because LLM doesn't "understand" anything. But if it was trained on a lot of things kind of like the problem, it can match the patterns just fine, and it can generalize over a lot layers, including programming languages.

Also I've noticed that it can get confused about stupid stuff. E.g. I had two different things named kind of the same in two parts of the codebase, and it would constantly stumble on conflating them. Changing the name in the codebase immediately improved it.

So yeah, we've got another potentially powerful tool that requires understanding how it works under the hood to be useful. Kind of like git.

lisperforlife4mo ago

Recently the v8 rust library changed it from mutable handle scopes to pinned scopes. A fairly simple change that I even put in my CLAUDE.md file. But it still generates methods with HandleScope's and then says... oh I have a different scope and goes on a random walk refactoring completely unrelated parts of the code. All the while Opus 4.5 burns through tokens. Things work great as long as you are testing on the training set. But that said, it is absolutely brilliant with React and Typescript.

dpc_012344mo ago

Well, it's not like it never happened to me to "burn tokens" with some lifetime issue. :D But yeah, if you're working in Rust on something with sharp edges, LLM will get get hurt. I just don't tend to have these in my projects.

Even more basic failure mode. I told it to convert/copy a bit (1k LOC) of blocking code into a new module and convert to async. It just couldn't do a proper 1:1 logical _copy_. But when I manually `cp <src> <dst>` the file and then told it to convert that to async and fix issues, it did it 100% correct. Because fundamentally it's just non-deterministic pattern generator.

fullstackchris4mo ago

hot take (that shouldn't be?): if your code is super easy to follow as a human, it will be super easy to follow for an LLM. (hint: guess where the training data is coming from!)

kevin424mo ago

This isn't meant as a criticism, or to doubt your experience, but I've talked to a few people who had experiences like this. But, I helped them get Claude code setup, analyze the codebase and document the architecture into markdown (edit as needed after), create an agent for the architecture, and prompt it in an incremental way. Maybe 15-30 minutes of prep. Everyone I helped with this responded with things like "This is amazing", "Wow!", etc.

For some things you can fire up Claude and have it generate great code from scratch. But for bigger code bases and more complex architecture, you need to break it down ahead of time so it can just read about the architecture rather than analyze it every time.

ryandrake4mo ago

Is there any good documentation out there about how to perform this wizardry? I always assumed if you did /init in a new code base, that Claude would set itself up to maximize its own understanding of the code. If there are extra steps that need to be done, why don't Claude's developers just add those extra steps to /init?

kevin424mo ago

Not that I have seen, which is probably a big part of the disconnect. Mostly it's tribal knowledge. I learned through experimentation, but I've seen tips here and there. Here's my workflow (roughly)

> Create a CLAUDE.md for a c++ application that uses libraries x/y/z

[Then I edit it, adding general information about the architecture]

> Analyze the library in the xxx directory, and produce a xxx_architecture.md describing the major components and design

> /agent [let claude make the agent, but when it asks what you want it to do, explain that you want it to specialize in subsystem xxx, and refer to xxx_architecture.md

Then repeat until you have the major components covered. Then:

> Using the files named with architecture.md analyze the entire system and update CLAUDE.md to use refer to them and use the specialized agents.

Now, when you need to do something, put it in planning mode and say something like:

> There's a bug in the xxx part of the application, where when I do yyy, it does zzz, but it should do aaa. Analyze the problem and come up with a plan to fix it, and automated tests you can perform if possible.

Then, iterate on the plan with it if you need to, or just approve it.

One of the most important things you can do when dealing with something complex is let it come up with a test case so it can fix or implement something and then iterate until it's done. I had an image processing problem and I gave it some sample data, then it iterated (looking at the output image) until it fixed it. It spent at least an hour, but I didn't have to touch it while it worked.

3 more replies

HDThoreaun4mo ago

> if you did /init in a new code base, that Claude would set itself up to maximize its own understanding of the code.

This is definitely not the case, and the reason anthropic doesnt make claude do this is because its quality degrades massively as you use up its context. So the solution is to let users manage the context themselves in order to minimize the amount that is "wasted" on prep work. Context windows have been increasing quite a bit so I suspect that by 2030 this will no longer be an issue for any but the largest codebases, but for now you need to be strategic.

turkey994mo ago

Are you still talking about Opus 4.5 I’ve been working on a Rust, kotlin and c++ and it’s been doing well. Incredible at C++, like the number of mistakes it doesn’t make

parliament324mo ago

> I still believe those who rave about it are not writing anything I would consider "engineering".

Correct. In fact, this is the entire reason for the disconnect, where it seems like half the people here think LLMs are the best thing ever and the other half are confused about where the value is in these slop generators.

The key difference is (despite everyone calling themselves an SWE nowadays) there's a difference between a "programmer" and an "engineer". Looking at OP, exactly zero of his screenshotted apps are what I would consider "engineering". Literally everything in there has been done over and over to the death. Engineering is.. novel, for lack of a better word.

woah4mo ago

> Engineering is.. novel, for lack of a better word.

Tell that to the guys drawing up the world's 10 millionth cable suspension bridge

drysine4mo ago

Actually, 10000th

https://www.bridgemeister.com/fulllist.htm

ryandrake4mo ago

I don't think it's that helpful to try to gatekeep the "engineering" term or try to separate it into "pure" and "impure" buckets, implying that one is lesser than the other. It should be enough to just say that AI assisted development is much better at non-novel tasks than it is at novel tasks. Which makes sense: LLMs are trained on existing work, and can't do anything novel because if it was trained on a task, that task is by definition not novel.

parliament324mo ago

Respectfully, it's absolutely important to "gatekeep" a title that has an established definition and certain expectations attached to the title.

OP says, "BUT YOU DON’T KNOW HOW THE CODE WORKS.. No I don’t. I have a vague idea, but you are right - I do not know how the applications are actually assembled." This is not what I would call an engineer. Or a programmer. "Prompter", at best.

And yes, this is absolutely "lesser than", just like a middleman who subcontracts his work to Fiverr (and has no understanding of the actual work) is "lesser than" an actual developer.

1 more reply

scottyah4mo ago

It's how you use the tool that matters. Some people get bitter and try to compare it to top engineers' work on novel things as a strawman so they can go "Hah! Look how it failed!" as they swing a hammer to demonstrate it cannot chop down a tree. Because the tool is so novel and it's use us a lot more abstract than that of an axe, it is taking awhile for some to see its potential, especially if they are remembering models from even six months ago.

Engineering is just problem solving, nobody judges structural engineers for designing structures with another Simpson Strong Tie/No.2 Pine 2x4 combo because that is just another easy (and therefore cheap) way to rapidly get to the desired state. If your client/company want to pay for art, that's great! Most just want the thing done fast and robustly.

wolvoleo4mo ago

I think it's also that the potential is far from being realized yet we're constantly bombarded by braindead marketers trying to convince us that it's the best thing ever already. This is tiring especially when the leadership (not held back by any technical knowledge) believes them.

I'm sure AI will get there, I also think it's not very good yet.

loandbehold4mo ago

Coding agents as of Jan 2026 are great at what 95% of software engineers do. For remaining 5% that do really novel stuff -- the agents will get there in a few years.

3oil34mo ago

When he said 'just look at what I'v been able to build', I was expecting anything but an 'image converter'

wild_egg4mo ago

I've had Opus 4.5 hand rolling CUDA kernels and writing a custom event loop on io_uring lately and both were done really well. Need to set up the right feedback loops so it can test its work thoroughly but then it flies.

jaggederest4mo ago

Yeah I've handed it a naive scalar implementation and said "Make this use SIMD for Mac Silicon / NEON" and it just spits out a working implementation that's 3-6x faster and passes the tests, which are binary exact specifications.

jonstewart4mo ago

It can do this at the level of a function, and that's -useful-, but like the parent reply to top-level comment, and despite investing the time, using skills & subagents, etc., I haven't gotten it to do well with C++ or Rust projects of sufficient complexity. I'm not going to say they won't some day, but, it's not today.

rtfeldman4mo ago

Anecdotally, we use Opus 4.5 constantly on Zed's code base, which is almost a million lines of Rust code and has over 150K active users, and we use it for basically every task you can think of - new features, bug fixes, refactors, prototypes, you name it. The code base is a complex native GUI with no Web tech anywhere in it.

I'm not talking about "write this function" but rather like implementing the whole feature by writing only English to the agent, over the course of numerous back-and-forth interactions and exhausting multiple 200K-token context windows.

For me personally, definitely at least 99% all of the Rust code I've committed at work since Opus 4.5 came out has been from an agent running that model. I'm reading lots of Rust code (that Opus generated) but I'm essentially no longer writing any of it. If dot-autocomplete (and LLM autocomplete) disappeared from IDE existence, I would not notice.

4 more replies

jaggederest4mo ago

I don't know if you've tried Chatgpt-5.2 but I find codex much better for Rust mostly due to the underlying model. You have to do planning and provide context, but 80%+ of the time it's a oneshot for small-to-medium size features in an existing codebase that's fairly complex. I honestly have to say that it's a better programmer than I am, it's just not anywhere near as good a software developer for all of the higher and lower level concerns that are the other 50% of the job.

If you have any opensource examples of your codebase, prompt, and/or output, I would happily learn from it / give advice. I think we're all still figuring it out.

Also this SIMD translation wasn't just a single function - it was multiple functions across a whole region of the codebase dealing with video and frame capture, so pretty substantial.

1 more reply

andai4mo ago

Is that a context issue? I wonder if LSP would help there. Though Claude Code should grep the codebase for all necessary context and LSP should in theory only save time, I think there would be a real improvement to outcomes as well.

The bigger a project gets the more context you generally need to understand any particular part. And by default Claude Code doesn't inject context, you need to use 3rd party integrations for that.

lelandfe4mo ago

I'm a quite senior frontend using React and even I see Sonnet 4.5 struggle with basic things. Today it wrote my Zod validation incorrectly, mixing up versions, then just decided it wasn't working and attempted to replace the entire thing with a different library.

baq4mo ago

There’s little reason to use sonnet anymore. Haiku for summaries, opus for anything else. Sonnet isn’t a good model by today’s standards.

lelandfe4mo ago

I have been chastened in the opposite direction by others. I've also subjectively really disliked Opus's speed and I've seen Opus do really silly things too. But I'll try out using it as a daily driver and see if I like it more.

subomi4mo ago

Why do we all of a sudden hold these agents to some unrealistic high bar? Engineers write bugs all the time and write incorrect validations. But we iterate. We read the stacktrace in Sentry and realise what the hell I was thinking when I wrote that, and we fix things. If you're going to benefit from these agents, you'd need to be a bit more patient and point them correctly to your codebase.

My rule of thumb is that if you can clearly describe exactly what you want to another engineer, then you can instruct the agent to do it too.

puttycat4mo ago

> Engineers write bugs all the time

Why do we hold calculators to such high bars? Humans make calculation mistakes all the time.

Why do we hold banking software to such high bars? People forget where they put their change all the time.

Etc etc.

Der_Einzige4mo ago

I don't hold calculators to high bars. They think 0.1 + 0.2 = 0.30000000000000004:

https://qntm.org/notpointthree

1 more reply

lelandfe4mo ago

my unrealistic bar lies somewhere above "pick a new library" bug resolution

CapsAdmin4mo ago

I built an open to "game engine" entirely in Lua a many years ago, but relying on many third party libraries that I would bind to with FFI.

I thought I'd revive it, but this time with Vulkan and no third-party dependencies (except for Vulkan)

4.5 Sonet, Opus and Gemini 3.5 flash has helped me write image decoders for dds, png jpg, exr, a wayland window implementation, macOS window implementation, etc.

I find that Gemini 3.5 flash is really good at understanding 3d in general while sonnet might be lacking a little.

All these sota models seem to understand my bespoke Lua framework and the right level of abstraction. For example at the low level you have the generated Vulkan bindings, then after that you have objects around Vulkan types, then finally a high level pipeline builder and whatnot which does not mention Vulkan anywhere.

However with a larger C# codebase at work, they really struggle. My theory is that there are too many files and abstractions so that they cannot understand where to begin looking.

3D304974204mo ago

I'll second this. I'm making a fairly basic iOS/Swift app with an accompanying React-based site. I was able to vibe-code the React site (it isn't pretty, but it works and the code is fairly decent). But I've struggled to get the Swift code to be reliable.

Which makes sense. I'm sure there's lots of training data for React/HTML/CSS/etc. but much less with Swift, especially the newer versions.

rootusrootus4mo ago

I had surprising success vibe coding a swift iOS app a while back. Just for fun, since I have a bluetooth OBD2 dongle and an electric truck, I told Claude to make me an app that could connect to the truck using the dongle, read me the VIN, odometer, and state of charge. This was middle of 2025, so before Opus 4.5. It took Claude a few attempts and some feedback on what was failing, but it did eventually make a working app after a couple hours.

Now, was the code quality any good? Beats me, I am not a swift developer. I did it partly as an experiment to see what Claude was currently capable of and partly because I wanted to test the feasibility of setting up a simple passive data logger for my truck.

I'm tempted to take another swing with Opus 4.5 for the science.

billbrown4mo ago

I hate "vibe code" as a verb. May I suggest "prompt" instead? "I was able to prompt the React site…."

bigDinosaur4mo ago

You aren't prompting the React site, you're prompting the LLM.

3485124697214mo ago

> It also can't do rust really well

I have not had this experience at all. It often doesn't get it right on the first pass, yes, but the advantage with Rust vibecoding is that if you give it a rule to "Always run cargo check before you think you're finished" then it will go back and fix whatever it missed on the first pass. What I find particularly valuable is that the compiler forces it to handle all cases like match arms or errors. I find that it often misses edge cases when writing typescript, and I believe that the relative leniency of the typescript compiler is why.

In a similar vein, it is quite good at writing macros (or at least, quite good given how difficult this otherwise is). You often have to cajole it into not hardcoding features into the macro, but since macros resolve at compile time they're quite well-suited for an LLM workflow as most potential bugs will be apparent before the user needs to test. I also think that the biggest hurdle of writing macros to humans is the cryptic compiler errors, but I can imagine that since LLMs have a lot of information about compilers and syntax parsing in their training corpus, they have an easier time with this than the median programmer. I'm sure an actual compiler engineer would be far better than the LLM, but I am not that guy (nor can I afford one) so I'm quite happy to use LLMs here.

For context, I am purely a webdev. I can't speak for how well LLMs fare at anything other than writing SQL, hooking up to REST APIs, React frontend, and macros. With the exception of macros, these are all problems that have been solved a million times thus are more boilerplate than novelty, so I think it is entirely plausible that they're very poor for different domains of programming despite my experiences with them.

jessoteric4mo ago

i've also been using opus 4.5 with lots of heavy rust development. i don't "vibe code", but lead it with a relatively firm hand- and it produces pretty good results in surprisingly complicated tasks.

for example, one of our public repos works with rust compiler artifacts and cache restoration (https://github.com/attunehq/hurry); if you look at the history you can see it do some pretty surprisingly complex (and well made, for an LLM) changes. its code isn't necessarily what i would always write, or the best way to solve the problem, but it's usually perfectly serviceable if you give it enough context and guidance.

UncleOxidant4mo ago

I've had pretty good luck with LLM agents coding C. In this case a C compiler that supports a subset of C and targets a customizable microcoded state machine/processor. Then I had Gemini code up a simulator/debugger for the target machine in C++ and it did it in short order and quite successfully - lets you single step through the microcode and examine inputs (and set inputs), outputs & current state - did that in an afternoon and the resulting C++ code looks pretty decent.

HarHarVeryFunny4mo ago

That's remarkably similar to something I've just started on - I want to create a self-compiling C compiler targeting (and to run on) an 8-bit micro via a custom VM. This a basically a retro-computing hobby project.

I've worked with Gemini Fast on the web to help design the VM ISA, then next steps will be to have some AI (maybe Gemini CLI - currently free) write an assembler, disassembler and interpreter for the ISA, and then the recursive descent compiler (written in C) too.

I already had Gemini 3.0 Fast write me a precedence climbing expression parser as a more efficient drop-in replacement for a recursive descent one, although I had it do that in C++ as a proof-of-concept since I don't know yet what C libraries I want to build and use (arena allocator, etc). This involved a lot of copy-paste between Gemini output and an online C++ dev environment (OnlineGDB), but that was not too bad, although Gemini CLI would have avoided that. Too bad that Gemini web only has "code interpreter" support for Python, not C and/or C++.

Using Gemini to help define the ISA was an interesting process. It had useful input in a "pair-design" process, working on various parts of the ISA, but then failed to bring all the ideas together into a single ISA document, repeatedly missing parts of what had been previously discussed until I gave up and did that manually. The default persona of Gemini seems not very well suited to this type of work flow where you want to direct what to do next, since it seems they've RL'd the heck out of it to want to suggest next step and ask questions rather than do what is asked and wait for further instruction. I eventually had to keep asking it to "please answer then stop", and interestingly quality of the "conversation" seemed to fall apart after that (perhaps because Gemini was now predicting/generating a more adversarial conversation than a collaborative one?).

I'm wondering/hoping that Gemini CLI might be better at working on documentation than Gemini web, since then the doc can be an actual file it is editing, and it can use it's edit tool for that, as opposed to hoping that Gemini web can assemble chunks of context (various parts of the ISA discussion) into a single document.

HarHarVeryFunny4mo ago

Just as a self follow-up here (I hate to do it!), after chatting to Gemini some more more about document creation alternatives, it does seem that Gemini CLI is by far the best way to go, since it's working in similar fashion to Claude Code and making targeted edits (string replacements) to files, rather than regenerating from scratch (unless it has misinterpreted something you said as a request to do that, which would be obvious when it showed you the suggested diff).

Another alternative (not recommended due to potential for "drift") is to use Gemini's Canvas capability where it is working on a document rather than a specification being spread out over Chat, but this document is fully regenerated for every update (unlike Claude's artifacts), so there is potential for it to summarize or drop sections of the document ("drift") rather than just making requested changes. Canvas also doesn't have Artifact's versioning to allow you to go back to undo unwanted drifts/changes.

mattarm4mo ago

Yeah, the online Gemini app is not good for long lived conversations that build up a body of decisions. The context window gets too large and things drop.

What I’ve learned is that once you reach that point you’ve got to break that problem down into smaller pieces that the AI can work productively with.

If you’re about to start with Gemini-cli I recommend you look up https://github.com/github/spec-kit. It’s a project out of Microsoft/Github that encodes a rigorous spec-then-implement multi pass workflow. It gets the AI to produce specs, double check the specs for holes and ambiguity, plan out implementation, translate that into small tasks, then check them off as it goes. I don’t use spec-kit all the time, but it taught me that what explicit multi pass prompting can do when the context is held in files on disk, often markdown that I can go in and change as needed. I think it ask basically comes down to enforcing enough structure in the form of codified processes, self checks and/or tests for your code.

Pro tip, tell spec-kit to do TDD in your constitution and the tests will keep it on the rails as you progress. I suspect “vibe coding” can get a bad rap due to lack of testing. With AI coding I think test coverage gets more important.

1 more reply

nycdatasci4mo ago

Have you experimented with all of these things on the latest models (e.g. Opus 4.5) since Nov 2025? They are significantly better at coding than earlier models.

klaussilveiraOP4mo ago

Yes, December 2025 and January 2026.

ryandrake4mo ago

I've found it to be pretty hit-or-miss with C++ in general, but it's really, REALLY bad at 3D graphics code. I've tried to use it to port an OpenGL project to SDL3_GPU, and it really struggled. It would confidently insist that the code it wrote worked, when all you had to do was run it and look at the output to see a blank screen.

Wowfunhappy4mo ago

I hope I’m not committing a faux pas by saying this—and please feel free to tell me that I’m wrong—but I imagine a human who has been blind since birth would also struggle to build 3D graphics code.

The Claude models are technically multi-modal, but IME the vision side of the equation is really lacking. As a result, Claude is quite good at reasoning about logic, and it can build e.g. simpler web pages where the underlying html structure is enough to work with, but it’s much worse at tasks that inherently require seeing.

ryandrake4mo ago

Yea, for obvious reasons, it seems to be best at code that transforms data: text/binary input to text/binary output. And where the logic can be tracked and verified at runtime with sufficient (text) logging. In other words, it's much better close loop than open loop. I tried to help it by prompting it to please take a screen capture of its output to verify functionality, but it seems LLMs aren't quite ready for that yet.

mattarm4mo ago

They work much better off a test that must pass. That they can “see”. Without it they are just making up some other acceptance criteria.

antonvs4mo ago

> It also can't do Rust really well, once you get to the meat of it. Not sure why that is

Because types are proofs and require global correctness, you can't just iterate, fix things locally, and wait until it breaks somewhere else that you also have to fix locally.

nopakos4mo ago

I have not tried C++, but Codex did a good job with low-level C code, shaders as well as porting 32 bit to 64 bit assembly drawing routines. I have also tried it with retro-computing programming with relative success.

ivm4mo ago

> Mobile

From what I've seen, CC has troubles with the latest Swift too, partially because of it being latest and partially because it's so convoluted nowadays.

But it's übercharged™ for C#

j / k navigate · click thread line to collapse