undefined | Better HN

0 pointsaschobel1y ago0 comments

Give it another shot but with Claude Sonnet 3.5. It’s my daily driver for coding tasks.

It seems especially strong with Python but a bit medium with Swift.

0 comments

I just signed up for the free version. Claude Sonnet does properly use malloc/free to manage the buffer where GPT-4o screws up (Yay!) It manages to gets through the whole process of initializing a graphics device and grabbing a queue from the device. It took some questionable shortcuts to get there (and didn't leave any comments explaining those shortcuts and the problems they could cause down the road), but fine, the code works.

After that it goes completely off the rails by trying to issue draw commands before binding a graphics pipeline, which is both illogical and illegal. After a lot of prodding, I did manage to get it to bind a graphics pipeline, but it forgot about the texture.

So Claude Sonnet is definitely better than GPT-4o, but it still feels raw, like a game of whack-a-mole where I can get it to fix a mistake, but it reintroduces an old one. I also have to be the one offering the expertise. I can prompt it to fix the issues because I know exactly what the issues are. If I was using this to try to fill in for a gap in my knowledge, I would be stuck when I ran the code and it crashed - I would have no idea where to go next.

Update: Took about 50 min of experimenting, but I did get Claude to generate code that doesn't have any obvious defects on first inspection, although it cut off about halfway through because of the generation limit. That's the best result that I've seen from an LLM yet. But that's after about a dozen very broken broken programs, and again, I think the domain expertise here is key in order to be able to reprompt and correct.

zone4111y ago

LLMs are much better at Python and JavaScript than at C/C++. This simple difference can account for much of the variation in people's experiences.

Calavar1y ago

I agree, that could explain a lot of it. I also suspect that the length of the generated code plays a role. In my experience, LLMs sometimes peter out a bit and give up if the generated program gets too long, even if it's well within their context limit. (Where giving up means writing a comment that says "the rest of the implementation goes here" or starting to have consistency issues.) Python and JavaScript tend to be more succinct and so that issue probably comes up less.

remoroid1y ago

Yes, you have figured it out. LLMs are terrible for graphics programming. Web development - much better. Sonnet 3.5 is the only good model around for now. GPT 4o is very poor.

roywiggins1y ago

I've had some moderate success asking Claude to "translate" smallish pieces of code from, eg, C++ to Python. One simple C++ file parser it managed to translate basically 100% right on one try. I wouldn't particularly trust the output- well, not until I run it through tests- but for quick "could this possibly work, what does the performance look like, what might this code look like" exploratory stuff it's been very useful, especially for code that you are likely to throw away anyway.

One example that I am still using: I wanted to generate a random DICOM file with specific types of garbage in it to use as input for some unit tests, and Claude was able to generate some Python that grabs some random DICOM tags and shoves vaguely plausible garbage data into them, such that it is a valid but nonsensical DICOM dataset. This is not hard, but it's a lot faster to ask Claude to do it.

j / k navigate · click thread line to collapse

0 comments

Calavar1y ago

zone4111y ago

LLMs are much better at Python and JavaScript than at C/C++. This simple difference can account for much of the variation in people's experiences.

Calavar1y ago

remoroid1y ago

Yes, you have figured it out. LLMs are terrible for graphics programming. Web development - much better. Sonnet 3.5 is the only good model around for now. GPT 4o is very poor.

roywiggins1y ago

j / k navigate · click thread line to collapse