
0 points · sampton 20d ago · 0 comments

That's been my experience as well switching from Opus to Codex. Reasoning takes longer but answers are precise. Claude is sloppy in comparison.


solenoid0937 20d ago

Weird, I have had the opposite experience. Codex is good at doing precisely what I tell it to do, whereas Opus suggests well-thought-out plans even if it needs to push back to do it.

slopinthebag 20d ago

This is just the stochastic nature of LLMs at play. I think all of the SOTA models are roughly equivalent, but without enough samples people end up reading too much into it.

oorza 20d ago

There's a certain amount of variance in the way that people utilize these agents. Put five people in a room and ask them to compose the same prompt and you have five distinct prompts. Couple this with the fact that models respond better/worse to certain prompts depending on the stylistic composition of the prompt itself. And since people tend to write in the same style, you'd get people who have more luck with one model over another, where one model happens to align more readily with their prompt style.

To wit, I have noticed that I tend to prefer Codex's output for planning and review, but Opus for implementation; this is inverted from others at work.


meowface 20d ago

I used to feel like you do, but I don't agree. I would just say it is not consistent. For a given codebase and given goal, sometimes Claude will be the more sensible, creative, thoughtful planner and sometimes Codex will be; sometimes Claude will make a serious oversight that Codex catches, and sometimes the opposite. But the trend for me, and seemingly for a lot of people, is that Claude is a more "human-like/human-smart" planner than Codex (in a positive way) but is more likely to make mistakes or forget details when implementing major codebase changes.

throwaway911282 20d ago

Codex has been really good so far, and the fast mode is the cherry on top! The very generous limits are another cherry on top.

slopinthebag 20d ago

It's well worth the $20 to not deal with any limits and have it handle all the boilerplate repetitive BS us programmers seem forced to deal with. I think 80% of the benefit comes from spending that $20 (20%? :P) and just having it do the lame shit that we probably shouldn't have to do but somehow need to.
