
0 points · measurablefunc · 28d ago · 0 comments

The workflow is not the issue. You are welcome to try the same challenge yourself if you want. Here are extra test cases (https://drive.proton.me/urls/6Z6557R2WG#n83c6DP6mDfc) & the specification (https://claude.ai/public/artifacts/5581b499-a471-4d58-8e05-1...). I know enough about compilers, bytecode VMs, parsers, & interpreters to know that this is well within the capabilities of any reasonably good software engineer, but the implementations from Gemini 3.1 Pro (high & low) & Claude Opus 4.6 (thinking) have been less than impressive.


pushedx · 28d ago

sorry, needed to edit this comment to ask the same question as the sibling:

have you run these models in an agent mode where the agent executes the tests, views the output, and iterates on its own for a while? up to an hour or so?

you will get vastly different output if you ask the agent to write 200 of its own test cases, and then have it iterate from there

Kim_Bruning · 28d ago

Possibly a dumb question, but are you running this in Claude Code, or an IDE, or basically what are you using to allow for iteration?

measurablefunc (OP) · 28d ago

I'm using Google's Antigravity IDE. I initially had it configured to run allowed commands (cargo add|build|check|run, testing shell scripts, performance profiling shell scripts, etc.) so that it would iterate & fix bugs w/ as little intervention from me as possible, but all it did was burn through the daily allotted tokens. So I switched to more "manual" guidance & made a lot more progress w/o burning through the daily limits.

What I've learned from this experiment is that the reality does not actually live up to the hype. Maybe the next iteration will manage the task better than the current one, but it's obvious that basic compiler & bytecode virtual machine design in a language like Rust is still beyond the capabilities of the current coding agents. Whoever thinks I'm wrong is welcome to implement the linked specification to see how far they can get by just "vibing".

Kim_Bruning · 28d ago

That's roughly where I'm at too. I have seen people have some more success after having practices though. Possibly the actual workflows needed for full auto are still kind of tacit. Smaller green-field projects do work for me already though.

