
simonw · 2mo ago

I'm definitely an outlier - I've been pushing the boundaries of these tools for three years now and this month I've been deliberately throwing some absurdly ambitious problems at Opus 4.5 (like this one: https://static.simonwillison.net/static/2025/claude-code-mic...) to see how far it can go.


fancyfredbot · 2mo ago

Very interesting example. It's an insanely complex task even with a reference implementation in another language.

It's surprising that it manages the majority of the test cases but not all of them. That's not a very human-like result. I would expect human results to be bimodal, with some people getting stuck early and the rest completing everything. Fractal intelligence strikes again, I guess?

Do you think the way you specified the task at such a high level made it easier for Claude? I would probably have tried to be much more specific, for example by translating file by file or function by function. But I've no idea if this is a good approach. I'm really tempted to try this now! Very inspiring.

simonw (OP) · 2mo ago

> Do you think the way you specified the task at such a high level made it easier for Claude?

Absolutely. The trick I've found works best for these longer tasks is to give it an existing test suite and a goal of getting those tests to pass. See also: https://simonwillison.net/2025/Dec/15/porting-justhtml/

In this case ripping off the MicroQuickJS test suite was the big unlock.
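To make the "existing test suite plus a goal" idea concrete, here is a minimal sketch of the kind of scoreboard such a loop iterates against. It assumes each test can be reduced to an (input, expected output) pair recorded from the reference implementation. `ConformanceCase`, `run_suite` and `toy_eval` are hypothetical names for illustration only; they are not part of MicroQuickJS, the WebAssembly spec tests, or Claude Code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ConformanceCase:
    name: str
    source: str    # input fed to the implementation under test
    expected: str  # output recorded from the reference implementation (assumed)


def run_suite(cases: List[ConformanceCase],
              impl: Callable[[str], str]) -> Tuple[int, List[str]]:
    """Run every case against `impl`; return (number passed, names of failing cases)."""
    passed = 0
    failures: List[str] = []
    for case in cases:
        try:
            ok = impl(case.source) == case.expected
        except Exception:
            # A crash counts as a failure instead of aborting the whole run.
            ok = False
        if ok:
            passed += 1
        else:
            failures.append(case.name)
    return passed, failures


def toy_eval(source: str) -> str:
    """Hypothetical stand-in for a port in progress: only handles 'a + b' on integers."""
    left, right = source.split("+")  # raises ValueError if there is no '+'
    return str(int(left) + int(right))


if __name__ == "__main__":
    suite = [
        ConformanceCase("add_small", "1 + 2", "3"),
        ConformanceCase("add_negative", "-4 + 10", "6"),
        ConformanceCase("multiply", "3 * 4", "12"),  # unsupported, so it fails
    ]
    passed, failures = run_suite(suite, toy_eval)
    print(f"{passed}/{len(suite)} passing; failing: {failures}")
```

Running it prints `2/3 passing; failing: ['multiply']`. The pass count gives the agent an unambiguous target, and the list of failing case names tells it where to look next, which is also why a partially finished port tends to end up passing most cases rather than all of them.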

I have a WebAssembly runtime demo I need to publish where I used the WebAssembly specification itself, which it turns out has a comprehensive test suite built in as well.
