The freaking article omits several issues in the "compiler". My bet is that they didn't actually challenge the output of the LLM, as usually happens.
If you go to the repository, you'll find fun things, like the fact that it cannot compile a bunch of popular projects, and that it compiles others but the resulting code doesn't pass their test suites. It's a bit surprising, especially since they don't explain why those failures exist (are they missing support for some extensions? some feature they lack?).
It gets less surprising, though, when you see that the compiler doesn't actually do any type checking. It allows dereferencing non-pointers. It allows calling functions with the wrong number of arguments.
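To be concrete about what "no type checking" means here: both of the following are constraint violations that any conforming C compiler must diagnose, and a mainstream compiler rejects them outright. A quick sketch (the file names, and using `cc` to stand in for gcc/clang, are my own illustrative choices, not from the article):

```shell
# 1. Dereferencing a non-pointer: `*x` where x is a plain int.
cat > deref.c <<'EOF'
int main(void) {
    int x = 42;
    return *x;   /* constraint violation: operand of unary * is not a pointer */
}
EOF
cc -c deref.c 2>/dev/null && echo "deref: accepted" || echo "deref: rejected"

# 2. Calling a prototyped function with too few arguments.
cat > arity.c <<'EOF'
int add(int a, int b) { return a + b; }
int main(void) {
    return add(1);   /* constraint violation: too few arguments for prototype */
}
EOF
cc -c arity.c 2>/dev/null && echo "arity: accepted" || echo "arity: rejected"
```

With gcc or clang both print "rejected"; the claim in the repository is that the LLM-built compiler happily accepts code like this.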
There's also this fantastic part of the article where they explain that the LLM got the code to a point where any change or bug fix breaks a lot of the existing tests, and that further progress is not possible.
Then there's the fact that the article itself points out that the kernel doesn't actually link. So how did they "boot it"? It may well be that it crashed soon after boot and wasn't actually usable.
So, as usual, the problem here is that a lot of people look at LLM outputs and trust whatever they claim to have achieved.