> Nothing in the article suggests it did not autonomously do the work.
I don’t know how to respond to that other than to ask you, when you elaborate on your definition of “nothing” in that sentence, to quote the part of the blog post where the author described the language model running into a problem that it could not fix and then described how he manually intervened to fix it.
> Every agent would hit the same bug, fix that bug, and then overwrite each other's changes. Having 16 agents running didn't help because each was stuck solving the same task.
> The fix was to use GCC as an online known-good compiler oracle to compare against. I wrote a new test harness that randomly compiled most of the kernel using GCC
As for:
> Because a lot of naysayers here pretend as if this is somehow trivial.
This is an answer to “why do you want someone to do that?” You have already established that you would like that to happen. It doesn’t answer “why would a real human being (who is not you) that isn’t impressed by the compiler that doesn’t work put their time into making Anthropic look good?”
For example, “I will pay a naysayer $20,000 to try” or “I know a guy that will pay a naysayer to try this, succeed or fail” or “I will give a naysayer a bunch of hardware to play with in exchange for attempting this” would be motivation to work for Anthropic and not get paid by Anthropic. Saying “I want you to do that because I think you’ll feel bad and waste your time” and then getting no takers isn’t really an assault on the “naysayers’” decision not to do work for Anthropic without getting paid by Anthropic.
As for:
> What, exactly would such a tool that'd somehow make the people dismissing this change their minds look like?
That’s a good question, but I would say the bare minimum would be “useful”.
It is pretty common for tech companies to release free, useful software. For example, PyTorch, React, Hack/HHVM, etc. from Meta:
https://opensource.fb.com/projects/
Or Chromium from Google. Chromium is a good example; there’s a decent chance that you’re using a Chromium-based browser to read this. There’s also a ton of other stuff; Go comes to mind as another example.
https://opensource.google/
Or if you want stuff made by a business that’s a fraction of the valuation of Anthropic, there’s Campfire and Writebook by 37signals. https://once.com/
> Because that wasn't the purpose.
I know that. That was the premise of my question.
I saw that they put a bunch of resources into making something that is not useful and asked why they did not put a bunch of resources into something that was useful. Surely they could make something that is both useful and makes their model look good?
For me it seems like the obvious answer would be either that they can’t make something useful:
> Their goal was to test the limits of what the model can achieve. They did that.
Or they don’t want to
> Because that wasn't the purpose.
I was asking if anyone had any substantive knowledge or informed opinion about whether it was one or the other, but it seems like you’re saying it’s… both? They don’t want to make and release a useful tool, and also they cannot make and release a useful tool, because this compiler, which is not useful, is the limit of what their model can achieve.
Like you want us all to know that they cannot and do not want to make any sort of useful tool. That is your clearly-stated opinion about their desires and capabilities. And also you want these “naysayers”, who are not you, to put their time and effort into… also not making something useful? To prove… what?