
0 points · stonemetal12 · 1y ago · 0 comments

Rather, any logic puzzle you post on the internet as something AIs are bad at ends up in the next round of training data, so AIs get better at that specific question. Not because AI companies are optimizing for a benchmark, but because they suck up everything.


modeless · 1y ago

ARC has two test sets that are not posted on the internet. One is kept completely private and never shared. It is used when testing open source models, and the models are run locally with no internet access. The other test set is used when testing closed source models that are only available as APIs. So it could be leaked in theory, but it is still not posted on the internet and can't be in any web crawls.

You could argue that the models can get an advantage by looking at the training set, which is on the internet. But all of the tasks are unique, and generalizing from the training set to the test set is the whole point of the benchmark. So it's not a serious objection.

foobiekr · 1y ago

Given the delivery mechanism for OpenAI, how do they actually keep it private?

modeless · 1y ago

> So it could be leaked in theory

That's why they have two test sets. But OpenAI has legally committed to not training on data passed to the API. I don't believe OpenAI would burn their reputation and risk legal action just to cheat on ARC. And what they've reported is not implausible IMO.
