This is incredibly good for science. arXiv is free, but it's a privilege not a right!
I'm not seeing this clearly listed on https://info.arxiv.org/help/policies/index.html so it's possible this is planned but not live yet - or perhaps I'm not digging deeply enough?
As a certain doctor once said: the whole point of the doomsday machine is lost if you keep it a secret!
This is good for reference checking, but I doubt this will do much for the most likely shoddy science that accompanies hallucinated references.
Catch-22 [1]:
> You will need to provide an arXiv article id. This is necessary for your paper to be processed correctly, the submitted version must be the same as the arXiv version.
[1] https://jcap.sissa.it/jcap/help/helpLoader.jsp?pgType=author
The penalty cannot be high enough.
ArXiv doesn't even check the submission closely, so how can they know?
They say "errors, mistakes"
They use an automated system to check if the basic requirements were met, and sometimes papers are flagged for further superficial human review, but there is no way they can possibly do this at scale or check every reference. This would be like trying to do peer review, but for a preprint archive that gets easily 100x more volume than any journal.
Second, there is such a huuuuge gap between publishing on arXiv and peer review. I can attest personally that it's not even close. I've gotten probably a dozen rejections from peer review and no problems publishing in arXiv math. This is because peer review checks not just whether something is new or correct, but also whether it's of "interest to math community," which is inherently subjective and makes peer review many orders of magnitude harder than publishing on arXiv.
Even though a well-known professor in number theory praised the paper when I got an endorsement, and a second emailed me and encouraged me to publish it, it still got rejected 3 times and I'm still waiting.
Being required to publish in a peer reviewed journal will close off arxiv for many researchers for good. It also defeats the point of it being a pre-print.
Nothing stops someone from putting a PDF on the internet. I'm fine with ArXiv holding a high standard.
They can be informed by people who read the papers and check the citations. A zero-tolerance policy provides an incentive to report sloppy papers (namely, that you can be confident something will be done about it), and each time a paper is removed or an author is banned, it incrementally increases the value of the arXiv as a whole.
> Being required to publish in a peer reviewed journal will close off arxiv for many researchers for good.
At the end of the day, demanding that people carefully proofread their LLM-generated papers before sharing them on the arXiv seems like a relatively low bar to clear, and I sort of question whether it's reasonable to call individuals who find it too onerous "researchers" in the first place.
It's enough for them to place this policy and enforce it when they become aware of violations. Someone reading the slopped paper (or, here, trying to follow a reference) will notice sooner or later.
> Being required to publish in a peer reviewed journal will close off arxiv for many researchers for good. It also defeats the point of it being a pre-print.
You make it sound like it's impossible for researchers to write papers without slopped references, and inevitable that they get hit by this policy.
I disagree. It's just one darn hallucinated citation for heaven's sake, not fraud or something. It doesn't account for the substance or quality of their work at all. A one-year ban seems plenty sufficient for a minor first time mistake like this. People make mistakes and a good fraction of them can learn from those mistakes. There's no need to permanently cripple someone's ability to progress their life or contribute to humanity just because an AI hallucinated a reference one time in their life. That's punitive instead of rehabilitative.
It is fraud.
> It doesn't account for the substance or quality of their work at all.
References are part of the work. If you're making up the references, what else are you making up?
> People make mistakes and a good fraction of them can learn from those mistakes. There's no need to permanently cripple someone's ability to progress their life or contribute to humanity just because an AI hallucinated a reference one time in their life.
A one year ban is not permanent. Having a negative consequence for making poor decisions seems like an inducement to learn from the mistake?
In an ideal world, one would be keeping notes on references used while doing the research that led to writing the paper. Choosing not to do that is one poor decision.
Having a positive outlook, if asking an AI to provide references that may have been missed, one should at least verify the references exist and are relevant. Choosing not to do that is also a poor decision, even if one did take notes on references used while researching.
Your standards are lower than what they would accept at my high school. Seriously.
And generally, if you are generating papers with LLMs, let other LLMs read them. Why would we waste human hours considering something that was generated? At this point publish your prompt because that's the actual work you're doing.
I don't think you need to publish on arXiv to contribute meaningfully to humanity.
> That's punitive instead of rehabilitative.
Unfortunately science is competitive. Yours is a race to the bottom where the people who can afford the most expensive models and who are least concerned with the truth can publish the most papers and benefit financially and professionally by doing so. This is not a zero-sum arena; grant money and opportunities may well be awarded to them, and not to another team that is producing more careful and genuine output.
Your being set behind is less important than the fact that your publishing is setting everyone else behind.
Such a banned person is being helped to "step out of the way", and someone more competent will assuredly step forward to consume the limited maintenance labour more thoughtfully.
It's not even that they "don't like LLMs". They just don't like academic fraud! If references were fabricated with a Markov chain it would be just as bad!
Bonkers. At the same time, peak HN.
I'm here because I enjoy building things. And today this mostly happens with AI. I could do without the often thoughtless comments and conspiracy theories about "LLM hypers" posted by people who don't like LLMs.
If it’s not worth your time to check the output of your LLM carefully, it’s not worth my time to read it.
Ever pick a random one and really dive in?
And I’m not talking about good faith research that didn’t pan out, I mean research that is completely useless for any purpose other than convincing a casual observer that the authors are doing research.
$ curl -L "https://doi.org/10.47397/tb/43-1/tb133chernoff-widows" -H 'Accept: application/x-bibtex'
@article{Chernoff_2022, title={Automatically removing widows and orphans with <tt>lua-widow-control</tt>}, volume={43}, ISSN={0896-3207}, url={http://dx.doi.org/10.47397/tb/43-1/tb133chernoff-widows}, DOI={10.47397/tb/43-1/tb133chernoff-widows}, number={1}, journal={TUGboat}, publisher={TeX Users Group}, author={Chernoff, Max}, year={2022}, pages={28–39} }
This is the exact same method that Zotero uses internally, so this won't ever give you better results, but I still find it kinda neat.
So there are absolutely a bunch of tasks that could be evaled/benchmarked, but "hallucination rate" isn't particularly applicable/interesting as a metric of how good the tool is.
that said, we do use various LLMs (mostly local, fine-tuned, small, for things like NER/parsing/metadata comparison, etc.). and they can and do hallucinate, but we have very hard constraints on the validation, so any extraction results that don't match 1:1 back to the input text are discarded for example. so again, rather than hallucination risk we prefer hard constraints
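A rough sketch of what a hard constraint of this kind could look like: keep an extracted field only if its value occurs verbatim in the input text. The names `normalize` and `keep_grounded` and the dict-of-fields shape are hypothetical, chosen for illustration; the commenter's actual pipeline is not described in the thread.

```python
def normalize(text):
    """Collapse runs of whitespace so line wrapping does not break matching."""
    return " ".join(text.split())


def keep_grounded(extracted, source_text):
    """Keep only fields whose value appears verbatim in the source text.

    Anything the model produced that cannot be found in the input
    is dropped instead of being trusted.
    """
    haystack = normalize(source_text)
    return {
        field: value
        for field, value in extracted.items()
        if value.strip() and normalize(value) in haystack
    }


if __name__ == "__main__":
    source = "Chernoff, Max. Automatically removing widows and orphans.\nTUGboat 43(1), 2022."
    model_output = {"author": "Chernoff, Max", "journal": "TUGboat", "year": "2023"}
    # "year" is dropped because "2023" never occurs in the source text.
    print(keep_grounded(model_output, source))
```

The check is deliberately strict: a correct value that the model reformatted is discarded along with the invented ones, which is the trade-off such a constraint accepts.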
To be a coauthor on a preprint that you have not submitted, you have to actively "claim" it (using a password given to the author who submitted). It's on you to double-check before claiming.
I surely hope that only "confirmed" coauthors will get the ban, it's only logical.
The authors should value the time of the reviewers higher than their own time. So, if you include AI nonsense in your paper, it is insulting.
If you can't validate that your bibliography is full of real articles, you shouldn't get published.
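As a minimal sketch of such a check, assuming the bibliography is a BibTeX file whose entries carry `doi` fields: the same doi.org content negotiation request as the curl call quoted earlier in the thread can be scripted. The function names are hypothetical, and a DOI that resolves only shows that the identifier exists, not that the cited title and authors match or that the reference is relevant. Entries without a DOI are not checked at all.

```python
import re
import urllib.error
import urllib.request

# Matches doi={...} or doi = "..." fields in BibTeX text (case-insensitive).
DOI_FIELD = re.compile(r'\bdoi\s*=\s*[{"]([^}"]+)[}"]', re.IGNORECASE)


def extract_dois(bibtex_text):
    """Return every DOI found in doi fields of the given BibTeX text."""
    return [doi.strip() for doi in DOI_FIELD.findall(bibtex_text)]


def build_bibtex_request(doi):
    """Build the same request as the curl call: doi.org plus a BibTeX Accept header."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/x-bibtex"},
    )


def doi_resolves(doi, opener=urllib.request.urlopen):
    """True if the resolver answers with HTTP 200 for this DOI.

    Network failures also count as "not resolved" here, so a flagged
    DOI is a prompt for a manual look, not proof of fabrication.
    """
    try:
        with opener(build_bibtex_request(doi), timeout=10) as response:
            return response.status == 200
    except urllib.error.URLError:
        return False


def unresolved_dois(bibtex_text, opener=urllib.request.urlopen):
    """List the DOIs in the bibliography that did not resolve."""
    return [doi for doi in extract_dois(bibtex_text) if not doi_resolves(doi, opener)]
```

Calling `unresolved_dois(open("refs.bib").read())` (with `refs.bib` standing in for your own file) would return the DOIs worth checking by hand.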
LLMs have just poured gasoline on the fire.
He's toast if SSRN were to adopt a similar policy.
I'm a screen reader user and usually read papers as raw TeX. I've seen everything: slurs, demeaning comments towards reviewers and professors, admissions of fraud, instructions to coauthors to commit further fraud before paper submission to mask the earlier fraud... it's all there. There's far less of it than I would think, definitely <1% of papers, but it's there.
I think it would be useful to run an LLM anti-fraud pass on the TeX source of all new arxiv papers. It wouldn't catch everything, but it would catch some of the dumbest fraudsters.
On the positive side, you can also find stronger claims that didn't survive review, additional explanations that didn't make the cut due to the conference's page limit, as well as experimental results that the authors felt weren't really worth including. Those need to be approached with an abundance of caution, but are genuinely useful sometimes.
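For anyone who wants to skim that leftover material without reading the whole source, here is a small sketch that lists the `%` comments in a TeX file. It assumes the interesting text sits in ordinary comments, and it does not understand `verbatim` environments or `\verb`, where `%` is literal. The function names are made up for illustration.

```python
def split_comment(line):
    """Return the text after the first unescaped '%', or None if there is none.

    A '%' preceded by an odd number of backslashes (as in "50\\%") is a
    literal percent sign; after an even number it starts a comment.
    """
    backslashes = 0
    for index, char in enumerate(line):
        if char == "\\":
            backslashes += 1
            continue
        if char == "%" and backslashes % 2 == 0:
            return line[index + 1:].strip()
        backslashes = 0
    return None


def tex_comments(source):
    """Yield (line_number, comment_text) for every non-empty comment."""
    for number, line in enumerate(source.splitlines(), start=1):
        comment = split_comment(line)
        if comment:
            yield number, comment


if __name__ == "__main__":
    sample = "\n".join([
        r"Accuracy improves by 50\% overall. % TODO: rerun, numbers look off",
        r"\section{Introduction}",
    ])
    for number, comment in tex_comments(sample):
        print(number, comment)
```

The output of a pass like this is what a human or an automated reviewer would then read; the sketch itself makes no judgment about what counts as fraud.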
I have standing instructions with my agents: do not commit anything without review by at least two sub agents.
The chance of two agents hallucinating at the same time in the same way is almost nil. Why do people still do it?
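Whether the chance really is "almost nil" depends on how independent the two reviewers are. A back-of-the-envelope sketch, treating each reviewer as missing a given error with the same probability `p` and with correlation `rho` between their misses (both values are hypothetical inputs, not measurements from the thread):

```python
def both_miss(p, rho=0.0):
    """Probability that two reviewers both miss the same error.

    Models each miss as a Bernoulli event with rate p; rho is the
    correlation between the two events. For identical rates,
    P(both) = p^2 + rho * p * (1 - p).
    """
    return p * p + rho * p * (1.0 - p)


if __name__ == "__main__":
    # Illustrative values only: independent reviewers vs. strongly correlated ones.
    for rho in (0.0, 0.5, 1.0):
        print(f"rho={rho}: {both_miss(0.1, rho):.3f}")
```

With `rho = 0` the joint rate is `p` squared, which is small; as `rho` approaches 1 (for example, sub agents built on the same model with the same context) it climbs back toward `p`, so the second review adds little.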
Now: „Generate a bibliography. Make no mistakes.“
The deeper question is whether legitimate AI-generated results are allowed or not. As a test, take the extreme case: a proof of the Riemann Hypothesis, generated autonomously end to end and formally proven. Is it allowed or not?
The thread specifically points out that if authors can’t be arsed to simply proofread their text, the rest cannot be trusted either.
It’s a simple heuristic against low-quality submissions, not an anti-AI measure.
I expect arXiv will still have problems with slop submissions but, at least, their references should actually exist going forward.
Sorry to be rude, but this seems like a dumb question. I want science to progress. A primary purpose of these journals is to progress science. A full proof of the Riemann Hypothesis progresses science. I don't care how it was produced, if Hitler is a coauthor, etc, I just care that it is correct. Whether the authors should be rewarded for whatever methods they used can be a separate question.
The short of it is that he argues being first to correctness shouldn't be the only goal and isn't a great optimisation incentive. Presentation and digestibility of correct results is a missing 1/3 when you've finished generation and verification. I completely agree with him. You don't just need an AI-generated proof of the Riemann Hypothesis. You would really like it to be intentional and structured for others to understand.
A really beautiful quote I learned of in the talk is this:
> "We are not trying to meet some abstract production quota of definitions, theorems, and proofs. The measure of our success is whether what we do enables people to understand and think more clearly and effectively about math." - William Thurston