Better HN

0 points · nradov · 12d ago · 0 comments

The frontier LLMs are getting pretty good at checking this sort of thing. You could prompt them to verify not only that the references are real but also that they actually state what the article claims. Some human review will still be needed, but I'll bet this approach could uncover a lot of academic fraud.


nomel · 12d ago

> The frontier LLMs are getting pretty good at checking this sort of thing.

No, the stakes here are career-ending. It requires old-school "actually check a record of reality" methods, like a database query or an HTTP GET to one of the many services that hold this info.
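For the "does this reference exist" half of the problem, a plain lookup is enough. Below is a minimal sketch, assuming the citation carries a DOI and using Crossref's public REST endpoint (https://api.crossref.org/works/{doi}), which answers 404 for DOIs it has no record of. The helper names (fetch_crossref, normalize, check_citation) are hypothetical, the normalized exact-title comparison is a deliberately crude assumption, and the fetcher is injectable so the logic can be exercised offline. It only confirms that a work with that title exists; it says nothing about whether the work supports the claim it's cited for.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_crossref(doi: str) -> Optional[dict]:
    """Return Crossref's metadata record for a DOI, or None if Crossref answers 404."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)  # '/' in the DOI is kept as-is
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)["message"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # other HTTP errors are real failures, not evidence of a fake DOI


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace (a crude, assumed matching rule)."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return " ".join(cleaned.split())


def check_citation(
    doi: str,
    claimed_title: str,
    fetch: Callable[[str], Optional[dict]] = fetch_crossref,
) -> str:
    """Classify a citation as 'missing', 'mismatch' (DOI exists, title differs) or 'ok'."""
    record = fetch(doi)
    if record is None:
        return "missing"
    titles = record.get("title") or []  # Crossref returns titles as a list
    if any(normalize(t) == normalize(claimed_title) for t in titles):
        return "ok"
    return "mismatch"
```

Anything coming back "missing" or "mismatch" is a candidate for a human to look at; an "ok" only rules out the fabricated-reference case.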

charcircuit · 11d ago

LLMs can make tool calls to run database and HTTP queries to search for, buy, and cross-reference a citation.

small_scombrus · 12d ago

I think they're saying that frontier LLMs may be usable to spot citations that are correct by shape (a real citation) but incorrect by usage (unrelated to the text).

I kind of hate the idea, but you probably could do a lazy LLM check of every paper and every citation and have it flag possible wrong (second sense) citations for human review.

But you'd need a LOT of tokens and a LOT of human-hours.

mpalmer · 11d ago

> have it flag possible wrong (second sense) citations for human review

And then what, we're done? How have we avoided the need for the same exhaustive human review? It only saves human review time if you trust the LLM not to miss things.
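The trust question can be put in numbers: the only way to know how much the LLM misses is to have humans fully audit a random sample, unflagged citations included, and measure the flagger's recall against that. A minimal sketch, assuming such an audited sample already exists; AuditedCitation and audit_summary are hypothetical names, not an existing tool.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AuditedCitation:
    flagged: bool  # did the LLM flag this citation?
    bad: bool      # did a human auditor confirm it was wrong?


def audit_summary(sample: Iterable[AuditedCitation]) -> dict:
    """Recall, missed count and review workload measured on a human-audited random sample."""
    items = list(sample)
    bad = [c for c in items if c.bad]
    caught = sum(1 for c in bad if c.flagged)
    flagged = sum(1 for c in items if c.flagged)
    return {
        # share of truly bad citations the LLM flagged
        "recall": caught / len(bad) if bad else float("nan"),
        # bad citations humans would never have looked at
        "missed": len(bad) - caught,
        # share of all citations a human still has to review
        "review_fraction": flagged / len(items) if items else float("nan"),
    }
```

If recall comes out well below 1, every unflagged citation in the full corpus carries that miss rate, and the audit itself is the exhaustive human review the flagger was meant to avoid, just on a smaller sample.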


nradov (OP) · 11d ago

Right, that's what I'm saying. The LLM can identify and prioritize possible cases of academic fraud (or serious incompetence) for human review. As the cost of tokens drops, it will become practical to go back and do AI reviews of every scholarly journal article ever written.
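The "identify and prioritize" step might look something like this: a minimal sketch, assuming some hypothetical upstream LLM pass has already given each citation a suspicion score between 0 and 1. Flag, review_queue, the 0.5 threshold and the budget parameter are illustrative assumptions, not an existing tool.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Flag(NamedTuple):
    paper_id: str
    citation_key: str
    score: float  # hypothetical LLM-assigned suspicion score in [0, 1]


def review_queue(flags: Iterable[Flag], budget: int, threshold: float = 0.5) -> List[Flag]:
    """Return at most `budget` flags at or above `threshold`, most suspicious first."""
    candidates = [f for f in flags if f.score >= threshold]
    return heapq.nlargest(budget, candidates, key=lambda f: f.score)
```

A fixed budget makes the human-hours cost explicit; anything below the cutoff is simply never looked at, which is why the miss rate matters as much as the ranking.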

netdevphoenix · 11d ago

Your approach is good for catching stuff that human reviewers might miss, not as a first-line, default-only unit. The whole reason this is happening is that humans are not doing their job. Your solution (humans not doing their job) just increases the scope of the problem.

vrighter · 11d ago

Why is the standard response to "this tech isn't reliable enough for this" to run its output through the same unreliable tech?

The device-fixer started breaking devices instead of fixing them. Tell it to fix itself!

mpalmer · 11d ago

Yeah...

The number of people who confidently tell on themselves in these discussions continues to bum me out.

CaptainNegative · 11d ago

Why is the standard response when someone comes down with a serious illness to bring them into a facility where serious illnesses spread readily?

Sometimes the presently available solutions are subpar. People go with what's available. It's not ideal, but it is practical.

nomel · 11d ago

And then those people get banned for a year when the same AI tools that created a hallucination also think that hallucination is real. I don't see a problem here.
