Better HN

0 points · 3eb7988a1663 · 9mo ago · 0 comments

If you are genuinely asking a question, how are you supposed to know the first answer was incorrect?

leoedin · 9mo ago

I briefly got excited about the possibility of local LLMs as an offline knowledge base. Then I tried asking Gemma for a list of the tallest buildings in the world and it just made up a bunch. It even provided detailed information about the designers, years of construction, etc.

I still hope it will get better. But I wonder if an LLM is the right tool for factual lookup: even if it is right, how do I know?

I wonder how quickly this will fall apart as LLM content proliferates. If it’s bad now, how bad will it be in a few years when there’s loads of false but credible LLM-generated blogspam in the training data?

galaxyLogic · 9mo ago

That's the beauty of using AI to generate code: All code is "fictional".

mulmen · 9mo ago

> I wonder how quickly this will fall apart as LLM content proliferates. If it’s bad now, how bad will it be in a few years when there’s loads of false but credible LLM generated blogspam in the training data?

There is already misinformation online, so only the marginal misinformation is relevant. In other words, do LLMs generate misinformation at a higher rate than their training set?

For raw information retrieval from the training set, misinformation may be a concern, but LLMs aren’t search engines.

Emergent properties don’t rely on facts. They emerge from the relationship between tokens. So even if an LLM is trained only on misinformation, abilities may still emerge, at which point problem solving on factual information is still possible.

socalgal2 · 9mo ago

The person that started this conversation verified the answers were incorrect. So it sounds like you just do that: check the results. If they turn out to be false, tell the LLM or make sure you're not using a bad one. It's still likely to be faster than searching yourself.

mtlmtlmtlmtl · 9mo ago

That's all well and good for this particular example. But in general, the verification can often be so much work it nullifies the advantage of the LLM in the first place.

Something I've been using Perplexity for recently is summarizing the research literature on some fairly specific topic (e.g., the state of research on the use of polypharmacy in treatment of adult ADHD). Ideally it should look up a bunch of papers, look at them, and provide a summary of the current consensus on the topic. At first, I thought it did this quite well. But I eventually noticed that in some cases it would miss key papers and therefore provide inaccurate conclusions. The only way for me to tell whether the output is legit is to do exactly what the LLM was supposed to do: search for a bunch of papers, read them, and conclude what the aggregate is telling me. And it's almost never obvious from the output whether the LLM did this properly or not.

The only way in which this is useful, then, is to find a random, non-exhaustive set of papers for me to look at (since the LLM also can't be trusted to accurately summarize them). Well, I can already do that with a simple search in one of the many databases for this purpose, such as PubMed, arXiv, etc. Any capability beyond that is merely an illusion. It's close, but no cigar. And in this case close doesn't really help reduce the amount of work.

This is because a lot of the things people want to use LLMs for require a "definiteness" that's completely at odds with the architecture. The fact that LLMs are good at pretending to do it well only serves to distract us from addressing the fundamental architectural issues that need to be solved. I don't think any amount of training of a transformer architecture is gonna do it. We're several years into trying that and the problem hasn't gone away.

csallen · 9mo ago

> The only way for me to tell whether the output is legit is to do exactly what the LLM was supposed to do: search for a bunch of papers, read them, and conclude what the aggregate is telling me. And it's almost never obvious from the output whether the LLM did this properly or not.

You're describing a fundamental and inescapable problem that applies to literally all delegated work.

lazide · 9mo ago

Yup, and it's worse: because the LLM gives such a confident-sounding answer, most people will just skim over the ‘hmm, but maybe it’s just lying’ verification check and move forward, oblivious to the BS.

Tarq0n · 9mo ago

I'd be very interested in hearing what conclusions you came to in your research, if you're willing to share.

lechatonnoir · 9mo ago

I somehow can't reply to your child comment.

It depends on whether the cost of search or of verification dominates. When searching for common consumer products, yeah, this isn't likely to help much, and in a sense the scales are tipped against the AI for this application.

But if search is hard and verification is easy, even a faulty faster search is great.

I've run into a lot of instances with Linux where some minor, low-level thing has broken, none of the Stack Exchange suggestions you can find in two hours work, and you don't have seven hours to learn about the Linux kernel, its various services, and their various conventions just to get your screen resolution right, so you just give up.

Being in a debug loop with Claude in the most naive way, where it just tells you what to try, you report back the results, and you redirect it when it tunnel-visions on irrelevant things, has solved many such instances of this hopelessness for me in the last few years.
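The point above, that it depends on whether the cost of search or of verification dominates, can be made concrete with a toy expected-cost model. This is a sketch under assumptions of my own, not something anyone in the thread proposed: asking costs `query`, checking the answer costs `verify`, the answer is right with probability `p_correct`, and a wrong answer sends you back to searching yourself at cost `search`. The function names and all numbers are hypothetical.

```python
def expected_llm_cost(query: float, verify: float, search: float, p_correct: float) -> float:
    """Expected time to a verified answer when you ask an LLM first.

    Hypothetical model: you always pay `query` to ask and `verify` to check;
    with probability (1 - p_correct) the answer is wrong and you fall back
    to doing the search yourself, paying `search`.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a probability in [0, 1]")
    return query + verify + (1.0 - p_correct) * search


def llm_is_worth_it(query: float, verify: float, search: float, p_correct: float) -> bool:
    """True when asking the LLM first beats searching yourself, on average.

    Rearranging query + verify + (1 - p) * search < search gives
    query + verify < p * search: cheap verification and expensive search
    favour the LLM even when it is often wrong.
    """
    return expected_llm_cost(query, verify, search, p_correct) < search


if __name__ == "__main__":
    # Hypothetical minutes, loosely modelled on the Linux anecdote:
    # 420 min of manual digging vs. 5 min asking, 2 min trying a fix, 30% success.
    print(llm_is_worth_it(query=5, verify=2, search=420, p_correct=0.3))
    # Literature-review case: verifying means redoing the whole search.
    print(llm_is_worth_it(query=5, verify=600, search=600, p_correct=0.9))
```

With these made-up numbers the first call prints True and the second prints False. Under this model, once verifying costs as much as searching (`verify >= search`), no accuracy level makes asking first pay off, which is the objection raised elsewhere in the thread.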

skydhash · 9mo ago

So instead of spending seven hours to get at least an understanding of how the Linux kernel works and how the various user-land programs interact, you've decided to spend years fumbling in the dark and trying stuff every time an issue arises?

insane_dreamer · 9mo ago

> It's still likely to be faster than searching yourself.

No, not if you have to search to verify their answers.

worthless-trash · 9mo ago

This is the right question.

graphememes · 9mo ago

scientific method??
