The weights, so to speak, come from the knowledge base, so you can't get away from the quality of the knowledge base, and that quality isn't uniform across all domains of knowledge. Then the problem becomes: how do you make the training material uniformly high-quality in every knowledge domain? At best it becomes the meta-problem of measuring the quality of knowledge in a way that lets an LLM calibrate its confidence to a knowledge domain. More likely we're stuck with the dubious quality that comes from human bias and wishful thinking in supposedly authoritative material.
Gemini's user interface, at least the way most people encounter it, seems more closely tied to search results. That leads me to suspect Google's approach to training could be uniquely informed by both current and historical web crawling.
That's not what I said. What I said is that the claim "LLMs aren't intelligent because they stochastically produce characters" doesn't hold, because humans do that too, even when they're intelligent and authoritative.
Fine-tuning and RAG should, in theory, let applications of LLMs perform better in specific knowledge domains, by focusing the annotation of knowledge on the domains specific to the application.
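To make that concrete, here's a toy sketch of the retrieval half of RAG: a bag-of-words similarity search over a small in-memory corpus whose top hits get stuffed into the prompt. The corpus and the prompt template are stand-ins I made up; a real system would use learned embeddings, a vector store, and an actual generate call on the resulting prompt.

    # Toy RAG retrieval sketch: rank documents by bag-of-words cosine
    # similarity to the query, then build a context-grounded prompt.
    import math
    from collections import Counter

    CORPUS = [
        "The 2023 handbook states that refunds are issued within 14 days.",
        "Support tickets are triaged by severity, then by age.",
        "On-call rotations change every Monday at 09:00 UTC.",
    ]

    def bow(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        shared = set(a) & set(b)
        num = sum(a[t] * b[t] for t in shared)
        den = math.sqrt(sum(v * v for v in a.values())) * \
              math.sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    def retrieve(query, k=2):
        q = bow(query)
        ranked = sorted(CORPUS, key=lambda d: cosine(q, bow(d)), reverse=True)
        return ranked[:k]

    def build_prompt(query):
        context = "\n".join(retrieve(query))
        return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

    print(build_prompt("How long do refunds take?"))

The point is just that the domain-specific knowledge travels in the prompt at inference time, so the quality problem shifts from the training corpus to the curated corpus you retrieve from.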
I think you're missing the point. The issue is not the amount of knowledge it possesses. The problem is that there's no way to go from "statistically generate the next word" to "what is your confidence level in the fact you just stated?" Maybe, with an enormous amount of computation, we could layer another AI on top to evaluate or add confidence intervals, but I just don't see how we get there without another quantum leap.
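For what it's worth, the cheapest version of "layer something on top" that people try is just reading the model's own token probabilities back as a confidence score. Rough sketch below; the logprob values are made up for illustration (in practice you'd read them from whatever API or runtime you're using), and this kind of self-reported probability is known to be a weak proxy for whether the stated fact is actually true.

    # Crude confidence proxy: geometric mean of token probabilities,
    # i.e. exp of the mean log-probability of the generated answer.
    import math

    def sequence_confidence(token_logprobs):
        return math.exp(sum(token_logprobs) / len(token_logprobs))

    answer_logprobs = [-0.05, -0.8, -0.02, -1.9, -0.1]  # illustrative values only
    score = sequence_confidence(answer_logprobs)
    print(f"confidence proxy: {score:.2f}")
    if score < 0.5:
        print("Low confidence: route to a human or a verifier model.")

So you can get a number out; the hard part you're pointing at is that the number measures fluency-under-the-model, not truth.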
Of course there is. If its training forces it to develop a theory of mind then it will weight the dice so that it's more likely to output "I don't know". Most likely the culprit is that it's hard to make training data for things that it doesn't know.
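Something like this is what I mean by making it practice "I don't know": pair questions it can't possibly answer with an explicit refusal target and mix them into the fine-tuning set. The field names and file format below are assumptions for illustration, not any particular framework's schema.

    # Toy construction of fine-tuning examples that include refusals.
    import json

    answerable = [
        ("What year did the Apollo 11 mission land on the Moon?", "1969."),
    ]
    unanswerable = [
        "What will the closing price of the S&P 500 be next Tuesday?",
        "What am I thinking about right now?",
    ]

    examples = [{"prompt": q, "completion": a} for q, a in answerable]
    examples += [{"prompt": q, "completion": "I don't know."} for q in unanswerable]

    # Write one JSON object per line (JSONL), a common fine-tuning format.
    with open("idk_sft.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

The hard part is exactly what you'd expect: generating unanswerable questions is easy, but generating questions that this particular model doesn't know the answer to, without also teaching it to refuse things it does know, is not.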