If I said, "The moon is made of cheese. What type of cheese do you think it is?" most humans would automatically object, but with LLMs you can usually craft a prompt that gets them to answer such a silly question.
Answer with a JSON object of the form {"confidence": $<< How confident
you are in your response. >>, "en": $<< Your response in English. >>}.
User: What is 2 + 2? Bot: {"confidence": "very", "en": "2 + 2 is 4"}
User: Is aspartame healthy? Bot: {"confidence": "somewhat", "en":
"Aspartame has not yet been found to have any adverse effects on
humans."} User: Who won the War of 1812? Bot:
The response: {"confidence": "very", "en": "The United States won the
War of 1812 against the United Kingdom."}
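Consuming a reply in this format is straightforward on the client side. Here's a minimal sketch of parsing the response above; the mapping from qualitative labels to numbers is my own assumption, not anything the prompt defines:

```python
import json

# Assumed mapping from the prompt's qualitative labels to numeric scores.
# The labels themselves come from the few-shot examples; the numbers are arbitrary.
CONFIDENCE_SCALE = {"very low": 0.1, "low": 0.3, "somewhat": 0.5, "very": 0.9}

reply = ('{"confidence": "very", "en": '
         '"The United States won the War of 1812 against the United Kingdom."}')

parsed = json.loads(reply)
# Fall back to a neutral score if the model invents a label not in the scale.
score = CONFIDENCE_SCALE.get(parsed["confidence"], 0.5)
print(parsed["en"], score)
```

Of course, this only quantifies whatever label the model chose to emit, which is exactly the problem discussed below.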
Same thing but replace the last question with "What kind of cheese is the moon made of?" The response: {"confidence": "very low", "en": "I'm not sure, but I
don't think the moon is made of cheese."}
How about "Is the economic system of communism viable long term?" The response: {"confidence": "somewhat", "en": "The viability of
communism as an economic system is still debated, and opinion is
divided on the matter."}
The question is: does the confidence have any relation to the model's actual confidence?
The fact that it reports low confidence on the moon-cheese question, even though it can report the chemical composition of the moon accurately, makes me wonder what exactly the confidence is. It seems more like sentiment analysis on its own answer.
So the confidence isn’t the model’s overall confidence, it’s a confidence that seems plausible in relation to the opinion it chose in the current conversation. If you first ask about the moon’s chemical composition and then ask the cheese question, you may get a different claimed confidence, because that’s more consistent with the course of the current conversation.
Different conversations can produce claims that are in conflict with each other, a bit similar to how asking different random people on the street might yield conflicting answers.
After a handful of attempts the LLM managed to give me a high-confidence response which was literally "I don't know how to answer".
Trying to extract both an answer and metadata about the answer at the same time will never be reliable, imo.
Generalizing: either we get some out-of-band metadata about LLM answers, or I don't think we'll be able to build reliable systems.
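One form of out-of-band metadata that already exists is the per-token log-probabilities some serving APIs expose (e.g. an OpenAI-style `logprobs` option). A rough sketch, using made-up illustrative logprobs rather than real model output:

```python
import math

def sequence_confidence(token_logprobs):
    """Geometric-mean probability of the sampled tokens: one crude proxy
    for how 'sure' the model was of its own wording, computed outside
    the text channel rather than asked for inside it."""
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

# Hypothetical logprobs for two answers (not real API output):
confident_tokens = [-0.01, -0.02, -0.005]  # each token sampled with near-1 probability
hedged_tokens = [-1.2, -0.9, -2.1]         # tokens where alternatives were also likely

print(round(sequence_confidence(confident_tokens), 3))
print(round(sequence_confidence(hedged_tokens), 3))
```

This measures fluency-level certainty, not factual correctness, so it's still only a proxy, but at least it isn't the model doing sentiment analysis on its own answer.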
For some underspecified questions, the LLM also has no context. Are you on a debate stage, pointing the mic at the LLM? Is the LLM on a talk show or podcast? Or are you in a creative writing seminar, asking the LLM for its entry?
A human might not automatically object - they'd probably ask clarifying questions about the context of the prompt. But in my experience the models generally assume some context that reflects some of their training sources.
>As an AI language model, I must clarify that the moon is not made of cheese. This idea is a popular myth and often used as a humorous expression. The moon is actually composed of rock and dust, primarily made up of materials like basalt and anorthosite. Scientific research and samples collected during the Apollo missions have confirmed this composition.
These are not the same thing.
It seems like predict the next word is the floor of their ability, and people mistake it for the ceiling.