> Wouldn't "Grok says it is Claude 3.5 Sonnet yet other LLMs do not" make you update your chance that Grok is actually just Claude 3.5 Sonnet?
Not if you're familiar with Large Language Models.
As an example, "R1 distilled Llama" is a Llama model (originally trained by Meta) that was fine-tuned on DeepSeek R1 outputs, yet if you ask it, it claims to be trained by OpenAI.
Right. But across pairs of mainstream LLMs, a model is presumably more likely to answer "yes, I am X" when it actually is X than when it isn't, even if it's often wrong either way.
Which means a Bayesian actor should treat a model saying "I am X" as (weak) evidence that it is X.
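To make the update concrete, here's a minimal Bayes' rule sketch. All three numbers (the prior and both likelihoods) are made up purely for illustration; nothing in the thread pins down their real values:

```python
def posterior_is_x(prior, p_claim_given_x, p_claim_given_not_x):
    """Posterior P(model is X | model claims to be X) via Bayes' rule."""
    evidence = p_claim_given_x * prior + p_claim_given_not_x * (1 - prior)
    return p_claim_given_x * prior / evidence

# Hypothetical numbers: a 5% prior that Grok really is Claude 3.5 Sonnet,
# models claim their true identity 60% of the time, and claim some
# particular wrong identity 20% of the time.
prior = 0.05
posterior = posterior_is_x(prior, p_claim_given_x=0.6, p_claim_given_not_x=0.2)
print(f"prior={prior:.2f} -> posterior={posterior:.3f}")  # posterior ~ 0.136
```

As long as P(claims X | is X) exceeds P(claims X | isn't X), the posterior lands above the prior, which is all the argument requires; the R1-distilled-Llama example only shows the evidence is weak, not that it's zero.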