The result is actually richer than 'predicted output' - it's a probability distribution over all possible output.
-- This is, uh, false. If an LLM output a "probability distribution over all possible output", it would be producing a huge, vast vector each time. It doesn't. ChatGPT, GPT-3 etc. produce a string output, that's it. You can say the string is sampled from a probability distribution over output space, but just about anything that produces output does that.
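The distinction the reply is drawing can be made concrete with a toy sketch (nothing below is a real model; the vocabulary and scoring function are made up): internally there is a per-step distribution over the vocabulary, but what the caller ever receives is one sampled string, not the distribution itself.

```python
import math
import random

# Made-up toy vocabulary; a real LLM's vocabulary has tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "mat", "."]

def fake_logits(context):
    # Stand-in for a forward pass: deterministic toy scores, not a real model.
    return [len(w) + 0.1 * i - 0.01 * len(context) for i, w in enumerate(VOCAB)]

def softmax(logits):
    # Convert raw scores into a probability distribution over the vocabulary.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    # Draw one token index according to the distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

def generate(steps=5, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(steps):
        probs = softmax(fake_logits(out))      # distribution exists internally...
        out.append(VOCAB[sample(probs, rng)])  # ...but only one token is emitted
    return " ".join(out)  # the caller gets a string, never the huge vector
```

The per-step distribution is real (it's how decoding works), but it is an internal quantity; the claim that the *output* is "a probability distribution over all possible output" conflates the two.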
Think about how when you’re coding, autocomplete suggestions help you pick the right ‘next token’ with greater accuracy.
-- Uh, you missed where I said "in-context predicted output". The Transformer architecture is where the LLM magic happens; it's what allows "X, but in pig Latin" etc.
It's hard to convey that these systems are neither "fancy autocomplete" nor AGI/something magical, but an interesting and sometimes deceptive middle ground.