It's important to recognize that predicting text is not merely guessing the next letter or word; it rests on a complex set of probabilities grounded in language and context. When we look at language, we see intricate relationships between letters, words, and ideas.
Starting with individual letters, like 't,' we can assign probabilities to their occurrence based on the language and alphabet we've studied. These probabilities enable us to anticipate the next character in a sequence, given the context and our familiarity with the language.
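As a rough sketch of that character-level idea (a toy bigram counter over a made-up string, nowhere near how a real language model is trained), we can tally which character follows each character and turn the tallies into probabilities:

```python
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration.
corpus = "turn to the top of the tree"

# Count how often each character follows each other character.
counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def next_char_probs(ch):
    """Probability distribution over the character that follows `ch`."""
    total = sum(counts[ch].values())
    return {c: n / total for c, n in counts[ch].items()}

# In this tiny corpus, 't' is most often followed by 'o' or 'h'.
print(next_char_probs("t"))
```

Real models work over learned representations rather than raw tallies, but the spirit is the same: given what came before, some continuations are simply more likely than others.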
As we move to words, they naturally follow each other in a logical manner, contingent on the context. For instance, in a discussion about electronics, the likelihood of "effect" following "hall" is much higher than in a discourse about school buildings. These linguistic probabilities become even more pronounced when we construct sentences. One type of sentence tends to follow another, and the arrangement of words within them becomes predictable to some extent, again based on the context and training data.
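The "hall"/"effect" point can be sketched the same way at the word level: the same bigram counting, run over two invented toy sentences standing in for different contexts, gives very different probabilities for "effect" following "hall" (both corpora are made up for illustration):

```python
from collections import Counter, defaultdict

def bigram_probs(text):
    """Word-level bigram probabilities from a whitespace-split text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return {w: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for w, c in counts.items()}

# Two invented toy 'contexts'.
electronics = "the hall effect sensor measures the hall effect voltage"
buildings = "the school hall and the lecture hall were painted"

# In the electronics context, 'effect' always follows 'hall';
# in the buildings context, it never does.
print(bigram_probs(electronics)["hall"])
print(bigram_probs(buildings)["hall"])
```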
Nevertheless, it's not only about probabilities and prediction. Language models, such as Large Language Models (LLMs), possess a capacity that transcends mere prediction. They can grapple with 'thoughts': an abstract notion that may not always be apparent but is undeniably part of how they function. These 'thoughts' can manifest as encoded 'ideas' or concepts associated with the language they've learned.
It may be true that LLMs predict the next "thought" based on the corpus they were trained on, but that's not to say they can generalize this behavior beyond the "ideas" they were trained on. I'm not claiming generalized intelligence exists, yet.
Much like how individual letters and words combine to create variables and method names in coding, the 'ideas' encoded within LLMs become the building blocks for complex language behavior. These ideas carry varying weights and connections, and as a result they can generate intricate responses. So, while the outcome may sometimes seem random, it's rooted in a very real interplay of ideas and their relationships, much the way methods and variables in code are structured by the 'idea' they represent when laid out logically.
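One way to picture those "weights and connections" between ideas is as vectors, where related ideas sit close together. The vectors below are hand-made numbers invented for this sketch, not learned weights from any real model:

```python
import math

# Hand-invented 'idea' vectors (illustrative only, not learned).
ideas = {
    "cat":  [0.9, 0.1, 0.0],
    "dog":  [0.8, 0.2, 0.0],
    "tree": [0.1, 0.9, 0.1],
}

def cosine(u, v):
    """Cosine similarity: a stand-in for how 'connected' two ideas are."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# 'cat' connects more strongly to 'dog' than to 'tree'.
print(cosine(ideas["cat"], ideas["dog"]))
print(cosine(ideas["cat"], ideas["tree"]))
```

The output a model produces depends on which of these connections fire, which is why it can look random from the outside while still being structured underneath.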
Language is a means to communicate thought, so it's not a huge surprise that words, used correctly, might convey an idea someone else can "process", and that likely includes LLMs. That we get so much useful content from LLMs is a good indication that they are dealing with "ideas" now, not just letters and words.
I realize that people are currently struggling with whether or not LLMs can "reason". For as many times as I've thought an LLM was reasoning, I'm sure there are many times it wasn't reasoning well. But did it ever "reason" at all, or was that simply an illusion, or a happy coincidence based on probability?
The rub with the word "reasoning" is that it directly involves "being logical", and how we humans arrive at being logical is a bit of a mystery. It's logical to think a cat can't jump higher than a tree, but what if it was a very small tree? Reasoning about cats' jumping abilities doesn't require understanding that trees come in different heights; it only requires that when we say "tree" we mean "something tall". So reasoning has "shortcuts" that arrive at an answer about a thing without weighing every probability involved. For whatever reason, most humans won't argue with you about tree height at that point and will just reply, "No, cats can't jump higher than a tree, but they can climb it." By adding the latter part, they aren't arguing the point, but rather ensuring that no one can pigeonhole their idea of the truth of the matter.
Maybe when LLMs get as squirrely as humans in their thinking we'll finally admit they really do "reason".