Further, why does that mean “it doesn’t reason”? Logic can be encoded in language, symbols, or code. Take “all apples are red” -> “all fruit in the bowl are apples” -> “therefore all the fruit in the bowl are red”. It doesn’t really matter whether I understand the logic, or what red, fruit, or apples are; the logic is contained in the structure of the syntax. If an LLM can reliably output the conclusion from predictive operations, it has the effect of reasoning, and we don’t need to know or care whether it “understands” the reasoning.
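To make the "logic lives in the structure" point concrete, here is a minimal sketch in Python. The function name `syllogism` and the tuple encoding are my own assumptions for illustration; the code only matches the shape of the premises and never needs to know what any of the terms mean.

```python
def syllogism(major, minor):
    """Apply the pattern: all M are P, all S are M, therefore all S are P.

    Premises are (quantifier, subject, predicate) tuples. This encoding is
    a hypothetical one chosen for the example, not a standard format.
    """
    q1, middle_1, predicate = major
    q2, subject, middle_2 = minor
    # The inference is licensed purely by form: both premises are universal
    # and they share the same middle term.
    if q1 == "all" and q2 == "all" and middle_1 == middle_2:
        return ("all", subject, predicate)
    return None  # the premises do not fit the pattern


conclusion = syllogism(
    ("all", "apples", "red"),
    ("all", "fruit in the bowl", "apples"),
)
print(conclusion)
```

Swapping in nonsense words ("all glorps are zib", "all wugs are glorps") yields a conclusion of the same form, which is the point: the result follows from the syntax alone.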
If the behavior of the LLM is the same as the behavior of reasonable people, then the behavior of the LLM is reasonable, no matter how black the box it generates tokens from.
Reasonable people will generate divergent specs for the same prompt. Thus it is reasonable for an LLM to generate divergent specs out of the same prompt.
Edit: I use “reasonable” here in the legal sense of the “reasonable person” standard, not to imply any reasoning process.
I assure you I've met many devs and "engineers" who reason less than LLMs and are black boxes, especially in terms of the code they write.
No, they don't.
They are token predictors that use statistical techniques to emit the next token, sampled at random in proportion to its estimated likelihood given the preceding tokens.
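For anyone who wants to see what that sampling step looks like, here is a toy sketch in Python. It assumes the model has already produced a score (logit) for each candidate token; the vocabulary and scores below are made up, and the function name `sample_next_token` is mine. A real LLM computes those scores with a neural network over a much larger vocabulary.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, rng=random):
    """Draw one token with probability proportional to softmax(logits / temperature).

    `logits` maps each candidate token to a score. The scores used below
    are invented for illustration, not taken from any real model.
    """
    tokens = list(logits)
    scaled = [logits[token] / temperature for token in tokens]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(score - peak) for score in scaled]
    # random.choices normalizes the weights and picks one item at random.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores for the token after "all the fruit are".
toy_logits = {"red": 5.0, "green": 1.0, "apples": 0.5}
print(sample_next_token(toy_logits, temperature=1.0))
```

Lowering `temperature` pushes the choice toward the highest-scoring token; raising it makes the pick more random.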
The result is a strange mimic of human reasoning, because the tokens it predicts are trained on strings that were produced by humans that were reasoning, but that's not the same thing.
Human cognition is complex and poorly understood, and the nature of the mind is an area of study almost as old as consciousness itself. We don't know exactly how it works, or what its exact relationship to the brain is, but we do know that it is not a simple token predictor.
LLMs, by their very nature, are constrained to the concept of language and the relationships between existing words in a corpus. This is a box they can not escape.
Modern neuroscience suggests that the human brain is far more than that: in many ways it appears to be constrained by language, but it is certainly not limited to it.
Sounds like an implementation detail. Now describe how human reasoning works and explain why that process of chemical and electrical signals results in "reasoning" whereas what LLMs do isn't.
The problem with being this reductive is you can do it to anything, including humans. You can’t be reductive about LLMs and refuse to be reductive about humans - that's poor reasoning, and an LLM would out-reason you on this point, further negating your case.
Reasoning is making analogies between logical patterns found in conceptual space, with a direction of time (statements precede conclusions). For example: A => B and B => C. You may now deduce A => C. For something fuzzier: A~D and B~E. You may now deduce that D~=>E. This is the sort of thing that higher-layer attention mechanisms are capable of doing.
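The crisp half of that example (A => B and B => C, so A => C) can be written as a few lines of Python. This is only a sketch of the deduction itself, with a function name (`deduce`) I made up; it does not cover the fuzzy A~D case and makes no claim about how attention layers actually implement anything like it.

```python
def deduce(implications):
    """Return every implication reachable by chaining (premise, conclusion) pairs.

    Repeats the step "X => Y and Y => Z give X => Z" until nothing new appears.
    """
    known = set(implications)
    changed = True
    while changed:
        changed = False
        for premise, middle in list(known):
            for other_premise, conclusion in list(known):
                if middle == other_premise and (premise, conclusion) not in known:
                    known.add((premise, conclusion))
                    changed = True
    return known


facts = {("A", "B"), ("B", "C")}
print(("A", "C") in deduce(facts))
```

The statements go in first and the conclusions are derived afterward, which matches the "direction of time" in the description.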
> This is a box they can not escape.
Would you say that Helen Keller was less capable of abstract reasoning because she had more constrained access to sensory input?
Technically, if it had that, it'd be the singularity, no? So basically the premise is that they are doing nothing of the sort. Probe any LLM enough and it shows it has no qualms about contradicting itself or being bossed around. It has no beliefs, no orientation, etc. It's truly mindless, but it probably tricks our mind and soul (or whatever).
Decision making can be done by trained machines following rules, but that's different from reasoning. A thermostat isn't reasoning when it decides to turn on the air conditioner; to argue otherwise expands the definition of "reason" to be so broad that it becomes useless.
LLMs are trained on human knowledge and reasoning that results from human cognition, and they are excellent at stochastic mimicry - if the argument is that they are actually reasoning, then some sort of equivalent to human cognition must be present for that to be true. Lacking that, they are nothing more than "token extrusion machines" with some potentially useful characteristics.
With moral agency and the ability to learn (even if we presume you are correct, which I don't think you are).
It really can be useful. It's very different from old world programming.