But we don't appear to have entirely done that yet. It's just curious to me that the linguistic structure is there while the "intelligence", as you call it, is not.
Not necessarily. You can check this yourself by building a very simple Markov chain. Feed it Moby Dick or whatever, generate text from the resulting transition probabilities, and this gap will be way more obvious. Generated sentences will be "grammatically" correct, but semantically often very wrong. Clearly LLMs are way more sophisticated than a home-made Markov chain, but I think it's helpful to see the probabilities kind of "leak through."
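If anyone wants to try it, here's a rough sketch of the kind of home-made word-level Markov chain I mean. The function names (`build_chain`, `generate`), the `order` parameter, and the tiny sample text are all just my own placeholders; swap in the full text of Moby Dick or any other corpus.

```python
import random
from collections import Counter, defaultdict


def build_chain(text, order=1):
    """Count which word follows each run of `order` consecutive words."""
    words = text.split()
    chain = defaultdict(Counter)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state][words[i + order]] += 1
    return chain


def generate(chain, length=20, seed=None):
    """Walk the chain, sampling each next word in proportion to its count."""
    rng = random.Random(seed)
    state = rng.choice(list(chain.keys()))
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        candidates, counts = zip(*followers.items())
        out.append(rng.choices(candidates, weights=counts, k=1)[0])
        state = tuple(out[-len(state):])
    return " ".join(out)


# Placeholder corpus (an assumption for illustration, not a real book excerpt).
sample = (
    "the whale swam under the ship and the ship sailed over the sea "
    "and the sea rolled under the whale and the captain watched the sea"
)
chain = build_chain(sample, order=1)
print(generate(chain, length=15, seed=1))
```

With `order=1` every adjacent word pair in the output is one the chain has actually seen, so it reads locally fine, but nothing keeps the sentence as a whole on topic. Bumping `order` to 2 or 3 on a bigger corpus makes the output more fluent and also more likely to just parrot the source.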
Nobody knows what they are saying either; the brain is just (some form) of a neural net that produces output which we claim as our own. In fact, most people go their entire lives without noticing this. The words I am typing right now are just as mysterious to me as the words that pop up on screen when an LLM is outputting.
I feel confident enough to disregard duelists (people who believe in brain magic), so that leaves only a neural net architecture as the explanation for intelligence, and the only two tools that neural net can have are deterministic and random processes. The same ingredients that all software/hardware has to work with.
I'm a dualist, but I promise not to duel you :) We might just have some elementary disagreements, then. I feel pretty confident in my position, but I do know most philosophers generally aren't dualists (though there's been a resurgence since Chalmers).
> the brain is just (some form) of a neural net that produces output
We have no idea how our brain functions, so I think claiming it's "like X" or "like Y" is reaching.
Our brains just make words in the same way we catch a tune in our heads.
Then we are culturally conditioned to claim ownership over them and justify them post-hoc (i.e., the ego).
As to what the experience maps to, I think the simplest answer is that our phenomenal experiences are encoded as structures in our brain, but that's not necessary to understanding the difference between words that describe experiences and experiences themselves.
It's a difficult thing to produce a body of text that conveys a particular meaning, even for simple concepts, especially if you're seeking brevity. The editing process is not in the training set, so we're hoping to replicate it simply by looking at the final output.
How effectively do you suppose model training differentiates between low quality verbiage and high quality prose? I think that itself is a fascinatingly hard problem, and a machine trained to solve it would deliver plenty of value simply as a classifier.
If it contains the entire corpus of recorded human knowledge…
And most of everything is shit…