It’s likely that anything I write has already been discussed and researched, but since you’re knowledgeable on this, I’d love to get your take and perhaps a lead on others’ work!
I think Bengio’s approach is generally right with the global workspace theory of consciousness, but I think Michael Graziano’s work on Attention Schema Theory (AST) is both more concrete and more aligned with the gains we’ve seen from ML’s success with self-attention models. It’s not surprising to me that as researchers optimize for instinct-as-intelligence, they will begin implementing pieces of conscious reasoning unintentionally. Model-based reinforcement learning, especially Ha’s recent work involving attention (Neuroevolution of Self-Interpretable Agents), along with multi-agent RL, seems to be inching closer to AST. Perhaps intentionally?
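To make the self-attention connection concrete, here is a minimal NumPy sketch of scaled dot-product self-attention (the operation behind the models mentioned above). The AST analogy is my own gloss, not Graziano’s: the softmax weight matrix is an explicit, inspectable map of where processing is directed, loosely like an attention schema.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X.

    The returned weight matrix is a crude analogue of AST's
    'attention schema': an explicit model of where attention goes.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)               # (5, 8) (5, 5)
```

Each row of `attn` sums to 1, so it can be read directly as a distribution of attention over the sequence.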
It seems to me that in order to train a model for conscious reasoning — for qualia — you need some way to test for it. I’d say “measure”, but my premise here is that consciousness is a binary property (unless you subscribe to Integrated Information Theory).
For that reason, I think it is easier to find a behavioral proxy for consciousness — the kind of activity that only conscious beings display. Objectively, only conscious entities have access to the dataset of qualia. To an isolated individual, this data would be all noise and no signal. But to a member of a group of conscious entities, qualia form a shared meta-dataset.
This means that conscious entities have more data about other conscious entities than non-conscious entities do — because even though we can’t quantify qualia, we know qualia exist, and we know that qualia shape our social behavior.
For example, the philosophical zombie (if one can imagine instincts so highly refined as to resemble human intelligence, like GPT-1-million) would lack all empathy. While the p-zombie might be able to reproduce behavior according to its training dataset, it would never be able to generalize for (i.e., identify, capture, and process) real qualia, because it has no access to that kind of data. It would resemble a sociopath attempting to mimic and respond to human emotions without having the slightest understanding of them. Qualia can only be understood from the inside.
Moreover, even thoughts and ideas are qualia. A philosophical zombie — a generally intelligent entity without conscious reasoning — is a contradiction in terms, which I think is the point.
So what social behaviors can be rewarded that would lead to qualia? Biologically, only mammals have a neocortex. And only mammals are unambiguously experiencers of qualia (some birds and octopuses are up for debate, and there’s no reason evolution couldn’t have found different ways to achieve the same thing if it improves fitness). The relevant thing about mammals is that we seem to be biologically oriented toward social behavior, specifically “parental care”. While many species have varying levels of parental care, mammals have a biological mandate: gestation and milk production.
If consciousness improves fitness most especially within social contexts where qualia become a shared meta-dataset (e.g., solving the prisoner’s dilemma), then a species whose very survival depends on social success would be driven toward qualia. Hard to say what came first, milk or consciousness, but they are self-reinforcing. If all this is correct — that social fitness drives consciousness (and thus intelligence) — it isn’t surprising that the animal that requires the most parental care and the most social cooperation is Homo sapiens.
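The prisoner’s dilemma point can be illustrated with a toy simulation (the payoff values and strategy names are my own illustrative choices, not from the argument above): in repeated play, an agent that models and reciprocates the other’s behavior sustains cooperation, while blind defectors lock into the worse equilibrium.

```python
# Iterated prisoner's dilemma. Payoffs are (mine, theirs):
# both cooperate (3,3); both defect (1,1); I defect on a cooperator (5,0).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(history):
    # Cooperate first, then mirror the opponent's previous move —
    # a minimal 'model of the other agent'.
    return 'C' if not history else history[-1][1]

def always_defect(history):
    return 'D'

def play(strat_a, strat_b, rounds=100):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_a), strat_b(hist_b)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append((a, b))       # (my move, their move)
        hist_b.append((b, a))
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))      # (300, 300): mutual cooperation
print(play(always_defect, always_defect))  # (100, 100): mutual defection
```

Reciprocators earn 300 each over 100 rounds; pure defectors earn only 100 each. This is only a behavioral toy, of course — it shows why social modeling pays, not that it requires qualia.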
So, that’s where my thoughts stand: even if we can’t measure consciousness, we can create behavioral scenarios where consciousness is the only path to success. In this sense, designing an environment may be more important than designing an architecture.
When agents start burying their dead, engaging in play, and committing suicide (horrifying, but a dead ringer for qualia), we’ll know it is time to scale for intelligence instead of consciousness.