Obviously, I'm assuming that GPT-4 wasn't trained on the exams that it was tested against.
Dennett thinks consciousness, in the sense of the hard problem/subjectivity, is some kind of trick of the brain. Specifically, he proposes it's a linguistic trick: language fools us into thinking there is something more than a functional stream of information.
GPT-4 knew to use linear programming and acknowledged the constraints, even though I hadn't formatted the tabular data so the labels sat next to their values and were properly separated. It also ran all of the 2-3 digit integer multiplications, divisions, subtractions, and additions correctly. It still failed to "put it all together" in the final step and forgot some constraints. When I prompted it with "won't I run out of time?", it acknowledged the problem and redid the work, this time forgetting a different constraint. I was never able to get it to the right conclusion.
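One way to catch this "forgot a constraint" failure mode is to verify a proposed answer mechanically against every constraint. A minimal sketch, using entirely hypothetical task data and a made-up time budget (my actual problem isn't reproduced here):

```python
from itertools import combinations

# Hypothetical tasks: (name, hours, value) -- illustrative only,
# not the actual data from my prompt.
tasks = [("A", 3, 10), ("B", 5, 14), ("C", 2, 7)]
TIME_BUDGET = 6  # stand-in for the constraint that kept getting dropped

def feasible(selection):
    """Check every constraint, not just the ones a pattern remembered."""
    hours = sum(h for (name, h, _v) in tasks if name in selection)
    return hours <= TIME_BUDGET

def value(selection):
    return sum(v for (name, _h, v) in tasks if name in selection)

# Brute force over all subsets: fine at toy sizes, and it never
# silently omits a constraint the way a pattern-matched answer can.
names = [t[0] for t in tasks]
best = max(
    (set(c) for r in range(len(tasks) + 1)
     for c in combinations(names, r)
     if feasible(set(c))),
    key=value,
)
print(sorted(best), value(best))
```

The point isn't the solver; it's that the feasibility check is explicit, so a dropped constraint shows up immediately instead of being confidently glossed over.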
It feels like it has learned a pattern for solving these types of questions but hasn't gained any actual reasoning about whether it's applying the pattern in a way that makes sense. It confidently announces that it followed all of the constraints even when the pattern it chose didn't involve one of them. When corrected, it acknowledges it was wrong, but it doesn't so much apply reason as switch to a different pattern that fixes that specific issue.
Another example: I asked it to configure some network interfaces on a Cisco switch in a certain way. I gave it 3 VLANs to configure on the interface, knowing one was invalid (it was in the 5000s, and VLAN IDs are only 12 bits). It generated commands tagging VLAN 5031. I asked what problems I'd run into running the generated commands, and it gave some hypothetical risks, one of which was that VLANs must be in a certain range, but it didn't reason that the commands themselves included an invalid VLAN. I told it "isn't VLAN 5031 invalid?" and it apologized and corrected it. I then told it "isn't VLAN 1000 invalid?" and it apologized for that not being a valid VLAN and corrected it all the same, even though it was valid.
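The check it failed to apply is mechanical: the 802.1Q VLAN ID field is 12 bits, so it spans 0-4095, and 0 and 4095 are reserved, leaving 1-4094 as configurable IDs. A sketch of that check against the two VLANs from my prompts:

```python
def vlan_valid(vlan_id: int) -> bool:
    # 802.1Q VLAN IDs are 12 bits (0-4095); 0 and 4095 are reserved,
    # so configurable VLAN IDs are 1-4094.
    return 1 <= vlan_id <= 4094

print(vlan_valid(5031))  # the VLAN it happily tagged anyway
print(vlan_valid(1000))  # the valid VLAN it apologized for
```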
After all that limit-testing: it may not have emergent deductive ability, but I think this learned pattern-matching approach, built from training situations, extends far past where most people would think it would. GPT-5 or GPT-6 may well avoid the problems above without necessarily gaining emergent logical reasoning, simply by having greater depth in the patterns.
Large-number operations are still interesting, though, and I'm not sure how they fit in. 646864613385/41348.5 returns "approximately" 15652.172205, which has the right first 3 digits but is off by a factor of 1000, and the rest of the digits are made up. I'm not sure whether this is similarly explained by applying a pattern without reasoning about it, but it feels like it could be.
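The actual quotient is easy to check, and a two-liner shows the answer had roughly the right leading digits but the wrong magnitude:

```python
q = 646864613385 / 41348.5
print(q)               # true quotient, about 15,644,209.9
gpt_answer = 15652.172205
print(q / gpt_answer)  # close to 1000: roughly three orders of magnitude off
```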
All that said, I really don't know much about how the system is constructed; I just use it :).