I think understanding requires internal processing.
According to this functional definition, the way we currently use language models largely rules out understanding. We are asking them to dream up or brainstorm things, to tell us the first things they associate with the prompt.
Maybe it's possible to set up the system with some kind of self-feedback loop, where it continues evaluating and improving its answers without further prompts. If that works, it would be one step closer to a true AGI that can be said to understand things.
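One way to picture such a loop, purely as a sketch: have the model draft an answer, then repeatedly critique and revise its own output without any new user prompt, stopping after a fixed number of rounds or once the critique raises no further objections. The `generate` function below is a hypothetical stand-in for whatever text-in/text-out model call is available, and the prompt wording and stopping rule are illustrative assumptions, not a tested recipe.

```python
from typing import Callable

def self_refine(question: str,
                generate: Callable[[str], str],
                max_rounds: int = 3) -> str:
    """Hypothetical self-feedback loop: draft, critique, revise.

    `generate` is a placeholder for any language-model call.
    """
    # Initial draft, produced the usual "first association" way.
    answer = generate(f"Answer the following question:\n{question}")

    for _ in range(max_rounds):
        # The model evaluates its own previous answer.
        critique = generate(
            "Critique the following answer for errors or gaps. "
            "Reply with 'OK' if nothing needs to change.\n\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        if critique.strip().upper().startswith("OK"):
            break  # the loop no longer objects to its own answer
        # The model revises its answer using its own critique.
        answer = generate(
            f"Question: {question}\nDraft answer: {answer}\n"
            f"Critique: {critique}\nWrite an improved answer."
        )
    return answer
```

Whether iterating like this amounts to the kind of internal processing described above is exactly the open question; the sketch only shows that the loop itself is easy to wire up.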
There is a lot of confusion around the Chinese Room Argument. I think it makes a valid point by demonstrating that input/output behavior alone is insufficient for evaluating whether a system is intelligent and understands things. To make that evaluation, we need to see (or make assumptions about) the internal mechanism.