I wonder sometimes whether it's relevant that I'm a heavy Emacs user. In Emacs, short command gestures often include unmodified keys and are conceptually close to the physically more text-based M-x invocations. Maybe that type of experience (or what other types? Maybe CLI?) creates a different mental map of the distinction, or lack thereof, between text entry and shortcut keymaps. Emacs on Windows is especially awkward for me as a result of the QWERTY-on-Control behavior, because e.g. C-x C-t and C-x t now involve different positions for the T. Or maybe people who start out on non-QWERTY layouts on Windows specifically are pushed early on to remember shortcuts by their location because the keysyms are illogical, and then they continue doing that, while people who stay on QWERTY all the time could go either way?
As others have mentioned, this also doesn't happen as much in gaming, where commands are often bound positionally, with WASD motion (GAST motion in my layout…) as a central example. The expected keysym still has some mnemonic influence on which of several candidate keys gets bound to a function as one moves away from the central motion cluster. The vi keys mentioned elsewhere are also very positional in nature, but I rarely use vi bindings, and when I do, the nav-cluster keys are usually an accepted alternative…
Gosh. With how much has wound up in this thread, I kind of wonder whether there's more serious ergonomics research on this difference in mental modeling now.