I'm not an expert, but my guess is that the compiler is unlikely to optimize a wasm binary better than a native x86 binary. Furthermore, each VM instruction will, on average, need more than one CPU instruction to execute. Intuitively, that suggests slower execution, and it matches what we tend to see in practice with VMs.
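To make the "more than one CPU instruction per VM instruction" point concrete, here is a minimal sketch of a naive stack-based bytecode interpreter. The opcodes and the `run` function are hypothetical, made up for illustration. A single VM-level `ADD` still costs a fetch, a dispatch branch, two pops and a push on the host side, where native code would often use one register add. Production wasm runtimes usually compile to native code instead of interpreting like this, so this overstates their overhead; it only shows the simplest case.

```python
from typing import List, Tuple

# Hypothetical opcodes for a toy stack machine.
PUSH, ADD, MUL, HALT = range(4)

def run(code: List[Tuple[int, int]]) -> int:
    stack: List[int] = []
    pc = 0
    while True:
        op, arg = code[pc]   # fetch: load the next instruction
        pc += 1              # advance the program counter
        # decode/dispatch: branch on the opcode
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            b = stack.pop()  # operands come from the in-memory stack, not registers
            a = stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")

# Computes (2 + 3) * 4
program = [(PUSH, 2), (PUSH, 3), (ADD, 0), (PUSH, 4), (MUL, 0), (HALT, 0)]
print(run(program))  # 20
```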
Python isn't a particularly fast language in the first place, due to poor memory utilization (nearly every value is a separate heap-allocated object), hash-table lookups everywhere, and high function-call overhead.
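A small illustration of where those hash tables and object headers show up in CPython, assuming a typical 64-bit CPython 3 build. Instance attributes and module-level names are backed by dicts, and even a small int is a full object. Note that CPython 3.11+ adds specializing inline caches (PEP 659) that skip some of these lookups on hot paths, so the real cost varies by version; `Point` is just an example class.

```python
import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)

# Instance attributes are exposed through a per-object dict; `p.x` is
# resolved by looking up the key "x" (after checking the class for descriptors).
print(p.__dict__)           # {'x': 1, 'y': 2}

# Module-level (global) names are stored in a dict as well.
print(globals()["p"] is p)  # True

# A small int is a heap object with a header, not a bare machine word
# (typically 28 bytes on 64-bit CPython, versus 8 for a raw int64).
print(sys.getsizeof(1))
```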