LOL! You are asking the right person, because I used to work on both Chinese and English speech recognition systems, including the first large-vocabulary continuous speech recognition system to deal well with Chinese tones. I can say they are essentially the same phenomenon under the hood, although linguists apparently haven't grappled with this reality yet.
However, I don't have any more evidence than you do, just my assertions against yours. So I'll wrap up with a fitting quote from Frederick Jelinek: "Every time I fire a linguist, the performance of the speech recognizer goes up."