Who cares about token speed? What is the quality of the results like? I don't know why people are so fixated on token speed, since no one cares how quickly a model can spew garbage. Most reasonable people would rather wait a bit longer for accurate results.
Token speed also matters for thinking models and for agentic workflows, especially in software engineering, where many tokens have to be generated across iterative loops before the user sees any result.
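
To put rough numbers on that, here's a back-of-the-envelope sketch. The step count, tokens per step, and speeds are made-up assumptions for illustration, not measurements of any real model, and `wait_seconds` is just a hypothetical helper name.

```python
def wait_seconds(tokens_per_step: int, steps: int, tokens_per_second: float) -> float:
    """Time spent generating tokens before the user sees a final result.

    Deliberately ignores prompt processing, tool execution, and network latency.
    """
    return tokens_per_step * steps / tokens_per_second


# Hypothetical agentic run: 20 loop iterations, 1,500 generated tokens each.
for speed in (30, 100, 300):
    minutes = wait_seconds(1500, 20, speed) / 60
    print(f"{speed:>3} tok/s -> {minutes:.1f} min")
```

Under these assumptions the same run takes roughly 17 minutes at 30 tok/s and under 2 minutes at 300 tok/s, so speed stops being a vanity metric once the output is long enough.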