“CPU” isn’t necessarily the right benchmark, though. Most smartphones going back years have had ML inference accelerators built in, and both Intel and AMD are starting to add instructions that accelerate inference. Apple’s M1 and M2 include the same Neural Engine inference hardware as their phones and tablets. The question is whether this model is a good fit for those accelerators and how well it performs there, or on the integrated GPUs these devices all have.
Brute-forcing the model with just traditional CPU instructions works, but… it’s obviously going to be pretty slow.
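As a rough illustration of why this matters (a toy sketch, not Whisper itself): inference time is dominated by matrix multiplies, and even on a plain CPU, running them through vectorized BLAS-style kernels instead of naive scalar loops is orders of magnitude faster. Dedicated accelerators push that gap further still.

```python
import time
import numpy as np

def naive_matmul(a, b):
    """Scalar triple loop -- roughly how a matmul looks with no
    vectorized kernels or accelerator support at all."""
    n, k = a.shape
    _, m = b.shape
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for t in range(k):
                s += a[i][t] * b[t][j]
            out[i][j] = s
    return out

n = 64
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
naive = naive_matmul(a, b)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
fast = a @ b  # BLAS-backed: SIMD instructions, cache blocking
t_fast = time.perf_counter() - t0

print(f"naive loops: {t_naive:.4f}s, vectorized: {t_fast:.6f}s")
```

Same math, same hardware; only the code path differs, and the vectorized one wins by a wide margin even at this tiny size.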
I have no experience with the accuracy of Talon, but I’ve heard that most open source models are basically overfit to the test datasets… so their posted accuracy numbers are often misleading. If Whisper is substantially better in the real world, that’s what matters, but I have no idea whether that’s the case.