Are we going to simply say "Nu uh" at each other, or do you want to throw down some specific examples so I can show you how mistaken they are?
Perhaps I'm just not experienced enough with the programming model, but I've found TPUs to be strictly less flexible and trickier than GPUs, especially for things like conditional execution, multiple graphs, variable-size inputs, and custom ops.
The central reason that TPUs feel less flexible is Google's awful mistake in encouraging everyone to use TPUEstimator as the One True API For Doing TPU Programming. Getting off that API was the single biggest boost to my TPU skills.
You can see an example of how to do that here: https://github.com/shawwn/ml-notes/blob/master/train_runner.... This is a repo that can train GPT-2 1.5B at 10 examples/sec on a TPUv3-8 (aka around 10k tokens/sec).
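For anyone checking the "10 examples/sec is around 10k tokens/sec" conversion, here is a rough sketch of the arithmetic. It assumes each training example is one full 1024-token GPT-2 context window; the constant and function names are just illustrative, not taken from the repo.

```python
# Assumption: one training example = one full GPT-2 context window (1024 tokens).
TOKENS_PER_EXAMPLE = 1024
EXAMPLES_PER_SEC = 10   # throughput figure quoted above for a TPUv3-8
NUM_CORES = 8           # a TPUv3-8 exposes 8 TPU cores


def tokens_per_sec(examples_per_sec, tokens_per_example=TOKENS_PER_EXAMPLE):
    """Convert example throughput into token throughput."""
    return examples_per_sec * tokens_per_example


total = tokens_per_sec(EXAMPLES_PER_SEC)
per_core = total / NUM_CORES

print(f"total:    {total} tokens/sec")
print(f"per core: {per_core} tokens/sec")
```

If your examples are shorter than the full context window, the token figure shrinks proportionally.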
Happy to answer any specific questions or peek at codebases you're hoping to run on TPUs.
I'll make it easier for you, directly from Google's website:
TPUs: "Cloud TPUs are optimized for specific workloads. In some situations, you might want to use GPUs or CPUs on Compute Engine instances to run your machine learning workloads."
Please tell me a workload that a TPU can do and a GPU can't.
In my experience, well over 80% of these operations are implemented on TPU CPUs, and at least 60% are implemented on TPU cores.
Again, if you give a specific example, I can simply write a program demonstrating that it works. What kind of custom reduction do you want? What's a peak search?
As for workloads that GPUs can't do, we regularly train GANs at 500+ examples/sec across a total dataset size of >3M photos. Rather hard to do that with GPUs.