Also, while I certainly agree that the performance/dollar comparison is highly relevant to customers at a given instant, that may only tell us that Google is subsidizing this hardware now that they've deployed it, and/or that, lacking serious competition, NVIDIA has been building crazy margins into their P100/V100 prices. In understanding fundamental technological tradeoffs, and even the limits of what the pricing in a more competitive market could be, it is relevant to compare performance per unit of hardware resources (mm^2, die/package, watt, GB of HBM, etc.)
In short, these comparisons are hard, and there is no one which tells a complete story. I pushed back because, while the post includes some nuance, it brushes a great deal under the rug and focuses primarily on problematic comparison
(Further disclosure: I'm at least the third person in this sub-thread with some Google Brain/Cloud affiliation. I am speaking in my independent academic voice. I also think TPUs are great, having them publicly available now is great, and competition and diversity of architectural approach in accelerators is great. I appreciate the effort of the authors, but think the subtlety of these comparisons requires serious care.)