But that's not what the actual data says.
Here are some figures from an actual benchmark [1] w.r.t. training costs:
1. [Mar 2020] $7.43 (AlibabaCloud, 8x V100, TF v2.1)
2. [Sep 2018] $12.60 (Google, 8 TPU cores, TF v1.11)
3. [Mar 2020] $14.42 (AlibabaCloud, 128x V100, TF v2.1)
--
Training time didn't go down exponentially either [1]:
1. [Mar 2020] 0:02:38 (AlibabaCloud, 128x V100, TF v2.1)
2. [May 2019] 0:02:43 (Huawei Cloud, 128x V100, TF v1.13)
3. [Dec 2018] 0:09:22 (Huawei Cloud, 128x V100, MXNet)
So again, I have to ask: where exactly do these magical improvements occur (regarding training; inference is another matter entirely, I understand that)? I've yet to find a source that supports 4x to 10x cost reductions.
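To put rough numbers on it, here's a quick back-of-the-envelope sketch in Python, using only the figures cited above (the month gaps are taken from the benchmark dates):

    def annualized_factor(old, new, months):
        """Improvement factor per year implied by going from `old` to `new` over `months` months."""
        return (old / new) ** (12.0 / months)

    # Cost: $12.60 (Sep 2018) -> $7.43 (Mar 2020), 18 months apart
    print(round(annualized_factor(12.60, 7.43, 18), 2))  # ~1.42x per year

    # Best 128x V100 time: 163s (May 2019) -> 158s (Mar 2020), 10 months apart
    print(round(annualized_factor(163, 158, 10), 2))     # ~1.04x per year

    # A claimed 4x/year reduction would mean 4**1.5 = 8x cheaper over those 18 months
    print(4 ** 1.5)                                      # 8.0

Even taking the most favorable pair of entries, that's roughly 1.4x per year on cost and essentially flat on training time, nowhere near 4x, let alone 10x.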
[1] https://dawn.cs.stanford.edu/benchmark/index.html