Would assume Google is able to do that because of the lower power required.
I am actually more curious to get a paper on the new speech NN Google is using. Supposedly it pushes 16k samples a second through a NN; it's hard to imagine how they did that and were able to roll it out, as you would think the cost would be prohibitive.
You are ultimately competing with a much less compute heavy solution.
https://cloudplatform.googleblog.com/2018/03/introducing-Clo...
Suspect this was only possible because of the TPUs.
Can't think of anything else where controlling the entire stack including the silicon would be more important than AI applications.
You can’t buy a TPU; it’s a cloud-only thing. They also show it’s not a huge difference in either perf or time to converge (albeit only for one architecture).
I would say kudos to V100 and this benchmark that breaks the TPU hype.
Or maybe reading it wrong?
That is close to half as much to use Google, is it not?
BTW, the TPUs are also about twice as fast.
Sounds like Google is pretty far ahead of Nvidia. Which really just makes sense, as Google does the entire stack and is just going to have the data to optimize the silicon.
About half the cost is hype?
I want it in the cloud and don't want to deal with updating, etc. Would think most are the same for anything of any scale. Could not imagine building up rigs any longer and dealing with all the issues. Plus it's much harder to scale.
TPUv2 has 1.27x-1.86x the images/s/$.
And the other chart titled: Cost to reach 75.7% top-1 accuracy.
Where TPUv2 costs 62.5% of the reserved GPU instance cost and 42.6% of the unreserved GPU cost.
Key takeaway from the article:
> While the V100s perform similarly fast, the higher price and slower convergence of the implementation results in a considerably higher cost-to-solution.
No matter how fast a system is on the inside, you have to get data in and out of it -- at the very least to memory. SRAM takes too much area, and there is a limit to DRAM bandwidth despite technologies such as eDRAM and HBM. Some tasks are compute intensive, but for general tasks, a processor that is 100x faster would need 100x faster memory to really be 100x faster.
Thus advances in real-life performance are likely to be more like a factor of 2.
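A back-of-the-envelope sketch of that bandwidth-bound argument (an Amdahl-style model; all numbers below are illustrative assumptions, not measured figures):

```python
# Rough roofline-style estimate: effective speedup is capped by
# whichever of compute and memory bandwidth runs out first.

def effective_speedup(compute_speedup, bandwidth_speedup,
                      compute_bound_fraction):
    """The compute-bound fraction of runtime scales with the compute
    speedup; the remainder scales with memory bandwidth."""
    t_compute = compute_bound_fraction / compute_speedup
    t_memory = (1 - compute_bound_fraction) / bandwidth_speedup
    return 1 / (t_compute + t_memory)

# A chip 100x faster at math, but with only 2x the memory bandwidth,
# on a workload that is half memory-bound:
print(round(effective_speedup(100, 2, 0.5), 1))  # ~3.9x, not 100x
```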
For training I never pay full price in the AWS cloud; rather, I run interruptible instances and pay a fraction of the list price. People I know who train in the Google cloud seem to get interrupted all the time even though they are paying full price.
Inference is another story. Once you have the trained model, you will usually need to run inference many many more times than you run training and this gets more so the bigger scale you are running at. That hits your unit costs and it is where you need to pinch every penny.
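A toy cost model makes the point about unit costs (all numbers here are hypothetical, not from the article):

```python
# Toy cost model: a one-off training cost plus a per-inference unit cost.
training_cost = 10_000          # one-off, in $ (hypothetical)
cost_per_1k_inferences = 0.05   # unit cost, in $ (hypothetical)

def total_cost(inferences):
    return training_cost + inferences / 1000 * cost_per_1k_inferences

# At ten billion inferences the per-inference term dominates, so that
# unit cost is where every penny counts:
print(total_cost(10_000_000_000))  # roughly $510,000, ~98% of it inference
```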
Depends on how much you plan to use the hardware. If it's running near continuously, total cost of ownership is very important. Power costs can quickly dominate TCO.
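A rough illustration of how power eats into TCO for continuously running hardware (wattage, prices, and lifetime are assumptions, not measured figures):

```python
# Hypothetical numbers: a ~$3,000 GPU drawing ~300 W under load,
# run 24/7 for 3 years at $0.12/kWh.
gpu_price_usd = 3000
power_watts = 300
hours = 3 * 365 * 24
price_per_kwh = 0.12

energy_cost = power_watts / 1000 * hours * price_per_kwh
print(round(energy_cost))  # electricity alone, before cooling overhead
print(round(energy_cost / gpu_price_usd * 100))  # as % of hardware price

# Datacenter cooling/overhead (PUE) can add 50% or more on top of this.
```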
> While the V100s perform similarly fast, the higher price and slower convergence of the implementation results in a considerably higher cost-to-solution.
You can definitely do this on a GPU. We use the older auto-regressive WaveNets (not Parallel Wavenet) for inference on GPUs, with the newly released nv-wavenet code. Here's a link to a blog post about it:
https://devblogs.nvidia.com/nv-wavenet-gpu-speech-synthesis
That code will generate audio samples at 48kHz, or if you're worried about throughput, it'll do a batch of 320 parallel utterances at 16kHz.
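For a sense of scale, the aggregate sample throughput in batch mode, computed from the figures above:

```python
# Single-stream mode: one utterance at 48 kHz.
single_stream = 48_000  # samples/s

# Batch mode: 320 parallel utterances at 16 kHz each.
batch_throughput = 320 * 16_000
print(batch_throughput)                  # aggregate samples/s
print(batch_throughput / single_stream)  # ~107x the single-stream rate
```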
I would expect a dedicated accelerator to need at least a 5-10X advantage to outweigh all the other infrastructure and ecosystem costs.
GPUs are more useful for a wide variety of data-parallel tasks, and many more NN frameworks work on top of CUDA than work on the TPU.
In terms of horizontal scalability, Nvidia has been rapidly iterating on increasing both memory and interlink bandwidth (including NVSwitch [1]), while each 'TPU' is actually 4 interconnected chips, so it likely has less upward scalability.
Also note that the tensor cores on a V100 take roughly 25-30% of the actual area. If Nvidia wanted to, they could probably easily make a pure tensor chip that beat the TPU in performance, could be produced in volume on their existing process, and also had full compatibility with their entire stack.
All in all, a 2x price/performance advantage for a hyper-specialized accelerator is basically a loss, just like how nobody installs a Soundblaster card anymore, or how consumer desktops don't run discrete GPUs even though integrated graphics are a few times slower.
[1] https://www.nextplatform.com/2018/04/04/inside-nvidias-nvswi...
Happy to answer questions!
EDIT: you don't get sustained use discounts, either, at the moment. You can get either for GCP GPUs, though. Perhaps that will change once TPUs are out of beta?
Any idea of how much variation in accuracy you get on different training runs of the same model on the same hardware? My understanding is that model quality can and does vary from one run to the next on these kinds of large datasets - from a single observation, it's hard to know if the difference is real or noise.
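One way to put numbers on that question: repeat the run a handful of times and look at the spread (the accuracy values below are invented placeholders, not real measurements):

```python
import statistics

# Hypothetical top-1 accuracies from five training runs of the
# same model on the same hardware.
accuracies = [75.7, 75.9, 75.5, 76.0, 75.6]

mean = statistics.mean(accuracies)
stdev = statistics.stdev(accuracies)
print(round(mean, 2), round(stdev, 2))

# A single-run difference smaller than roughly two standard deviations
# is hard to distinguish from run-to-run noise.
```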
But what's going on when some of the implementations of a standard algorithm don't converge, and different hardware has different accuracy rates on the same algorithm? Are DNNs really that flaky? And does it really make sense to be doing performance comparisons when the accuracy performance doesn't match?
Is the root problem that ResNet-50 works best with a smaller batch size?
And how do you do meaningful research into new DNNs if there's always a "Maybe if I ran it again over there I'd get better results" factor?
Thank you.
That is not all that close, is it?
Note also, that the ~2% performance difference is only on one model (ResNet-50) and cannot be generalized to all workloads/all of deep learning (at least not without further proof).
The TPU implementation applies very compute-intensive image pre-processing steps and actually sacrifices raw throughput.
Thanks
In terms of how much compute power the TPU pre-processing needs I only have very rough numbers: I ran the same pre-processing while training ResNet-50 on a node with 4 GPUs and it was consistently utilizing >22 CPU cores (including all of the other CPU-tasks while training).
It's great that there is now a wider choice of (pre-trained?) models formulated for mixed-precision training.
When I was comparing Titan V (~V100) and 1080ti 5 months ago, I was only able to get 90% increase in forward-pass speed for Titan V (same batch-size), even with mixed-precision. And that was for an attention-heavy model, where I expected Titan V to show its best. Admittedly, I was able to use almost double the batch-size on Titan V, when doing mixed-precision. And Titan V draws half the power of 1080ti too :)
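Putting those two observations together, the perf-per-watt story is better than the raw speedup suggests (a quick sketch using only the numbers above):

```python
speedup = 1.9         # ~90% faster forward pass on Titan V vs 1080ti
relative_power = 0.5  # Titan V draws roughly half the power

perf_per_watt = speedup / relative_power
print(perf_per_watt)  # 3.8x the performance per watt
```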
In the end my conclusion was: I am not a researcher, I am a practitioner. I want to do transfer learning or just use existing pre-trained models, without tweaking them. For that, tensor cores give no benefit.
Yes, thanks for mentioning that! That's what the article is alluding to at the end. There's also something like a "cost-to-model" and that's influenced by how easy it is to make efficient use of the performance and how much tweaking it needs. It's also influenced by the framework you use... However, that's difficult to compare and almost impossible to measure.
After 59 days of playing with it, I sent it back (initiated return on 30th day, after I already figured out it doesn't live up to the hype, then had another 30 days to actually send it back).
With $3,000 I can buy 4 1080ti's, while only two are necessary to beat Titan V (in Titan V's best game). I only bought one though. NowInStock.net helped with buying 1080ti directly from Nvidia.
AMD will enter the game soon once they get their software working, Intel will follow.
I suspect that Nvidia will respond with its own specialized machine learning and inference chips to match the cost/performance ratio. As long as Nvidia can maintain high manufacturing volumes and small performance edge, they can still make good profits.
But the TPUs are half the cost per this article?
Plus Google does the entire stack and can better optimize the hardware versus Nvidia. So it seems Google can improve faster, I would think.
If there ever was a huge advantage to doing the entire stack, it is with neural networks.
A perfect example is Google's new speech doing 16k samples a second with a NN.
https://cloudplatform.googleblog.com/2018/03/introducing-Clo...
Do not think Google could offer this service at a competitive cost without the TPUs.
This new method is replacing one that was far less compute intensive, so offering it at a competitive price requires lowering compute cost, which I suspect is only possible with the TPUs.
Exactly. Nvidia can already match the performance without a 100% specialized processor. It's just the price they need to cut, by optimizing their architecture for tensor processing and reducing their profits when competition emerges.
Google is not in the business of becoming a major chip maker or competing with Nvidia head on. Putting hundreds of millions into a new microarchitecture every second year eats lots of resources. They just want a competitive market and the prices to go down.
Can't you just buy some 1080s for cheaper than this? I understand there are electricity and hosting costs, but cloud computing seems expensive compared to buying equipment.
So the restriction applies to the 1080ti but _not_ the titan V. I completely agree the restriction is total bullshit but it's important to get the facts straight.
Sidenote: Love the illustrations that accompany most of your blog posts, are they drawn by an in-house artist/designer?
The illustrations are from an artist/designer we contract from time to time. I agree, his work is awesome!
Kudos to them; they are awesome!
That said, they fixed this on NVSwitch so it's just another HW hiccup like int8 was on Pascal.
You have price per hour and performance in images per second, so you need to scale one to the other before taking the ratio. Also, the resulting metric is not "images per second per $" but just "images per $".
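A minimal sketch of that unit conversion (the throughputs and hourly prices are placeholder values, not the article's):

```python
# Convert an hourly price and an images/s throughput into images/$.
def images_per_dollar(images_per_sec, price_per_hour):
    images_per_hour = images_per_sec * 3600
    return images_per_hour / price_per_hour

# Placeholder numbers for two hypothetical instances:
print(images_per_dollar(2000, 6.50))   # instance A
print(images_per_dollar(1800, 24.48))  # instance B
```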
EDIT: I found [1] which describes "tensor cores", "vector/matrix units" and HBM interfaces. The design sounds similar in concept to GPUs. Maybe they don't have or need interpolation hw or other GPU features?
Suspect we will need a gen 3 to get a paper on the gen 2.
Here is the gen 1 paper, highly recommended. Pretty interesting: it uses 65,536 very simple multiply-accumulate units.
https://supercomputersfordl2017.github.io/Presentations/Imag... http://learningsys.org/nips17/assets/slides/dean-nips17.pdf
For the last version of the TPU, Google provided more detail, e.g., in this paper:
https://arxiv.org/pdf/1704.04760.pdf
Hopefully, Google will publish something similar for TPUv2, but I have no knowledge whether or when that might happen.
Definitely, no need to do any kind of rasterization here.
I wonder whether NVLink would make any difference for Resnet-50. Does anyone know whether these implementations require any inter-GPU communication?
Because Intel was involved in its development and made a number of tweaks to improve performance.
I'd be curious whether it actually was significant or not.
A bit odd that the TPUs are provisioned on such a weak machine compared to the V100s, especially when there were comparisons that included augmentation and other processing outside of the TPU.