GPT-2 was trained on TPUs. (There are explicit references to TPUs in the source code: https://github.com/openai/gpt-2/blob/0574c5708b094bfa0b0f6df...)
GPT-3 was trained on a GPU cluster, probably because of Microsoft's billion-dollar investment in OpenAI (paid largely in Azure cloud credits), not because GPUs were the best choice.
To be fair, TPUv4 is not out yet, and it might catch up with Nvidia's latest GPUs by moving to a newer process (TSMC 7nm or Samsung 8nm).