For one TensorFlow is not a generic framework like CUDA is, so you lose a whole bunch of the configurability you have with CUDA
Why make generalizations like this? It's not true, and we've devolved back into the "nu uh" we originally started with.
This is trivial to do on a GPU, and is built into the library
Yes, I'm sure there are hardwired operations that are trivial to do on GPUs. That's not exactly a +1 in favor of generic programmability. There are also operations that are trivial to do on TPUs, such as CrossReplicaSum across a massive cluster of cores, or the various special-case Adam operations. This doesn't seem related to the claim that TPUs are less flexible.
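For concreteness, CrossReplicaSum is just an all-reduce sum: every replica contributes a value and every replica receives the total. A plain-Python sketch of those semantics (the function name is illustrative, not a real TPU API, and no special hardware is involved):

```python
# Simulate the semantics of a cross-replica sum (all-reduce) in plain Python.
# Hypothetical helper for illustration only -- on a TPU pod this reduction
# happens in hardware across the whole cluster of cores.

def cross_replica_sum(per_replica_values):
    """Each replica contributes a vector; every replica receives the
    elementwise sum over all replicas."""
    total = [sum(col) for col in zip(*per_replica_values)]
    # Every replica ends up holding the same reduced result.
    return [list(total) for _ in per_replica_values]

# Example: 4 simulated replicas, each holding a 2-element gradient shard.
replicas = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
reduced = cross_replica_sum(replicas)
# every replica now holds [16.0, 20.0]
```

In JAX the equivalent collective is `jax.lax.psum` inside a `pmap`; the point is only that "has a fast built-in collective" cuts both ways and says nothing about flexibility.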
The raw functions it provides are not direct access to the hardware and memory subsystem.
Not true. The tf.raw_ops module exposes the low-level kernels directly, including in-place ops: https://www.tensorflow.org/api_docs/python/tf/raw_ops/Inplac...
Jax is also going to be giving even lower-level access than TF, which may interest you.
You did not give an example of something GPUs can't do. All you said was that TPUs are faster for a specific function in your case.
Well yeah, I care about achieving goals in my specific case, as you do yours. And simply getting together a VM that can feed 500 examples/sec to a set of GPUs is a massive undertaking in and of itself. TPUs make it more or less "easy" in comparison. (I won't say effortless, since it does take some effort to get yourself into the TPU programming mindset.)
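To make the 500 examples/sec point concrete, here is a quick way to sanity-check whether an input pipeline can sustain a target rate, sketched in plain Python (the helper name and the target number are illustrative, not part of any framework):

```python
import time

def measure_throughput(example_iter, n_examples):
    """Pull n_examples from an iterator and return examples/sec.
    Illustrative helper, not a real TF/TPU API."""
    start = time.perf_counter()
    for _ in range(n_examples):
        next(example_iter)
    elapsed = time.perf_counter() - start
    return n_examples / max(elapsed, 1e-9)  # guard against a zero timer delta

# Example: a trivially fast in-memory "pipeline" stands in for real data
# loading; swap in your actual example iterator to measure the real rate.
rate = measure_throughput(iter(range(10_000)), 5_000)
# Compare `rate` against your target (e.g. 500 examples/sec per accelerator).
```

With real data loading (decoding, augmentation, host-to-device transfer), keeping this number high is exactly the undertaking described above, and it is the part the TPU tooling largely handles for you.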