Not necessarily. CUDA produces a lot of function calls, and in an API remoting setup each forwarded call adds overhead, which can degrade performance. In our case we draw the abstraction line a bit higher in the stack, so it's more like "functional" API remoting: what gets remoted is a whole task, e.g. image classification, rather than the individual CUDA calls underneath it.
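
To make the difference concrete, here's a toy sketch in Python. It is not our actual implementation: `CountingTransport`, the call names, and the 50-layer model are all made up for illustration. It only counts how many times a request crosses the remoting boundary when you forward low-level GPU-style calls versus a single task-level call.

```python
class CountingTransport:
    """Hypothetical stand-in for the remoting boundary; it only counts round trips."""

    def __init__(self):
        self.round_trips = 0

    def call(self, name, *args):
        # A real transport would serialize the arguments, send them to the
        # remote side, and wait for a reply. Here we just count the crossing.
        self.round_trips += 1
        return name


def classify_low_level(transport, image, num_layers=50):
    """Low-level remoting: every GPU-style call crosses the boundary."""
    transport.call("malloc", len(image))
    transport.call("memcpy_to_device", image)
    for layer in range(num_layers):
        # One kernel launch per layer (an assumption made for this sketch).
        transport.call("launch_kernel", layer)
    transport.call("memcpy_to_host")
    transport.call("free")


def classify_functional(transport, image):
    """Functional remoting: one request covers the whole task."""
    transport.call("classify_image", image)


image = bytes(224 * 224 * 3)  # dummy RGB image buffer

low = CountingTransport()
classify_low_level(low, image)

high = CountingTransport()
classify_functional(high, image)

print(low.round_trips, high.round_trips)  # prints: 54 1
```

The absolute numbers mean nothing here; the point is that the per-call overhead gets paid once per task instead of once per low-level call.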