
0 points · synergy20 · 3y ago · 0 comments

I mean SIMT (CUDA style) or SPMD, and also the TensorFlow host/device runtime scheduling. It's a mystery to me how Nvidia and others schedule AI workloads (along with intrinsics) to achieve such massive parallelism.


pca006132 · 3y ago

For CUDA, iirc these tasks (kernel launches) are put into streams (similar to threads) and scheduled for execution by the driver. Within each kernel launch, every group of 32 threads, called a warp, executes together as a single unit; when threads in a warp take different control-flow paths, the threads that are not on the current path skip those instructions. I think the driver perhaps schedules at the warp level, and since the threads in a warp are executing similar things, they can be scheduled together. I am not an expert in this, so I am not sure if this is how they do it.
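To make the warp description above a bit more concrete, here is a toy model in plain Python. It is only a sketch under loud assumptions: it is not CUDA, it says nothing about how the driver or hardware is really implemented, and the names `split_into_warps`, `run_warp`, and `launch` are made up for illustration. It just shows threads grouped in 32s running a branch in lockstep, with one extra pass whenever a warp's threads disagree on the branch.

```python
WARP_SIZE = 32  # warp width on Nvidia GPUs, as mentioned in the comment above


def split_into_warps(num_threads, warp_size=WARP_SIZE):
    """Group consecutive thread ids into warps of `warp_size` threads."""
    return [
        list(range(start, min(start + warp_size, num_threads)))
        for start in range(0, num_threads, warp_size)
    ]


def run_warp(thread_ids, data):
    """Run `if x is even: x //= 2 else: x = 3 * x + 1` for one warp.

    Toy lockstep model: returns how many passes the warp needed.
    """
    # Pass 1: every lane evaluates the branch condition together.
    takes_if = {t: data[t] % 2 == 0 for t in thread_ids}
    passes = 1

    # The two sides of the branch run one after the other. Lanes that are
    # not on the current side are masked off and sit idle for that pass.
    if_lanes = [t for t in thread_ids if takes_if[t]]
    else_lanes = [t for t in thread_ids if not takes_if[t]]

    if if_lanes:
        for t in if_lanes:
            data[t] //= 2
        passes += 1
    if else_lanes:
        for t in else_lanes:
            data[t] = 3 * data[t] + 1
        passes += 1
    return passes


def launch(data):
    """Pretend kernel launch: one thread per element, run warp by warp."""
    return [run_warp(warp, data) for warp in split_into_warps(len(data))]


# All threads agree on the branch: no divergence in either warp.
print(launch([2] * 64))           # [2, 2]

# Even and odd values interleaved: both warps diverge and pay an extra pass.
print(launch(list(range(64))))    # [3, 3]

# Same kind of values, but grouped so each warp is uniform: no divergence.
grouped = [2 * i for i in range(32)] + [2 * i + 1 for i in range(32)]
print(launch(grouped))            # [2, 2]
```

The last two launches do the same kind of work, but the grouped layout needs fewer passes per warp, which is the usual argument for keeping the threads of a warp on the same control-flow path.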

synergy20 (OP) · 3y ago

That's pretty much how it works, and they sync threads inside a warp. I have been groping in the dark for a while, and always hoped someone would write up some details to help me get the idea straight.

Intel, AMD, and Google (TPU) all have their own ways of scheduling 'kernels', which are very different from CUDA, and there are no details about them. I was just curious: how do they work across CPU|GPU?

Thanks for the reply.

maxgio92 · 3y ago

Thank you all for this. Scheduling on GPUs is still a dark area for me, something I have yet to explore.
