I found that training multiple models on the same GPU quickly hit other bottlenecks, mainly memory capacity and bandwidth. I tend to train one model per GPU and scale out the number of machines instead. If nothing else, we tend to grow models until they fill GPU memory anyway.
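For what it's worth, the one-model-per-GPU pattern can be sketched roughly like this (names like `train.py` and `--run-id` are placeholders, not a real script; pinning via `CUDA_VISIBLE_DEVICES` is the standard way to make each process see exactly one device):

```python
import os

def launch_commands(num_gpus, script="train.py"):
    """Build one training command per GPU, each pinned via
    CUDA_VISIBLE_DEVICES so the process sees a single device."""
    cmds = []
    for gpu in range(num_gpus):
        cmds.append(
            f"CUDA_VISIBLE_DEVICES={gpu} python {script} --run-id run-{gpu}"
        )
    return cmds

# Print the commands; in practice you'd hand them to subprocess,
# tmux, or a job scheduler, one per GPU on each machine.
for cmd in launch_commands(4):
    print(cmd)
```

Scaling to more machines is then just running the same launcher on each box, with no cross-GPU coordination needed since every model trains independently.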
Memory became less of an issue for me with the V100, and isn't really an issue with the A100, at least when iterating quickly on new models while they're still relatively small.