Better HN
GShard: Scaling giant models with conditional computation and automatic sharding