In a typical fully connected hidden layer, each neuron needs the outputs of all the neurons in the previous layer, so all the data has to be in one place. Obviously you can distribute the actual calculations, which is what a GPU does, but distributing that over networked CPUs will be incredibly slow and will require the whole thing to be loaded into memory on every instance.
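To make that concrete, here is a minimal sketch (sizes are made up for illustration) of a fully connected layer as a dense matrix-vector product: every output neuron reads the entire input activation vector, which is why splitting it across networked machines means shipping the whole vector everywhere.

```python
import numpy as np

# Hypothetical sizes for illustration: a layer mapping 4096 inputs
# to 4096 outputs.
rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096)).astype(np.float32)  # weight matrix
x = rng.standard_normal(4096).astype(np.float32)          # previous layer's outputs

# Each row of W touches EVERY element of x, so the full activation
# vector must be present wherever this computation runs.
y = np.maximum(W @ x, 0.0)  # matvec followed by ReLU

print(y.shape)  # (4096,)
```

A GPU parallelizes exactly this matvec across thousands of cores that share fast memory; over a network, each of those row-dot-products would need the full `x` shipped to it first.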
My bet is on some kind of light-based or analog electric accelerator PCIe card being the next best thing for this sort of inference, since it should be able to compute multiple layers at once. FPGAs also work, but only for fixed weights.