I do research in the ML hw field: there are currently a couple hundred designs for running convolutional NN inference, and a couple dozen have actually been built. They use pretty different underlying technologies (CMOS, floating gate, ReRAM/memristors, etc), different ideas (systolic arrays, analog crossbars, cache organization, lookup tables, data reuse, TDM, using spikes, etc), and have wildly different power (from microwatts to hundreds of watts), size, speed, precision, flexibility, cost, ease of use/integration, etc. And this is just convnet inference. Lots more is needed to do training in hw, again with multiple choices on how to do it.
So which one design do you suggest we all use for all our ML needs?
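The rhetorical point above can be made concrete with a tiny Pareto-frontier sketch. The design names and numbers below are entirely made up for illustration; the point is just that when designs trade off power, latency, and cost, several of them survive a dominance filter, so there is no single "best" accelerator to standardize on:

```python
def pareto_front(designs, keys):
    """Return the designs not dominated on every metric in `keys`.

    All metrics are oriented so that lower is better. A design is
    dominated if some other design is at least as good on every
    metric and strictly better on at least one.
    """
    front = []
    for d in designs:
        dominated = any(
            all(o[k] <= d[k] for k in keys) and any(o[k] < d[k] for k in keys)
            for o in designs if o is not d
        )
        if not dominated:
            front.append(d)
    return front

# Hypothetical design points (watts, ms per inference, unit cost in $).
designs = [
    {"name": "microwatt-analog", "power_w": 1e-4,  "latency_ms": 50.0, "cost": 5},
    {"name": "edge-cmos",        "power_w": 2.0,   "latency_ms": 5.0,  "cost": 20},
    {"name": "datacenter-asic",  "power_w": 300.0, "latency_ms": 0.1,  "cost": 5000},
    {"name": "strictly-worse",   "power_w": 400.0, "latency_ms": 1.0,  "cost": 6000},
]

front = pareto_front(designs, ["power_w", "latency_ms", "cost"])
print(sorted(d["name"] for d in front))
# -> ['datacenter-asic', 'edge-cmos', 'microwatt-analog']
```

Only the strictly dominated design drops out; the other three each win on some axis, which is exactly why "which one design?" has no single answer.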