They also released their API a week or two ago. It's significantly faster than anything from OpenAI right now. Mixtral 8x7B runs at around 500 tokens per second. https://groq.com/
NVIDIA GPUs were optimised for different workloads, such as 3D rendering, which have different optimal ratios of compute to memory.
This “supercomputer” isn't brute force or wasteful, because it allows more requests per second: since each response is processed faster, more of them can be pipelined through per unit of time and per unit of silicon area.
Just to put these numbers in perspective: a desktop 8-core Ryzen 7 7800X3D has 96 MB of L3 cache, and the top-end 96-core Epyc 9684X has 1.15 GB of L3.
I imagine Mixtral by itself would only take something like 200-300 LPUs
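A back-of-the-envelope check of that guess, assuming weights must sit entirely in on-chip SRAM and using Groq's published figure of roughly 230 MB of SRAM per chip (the Mixtral parameter count is the published ~46.7B total):

```python
# Rough sanity check of the "200-300 LPUs" estimate.
# Assumption: all weights held in on-chip SRAM, ~230 MB per GroqChip.
sram_per_chip = 230e6          # bytes of SRAM per LPU
mixtral_params = 46.7e9        # Mixtral 8x7B total parameter count

for name, bytes_per_param in [("fp16", 2), ("int8", 1)]:
    weight_bytes = mixtral_params * bytes_per_param
    chips = weight_bytes / sram_per_chip
    print(f"{name}: ~{chips:.0f} chips just to hold the weights")
    # fp16 -> ~406 chips, int8 -> ~203 chips
```

So the 200-300 figure is plausible for a quantised model, before counting activations, KV cache, or any redundancy.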
>processes GPT-2 with an ultra-low power consumption of 400 milliwatts and a high speed of 0.4 seconds
Not sure what the point of comparing the two is; an A100 will get you a lot more speed than 2.5 tokens/sec. GPT-2 is just a 1.5B-param model, and a Pi 4 would get you more tokens per second with CPU inference alone.
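Where those numbers come from, assuming the quoted "0.4 seconds" is per-token latency, and treating single-stream CPU decoding as memory-bandwidth bound (the Pi 4 bandwidth and int8 quantisation here are rough assumptions, not measured figures):

```python
# The quoted chip: 0.4 s per token -> tokens/sec.
seconds_per_token = 0.4
chip_tps = 1 / seconds_per_token
print(f"chip: {chip_tps} tokens/sec")              # 2.5

# Pi 4 ceiling on GPT-2: each generated token streams the full
# weight set from DRAM once, so throughput ~ bandwidth / model size.
gpt2_params = 1.5e9            # GPT-2 XL parameter count
bytes_per_param = 1            # int8-quantised weights (assumption)
pi4_bandwidth = 4e9            # ~4 GB/s usable LPDDR4 (assumption)
pi4_tps = pi4_bandwidth / (gpt2_params * bytes_per_param)
print(f"Pi 4: ~{pi4_tps:.1f} tokens/sec ceiling")  # ~2.7
```

Even under these conservative assumptions the Pi 4's bandwidth ceiling edges past 2.5 tok/s, and 4-bit quantisation would roughly double it.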
Still, I'm sure there are improvements to be made, and the direction is fantastic to see, especially after Coral TPUs have proven completely useless for LLM and Whisper acceleration [1]. Hopefully it ends up as something vaguely affordable.
[1] https://coral.ai/docs/edgetpu/models-intro/#model-requiremen...
What does that mean, practically?
How can you mimic that layout in silicon?
It's unclear how they managed to use this to run LLMs, though. Getting GPT-2 running with SNNs is a legitimate achievement, because SNNs have traditionally lagged significantly behind conventional deep learning architectures.
https://web.stanford.edu/group/brainsinsilicon/documents/ANe... https://web.stanford.edu/group/brainsinsilicon/documents/Ben...
I feel like SNNs are like Brazil - they are the future, and shall remain so. I think more basic research is needed for them to mature. AFAIK the current SOTA is to train them with 'surrogate gradients', which shoehorn them into the current NN training paradigm, and that discards some of their value. Have biologically inspired learning rules, like STDP, _really_ been exhausted?
Wolfram Alpha says that's roughly equivalent to a cell phone's power draw when sleeping.
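That comparison checks out from battery life alone; a rough estimate, assuming a ~4000 mAh battery at a nominal 3.85 V that lasts about two days in standby (both figures are assumptions, not specs for any particular phone):

```python
# Average standby power draw implied by battery capacity and standby time.
battery_wh = 4.0 * 3.85        # 4000 mAh at 3.85 V -> ~15.4 Wh (assumption)
standby_hours = 48             # ~2 days of standby (assumption)
avg_draw_w = battery_wh / standby_hours
print(f"~{avg_draw_w * 1000:.0f} mW average standby draw")  # ~321 mW
```

Same ballpark as the chip's quoted 400 mW.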