3-bit is a bit ridiculous. From that page, it's unclear to me whether the current model is 3-bit or 4-bit.
If it’s 4-bit… well, NVIDIA showed that a well-organized 4-bit model can perform almost as well as an 8-bit one.
Do we understand how to scale up the hardware to the point where it can run a frontier model? Because this is insane. It would be a game changer for agent systems that make 10-100+ calls per task.
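
For a rough sense of why the bit width matters for the scaling question, here is a back-of-the-envelope sketch of weight storage at 3, 4, and 8 bits. The parameter counts are hypothetical, picked only for illustration, not taken from that page. It counts raw weights only and ignores quantization scales, activations, and KV cache, so real footprints would be larger.

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store the raw weights, rounded up to a whole byte."""
    return (num_params * bits_per_weight + 7) // 8


def weight_gib(num_params: int, bits_per_weight: int) -> float:
    """Same figure expressed in GiB (2**30 bytes)."""
    return weight_bytes(num_params, bits_per_weight) / 2**30


if __name__ == "__main__":
    # Hypothetical model sizes (assumptions, not figures from the source).
    for params in (8_000_000_000, 70_000_000_000, 400_000_000_000):
        for bits in (3, 4, 8):
            gib = weight_gib(params, bits)
            print(f"{params / 1e9:>5.0f}B params @ {bits}-bit: {gib:8.1f} GiB")
```

Storage scales linearly with bits per weight, so going from 4-bit to 3-bit saves a quarter of the weight storage, and going from 8-bit to 4-bit halves it.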