The WSE-3 is divided into 900,000 PEs, each of which has only 48 kB of RAM:
https://hc2024.hotchips.org/assets/program/conference/day2/7...
Similarly, the SMs in Blackwell have up to 228 kB of RAM:
https://docs.nvidia.com/cuda/archive/12.8.0/pdf/Blackwell_Tu...
If you need anything else, you need to load it from elsewhere. In the WSE-3, that would be from other PEs. In Blackwell, that would be from on-package DRAM. Idle time in Blackwell can be mitigated by parallelism, since each SM has enough SRAM for multiple kernels to run in parallel. I believe the WSE-3 is quick enough that it does not need that trick.
The other guy said “you will not be using more area in the WSE-3”. I do not see this die area efficiency. You need many full wafers (around 20 with Llama 4 Maverick) to do with the WSE-3 what can be done with a fraction of a wafer with Blackwell. Even if you include the DRAM’s die area, Nvidia’s hardware is still orders of magnitude more efficient in terms of die area.
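For what it is worth, the "around 20 wafers" figure can be sanity-checked with back-of-envelope arithmetic. The sketch below uses the PE count and per-PE RAM from above; the 400B total parameter count for Llama 4 Maverick and the 16-bit weight size are my assumptions, not figures from the links, and it counts weights only (no KV cache or activations).

```python
import math

# From the comment above
PES_PER_WAFER = 900_000
SRAM_PER_PE_BYTES = 48 * 1024    # 48 kB per PE

# Assumptions (not from the linked sources)
ASSUMED_PARAMS = 400e9           # assumed total parameters, Llama 4 Maverick
ASSUMED_BYTES_PER_PARAM = 2      # assumed 16-bit weights


def wafers_needed(params, bytes_per_param,
                  pes=PES_PER_WAFER, sram_per_pe=SRAM_PER_PE_BYTES):
    """Minimum number of wafers whose combined SRAM can hold the weights."""
    wafer_sram_bytes = pes * sram_per_pe
    return math.ceil(params * bytes_per_param / wafer_sram_bytes)


wafer_sram_gb = PES_PER_WAFER * SRAM_PER_PE_BYTES / 1e9
wafers = wafers_needed(ASSUMED_PARAMS, ASSUMED_BYTES_PER_PARAM)

print(f"SRAM per wafer: {wafer_sram_gb:.1f} GB")
print(f"Wafers for weights alone: {wafers}")
```

Under these assumptions that works out to roughly 44 GB of SRAM per wafer and 19 wafers for the weights alone, which is in the same ballpark as the figure above.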
The only advantage Cerebras has, as far as I can see, is that they are fast on single queries. They do not dare advertise figures for their total throughput, while Nvidia will happily advertise those. If Cerebras were better than Nvidia at throughput, they would advertise it, since that is what matters for mass-market appeal. That they avoid publishing those figures is likely because, in reality, they are not competitive in throughput.
To give an example of Nvidia advertising throughput numbers:
> In a 1-megawatt AI factory, NVIDIA Hopper generates 180,000 tokens per second (TPS) at max volume, or 225 TPS for one user at the fastest.
https://blogs.nvidia.com/blog/ai-factory-inference-optimizat...
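To put the two numbers in that quote side by side, here is a trivial sketch. It only divides the quoted figures; it does not say how many users are served at max volume, since the quote does not give the per-user rate at that operating point.

```python
# Figures taken from the Nvidia quote above
MAX_VOLUME_TPS = 180_000   # aggregate tokens/s at max volume, 1 MW
SINGLE_USER_TPS = 225      # tokens/s for one user at the fastest setting

# Ratio between the throughput-optimized and latency-optimized operating points
ratio = MAX_VOLUME_TPS / SINGLE_USER_TPS
print(f"Max-volume throughput is {ratio:.0f}x the fastest single-user rate")
```

That gap between the two operating points is why I think the aggregate number matters more than the single-query number.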
Cerebras strikes me as being like Bugatti, which designs cars that go from start to finish very fast at a price that could buy dozens of conventional vehicles, while Nvidia strikes me as being like Toyota, which designs far slower vehicles but can manufacture them in a volume that can handle a large share of the world’s demand for transport. Bugatti cannot make enough vehicles to bring a significant proportion of the world from A to B regularly, while Toyota can. Similarly, Cerebras cannot make enough chips to handle any significant proportion of the world’s demand for inference, while Nvidia can.