NVIDIA GPUs were optimised for other workloads, such as 3D rendering, which have different optimal ratios.
This “supercomputer” isn’t brute force or wasteful: it serves more requests per second. Processing each response faster lets it pipeline more of them through per unit of time and per unit of silicon area.
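To make the throughput argument concrete, here is a minimal sketch that assumes throughput follows Little's law (requests in flight divided by latency). The function name, latencies, area, and concurrency are all hypothetical illustration values, not measurements of any real chip.

```python
def requests_per_second_per_mm2(latency_s, area_mm2, in_flight):
    """Throughput per unit of silicon area.

    Assumes Little's law: throughput = requests in flight / latency.
    All arguments are hypothetical illustration values.
    """
    requests_per_second = in_flight / latency_s
    return requests_per_second / area_mm2


# Two hypothetical designs with the same silicon area and the same
# number of requests in flight; only the per-response latency differs.
AREA_MM2 = 100.0
IN_FLIGHT = 8

slow = requests_per_second_per_mm2(latency_s=0.20, area_mm2=AREA_MM2, in_flight=IN_FLIGHT)
fast = requests_per_second_per_mm2(latency_s=0.05, area_mm2=AREA_MM2, in_flight=IN_FLIGHT)

print(f"slow design: {slow:.2f} requests/s per mm^2")
print(f"fast design: {fast:.2f} requests/s per mm^2")
print(f"speed-up:    {fast / slow:.1f}x")
```

Under these assumptions, cutting latency to a quarter raises requests per second per unit area by the same factor, which is the sense in which the faster design is not wasteful.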