That's the theory. In practice, Sapphire Rapids needs 24-28 cores to hit the 200 GB/s mark, and it doesn't go much further than that. Intel CPU designs generally have a hard time saturating memory bandwidth, so it remains to be seen whether they managed to fix this, but I wouldn't hold my breath. 200 GB/s is not much. My dual-socket Skylake system hits ~140 GB/s and it's quite slow for larger LLMs.
> Why does it have to be a dual CPU design?
Because memory bandwidth is one of the most important limiting factors for inference performance on larger models. With a dual-socket design you're essentially doubling the available bandwidth.
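To put rough numbers on that, here's a back-of-the-envelope sketch. It assumes decoding is purely bandwidth-bound, i.e. every generated token has to stream all the weights from RAM once, so tokens/s can't exceed bandwidth divided by model size. The 70B / 4-bit model is a hypothetical I picked for illustration; the bandwidth figures are the ones mentioned above.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed if each token reads all weights from RAM once."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


# Assumption (not from the thread): a dense 70B-parameter model quantized
# to 4 bits per weight, i.e. 0.5 bytes per parameter = 35 GB of weights.
WEIGHTS_GB = 70e9 * 0.5 / 1e9

systems = [
    ("dual-socket Skylake, ~140 GB/s", 140.0),
    ("Sapphire Rapids, ~200 GB/s", 200.0),
]

for label, bandwidth in systems:
    bound = decode_tokens_per_second(bandwidth, WEIGHTS_GB)
    print(f"{label}: at most {bound:.1f} tokens/s")
```

Real throughput will land below this bound, since it ignores KV-cache reads, NUMA effects and compute, but it shows why an extra 60 GB/s doesn't change the picture much.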
> And the original message you were responding to was using a CPU with AMX and mixing it with a GPU like Nvidia 4900/5900.
A dual-socket CPU that costs $10k, in a server that probably costs a couple of times more. Now you claimed that it doesn't have to be that expensive, but I beg to differ: you still need $20k-$30k worth of equipment to run it. That's a lot and not quite "cost effective".