It's weird; I looked up whether AMD has any benchmarks of the 405B on the MI300X, and came across this one --
https://dstack.ai/blog/amd-mi300x-inference-benchmark/#token... From my understanding, it can get up to around 2,500 tokens/s? Both are 8x setups (H200 and MI300X).