I suggest taking the report with a grain of salt.
They do the standard AMD comparison:
8x AMD MI300X (192GB, 750W) GPUs
8x H100 SXM5 (80GB, 700W) GPUs
The fair comparison would be against 8x H100 NVL (188GB, <800W) GPUs.
Price tells a story. If AMD's performance were on par with Nvidia's, they would not sell their cards for 1/4 of the price.
Transistor counts (MTr):
------------------
H100 SXM5: 80,000
MI300X: 153,000
H100 NVL: 160,000
The H100 SXM5 has 52% of the transistors the MI300X has and half the RAM, yet the MI300X achieves *ONLY* 33% higher throughput than the H100. The MI300X was launched 6 months ago, the H100 20 months ago. AMD has work to do.
I haven't done a head-to-head, and I suppose it depends on whether tensor parallelism actually scales linearly or not, but my understanding is that since the NVLs are just PCIe/NVLink-paired H100s, you're not really getting much if any benefit on something like vLLM.
I think the more interesting critique might be the slightly odd choice of Mixtral 8x7B vs., say, a more standard Llama 2/3 70B (or just test multiple models, including some big ones like 8x22B or DBRX).
Also, while I don't have a problem w/ vLLM, as TensorRT gets easier to set up, it might become a factor in comparisons (since they punted on FP8/AMP in these tests). Inferless published a shootout a couple months ago comparing a few different inference engines: https://www.inferless.com/learn/exploring-llms-speed-benchma...
Price/perf does tell a story, but I think it's one that's mostly about Nvidia's platform dominance and profit margins more than intrinsic hardware advantages. On the spec sheet MI300X has a memory bandwidth and even raw FLOPS advantage but so far it has lacked proper software optimization/support and wide availability (has anyone besides hyperscalers and select partners been able to get them?)
I don't think it should be ignored, especially when the power consumption is similar.
If so, OK, it's fair to compare 1 MI300X with 1 H100 NVL, but then price (and TCO) should be added to some of the metrics in the conclusion. Also, the NVL is a 2x PCIe 5.0 quad-slot card, so not the same thing..
I am not sure about system compatibility, and if and how you can stack 8 of those in one system (like you can with the non-NVL and the MI300X), so it's a bit of a different (and more niche) beast.
What were your thoughts on Zen (1) vs Intel's offerings then? AMD offered more bang for the buck then too.
Fun weekend project for anybody.
Also, stuff like this makes it hard to take the results seriously:
* To make an accurate comparison between the systems with different settings of tensor parallelism, we extrapolate throughput for the MI300X by 2.
* All inference frameworks are configured to use FP16 compute paths. Enabling FP8 compute is left for future work.
They did everything they could to make sure AMD is faster.
What would be a suitable input length in your opinion?
And why isn't this a good one? Are real-life queries shorter? Or longer?
If I count one word as a token, then in my case most of the queries are less than 128 words.
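One word per token undercounts a bit, though. A common rough heuristic (this is an approximation for English prose, not a real tokenizer; real tokenizers like tiktoken or SentencePiece will differ, especially on code or rare words) is about 0.75 words per token:

```python
# Rough heuristic, NOT a real tokenizer: English prose averages roughly
# 0.75 words per token, so tokens ~= words / 0.75.
def estimate_tokens(text: str) -> int:
    words = len(text.split())
    return round(words / 0.75)

# A 100-word query is closer to ~133 tokens than to 100.
print(estimate_tokens(" ".join(["word"] * 100)))  # 133
```

By that rule of thumb, "less than 128 words" lands around 170 tokens, which is still on the short end of the input lengths people are asking the benchmark to cover.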
It's not just the query (if you're running a chatbot, which many of us are not). It's the entire context window. It's not uncommon to have a system prompt that is > 512 tokens alone.
I would like to see benchmarks for 512, 1024, 4096 and 8192 token inputs.
If I understood that correctly, context length is something like session storage or short-term memory. If it's too small, the AI starts to forget what it's talking about.
While I would enjoy a US tech salary, I'm not sure we want a world where all manufacturing is set aside to focus on the attention economy.
Nvidia value deserves to be much higher than any company on the DAX (maybe all of them together, as it currently is) - but how much of that current value is real rather than an AI speculation bubble?
Nah, then I'll get my very good wagie pennies here and have plenty of jobs available, plus good health insurance and whatnot.
But there's a long list of German companies not on the DAX
(though Germany's DAX really deserves to be worth less than Nvidia)
Nvidia's problem will sort itself out naturally in the coming months/years.
Jensen isn't stupid. He's making accelerators for anything so that they'll be ready to catch the next bubble that depends on crazy compute power that can't be done efficiently on CPUs. They're so far the only semi company beating Moore's law by a large margin due to their clever scaling tech while everyone else is like "hey look our new product is 15% more efficient and 15% more IPC than the one we launched 3 years ago".
They may be overvalued now but they definitely won't crash back to their "just gaming GPUs" days.
Also curious how many companies are dropping that much money on those kinds of accelerators just to run 8x 7B-param models in parallel... You're also talking about being able to train a 14B model on a single accelerator. I'd be curious to see how "full-accelerator train and inference" workloads would look, i.e., training a 14B-param model, then inference throughput on a 4x14B workload.
AMD (and almost every other inference claim-maker so far... Intel and Apple specifically) have consistently cherry-picked the benchmarks they claim a win over and ignored the remainder, which all show Nvidia in the lead, and they've used mid-gen comparison models, as many commenters here pointed out in this article.
For single-system (8x accelerator) LLMs, the MI300X has very competitive inference TCO vs. the H100.
Also:
AMD Instinct MI300X Offers The Best Price To Performance on GPT-4 According To Microsoft, Red Team On-Track For 100x Perf/Watt By 2027
https://wccftech.com/amd-instinct-mi300x-best-price-performa...
And with a growing but certainly less mature product (especially the software), it requires suitable pricing and allocation strategies.
1. https://www.techspot.com/news/102056-nvidia-allegedly-punish...
AMD is successfully attacking the inference sector, increasing its advantage with the MI325 and aiming at training from 2025 with the MI350 (plus Infinity Fabric and the other interconnect types arriving for the various topologies), which will probably have an advantage over Blackwell, then fall behind against Rubin, and come back ahead with the MI400.
At least, that's how it seems, as long as ROCm continues to improve.
Personally I am happy to see some competition in the sector and especially on open source software
Boo hoo, a GTX 670 that cost you $399 in 2012 now costs $599 - grow up, do the inflation calculation, and realize you're being a child. Gamers get the best deal on bulk silicon on the planet, R&D subsidized by enterprise, fantastic blue-sky research that takes years for competitors to (not even) match, and it's still never enough. "Gamers" have justified every single cliche and stereotype over the last 5 years, absolutely inveterate manbabies.
(Hardware Unboxed put out a video today with the headline+caption combo "are gamers entitled"/"are GeForce gpus gross", and that's what passes for reasoned discourse among the most popular channels. They've been trading segments back and forth with GN that are just absolute "how bad is nvidia" "real bad, but what do you guys think???" tier shit, lmao.)
https://i.imgur.com/98x0F1H.png
this stuff is real shit, nvidia has been leaning on partners to maintain their segmentation, micromanaging shipment release to maintain price levels (cartel behavior), punishing customers and suppliers with “you know what will happen if you cross us”, literally putting it in writing with GPP (big mistake), playing fuck fuck games with not letting the drivers be run in a datacenter, etc. You see how that’s a little different than a gpu going from an inflation-adjusted $570 to $599 over 10 years?
(And what’s worse, the competition can’t even keep that much; they’re falling off even harder now that Moore’s law has really kicked the bucket and they have to do architectural work every gen just to make progress, instead of getting free shrinks etc… let alone having to develop software! /gasp)
In entirely unrelated news… gigabyte suddenly has a 4070 ti super with a blower cooler. Oh, and it’s single-slot with end-fire power connector. All three forbidden features at once - very subtle, extremely law-abiding.
https://videocardz.com/newz/gigabyte-unveils-geforce-rtx-407...
and literally gamers can’t help but think this whole ftc case is all about themselves anyway…
Large orders for those accelerators are placed months ahead.
Meanwhile, the MI300X instances on Microsoft are fully booked...
https://techcommunity.microsoft.com/t5/azure-high-performanc...
"Scalable AI infrastructure running the capable OpenAI models. These VMs, and the software that powers them, were purpose-built for our own Azure AI services production workloads. We have already optimized the most capable natural language model in the world, GPT-4 Turbo, for these VMs. ND MI300X v5 VMs offer leading cost performance for popular OpenAI and open-source models."
According to the article: """ AMD Configuration: Tensor parallelism set to 1 (tp=1), since we can fit the entire model Mixtral 8x7B in a single MI300X’s 192GB of VRAM.
NVIDIA Configuration: Tensor parallelism set to 2 (tp=2), which is required to fit Mixtral 8x7B in two H100’s 80GB VRAM. """
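Those tp settings follow directly from VRAM capacity. A back-of-the-envelope sketch (the ~46.7B total parameter count for Mixtral 8x7B is approximate, and this deliberately counts only FP16 weights, ignoring KV cache, activations, and runtime overhead):

```python
def min_tp(params_billion: float, bytes_per_param: int, vram_gb: float) -> int:
    """Smallest power-of-two tensor-parallel degree whose pooled VRAM
    holds the model weights alone (ignores KV cache and overhead)."""
    weights_gb = params_billion * bytes_per_param  # 1B params * 2 B ~= 2 GB at FP16
    tp = 1
    while tp * vram_gb < weights_gb:
        tp *= 2
    return tp

MIXTRAL_8X7B_PARAMS_B = 46.7  # approximate total parameter count

print(min_tp(MIXTRAL_8X7B_PARAMS_B, 2, 192))  # MI300X (192GB): 1
print(min_tp(MIXTRAL_8X7B_PARAMS_B, 2, 80))   # H100 (80GB):    2
```

About 93GB of FP16 weights fits comfortably in one 192GB MI300X but needs two 80GB H100s, which is exactly the asymmetry the benchmark then "extrapolates" around.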
Everybody thinks it’s CUDA that makes Nvidia the dominant player. It’s not - almost 40% of their revenue this year comes from mega corporations that use their own custom stack to interact with GPUs. It’s only a matter of time before competition catches up and gives us cheaper GPUs.
lol completely made up.
are you conflating CUDA the platform with the C/C++ like language that people write into files that end with .cu? because while some people are indeed not writing .cu files, absolutely no one is skipping the rest of the "stack" (nvcc/ptx/sass/runtime/driver/etc).
source: i work at one of these "mega corps". hell if you don't believe me go look at how many CUDA kernels pytorch has https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/n....
> Everybody thinks it’s CUDA that makes Nvidia the dominant player.
it 100% does
That's just a question of negotiating with TSMC or their few competitors.
(Also, didn't TSMC start production at some factories in the US and/or EU?)
I mean, Nvidia uses TSMC; so does AMD.
But now that there’s a larger incentive to produce GPUs, their moat will eventually fall.
TSMC runs at 100% capacity for top tier processes - their bottleneck is more foundries. These take time to build. So the question becomes - how long can Nvidia remain dominant? It could be quarters or it could be years before any real competitor convinces large customers to switch over.
Microsoft and Google are producing their own AI hardware too - nobody wants to depend solely on Nvidia, but they’re currently forced to if they want to keep up.
Nvidia relies on TSMC for manufacturing. Samsung is building competing manufacturing infrastructure, which is also a good thing, so Taiwan is not a single point of failure.
95% would be nice too
https://www.reddit.com/r/AMD_MI300/comments/1dgimxt/benchmar...
> MI300X Accelerator: 192GB VRAM, 5.3 TB/s, ~1300 TFLOPS for FP16
> Hardware: Baremetal node with 8 H100 SXM5 accelerators with NVLink, 160 CPU cores, and 1.2 TB of DDR5 RAM.
> H100 SXM5 Accelerator: 80GB VRAM, 3.35 TB/s, ~986 TFLOPS for FP16
I really wonder about the pricing. In theory the MI300X is supposed to be cheaper, but whether that is really the case in practice remains to be seen.
So, probably around the same price?
The tests look promising, though!
The weird thing on RunPod is the virtual CPUs: you can't run the MI300X in virtual machines yet. It is a missing feature that AMD is working on.
https://www.amd.com/en/newsroom/press-releases/2024-5-21-amd...
1. They're only comparing against vLLM, which isn't SOTA for latency-focused inference. For example, their vLLM benchmark on 2 GPUs sees 102 tokens/s for BS=1; gpt-fast gets around 190 tok/s. https://github.com/pytorch-labs/gpt-fast
2. As others have pointed out, they're comparing H100 running with TP=2 vs. 2 AMD GPUs running independently.
Specifically,
> To make an accurate comparison between the systems with different settings of tensor parallelism, we extrapolate throughput for the MI300X by 2.
This is uhh.... very misleading, for a number of reasons. For one, at BS=1, what does running with 2 GPUs even mean? Do they mean that they're getting the results for one AMD GPUs at BS=1 and then... doubling that? Isn't that just... running at BS=2?
3. It's very strange to me that their throughput nearly doubles going from BS=1 to BS=2. MoE models have an interesting property that low amounts of batching doesn't actually significantly improve their throughput, and so on their Nvidia vllm benchmark they just go from 102 => 105 tokens/s throughput when going from BS=1 to BS=2. But on AMD GPUs they go from 142 to 280? That doesn't make any sense to me.
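That MoE property has a simple back-of-the-envelope explanation. Assuming uniform top-2-of-8 routing (a simplification; real routing is learned, not uniform), the expected number of distinct experts a decode step must stream from VRAM goes from 2.0 at BS=1 to 3.5 at BS=2, so you pay roughly 1.75x the weight traffic, the decode-time bottleneck, for 2x the tokens, which caps the throughput gain well below 2x:

```python
from math import comb

def expected_distinct_experts(batch: int, experts: int = 8, top_k: int = 2) -> float:
    """Expected distinct experts activated per layer per decode step,
    assuming each token independently picks a uniform top_k-of-experts subset."""
    # P(a given expert is missed by one token) = C(experts-1, top_k) / C(experts, top_k)
    p_missed = comb(experts - 1, top_k) / comb(experts, top_k)
    return experts * (1 - p_missed ** batch)

print(expected_distinct_experts(1))  # 2.0
print(expected_distinct_experts(2))  # 3.5
```

Under this toy model a near-2x jump from BS=1 to BS=2 is indeed surprising, which matches the ~3% gain seen on the Nvidia side.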
https://www.reddit.com/r/AMD_MI300/comments/1dgimxt/benchmar...
That info is conspicuously absent from the article.
Maybe the benchmark should be performance per $... though I suspect power consumption will eclipse the cost of purchasing the chips from NVDA or AMD (and chip costs will vary over time and with discounts). EDIT: I was wrong on eclipsing; I'm still looking for a more durable benchmark (performance per billion transistors?), given it's suspected NVDA's chips are over-priced due to demand outstripping supply for now, and AMD's are under-priced to get a foothold in this market.
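A toy version of that "performance per billion transistors" idea, plugging in the transistor counts and the ~33% throughput delta cited elsewhere in this thread (all numbers approximate, and the relative throughput comes from this one Mixtral benchmark only):

```python
# Toy metric: relative throughput per billion transistors (BTr).
# Transistor counts (MTr) and the 1.33x throughput figure are taken from
# other comments in this thread; treat them as rough inputs, not gospel.
chips = {
    "H100 SXM5": {"mtr": 80_000,  "rel_throughput": 1.00},
    "MI300X":    {"mtr": 153_000, "rel_throughput": 1.33},
}
for name, c in chips.items():
    per_btr = c["rel_throughput"] / (c["mtr"] / 1000.0)
    print(f"{name}: {per_btr:.4f} relative throughput per BTr")
```

By this measure the H100 comes out ahead (about 0.0125 vs. 0.0087), echoing the transistor-count comparison earlier in the thread.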
Making AMD work effortlessly with pytorch et al should make the switch transparent.
Also, the price difference is not quantified.
Additionally, CUDA is a known and tangible software stack - can I try out this "MK1 Flywheel" on my local (AMD) hardware?
For consumer grade inference, there's already many options available.
They also used Flywheel for AMD while not bothering to turn on Flywheel for Nvidia, which is crazy since Flywheel improves Nvidia performance by 70%. https://mk1.ai/blog/flywheel-launch
In this context, the 33% performance lead by AMD looks terrible; factor that in and the MI300X straight up looks slower.
This is a new-AMD vs. last-generation-Nvidia benchmark.
https://www.theregister.com/2024/03/21/nvidia_dgx_gb200_nvk7...
The MI300X launched 3 months earlier, at the end of December.
The H100 launched in March 2023.
(Otherwise it's apples and oranges)