I expect a new wave of "your task, but on superior hardware" services to crop up with these chips!
v5es are particularly interesting given the millions that will land and the large pod sizes; they're well suited for million-token context windows.
* Notwithstanding the Coral boards
We found CUDA-to-SYCL conversion surprisingly good https://www.intel.com/content/www/us/en/developer/articles/t...
Isn't that the price of a single H100?
Genesis Cloud started integration and testing of Gaudi2 quite a while ago. I fully agree with the take of the article.
I can't promise per-hour rental, but for longer terms they are available! (Should you be interested, you can find contact details on the website.)
Now working ones is a different story.
Just curious because IME that's the point where the fun problems surface :)
NVIDIA is still the best for research given the ecosystem, but once models are standardised, as with transformers/LLaMA and likely multimodal diffusion transformers, it becomes about scale, availability, and cost per FLOP.
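On the "cost per FLOP" point, the comparison boils down to price divided by sustained throughput. A minimal sketch, with all prices and throughput numbers being hypothetical placeholders rather than real vendor figures:

```python
# Toy cost-per-FLOP comparison. All numbers below are hypothetical
# placeholders, not real vendor pricing or benchmark figures.
accelerators = {
    "chip_a": {"price_usd": 30_000, "tflops": 1000},
    "chip_b": {"price_usd": 10_000, "tflops": 500},
}

def usd_per_tflop(spec):
    """Purchase price in dollars per TFLOP of sustained throughput."""
    return spec["price_usd"] / spec["tflops"]

for name, spec in accelerators.items():
    print(f"{name}: ${usd_per_tflop(spec):.2f} per TFLOP")
```

In practice you'd use rental cost per hour and measured (not peak) throughput, but the shape of the comparison is the same.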
To those commenting about "no moat": remember CUDA is a huge part of it. It's actually HW+SW, and both took a decade to mature, together.
Gaudi2 was actually announced 2 years ago and is 7nm, like the A100 80GB it was meant to be competitive with. Gaudi3, later this year, is probably going to be the inflection point as it ramps.
The cost is like 1/3
https://www.intel.com/content/www/us/en/newsroom/news/vision...
- Intel acquired Habana in 2019
- Habana launched Gaudi2 in 2022
- only in H2 2023 did Habana enable FP8, which delivered around a 100% improvement in time-to-train
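For context, a "100% improvement in time-to-train" reads as throughput doubling, i.e. the same run finishing in about half the wall-clock time. A quick arithmetic sketch:

```python
# A fractional "improvement" in throughput shortens time-to-train as:
#   new_time = old_time / (1 + improvement)
# so improvement = 1.0 (i.e. 100%, a 2x speedup) halves the run time.
def improved_time_to_train(old_hours, improvement):
    """improvement is fractional: 1.0 means a 100% (2x) speedup."""
    return old_hours / (1.0 + improvement)

print(improved_time_to_train(100.0, 1.0))  # a 100h run drops to 50.0h
```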
On the rest I believe you, but markets don't move based on a single individual's or company's data points.
2024: Nvidia's B100 TSMC 3nm (?)
2024: Intel Gaudi3 TSMC 5nm (*)
2023: AMD MI300X TSMC 5nm/6nm
2022: Nvidia H100 TSMC 4N
2020: Nvidia A100 TSMC 7nm
(*): performance-critical chiplets, at least.

Considering it's the latter, and that PyTorch takes care of providing optimized backends for various hardware, how big of a moat is CUDA then, really?
In other words, it goes something like this:
Application
Pytorch (and similar)
cuDNN (and similar)
CUDA (and similar)
NVidia GPU
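The layering above is why backends are swappable in principle: the framework exposes device-agnostic ops and routes each call to whichever vendor library is registered underneath. A toy stand-in for that dispatch layer (names are illustrative only; the real PyTorch dispatcher is far richer):

```python
# Minimal stand-in for a framework's backend dispatch layer.
# All names here are illustrative, not the real PyTorch dispatcher API.
_backends = {}

def register_backend(device, matmul_impl):
    """Vendors plug in their optimized kernels keyed by device string."""
    _backends[device] = matmul_impl

def matmul(a, b, device="cpu"):
    """Framework-level op: routes to whichever backend is registered."""
    return _backends[device](a, b)

# A naive "cpu" backend; a vendor would register cuDNN/oneDNN-backed code
# here instead, and user code above this line would not change.
def cpu_matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

register_backend("cpu", cpu_matmul)
print(matmul([[1, 2]], [[3], [4]]))  # [[11]]
```

The catch, as the thread notes, is that writing the registered kernels to match cuDNN/cuBLAS quality is the hard part, not the plumbing.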
My opinion, based on what I saw those wizards do, is that reproducing the feature set and efficiency of cuDNN/cuBLAS is deeply nontrivial.

We do know that in 2025 it's supposed to be part of Intel's Falcon Shores HPC XPU. This essentially takes a whole bunch of HPC compute and sticks it all on the same silicon to maximize throughput and minimize latency. Thanks to their tile-based chip strategy they can have many different versions of the chip with different HPC focuses by swapping out different tiles. AI certainly seems to be a major one, but it will be interesting to see what products they come up with.
I think Gaudi2 was badly timed and they had to build the stack. Gaudi3 is where I think we will see mass adoption, given availability, way cheaper price/performance, and a more mature stack.
There is still weird stuff when using them but they are surprisingly solid.
[1] https://www.intel.com/content/www/us/en/developer/articles/t...
Haven't been too impressed with inference versus TensorRT-LLM, for example, though.
Gaudí is a famous name for a reason.. the flowing lines and, frankly, nonsense and silliness in the art and architecture of Gaudí stand for generations as a contrast to the relentless severity of formal classical arts (and especially a contrast to Intel electronic parts).
It has been amazing watching the groupthink at work on that stock when we just saw the same group do it on TSLA to disastrous effect. A similar no moat situation where they simply can’t imagine competitors ever existing.
* just to be clear - this is a joke
I actually put 40% of my TSLA into NVDA last year, because the demand for AI hardware is going to keep going up. I'm not saying the stock will never go down, I'm sure it will be volatile, but don't confuse short-term volatility with long-term technological transformations.
The Hopper stuff is particularly interesting