https://www.anandtech.com/show/7621/nvidia-reveals-first-det...
So many computing devices, such as the Nvidia Jetson and Raspberry Pi, are simply not available anywhere. I wonder what's the point of bringing out new products when existing products can't be purchased? Won't the new products also simply be unavailable?
Apple bought out the entire capacity of TSMC's 3nm node [1]. I would not be surprised if the deal actually was for Apple to fund the construction of the fab in exchange for this level of priority.
[1] https://www.heise.de/news/Bericht-Apple-schnappt-sich-komple...
So GPUs are not high priority? Because they are out of stock pretty much everywhere too.
The shortages are at the low end, on mature, larger-nm nodes. This is on a leading 4nm node.
All of a sudden there is a real choice of ARM CPUs on the server. (What will happen to Ampere?) The LPDDR5X used here will also be the first to come with ECC. And they can cross-sell those with Nvidia's ConnectX-7 SmartNICs.
Hopefully it will be price competitive.
Edit: Rather than downvoting, maybe explain why or what you disagree with?
Apple isn't going to give up the substantial performance benefits of on-package unified memory in order to support DIMMs. Therefore I predict that we'll see a two-tier memory architecture with the OS making automated decisions based on memory pressure, as well as new APIs to allocate memory with a preference for capacity or performance.
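I don't know what Apple will actually ship, but for a sense of what a capacity-vs-performance allocation hint could look like, CUDA's unified memory already exposes an analogous knob (Nvidia's API, used here purely as an analogy; any Apple equivalent is speculation on my part):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        float *buf = nullptr;
        size_t bytes = 1 << 20;

        // One virtual pool; the driver migrates pages between tiers
        // (host DRAM vs. on-package memory) based on access pressure.
        cudaMallocManaged(&buf, bytes);

        // Placement preference: keep pages near device 0 ("performance"),
        // or pass cudaCpuDeviceId to prefer the big tier ("capacity").
        cudaMemAdvise(buf, bytes, cudaMemAdviseSetPreferredLocation, 0);

        buf[0] = 42.0f;   // CPU touch; pages fault and migrate as needed
        printf("%f\n", buf[0]);
        cudaFree(buf);
        return 0;
    }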
The chassis design is new enough that it was designed with an eventual Apple Silicon Mac Pro in mind, so I expect minimal change to the exterior. It might shrink and have fewer slots (particularly since most users won't need a slotted GPU), though I think that's unlikely given that its height and width were defined by 5U rack dimensions.
When Jensen talks about Transformers, I know what he’s talking about because I follow a lot of talented people.
https://www.kaggle.com/code/odins0n/jax-flax-tf-data-vision-...
Robots in disguise?
https://www.intel.com/content/www/us/en/architecture-and-tec...
edit: the market pretty much went from gaming as the primary pillar to gaming + HPC, which makes it far more attractive, since you'd expect it to be much less cyclical and less price-sensitive. Raja Koduri was hired in late 2017 to work on GPU-related stuff, and it seems like the first major products from that effort will be coming out this year. That said, they've obviously had a lot of failures in the accelerator and graphics area (consider Larrabee [1] and Altera), and Koduri has stated on Twitter that Gelsinger is the first CEO to actually treat graphics/HPC as a priority.
[1] https://en.wikipedia.org/wiki/Larrabee_(microarchitecture)
NVidia's tooling is the best among all GPU vendors.
CUDA has been polyglot since version 3.0, you get a proper IDE and GPGPU debugging tools, and a plethora of libraries for most use cases one could think of using a GPGPU for.
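As a minimal illustration of the library point (a sketch, assuming any recent CUDA toolkit with Thrust bundled): a GPU reduction that would take a page of hand-rolled kernel code is one call:

    #include <thrust/device_vector.h>
    #include <thrust/reduce.h>
    #include <cstdio>

    int main() {
        // Build a device-resident vector of 1024 ones; Thrust generates
        // and launches the reduction kernels behind this one call.
        thrust::device_vector<int> v(1024, 1);
        int sum = thrust::reduce(v.begin(), v.end(), 0);
        printf("%d\n", sum);  // prints 1024
        return 0;
    }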
OpenCL did not fail only because of NVidia not caring; Intel and AMD have hardly done anything with it that could compete at the same tooling level.
This has been done as a commercial product with the Ampere ARM server chips. The base model is about $8k.
However, the price tag will be too high for a lot of desktop buyers.
(There are smaller Tegras around though)
Once enough patents expire all ISAs are eventually equal, I'd think.
This will probably cost them some market share, but they have plenty of cash to weather their current manufacturing issues, they still have world-class CPU design talent which they've proven over and over and over again, and they have some very interesting products & technologies on the roadmap.
ARM offering a fight for the first time ever is not going to be a one-hit KO against the Goliath that is Intel.
Arm has a much more efficient and also much less profitable business model, and Intel will never catch up unless they adopt it. They'll never do that, so they'll fade away like IBM.
Are they going through TSMC like NVIDIA, or are they using Samsung?
That is only the CPU, though; they might deploy it in a Grace + Hopper config.
An interesting angle here is that these support partitioning even better than the A100 does. AFAICT, the cloud vendors are not yet providing partitioned access, so everyone just exhausts worldwide g4dn capacity for smaller jobs / devs / etc. But partitioning can solve that...
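From the tenant's side, a partitioned slice just looks like a smaller GPU: with MIG enabled, each compute instance exposed to a process (e.g. via CUDA_VISIBLE_DEVICES) enumerates as its own device through the ordinary CUDA API. A minimal sketch:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        // Under MIG, each exposed slice appears as a separate device
        // with its own SM and memory budget (note a single process can
        // only address one MIG instance at a time).
        int n = 0;
        cudaGetDeviceCount(&n);
        for (int i = 0; i < n; ++i) {
            cudaDeviceProp p;
            cudaGetDeviceProperties(&p, i);
            printf("device %d: %s, %zu MiB\n",
                   i, p.name, p.totalGlobalMem >> 20);
        }
        return 0;
    }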
396MB of on-chip cache… (198MB per die)
That’s a significant part of it too.
Finally, a computer optimised for COBOL.
The contention on that memory means that only segregated, non-cooperative workloads (meaning no joint parallel atomic work on the same memory) will scale better on this hardware, per watt, than on a vanilla 4-core Xeon from 2018.
So you might as well buy 20 Jetson Nanos and connect them over the network.
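To make the contention point concrete, here's a toy sketch (hypothetical kernels of my own, not a benchmark): every thread hitting one atomic serializes on that address, while block-local partial sums only touch global memory once per block:

    #include <cuda_runtime.h>
    #include <cstdio>

    // Contended: all threads serialize on a single global address.
    __global__ void sum_contended(const int *in, int *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) atomicAdd(out, in[i]);
    }

    // Segregated: cheap shared-memory atomics within a block, then one
    // global atomic per block instead of one per thread.
    __global__ void sum_per_block(const int *in, int *out, int n) {
        __shared__ int partial;
        if (threadIdx.x == 0) partial = 0;
        __syncthreads();
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) atomicAdd(&partial, in[i]);
        __syncthreads();
        if (threadIdx.x == 0) atomicAdd(out, partial);
    }

    int main() {
        const int n = 1 << 20;
        int *in, *out;
        cudaMallocManaged(&in, n * sizeof(int));
        cudaMallocManaged(&out, sizeof(int));
        for (int i = 0; i < n; ++i) in[i] = 1;
        *out = 0;
        sum_per_block<<<(n + 255) / 256, 256>>>(in, out, n);
        cudaDeviceSynchronize();
        printf("%d\n", *out);  // prints n
        cudaFree(in);
        cudaFree(out);
        return 0;
    }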
Let that sink in... NOTHING is improving at all... there is ZERO point to any hardware that CAN be released, for eternity, at this point.
Time to learn Java SE and roll up those sleeves... electricity prices are never coming down (in real terms), no matter how high the interest rate.
As for GPUs, I'm calling it now: nothing will dethrone the 1030 in GFLOPS/W in general, and below 30 W in particular; DDR4 or DDR5, it doesn't matter.
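For reference, taking the GT 1030's commonly quoted specs (roughly 1.1 FP32 TFLOPS at a 30 W TDP), the bar I mean is about:

    1.1 TFLOPS / 30 W ≈ 37 GFLOPS/W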
Memory has been the latency bottleneck since DDR3.
Please respect the comment-on-downvote principle. Otherwise you don't really exist; in a quantum-physical way, anyway.
Game Over!
After 13 microarchitectures named after historical figures' last names, it's really weird to use someone's first name. Interesting that Anandtech and Wikipedia are both calling it Hopper. What on Earth are the marketing bros thinking?
So expect a future Einstein GPU to come with a matching Albert CPU.