Nvidia Launches Vera CPU, Purpose-Built for Agentic AI (opens in new tab)

(nvidianews.nvidia.com)

179 pointslewismenelaws2mo ago101 comments

101 comments

Agentic AI CPU? No.

It’s a CPU designed for an AI cluster. Their last CPU Grace was the same thing and no one called it agentic.

Vera now just has more performance/more bandwidth. It’s cool, I’d like to have one of these clusters, but this is not new.

It’s marketed as agentic AI because that’s fashionable in 2026.

storus2mo ago

They significantly lowered latency compared to EPYC/Xeon, which is critical for streaming agents (e.g. text/audio/video agents).

stingraycharles2mo ago

What latency? How much is it compared to LLM inference speed?

1 more reply

PeterCorless2mo ago

This is the related benchmark blog from Redpanda [disclosure: I work for Redpanda and I helped write this. Credit to Travis Downs & others at Redpanda for the heavy lifting on the testing and analysis.]

https://www.redpanda.com/blog/nvidia-vera-cpu-performance-be...

jauntywundrkind2mo ago

Given the price of these systems the ridiculously expensive network cards isn't such a huge huge deal, but I can't help but wonder at the absurdly amazing bandwidth hanging off Vera, the amazing brags about "7x more bandwidth than pcie gen 6" (amazing), but then having to go to pcie to network to chat with anyone else. It might be 800Gbe but it's still so many hops, pcie is weighty.

I keep expecting we see fabric gains, see something where the host chip has a better way to talk to other host chips.

It's hard to deny the advantages of central switching as something easy & effective to build, but reciprocally the amazing high radix systems Google has been building have just been amazing. Microsoft Mia 200 did a gobsmacking amount of Ethernet on chip 2.8Tbps, but it's still feels so little, like such a bare start. For reference pcie6 x16 is a bit shy of 1Tbps, vaguely ~45 ish lanes of that.

It will be interesting to see what other bandwidth massive workloads evolve over time. Or if this throughout era all really ends up serving AI alone. Hoping CXL or someone else slims down the overhead and latency of attachment, soon-ish.

Maia 200: https://www.techpowerup.com/345639/microsoft-introduces-its-...

bob10292mo ago

> It might be 800Gbe but it's still so many hops, pcie is weighty.

Once you need to reach beyond L2/L3 it is often the case that perfectly viable experiments cannot be executed in reasonable timeframes anymore. The current machine learning paradigm isn't that latency sensitive, but there are other paradigms that can't be parallelized in the same way and are very sensitive to latency.

babelfish2mo ago

Most of the big AI/HPC clusters these systems are aimed at aren’t running regular PCIe Ethernet between nodes, they’re usually wired up with InfiniBand fabrics (HDR/NDR now, XDR soon)

tryauuum2mo ago

Infiniband cards are connected to the rest of the machine with the PCIe as well

baal80spam2mo ago

Say what you want about NVIDIA (to me they are just doing what every company would do in their place), but they create engineering marvels.

1 more reply

d_silin2mo ago

It is a 88-core ARM v9 chip, for somewhat more detailed spec.

PeterCorless2mo ago

Vera does what NVIDIA calls Spatial Multithreading, "physically partitioning each core’s resources rather than time slicing them, allowing the system to optimize for performance or density at runtime." A kind of static hyperthreading; you get two threads per core.

It's somewhat different from how x86 chips do simultaneous multithreading (SMT),

fulafel2mo ago

Seems like curious terminology from NV. In estabilished use, SMT means executing instructions from several cpu threads concurrently in the OOO CPU's execution units so they are not starved from work, whereas timeslicing conventionally means context switching between threads/processes, alternating temporally.

In operating systems timeslicing means giving a quantum of execution time to each process, and context switching between processes. Not normally a term used in computer architecture but possibly the characterisation would fit a barrer processor rather than SMT.

mixmastamyk2mo ago

Hmm, the 128-core Ampere Altra CPU is already available, and in a case from System76. I wonder what else differentiates it.

If they're going to build CPUs I wish they had used Risc-V instead. They are using it somewhat already.

OneDeuxTriSeiGo2mo ago

You can see here[1] what the specs are for the CPU (listed as "NVIDIA Vera Rubin Superchip").

The CPU is integrated with two Rubin GPUs but each of the CPU cores has dedicated FP8 acceleration as well.

1. https://www.nvidia.com/en-us/data-center/vera-rubin-nvl72/

ibgeek2mo ago

I own one of these systems. My interpretation is the Ampere systems are targeted at lower cost scale out. The Ampere Altra CPUs are limited to DDR4. The raw single core performance doesn’t match Intel or AMD offerings. You get a lot of cores for a lower hardware cost and at lower energy usage.

The Nvidia CPUs are designed for a very specific use case. They are designed for high performance with less concern about cost control.

The newer AmpereOne CPUs use DDR5 with the AmpereOne M supporting even higher memory bandwidth. Even then, I doubt the AmpereOne CPUs will match the performance of the Nvidia Rubin CPUs. But the Ampere processors are available for general use. I am guessing that Nvidia is only going to sell the complete rack system and only to high-volume customers.

gcanyon2mo ago

Anyone know how this compares to Apple’s M5 chips? Or is that comparison <takes off sunglasses> apples to oranges.

pdpi2mo ago

Features like hardware FP8 support definitely make it apples-to-oranges.

philjohn2mo ago

But doesn't the Apple M series NPU support FP8, and as it's a monolithic die (except for the GPU in the M5 Pro and Max) it could be argued it has hardware FP8 support, no?

3 more replies

pjmlp2mo ago

It doesn't matter, because you will never find M5 chips on cloud offerings, or server racks.

It is kind of rediculous that the only server option with Apple hardware has been to stack up mac minis.

They got rid of the server and workstation market, focusing on consumers only.

storus2mo ago

Grace GB10, Vera's predecessor, had a single core performance comparable to M3 so I guess we can expect at least M4 level performance now.

porphyra2mo ago

Isn't the GB10 a Mediatek chip and not directly related to the Grace datacenter CPU?

2 more replies

d_silin2mo ago

M5 are 9-18 cores and optimized for power-efficiency, those are more like Xeons, with 200-300W TDP, I'd bet.

kllrnohj2mo ago

If M5 has 9-18 cores and takes ~20w, then that's ~1-2w per CPU core. If these are 200-300W, and have ~100-200 CPU cores, then guess what? That's also ~1-2w per CPU core.

Xeons, Epycs, whatever this is - they are all also typically optimized for power efficiency. That's how they can fit so many CPU cores in 200-300W.

tencentshill2mo ago

So does this cut out Intel/x86 from all the massive new datacenter buildouts entirely? They've already lost Apple as a customer and are not competitive in the consumer space. I don't see how they can realistically grow at all with x86.

alecco2mo ago

Even Apple hardware looks inexpensive compared to Nvidia's huge premium. And never mind the order backlog.

x86 and Apple already sell CPUs with integrated memory and high bandwidth interconnects. And I bet eventually Intel's beancounter board will wake up and allow engineering to make one, too.

But competition is good for the market.

bigyabai2mo ago

Even with those advantages, Apple can't even sell datacenter hardware to themselves: https://9to5mac.com/2026/03/02/some-apple-ai-servers-are-rep...

1 more reply

storus2mo ago

Apple went from a high-end PC to a low-end AI provider due to blocking Nvidia on their platform.

1 more reply

mikrl2mo ago

>are not competitive in the consumer space

AFAIK they still dominate on clock rate, which I was surprised to see when doing some back of the envelope calculations regarding core counts.

I felt my 8 core i9 9900K was inadequate, so shopped around for something AMD, and IIRC the core multiplier of the chip I found was dominated by the clock rate multiplier so it’s possible that at full utilization my i9 is still towards the best I can get at the price.

Not sure if I’m the typical consumer in this case however.

kllrnohj2mo ago

Your 9900k at 5ghz does work slower than a Ryzen 9800X3D at 5ghz. A lot slower (1700 single core geekbench vs 3300, and just about any benchmark will tell the same story). Clock speed alone doesn't mean anything.

1 more reply

wmf2mo ago

A 9700X is twice the performance of a 9900K and M5 Max is almost 3X the performance. The megahertz myth is a myth.

1 more reply

RantyDave2mo ago

Ahhh, so is this a chip "more optimised" for connecting GPU's to reality ... or are they skipping the GPU step entirely? Are GPU's only for training now?

cyanydeez2mo ago

have you seen this: https://chatjimmy.ai/

It's quite impressive what purpose build inference can/will do once everyone stops trying to become kind of the best model.

redwood2mo ago

Wow impressive. What's the story with this?

2 more replies

dmitrygr2mo ago

> Purpose-Built for Agentic AI

From the "fridge purpose-built for storing only yellow tomatoes" and "car only built for people whose last name contains the letter W" series.

When can this insanity end? It is a completely normal garden-variety ARM SoC, it'll run Linux, same as every other ARM SoC does. It is as related to "Agentic $whatever" as your toaster is related to it

pdpi2mo ago

> It is as related to "Agentic $whatever" as your toaster is related to it

These things have hardware FP8 support, and a 1.8TB/s full mesh interconnect between CPUs and GPUs. We can argue about the "agentic" bit, but those are features that don't really matter for any workload other than AI.

pezezin2mo ago

The huge interconnect would also useful be for HPC tasks. The FP8 not so much, HPC still loves FP64.

dmitrygr2mo ago

mem bw between cores matters for .... literally all workloads that are not single-core (read: all). And FP8 matters not at all cause inference on cpu is too slow to be of any use whatsoever in the days of proper accelerators

kibibu2mo ago

Would cloud gaming platforms benefit from the interconnect?

1 more reply

dpe822mo ago

The power and importance of marketing is deeply underappreciated by us technical types.

LogicFailsMe2mo ago

And yet more than a little Gavin Belson "Box III" vibes here. Fortunately, no signature edition.

dwb2mo ago

I don’t underappreciate it, but I do despise it.

pwg2mo ago

> It is a completely normal garden-variety ARM SoC

To mis-quote the politician quip:

How can you tell a marketer is lying?

Answer: His/her mouth is moving.

rka1282mo ago

"democratize access to AI and accelerating innovation."

So they make inference cheaper and the models get even worse. Or Jensen Huang has AI psychosis. Or both.

Here is a new business idea for Nvidia: Give me $3000 in a circular deal which I will then spend on a graphics card.

kwertyoowiyop2mo ago

Me too plz. To quote (more or less) Harvey Pekar: “I’m trying to sell out, but nobody’s buying!”

rishabhaiover2mo ago

I'm assuming this is for tool call and orchestration. I didn't know we needed higher exploitable parallelism from the hardware, we had software bottlenecks (you're not running 10,000 agents concurrently or downstream tool calls)

Can someone explain what is Vera CPU doing that a traditional CPU doesn't?

kibibu2mo ago

> you're not running 10,000 agents concurrently or downstream tool calls

Cursor seem to be doing exactly that though

urig2mo ago

Lots and lots of CPUs pooled. Faster more efficient power RAM accessible to both GPU and CPU. IIUC.

rishabhaiover2mo ago

But at what stage are we asking for that RAM? if it's the inference stage then doesn't that belong to the GPU<>Memory which has nothing to do with the CPU?

I did see they have the unified CPU/GPU memory which may reduce the cost of host/kernel transactions especially now that we're probably lifting more and more memory with longer context tasks.

recvonline2mo ago

Does this mean their gaming GPUs are becoming less in demand, and therefore cheaper/more available again?

Teknoman1172mo ago

Absolutely not, unfortunately.

The problem is not that gaming GPUs are in demand, it’s that selling silicon to AI center buildouts is so absurdly profitable right now they just aren’t making many gaming GPUs.

If you can only get so many mm^2 of dies from TSMC, might as well make 50x selling to AI providers.

pjmlp2mo ago

Check the GTC 2026 agenda, there are hardly any graphics programming talks.

At least there are a few cool ones about programming CUDA directly in Python.

1 more reply

TheRoque2mo ago

It means it will be profitable to mine crypto again

wmf2mo ago

No.

yalogin2mo ago

This is yet not the grok acquisition, so there is another update coming with that claiming more improvements?

nilstycho2mo ago

https://developer.nvidia.com/blog/inside-nvidia-groq-3-lpx-t...

ksec2mo ago

The most interesting part is that Nvidia intend to sell this CPU separately, meaning you dont need to buy Nvidia GPU to use it.

Other than Hyperscaler ARM has yet to enter the server market and it might well be Nvidia that makes a different.

kibibu2mo ago

Am I crazy, or is Jensen's statement a copy-paste from ChatGPT?

(Could be both)

wmf2mo ago

If AI is so great why should he not use it?

magackame2mo ago

Should work on building the AI Jensen. Maybe it's already the AI Jensen

akomtu2mo ago

They should've called it Vega: https://doom.fandom.com/wiki/VEGA

pohuing2mo ago

Perhaps, but consider the existence of the AMD Vega GPU line https://en.wikipedia.org/wiki/Radeon_RX_Vega_series

FridgeSeal2mo ago

Are we rapidly careening towards a world where _only_ AI “computing” is possible?

Wanted to do general purpose stuff? Too bad, we watched the price of everything up, and then started producing only chips designed to run “ai” workloads.

Oh you wanted a local machine? Too bad, we priced you out, but you can rent time with an ai!

Feels like another ratchet on the “war on general purpose computing” but from a rather different direction.

simulator5g2mo ago

The World's First Central Sloppressing Unit

_s_a_m_2mo ago

what a bizarre title

dude2507112mo ago

A GPU purpose-built for Slop.

jal5052mo ago

I think you're right - the Tauri vs Electron comparison isn't quite the same scale of difference.

Both still run web tech in a wrapper, just with different performance characteristics. The local-first vs cloud distinction is more fundamental, especially for tools that interact with platforms like LinkedIn.

When I built ZenMode, the core insight was that LinkedIn can easily detect automation coming from AWS/datacenter IPs, but when your desktop app uses your actual Chrome browser and home IP, it's indistinguishable from manual usage.

That's why we went with an Electron/Puppeteer architecture running locally rather than yet another cloud service. Check it out at https//zen-mode.io if you're curious about the local execution model.

urig2mo ago

What the heck is agentic inference and how is it supposed to be different from LLM inference? That's a rhetorical question. Screw marketing and screw hype.

BoredPositron2mo ago

Who wants general computing anyways?

KnuthIsGod2mo ago

China will beat this....

Seems like a triumph of hype over reality.

China can do breathless hype just as well as Nvidia.

j / k navigate · click thread line to collapse

101 comments

WhitneyLand2mo ago

Agentic AI CPU? No.

It’s a CPU designed for an AI cluster. Their last CPU Grace was the same thing and no one called it agentic.

Vera now just has more performance/more bandwidth. It’s cool, I’d like to have one of these clusters, but this is not new.

It’s marketed as agentic AI because that’s fashionable in 2026.

storus2mo ago

They significantly lowered latency compared to EPYC/Xeon, which is critical for streaming agents (e.g. text/audio/video agents).

stingraycharles2mo ago

What latency? How much is it compared to LLM inference speed?

1 more reply

PeterCorless2mo ago

https://www.redpanda.com/blog/nvidia-vera-cpu-performance-be...

jauntywundrkind2mo ago

I keep expecting we see fabric gains, see something where the host chip has a better way to talk to other host chips.

Maia 200: https://www.techpowerup.com/345639/microsoft-introduces-its-...

bob10292mo ago

> It might be 800Gbe but it's still so many hops, pcie is weighty.

babelfish2mo ago

Most of the big AI/HPC clusters these systems are aimed at aren’t running regular PCIe Ethernet between nodes, they’re usually wired up with InfiniBand fabrics (HDR/NDR now, XDR soon)

tryauuum2mo ago

Infiniband cards are connected to the rest of the machine with the PCIe as well

baal80spam2mo ago

Say what you want about NVIDIA (to me they are just doing what every company would do in their place), but they create engineering marvels.

1 more reply

d_silin2mo ago

It is a 88-core ARM v9 chip, for somewhat more detailed spec.

PeterCorless2mo ago

It's somewhat different from how x86 chips do simultaneous multithreading (SMT),

fulafel2mo ago

mixmastamyk2mo ago

Hmm, the 128-core Ampere Altra CPU is already available, and in a case from System76. I wonder what else differentiates it.

If they're going to build CPUs I wish they had used Risc-V instead. They are using it somewhat already.

OneDeuxTriSeiGo2mo ago

You can see here[1] what the specs are for the CPU (listed as "NVIDIA Vera Rubin Superchip").

The CPU is integrated with two Rubin GPUs but each of the CPU cores has dedicated FP8 acceleration as well.

1. https://www.nvidia.com/en-us/data-center/vera-rubin-nvl72/

ibgeek2mo ago

The Nvidia CPUs are designed for a very specific use case. They are designed for high performance with less concern about cost control.

gcanyon2mo ago

Anyone know how this compares to Apple’s M5 chips? Or is that comparison <takes off sunglasses> apples to oranges.

pdpi2mo ago

Features like hardware FP8 support definitely make it apples-to-oranges.

philjohn2mo ago

But doesn't the Apple M series NPU support FP8, and as it's a monolithic die (except for the GPU in the M5 Pro and Max) it could be argued it has hardware FP8 support, no?

3 more replies

pjmlp2mo ago

It doesn't matter, because you will never find M5 chips on cloud offerings, or server racks.

It is kind of rediculous that the only server option with Apple hardware has been to stack up mac minis.

They got rid of the server and workstation market, focusing on consumers only.

storus2mo ago

Grace GB10, Vera's predecessor, had a single core performance comparable to M3 so I guess we can expect at least M4 level performance now.

porphyra2mo ago

Isn't the GB10 a Mediatek chip and not directly related to the Grace datacenter CPU?

2 more replies

d_silin2mo ago

M5 are 9-18 cores and optimized for power-efficiency, those are more like Xeons, with 200-300W TDP, I'd bet.

kllrnohj2mo ago

If M5 has 9-18 cores and takes ~20w, then that's ~1-2w per CPU core. If these are 200-300W, and have ~100-200 CPU cores, then guess what? That's also ~1-2w per CPU core.

Xeons, Epycs, whatever this is - they are all also typically optimized for power efficiency. That's how they can fit so many CPU cores in 200-300W.

tencentshill2mo ago

alecco2mo ago

Even Apple hardware looks inexpensive compared to Nvidia's huge premium. And never mind the order backlog.

x86 and Apple already sell CPUs with integrated memory and high bandwidth interconnects. And I bet eventually Intel's beancounter board will wake up and allow engineering to make one, too.

But competition is good for the market.

bigyabai2mo ago

Even with those advantages, Apple can't even sell datacenter hardware to themselves: https://9to5mac.com/2026/03/02/some-apple-ai-servers-are-rep...

1 more reply

storus2mo ago

Apple went from a high-end PC to a low-end AI provider due to blocking Nvidia on their platform.

1 more reply

mikrl2mo ago

>are not competitive in the consumer space

AFAIK they still dominate on clock rate, which I was surprised to see when doing some back of the envelope calculations regarding core counts.

Not sure if I’m the typical consumer in this case however.

kllrnohj2mo ago

1 more reply

wmf2mo ago

A 9700X is twice the performance of a 9900K and M5 Max is almost 3X the performance. The megahertz myth is a myth.

1 more reply

RantyDave2mo ago

Ahhh, so is this a chip "more optimised" for connecting GPU's to reality ... or are they skipping the GPU step entirely? Are GPU's only for training now?

cyanydeez2mo ago

have you seen this: https://chatjimmy.ai/

It's quite impressive what purpose build inference can/will do once everyone stops trying to become kind of the best model.

redwood2mo ago

Wow impressive. What's the story with this?

2 more replies

dmitrygr2mo ago

> Purpose-Built for Agentic AI

From the "fridge purpose-built for storing only yellow tomatoes" and "car only built for people whose last name contains the letter W" series.

pdpi2mo ago

> It is as related to "Agentic $whatever" as your toaster is related to it

pezezin2mo ago

The huge interconnect would also useful be for HPC tasks. The FP8 not so much, HPC still loves FP64.

dmitrygr2mo ago

kibibu2mo ago

Would cloud gaming platforms benefit from the interconnect?

1 more reply

dpe822mo ago

The power and importance of marketing is deeply underappreciated by us technical types.

LogicFailsMe2mo ago

And yet more than a little Gavin Belson "Box III" vibes here. Fortunately, no signature edition.

dwb2mo ago

I don’t underappreciate it, but I do despise it.

pwg2mo ago

> It is a completely normal garden-variety ARM SoC

To mis-quote the politician quip:

How can you tell a marketer is lying?

Answer: His/her mouth is moving.

rka1282mo ago

"democratize access to AI and accelerating innovation."

So they make inference cheaper and the models get even worse. Or Jensen Huang has AI psychosis. Or both.

Here is a new business idea for Nvidia: Give me $3000 in a circular deal which I will then spend on a graphics card.

kwertyoowiyop2mo ago

Me too plz. To quote (more or less) Harvey Pekar: “I’m trying to sell out, but nobody’s buying!”

rishabhaiover2mo ago

Can someone explain what is Vera CPU doing that a traditional CPU doesn't?

kibibu2mo ago

> you're not running 10,000 agents concurrently or downstream tool calls

Cursor seem to be doing exactly that though

urig2mo ago

Lots and lots of CPUs pooled. Faster more efficient power RAM accessible to both GPU and CPU. IIUC.

rishabhaiover2mo ago

But at what stage are we asking for that RAM? if it's the inference stage then doesn't that belong to the GPU<>Memory which has nothing to do with the CPU?

I did see they have the unified CPU/GPU memory which may reduce the cost of host/kernel transactions especially now that we're probably lifting more and more memory with longer context tasks.

recvonline2mo ago

Does this mean their gaming GPUs are becoming less in demand, and therefore cheaper/more available again?

Teknoman1172mo ago

Absolutely not, unfortunately.

The problem is not that gaming GPUs are in demand, it’s that selling silicon to AI center buildouts is so absurdly profitable right now they just aren’t making many gaming GPUs.

If you can only get so many mm^2 of dies from TSMC, might as well make 50x selling to AI providers.

pjmlp2mo ago

Check the GTC 2026 agenda, there are hardly any graphics programming talks.

At least there are a few cool ones about programming CUDA directly in Python.

1 more reply

TheRoque2mo ago

It means it will be profitable to mine crypto again

wmf2mo ago

No.

yalogin2mo ago

This is yet not the grok acquisition, so there is another update coming with that claiming more improvements?

nilstycho2mo ago

https://developer.nvidia.com/blog/inside-nvidia-groq-3-lpx-t...

ksec2mo ago

The most interesting part is that Nvidia intend to sell this CPU separately, meaning you dont need to buy Nvidia GPU to use it.

Other than Hyperscaler ARM has yet to enter the server market and it might well be Nvidia that makes a different.

kibibu2mo ago

Am I crazy, or is Jensen's statement a copy-paste from ChatGPT?

(Could be both)

wmf2mo ago

If AI is so great why should he not use it?

magackame2mo ago

Should work on building the AI Jensen. Maybe it's already the AI Jensen

akomtu2mo ago

They should've called it Vega: https://doom.fandom.com/wiki/VEGA

pohuing2mo ago

Perhaps, but consider the existence of the AMD Vega GPU line https://en.wikipedia.org/wiki/Radeon_RX_Vega_series

FridgeSeal2mo ago

Are we rapidly careening towards a world where _only_ AI “computing” is possible?

Wanted to do general purpose stuff? Too bad, we watched the price of everything up, and then started producing only chips designed to run “ai” workloads.

Oh you wanted a local machine? Too bad, we priced you out, but you can rent time with an ai!

Feels like another ratchet on the “war on general purpose computing” but from a rather different direction.

simulator5g2mo ago

The World's First Central Sloppressing Unit

_s_a_m_2mo ago

what a bizarre title

dude2507112mo ago

A GPU purpose-built for Slop.

jal5052mo ago

I think you're right - the Tauri vs Electron comparison isn't quite the same scale of difference.

That's why we went with an Electron/Puppeteer architecture running locally rather than yet another cloud service. Check it out at https//zen-mode.io if you're curious about the local execution model.

urig2mo ago

What the heck is agentic inference and how is it supposed to be different from LLM inference? That's a rhetorical question. Screw marketing and screw hype.

BoredPositron2mo ago

Who wants general computing anyways?

KnuthIsGod2mo ago

China will beat this....

Seems like a triumph of hype over reality.

China can do breathless hype just as well as Nvidia.

j / k navigate · click thread line to collapse