I've chased elusive but very annoying stability problems (some, of course, due to overclocking during my younger years, when it still had a tangible payoff) often enough on systems I had built that taking this one BIG potential cause out of the equation is worth the few dozen extra bucks I have to spend on ECC-capable gear many times over.
Trying to validate an ECC-less platform's stability is surprisingly hard, because memtest and friends just aren't very reliable at detecting the more subtle problems. Prime95, y-cruncher and Linpack (in increasing order of effectiveness) are better than specialized memory-testing software in my experience, but they are not perfect, either.
Most AMD CPUs (but not their APUs with potent iGPUs - there, you will have to buy the "PRO" variants) these days have full support for ECC UDIMMs. If your mainboard vendor also plays ball - annoyingly, only a minority of them enables ECC support in their firmware, so always check for that before buying! - there's not much that can prevent you from having that stability enhancement and reassuring peace of mind.
Quoth DJB (around the very start of this millennium): https://cr.yp.to/hardware/ecc.html :)
This is the annoying part.
That AMD permits ECC is a truly fantastic situation, but whether it's supported by the motherboard is often a coin flip, and worse: it's not advertised even when it is available.
I have an ASUS PRIME TRX40 PRO, and the tech specs say that it can run ECC and non-ECC memory, but not whether ECC will be available to the operating system - merely that the DIMMs will work.
It's much more hit and miss in reality than it should be, though this motherboard was a pricey one: one can't use price as a proxy for features.
EDAC MC0: Giving out device to module amd64_edac
is a pretty reliable indication that ECC is working. See my blog post about it (it made the top of HN): https://sunshowers.io/posts/am5-ryzen-7000-ecc-ram/
I would expect your particular motherboard to operate with proper SECDED-or-better ECC if you have capable, compatible DIMMs, enable ECC mode in the firmware, and boot an OS kernel that can make sense of it all.
I am writing this message on such an ASUS MB with a Ryzen CPU and working ECC memory. You must check that your OS is recent enough to recognize your Threadripper CPU, and that you have installed any software packages required for this (e.g. on Linux, "edac-utils" or a similarly named package).
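For a quick sanity check on Linux, the EDAC counters under sysfs tell you whether the kernel is actually seeing (and counting errors from) an ECC-capable memory controller. A minimal sketch - the paths are the standard Linux EDAC sysfs layout, and the script just prints a verdict per controller:

```python
#!/usr/bin/env python3
"""Check whether the Linux EDAC subsystem sees an ECC memory controller.

If /sys/devices/system/edac/mc is absent, either ECC is not active or the
edac driver (e.g. amd64_edac) isn't loaded.
"""
from pathlib import Path

EDAC = Path("/sys/devices/system/edac/mc")

def summarize(mc_name: str, ce: int, ue: int) -> str:
    """Turn raw EDAC error counters into a one-line verdict."""
    verdict = "OK" if ue == 0 else "UNCORRECTED ERRORS -- replace this DIMM"
    return f"{mc_name}: corrected={ce} uncorrected={ue} [{verdict}]"

def main() -> None:
    if not EDAC.is_dir():
        print("No EDAC memory controllers found -- ECC likely not active.")
        return
    for mc in sorted(EDAC.glob("mc*")):
        name = (mc / "mc_name").read_text().strip()
        ce = int((mc / "ce_count").read_text())
        ue = int((mc / "ue_count").read_text())
        print(summarize(name, ce, ue))

if __name__ == "__main__":
    main()
```

A nonzero corrected count is not alarming by itself; it's actually evidence that ECC is doing its job.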
Some businesses (and governments) try to unify their purchasing, but this seems to make things worse, with the purchasing department both not understanding the technology and being outwitted by vendors.
I've been building my own gaming and productivity rigs for 20 years and I don't think memory has ever been a problem. Maybe survivorship bias, but surely even budget parts aren't THIS bad.
Also: DDR5 comes with some misleading ECC marketing, because the memory standard has an error correction scheme built in (on-die only). Don't fall for it.
A computer with 64 GB of memory is 4 times more likely to encounter memory errors than one with 16 GB of memory.
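That scaling is just linearity in the number of bits: for any fixed per-bit error rate, 4x the capacity means 4x the expected errors. A toy calculation (the FIT rate below is a placeholder, not a measured value):

```python
# Expected error count is linear in the number of bits: for any fixed
# per-bit rate, 4x the capacity means 4x the errors. The FIT rate below
# is a placeholder, not a measured value.

def expected_errors_per_year(capacity_gb: float, fit_per_mbit: float) -> float:
    """FIT = failures per 10^9 hours, here taken per Mbit of DRAM."""
    mbits = capacity_gb * 1024 * 8   # GB -> Mbit
    hours = 24 * 365
    return mbits * fit_per_mbit * hours / 1e9

FIT = 1.0  # hypothetical per-Mbit FIT rate, only the ratio matters here
print(expected_errors_per_year(64, FIT) / expected_errors_per_year(16, FIT))  # 4.0
```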
When DIMMs are new, at the usual amounts of memory for desktops, you will see at most a few errors per year, sometimes only a single error over several years. With old DIMMs, some modules will start to have frequent errors (such modules presumably had borderline fabrication quality and have now become worn out, e.g. due to increased leakage leading to a lower amount of charge stored on the memory cell capacitors).
For such bad DIMMs, the frequency of errors will increase, and it may reach several errors per day, or even per hour.
For me, a very important advantage of ECC has been the ability to detect such bad memory modules (in computers that have been used for 5 years or more) and replace them before corrupting any precious data.
I also had a case with an HP laptop with ECC, where memory errors had become frequent after it was stored for a long time (more than a year) in a rather humid place, which might have caused some oxidation of the SODIMM socket contacts - removing the SODIMMs, scrubbing the sockets and reinserting them made the errors disappear.
94 2025-08-26 01:49:40 +0200 error: Corrected error, no action required., CPU 2, bank Unified Memory Controller (bank=18), mcg mcgstatus=0, mci CECC, memory_channel=1,csrow=0, mcgcap=0x0000011c, status=0x9c2040000000011b, addr=0x36e701dc0, misc=0xd01a000101000000, walltime=0x68aea758, cpuid=0x00a50f00, bank=0x00000012
95 2025-09-01 09:41:50 +0200 error: Corrected error, no action required., CPU 2, bank Unified Memory Controller (bank=18), mcg mcgstatus=0, mci CECC, memory_channel=1,csrow=0, mcgcap=0x0000011c, status=0x9c2040000000011b, addr=0x36e701dc0, misc=0xd01a000101000000, walltime=0x68b80667, cpuid=0x00a50f00, bank=0x00000012
(this is `sudo ras-mc-ctl --errors` output.) It's always the same address, and always a Corrected Error (obviously, otherwise my kernel would panic). However, operating my system's memory at this clock and latency boosts x265 encoding performance (just one of the benchmarks I picked when trying to figure out how to handle this particular tradeoff) by about 12%. That is an improvement I am willing to stomach the extra risk of effectively overclocking the memory module beyond its comfort zone for, given that I can fully mitigate it by virtue of properly working ECC.
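Since the telltale sign of a weak cell is one physical address recurring in the corrected-error log, a few lines of scripting can separate "one flaky bit" from "random scattered upsets". A sketch that pulls the `addr=` field out of `ras-mc-ctl --errors`-style records (the sample lines are abbreviated versions of the output above):

```python
import re
from collections import Counter

def repeated_addresses(log_lines):
    """Count corrected-error records per physical address. One address
    recurring is the classic signature of a single weak cell, as opposed
    to random (e.g. cosmic-ray) upsets scattered across memory."""
    addrs = Counter()
    for line in log_lines:
        m = re.search(r"addr=(0x[0-9a-fA-F]+)", line)
        if m and "Corrected error" in line:
            addrs[m.group(1)] += 1
    return addrs

# Abbreviated records in the shape of the ras-mc-ctl output above.
sample = [
    "94 ... error: Corrected error, no action required., ... addr=0x36e701dc0, ...",
    "95 ... error: Corrected error, no action required., ... addr=0x36e701dc0, ...",
]
print(repeated_addresses(sample).most_common(1))  # [('0x36e701dc0', 2)]
```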
This was running at like, 1866 or something. It's a pretty barebones 8th gen i3 with a beefier chipset, but ECC still came in clutch. I won't buy hardware for server purposes without it.
Edit: it's probably because I switched it to "energy efficiency mode" instead of "performance mode" because it would occasionally lock up in performance mode. Presumably with the same root cause.
Last winter I was helping someone put together a new gaming machine... it was so frustrating running into the fake ECC marketing for DDR5 that you mention. The motherboard situation - whether a board supports it or not, or whether a BIOS update added support, then removed it, then added it back - was also really sad. And even worse, IMO, is that you can't actually max out 4 slots on the top-tier mobos unless you're willing to accept a huge drop in RAM speed. That leads to ugly 48 GB sticks and limiting yourself to two of them... In the end we didn't go with ECC for that someone, but I was pretty disappointed about it. I'm hoping the next gen will be better; for my own setup running ZFS and such I'm not going to give up ECC.
Some vendors use Hamming codes with “holes” in them, and you need the CPU to also run ECC (or at least error detection) between RAM and the cache hierarchy.
Those things are optional in the spec, because we can’t have nice things.
I wish AMD would make ECC a properly advertised feature with clear motherboard support. At least DDR5 has some level of ECC.
That is mostly to assist manufacturers in selling marginal chips with a few bad bits scattered around. It's really a step backwards in reliability.
Does anyone maintain a list with de-facto ECC support of AMD chips and mainboards? That part-list site only shows official support IIRC, so it won't give you any results.
However, in the past there existed a few CPU and motherboard models that supported either kind of DIMM; today this has become completely impossible, as the mechanical and electrical differences between the types have increased.
In any case, today, like also 20 years ago, when searching for ECC DIMMs you must always search only the correct type, e.g. unbuffered ECC DIMMs for desktop CPUs.
In general, registered ECC DIMMs are easier to find, because wherever "server memory" is advertised, that is what is meant. For desktop ECC memory, you must be careful to see both "ECC" and "unbuffered" mentioned in the module description.
In my experience, it's generally unwise to push the platform you're on to the outermost of its spec'd limits. At work, we bought several 5950X-based Zen3 workstations with 128GB of 3200MT/s ECC UDIMM, and two of these boxes will only ever POST when you manually downclock memory to 3000MT/s. Past a certain point, it's silicon lottery deciding if you can make reality live up to the datasheets' promises.
edit: Looks like a lot of Asus motherboards work, and the thing to look for is "unbuffered" ECC. Kingston has some, I see 32GB module for $190 on Newegg.
I have followed his blog for years and hold him in high respect, so I am surprised he did that and expected stability at 100C, regardless of what Intel claims is okay.
Not to mention that you rapidly hit diminishing returns past 200W with current-gen Intel CPUs, although he mentions caring about idle power usage. Why go from 150W to 300W for a 20% performance increase?
Given the motherboard and RAM will also generate quite some heat, if the case fan profile was conservative (he does mention he likes low noise), could be the insides got quite toasty.
Back when I got my 2080 Ti, I had this issue when gaming. The internal temps would get so hot due to the blanket effect of the padding I couldn't touch the components after a gaming session. Had to significantly tweak my fan profiles. His CPU at peak would generate about the same amount of heat as my 2080 Ti + CPU I had then, and I had the non-Compact case with two case fans.
[1]: https://michael.stapelberg.ch/posts/2025-05-15-my-2025-high-...
I also have a Fractal Define case with anti-noise padding material and dust filters, but my temperatures are great and the computer is almost inaudible, even when compiling code for hours with -j $(nproc). And my fans and cooler are much cheaper than his.
That should of course be sound padding...
Intel specifies a max operating temperature of 105°C for the 285K [1]. Also modern CPUs aren't supposed to die when run with inadequate cooling, but instead clock down to stay within their thermal envelope.
[1]: https://www.intel.com/content/www/us/en/products/sku/241060/...
Because CPUs can get much hotter in specific spots at specific pins, no? Just because you're reading 100 doesn't mean there aren't spots that are way hotter.
My understanding is that modern Intel CPUs have a temp sensor per core + one at package level, but which one is being reported?
Anyway, OP's cooler should be able to cool down 250W CPUs below 100C. He must have done something wrong for this to not happen. That's my point -- the motherboard likely overclocked the CPU and he failed to properly cool it down or set a power limit (PL1/PL2). He could have easily avoided all this trouble.
And yeah, having Arrow Lake running at its defaults is just a waste of energy. Even halving your TDP just loses you roughly 15% performance in highly MT scenarios...
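As rough perf-per-watt arithmetic on that trade-off (the 250W default and the 15% loss are illustrative figures, not measurements):

```python
# Perf-per-watt for "halve the power limit, keep ~85% of MT performance".
# The 250 W default and the 15% loss are illustrative numbers.

def perf_per_watt(perf: float, watts: float) -> float:
    return perf / watts

stock = perf_per_watt(1.00, 250)
capped = perf_per_watt(0.85, 125)
print(round(capped / stock, 2))  # 1.7 -> ~70% more work per joule
```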
I did not overclock this CPU. I pay attention to what I change in the BIOS/UEFI firmware, and I never select any overclocking options.
Also, I have applied thermal paste properly: Noctua-supplied paste, following Noctua’s instructions for this CPU socket.
https://www.techpowerup.com/review/intel-core-ultra-9-285k/2... lists maximum temperature as 88.2C with the previous gen NH-D15 cooler.
When you do not have a bunch of components ready to swap out it is also really hard to debug these issues. Sometimes it’s something completely different like the PSU. After the last issues, I decided to buy a prebuilt (ThinkStation) with on-site service. The cooling is a bit worse, etc., but if issues come up, I don’t have to spend a lot of time debugging them.
Random other comment: when comparing CPUs, a sad observation was that even a passively cooled M4 is faster than a lot of desktop CPUs (typically single-threaded, sometimes also multi-threaded).
And if we are talking about a passively cooled M4 (MacBook Air basically) it will quite heavily throttle relatively quickly, you lose at the very least 30%.
So, let's not misrepresent things: Apple CPUs are very power efficient, but they are not magic - if you hit them hard, they still need good cooling. Plenty of people have had that experience with their M4 Max, discovering that if they actually use the laptop as a workstation, it will generate a good amount of fan noise; there's no way around it.
Apple stuff is good because most people actually have bursty workload (especially graphic design, video editing and some audio stuff) but if you hammer it for hours on end, it's not that good and the power efficiency point becomes a bit moot.
I think a lot of it boils down to load profile and power delivery. My 2500VA double conversion UPS seems to have difficulty keeping up with the volatility in load when running that console app. I can tell because its fans ramp up and my lights on the same circuit begin to flicker very perceptibly. It also creates audible PWM noise in the PC which is crazy to me because up til recently I've only ever heard that from a heavily loaded GPU.
For a long time, my Achilles' heel was my bride's vacuum. Her Dyson pulled enough amps that the UPS would start singing and trigger the auto-shutdown sequence for the half rack. Took way too long to figure out, as I was usually not around when she did it.
You said the right words but with the wrong meaning! On Gigabyte mobo you want to increase the "CPU Vcore Loadline Calibration" and the "PWM Phase Control" settings, [see screenshot here](https://forum.level1techs.com/t/ddr4-ram-load-line-calibrati...).
When I first got my Ryzen 3900X cpu and X570 mobo in 2019, I had many issues for a long time (freezes at idle, not waking from sleep, bios loops, etc). Eventually I found that bumping up those settings to ~High (maybe even Extreme) was what was required, and things worked for 2 years or so until I got a 5950X on clearance last year.
I slotted that in to the same mobo and it worked fine, but when I was looking at HWMon etc, I noticed some strange things with the power/voltage. After some mucking about and theorising with ChatGPT (it's way quicker than googling for uncommon problems), it became apparent that the ~High LLC/power settings I was still using were no good. ChatGPT explained that my 3900X was probably a bit "crude" in relative quality, and so it needed the "stronger" power settings to keep itself in order. Then when I've swapped to 5950X, it happens to be more "refined" and thus doesn't need to be "manhandled" — and in fact, didn't like being manhandled at all!
But if your UPS (or just the electrical outlet you're plugged into) can't cope - dunno if I'd describe that as cratering your CPU.
Yea, but unfortunately it comes attached to a Mac.
An issue I've encountered often with motherboards is that they have brain-damaged default settings that run CPUs out of spec. You really have to go through it all with a fine-toothed comb and make sure everything is set to conservative, stock, manufacturer-recommended settings. And my stupid MSI board resets everything (every single BIOS setting) to MSI defaults when you upgrade its BIOS.
It looks completely bonkers to me. I overclocked my system to ~95% of what it is able to do with almost default voltages, using bumps of 1-3% over stock, which (AFAIK) is within acceptable tolerances, but it requires hours and hours of tinkering and stability testing.
Most users just set automatic overclocking, have their motherboards push voltages to insane levels, and then act surprised when their CPUs start bugging out within a couple of years.
Shocking!
Yeah. If Asahi worked on newer Macs and Apple Silicon Macs supported eGPU (yes I know, big ifs), the choice would be simple. I had NixOS on my Mac Studio M1 Ultra for a while and it was pretty glorious.
I had the same issue with my MSI board; the next one won't be an MSI.
My modern CPU problems are DDR5 and the pre-boot memory training never completing. So a 9700X build that WAS supposed to be located remotely from me has to sit in my office and have its hand held through every reboot, because you never quite know when it's going to decide it needs to retrain and randomly never come back. That requires pulling the plug from the back, waiting a few minutes, powering back on, then waiting 30 minutes for 64 GB of DDR5 to do its training thing.
My system would randomly freeze for ~5 seconds, usually while gaming with a video running in the browser at the same time. Then it started happening reliably in Titanfall 2, and I noticed there were always AHCI errors in the Windows logs at the same time, so I switched to an NVMe drive.
The system would also shut down occasionally (~ once every few hours) in certain games only. Then, I managed to reproduce it 100% of the time by casting lightning magic in Oblivion Remastered. I had to switch out my PSU, the old one probably couldn't handle some transient load spike, even though it was a Seasonic Prime Ultra Titanium.
I have an M1 Max, a few revisions old, and the only thing I can do to spin up the fans is run local LLMs or play Minecraft with the kids on a giant ultra wide monitor at full frame rate. Giant Rust builds and similar will barely turn on the fan. Normal stuff like browsing and using apps doesn’t even get it warm.
I’ve read people here and there arguing that instruction sets don’t matter, that it’s all the same past the decoder anyway. I don’t buy it. The superior energy efficiency of ARM chips is so obvious I find it impossible to believe it’s not due to the ISA since not much else is that different and now they’re often made on the same TSMC fabs.
This isn't really true. On the same process node the difference is negligible. It's just that Intel's process in particular has efficiency problems and Apple buys out the early capacity for TSMC's new process nodes. Then when you compare e.g. the first chips to use 3nm to existing chips which are still using 4 or 5nm, the newer process has somewhat better efficiency. But even then the difference isn't very large.
And the processors made on the same node often make for inconvenient comparisons, e.g. the M4 uses TSMC N3E but the only x86 processor currently using that is Epyc. And then you're obviously not comparing like with like, but as a ballpark estimate, the M4 Pro has a TDP of ~3.2W/core whereas Epyc 9845 is ~2.4W/core. The M4 can mitigate this by having somewhat better performance per core but this is nothing like an unambiguous victory for Apple; it's basically a tie.
> I have an M1 Max, a few revisions old, and the only thing I can do to spin up the fans is run local LLMs or play Minecraft with the kids on a giant ultra wide monitor at full frame rate. Giant Rust builds and similar will barely turn on the fan. Normal stuff like browsing and using apps doesn’t even get it warm.
One of the reasons for this is that Apple has always been willing to run components right up to their temperature spec before turning on the fan. And then even though that's technically in spec, it's right on the line, which is bad for longevity.
In consumer devices it usually doesn't matter because most people rarely put any real load on their machines anyway, but it's something to be aware of if you actually intend to, e.g. there used to be a Mac Mini Server product and then people would put significant load on them and then they would eat the internal hard drives because the fan controller was tuned for acoustics over operating temperature.
This anecdote perfectly describes my few-generations-old Intel laptop too. The fans turn on maybe once a month. I don't think it's as power efficient as an M-series Apple CPU, but total system power is definitely under 10W during normal usage (including screen, wifi, etc).
One of the many reasons the Snapdragon Windows laptops failed was that both AMD and Intel (Lunar Lake) were able to reach the claimed efficiency of those chips. I still think modern x86 can match ARM in efficiency if someone bothered to tune the OS and scheduler for the most common activities. The M series was based on Apple's phone chips, which were designed from the ground up to run on a battery all these years. AMD and Intel just don't see an incentive to do that, nor does Microsoft.
What metric ought I to use when buying a CPU these days? Should I care about reviews? I'm fine with a mid-range CPU, for what it's worth, and I thought of the AMD Ryzen 7 5700 or the Ryzen 5 5600GT, or anything with a similar price tag. They might even be lower-end by now?
Intel is just bad at the moment and not even worth touching.
Definitely not that one if you plan to pair it with a dedicated GPU! The 5700X has twice the L3 cache. All Ryzen 5000 parts with an iGPU have only 16MB; the 5700 just has the iGPU deactivated.
I also have this issue.
A common approach is to go into the BIOS/UEFI settings and check that c6 is disabled. To verify and/or temporarily turn c6 off, see https://github.com/r4m0n/ZenStates-Linux
I have always run B series because I've never needed the overclocking or additional peripherals. In my server builds I usually disable peripherals in the UEFI like Bluetooth and audio as well.
Twice the memory bandwidth, twice the CPU core count... It's really wacky how they've decided to name things
It is cheaper and more stable, and the performance difference doesn't matter that much anyway.
On desktop PCs, thermal throttling is often set up as "just a safety feature" to this very day. Which means: the system does NOT expect to stay at the edge of its thermal limit. I would not trust thermal throttling with keeping a system running safely at a continuous 100C on die.
100C is already a "danger zone", with elevated error rates and faster circuit degradation - and a die only has so many thermal sensors. Some under-sensored hotspots may be running a few degrees higher than that. Which may not be enough to kill the die outright - but more than enough to put those hotspots into a "fuck around" zone of increased instability and massively accelerated degradation.
If you're relying on thermal throttling to balance your system's performance, as laptops and smartphones often do, then you seriously need to dial in better temperature thresholds. 100C is way too spicy.
If nothing else, it very clearly indicates that you can boost your performance significantly by sorting out your cooling because your cpu will be stuck permanently emergency throttling.
Smartphones have no active cooling and are fully dependent on thermal throttling for survival, but they can start throttling at as low as 50C easily. Laptops with underspecced cooling systems generally try their best to avoid crossing into triple digits - a lot of them max out at 85C to 95C, even under extreme loads.
I had an 8th-gen i7 sitting at the thermal limit (~100C) in a laptop for half a decade 24/7 with no problem. As sibling comments have noted, modern CPUs are designed to run "flat-out against the governor".
Voltage-dependent electromigration is the biggest problem and what led to the failures in Intel CPUs not long ago, perhaps ironically caused by cooling that was "too good" --- the CPU finds that there's still plenty of thermal headroom, so it boosts frequency and the accompanying voltage to reach the limit, and goes too far with the voltage. If it had hit the thermal limit it would've backed off on the voltage and frequency.
No. High performance gaming laptops will routinely do this for hours on end for years.
If it can't take it, it shouldn't allow it.
Intel's basic 285K spec's - https://www.intel.com/content/www/us/en/products/sku/241060/... - say "Max Operating Temperature 105 °C".
So, yes - running the CPU that close to its maximum is really not asking for stability, nor longevity.
No reason to doubt your assertion about gaming laptops - but chip binning is a thing, and the manufacturers of those laptops have every reason to pay Intel a premium for CPUs which test to better values of X, Y, and Z.
I've never overclocked anything and I've never felt I've missed out in any way. I really can't imagine spending even one minute trying to squeeze 5% or whatnot tweaking voltages and dealing with plumbing and roaring fans. I want to use the machine, not hotrod it.
I would rather Intel et al. leave a few percent "on the table" and sell things that work, for years on end without failure and without a lot of care and feeding. Lately it looks like a crapshoot trying to identify components that don't kill themselves.
- cheap ULV chips like N100, N150, N300
- ultrabook ULV chips (I hope Lunar Lake is not a fluke)
- workstation chips that aren't too powerful (mainstream Core CPUs)
- inexpensive GPUs (a surprising niche, but excruciatingly small)
AMD has been dominating them in all other submarkets. Without a mainstream halo product Intel has been forced to compete on price, which is not something they can afford. They have to make a product that leapfrogs either AMD or Nvidia and successfully (and meaningfully) iterate on it. The last time they tried something like that was in 2021 with the launch of Alder Lake, but AMD overtook them with 3D V-Cache in 2022.
But I just can't bring myself to upgrade this year. I dabble in local AI, where it's clear fast memory is important, but the PC approach is just not keeping up without going to "workstation" or "server" parts that cost too much.
There are glimmers of hope with MR-DIMMs, CU-DIMMs, and other approaches, but really boards and CPUs need to support more memory channels. Intel has a small advantage over AMD, but it's nothing compared to the memory speed of a Mac Pro or higher. "Strix Halo" offers some hope with four-memory-channel support, but it's meant for notebooks so isn't really expandable (which would enable à la carte hybrid AI: fast GPUs with reasonably fast shared system RAM).
I wish I could fast forward to a better time, but it's likely fully integrated systems will dominate if the size and relatively weak performance for some tasks makes the parts industry pointless. It is a glaring deficiency in the x86 parts concept and will result in PC parts being more and more niche, exotic and inaccessible.
That being said, for AI, HEDT is the obvious answer. Back in the day, it was much more affordable with my 9980XE only costing $2,000.
I just built a Threadripper 9980 system with 192GB of RAM and good lord it was expensive. I will actually benefit from it though and the company paid for it.
That being said, there is a glaring gap between "consumer" hardware meant for gaming and "workstation" hardware meant for real performance.
Have you looked into a 9960 Threadripper build? The CPU isn't TOO expensive, although the memory will be. But you'll get a significantly faster and better machine than something like a 9950X.
I also think besides the new Threadripper chips, there isn't much new out this year anyways to warrant upgrading.
Competitors to NVidia really need to figure things out, even for gaming with AI being used more I think a high end APU would be compelling with fast shared memory.
It seems like large, unchallenged organizations like Intel (or NASA or Google) collect all the top talent out of school. But changing budgets, changing business objectives, frozen product strategies make it difficult for emerging talent to really work on next-generation technology (those projects have already been assigned to mid-career people who "paid their dues").
Then someone like Apple Silicon with M-chip or SpaceX with Falcon-9 comes along and poaches the people most likely to work "hardcore" (not optimizing for work/life balance) while also giving the new product a high degree of risk tolerance and autonomy. Within a few years, the smaller upstart organization has opened up in un-closeable performance gap with behemoth incumbent.
Has anyone written about this pattern (beyond Innovator's Dilemma)? Does anyone have other good examples of this?
I gather it's very difficult and expensive to make a board that supports more channels of RAM, so that seems worth targeting at the platform level. Eight channel RAM using common RAM DIMMs would transform PCs for many tasks, however for now gamers are a main force and they don't really care about memory speed.
How do you sell your systems when their time comes?
I use Arch, btw ;)
https://www.theregister.com/2025/08/29/amd_ryzen_twice_fails...
Sufficient cooler, with sufficient airflow is always needed.
The 13900k draws more than 200W initially and thermal throttles after a minute at most, even in an air conditioned room.
I don't think that thermal problems should be pushed to end user to this degree.
So if your CPU is drawing "more than 200W" you're pretty much at the limits of your cooler.
But I agree this should not be a problem in the first place.
For both the cooler and the motherboard, AMD have too much control to look the other way. The chip can measure its own temperature and the conceit of undermining partners by moving things on chip and controlling more of the ecosystem is that things perform better. They should at least perform.
I also find that, as tolerances get tighter throughout the system in pursuit of performance improvements, the set of 'things that can screw up your build' grows bigger.
The problem is, it's a huge effort to get there. You really have to tune PBO curves for each core individually, as they can vary so much between cores.
Now the test itself is mostly automatic with tools like OCCT, but of course you have to change the settings in the BIOS between each test and you cannot use the computer during that time, so there's a huge opportunity cost. I'm talking about weeks, not days.
To cut a long story short, I sold the system and just bought a M4 Max Mac Studio now. Apple Silicon might not have the top performance of AMD or Intel, but it comes with much less headaches and opportunity cost. Which in the end probably equalizes the difference in purchase cost.
If anyone thinks competition isn't good for the market or that also-rans don't have enough of an effect, just take note. Intel is a cautionary tale. I do agree we would have gotten where we are faster with more viable competitors.
M4 is neat. I won't be shocked if x86 finally gives up the ghost as Intel decides playing in the RISC-V or ARM space is their only hope to get back into an up-cycle. AMD has wanted to do heterogeneous stuff for years. RISC-V might be the way.
One thing I'm finding is that compilers are actually leaving a ton on the table for AMD chips, so I think this is an area where AMD and all of the users, from SMEs on down, can benefit tremendously from cooperatively financing the necessary software to make it happen.
Secondly, what BIOS settings should I be using to run safely? Is XMP (or EXPO, the AMD equivalent) safe? If I don't run XMP, my RAM runs at default speeds way below the stick's spec.
Anyone know of a good guide for this stuff?
Maybe the situation is better on DDR5 platforms.
Yet I also use a 7840U in a gaming handheld running Windows, and haven't had any issues there at all. So I think this is related to AMD Linux drivers and/or Wayland. In contrast, my old laptop with an NVIDIA GPU and Xorg has given me zero issues for about a decade now.
So I've decided to just avoid AMD on Linux on my next machine. Intel's upcoming Panther Lake and Nova Lake CPUs seem promising, and their integrated graphics have consistently been improving. I don't think AMD's dominance will continue for much longer.
Make sure it matches the min of the actual spec of the ram that you bought and what the CPU can do.
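In other words, the platform trains to the lowest common denominator among the stick, the CPU, and the board. A trivial sketch of that rule (the numbers are made up):

```python
def effective_speed(dimm_spec_mts: int, cpu_max_mts: int, board_max_mts: int) -> int:
    """The platform trains to the lowest of the three limits; enabling
    XMP/EXPO beyond the CPU's rated speed is, strictly speaking, memory
    overclocking."""
    return min(dimm_spec_mts, cpu_max_mts, board_max_mts)

# Made-up numbers: a 6000 MT/s kit on a CPU rated for 5600 MT/s.
print(effective_speed(6000, 5600, 6400))  # 5600
```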
I used to get crashes like you are describing on a similar machine. The crashes are in the GPU firmware, making debugging a bit of a crap shoot. If you can run windows with the crashing workload on it, you’ll probably find it crashes the same ways as Linux.
For me, it was a bios bug that underclocked the ram. Memory tests, etc passed.
I suspect there are hard performance deadlines in the GPU stack, and the underclocked memory was causing it to miss them, and assume a hang.
If the ram frequency looks OK, check all the hardware configuration knobs you can think of. Something probably auto-detected wrong.
Don't know about transcoding though.
Threadripper is built for this. But I am talking about the consumer options if you are on a budget. Intel has significantly more memory bandwidth than AMD in the consumer end. I don't have the numbers on hand, but someone at /r/localllama did a comparison a while ago.
> After switching my PC from Intel to AMD, I end up at 10-11 kWh per day.
It's kind of impressive to increase household electricity consumption by 10% by just switching one CPU.
For a time I ran it 24/7 without suspend. It's a big system: lots of disks, expansion cards, etc. If it doesn't suspend and isn't doing anything remarkable, it uses about 5 kWh per day. Needless to say, it suspends after 60 minutes now (my daily energy usage went from ~9 to ~4 kWh).
[1]: https://en.wikipedia.org/wiki/European_countries_by_electric...
I had differences of like 20 or more between different cores... i.e. one core might work fine at -20, the other maybe only at +5.
And while all-core CO might not be optimal, based on personal experience and what I've seen across multiple enthusiast communities, more often than not you can get a worthwhile improvement in temps/perf with an all-core CO.
That being said, there are certainly ways to find and set the best CO values per core, but it will certainly take more effort, stress testing and time.
Pass -fuse-ld=mold to the compiler driver when building.
I recently hit this testing pre-release kernels on my gaming PC, a 9900X3D: https://lore.kernel.org/lkml/20250623083408.jTiJiC6_@linutro...
A pile of older Skylake machines was never able to reproduce that bug one single time in 100+ hours of running the same workload. The fast new AMD chips would almost always hit it in a few hours.
> I get the general impression that the AMD CPU has higher power consumption in all regards: the baseline is higher, the spikes are higher (peak consumption) and it spikes more often / for longer.
> Looking at my energy meter statistics, I usually ended up at about 9.x kWh per day for a two-person household, cooking with induction.
> After switching my PC from Intel to AMD, I end up at 10-11 kWh per day.
It's been the bane of desktop AMD CPUs since Zen 1. Hopefully AMD will address this in Zen 6 but I don't have too much hope.
Zen APUs have no such issue.
My 7840HS idles at 3W when plugged in and around 0.5W when running on battery power.
https://www.reddit.com/r/Amd/comments/1brs42g/amd_please_tac...
I don't bloody care that AMD CPUs seem to be more power efficient than Intel's. For most people their CPUs are completely idle most of the time and Zen CPUs on average idle at 25W or MORE.
Many Zen 4 and Zen 5 owners report that their desktop CPUs idle at 40W or more even without the 3D cache.
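The idle-power figures quoted in these comments are easy to sanity-check, since a constant draw in watts maps directly to kWh per day. A quick back-of-the-envelope in Python (the wattages come from the comments above, not from measurement):

```python
def daily_kwh(watts: float, hours: float = 24.0) -> float:
    """Energy in kWh for a device drawing `watts` continuously for `hours`."""
    return watts * hours / 1000.0

# A 40 W idle draw, running around the clock:
print(daily_kwh(40))        # 0.96 kWh/day -- close to the ~1 kWh/day jump reported above

# Conversely, a ~1 kWh/day increase implies roughly 42 W of extra average draw:
print(1.0 / 24.0 * 1000.0)  # ~41.7 W
```

So a desktop that idles 15 W higher and never suspends is enough on its own to show up visibly on a household energy meter.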
A big surprise for me, having owned both a Ryzen gen 1 & 3 previously, was that this time my system posted without me needing to flash my BIOS or play around with various RAM configurations. Felt like magic.
An ideal ambient (room) temperature for running a computer is 15-25 Celsius (60-77 Fahrenheit).
Source: https://www.techtarget.com/searchdatacenter/definition/ambie...
It is actually 2.9999, precisely.
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +40.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +38.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +39.0°C (high = +80.0°C, crit = +100.0°C)
Are they saying this is bad? This Intel CPU has been at it for over a decade. There was a fan issue for half a year, during which it would go up to 80 C. It still works perfectly fine, but it is outdated: it lacks instruction sets that I need, and it has only two cores with one thread each. Maybe today's CPUs would not be able to handle that, I am not sure. One would expect these things to only improve, but it seems that is not the case.
Edit: I misread it, oops! Disregard this comment.
I'd say that even crashing at max temperatures is still completely unreasonable! You should be able to run at 100C, or whatever the max temperature is, for a week non-stop if you damn well please. If you can't, then the value has been chosen wrong by the manufacturer. If the CPU can't handle that, the clock rates should just be dialed back accordingly to maintain stability.
It's odd to hear about Core Ultra CPUs failing like that, though - I thought that they were supposed to be more power efficient than the 13th and 14th gen, all while not having their stability issues.
That said, I currently have a Ryzen 7 5800X, OCed with PBO to hit 5 GHz with negative CO offsets per core set. There's also an AIO with two fans and the side panel is off because the case I have is horrible. While gaming the temps usually don't reach past like 82C but Prime95 or anything else that's computationally intensive can make the CPU hit and flatten out at 90C. So odd to have modern desktop class CPUs still bump into thermal limits like that. That's with a pretty decent ambient temperature between 21C to 26C (summer).
Chips are happy to run at high temperatures, that's not an issue. It's just a tradeoff of expense and performance.
Servers and running things at scale are way different from consumer use cases and the cooling solutions you'll find in the typical desktop tower, esp. considering the average budget and tolerance for noise. Regardless, on a desktop chip, even if you hit tJMax, it shouldn't lead to instability as in the post above, nor should the chips fail.
If they do, then that value was chosen wrong by the manufacturer. The chips should also be clocking back to maintain safe operating temps. Essentially, squeeze out whatever performance is available with a given cooling solution: be it passive (I have some low TDP AM4 chips with passive Alpine radiator blocks), air coolers or AIOs or a custom liquid loop.
> What Intel is doing and what they are recommending is the act of a desperate corporation incapable of designing energy-efficient CPUs, incapable of progressing their performance in MIPS per Watt of power.
I don't disagree with this entirely, but the story is increasingly similar with AMD as well - most consumer chip manufacturers are pushing the chips harder and harder out of the factory, so they can compete on benchmarks. That's why you hear about people limiting the power envelope to 80-90% of stock and dropping close to 10 degrees C in temperatures, similarly you hear about the difficulties of pushing chips all that far past stock in overclocking, because they're already pushed harder than the prior generations.
To sum up: Intel should be less delusional in how far they can push the silicon, take the L and compete against AMD on the pricing, instead of charging an arm and a leg for chips that will burn up. What they were doing with the Arc GPUs compared to the competitors was actually a step in the right direction.
TSMC (AMD's fab) is based in Taiwan, which has its own implications for long-term sustainability and monopoly.
With only two real choices for x86, and the complexity of the global supply chain, it hardly seems like a fair comparison.
I got an i5-13600KF from Amazon last Black Friday (a roughly two-week haul to Hong Kong), initially with a budget motherboard I thought would be fine. It turned out the system would keep shutting off and rebooting with a huge drop in voltage (it was about 10 months later that I learned this is a brownout).
It was for my company computer, but I bought it personally, so it's still mine. I then bought a new SF750 PSU at home and swapped in a 13100 salvaged from a computer someone donated, so the 13600KF could become my personal gaming rig.
I made sure it got a platform that sustains enough power with appropriate thermal headroom, and it was all fine until 6 months ago, when it started to BSOD all over the place: while gaming, programming, or even just resuming from suspend. I had to request refunds for two games because of this; one was accepted and the other wasn't. I also moved development to a cloud machine, because a BSOD in the middle of debugging is really nasty.
So I decided to say "fuck it, I'm going back to AMD". I was actually still using my 3700X rig until a year ago, but I figured the 5-year-old system was becoming an old dog: it just can't run most modern games at even 80 FPS. So I had swapped to the 13600KF as an intermediate replacement, until it glitched out and left me needing another replacement again.
Coincidentally, I had bought a 7945HX engineering-sample ITX motherboard, originally intended for running a Kubernetes homelab (now that I think about it, a big waste of money indeed, yikes). Then I had a eureka moment: why not just use that 7945HX plus the 96GB of DDR5 I had already bought?
So after a painful assemble-reassemble process, I'm back on AMD once again. It has been almost perfect: it scores almost exactly the same as a 5950X, but at only around 100W total package TDP, with almost double the CPU cache. And since it isn't the Zen 5/Zen 5c design that complicates CPU scheduling, I've been able to solve the gaming-versus-productivity dilemma in one machine. The MoDT motherboard itself was just shy of ~1800 HKD in total, which is less than the 5950X CPU alone, and I have huge TDP headroom for the 9070XT I also purchased in June. Almost completely silent with Noctua coolers, too.
The original 13600KF went back to my company with a new 800W PSU, a new case specifically bought to fit the wood aesthetic, and another AMD GPU I salvaged from my NUC (a single-fan 6600XT Challenger). This time it runs surprisingly fine: no kernel panics or PSU brownouts just yet.
After all this in a short span of 10 months, I guess I just reached my own "metastability" now -- Intel CPU for office work, AMD for gaming and workstation.
The old 3700X system is being repurposed again as a cheap Kubernetes homelab, and I guess this time it has found the right place. I don't think I'll need a new purchase for the coming few years, hopefully.
The only problem is that I'm using an engineering sample rather than the retail version of the 7945HX: the retail one can boost up to 5.4GHz while mine only reaches 5.2GHz. For a 600 HKD difference, I would say upgrading to the retail version is not worth it, no?
Besides AMD CPUs of the early 2000s going up in smoke without working cooling, CPUs all throttle before they become temporarily or permanently unstable. Otherwise they are defective.
I've never had a desktop part fail due to max temperatures, but I don't think I've owned one that advertises, or allows itself, to reach or remain at 100°C or higher.
If someone sells a CPU that's specified to work at 100 or 110 degrees and it doesn't then it's either defective or fraudulent, no excuses.
Max Operating Temperature: 105 °C
14900k: https://www.intel.com/content/www/us/en/products/sku/236773/...
Max Operating Temperature: 100 °C
Different CPUs, different specs.
And any CPU from the last decade will just throttle down if it gets too hot. That's how the entire "Turbo" thing works: go as fast as we can until it gets too hot, after which it throttles down.
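That feedback loop can be sketched as a toy simulation. All numbers here are made up for illustration; real CPUs implement this in firmware with far more inputs (power, current, per-core temperatures, boost tables):

```python
def turbo_step(freq_mhz: float, temp_c: float,
               t_max: float = 100.0, f_base: float = 3000.0,
               f_max: float = 5000.0) -> float:
    """One step of a toy turbo governor: creep the clock up while there is
    thermal headroom, back off toward base clock once the limit is hit."""
    if temp_c >= t_max:
        # Throttle: step down toward the base clock to shed heat.
        return max(f_base, freq_mhz - 200.0)
    # Headroom available: step up toward the max boost clock.
    return min(f_max, freq_mhz + 100.0)

freq = 3000.0
for temp in [60, 70, 85, 101, 102, 90]:  # simulated package temperatures (°C)
    freq = turbo_step(freq, temp)
    print(f"{temp}°C -> {freq:.0f} MHz")
```

The key property is that the governor never lets the chip sit above the limit at full boost: once the limit is crossed, frequency falls until temperature recovers, which is why a correctly specced chip should throttle rather than crash.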
ah if only they had incremented that number by one… a new 286 even just in name would be sooo funny… not as funny as bringing back the number 8088 of course
Theoretically that’s likely true. But is there any empirical evidence?
Even underclocked Intel desktop chips are massively faster.