It's not just thermal throttling of the controller that causes slowdown; it's also the DRAM/SLC cache filling up.
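To make the cache-fill slowdown concrete, here's a toy model of the sustained-write "cliff": throughput stays high while the pseudo-SLC cache absorbs writes, then drops to native speed once it's full. All the numbers (40 GB cache, 3000 vs. 500 MB/s) are illustrative assumptions, not measurements from any specific drive.

```python
# Toy model of the SLC-cache write cliff. Assumes a sustained burst
# during which the cache has no chance to drain; real drives also
# fold in thermal throttling, which this sketch ignores.

def sustained_write_speed(gb_written, cache_gb=40, cached_mbps=3000, native_mbps=500):
    """Instantaneous write speed after `gb_written` GB of a sustained write."""
    return cached_mbps if gb_written < cache_gb else native_mbps

# The first 40 GB land in the cache at full speed; everything after crawls.
speeds = [sustained_write_speed(gb) for gb in (10, 39, 41, 100)]
print(speeds)  # [3000, 3000, 500, 500]
```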
Also, we should talk more about some vendors screwing over customers by replacing controllers or NAND chips with slower parts to cut down cost while keeping the same SSD SKU, after seeding the original SKUs to reviewers to lock in good benchmark scores in online tests. This is why I recommend only buying SSDs from reputable vendors/OEMs who are more vertically integrated: Samsung, WD, Sandisk, Micron.
The more I've worked with SSDs to try to get them to perform well, the more I've come to realize that benchmarking SSD performance is virtually impossible to do in any meaningful sense, because the drives themselves are stateful: your performance depends not only on what you are doing now, but on what you did a few moments ago. There's also a bunch of protocol-level behavior adding to this, as well as OS-level behavior that may be difficult to isolate, and even if you succeed, you're benchmarking the device in an unrealistic fashion unlike any actual real-world use case, so congrats, I guess.
Ideally, for a proper test, I'd expect thermocouples placed on the controller rather than relying solely on the sensor data provided by the SSD, as that could be very misleading depending on where the sensor is and how the raw sensor data is processed.
The NVMe SMART log will show all temperature sensors on the device; there are usually several. It also shows how often the composite temperature crossed the warning and critical thresholds over the lifetime of the device, and how long the device spent above those thresholds.
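On Linux you can pull those fields with `nvme smart-log /dev/nvme0` from nvme-cli. Here's a sketch that parses the per-sensor temperatures and the warning-threshold counter out of that text output; the sample and field labels below reflect common nvme-cli formatting, but it varies between versions, so treat this as a parsing sketch rather than a robust tool.

```python
import re

# Sample lines in the shape nvme-cli typically prints (values invented):
sample = """\
temperature                         : 38 C
warning_temp_time                   : 12
critical_comp_time                  : 0
Temperature Sensor 1                : 38 C
Temperature Sensor 2                : 47 C
"""

# Per-sensor temperatures: the controller sensor often reads much hotter
# than the NAND-side sensor, which is exactly why a single composite
# number can be misleading.
sensors = dict(re.findall(r"Temperature Sensor (\d+)\s*:\s*(\d+) C", sample))
# Minutes spent above the warning composite temperature threshold.
warning_minutes = int(re.search(r"warning_temp_time\s*:\s*(\d+)", sample).group(1))
print(sensors, warning_minutes)  # {'1': '38', '2': '47'} 12
```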
I'd advise against throwing accusations without proof. Looking for reliable hardware vendors is hard enough, there is no need for people unintentionally muddying the waters.
Their latest SSD (I think), the S70, has a marketing bullet point stating that all S70 parts are built with the same components. That should be unnecessary, but here we are.
I’ve basically never bought anything but the Samsung Pro drives and never once been disappointed.
With no research whatsoever I would have attributed the slowdown to some kind of cache or buffer filling up.
ASUS Hyper M.2 X16 PCIe 4.0 X4 Expansion Card Supports 4 NVMe M.2 (2242/2260/2280/22110) up to 256Gbps for AMD 3rd Ryzen sTRX40, AM4 Socket and Intel VROC NVMe Raid https://www.amazon.com/dp/B084HMHGSP/ref=cm_sw_r_cp_api_i_ZY...
If you have a spare x16 slot I highly recommend it. I've torture tested the latest and greatest hot screaming NVMe drives with it, and between the massive heatsink and fan I've never seen NVMe temps rise above 40°C.
Supporting more than one NVMe drive is tricky, though. You need to make sure your motherboard supports PCIe bifurcation. It's common in server motherboards and some recent high-end consumer motherboards, but virtually unsupported everywhere else. That said, if you're experiencing NVMe throttling due to temperature, it's worthwhile even for a single drive.
SilverStone Technology M.2 PCIE Adapter for SATA or PCIE NVMe SSD https://www.amazon.com/gp/product/B075ZNWS9Y/
It's worked like a charm. I've only put a drive into the M.2 slot so far.
The hardware is in a colocation facility and I like to have plenty of buffer with standard operating temperatures in case of a cooling failure, etc. Is maintaining 40C necessary under normal conditions? Nope, but it's definitely a nice to have regardless.
One of our tasks was to help qualify supported systems, which went down to approving SOME chassis IF the card was in a particular slot with a specific airflow. In some cases, it was required that internal ribbon cables were re-routed to improve airflow. The flash cards would throttle progressively at set temperatures, eventually going read-only and offline to protect the contents.
The issue of temperature and thermal throttling carried over into the 'consumer' HDD replacement market. I can recall attending an online tech briefing on SSDs where I put a comment in the chat that one issue not being covered was device temperature. When this was put to the panel of 'experts', they were a bit bemused, commenting that SSDs don't get hot because they consume less power than HDDs. Environmental conditions were not even considered.
Truth is that, with a bit of averaging, the power consumption of a modern 2TB HDD is about the same as a 2TB SSD: around 2-5W. Both devices generate heat and both devices are often in a warm environment.
So a lot of them fail from overheating, mostly cheaper models/makes. But even the cheapest Samsungs (which use their own controllers) seem to fare better than cheap brands like ADATA.
Actually, this has been a problem for a lot of controllers - RAID, SATA, USB3.x, networking cards that fail due to the manufacturer using subpar cooling - usually a small heatsink that they've deemed to be good enough under "normal usage" (i.e. not heavy, sustained usage), or they rely on server cooling to do the job (which actually makes sense).
The dogma at the moment is that SSDs don't require de-fragmentation, and that is potentially true up to a point, but I think Windows actually needs the file system de-fragmented due to its own overhead. I have a program that reproduces the effect and have been meaning to test EXT4 and write an article about it at some point. I need to check that it's something that happens across a range of devices, and that it really is just Windows, before I publish. I know defragging the files (copy away, delete the files, and replace them) instantly fixes performance, but it could be device/controller/firmware specific.
The other possibility is that large amounts of writes filling the device reduce the available working space, especially in drives with very small caches, which causes slowdowns near the end of tests.
They don't require regular de-fragmentation like HDDs do: if you're just occasionally reading some files, it will be fast enough, and with the physical layer hidden behind remapping it doesn't make much sense anyway, because a file that is logically presented to you as one continuous block could really be stored across multiple physical locations.[0]
> I know defragging the files (copy away, delete files and replace)
And this is the one real way to "defrag" SSD-backed media. Tossing clusters around like it's an HDD only wastes your TBW.[1]
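The copy-away-and-replace approach can be sketched in a few lines: rewriting a file sequentially lets the filesystem allocate it in (mostly) contiguous extents, without the cluster-by-cluster shuffling a classic defragmenter does. The `.defrag` suffix and paths are illustrative.

```python
import os
import shutil
import tempfile

def rewrite_file(path):
    """File-level 'defrag': copy away, then atomically swap back."""
    tmp = path + ".defrag"
    shutil.copyfile(path, tmp)  # sequential copy into freshly allocated space
    os.replace(tmp, path)       # atomic replace; the old extents are freed

# Demonstrate that the rewrite leaves the contents untouched.
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "data.bin")
    payload = os.urandom(4096)
    with open(p, "wb") as f:
        f.write(payload)
    rewrite_file(p)
    with open(p, "rb") as f:
        survived = f.read() == payload
print(survived)  # True
```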
> While there is no particular reason why the SSD would run slow once you try to read a file from the filesystem it does slow down and it can impact performance dramatically
There are always a couple of factors that affect the performance.
There is always the question of what exactly you are reading: a bazillion files under 1 KB could be anywhere on the physical storage, and while the access time for a single file can be as fast as the SSD can provide, the pattern of accessing thousands of small files not only fills the IO queue, it wastes tons of time on overhead. For every file you access there is not just a "hey SSD, grab bytes at LBA 44444 to 55555"; there are also the aforementioned IO queues, parsing the MFT for the file's location, reading and parsing the DACL, allocating handles (and discarding them later), etc. And if you run out of caches (most notably the DRAM cache on your SSD), then of course things start to slow down to a crawl, especially if you aren't only reading those files but also doing other things on the same drive at the same time.
Also, while I mention the MFT: some small files are stored in it entirely[2], so all the overhead is processed quickly (because under normal conditions most of the MFT is cached in memory anyway), but the file has to be small enough[3].
Also don't forget that if your file is 1 KB, the drive doesn't read 1 KB from storage. At best it reads 4 KB (the default NTFS cluster size), but if your next file isn't in that block (or it is, but by the time it comes to read it, that block's cache entry was already flushed), then you need to wait until the previous read completes. Yes, reads are fast in theory, but this is where the IO queue, caches, and NCQ start to matter.
And last but not least: on Windows there is always the question of whether the antivirus software (be it the built-in Defender or a 3rd-party one) is still sane or is wasting your time rechecking all your already-checked, static, non-executable files. Like a bazillion JSONs.
[0] And without TRIM support you can't have even a very loose guarantee that you really cleared the block.
[1] Back in the day I used this to defrag very heavily fragmented HDDs: just Ghost the drive to another one and then restore it back. All the files end up defragged, and it takes way less time because the source drive only reads, instead of read-write-repeat.
[2] https://superuser.com/questions/1185461/maximum-size-of-file...
[3] Just checked a couple of random files on my drive; the cutoff is somewhere around ~700 B.
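The per-file overhead described above is easy to see in miniature: reading the same 4 MB as 1,000 tiny files pays an open/metadata/close round trip per file, while one contiguous file pays it once. File counts and sizes here are arbitrary; on a real SSD the gap is dominated by metadata lookups and queue overhead rather than the raw NAND reads.

```python
import os
import tempfile

def read_many(dirpath, n):
    """Read n small files; each costs an open + read + close round trip."""
    total = 0
    for i in range(n):
        with open(os.path.join(dirpath, f"{i}.bin"), "rb") as f:
            total += len(f.read())
    return total

with tempfile.TemporaryDirectory() as d:
    n, size = 1000, 4096
    # 1,000 small files...
    for i in range(n):
        with open(os.path.join(d, f"{i}.bin"), "wb") as f:
            f.write(b"\0" * size)
    # ...versus the same bytes as one contiguous file.
    big = os.path.join(d, "big.bin")
    with open(big, "wb") as f:
        f.write(b"\0" * (n * size))

    many = read_many(d, n)          # n open/close round trips
    with open(big, "rb") as f:      # one open/close round trip
        one = len(f.read())

print(many == one, many)  # True 4096000
```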
> a series of 96 files ranging in size from 2 MB to 2 GB, fixed sizes but in a randomised order. The test completed in a total write time of 24.4 seconds
2 MB to 2 GB was a good test somewhere in the early '00s, not today, when even your average not-very-AAA game requires 50 GB for the install alone.
Edit: and the first two graphs use 10^8 as the base while the last one uses 10^10 for bytes, while the cumulative total says only 3 GB of data was written in total.
This is not a test, it is just a bunch of loosely related data based on one "test".
Verdict: once you have any kind of passive cooler on one, it won't overheat. But he has only tested it with one fast PCIe 4.0 x4 drive.
Unpowered persistence is the opposite, however. At 85 degrees Celsius, the data may last only single-digit days unpowered, but at freezing temperatures it could last hundreds of years unpowered. For the same reason, I believe: charge mobility is lower at lower temperatures in semiconductors.
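That temperature dependence is usually modeled as thermally activated charge leakage, i.e. an Arrhenius law where retention time scales as exp(Ea/kT). Here's a sketch of the acceleration factor; the 1.1 eV activation energy is a commonly cited assumption for NAND retention (e.g. in JEDEC endurance specs), and the result should be read as order-of-magnitude only.

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K

def retention_ratio(t_cold_c, t_hot_c, ea_ev=1.1):
    """How many times longer data survives unpowered at t_cold_c vs t_hot_c,
    under a simple Arrhenius model with activation energy ea_ev (assumed)."""
    t_cold = t_cold_c + 273.15
    t_hot = t_hot_c + 273.15
    return math.exp(ea_ev / K_B * (1.0 / t_cold - 1.0 / t_hot))

# Going from 85 C storage down to 0 C stretches retention by roughly
# four to five orders of magnitude, consistent with "days at 85 C vs.
# centuries near freezing".
print(f"{retention_ratio(0, 85):.1e}")
```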
I wound up dumping the SSD to ensure I had a full image, and that entire operation topped out at 5 MB/s. I had the SSD out of the case in a cradle when ripping it, so it wasn't getting cooked in the case, and I did not notice it feeling abnormally warm or hot. I later tried to mess with the disk in the cradle again, but as soon as I powered it on it topped out at 4-5 MB/s, so either there is some sort of defect causing an immediate thermal issue or something in the controller went awry.
More likely, you were experiencing repeated ECC failures and read retries on every access. Since you ruled out a bad cable by also running it in a dock, I'm guessing you had a premature failure of a large chunk of flash, possibly an entire die (though a lot of drives especially at lower capacities don't have enough over-provisioning to do erasure coding to protect against a full die failure).
Probably combination of:
- Constructed with the fewest NAND chips possible, using bottom-of-the-barrel shit like 64L Intel QLC
- Easily overheating DRAM-less controller (e.g. SM225*XT hitting 70+ C at the slightest load)
- Tucked in a heat-trap plastic case
- pSLC cache that is almost always full (because for some reason entry-level controllers always choose to run it that way), and gets filled quickly by a not-so-long run of Windows Update
Combine all these in a single product (like ADATA often does in its entry level SATA), and 5 MB/s sounds actually fast :)
Technically, the storage medium itself (the NAND flash chips) doesn't require cooling. The issue is that, as far as computer storage goes, NAND does not exist in a vacuum on its own: the SSD needs a controller to manage the data I/O from the PCI Express bus across all the NAND chips, cache to DRAM/SLC in between, and run various integrity checks, error correction, and trimming in the background. Those controllers need to be very powerful to keep up with the insane speeds the PCIe bus is capable of; they are usually multi-core ARM chips running a real-time OS with complex algorithms, so of course they run hot when you push them hard.
Hard disks also had controllers to manage the transfer of data between IDE/SATA, the DRAM cache, and finally the spinning rust, but since the speeds were so much slower, those controllers didn't run as hot. However, some HDDs still benefited from cooling, as they had a powerful motor spinning at over 10K RPM, which made the drives hot.
In either case, there's no free lunch, when you push lots of current through electronics chasing extreme performance, you get lots of heat that needs to get dissipated somehow, simple as that.
I don't care if climate change kills millions of people. It impacts my summer gaming sessions. I need AC!