* my company installed about 4500 Seagate Barracuda ES.2 drives (500 GB, 750 GB and mostly 1 TB) between 2008 and 2010. These drives are utter crap: in 5 years we got about 1500 failures, much worse than the previous generation (250 GB to 750 GB Barracuda ES).
* After replacing several hundred drives, we decided to jump ship in 2010 and went with Hitachi (nowadays HGST). Of the roughly 3000 Hitachi/HGST drives used in the past 3 years, we have had about 20 failures. Only one of the 200 Hitachi drives shipped between 2007 and 2009 has failed. Most of the failed drives were 3 TB models, ergo the 3 TB HGST HUA drives are less reliable than the 2 TB, themselves less reliable than the 1 TB model (which is, by all measures, absolutely rock solid).
* Of the few WD drives we installed, we replaced about 10% in the past 3 years. Not exactly impressive, but the sample isn't significant either.
* We replaced a number of Seagate Barracudas with Constellations, and these seem reliable so far; however, the numbers aren't significant yet (only about 120 deployed in the past 2 years).
* About SSDs: SSDs are quite a hit-and-miss game. We started back in 2008 with M-Tron (now defunct). M-Tron drives were horribly expensive, but my main compilation server still runs on a bunch of them. Of all the M-Tron SSDs we had (from 16 GB to 128 GB), not one has ever failed. They are 5 years old now, and still fast.
We've tried some other brands: Intel, SuperTalent... Some SuperTalent SSDs had terrible firmware, and the drives would crash under heavy load! They disappeared from the bus when stressed, but came back OK after a power cycle. Oh my...
So far, unfortunately, SSDs seem to be about as reliable as spinning rust. The latest generations fare better, and may actually best current hard drives (we'll see in a few years how they hold up).
http://venturebeat.com/2011/03/07/western-digital-buys-hitac...
1. Infant mortality. Drives fail after a couple months of use.
2. The 3-year mark. This is where failures begin for typical workloads.
3. The 4-6 year mark. This is when you can expect the drives that haven't failed earlier to fail. By this point, we're looking at roughly 33% failed.
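The timeline above can be turned into a rough cumulative survival curve by compounding annualized failure rates. The rates below are illustrative guesses chosen to match this commenter's rough phases (infant mortality, then a rise after year 3), not measured data from the article:

```python
# Cumulative drive survival from per-year annualized failure rates.
# These rates are illustrative assumptions, not measured data.
annual_failure_rate = [0.05, 0.015, 0.015, 0.10, 0.10, 0.10]  # years 1..6

surviving = 1.0
for year, afr in enumerate(annual_failure_rate, start=1):
    surviving *= (1.0 - afr)          # compound each year's survival
    print(f"after year {year}: {surviving:.1%} surviving")
```

With these numbers, about 33% of drives have failed by the end of year 6, which is in the ballpark of the figure quoted above.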
Interesting that my experiences roughly match up with Chart 1.
My experience is with 10 to 15k RPM SAS drives. Slower 7200 RPM drives? No idea; I haven't used them in servers in a while. They seem more of a crapshoot to me. SSDs, thus far, are even more of a crapshoot: we don't use them in servers, and only hesitantly in desktops/laptops, and then only Intel.
It is very disappointing how flaky and unreliable SSDs have been, when their lack of moving parts promised just the opposite.
Back in 1999/2000 I had a habit of building personal as well as commercial servers in datacenters with compact flash parts (plain old consumer CF cards) as boot devices, with fault tolerance in mind. There was a price to be paid: these devices needed to be mounted, and run, read-only.
But they ran forever. I never had one part fail. Just plain old CF drives mated directly to the IDE interface.
Now fast forward to 2013 and new servers we deploy for rsync.net have a boot mirror made of two SSDs ... things have gone well, but our general experience and anecdotal evidence from other parties gives us pause.
One thought: an SSD mirror, if it fails from some weird device bug or strange "wear" pattern, would fail entirely, since both members of the mirror get the exact same treatment. For that reason, when we build SSD boot mirrors, we do so with two different parts: either one current-gen and one previous-gen Intel part, or one Intel part and one Samsung part. That way, if there is some strange behavior or defect or wearing issue, they both won't experience it.
If you still followed up on your idea of using a read-only root like you did with the CF cards, and found a safe place for the logs, you could run the SSDs in the same mode. Why not go that route?
It was a huge win for uptime.
I'd echo the sentiment seen elsewhere in the comments about Seagate drives vs. Hitachi drives. Both for SATA and NL-SAS. Hitachi 1TB were rock solid compared to Seagate.
* Most consumer drives over 2 TB have extremely poor reliability. Just check any Amazon or Newegg review page (DOA and early mortality show up with increasing frequency). Yes, I know reviews are not accurate, but since there is no public information on drive failure rates, there is not much else to go on.
* The reduction of manufacturer warranties since the Thailand floods. Surprise: they never changed them back to the original 3-year warranty.
If you have a large array of disks, there is nothing really to worry about. If you have a small set of drives, spend a little extra and get the "Black" or RE drives with a 5-year warranty. Avoid any "Green" drive.
Check your S.M.A.R.T. data. Look at the head-park number (Load_Cycle_Count, I think it is called; I can't look it up now). If it is a six-digit number, you are in trouble. For a server you want it to be on the same order as the number of power-ups. Anything else and you have to ask yourself "why?"
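The check described above can be scripted against the text output of `smartctl -A`. A minimal sketch, assuming the typical smartmontools attribute names (`Load_Cycle_Count`, `Power_Cycle_Count`) and that the raw value is the last column, both of which vary by drive firmware:

```python
# Sketch: flag a worrying head-park count in `smartctl -A` output.
# Attribute names and the 100:1 ratio threshold are assumptions;
# the rule of thumb above is "same order of magnitude as power-ups".
def raw_value(smart_output: str, attribute: str) -> int:
    """Return the RAW_VALUE column (last field) for a SMART attribute."""
    for line in smart_output.splitlines():
        if attribute in line:
            return int(line.split()[-1])
    raise KeyError(attribute)

def head_parking_suspicious(smart_output: str, factor: int = 100) -> bool:
    cycles = raw_value(smart_output, "Load_Cycle_Count")
    power_ups = raw_value(smart_output, "Power_Cycle_Count")
    # Six digits, or wildly out of proportion to power-ups: ask "why?"
    return cycles >= 100_000 or cycles > power_ups * factor

sample = """\
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       54
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       234861
"""
print(head_parking_suspicious(sample))  # True: this drive is parking itself to death
```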
Edit: adding. The 1 TB and smaller Greens were disasters; I ruined a lot of them. I was told the 2 TB and larger Greens didn't have the head-park issue, but I spent part of last week replacing a storage unit populated with 2 TB Greens after a spindle failed (>200 unrecoverable blocks), and found that some of the 2 TB Greens were load-cycling into the 200,000 range while others weren't racking them up. They were all identical models purchased at the same time. Maybe they had different firmware? I replaced them with Reds. Reds aren't supposed to park, and they won't try to recover a bad sector for more than a few seconds, so they don't hang your RAID when they hit bad sectors.
Some of them were also crippled in firmware so you couldn't use them in RAID1 arrays, but this might have changed.
http://www.newegg.com/Product/Product.aspx?Item=N82E16822236...
http://www.amazon.com/WD-Red-NAS-Hard-Drive/dp/B008JJLZ7G
I've been running 3 of these in a RAID-5 NAS, no issues so far (not that that's any kind of indicator on a system which idles as a backup target all day).
1) select the make and model of drive you want
2) buy the same model of drive from multiple vendors so you get different serial and build numbers; even if you're buying only two drives, buy each from a separate location or vendor.
3) mix up the drives across arrays so they don't die together. Place stickers with the purchase date and invoice number on each drive to keep them straight.
All this because, when one drive dies from a defect or by hitting a similar MTBF, other drives with nearby serial or build numbers tend to die around the same time for similar reasons.
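The mixing step can be automated once drives are tagged with a batch. A minimal sketch, assuming "batch" means vendor plus invoice (serial-number prefixes would work just as well), that deals drives out round-robin so no array ends up with two drives from the same batch (as long as a batch has no more members than there are arrays):

```python
# Sketch: spread same-batch drives across different arrays so a bad
# batch doesn't take out two members of one RAID set at once.
from collections import defaultdict

def spread_batches(drives, num_arrays):
    """drives: list of (serial, batch) pairs. Returns num_arrays lists.
    Members of one batch land in consecutive arrays (round-robin),
    so they only collide if a batch is larger than num_arrays."""
    by_batch = defaultdict(list)
    for serial, batch in drives:
        by_batch[batch].append(serial)
    arrays = [[] for _ in range(num_arrays)]
    slot = 0
    for members in by_batch.values():
        for serial in members:
            arrays[slot % num_arrays].append(serial)
            slot += 1
    return arrays

arrays = spread_batches([("s1", "A"), ("s2", "A"), ("s3", "B"), ("s4", "B")], 2)
print(arrays)  # each array mixes batches A and B
```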
From owning hard drives across 8 or 9 generations of replacing or upgrading since the '90s, on all types of servers, desktops and laptops: the day you buy a new piece of equipment is the day you buy its death. Manage that death proactively, because dealing with it gets more and more tiring each time.
Drives have died for me both in 24/7 powered systems and through power cycles. Drives have reported intermittent failures for many months, yet still lived for years without any actual data loss. The oldest drive I still have spinning is a 200 GB IDE disk containing the OS for my old OpenSolaris ZFS NAS; it must be getting on for 9 years.
I advise having a backup of every drive you own, preferably two. I built a new NAS last week, 12x 4 TB drives in a raidz2 configuration; with ZFS snapshots, it fulfills 2 of the 3 requirements for backup (redundancy and versioning), while I use CrashPlan for cloud backup (distribution, the third requirement). The nice thing about CrashPlan is that my PCs can back up to my NAS as well, so restores are nice and quick; pulling from the internet is a last resort.
Incidentally, about "consumer-grade drives": the last time I looked into this, I was led to believe that if it's SATA and 7200 RPM (or less), there's no hardware distinction; it's just firmware. Consumer drives try very hard to recover data from a bad sector, while enterprise/RAID drives have a recovery time limit to prevent them being unnecessarily dropped from an array (which will have its own recovery mechanisms). That's it.
There is a long feature comparison [1] that mentions things like higher RPM, higher build quality, larger magnets, air turbulence control, dual processors, etc.
I'm no specialist in hard drives; I just remember reading this stuff when trying to figure out whether I needed it. In the end, for my small-scale corporate file server, I chose ZFS raidz with consumer-grade disk drives.
[1] Enterprise-class versus Desktop-class Hard Drives: http://download.intel.com/support/motherboards/server/sb/ent...
They even admit to the problem themselves at the end:
"Some hard drive manufactures may differentiate enterprise from desktop drives by not testing certain enterprise-class features, validate the drives with different test criteria, or disable enterprise-class features on a desktop class hard drives so they can market and price them accordingly. Other manufacturers have different architectures and designs for the two drive classes. It can be difficult to get detailed information and specifications on different drive modes."
That PDF tells me nothing interesting. It's marketing crap for clueless executives, not a technical analysis. (Given their absurd obsession with "Higher RPM" as some sort of defining characteristic, it's not even relevant to the statement I made in the first place.)
Certainly the old 9.1 GB SCSI disks that were so popular 10 years ago are well past being worth the power to spin them up now.
But these drives would still be useful. What about, say, shipping them to NGOs located in Africa?
But there are other considerations:
* This would also result in a big pile of waste in Africa, as their recycling infrastructure is limited.
* They need food, shelter, stable politics and functional education before they can make any use of computers.
* They have limited energy supply. Low powered tablets / laptops are much more useful.
Hard drive space per dollar grows exponentially, and drives are big, weighty things. The window of time in which reuse would be economical is short, and the value dubious.
http://research.microsoft.com/pubs/144888/eurosys84-nighting...
I've been hit-and-miss: I've gotten a few drives replaced, and had a few warranties expire. But pretty much every disk drive fails eventually.
Think about it: it's a commodity. If it lasted much longer than the warranty, they spent too much on robustness for the price.
Proper statistical analysis would help you there.
Yes, if you know the probability distribution. If you don't know the distribution, you cannot calculate your confidence, and thus cannot do a proper statistical analysis.
And, guess what, nobody knows the probability distribution of hard drive failures. That's exactly what they are trying to find out.
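To be fair, survival analysis has tools that assume no particular distribution. A sketch of the Kaplan-Meier estimator, which also handles drives that are still alive (right-censored), with made-up data and ties between event ages ignored for brevity:

```python
# Sketch: Kaplan-Meier survival estimate. Needs no assumed failure
# distribution; censored entries (failed=False) are drives still
# running when observation stopped. Ages in years; data is made up.
def kaplan_meier(observations):
    """observations: list of (age, failed). Returns [(age, survival)]."""
    events = sorted(observations)
    at_risk = len(events)
    survival = 1.0
    curve = []
    for age, failed in events:
        if failed:
            # at each failure, multiply by the fraction surviving it
            survival *= (at_risk - 1) / at_risk
            curve.append((age, survival))
        at_risk -= 1  # censored drives still leave the risk set
    return curve

data = [(0.5, True), (1.0, False), (2.0, True), (3.0, False), (4.0, True)]
curve = kaplan_meier(data)
print(curve)
```

Backblaze's published survival curves are essentially this idea applied at scale: most of their drives haven't failed yet, so the censoring matters.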
Amazon could use some competition in this space, IMNHO.
80% of drives surviving after 5 years seems right; this is what we're seeing as well. The hardware is decommissioned faster than the drives fail.
I'm not sure the information would be all that valuable anyway. Google's data-center environments, workloads and requirements are likely pretty different from yours, so how useful would the numbers really be?
We are constantly looking at new hard drives, evaluating them for reliability and power consumption. The Hitachi 3TB drive (Hitachi Deskstar 5K3000 HDS5C3030ALA630) is our current favorite for both its low power demand and astounding reliability. The Western Digital and Seagate equivalents we tested saw much higher rates of popping out of RAID arrays and drive failure. Even the Western Digital Enterprise Hard Drives had the same high failure rates. The Hitachi drives, on the other hand, perform wonderfully.
In the second article they say that the WD Red is 2nd in reliability (the WD Red did not exist in 2011). I'm happy that I've got a cheap Hitachi Ultrastar. But who knows.
As a personal anecdote: WD Green failure rates are huge here. Across 240 drives in 24/7 desktop machines, I've replaced at least 20 drives in the last 12 months.
We replaced the drives with a different brand and the 'failures' went away.
Odd question, but I've always wondered: these things just seem to last forever.
I don't have any hardware to read my pile of 5.25" Atari 800 disks.
http://static.googleusercontent.com/external_content/untrust...
Also, no hardware raid, battery, or cap.
Source: worked at Eye-Fi, built 2PB storage
It is not true that the pod team must remove the 4U server from the rack. It slides out like a drawer (no tools required; it takes maybe 10 seconds). The drive or motherboard is then replaced, and you slide the drawer back in. So the 4U server must slide 18 inches one way, but zero cables have to be unplugged or replugged. This takes only one technician and no "server lift"; the drawer supports all the weight.
I'm not defending this design, just correcting a mistake. Backblaze frankly "makes do" with this design because nobody will step up and make anything that fits our needs better. The number 1 criterion is total system cost over the lifetime of the system, INCLUDING all the time spent on salaries of datacenter techs dealing with the pods. "Raw I/O performance" is not that important for backup, so trying to sell us an awesome EMC or NetApp that costs 10x as much and has 10x the raw I/O performance is not very compelling to us. But if you came up with a design that let our datacenter technicians replace a drive faster while not significantly increasing overall costs elsewhere, we SURELY would listen.
While I don't recommend them outright, we settled on 3U boxes from SuperMicro. http://www.supermicro.com/products/chassis/3u/837/sc837e26-r...
We somewhat affectionately dubbed them "mullets" as in business in the front, party in the rear.
They make 4U devices as well. Cost was about $1000. We added LSI MegaRAID 9280 controllers, about another $1500, and ran mini-SAS back to a controller node responsible for 4 JBODs.
1. you have to muck around with more firmware and sometimes reboot in order for changes to take effect
2. if a controller dies, you have to replace it with (almost) the exact same controller in order to read the data
3. Datacenters rarely lose power, take the HW raid money and instead put servers on true A+B power feeds.
CPUs are so fast these days that they can easily handle in software all the "stuff" that HW raid used to do.
Their hardware design is specifically geared towards their use-case and I applaud them for knowing how to optimize for their use-case. I wouldn't use it for mine but only because it's not a good fit.
They can open-source the hardware because the real secret sauce is the software and the hardware open sourcing gives them a nice edge in marketing.
Edited to add: they've optimized for hardware purchase price and given up reliability (HW RAID, battery, cap), performance, and maintainability. The strange thing is that the overall cost of a storage system is driven by power, not purchase price. Smarter RAID controllers, like the one I link above, let you manage power by spinning down disks while they are unused, reducing your power draw. I've never seen SW RAID do that. Take a look at Amazon Glacier, which I suspect uses this power-off strategy to drastically reduce costs.
So SpinRite may be handy, but throw the drive away after use.
Not everyone lives in the EU. In fact, the majority of people don't.
Outside your regulation-happy haven, warranty periods aren't arbitrary and do indicate durability under normal use.
Care to back that up with any real data instead of baseless consumer speculation relying on time travel?
Am I unaware that there are new paid spots on the first page of HN? (it would make sense I guess, from a business perspective)
TIA to anyone that can be of help on this, cheerio, (and good luck to Blackblaze, backblaze a path to a backblazing success!)
Don't like it? Don't vote for it.
I suspect the reason people do "burn-in" tests on hard drives is to make drives that suffer from early failure ("infant mortality," as described in the article) fail early enough that you can still RMA them with the manufacturer. Apart from that, I don't think there's much you can do to improve your chances.
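The core of such a burn-in is a write/read/verify pass. A minimal sketch against a file-backed stand-in for a device; a real burn-in would run a tool like `badblocks` against the raw disk and then check SMART counters, but the verify loop looks the same. The seeded PRNG makes the data stream replayable for verification:

```python
# Sketch: one write/read/verify burn-in pass over a scratch file.
# Seeded PRNG data, so the verify step can replay the exact stream.
# Needs Python 3.9+ for random.Random.randbytes.
import os
import random
import tempfile

def burn_in_pass(path: str, size_mb: int = 4, seed: int = 0) -> bool:
    chunk = 1 << 20  # 1 MiB
    rng = random.Random(seed)
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(rng.randbytes(chunk))
    rng = random.Random(seed)  # replay the same stream to verify
    with open(path, "rb") as f:
        for _ in range(size_mb):
            if f.read(chunk) != rng.randbytes(chunk):
                return False  # mismatch: the medium corrupted data
    return True

scratch = os.path.join(tempfile.gettempdir(), "burnin.img")
print(burn_in_pass(scratch))  # True unless bits got flipped
```

Several such passes over a drive's full surface, over a few days, is what gives infant-mortality failures a chance to show up inside the return window.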
The article actually makes this point (about anecdote), but their data suggests that failure rates do rise substantially after three years.
The optical drives I've had, on the other hand, are actually unreliable. They all seem to break down after about four years, and I don't use them all that often!
There is an inherent bias in the reviews, which is what makes the Backblaze report so interesting: they have less of a bias. Though, since they do not report actual disk vendors and models, you can't draw direct inferences, only the general trend.
I think this is most evident in the reduced warranty periods, compared to before, when 5 years was quite normal.
Can't seem to find relevant information on the website anywhere for this.
A little statistics is a dangerous thing.