Not that they want to, but I think Wikipedia could fund this using their current donations if they wanted. Hell, I almost wonder if one of the big storage providers would do it for free if they could do it in their staging environment so they get real traffic. It would be less good than real backups, but extra copies are still extra copies even if they're unreliable.
A good portion of the text on Wikipedia relies on Wayback Machine links to remain verifiable. If they lose that, I guess the editors would have to comb through every page for claims that would need to be either re-sourced or deleted.
You might be able to back up a significant portion of the unique data in IA if you limited it to text files. I think those probably have the highest information-to-file-size ratio.
It’s also probably the most likely to already be backed up, though. Interesting issue; you might also get somewhere by cutting the 50TB up into 10GB torrents (or 100GB or whatever, something reasonable for a consumer hard drive) and adding a script that checks the torrent swarm stats to recommend which torrent to download.
Something where I run it, tell it I want to let it use 600GB, and it hands me torrent files for the least-seeded 600GB. Maybe a super basic web UI so people can see how well backed up it is?
Unsure if people would sign on or not; I probably would. I’ve got 10 or so TB of NFS I’m not using I could chuck at it. I would guess there are other data hoarders out there who would do the same, but only if it were somewhat easy. I’m probably not going to volunteer to do an hour of rtorrent cleanup a week to make sure I’m backing up the right things.
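A rough sketch of the "tell it how much space you have, get the least-seeded chunks" helper, assuming the archive has already been cut into fixed-size chunks (50TB in 10GB pieces would be 5,000 torrents) and that seeder counts have already been fetched from the swarm somehow. The `Chunk` record, `pick_least_seeded`, and the catalog values are hypothetical; no real tracker or BitTorrent client API is used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One torrent-sized slice of the archive (hypothetical record)."""
    name: str
    size_bytes: int
    seeders: int  # from a tracker scrape; fetching this is out of scope here


def pick_least_seeded(chunks: list[Chunk], budget_bytes: int) -> list[Chunk]:
    """Greedily pick the least-seeded chunks that fit in the volunteer's budget.

    Ties on seeder count are broken by name so the result is deterministic.
    A chunk that doesn't fit is skipped; smaller ones later in the order may still fit.
    """
    chosen: list[Chunk] = []
    remaining = budget_bytes
    for chunk in sorted(chunks, key=lambda c: (c.seeders, c.name)):
        if chunk.size_bytes <= remaining:
            chosen.append(chunk)
            remaining -= chunk.size_bytes
    return chosen


if __name__ == "__main__":
    GB = 10**9
    # Made-up swarm stats, for illustration only.
    catalog = [
        Chunk("ia-chunk-0001", 10 * GB, seeders=12),
        Chunk("ia-chunk-0002", 10 * GB, seeders=0),
        Chunk("ia-chunk-0003", 10 * GB, seeders=3),
        Chunk("ia-chunk-0004", 10 * GB, seeders=1),
    ]
    for c in pick_least_seeded(catalog, budget_bytes=25 * GB):
        print(c.name, c.seeders)
```

With a 25GB budget the demo prints ia-chunk-0002 and ia-chunk-0004. Greedy lowest-seeders-first is the simplest policy that matches the idea; one likely problem is that many volunteers running it against the same stats at the same time would all grab the same chunks, so a real version would probably want to randomize among the least-seeded candidates or refresh the stats between picks.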
This is a great question, and a state of the art kind of thing.
HDDs are sold with a warranty covering a lifetime read/write workload and a number of power cycles, usually along with some environmental operating envelope. The read/write figure relates to the quality/space of the platter; the power-cycle figure is usually about the actuator and read/write head being reseated/wearing out. The environment spec is the same as for all other devices in a DC.
Most folks replace drives when they die (reads/writes stall or return garbage) or when the warranty runs out. Some will pay for a warranty extension, and some will just keep using the drive out of warranty. How you use the drive, what environment it's in, etc. all change how far you can push things.
I'd say anywhere from 4-8 years, depending on how it's used. In many cases it can be cheaper to run your fleet in a worse environment (and so spend less power on HVAC) and replace devices more frequently.
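The cooling-versus-replacement trade-off can be sketched as a yearly cost: cooling spend plus drives replaced per year times drive price. Every number below is a placeholder chosen only to show the shape of the comparison, not a real measurement, and `Environment` and `annual_cost` are hypothetical names. The model also ignores drive power, migration labour, and the risk of correlated failures in a hot room.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical operating point for a drive fleet (all inputs are placeholders)."""
    name: str
    cooling_cost_per_year: float  # HVAC spend for the whole fleet
    drive_lifetime_years: float   # average service life in this environment


def annual_cost(env: Environment, num_drives: int, drive_price: float) -> float:
    """Cooling spend plus amortized drive replacements per year."""
    replacements_per_year = num_drives / env.drive_lifetime_years
    return env.cooling_cost_per_year + replacements_per_year * drive_price


if __name__ == "__main__":
    # Placeholder numbers, picked from the 4-8 year range mentioned above.
    cool = Environment("well cooled", cooling_cost_per_year=50_000, drive_lifetime_years=8)
    warm = Environment("run warm", cooling_cost_per_year=20_000, drive_lifetime_years=5)
    for env in (cool, warm):
        print(env.name, annual_cost(env, num_drives=1_000, drive_price=200))
```

With these made-up inputs the warm room comes out at 60,000 a year against 75,000 for the cool one; whether that holds in practice depends entirely on the real cooling costs and lifetimes.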
I remember for a long time (I'm talking 20-ish years back here), every hard drive I bought had double or more the capacity of every drive I'd ever bought previously combined. My first ever 40MB (yes, megabyte) drive got upgraded to an 80MB one, that got upgraded to a 250MB one, then a 750MB, and then a whopping 2GB drive (how would I _ever_ fill that up???), and so on. That's slowed down some, but I'm currently starting to think about upgrading my 8TB drives (a RAID 1 pair) to 20TB drives when the prices drop a bit more.
Drives do 140-220MB/s depending on which LBA range the read head is over (outer tracks are faster than inner ones), and that's not really changing. 160MB/s is very common.
So for your 8TB drives, assuming 1MiB writes with 20ms of latency each and 160MB/s, each write takes ~26.5ms (an effective ~40MB/s), so you can rewrite the drive ~155 times/year. At 20TB this drops to ~62 times/yr.
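The rewrite estimate above as a small Python check. It assumes decimal terabytes (8TB = 8 * 10**12 bytes), a fixed 20ms of latency per 1MiB write on top of transfer time at 160MB/s, and strictly one write at a time with no queueing or overlap; `rewrites_per_year` is a name made up for this sketch.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
MIB = 1024 * 1024


def rewrites_per_year(capacity_bytes: float,
                      write_size: int = MIB,
                      latency_s: float = 0.020,
                      throughput_bps: float = 160e6) -> float:
    """How many times a drive could be fully rewritten in a year.

    Each write pays a fixed latency plus its transfer time, so small writes
    drag effective throughput well below the sequential rate.
    """
    seconds_per_write = latency_s + write_size / throughput_bps
    writes_needed = capacity_bytes / write_size
    seconds_per_full_pass = writes_needed * seconds_per_write
    return SECONDS_PER_YEAR / seconds_per_full_pass


if __name__ == "__main__":
    for tb in (8, 20):
        print(f"{tb}TB: ~{int(rewrites_per_year(tb * 1e12))} full rewrites/year")
```

Run as a script it prints 155 for 8TB and 62 for 20TB, matching the figures above. Setting `latency_s=0` gives the pure sequential case, roughly 630 full passes a year on an 8TB drive, which shows how much of the budget the per-write latency eats.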
Do people really replace their drives when the warranty runs out? Hard drive manufacturers won't provide data recovery on drives that fail under warranty[1]. It makes more economic sense to just run a drive until it dies. You'll end up paying the price for a new drive either way, but less often if you ignore the warranty expiring.
1: I discovered this myself when a Seagate drive containing some important data failed under warranty. If you're foolish enough to send them a failed drive with data you need recovered (like I was), all they'll do is throw it in the bin and send you a replacement drive.
1.71% a year failure rate if you care for the hardware as much as they do.
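To turn an annual failure rate into something closer to the "how many years" question, a minimal sketch that assumes the 1.71% rate stays constant every year and that failures are independent. Real drives tend to fail more often early and late in life (the "bathtub curve"), so treat this as a rough illustration; `survival_probability` is a hypothetical helper.

```python
def survival_probability(afr: float, years: int) -> float:
    """Chance a drive is still working after `years` at a constant annual failure rate."""
    return (1.0 - afr) ** years


if __name__ == "__main__":
    for years in (5, 8):
        print(f"{years} years: {survival_probability(0.0171, years):.1%} still alive")
```

Under those assumptions it prints about 91.7% after 5 years and 87.1% after 8.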
So the question becomes more like "how long does an average hard drive last while powered down and still reliably be able to power back up and be read?".
I'm fairly sure that is a lot longer than the single-digit years that'd be the probable answer to your question.
I wonder if there are useful guidelines for long-term storage of powered-down hard drives? My gut feel is the major failure modes would be electrolytic capacitor failure, bearings sticking as the lubrication ages, and obsolescence of the interfaces. I wonder how hard it'd be to find hardware that'd read my Mac SCSI hard drives from 25 years ago?
Easy… that original Mac is sitting in my basement and it worked like a charm last time it was powered on 4 years ago.
They are cheaper per GiB, and last significantly longer.
You'd have to spend a lot more, because with that many drives, you need redundancy now.
I think with that many drives, you'd be losing them constantly, and I suppose you wouldn't know which ones until later (assuming you're doing an offline backup, if you aren't you have to factor in power costs).
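Both points can be put in rough numbers with a minimal sketch. It reuses the 1.71% annual failure rate quoted earlier in the thread, assumes independent failures at a constant rate, puts each copy on a different drive, and models an offline backup that nobody checks, so dead drives are never replaced. The fleet size of 5,000 drives and both function names are hypothetical.

```python
def expected_failures_per_week(num_drives: int, afr: float) -> float:
    """Average number of drive failures per week across a fleet at a constant AFR."""
    return num_drives * afr / 52


def p_object_lost(afr: float, years: float, copies: int) -> float:
    """Chance that every copy of one object sits on a dead drive after `years`.

    Assumes no repair (an unchecked offline backup), one copy per drive,
    and independent failures at a constant annual rate.
    """
    p_drive_dead = 1.0 - (1.0 - afr) ** years
    return p_drive_dead ** copies


if __name__ == "__main__":
    AFR = 0.0171   # figure quoted earlier in the thread
    FLEET = 5_000  # hypothetical fleet size
    print(f"{expected_failures_per_week(FLEET, AFR):.1f} failures/week")
    for copies in (1, 2, 3):
        print(f"{copies} copies: {p_object_lost(AFR, 5, copies):.4%} lost after 5 years")
```

With these assumptions a 5,000-drive fleet averages about 1.6 dead drives a week, and the chance of losing a given object after 5 unchecked years falls from about 8% with one copy to under 1% with two.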