> This issue doesn't affect tapes written with the ADR-50 drive, but none of the tapes I have tested that were written with the OnStream SC-50 will restore unless the PC which wrote the tape is the PC which restores it. This is because the PC which writes the tape locally stores a catalog of tape information, such as the tape's file listing. ARCserve is supposed to be able to restore without that catalog (it exists only on the PC which wrote the backup, so relying on it defeats the purpose of a backup), but it can't.
Holy crap. A tape backup solution that doesn't allow the tape to be read by any other PC? That's madness.
Companies do shitty things and programmers write bad code, but this one really takes the prize. I can only imagine someone inexperienced wrote the code, nobody ever did a code review, and the company only ever tested reading tapes on the same computer that wrote them, because it never occurred to them to do otherwise?
But yikes.
What is needed is the backup catalog. This is fairly standard on a lot of tape-related software, even open source; see for example "Bacula Tape Restore Without Database":
* http://www.dayaro.com/?p=122
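At its core, a no-database restore just means walking the media sequentially and rebuilding the file listing from what's actually on it. A minimal sketch in Python, using a tar stream as a stand-in for the tape format (`rebuild_catalog` is a hypothetical helper for illustration, not Bacula's actual bscan):

```python
import io
import tarfile

def rebuild_catalog(stream):
    """Rebuild a file listing (the 'catalog') by scanning the archive
    itself, sequentially, the way a tape would be read."""
    catalog = []
    with tarfile.open(fileobj=stream, mode="r|") as tf:  # streaming read only
        for member in tf:
            catalog.append((member.name, member.size))
    return catalog

# Build a tiny archive in memory to stand in for the tape.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    for name, payload in [("etc/passwd", b"root:x:0:0"), ("home/notes.txt", b"hi")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
buf.seek(0)

catalog = rebuild_catalog(buf)
```

The point is that this only works because tar interleaves the metadata with the data; a format that keeps the index solely on the machine that wrote the backup makes this scan impossible.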
When I was still doing tape backups the (commercial) backup software we were using would e-mail us the bootstrap information daily in case we had to do a from-scratch data centre restore.
The first step would get a base OS going, then install the backup software, then import the catalog. From there you can restore everything else. (The software in question allowed restores even without a license (key?), so that even if you lost that, you could still get going.)
But having a format where you can't easily recreate the index from the data is just abhorrently bad coding...
This is only for a complete disaster scenario, if you’re restoring one PC or one file, you would still have the backup server and the database. But if you don’t, you need to run the command to reconstruct the database.
An example of this done right: If you disconnect a SAN volume from VMware and attach it to a completely different cluster, it's readable. You can see the VM configs and disks in named folders. This can be used for DR scenarios, PRD->TST clones, etc...
Done wrong: XenServer. If you move a SAN volume to a new cluster, it gets shredded, with every file name replaced by a GUID instead. The file GUID to display name mapping is stored in a database that's only on the hosts! That database is replicated host-to-host and can become corrupted. Backing up just the SAN arrays is not enough!
My guess is they lacked the RAM buffer and CPU to keep up properly, with a side of bad assumptions in the software.
If you’re making an “is” argument, I completely disagree. I see companies (including my own) regularly having junior programmers responsible for decisions that cast inadvertently or unexpectedly long shadows.
The big challenge, which I think is actually an important, almost philosophical challenge (it might sound like a dull issue, like how do you format a database so you can retrieve information, which sounds pretty technical), is that software formats are constantly changing.
People say, “well, gee, if we could back up our brains,” and I talk about how that will be feasible some decades from now. Then the digital version of you could be immortal, but software doesn’t live forever; in fact it doesn’t live very long at all if you don’t care about it, if you don’t continually update it to new formats.
Try going back 20 years to some old formats, some old programming language. Try resuscitating some information on some PDP1 magnetic tapes. I mean even if you could get the hardware to work, the software formats are completely alien and [using] a different operating system and nobody is there to support these formats anymore. And that continues. There is this continual change in how that information is formatted.
I think this is actually fundamentally a philosophical issue. I don’t think there’s any technical solution to it. Information actually will die if you don’t continually update it. Which means, it will die if you don’t care about it. ...
We do use standard formats, and the standard formats are continually changed, and the formats are not always backwards compatible. It’s a nice goal, but it actually doesn’t work.
I do in fact have electronic information that goes back through many different computer systems. Some of it I now cannot access. In theory I could, with enough effort, find people to decipher it, but it’s not readily accessible. The further back you go, the more of a challenge it becomes.
And despite the goal of maintaining standards, or maintaining forward or backwards compatibility, it doesn’t really work out that way. Maybe we will improve that. Hard documents are actually the easiest to access. Fairly crude technologies like microfilm or microfiche, which basically just hold page images, are very easy to access.
So ironically, the most primitive formats are the ones that are easiest.
PNG is 26 years old and basically unchanged since then. Same with 30 year old JPEG, or for those with more advanced needs the 36 year old TIFF (though there is a newer 21 year old revision). All three have stood the test of time against countless technologically superior formats by virtue of their ubiquity and the value of interoperability. The same could be said about 34 year old zip or 30 year old gzip. For executable code, the wine-supported subset of PE/WIN32 seems to be with us for the foreseeable future, even as Windows slowly drops compatibility.
The latest Office 365 Word version still supports opening Word97 files as well as the slightly older WordPerfect 5 files, not to mention 36 year old RTF files. HTML 1.0 is 30 years old and is still supported by modern browsers. PDF has also seen constant updates, but I suspect 29 year old PDF files would still display fine.
In 2005 you could look back 15 years and see a completely different computing landscape with different file formats. Look back 15 years today and not that much has changed. Lots of exciting new competitors as always (webp, avif, zstd), but only time will tell whether they will earn a place among the others or go the way of JPEG2000 and RAR. But if you store something today in a format that's survived the last 25 years, you have a good chance of still being able to open it in common software 50 years down the line.
Even 50 years (laughable for a clay tablet) is still pretty darn long in the tech world. We'll still probably see the entire computing landscape, including the underlying hardware, changing fundamentally in 50 years.
Future-proofing anything is a completely different dimension. You have to provide an independent way to bootstrap, without relying on an unbroken chain of software standards, business/legal entities, and public demand for certain hardware platforms/architectures. This is infeasible for the vast majority of knowledge/artifacts, so you also have to have a good mechanism to separate signal from noise and to transform volatile formats like JPEG or machine-executable code into more or less future-proof representations, or at least basic descriptions of what the notable thing did and what impact it had.
I try to take advantage of this by only using older, open, and free things (or the most stable subsets of them) in my "stack".
For example, I stick to HTML that works across 20+ years of mainstream browsers.
I have JPEGs and MP3s from 20 years ago that don't open today.
"The roots of Creo Parametric. Probably one of the last running installations of PTC's revolutionary Pro/ENGINEER Release 7 datecode 9135 installed from tape. Release 7 was released in 1991 and is - as all versions of Pro/ENGINEER - fully parametric. Files created with this version can still - directly - be opened and edited in Creo Parametric 5.0 (currently the latest version for production).
This is a raw video, no edits, and shows a bit of the original interface (menu manager, blue background, yellow and red datum planes, no modeltree).
Hardware used: Sun SparcStation 5 running SunOS 4.1.3 (not OpenWindows), 128MB RAM
Video created on January 6, 2019."
Talk about taking the simplest and most durable of (web) formats and creating a hellscape of tangled complexity which becomes less and less likely to be maintainable or easy to archive the more layers of hipster js faddishness you add...
An honest question: If you are writing a program that you want to survive for 100+ years, shouldn't you specifically target a well-maintained and well-documented VM that has backward compatibility as a top priority? What other options are there?
I'd be tempted to target a well-known physical machine - build a bootable image of some sort as a unikernel - although in the age of VMWare etc. there's not a huge difference.
IMO the "right" way to do this would be to endow an institution to keep the program running, including keeping it updated to the "live" version of the language it's written in, or even porting it between languages as and when that becomes necessary.
I think the concern is becoming increasingly irrelevant now, because if I really need to access a file I created in Word 4.0 for the Mac back in 1990, it's not too hard to fire up System 6 with that version of Word and read my file. In fact it's much easier now than it was in 2005 when he was writing. Sure it might take half an hour to get it all working, but that's really not too bad.
Most of this is probably technically illegal and will sometimes even have to rely on cracked versions, but also nobody cares. All the OSes and programs are still around and easy to find on the internet.
Not to mention that while file formats changed all the time early on, these days they're remarkably long-lived -- used for decades, not years.
The outdated hardware concern was more of a concern (as the original post illustrates), but so much of everything important we create today is in the cloud. It's ultimately being saved in redundant copies on something like S3 or Dropbox or Drive or similar, that are kept up to date. As older hardware dies, the bits are moved to newer hardware without the user even knowing.
So the problem Kurzweil talked about has basically become less of an issue as time has marched on, not more. Which is kind of nice!
And that was easy years ago.
Now you can WASM it and run it in a browser
I fear we may already be past that point. With "cloudification", where more and more software runs on servers one doesn't control, there is no way to run that software in a VM because you don't have access to the software anymore. And even getting the raw data out for a custom backup becomes harder and harder.
There are external dependencies but one hopes that the descriptions are sufficient to figure out how to make those work.
Sure, there is a risk that at some point every PNG or H.264 decoder gets lost, and re-creating a decoder would then be significantly more complicated. But the chances of that are pretty slim; looking at `ffmpeg -codecs`, I'm not really worried it will ever happen.
Maybe it isn't crude after all if it wins.
They are just simple.
And what they do is fully exploit the analog physics to yield a high data density that mere mortals can make effective use of.
And they make sense.
In my life, text, bitmaps and perhaps raw audio endure. Writing software to make use of this data is not difficult.
A quick scan later, microfiche type data ends up a bitmap.
Prior to computing, audio tape, pictures on film and ordinary paper, bonus points for things like vellum, had similar endurance and utility.
My own archives are photos, papers and film.
The FSF used to sell these wonderful stickers that said "There is no cloud. It's just someone else's computer."
For anything more than a few machines there is bacula/bareos (which pretends everything is a tape, with mostly miserable results), backuppc (which pretends tapes are not a thing, with miserable results), and that's about it; everything else seems to be point-to-point backup only, with no real central management.
I've long been stunned by the propensity of proprietary backup software to use undocumented, proprietary formats. It seems to me that the first thing to solve when designing a backup format is ensuring it can be read in the future even if all copies of the backup software are lost.
I may be wrong but I think some open source tape backup software (Amanda, I think?) does the right thing and actually starts its backup format with emergency restoration instructions in ASCII. I really like this kind of "Dear future civilization, if you are reading this..." approach.
Frankly nobody should agree to use a backup system which generates output in a proprietary and undocumented format, but also I want a pony...
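The "dear future civilization" idea is easy to sketch. The layout below is an assumption invented for illustration (NOT Amanda's actual on-tape format): one plain-ASCII rescue-note block, then an ordinary tar archive, so a future reader who dumps the first block sees instructions in cleartext.

```python
import io
import tarfile

BLOCK = 32 * 1024

# Assumed layout for illustration: ASCII note block, then a tar archive.
note = (
    b"IF YOU ARE READING THIS, THE BACKUP SOFTWARE IS GONE.\n"
    b"To restore: skip this first 32 KiB block, then extract the rest as tar.\n"
).ljust(BLOCK, b"\0")

tape = io.BytesIO()  # stands in for the tape device
tape.write(note)
with tarfile.open(fileobj=tape, mode="w") as tf:
    info = tarfile.TarInfo("hello.txt")
    payload = b"hello"
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

# "Future civilization" restore side: read the note, then follow it.
raw = tape.getvalue()
instructions = raw[:BLOCK].rstrip(b"\0").decode("ascii")
with tarfile.open(fileobj=io.BytesIO(raw[BLOCK:]), mode="r") as tf:
    recovered = tf.extractfile("hello.txt").read()
```

The design point is that the rescue note costs one tape block and requires nothing but the ability to read bytes off the media.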
It's interesting to note that the suitability of file formats for archiving is a specialised field of consideration in itself. I recall an article by someone investigating this very issue who argued formats like .xz weren't well suited to archiving. Relevant concerns include how screwed you are if the archive is partly corrupted. The more sophisticated your compression algorithm (and thus the more state it carries from earlier in the stream), the more a single bit flip can result in massive amounts of run-on data corruption, so better compression essentially makes things worse if you assume some amount of data might be damaged. You also have the option of adding parity data to allow some recovery from damage, of course. Though as this article shows, all of this seems like nothing compared to the challenge of ensuring you'll even be able to read the media at all in the future.
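The run-on corruption point is easy to demonstrate with zlib (the same DEFLATE family as gzip): one flipped bit in a compressed stream typically kills the whole thing, while the same flip in the raw data costs one byte.

```python
import zlib

data = b"important archival data. " * 1000
packed = zlib.compress(data, level=9)

# Flip a single bit inside the compressed stream (past the 2-byte header).
damaged = bytearray(packed)
damaged[10] ^= 0x01
try:
    zlib.decompress(bytes(damaged))
    survived = True
except zlib.error:
    survived = False  # decode breaks or the checksum fails: stream is lost

# The same single-bit flip in the uncompressed copy damages exactly one byte.
plain = bytearray(data)
plain[10] ^= 0x01
bytes_lost = sum(a != b for a, b in zip(plain, data))
```

Compression removes redundancy, and redundancy is exactly what lets you recover from damage; hence the advice to pair compression with parity data for archives.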
At some point the design lifespan of the proprietary ASICs in these tape drives will presumably just expire(?). I don't know what will happen then. Maybe people will start using advanced FPGAs to reverse engineer the tape format and read the signals off, but the amount of effort to do that would be astronomical, far more even than the amazing effort the author here went to.
Even if you write an ASCII message directly to a tape, that data is obviously going to be encoded before being written to the tape, and you have no idea if anyone will be able to figure out that encoding in future. Trouble.
What makes this particularly pernicious is the fact that LTO nowadays is a proprietary format(!!). I believe the spec for the first generation or two of LTO might be available, but last I checked, it's been proprietary for some time. The spec is only available to the (very small) consortium of companies which make the drives and media. And the number of companies which make the drives is now... two, I think? (They're often rebadged.) Wouldn't surprise me to see it drop to one in the future.
This seems to make LTO a very untrustworthy format for archiving, which is deeply unfortunate.
Make an LTO tape... But also make a Bluray... And also store it on some hard drives... And also upload it to a web archive...
The same for the actual file format... Upload PDF's... But also upload word documents.. And also ASCII...
And same for the location... Try to get diversity of continents... Diversity of geopolitics (ie. some in USA, some in Russia). Diversity of custodians (friends, businesses, charities).
Past generations thought EBCDIC would last longer than it did.
Again, not that there any indications now that ASCII won't survive nearly as long as the English language does at this point, just that when we're talking about sending signals to the future, even assuming ASCII encoding is an assumption to question.
Nice. That kind of thing makes too much sense. Wow. Such cheap insurance. Nice work from that team.
And if you find it, don't judge what it is or worry what others might think - or even necessarily tell anyone. Sometimes the most motivating things are highly personal, as with the OP; a significant part of their childhood.
Looks like that is what I need to start looking for again, projects which I find interesting or fun to do in my spare time, without thinking about how it would affect my career or trying to find ways to monetize it.
I'm secondhand pissed at the recovery company. I have a couple of ancient SD cards lying around, and this just reinforces my fear that if I send them away for recovery they'll be destroyed. (The cards aren't recognized/readable by the readers built into MacBooks, at least.)
And magnetic drives will seize up, and optical disks get oxidized, and tapes stick together. Long-term archiving is a tricky endeavor.
It is, however, an interesting challenge. I think I would get acquainted with the low-level protocol used by SD cards, then modify a microcontroller SD/MMC driver to get me an image of the card (errors and all), that is, without all the SCSI disk layers doing their best to give you normalized access to the device. Or, more realistically, hope someone more talented than me does the above.
You need to let flash cells rest before writing again if you want to achieve long retention periods, see section 4 in [1]. The same document says 100 years is expected if you only cycle it once a day, 10k times over 20 years (Fig 8).
[1]: https://www.infineon.com/dgdl/Infineon-AN217979_Endurance_an...
I have backed up my blu-ray collection to a dozen or so LTO-6 tapes, and it's worked great, but I have no idea how long the drives are going to last for, and how easy it will be to repair them either.
Granted, the LTO format is probably one of the more popular formats, but articles like this still keep me up at night.
I don't even want to think about the hairy issues associated with keeping the bits able to be interpreted. That's a human behavior problem more than a technology problem.
But your original bluray disk are also a backup.
In 2016 I used an LTO-3 drive to restore a bunch (150 or 200) of LTO-1/2 tapes from 2000-2003, and all but one or two worked fine.
Ahh the business model of "just tell them to send us the tape and we'll buy the drive on eBay"
Their cardinal sin was that they irreparably damaged the tape without prior customer approval.
In cases like this I can imagine some company yelling "copyright infringement" even though they don't possess a copy themselves. It's a really odd situation.
Links for the games referenced:
- Frogger II: ThreeeDeep! (1984)
https://www.mobygames.com/game/7265/frogger-ii-threeedeep/
- Frogger 2: Swampy's Revenge (2000) [1]
https://www.mobygames.com/game/2492/frogger-2-swampys-reveng...
- Frogger 2 (2008) [2]
I'm assuming the use of "cave-at" means the author has inferred an etymology of "caveat" being made up of "cave" and "at", as in: this guarantee has a limit beyond which we cannot keep our promises, if we ever find ourselves AT that point then we're going to CAVE. (As in cave in, meaning give up.) I can't think of any other explanation of the odd punctuation. Really quite charming, I'm sure I've made similar inferences in the past and ended up spelling or pronouncing a word completely wrong until I found out where it really comes from. There's an introverted cosiness to this kind of usage, like someone who has gained a whole load of knowledge and vocabulary from quietly reading books without having someone else around to speak things out loud.
And even various versions of tar aren't compatible, and that's not even starting with star and friends.
Then there's slightly more obscure formats that didn't take off in the western world, and the physical mediums too. Not many people had the pleasure of having to extract hundreds of "GCA" files off of MO disks using obscure Japanese freeware from 2002. The English version of the software even has a bunch of flags on virustotal that the standard one doesn't. And there's obscure LZH compression algorithms that no tool available now can handle.
I've found myself setting up one-time Windows 2000/XP VMs just to access backups made after 2000.
It had a great reputation on Novell Netware but the Windows product was a mess. I never had a piece of backup management software cause blue screens (e.g. kernel panics) before an unfortunate Customer introduced me to ARCServe on Windows.
It seems like it would be easier to process old magnetic tapes by imaging them and then applying signal processing rather than finding working tape drives with functioning rollers. Most of the time, you're not worried about tape speed since you're just doing recovery read rather than read/write operations. So, a slow but accurate operation seems like it would be a boon for these kinds of things.
The presentation explores using software-defined signal processing to analyze a digitized version of the analog signal generated from the flux transitions. It's basically moving the digital portion of the tape drive into software (a lot like software-defined radio). This is also very similar to efforts in floppy disk preservation. Floppy drives are amazingly like tape drives, just with tiny circular tapes.
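A toy version of that software-defined decode step might look like this. It assumes a simplified pulse-interval code (a gap of about half a bit cell between flux transitions reads as 1, a full-cell gap as 0); real drives use RLL codes plus clock recovery, so this is an illustration of the idea, not OnStream's actual encoding:

```python
def decode_flux(timestamps, cell):
    """Toy pulse-interval decoder: classify the gap between successive
    flux-transition timestamps as short (bit 1) or long (bit 0)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [1 if g < 0.75 * cell else 0 for g in gaps]

# Simulated capture with a little analog jitter on the transition times.
capture = [0.00, 0.52, 1.49, 2.01, 2.48, 3.52]
bits = decode_flux(capture, cell=1.0)
```

Everything downstream of this (framing, ECC, block structure) is likewise just software once you have the digitized flux, which is what makes the SDR comparison apt.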
(Completely guessing here with absolute no knowledge of the real state of things)
It’s a little sad that it took such a monumental effort to bring the source code back from the brink of loss. It’s times like that that should inspire lawmakers to void copyright in the case that the copyright holders can’t produce the thing they’re claiming copyright over.
Of course it didn't really make sense to use digital tapes for that use case, even back then. It was just for fun, and the article sparked some nostalgic joy, which felt worth sharing :)
One option was to specify a set of files, and that spec could just be a directory. Once done, the system built a mini filesystem and would write that to tape.
XFS was the filesystem in use at the time I was doing systems level archival.
On restores, each tape, each record was a complete filesystem.
One could do it in place and literally see the whole filesystem build up and change as each record was added. Or, restore to an empty directory and you get whatever was in that record.
That decision was not as information dense as others could be, but it was nice and as easy as it was robust.
What our team did to back up data managed by engineering software was perform a full system backup every week or two, then incrementals every day, written twice to the tape.
Over time, full backups were made and sent off site. One made on a fresh tape, another made on a tape that needed to be cycled out of the system before it aged out. New, fresh tapes entered the cycle every time one aged out.
Restores were done to temp storage and rather than try and get a specific file, it was almost always easier to just restore the whole filesystem and then copy the desired file from there into its home location. The incrementals were not huge usually. Once in a while they got really big due to some maintenance type operation touching a ton of files.
The nifty thing was no real need for a catalog. All one needed was the date to know which tapes were needed.
Given the date, grab the tapes, run a script and go get coffee and then talk to the user needing data recovery to better understand what might be needed. Most of the time the tapes were read and the partial filesystem was sitting there ready to go right about the time those processes completed.
Having each archive, even if it were a single file, contain a filesystem data set was really easy to use and manage. Loved it.
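The "no catalog, just dates" planning described above is simple enough to sketch: pick the most recent full backup on or before the target date, then every incremental between it and the target. (`tapes_needed` is a hypothetical helper for illustration, assuming one tape set per day.)

```python
from datetime import date

def tapes_needed(target, fulls, incrementals):
    """Newest full backup on or before the target date, plus every
    incremental between that full and the target, in order."""
    base = max(d for d in fulls if d <= target)
    incs = sorted(d for d in incrementals if base < d <= target)
    return [base] + incs

# Weekly fulls, daily incrementals, as in the scheme described above.
fulls = [date(2003, 5, 4), date(2003, 5, 11)]
incs = [date(2003, 5, d) for d in range(5, 15)]
plan = tapes_needed(date(2003, 5, 13), fulls, incs)
```

Because each record is a complete filesystem, applying the tapes in `plan` order rebuilds the state as of the target date with no database lookup at all.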
Another tech tip: don't buy two backup devices from the same batch, or even the same model. Chances are they will fail in the same way.
Last time I ever had the same model drives in an array.
Someone is trying to find out with an experiment, however: https://news.ycombinator.com/item?id=35382252
Even the utter trash that is thumb drives and SD cards seems to hold data just fine for many years in actual use.
IIRC, the paper was explicitly about heavily used and abused storage.
I wonder whether a CD-R disc would retain data for these 22 years?
Living in South Florida, I found ambient temperatures alone were enough to erase CD-Rs, typically in less than a year. I quickly started buying the much more expensive "archival" discs, but even that wasn't enough. One fascinating "garage band" sold their music on CD-Rs, and all of my discs died (it was a surfer band from Alabama).
Melting polycarbonate would call for an absurdly powerful laser, a glacial pace, or both, and you wouldn't have to use dye at all. I'd guess such a scheme would be extremely durable, though.
....What do you mean "nobody paid for the bucket for last 5 years" ?
There is some chance someone might stash an old hard drive or tape with a backup somewhere in a closet. There is no chance there will be anything left when someone stops paying for the cloud.
Melted pinch rollers are not uncommon and there are plenty of other (mostly audio) equipment with similar problems and solutions --- dimensions are not absolutely critical and suitable replacements/substitutes are available.
As an aside, I think that prominent "50 Gigabytes" capacity on the tape cartridge, with a small asterisk-note at the bottom saying "Assumes 2:1 compression", should be outlawed as a deceptive marketing practice. It's a good thing HDD and other storage media didn't go down that route.
From 10,000 feet, this sounds suspiciously like ARCserve is reading a single tape block or transfer buffer's worth of data for each file, writing out the result, then failing and proceeding to the next file.
Success popup notwithstanding, I'd expect to find errors in either the ARCserve or Windows event logs in this case — were there none?
While it's been decades since I've dealt with ARCserve specifically, I've seen similar behavior caused by any number of things. Off the top of my head,
(1) Incompatibilities between OS / backup software / HBA driver / tape driver.
In particular, if you're using a version of Windows much newer than Windows 2000, try a newer version of ARCserve.
In the absence of specific guidance, I'd probably start with the second* ARCserve version that officially supports Windows Server 2003:
(a) Server 2003 made changes to the SCSI driver architecture that may not be 100% compatible with older software.
(b) The second release will likely fix any serious Server 2003 feature-related bugs the first compatible version may have shipped with, without needing to install post-release patches that may be hard to find today.
(c) Significantly newer ARCserve versions are more likely to introduce tape drive / tape format incompatibilities of their own.
(2) Backup software or HBA driver settings incompatible with the hardware configuration (e.g., if ARCserve allows it, try reducing the tape drive transfer buffer size or switching from fixed block (= multiple tape blocks per transfer) to variable block (= single tape block per transfer) mode; if using an Adaptec HBA, try increasing the value of /MAXIMUMSGLIST[1]).
(3) Shitty modern HBA driver support for tape (and, more generally, non-disk) devices.
For example, modern Adaptec Windows HBA drivers have trouble with large tape block sizes that AFAIK cannot be resolved with configuration changes (though 32 kB blocks, as likely seen here, should be fine).
In my experience with PCIe SCSI HBAs, LSI adapters are more likely to work with arbitrary non-disk devices and software out-of-the-box, whereas Adaptec HBAs often require registry tweaks for "unusual" circumstances (large transfer sizes; concurrent I/O to >>2 tape devices; using passthrough to support devices that lack Windows drivers, especially older, pre-SCSI 2 devices), assuming they can be made to work at all.
LSI20320IE PCIe adapters are readily available for $50 or less on eBay and, in my experience, work well for most "legacy" applications.
(To be fair to Adaptec, I've had nothing but good experiences using their adapters for "typical" applications: arbitrary disk I/O, tape backup to popular drive types, CD/DVD-R applications not involving concurrent I/O to many targets, etc.)
(4) Misconfigured or otherwise flaky SCSI bus.
In particular, if you're connecting a tape drive with a narrow (50-pin) SCSI interface to a wide (68-pin) port on the HBA, make sure the entire bus, including the unused pins, is properly terminated.
The easiest way to ensure this is to use a standard 68-pin Ultra320 cable with built-in active LVD/SE termination, make sure termination is enabled on the HBA, disabled on the drive, that the opposite end of the cable from the built-in terminator is connected to the HBA, and, ideally, that the 68-to-50-pin adapter you're using to connect the drive to the cable is unterminated.
You can also use a 50-pin cable connected to the HBA through a 68-to-50-pin adapter, but then you're either relying on the drive properly terminating the bus, which it may or may not do, or else you need an additional (50-pin) terminator for the drive end, which will probably cost as much as an Ultra320 cable with built-in termination (because the latter is a bog-standard part that was commonly bundled with both systems and retail HBA kits).
Note that I have seen cases where an incorrect SCSI cable configuration works fine in one application, but fails spectacularly in another, seemingly similar application, or even the same application if the HBA manages to negotiate a faster transfer mode. While this should be far less likely to occur with a modern Ultra160 or Ultra320 HBA, assume nothing until you're certain the bus configuration is to spec (and if you're using an Ultra2 or lower HBA, consider replacing it).
With all that said, reversing the tape format may well be easier than finding a compatible OS / ARCserve / driver / HBA combination.
In any case, good job with that, and thanks for publishing source code!
[1] http://download.adaptec.com/pdfs/readme/relnotes_29320lpe.pd...
See my profile for contact information.
[1] 9-track reel, IBM 3570, IBM 3590, early IBM 3592, early LTO, DLT ranging from TK-50 to DLT8000.
[2] IBM 3480/3490/3490E, most 4mm and 8mm formats, most full-sized QIC formats including HP 9144/9145, several QIC MC/Travan drives with floppy controllers of some description, a Benchmark DLT1 assuming it still works, probably a few others I'm forgetting about.
Or in a few years, just have an AI write the code...
Hahaha awwwww yeah :muscle: