Ok, I really wasn't expecting this to land at the top of HN. I'd love to stick around to answer any questions people have, but it's 10PM and my toddler decided to go to bed at 5PM... so if I'm lucky I can get about 4 hours of sleep before she decides that it's time to get up. I'll check in and answer questions in the morning.
God bless you Colin, but reading this, it appears you're the only one in charge of the infrastructure for this service. I'm glad you're clear about no SLA, but this seems like a big liability between me and my backups.
I know you didn’t ask me, but I don’t think Colin can answer that any way other than saying he is training a family member or friend to take over if needed.
Here’s more: https://news.ycombinator.com/item?id=7514753 (this is also linked there: http://mail.tarsnap.com/tarsnap-users/msg00846.html)
Very old threads, but I am not sure much has changed there: https://www.tarsnap.com/contact.html
Why would you use it instead of restic? Well, for pricing in picodollars ;-)
And because it has a functional GUI with a tiny system footprint, and there really aren’t many such solutions out there.
Hence the toddler.
(FWIW, S3 can be somewhat straightforwardly configured so that old data is effectively immutable. Google Cloud Storage’s similarly named versioning feature appears to be far weaker.)
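One way to get that effective immutability (a minimal sketch; the bucket name is a placeholder, and an account root user could still remove the policy, hence "effectively" immutable): enable bucket versioning, then attach a bucket policy denying s3:DeleteObjectVersion, so deletes and overwrites only add new versions instead of destroying old data. The policy document itself is just JSON:

```python
import json

def immutable_versions_policy(bucket: str) -> dict:
    """Bucket policy denying permanent deletion of old object versions.

    With bucket versioning enabled, overwrites and deletes then only
    create new versions; old data stays recoverable. You would apply
    this with S3's PutBucketPolicy call (e.g. boto3's put_bucket_policy).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyVersionDeletion",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:DeleteObjectVersion",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }

print(json.dumps(immutable_versions_policy("my-backup-bucket"), indent=2))
```

S3 Object Lock is the heavier-duty option for the same goal, but it has to be enabled on the bucket itself.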
Wasabi does $7/TB with no ingress/egress fees. My NAS is set up to rclone to it about once a day and I've yet to have any problems
A lot of 'lessons learned' analysis boils down to this: in order to prevent a recurrence of X, we introduced complex subsystem Y, the unexpected effects of which you can read about in our next post-mortem.
"Our simple model that fails gracefully did so and was simple to recover"
Redundancies and failsafes are not free - they add complexity.
99.9% availability fails in boring ways.
99.999% availability fails in fascinating ways.
The main lesson learned was "rehearse this process at least once a year".
> at the present time it is possible — but quite unlikely — that a hardware failure would result in the Tarsnap service becoming unavailable until a new EC2 instance can be launched and the Tarsnap server code can be restarted ... So far such an outage has never occurred
I read the postmortem as saying that a hardware failure did cause it to be unavailable and the code could not be restarted; a new server had to be built.
If that is correct, then as well as writing up the lessons learned (as Jacques mentions), this page could be updated with outage information -- or even info on changes made to reduce the risk of repetition.
For what it's worth, one outage of a single day in fifteen years is impressive. If my ballpark math is correct, that's roughly 99.98% uptime, ie between three and four nines.
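Checking the ballpark (a quick sketch, assuming a single 24-hour outage over 15 years of operation):

```python
# Availability for a single 24-hour outage over ~15 years of operation.
hours_total = 15 * 365 * 24   # 131,400 hours
hours_down = 24
uptime = 1 - hours_down / hours_total
print(f"{uptime:.4%}")        # ~99.98%, i.e. between three and four nines
```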
Have been having some luck reading https://www.amazon.com/No-Cry-Sleep-Solution-Toddlers-Presch... - available everywhere libraries (blockbuster for books!) are found.
I too had a few EC2 instances go down with signs of being severed from EBS in the past couple of weeks; mine were in eu-west.
- Set up nightly automatic snapshots of EBS volumes (this is supported natively in AWS now under Data Lifecycle Manager).
- Use EBS volumes of the new GP3 type, and perhaps use provisioned IOPS.
- Set up an auto-scaling group with automatic failover. This increases cost, of course, but it should be able to fail over automatically to a standby EC2 instance (assuming all the code handles this automatically, which the blog post indicates is not currently the case).
What prevents you from distributing load among other regions?
(Also: did you ever think about abandoning AWS?)
- The use of “I” raises the question: what’s the “bus factor” of Tarsnap? If you were unavailable, temporarily or permanently, what are the contingency plans?
- Will you be making any other changes to improve the recovery time, or did the system mostly function as designed? For example having a hot spare central server?
This speaks volumes to me about what kind of person Percival is; that credit would appear to be generously on the "make customer whole" side of the fence, and unlike the major cloud providers, he didn't make each customer come and individually grovel for it. And a clearly written, technical, detailed PM, too. This is how it ought to be done, and done everywhere. Thanks for being a beacon of light in the dark.
That's well put.
It makes me very happy to live in a world where tarsnap exists and is priced in picodollars.
Also, I would suggest thinking about the business long term and seeing if you can increase revenue enough to hire a part-timer, who could be a great help if a similar event happens again.
We are also a small cloud solution provider (we focus on ML APIs), and over the years it has become clear to us that when you use cloud hardware (either dedicated or virtual), outages happen from time to time. RAM, disks, or other parts of the hardware can malfunction at any moment. So this is something which 100% needs to be taken into consideration when running any high-availability online service over the long term.
For example in both trains and cars, thanks to anti-lock braking, the correct way to stop the vehicle ASAP is to brake just like normal but as hard as you can, the computers will automatically solve the much trickier problem of turning your input into maximum deliverable braking force by periodically releasing brakes on sticking wheels.
If you run a fire drill, it's surprisingly difficult to get employees to use fire doors that they're used to finding alarmed and unusable. Even though intellectually they know that, say, the door at the bottom of the stairwell is a fire door, with crash bars and leads directly to the outside world, and this is a fire drill, they are likely to (for example) exit on a higher floor and go through a chokepoint lobby, as they would normally, instead of following this safer path that is emergency only. Sadly it is hard to fix buildings after construction if they were designed with such "unused" emergency exits.
For a backup process, having restoring machine images be a service that is sometimes, though not constantly, used anyway for some other reason is a good way to be comfortable with how it works, that it works, etc. At work, for example, we routinely test upgrades on test servers restored from a recent backup. Restore serviceA to testA, apply the upgrade, discover the upgrade completely ruins the service, throw testA away and report that this upgrade is garbage. But in the process we gained confidence in the restore process: when things go badly wrong, infrastructure people aren't trying to recall something they only ever did in a drill; they're very used to the procedure because they do it "all the time".
Rehearsing this annually is definitely going to be a high priority.
I personally would go with the simpler solution because in my experience you need an awful lot of extra complexity before you get to the same level of reliability that you have with the simpler system. Most complexity is just making things worse.
You can see this clearly when it comes to clustering servers. A single server with a solid power supply and network hookup will be more reliable than any attempt at making that service redundant until you get to something like 5x more costly and complex. Then maybe you'll have the same MTBF as you had with the single server. Beyond that you can have actual improvements. YMMV, and you may be able to get better reliability at the same level of performance in some cases, but on average it always first gets far more complex, costly and fragile before you see any real improvements.
I strongly believe that the best path to real reliability is simplicity (which is: as simple as possible) and good backups. For stuff that needs to be available 24x7 and 365 days per year this limits your choices in available technologies considerably.
This is Colin's job. Colin has his name attached to it. It's really important to Colin.
You're not going to get the same kind of service from BigBackupCorp. Their employees are replaceable, their management is replaceable, and to be honest, you as a customer are replaceable, if they decide to move in a different direction and become BigFlowerArrangementShippingCorp.
The neat thing about a small business is that it runs entirely on its own profits. There are no stock price games or VC jiggery-pokery or anything like that. If it's a profitable business, there will be somebody to come along and take it over and make it their job with their name attached to it. I think the open Internet benefits a lot from this sort of thing.
They should take separate buses to ______.
Better to have multiple layers of backup, of which tarsnap and friends are only one, and verify regularly.
Recommend writing a TLA+ model to catch stuff like this
(People here asking about the low Bus Factor: you don't keep your backups in one service/location, eh? You use Tarsnap and Restic with Backblaze, Rsync.net, S3, etc. right? "Backups are a tax you pay for the luxury of restore.")
I have been using Tarsnap for a decade, and not only have there been minimal availability issues, there have been almost no issues of any kind that I can recall.
>> So far such an outage has never occurred; but over time Tarsnap will become more tolerant of failures in order to minimize the probability that such an outage occurs in the future.
Neglecting the pricing, does Tarsnap have any advantage over Restic?
Restic also deduplicates, so it uses little storage.
I mean.. you could purchase a cheaper service and also donate to various efforts. Bonus: Then you'd also be able to pick those efforts.
Tarsnap makes a lot of sense when you benefit from the encryption and (especially) de-duplication features that it offers. For me, all of my most important personal and business data, from multiple decades, compresses-and-deduplicates down to around 6GiB. Considering the high value of the data I store in it, tarsnap's pricing actually feels absurdly low.
Can you provide more detail why you think so? I don't believe there is any use case in which tarsnap makes sense, other than maybe some Plan-C backup solution which you fall back on in the highly unlikely event that neither Plan-A nor Plan-B worked.
Concretely, what benefits does tarsnap offer over restic or borg in combination with rsync.net, to make up for the substantial downsides (such as insanely slow restore, complete lack of wetware redundancy or being written in C[1])?
Tarsnap : $0.25 / GB storage, $0.25 / GB bandwidth cost
rsync.net : $0.015 / GB storage, no bandwidth cost
s3 : $0.023 / GB storage, some complicated bandwidth pricing
If tarsnap is built on top of s3, they're charging 10 times for the storage cost. Easy money from the uninformed?
Tarsnap is a wonderful piece of software. You're paying for that.
That said, is the value of "Tarsnap" worth the price difference from "Borg+rsync.net"? (Or Restic, I've been meaning to look into Restic). I'm not so sure. These days I'm a customer of rsync.net, not of Tarsnap.
But I still firmly disagree with the "Colin's just exploiting the uninformed" angle.
Geez, that's really not improving the comparison with Tarsnap.
Backblaze: $0.005 / GB storage, $0.01 / GB download.
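Quick arithmetic with the per-GB storage rates quoted in this thread, for a hypothetical 100 GB of stored data (storage only, ignoring bandwidth, and ignoring that Tarsnap bills on post-compression, post-deduplication bytes, which can shrink the effective rate dramatically):

```python
# Monthly storage cost for 100 GB at the per-GB rates quoted in this thread.
rates_per_gb = {
    "tarsnap":   0.25,
    "rsync.net": 0.015,
    "s3":        0.023,
    "backblaze": 0.005,
}
gb_stored = 100
for service, rate in rates_per_gb.items():
    print(f"{service:10s} ${gb_stored * rate:6.2f}/month")
```

So at face value Tarsnap is 10-50x the raw storage price, but for data that deduplicates well, the gap narrows considerably.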
I don't think so. Anyone who can use this software I'm sure knows what other options exist.
The 120GB is the contents of my OneDrive and local repository trees. This is everything I've ever done that I want to keep, and it is approximately 115GB of photos and not a lot else!
That's pretty much any SaaS... look at the various log or metrics gathering solutions, where you pay serious multipliers of what it would cost to run the same software on your own instance.
I've been using Tarsnap for 10+ years. There's some Linux stuff getting backed up, configs and such. It costs next to nothing for this kind of usage.
While on the price: patio11 (Patrick) wrote an article about tarsnap's issues more than nine years ago (April 2014). One of the suggestions was to raise prices, IIRC. It's a long post, but you can read it [1] and the HN post [2] from that time.
[1] In case of an emergency, you will always be able to get back your data from tarsnap at a blazing rate of 50kB/s https://github.com/Tarsnap/tarsnap/issues/333.
How many of the world's best and brightest are doing all sorts of busywork? At least Colin has some time to do whatever he wants to do while running tarsnap.
Colin, could the website be updated to the 2010s? :P
This is entirely Safari's fault for not having good compatibility with a common existing webpage format.
Anyway, if you're the intended audience (someone using tarsnap), you also received a copy to your email address, where you can read the text with your email reader of choice.
<p> is far more appropriate
That isn’t apple’s problem, nor mine.
It’s not impossible to read, and it’s likely just a fault of whatever mailing-list software is used, but it could be better, and it’s nice if people let you know, right?
I assumed the parent did not know how to do that; I tried locally and it seemed to work, but I did not pay attention to the text.
original:
On the left side of the URL input field you'll find "AA" (the first letter smaller than the second); tap that.
Then, near the bottom of the pop-up menu, you'll find "Show Reader"; tap that.
If you're not happy with the text as displayed, you can go back to the "AA" menu and change the options.
Far be it from me to tell anyone how to write software, but why build a database on top of S3 when you can just chuck the metadata into RDS with however much replication you want?
The backups themselves should be in S3, but using S3 as a NoSQL append-only database seems unwise.
This would benefit from being further from the metal.
On a less technical note: Always avoid the fancy option when it makes sense. (From a veteran of building and maintaining large scale high performance high availability systems)
S3 is not the problem here. The problem is building a database on top of S3, and having to reimplement all the consistency, atomicity, transactions etc. on top.
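A minimal sketch of the kind of machinery that has to be rebuilt: an append-only commit log with optimistic concurrency, done via conditional "create only if absent" writes (real S3 gained this via the If-None-Match header on PUT; the in-memory class below merely stands in for it). This is essentially the put-if-absent pattern libraries like Delta Lake use for their commit protocol:

```python
class ObjectStore:
    """Toy stand-in for S3 with a conditional 'create if absent' PUT."""

    def __init__(self):
        self._objects = {}

    def put_if_absent(self, key: str, body: str) -> bool:
        # Atomic "create only if the key does not already exist":
        # this is the primitive that makes the commit log safe.
        if key in self._objects:
            return False
        self._objects[key] = body
        return True


def commit(store: ObjectStore, payload: str, max_retries: int = 10) -> str:
    """Append payload as the next numbered log entry, retrying on collision."""
    for _ in range(max_retries):
        # A real implementation would track the log tail instead of scanning.
        next_id = sum(1 for k in store._objects if k.startswith("log/"))
        key = f"log/{next_id:010d}.json"
        if store.put_if_absent(key, payload):
            return key  # we won the race for this slot
        # Another writer claimed the slot first; re-read and try the next one.
    raise RuntimeError("too much write contention")
```

Two writers racing for the same slot means one PUT fails and retries at the next index; readers only ever see fully written commit objects, which is what gives atomic, ordered commits on top of plain object storage. Everything beyond this sketch (transactions, checkpointing, compaction) is more code you now own instead of Postgres.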
>no thought to a schema, no migrations to manage
There is, in fact, always a schema. Some people just choose to ignore that it's there, to their detriment.
>Always avoid the fancy option when it makes sense.
It's not the 1980s. Postgres is not fancy, and Greenspunning it is a mistake.
>Almost guaranteed it's cheaper.
Cheaper than a 26-hour outage?
Cost and reliability?
* Using S3 as a simple database is generally going to be much cheaper than RDS.
* If you turn on point in time restore, then losing data stored in S3 is not a possibility worth worrying about on a practical level for most people. RDS replication is easy enough to use, but adds more cost and a little bit of extra infra complexity.
It's a bad trade. Thousands of hours of a high human capital computer scientist vs. a few tens of dollars a month for RDS.
>Reliability
Empirically false: none of this would have happened if Tarsnap used Postgres instead of a home-spun database.
There are client libraries, like Delta Lake, that implement ACID on S3.
Much of the Grafana stack uses S3 for storage (Mimir/metrics, Loki/logs, Tempo/traces).
That said, I'm not sure about the implementation Tarsnap uses--whether it's completely ad hoc or based on other patterns/libraries.
How, exactly, is that a good thing?