If you weren’t budget constrained today and had to set it all up again, what would you do?
While I’m a Linux guy, I’ll happily run BSDs when appropriate, like for pfSense, and if it really has better mt tools or drivers for LTO-9 drives due to the culture/contributors being more old school, then I’d just grab a 1U server to dedicate to it, run a BSD on it, and attach the drive to that.
You seem to have extensive practical hands-on experience, and while I was doing tapes 20 years ago, this will be the first time I’m hands-on with it again since then. So I need to research the most reliable drive vendors and the state of kernel drivers and tools, just as you are alluding to.
Pretend you have $50K if needed (doubt it). 2PB of existing data, a targeted rate of 1PB/year, probably 10-20%/year acceleration on that rate. There is a data center rack location with a 20Gb/s interconnect via bonded 10Gb NICs to the storage servers (45Drives Storinators), and then an office center cabinet/rack/desk (your choice). I will put a tape drive holding at least 8 tapes in the data center, planning for a worst case of 100TB a month, and data center visits to swap in new tapes shouldn’t be too frequent. Any details on what you would do would be interesting.
Around $5000 to $6000 should be enough for an LTO-9 tabletop tape drive plus a suitable SAS HBA card and SAS cable. The card must match the tape drive in both SAS connectors and SAS speed.
More money will not bring anything extra until a much higher amount is reached, which would be enough to buy a tape autoloader/library, which would eliminate the need for a human to insert cartridges into the tape drive and remove them when needed. I am not sure if $50K is enough for a tape autoloader.
Tape autoloaders/libraries are worthwhile only for very big organizations where the amount of data that is continuously written or read to or from the tapes is very large. For a small business or for an individual a tape autoloader is certainly not worthwhile, because the tape drive will be in use at most a small fraction of every day.
1 PB/year is less than 3 TB/day. This can be written on a single tape in a little more than 2 hours. Even with a simple non-pipelined implementation of the file uploading with the writing on the tape, the backup can be done in less than 4 hours. Even writing 2 copies can be done in less than 8 hours. The backup can be done mostly or completely overnight.
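As a rough check of this arithmetic, here is a minimal sketch. It assumes a sustained native write speed of 400 MB/s, which is the nominal figure for a full-height LTO-9 drive with incompressible data; the function names are only illustrative, and real throughput depends on the drive model and on how fast the server can feed it.

```python
# Rough write-time estimate for the daily backup volume.
# Assumption: 400 MB/s sustained native write speed (nominal full-height
# LTO-9 figure for incompressible data); adjust for the actual drive.

PB = 10**15
TB = 10**12
MB = 10**6


def daily_volume_bytes(yearly_bytes: float) -> float:
    """Average number of bytes per day for a given yearly volume."""
    return yearly_bytes / 365


def write_hours(volume_bytes: float, drive_bytes_per_s: float = 400 * MB) -> float:
    """Hours needed to stream `volume_bytes` to tape at a constant rate."""
    return volume_bytes / drive_bytes_per_s / 3600


if __name__ == "__main__":
    per_day = daily_volume_bytes(1 * PB)
    print(f"daily volume: {per_day / TB:.2f} TB")
    print(f"write time for the daily volume: {write_hours(per_day):.2f} h")
    print(f"write time for 3 TB: {write_hours(3 * TB):.2f} h")
```

Under these assumptions the daily volume is about 2.74 TB, and writing a full 3 TB takes about 2.1 hours, which matches the "little more than 2 hours" above. Staging the files first and writing them afterwards roughly doubles that.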
For a much bigger amount of data one could buy several tape drives, before starting to think about an autoloader. Also it is possible to pipeline the network transfers with the tape writing, for a backup speed higher by around 50%.
If money were not a problem and if the data needs to be archived for the long term, so that multiple copies are desirable, I would buy 2 tape drives, to be able to write 2 copies simultaneously.
This would also halve the time for archiving the initial 2 PB of existing data, which will take several months, so a speed-up would be desirable. Having 2 drives will also increase the reliability, as the system will continue to work if one becomes defective.
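To put a number on "several months", here is a small sketch under stated assumptions: 400 MB/s per drive (nominal full-height LTO-9 native speed), incompressible data, and a hypothetical `hours_per_day` parameter for how long the drives are actually kept busy each day. None of these values come from a measurement.

```python
# Estimate how long the initial archive takes.
# Assumptions: 400 MB/s per drive, incompressible data, and the data split
# evenly between the drives. `hours_per_day` is a hypothetical duty cycle.

def archive_days(total_tb: float, drives: int = 1,
                 hours_per_day: float = 24.0,
                 drive_mb_per_s: float = 400.0) -> float:
    """Calendar days needed to write `total_tb` terabytes once."""
    seconds = total_tb * 1e12 / (drives * drive_mb_per_s * 1e6)
    return seconds / 3600 / hours_per_day


if __name__ == "__main__":
    for drives in (1, 2):
        for hours in (24.0, 8.0):
            days = archive_days(2000, drives=drives, hours_per_day=hours)
            print(f"{drives} drive(s), {hours:.0f} h/day: {days:.0f} days")
```

With one drive writing around the clock, 2 PB takes about 58 days; at 8 hours per day it is closer to half a year. A second drive halves either figure, as long as each drive writes a different half of the data rather than a second copy.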
With only 3 TB written per day, an LTO-9 tape, which has a capacity of 18 TB, will be enough for 6 days.
So unless a backup must be restored, the operator would need to change the tape only once per week.
This is a moderate amount of data, easy to handle with a single drive, even if two are preferable for redundancy and for higher speed.
I do not understand your reference to "a tape drive holding at least 8 tapes in data center". If you mean an autoloader, from what you describe it does not seem that the very big expense for an autoloader would be justified.
The LTO tapes are best stored in suitcases that can contain 20 cartridges, i.e. when using LTO-9 that is 360 TB. Therefore 3 suitcases store more than 1 PB, i.e. a year of data according to your example. The suitcases should be stored in a secure safe or cabinet. They are usually made to be stackable.
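The tape and case counts above can be reproduced with a short sketch. It assumes 18 TB of native capacity per LTO-9 cartridge, incompressible data, and 20 cartridges per case, as stated above; the function names are only illustrative.

```python
import math

TAPE_TB = 18      # native LTO-9 capacity, incompressible data assumed
CASE_SLOTS = 20   # cartridges per case, as described above


def days_per_tape(daily_tb: float, tape_tb: float = TAPE_TB) -> float:
    """How many days of backups fit on one cartridge."""
    return tape_tb / daily_tb


def tapes_needed(total_tb: float, tape_tb: float = TAPE_TB) -> int:
    """Cartridges needed for one copy of `total_tb` terabytes."""
    return math.ceil(total_tb / tape_tb)


def cases_needed(tapes: int, slots: int = CASE_SLOTS) -> int:
    """Storage cases needed for a given number of cartridges."""
    return math.ceil(tapes / slots)


if __name__ == "__main__":
    print("days per tape at 3 TB/day:", days_per_tape(3))
    yearly_tapes = tapes_needed(1000)  # 1 PB per year
    print("tapes per year:", yearly_tapes)
    print("cases per year:", cases_needed(yearly_tapes))
```

One copy of a year's data is 56 cartridges, which fills 3 cases with a few slots to spare. A second copy doubles both numbers.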
I have assumed that your 1 PB is of already compressed data. If the data is compressible, then the requirements for the usage time of the drives and for the storage volume would be much smaller.
I have forgotten to mention that after I compress and encrypt the archived files, I add redundancy with a Reed-Solomon code, e.g. with the par2 program. If I choose e.g. a redundancy of 5%, then a file retrieved from the magnetic tape could have defects of up to 5% of its size, while the original data could still be extracted from it.
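For illustration, here is a minimal sketch of how such a step could be scripted around par2cmdline. It assumes the `par2` binary is installed and on the PATH, and that the archive has already been compressed and encrypted; the helper names are hypothetical, and the 5% default simply mirrors the example above.

```python
import shutil
import subprocess
from pathlib import Path


def par2_create_command(archive: Path, redundancy_percent: int = 5) -> list[str]:
    """Build the par2cmdline invocation that creates recovery files."""
    if not 1 <= redundancy_percent <= 100:
        raise ValueError("redundancy_percent must be between 1 and 100")
    # `par2 create -r<n> <par2 file> <files>`: -r sets the redundancy in percent.
    return ["par2", "create", f"-r{redundancy_percent}",
            f"{archive}.par2", str(archive)]


def add_redundancy(archive: Path, redundancy_percent: int = 5) -> None:
    """Run par2 on an already compressed and encrypted archive file."""
    if shutil.which("par2") is None:
        raise RuntimeError("par2 not found on PATH")
    subprocess.run(par2_create_command(archive, redundancy_percent), check=True)
```

After a restore, `par2 verify` and `par2 repair` on the same `.par2` file check and, if needed, reconstruct the archive from the recovery blocks.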
- understood your rebooting trick. however, being fully automated (apart from blank tape rotations) is also a requirement. it’s a production infrastructure. if FreeBSD provides significant value, it seems safer to spec a dedicated 1U server to use for backups. there is a management node currently that might work, though it has to keep running Linux as it currently does, and I need to check if the SAS on it can be used. It has a bunch of SAS SSD drives currently, and I would have assumed there is a way to cable up the Qualstar drive … but again, I’m still early in researching. and the SAS compatibility issue you raise is a perfect example of the stuff I need to figure out.
- love par2cmdline. our burner with M-DISC for IP backup uses that on git repo files, and then seqbox as an outer container for the data to guard against potential fs metadata corruption issues. there was a newer low-level tool (a Rust rewrite, I think) with many bitrot protection features whose name I can’t recall right now and which isn’t immediately coming up in my notes, but I know it exists and have been meaning to look into it. it has a newer erasure coding like RaptorQ and also block metadata like seqbox, so I think it can replace the par2 + seqbox combo we are currently using on the M-DISC physical backup for IP. I don’t trust going 100% cloud, as one can imagine somehow getting all accounts hacked and deleted.
- yes on compressed. the 2PB is already highly highly compressed. so it means 18TB/tape.
Do you have any vendors/distributors you can recommend? I always recommend 45Drives to people, and I was planning to ask them about LTO when we order our next Storinator, which is coming up soon as well.
There is this interesting blog post from a couple of years ago that probably was the seed of my plan to embark on LTO. Our monthly backblaze invoice is totally out of control. But we need a full backup of our data as it’s simply not replaceable and at the heart of the business.
https://blog.benjojo.co.uk/post/lto-tape-backups-for-linux-n...
they talked me into it. not out of it given our specific situation.
If you use the full configuration with 2 tape drives, the cost of the system might be around $15k, which is very reasonable for a tape library with autoloader.
I think that this autoloader is a good choice, especially if the price includes "1 x IBM LTO-9 SAS Tape Drive Installed".
As I have said, I believe that it is better to choose the option of also including the second tape drive.
For the tapes, there is no reason to worry about specific distributors. I have always bought them from Amazon, but shops that are specialized in storage products should be OK, unless they charge a premium price over what can be found at Amazon or Newegg. While the tapes are made by Fuji or Sony, they are usually easier to find, and at lower prices, as IBM-, HP- or Quantum-branded tapes.
The prices vary, so whichever vendor is cheaper when you buy a batch of tapes should be fine. An LTO-9 cartridge should be only slightly over $100. In time the prices of LTO-9 cartridges should drop. For now they are more expensive than the older cartridges, because they are still relatively new.
I store the tapes in Turtle cases:
https://turtlecase.com/collections/lto
You must check the tape drive requirements for the SAS HBA PCIe card that must be installed in the server, which must have compatible connectors, and you must buy an appropriate SAS cable. I believe that the LTO-9 drives require the newer 12 Gb/s SAS standard and also the newer variant of the external SAS connectors (perhaps SAS HD SFF-8644 connectors).
If you already have a 12 Gb/s SAS HBA that has only internal connectors for SSDs, it is possible to reuse it by buying a SAS internal to external adapter of the appropriate connector types, which must occupy one of the empty expansion slots of the server case and which plugs into the internal connectors, while providing external connectors. Such adapters can also be used with server motherboards that have on-board SAS controllers. If you have a SAS HBA card that has external connectors, but different from those on the tape drive, e.g. SAS SFF-8088, there are cables with mixed SAS connectors that can connect the tape drives. The HBA cards usually have at least 2 external SAS connectors, suitable for 2 tape drives.
With the autoloader, it should be easy to make the backup or retrieval process completely automatic, so that an operator should not have to visit the tape autoloader more often than every few months. The exception is the initial phase, when you would have to write 2 PB on almost 120 tapes, or twice as many for improved redundancy beyond the redundancy added to each archive file. The 2 copies can be stored in 2 different geographic locations, to avoid the catastrophic loss of all tapes. For that initial phase you would want to keep the tape autoloader in an easily accessible place.
The initial cost for writing 2 copies of 2 PB of data, i.e. 4 PB of data, would be not much less than $30k for the tapes. This, together with the autoloader with 2 tape drives, HBA card, cases, cables and maybe adapters, would be in the range of $45k to $50k, so within your estimated budget.
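The tape part of this estimate can be sketched as follows. The price of $130 per cartridge is a hypothetical placeholder chosen only to illustrate "slightly over $100"; actual prices vary by vendor and over time. The 18 TB per tape assumes incompressible data.

```python
import math


def tape_budget(total_tb: float, copies: int = 2, tape_tb: float = 18.0,
                price_per_tape: float = 130.0) -> tuple[int, float]:
    """Return (number of cartridges, total tape cost in dollars).

    `price_per_tape` is a hypothetical figure; check current prices.
    """
    tapes = math.ceil(total_tb / tape_tb) * copies
    return tapes, tapes * price_per_tape


if __name__ == "__main__":
    tapes, cost = tape_budget(2000, copies=2)  # 2 copies of 2 PB
    print(f"{tapes} tapes, about ${cost:,.0f}")
```

With these inputs the result is 224 cartridges and a little over $29k, in line with the figure above. Adding the roughly $15k for the autoloader with 2 drives gives the $45k to $50k range.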
As I have said, it is convenient to have a database with the metadata of all the files that have ever been archived, including content hashes made e.g. with BLAKE2b-512 or with BLAKE3-256. The database is consulted whenever information must be retrieved, and it can also be used for deduplication: the content hashes make it easy to check whether a file is already present in some earlier archive, so that it does not need to be backed up again.
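A minimal sketch of such a catalog, using only the Python standard library, could look like this. The table layout, the function names and the tape labels are hypothetical and are not the schema I actually use. BLAKE2b-512 is used because it is available in `hashlib`, while BLAKE3 would need a third-party package.

```python
import hashlib
import sqlite3
from pathlib import Path

# Hypothetical schema: one row per archived file copy.
SCHEMA = """
CREATE TABLE IF NOT EXISTS archived_files (
    content_hash TEXT NOT NULL,
    path         TEXT NOT NULL,
    size_bytes   INTEGER NOT NULL,
    tape_label   TEXT NOT NULL,
    PRIMARY KEY (content_hash, path, tape_label)
)
"""


def blake2b_512(path, chunk_size: int = 1 << 20) -> str:
    """Hex BLAKE2b digest of a file (default digest size is 64 bytes)."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def open_catalog(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the catalog database and create the table if it is missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def already_archived(conn: sqlite3.Connection, content_hash: str) -> bool:
    """True if some earlier archive already holds this content."""
    row = conn.execute(
        "SELECT 1 FROM archived_files WHERE content_hash = ? LIMIT 1",
        (content_hash,),
    ).fetchone()
    return row is not None


def record_file(conn: sqlite3.Connection, path, tape_label: str) -> bool:
    """Record a file for backup; return False if its content is a duplicate."""
    digest = blake2b_512(path)
    if already_archived(conn, digest):
        return False
    conn.execute(
        "INSERT INTO archived_files VALUES (?, ?, ?, ?)",
        (digest, str(path), Path(path).stat().st_size, tape_label),
    )
    conn.commit()
    return True
```

A backup script would call `record_file` for every candidate file and only write those for which it returns `True`. For retrieval, a query on `path` or `content_hash` gives the label of the tape to load.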