User: "I want to serve 5TB."
Guru: "Throw it in a GKE PV and put nginx in front of it."
Congratulations, you are already serving 5TB at production scale.
Medium- and smaller-scale operations can often be more flexible because they don't suffer the pain that nonuniformity brings as scale increases. While they may not be able to afford the optimizations or discounts that come with larger, standardized purchases, they can provide personalized service that large-scale operations cannot hope to match.
Eventually, all outages are black swan events. If you have 1000 independent instances (i.e., 1000 customers), when the unexpected thing hits, you’re still 99.9% available during the time when the impacted instance is down.
Also, you can probably permanently prevent the black swan from hitting again before it hits again.
For rock-solid public hosting, Cloudflare is probably a much better bet, but you're also paying about 7 times the price. It's more than a dedicated server to host the files, but you get more on other metrics.
* based on fair use
at 250 TB/mo:
> In order to continue hosting your servers with us, the traffic use will need to be drastically reduced. Please check your servers and confirm what is using so much traffic, making sure it is nothing abusive, and then find ways of reducing it.
At scale, you'll pay a couple thousand dollars for Class B operations on R2, and another bunch for storing the 10 TB in the first place, but that's relatively cheap compared to other offerings where you'd pay for metered egress bandwidth.
https://developers.cloudflare.com/r2/pricing/ https://r2-calculator.cloudflare.com/
Support seems nonexistent, as no one answers emails or web chat…
I suspect the storage will be a bigger concern.
Seedhost.eu has dedicated boxes with 8TB storage and 100TB bandwidth for €30/month. Perhaps you could have that and a lower spec one to make up the space.
Prices are negotiable so you can always see if they can meet your needs for cheaper than two separate boxes.
Though be aware that many (most?) seedbox arrangements have no redundancy; in fact, some run off RAID0 arrays or similar. If the host has a problem like a dead drive, bang goes your data. Some are very open about this (after all, for the main use case cheap space is worth the risk), others far less so…
Of course, if the data is well backed up elsewhere, or is otherwise easy to reproduce or reobtain, this may not be a massive issue; you've just got restore time to worry about (unless one of your backups can quickly be made primary, in which case restoring is little more than a bit of DNS and other configuration work).
It's kind of like someone going to a group of doctors and saying "I'm in pain", and then the doctors start throwing out reasons the person may be in pain and solutions to that pain.
Sure, there may be some interesting ideas there, but it doesn't really do OP any good without describing where the pain is, when it started, if they have any other known conditions, etc. etc.
I know you think you were helping with this comment, but you really weren't.
Throw that in RAID10 and you'll have 12TB usable space with > 300TB bandwidth.
But if splitting your traffic across multiple servers is possible, you can also get the €20 Storage Box and put a couple of Hetzner Cloud servers with a caching reverse proxy in front (that's like 10 lines of Nginx config; see the sketch below). The cheapest Hetzner Cloud option is the CAX11 with 4GB RAM, 40GB SSD and 20TB traffic for €3.79. Six of those plus the Storage Box gives you the traffic you need, lots of bandwidth for usage peaks, SSD cache for frequently requested files, and easily upgradable storage in the Storage Box, all for about €42. It also scales well at €3.79 for every additional 20TB of traffic, or €1/TB if you forget and just pay the excess-traffic fees instead.
You will be babysitting this more than the $150/month Cloudflare solution, but even if you factor in the cost of your time you should come out ahead.
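For illustration, a minimal sketch of what that caching proxy config could look like (not a drop-in config: the hostnames, cache size, and TTL are placeholders, and it assumes the files are already reachable over HTTP(S), e.g. via the Storage Box or a small origin in front of it):

    # Goes in the http context, e.g. /etc/nginx/conf.d/cache.conf
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=files:50m
                     max_size=30g inactive=7d use_temp_path=off;

    server {
        listen 80;
        server_name dl.example.com;

        location / {
            proxy_pass https://files.example.com;          # hypothetical origin
            proxy_cache files;
            proxy_cache_valid 200 7d;                      # keep good responses a week
            proxy_cache_use_stale error timeout updating;  # serve stale on origin hiccups
            add_header X-Cache-Status $upstream_cache_status;
        }
    }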
There's always a limit. It might be measured in TB, PB, or EB, and it may or may not be what you'd consider practical, but it's there.
(No affiliation with either; just a happy customer for a tiny personal project)
They have very reasonably priced KVM instances with unmetered 1G (10G for long-standing customers) bandwidth that you can attach “storage slabs” up to 10TB ($5 per TB/mo). Doubt you will find better value than this for block storage.
One multi-vendor, zero-knowledge, HA solution is https://tahoe-lafs.org/trac/tahoe-lafs
It's like Dropbox, except peer-to-peer. So it's free, limited only by your client-side storage.
The catch is that it's only peer-to-peer (unless they've added a managed option), so at least one other peer must be online for sync to take place.
See: https://wasabi.com/cloud-storage-pricing/#cost-estimates
They could really do with making the bandwidth option on this calculator better.
https://wasabi.com/paygo-pricing-faq/#minimum-storage-durati...
Wasabi isn't meant for scenarios where you're going to be transferring more than you're storing.
That is, yourdomain.com -> IP_ISP1, IP_ISP2.
Going the other way from yourserver -> outside would indicate some sort of bonding setup.
It is not trivial for a home lab.
I use 3 ISPs at home and just keep each network separate (different hardware on each) even though in theory the redundancy would be nice.
The other way is to have two names, like dl1 and dl2, and have your download web page offer alternating links, depending on how the downloads are handled.
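As a trivial illustration of the alternating-links idea (the hostnames are placeholders, and this assumes both mirrors serve identical files):

    # Hypothetical sketch: hand out dl1/dl2 links alternately so downloads
    # spread across both uplinks. Hostnames are placeholders.
    from itertools import cycle

    MIRRORS = cycle(["https://dl1.example.com", "https://dl2.example.com"])

    def download_link(path: str) -> str:
        """Return the next mirror's URL for the given file path."""
        return f"{next(MIRRORS)}/{path.lstrip('/')}"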
You can very rarely do multi-ISP bonding; unfortunately, often you can't even do it with multiple lines from the same ISP.
Of course, the downside is that if you need to download that 10TB, you'll be out $900! If you're only worried about recovering specific files, this isn't as big an issue.
Also read their free egress policy first: https://wasabi.com/paygo-pricing-faq/#free-egress-policy
Note that there is Glacier and Glacier Deep Archive. The latter is cheaper but has a longer minimum storage period. You can apply it via a lifecycle rule.
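As a rough sketch of such a lifecycle rule (the bucket name, prefix, and 30-day threshold are made up; note Deep Archive's 180-day minimum storage duration):

    # Rough sketch: transition objects under archive/ to Glacier Deep Archive
    # after 30 days. Bucket, prefix, and the 30-day figure are placeholders.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-archive-bucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "to-deep-archive",
                "Status": "Enabled",
                "Filter": {"Prefix": "archive/"},
                "Transitions": [{"Days": 30, "StorageClass": "DEEP_ARCHIVE"}],
            }],
        },
    )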
Giving another one of these RAID1 RPis to a friend could make it reasonably available.
I am very interested to know whether there are good tools around this, though, such as a good way to serve a filesystem (NFS-like, for example) via torrent/IPFS, and whether the directories could be password-protected in different ways, like with an ACL. That would be the revolutionary tech to replace huggingface/dockerhub, or Dropbox, etc.
Does anyone know of, or is anyone working on, such tech?
The system in (2) works OK for downloading a Dockerfile that points to an IPFS file if you put the link there; however, a considerable number of things don't fit the suggestions in (2), such as not automatically becoming a seeder of that file when it is downloaded or when the package is run. There is also a great deal of opportunity in making the process of uploading files to IPFS much simpler. One example for the code idea would be something like git hooks, so that any time a major-version commit was made, a set of files would be added to IPFS for this type of distribution. Ultimately, a 'plug-n-play' package that hooks in at a specified place, e.g. setup.py, would be the best way to get something like that going. Then perhaps a simple program like Syncthing or miniserve operating on top of that functionality would allow for something more like (1).
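A post-commit hook along those lines could be as small as this sketch (the tag pattern, the release/ directory, and a running IPFS daemon are all assumptions, not something the ecosystem provides today):

    #!/usr/bin/env python3
    # Hypothetical .git/hooks/post-commit: when HEAD carries a major-version
    # tag, add the release/ directory to IPFS. Assumes the `ipfs` CLI and a
    # running daemon; tag pattern and directory name are made up.
    import re
    import subprocess

    tags = subprocess.run(["git", "tag", "--points-at", "HEAD"],
                          capture_output=True, text=True).stdout.split()
    if any(re.fullmatch(r"v\d+\.0\.0", t) for t in tags):
        cid = subprocess.run(["ipfs", "add", "-r", "-Q", "release/"],
                             capture_output=True, text=True).stdout.strip()
        print(f"release/ published to IPFS as {cid}")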
Most of their storage servers have 1Gbps unmetered public bandwidth options, and that should be sufficient to serve ~4TB per day reliably.
I'm using cloudflare R2 for a couple hundred GB, where I needed something faster.
That said, going above 80 TB looks hard to sustain long term, unless you can provide backup power and endure the noise of spinning drives.
I briefly looked at services selling storage on FileCoin / IPFS and Chia, but couldn't find anything that inspired confidence.
Also, any idea of the number of users (both average and peak) you'd expect to be downloading at once?
Does the latency of their downloads matter? E.g., do downloads need to start quickly, like from a CDN, or is it good enough as long as they work?
Assuming the simplest need is making files available:
1) Sync.com provides unlimited hosting and file sharing from it.
Sync is a decent Dropbox replacement with a few more bells and whistles.
2) Backblaze business lets you deliver files for free via their CDN. $5/TB per month for storage plus free egress via their CDN.
https://www.backblaze.com/b2/solutions/developers.html
Backblaze seems to be 70-80% cheaper than S3, as it claims.
Traditional best-practice cloud paths are optimized to be best practice for generating profit for the cloud provider.
Luckily, you're rarely alone or the first to have a given need.
You might want to check out OVH or - like mentioned before - Hetzner.
Cloud bandwidth is absolutely enormously overpriced.
I can't justify colo unless I can get 10U for $300/month with 2kW of PDU, 1500 kWh, and 1 GbE uncapped.
I'll take the risk of colo within the same region. If this region were gone, my data would be meaningless.
For FastComments we store assets in Wasabi and have services in Linode that act as an in-memory+on disk LRU cache.
We have terabytes of data but only pay $6/mo for Wasabi, because the cache hit ratio is high and Wasabi doesn't charge for egress until your egress is more than your storage or something like that.
The rest of the cost is egress on Linode.
The nice thing about this is we get lots of storage and downloads are fairly fast - most assets are served from memory in userspace.
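A very rough sketch of that pattern (this is not FastComments' code; the endpoint, bucket, and cache size are placeholders, and the on-disk layer is omitted):

    # Rough sketch: in-memory LRU in front of an S3-compatible store (Wasabi).
    # Endpoint, bucket name, and cache size are placeholders.
    from functools import lru_cache
    import boto3

    s3 = boto3.client("s3", endpoint_url="https://s3.wasabisys.com")

    @lru_cache(maxsize=4096)          # hot assets stay resident in memory
    def get_asset(key: str) -> bytes:
        obj = s3.get_object(Bucket="my-assets", Key=key)
        return obj["Body"].read()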
Following thread to look for even cheaper options without using cloudflare lol
This is still crazy expensive. Cloud providers have really warped people’s expectations.
Actually, since the Akamai acquisition it would be even cheaper.
$800/mo to serve 100TB with fairly high bandwidth and low latency from cold storage is a good deal IMO. I know companies paying millions a year to serve less than a third of that through AWS when you include compute, DB, and storage.
Backups probably wouldn't be much more.
You could just run Varnish with the S3 backend. Popular files will be cached locally on the server, and you'll pay a lot less for egress from Wasabi.
Hetzner has excellent connectivity: https://www.hetzner.com/unternehmen/rechenzentrum/ They are always working to increase their connectivity. I'd even go so far as to claim that in many parts of the world they outperform certain hyperscalers.
Also, I know that some cheaper home ISPs cheap out on peering.
Now, this was some time ago, so things might have changed, just as you suggested.
And that's only 309 Mbit/s (or ~39 MB/s).
And with a used refurbished server you can easily get loads of RAM, cores out the wazoo, and dozens of TBs for under $1000. You'll need a rack, router, switch, and battery backup. Shouldn't cost much more than $2000 for all of this.
Who are you serving it to?
How often does the data change?
Is it read-only?
What are you optimising for: speed, cost, or availability? (Pick two.)
I tried this using a Pine ROCKPro64, hoping to install Ceph across 2-5 RAID1 NAS enclosures. The problem is I can't get any of their dusty Linux forks to recognize the storage controller, so they're $200 paperweights.
I wrote a SATA HDD "top" utility that brings in data from SMART, mdadm, lvm, xfs, and the Linux SCSI layer. I set monitoring to look for elevated temperature, seek errors, scan errors, reallocation counts, offline reallocation, and probational count.
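Not that utility itself, but for a sense of the SMART-polling side, a small sketch (attribute names follow smartctl's table; the thresholds here are arbitrary placeholders):

    # Sketch of SMART polling: shell out to smartctl and flag a few attributes.
    # Thresholds are arbitrary placeholders, not tuned values.
    import subprocess

    WATCHED = {
        "Temperature_Celsius": 50,
        "Reallocated_Sector_Ct": 0,
        "Current_Pending_Sector": 0,   # "probational" sectors
        "Offline_Uncorrectable": 0,
    }

    def check(dev: str) -> None:
        out = subprocess.run(["smartctl", "-A", dev],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            parts = line.split()
            # smartctl -A rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
            if len(parts) >= 10 and parts[1] in WATCHED:
                raw = int(parts[9])
                if raw > WATCHED[parts[1]]:
                    print(f"{dev}: {parts[1]} raw value {raw} exceeds threshold")

    check("/dev/sda")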
> If your monthly egress data transfer is greater than your active storage volume, then your storage use case is not a good fit for Wasabi’s free egress policy.
If you only need to recover a small amount of data, it's also not expensive. The real issue is if you need to recover a large amount of data; that would be a major problem.
I don’t think many other solutions are equally fast and secure.
AWS's operation is pretty transparent, documented, audited, and used by governments. You can lock it down heavily with IAM and a customer-managed KMS key, and audit the repository. The physical security is also pretty tight, and there is location redundancy.
Even Hetzner doesn't have proper redundancy in place. Other major providers in France have burned down (apparently with data loss) or had security problems with hard drives stolen in transit.
I don’t work for AWS, don’t have much data in there, just saying. GCP and Azure are probably also good.