I didn't read all their math but I expect their final result to be off by a factor of 2-5x. Hard drives are a surprisingly low percentage of the cost of a storage system.
Sia systems don't need a ton of networking. I ran the networking buildout costs by some networking people, and again it comes down to cutting corners. If you only need 10 Gbps per rack, don't mind a few extra milliseconds of latency, etc., you can get away with very scrappy setups. The whole point is that it's not a highly reliable facility.
That puts the cost of 192 TB at more like $6240, not $4945. It could be less if you find a good deal on mini-SAS PCIe cards, but it's still going to be substantially higher than $4945.
Can't be more than 2.5 because Backblaze B2 already gives you $5/TB/Mo.
That is also not the only service that Backblaze offers, and it wasn't their first. It could be that B2 is simply a way for them to offset the cost of extra capacity, and that they're running it at effectively near-cost.
Well it can be, if they have a lot of inefficiencies. Backblaze could have more experienced engineers who overcame these. I assure you, I can accidentally design a very expensive storage system as I’m not that smart ;)
Somewhat of a random question, can you point me to some state of the art research?
I looked at their parts list and it's obvious they aren't serious. The CPU is missing, memory is missing, there are SAS-to-SATA cables but no SAS controller, and there's no mounting for the system board. Low effort at best.
THIS! There's no use for 2TB of storage if you can't upload/download that amount each month.
Generators?
UPS?
Cooling costs?
Square footage costs for the real estate itself?
Security and staffing?
At the scale they intend to accomplish, they will need at minimum several hundred kilowatts of datacenter space. Even assuming somewhere with a very low kWh cost of electricity, that much space for bare-metal gear isn't cheap. Go price a lot of square footage and 300kW of equipment load in Quincy, WA or anywhere else comparable; the monthly recurring dollar figure will be quite high.
And all of that is before you even start to look into network costs to build a serious IP network and interconnect with transits and peers.
Generators? Who needs those? Just wait for the power to come back on. UPS? Why bother? Square footage? Stick some wooden shelves in the cheapest building possible. Cooling? Locate in a cold climate and buy some window fans.
This isn't anything like the sort of infrastructure you're used to dealing with. Think Bitcoin mining farm, not Backblaze datacenter. Any corners that can be cut will be.
And yet Sia is about half the cost of Backblaze (i.e. not much savings).
Hard to imagine situations where this is a good trade-off.
No generators, just eat the downtime. No batteries. No 24/7 staff. No racks, just shelves (folded sheet metal is cheap). Security varies from farm to farm.
These servers don't need to run cool; as long as you are in a climate that doesn't get over 100°F, you can get away with fans and no AC.
Reliable data storage for paying customers is a very different thing.
Here they’re depreciating components over 7 years, so will those components last 7 years under those conditions?
Power is similar. Maybe you just use solar and turn the drives off when it's cloudy. With enough distribution throughout the world, it will probably be sunny somewhere.
I haven't done the math and I'm not saying it will work out favorably. I also don't have a use case for 95% availability. (That's two weeks a year where your data is gone!) But it's something that someone with the right needs could consider, and maybe come out ahead of someone shooting for 5 nines and drives that aren't covered in snow.
I get that you're mostly joking by saying "yeah sometimes everything shorts out", but we have electrical codes in the US/Canada for a reason.
Gone from only one of the places it's stored. Your data is still available even if just 10 out of 30 servers are online.
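To put numbers on that (a back-of-the-envelope sketch assuming independent hosts at 95% uptime each and Sia's default 10-of-30 Reed-Solomon scheme, where any 10 fragments are enough to rebuild the file):

    from math import comb

    def availability(n=30, k=10, p_up=0.95):
        # Probability that at least k of n independent hosts are online.
        return sum(comb(n, i) * p_up**i * (1 - p_up)**(n - i) for i in range(k, n + 1))

    print(availability())           # effectively 1 (unavailability around 1e-20)
    print(availability(p_up=0.50))  # even with coin-flip hosts, still roughly 0.98

The catch is the independence assumption; if those 30 hosts share a building or a power feed, the math gets much worse.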
At the (slow) rate Sia is growing, I don't think there will ever be enough demand to justify this design anyway.
The calculations in today's blog post account for the labor cost of assembling hardware, but leave out major other labor costs:
1. You need an SRE to keep the servers online. Sia pushes out updates every few months, and the network penalizes you if you don't upgrade to the latest version. In addition, to optimize costs, you need to adjust your node's pricing in response to changes in the market.
2. You need a compliance officer to handle takedown requests. Since Sia allows anyone to upload data to your server without proving their identity, there's nothing stopping anyone from uploading illegal data to the network. If Sia reached the point where people are building $4k hosting rigs, then it's safe to assume clients would also be using Sia to store illegal data. When law enforcement identifies illegal data, they would send takedown notices to all hosts who are storing copies of it, and those hosts would need someone available to process those takedowns quickly.
Call me skeptical but it seems that they aren't committing to building out this infrastructure themselves or providing a specific amount of storage at this pricing. They seem to be outlining a potential infrastructure that some enterprising individual (or corporation) could use to provide storage at that price to "renters" within their marketplace.
I guess I'll just wait until someone puts their money where their mouth is. Given that this is a marketplace, the fact that a theoretical setup could be built to provide some service doesn't necessarily guarantee it will be built.
1. https://support.sia.tech/article/thvymhf1ff-about-renting
Just looking at the website for Sia I see a bunch of fluffy marketing stuff; fair enough, that's normal these days. But where is the selling point? https://sia.tech/technology tells me my data is stored securely and redundantly. Great, just like any storage provider. That is followed by "Renters And Hosts Pay With Siacoin" and some talk about payment channels, which links to a Wikipedia article rather than anything that tells me how I would actually pay them, let alone how much (I saw the calculator thingy on my way to that page; the messaging is still weird).
The "Getting started" call to action is a similar experience, a bunch of downloads, cool - I don't even know if you're right for me yet. I'm five levels deep into the "Getting started guide" linked there and so far found that I'd apparently have to deal with weird crypto exchanges to pay somebody for this, plus I couldn't use most of my pretty standard tooling anymore (at least not without involving one of those proxy things on the getting started page that cover a few use cases, some of which seem to be operated by others?).
Does that seem low to anyone else? I don't really have any background in the area, but $25/hr cost to the company would be less than $20/hr pay for the skilled labor. Other countries are different of course, but in the US I could make that much flipping burgers in the right area.
I did realize that I completely forgot about RAM; when I get back to a computer I'll have to make some updates, but it won't materially move the numbers. There's a 33% margin of error between the number in the spreadsheet and $2 / TB / Mo.
The $80 PSU is what I could link to from Newegg. I do have experience in industrial electronics, and I know from firsthand experience that you can buy a 10+ year PSU at 93% efficiency for well under $80 at 300 watts. At that level, you're going to be able to request all the required cabling as well, which means you're getting a much better price than the $7 per cable linked in the post.
95% uptime means 18 days of downtime per year. Consumer grade PSUs and mobos do much better than that.
I earn about €26/h before taxes in western Europe, an income which lets me live in relative luxury (not "private jet" luxury, but I do literally anything I want and still save more than a third of my income on a 36-hour work week), and that's for security consultancy, which is far more specialised than the job you're talking about. I think it's also above the national average, but I don't have the statistics on hand. I'm not sure what the cost to the company is; I think they put in another hundred a month for health insurance or pension or something (they pay 50% and I pay 50%, though I don't see why keeping 50% off my payslip helps anyone, since an employer will just deduct that from the salary they can offer), plus some overhead for accounting and whatever, but it's probably not that far off.
It did, or does now anyway: the Ryzen 3 1200 for $95.
EDIT: Although the better option is the 3200G so that you can actually get a display output from the thing. Same price, so it doesn't really change anything, but it does cut the CPU core count down a bit if that matters at all.
That said, the buildout still doesn't work, because you can't actually plug the "SATA splitter" cable they linked into the motherboard: the splitter is actually a 4-lane SAS SFF-8087 breakout cable, and there's no consumer motherboard with 8x of those connectors on it. Good luck finding even 1 or 2 of those connectors on a consumer board, and it sure as hell won't be at dirt-cheap prices.
So you either need 4x the computers they calculated, or you need to budget for add-in SATA/SAS controller cards. Which, because they aren't used in consumer land, are not cheap. You could go used, but that's still going to increase the bottom line (and won't be a reliable source of parts)
They also aren't factoring in assembly time nor budgeting for that. Building these isn't going to go very quickly.
Edit: to clarify on the CPU usage, could a potential build also get away with a cheap AMD Athlon 3000G?
$5/TB/Mo was the B2 price with profits, better depreciation (Backblaze replaces drives more often), and a faster connection.
$2/TB/Mo was napkin math with ~10% gross profit.
Efficient $$ utilization means bread racks, built-out data centers abandoned by the likes of Pep Boys that landlords will part with for $3/sq ft per year, and, Google-style, servers without cases and Velcro keeping the hard drives attached.
Content marketers and technical marketers: don't miss the opportunity on Medium and other platforms to at the VERY LEAST link to your homepage in the first section.
In fact, all that's at the top of this awesome piece of content marketing is a "Sign Up" button for Medium...
I'm looking forward to seeing this project mature, and to seeing more layers built on top of it moving forward. I really wish the client offered synchronization or access across multiple devices. For now you have to try third-party layers on top of Sia to accomplish this.
Yea I'd actually pick it up now and give it a try if it had this feature.
Or if one of the major hyperscalers or datacenter operators decides to start selling storage to Sia, it seems likely that their control plane across datacenters could result in correlated failures. A networking outage for their AS could result in multiple datacenters appearing offline concurrently, for example.
The profit of $570/year/box is not enough to pay a part-time sysadmin and still have any useful profit.
I wonder how reasonable this assumption really is. For regular CPU-bound crypto-mining we see that it tends to centralize geographically in zones where electricity, workforce and real-estate space to build a datacenter are cheap.
Assuming that Sia ends up following a similar distribution, it wouldn't be surprising if several of these hosts ended up sharing a single point of failure.
Beyond that, if only copying stuff around three times to provide tolerance is enough to lower the costs to $2/TB/Mo, why aren't centralized commercial offerings already offering something like that? Just pool three datacenters with 95+% uptime around the world and you should get the same numbers without the overhead of the decentralized solution, no? Surely the overhead of accounting for hosts going offline and redistributing the chunks alone must be very non-trivial. With a centralized, trusted solution it would be much simpler to deal with.
Or is the real catch that Sia has very high latency?
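The arithmetic for that centralized version is simple enough (a sketch assuming three fully independent sites at 95% uptime each and plain 3x copies rather than erasure coding):

    site_uptime = 0.95
    p_all_three_down = (1 - site_uptime) ** 3   # 0.05^3 = 0.000125
    print(1 - p_all_three_down)                 # 0.999875 availability, if the sites really are independent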
The adapter they're linking to is SFF-8087 to 4x SATA, not SATA to 4x SATA (which shouldn't exist). That motherboard doesn't have SFF-8087; it has 8 SATA3 connections.
Unless I've missed something big, SFF-8087 cannot be plugged into SATA3.
The hosts are never going to be fully independent. There will be hundreds, if not thousands, of hosts co-located in the same location, likely of the cheapest grade, without any extras like fire alarms, halon extinguishers, or redundant power feeds. A single fire (or flood, or broken power station) has a chance of taking out thousands of hosts simultaneously.
And there is the management system as well: AWS has thousands of engineers working on security. Will there be even one at this super-cheap farm? What are the chances there will be farms with default passwords and password-less VNC connections? And since machines are likely to be cloned, any compromise affects thousands of hosts.
... and all of those things are made worse by the fact that if you store hundreds of thousands of files, your failure probability rises significantly. If a data center burns down, at least a few of your files may be unlucky enough to be lost.
> For a 32 HDD system, you expect about 5 drives to fail per year. This takes time to repair and you will need on-site staff (just not 24/7). To account for these costs, we will budget $50 per year per rig.
will you not also lose 6TB (times utilization) of your lockup every time a drive dies?
> 8x 4 way SATA data splitters
You've linked to SAS breakout cables. They don't plug into SATA ports; they plug into SFF-8087 SAS ports.
They cannot plug into the motherboard you've listed, nor have I ever seen one listed for retail sale that has 8 SFF-8087 ports.
The cheapest way to get 8 SFF-8087 ports is with a SAS expander card and a SAS HBA. Even scraping off eBay, that's another $50 per host, and two more components to fail.
There are also actual SATA expanders out there, but they last about 3 months before catastrophic failure, in my experience.
FWIW, the breakout cable they've listed splits a connector that carries 4 electrical channels onto 4 physically separate cables, so there's no problem with the cable itself. They just don't have anywhere to plug it in.
The economies of scale should make this much less expensive. Colocating your own machine in a real datacenter and hosting your own data shouldn't still be cheaper than practically all of "the cloud" offerings, but it is. What does that tell you about "the cloud"? It's marketing bullshit.
Sure, it's fine for occasional use, but anyone using the hell out of "the cloud" can easily save money by using anything else.
There are cases where you can indeed save money by doing more by yourself. But how much time does it cost you and how much is your time worth?
How much time do you need to research, purchase, and eventually build your hardware? How much time do you need to get a decent data center deal? How much time do you need to bootstrap your setup? How much time do you need to regularly maintain your infrastructure?
My time is worth a tremendous amount to me, which means I want to use my own hardware. "The cloud" does not guarantee reliability.
Any company that does any project that even slightly regularly requires compute / storage can easily justify the time to do all the things you mentioned.
The fact that many companies have gone towards "the cloud" goes hand-in-hand with the fact that many companies use Windows. It's clearly not the best thing to use to get things done, but the IT people don't want to reduce their importance and the management people like the kickbacks and perks they get from buying certain things from certain companies.
The savings look good on paper, but the reality is that they're based on leaving out lots of information. I've helped several companies move from "the cloud" back to good, local compute resources because of the amount of money they were hemorrhaging to "cloud" providers.
For the most part, it's all marketing bullshit.
My (or their, actually) problem is that I don't really get what they are offering right now. There is an impressive landing page with big numbers and pretty pictures which explains pretty much nothing. The project seems to have been in production for at least 3 years, there are some apps, but I don't actually see whether I can use it to back up/store some data and how much it costs right now. I mean, they say "1TB of files on Sia costs about $1-2 per month" right there on the main page, but it cannot be true, right? It's just what they promise in the hypothetical future, not the current price tag?
The only technical question I'm interested in here is why they actually need a blockchain. This is always suspicious, and I don't remember seeing any startup at all that actually needs one for anything other than hype. It is basically their internal money system to enable actual money exchange between storage providers and their customers, right? So, just a billing system akin to what telecom and ISP companies have? Is it cheaper to implement it on a blockchain than by conventional means? How so?
Here's the live pricing, right now: https://siastats.info/storage_pricing
> Is it cheaper to implement it on blockchain than by conventional means?
It's more so that anyone can join the network as a host. They don't have to have a financial or business relationship with anyone, they can just provide their storage service and charge for it. No way to do that currently in the world without a blockchain.
Maybe I misunderstand your point, but I could certainly install MinIO (a S3 compatible object store) on a home NAS and charge people for it without using a blockchain. I see your point about not having a financial or business relationship with a blockchain network acting as an intermediary, but I can assure you that the IRS and various law enforcement and regulatory agencies would tell you that you absolutely do have a financial and business relationship with whoever is paying you via the crypto-network whether you'd like to or not.
S3: https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.... (Max object size: 5TB, Max single multipart size: 5GB)
Backblaze B2: https://www.backblaze.com/b2/docs/large_files.html (Max file size: 10TB, Max single object size: 5GB)
Azure: https://docs.microsoft.com/en-us/rest/api/storageservices/un... (Max file size: 4.75 TB, Max single block size: 100MB)
Google: https://cloud.google.com/storage/quotas (Max file size: 5TB, doesn't appear there is a lower limit for objects to be composed into a single object, docs could be better in this regard)
@khc: Terminology updated to be more clear for S3
It's reliable enough, if you can get it to Microsoft's cloud. But for the last six months I've struggled putting very large files into Azure, using five different connections from five different providers in three locations. Small files are no problem. But large ones take two, three, or four tries.
rsync.net gives you an empty, ZFS filesystem that you can do anything you like with.[1]
I believe the file-size limit is 16 exbibytes (2^64 bytes).
rsync.net can also talk to every cloud storage provider[2] because rclone[3] is built into the platform:
ssh user@rsync.net rclone copy s3:/some/bucket rsync/home/dir
[1] http://rsync.net

16 exbibytes is 2^64 bytes.

I wonder why transfer prices are not included? As you explain, every transfer is paid for, so does that mean one has to pay for 10 uploads of every single object? And as equipment ages and peers go out of business, who pays for the data-rebalancing transfers?
https://www.hetzner.de/dedicated-rootserver/matrix-sx?countr...
That is pretty close to the $2 per TB per month from the article (assuming the same factor 1.5 replication they have planned), while also providing 128 GB ECC RAM, enterprise disks, and 24/7 phone support.
The types of farms described in the article are what I imagine Sia to look like 10 years from now, not something we expect to spin up in the next 18 months.
So... Yes, but probably not very much? Particularly when accounting for the price drop in Sia.
Edit: If you do decide to host, it'll likely be about 5-6 months before your contracts start completing if your host settings go by the recommended 26 week max duration. And don't go out buying hardware unless you're in it for the long run, which makes me now realize the point of this article.
We decided to scrap the plan to do p2p storage, ended up using cloud storage. This p2p storage idea is a tough one. People are not willing to make a few dimes renting out their hard drive or CPU. The economic unit is too small to work. But good luck trying this idea. I wouldn't be surprised if someone tries again in 20 years. :)
The Reed-Solomon also isn't even the most computationally expensive part; the expensive part is computing and verifying the Merkle roots. All parts of the system, though, can run at >1 Gbps on standard CPUs.
If this has changed, I would be interested in hearing about it.
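For anyone curious what that Merkle-root step looks like, here is a minimal illustrative sketch (not Sia's actual code; SHA-256 and 64-byte leaves are just assumptions for the example):

    import hashlib

    def merkle_root(data: bytes, leaf_size: int = 64) -> bytes:
        # Hash fixed-size leaves, then hash pairs of nodes until a single root remains.
        leaves = [hashlib.sha256(data[i:i + leaf_size]).digest()
                  for i in range(0, len(data), leaf_size)] or [hashlib.sha256(b"").digest()]
        while len(leaves) > 1:
            if len(leaves) % 2:              # odd count: duplicate the last node
                leaves.append(leaves[-1])    # (real schemes handle this differently)
            leaves = [hashlib.sha256(leaves[i] + leaves[i + 1]).digest()
                      for i in range(0, len(leaves), 2)]
        return leaves[0]

    print(merkle_root(b"x" * 4096).hex())

Verifying a fragment against a known root only needs the fragment plus a short proof path, which is why the per-check cost stays small even if building roots over lots of data is the heavy part.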
One other thing I am not understanding is how this makes financial sense, even if the demand is there. If I am buying a rig for 4500 bucks to get 200TB, making "$570 a year in profit" is nowhere near exciting enough. Practically any other use pays more: renting a dedicated server for a game, web hosting, hell, even GPU mining makes more.
(A single 1080 Ti can do about $1 a day in gross revenue on grin/eth/etc., and can be had used for ~400 bucks. Or you can get a P102, the mining-card version with no display output, for 250 bucks. Payback, including power costs etc., is well below the 10-year threshold of Siacoin.)
Now where it might be interesting (IF there is demand), is just adding harddrives to an existing infrastructure already in place. So if you are a GPU miner and have 1000 rigs already in place, just adding a single 4TB harddrive to each machine might not be too bad- They go for about $50 each used and according to this, will pay back $8 a month with minimal extra costs
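Rough payback math on those figures (all inputs are numbers quoted in this thread, plus an assumed ~$0.40/day of GPU power cost, so treat it as a sketch):

    sia_rig_cost, sia_profit_per_year = 4945, 570
    gpu_cost, gpu_gross_per_day, gpu_power_per_day = 400, 1.00, 0.40  # power cost is assumed

    print(sia_rig_cost / sia_profit_per_year)                          # ~8.7 years to break even
    print(gpu_cost / ((gpu_gross_per_day - gpu_power_per_day) * 365))  # ~1.8 years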
https://siastats.info/hosts_network shows only 710 TB in use. That's about $20k worth of hardware TOTAL for the entire network, according to the above URL.
Also, why is this a cryptocurrency at all? Wouldn't this business be drastically simplified by simply paying people out, and letting people rent space, with either USD or Bitcoin?
Do you even need to ask? Because they have minted a bunch of Siacoin, and this is their effort to give it value out of thin air, making them all millionaires. Using someone else's currency, despite its obvious benefits, means no huge pre-minted pool under their control = no lambos.
Every single ICO project is like this.
"Both renters and hosts use Siacoin, a unique cryptocurrency built on the Sia blockchain. Renters use Siacoin to buy storage capacity from hosts, while hosts deposit Siacoin into each file contract as collateral."
It is $10/mo/TB, but has different uptime, speed and security characteristics.
What I don't get is why they don't use 14TB HDDs; they are only 15% more expensive per TB. On the other hand they'd need 2.33x fewer PCs at $550 each, plus their power use.
So instead of every 7 PCs with 6TB HDDs they'd need 3 with 14TB HDDs.
PS: They could also use a mainboard with 10 SATA ports instead of 8. They are only $15 more than the chosen board. Adding one or more PCIe 8x SATA controller cards might also make sense, depending on the average load of a system.
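A quick sketch of that trade-off, following the figures above (7 PCs of 32x 6TB drives, $550 per PC, 14TB drives at +15% per TB; the ~$17/TB figure for 6TB drives is just an assumption for illustration):

    total_tb = 7 * 32 * 6                  # 1,344 TB across 7 PCs of 32x 6TB drives
    per_tb_6, per_tb_14 = 17.0, 17.0 * 1.15

    drives_14 = -(-total_tb // 14)         # 96 drives, so 3 PCs of 32 drives each
    pcs_14 = -(-drives_14 // 32)

    cost_6 = total_tb * per_tb_6 + 7 * 550
    cost_14 = total_tb * per_tb_14 + pcs_14 * 550
    print(round(cost_6), round(cost_14))   # ~26,700 vs ~27,900, before the power savings of 4 fewer PCs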
No consumer ware
The problem comes when you want to store multiple files. If the corresponding erasure-code fragments from different files are not stored on the same servers, then you don't have correlated failures. Contrast this with a typical RAID scheme, where a failed drive means the nth erasure fragment of every file is gone: correlated failures. If the failures across different files are not correlated, which is the case if you're storing each new block on a random node, then you are basically guaranteed to lose data once you have enough files. Depending on your scheme, this can happen with as little as 1 TiB of data for a user. It is similar to the birthday paradox.
For erasure codes to work for a filesystem you need to have correlated failures.
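The arithmetic behind that is worth spelling out (an illustrative sketch only: plain 3x replication and an assumed 1% chance that a host dies before its data is repaired; erasure coding shrinks the per-chunk probability, but the shape of the curve is the same):

    p_host_dead = 0.01                   # assumed chance a host dies before repair
    p_chunk_lost = p_host_dead ** 3      # all 3 randomly placed copies land on dead hosts

    for chunks in (10**3, 10**6, 10**7):                 # e.g. 4 MiB chunks: ~4 GiB to ~40 TiB
        print(chunks, 1 - (1 - p_chunk_lost) ** chunks)  # ~0.001, ~0.63, ~0.99995

With correlated placement (fixed placement groups), every chunk shares the same handful of failure events instead of rolling its own dice, which is the parent's point.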
Ordinary RAID has very slow recovery because it's concentrated on the hot spot of a new drive. Plus, recovery waits for a new drive to be inserted (double stupid).
When fragment placement is randomized, recovery is widely distributed and can happen in less total time, so there's a lower chance of data loss.
When will the project grow some mobile apps like Dropbox or Google Drive, where you can just put a credit card number in, pay a few bucks, and know your data is safe?
You aren’t touching my enterprise data, not even the cold storage logs.
For my backups, it's not synced in real time, but I do a manual backup every 3 months. I can lose some data and I feel OK with that.
So.. that turned me off in an instant.
The price for an additional "drive", like 5 bucks per month for 50GB or something, is insane. Especially when comparing with Dropbox or OneDrive pricing (or even physical drives sold over the counter).
Edit: apparently that's the unlimited storage in G Suite for $12/month/user, as long as you have 5 users or more.
(Does anyone know how long Scaleway takes in practice? They claim minutes but I haven't used them yet.)
* IPMI is for remote hardware management
Or you need to know some providers I don't know, in which case, tell me. :)
Yep. Stopped reading right there. HDDs use ~15 watts when they spin up. I've experienced this firsthand, and I never allocate less than 20 watts per hard disk.
OK, but what is it today?
It's less than $2 / TB / Mo today, but relies on a completely different set of economics that don't scale beyond a few hundred PB. This article was aimed at people who understand why Sia is so cheap today, but do not believe that Sia will continue to be cheap as the network scales.
We build clusters at the low-PB scale with a TCO that includes everything from labor to hardware, financing to electricity, and routers to cables, and they can be run below €3/TB. On them you can store data as block (RBD, iSCSI), object (S3, Swift), or file storage (CephFS, NFS, SMB), highly available on Supermicro hardware in any datacenter worldwide.
Feel free to contact us or use our free community edition to start your own cluster.
It's the same reason increasing redundancy and uptime to the next factor of nines is almost logarithmically more expensive.
If anyone hasn't seen the work being done on the Skynet platform, I highly recommend taking a look. Amazing stuff.