It is surprising that they didn't make it compatible with the S3 API -- at least for common object/bucket create/delete. This will require more code to be written and it will be harder to adapt client libraries.
The API documentation is here: https://www.backblaze.com/b2/docs/
Other notes:
* The lack of scalable front-end load balancing is shown by the fact that they require users to first make an API call to get an upload URL followed by doing the actual upload.
* They require a SHA1 hash when uploading objects. This is probably overkill over a cheaper CRC. In addition, it means that users have to make 2 passes to upload -- first to compute the hash and then another to upload. This can slow uploads of large objects dramatically. A better method is to allow users to omit the hash and return it in the upload response. Then compare that response with a hash computed while uploading. In the rare case that the object was corrupted in transit, delete/retry. GCS docs here: https://cloud.google.com/storage/docs/gsutil/commands/cp#che...
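A minimal sketch of the two-call upload flow the bullets above describe, based on the linked docs (the endpoint headers follow that documentation, but treat the details as illustrative). It shows why sending the SHA-1 up front forces a full pass over the data before the transfer starts:

```python
import hashlib
import json
import urllib.request

def upload_to_b2(upload_url: str, auth_token: str, name: str, path: str) -> dict:
    """Sketch of B2's two-step upload. The upload URL and token come from a
    prior b2_get_upload_url call; the SHA-1 must be sent in a header before
    the body, so the file has to be read (or at least hashed) in full first."""
    with open(path, "rb") as f:
        data = f.read()                       # pass 1: read the file...
    sha1 = hashlib.sha1(data).hexdigest()     # ...and compute its hash
    req = urllib.request.Request(
        upload_url,
        data=data,                            # pass 2: the actual upload
        headers={
            "Authorization": auth_token,
            "X-Bz-File-Name": name,
            "X-Bz-Content-Sha1": sha1,        # required up front by the API
            "Content-Type": "b2/x-auto",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For a file that fits in memory the two "passes" collapse into one read, but for a large object streamed from disk the client genuinely reads it twice.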
And... you answered your own question. :-) We reduce our operating costs by not having as many load balancers in the datacenter and pushing off the responsibility to the API. It all comes from our traditional backup product where we wrote all the software on both sides so we could save money this way.
With that said, we are actively considering offering an S3 compatible API for a slightly higher cost (basically what it would cost us to deploy the larger load balancing tech).
Having been on the receiving end of entirely too many corrupted files in my life, I strongly approve of their use of a hash that's been standardized and fast for decades and remains cryptographically strong for integrity checks. "But fast" isn't very helpful if it fails to catch corruption. TCP has a checksum too, and everyone serious has been papering over its weakness with stronger ones for years: it's time to accept that cheap CRCs aren't a good place to get stuck.
Improving the API to avoid the two-pass problem is spot-on though. Another possible solution is to require either a subsequent API call, or to format the first message as a multipart request, and use that route to have the caller submit the hash that confirms and commits the file to storage after the body upload. This would solve the two-pass problem while still ensuring the client is actually doing the integrity check -- and since Backblaze is more than likely to take the heat on any corruption issues, it's probably good policy for them to make sure lazy client implementations aren't going to cause problems that smear their storage's reputation.
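The upload-then-commit idea above might look like this on the client side. The commit call itself is hypothetical (no such endpoint exists in the B2 API); the point is that hashing can happen in the same pass as the upload:

```python
import hashlib

CHUNK = 1 << 20  # 1 MiB read size

def stream_and_hash(path, send_chunk):
    """Upload a file in one pass, hashing as we go. `send_chunk` stands in
    for whatever transport writes bytes into the (hypothetical) upload body.
    The resulting digest would be submitted in a follow-up commit call
    instead of up front, so no second read of the file is needed."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)       # hash while the bytes are already in memory
            send_chunk(chunk)     # ...and ship the same bytes to the server
    return h.hexdigest()          # send this in the hypothetical commit call
```

If the server's computed hash doesn't match what the client commits, the upload is rejected and retried, which is the delete/retry scheme the GCS docs linked above describe.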
But there is no call for a cryptographic hash here. This isn't being used as any sort of ID or to verify integrity outside of corruption.
Let's say I backed up 8 TB of data for a small business and I need to restore it in 24 hours. Is it possible to request overnight shipment of hard drives so I can do the restore locally instead of taking weeks to download all that data?
I know Amazon has this feature; I'm not sure about Google.
Another question: what's the max number of buckets an account can hold?
Looking forward to trying this service out. Thanks,
Bedros
There's no talk about their backbone or their network capacity. I get that they have terabytes of upload coming in, but as anyone who's used their software can tell you, it's throttled. I don't know how many users they have to tell you how much bandwidth they're actually handling, but can they handle people using B2 as a distribution point for large files for customers? For example, I have a huge S3/CF monthly bill from customers downloading ~400MiB ISO images tens of thousands of times a month. Amazon CloudFront is ~$0.085/GB for the first TB, while Backblaze B2 is an incredible $0.05/GB - but at what performance? Will my technical support representatives be getting angry phone calls about stalling download speeds, or do they have the capacity for something like this?
Hosting the world's data is no tiny task, I hope they're ready for it and I do, truly, wish them all the luck. I've been a BackBlaze customer for a few years now (at least 5 or 6, I imagine) as a tertiary or quaternary backup (haven't had to restore... yet), and B2 looks and sounds promising, but as far as technical details go, this post is nothing.
EDIT: In response to the reply below, I believe it's throttled by default in the client, though that can be turned off in the application settings. Also, you've replied to my claims of throttling but have ignored my question regarding backbone capacity and network readiness...
We currently have about 100 Gbps symmetric capacity into our datacenter on a couple of redundant providers, but the key is we have open overhead and we'll purchase more as our customers need it.
But here is the best part (if you want OUTBOUND capacity) - our current product fills the INBOUND internet connection, but currently we only use a tiny, tiny fraction of the OUTBOUND connection. So if you want to serve files out of our datacenter we have a metric ton of unused bandwidth we would LOVE you to use. And if you fill it up, we promise to purchase more.
But also keep in mind, Backblaze is very experienced with STORAGE and I have a lot of confidence we won't lose any of your files. What we don't have a huge amount of experience with yet is serving up viral videos and such. So just bear with us during this beta period while we figure it all out. Personally I'm looking forward to that part (all the CDN/caching layers).
:) well, yeah -- but that's also what B2 charges for... so the business model requires that bandwidth to start getting consumed :-)
Brian from Backblaze here: no it is not throttled (by us). If you only have a 10 Mbit/sec upload capacity you are throttled by your ISP. Also make sure you visit our "Performance" tab in the online backup client and tweak a few settings, like increase the number of threads.
I moved to Linux a few months back and was going to basically cancel my Backblaze sub when I got around to it, since you have no interest in making a Linux client. Maybe B2 can act as a solution to this, at a price penalty.
I tried from multiple physical locations but I could not increase my downloads past 1-2 Mbps, and for terabytes of data that seemed like throttling by BB, considering I was easily uploading at 20 Mbps.
I contacted BB support and they ignored me, so I switched to a competing services and have had no issues ever since.
This really made me sad because BB's blog is amazing and their tech is really cool, but when you see people saying "it's throttled", it's because of real experiences out there, and not just ones limited to an ISP issue.
I'm bothered by the whole idea of putting all my data with any one vendor (with Backblaze or Amazon) and thinking you don't need a backup. I claim "RAID / Reed-Solomon / real time mirrored copies" is NOT "Backup". If your programmer makes a mistake and a line of code deletes some mission critical data from Amazon S3, then all the Reed-Solomon encoding in the world doesn't help you, the data is still gone.
What you need is a copy of all your data from Amazon S3 in another vendor lagging behind for 24 hours that is NOT real time mirrored. Maybe you lose all the customer data generated that day, but your business survives by restoring from backup. (I chose 24 hours arbitrarily, each business needs to choose their upper limit of loss where they can survive.)
A good rule of thumb for a CONSUMER is three copies of your data: 1) primary, 2) onsite backup, and 3) offsite backup. If you are a business that will lose millions of dollars if a programmer makes a mistake or an IT guy is disgruntled, add 4) another offsite backup with a totally different vendor that doesn't share a single line of code with 1-3 and has separate passwords.
Here are my thoughts on our announcement today: https://www.backblaze.com/blog/b2-cloud-storage-provider/
Gleb
B2 finally creates an option for Linux users to use BackBlaze for back-up (at minimum) at work and at home. I look forward to that.
CAVEAT (PLEASE READ): this does NOT encrypt data yet!! This is just a quick technology demonstration, it isn't a polished backup client. Give us another month for that...
We expect Linux servers (and desktops) to make up a significant percentage of the things communicating to B2, so you can expect a lot more support than you have been getting from the traditional Backblaze Online Backup product line.
Back stuff up, but properly encrypted instead of with your Windows client's closed-source stuff. Without B2 I would use S3, but at their rates I might as well rent a datacenter myself, so I'm going to do the math again with B2 soon.
This announcement may mean I finally get to test your stuff! (I've been frustrated with the quality and feature creep in open source syncing solutions and procrastinating building my own, bare bones alternative).
Is this only for noncritical, reproducible data as S3 reduced redundancy?
Arq seems really good at supporting a broad variety of cloud providers though, so hopefully they'll add this too. I'm hesitant to use cloud backups generally; I've never seen an audit of how secure Arq's backup scheme is, for example (though it seems pretty simple - https://www.arqbackup.com/s3_data_format.txt). I've used CrashPlan a lot and basically take it on faith that it's secure. It's probably good enough for my use, given that I'm not storing state secrets or anything, but it's still a little unsettling to 'lose control' of one's data.
From Backblaze's point of view, I guess this is either smart (diversifying themselves–people can use other backup software if they like, and Backblaze still profits) or less smart (turning themselves into a commodity), but it seems like their software is still first rate, so I guess it'll work for them.
I'll be taking a look at this of course, but there are things which are more important than price -- for example, reliability. Tarsnap users trust me to not lose their data, and I trust S3 to not lose their data. That's a trust I don't have in B2 yet -- first, simply because B2 hasn't been around for long enough to prove itself, and second based on what I've heard from former Backblaze users.
They use Blowfish. Says it all really - their default encryption is a long-obsolete 64-bit block cipher you might have picked in 1999 because it was faster than 3DES.
I can only assume they do this because migrating would cost them money, and being able to advertise "448 bit encryption" actually sounds like a plus to most people and not the glaring red flag it actually is.
> it seems like their software is still first rate
What, like their backup client that can't actually do restores? It's still all "log in to our website and let us decrypt your data for you" :/
Not defending it, because I know it's old and there are weaknesses, but aren't Blowfish and 3DES both still technically secure? This is a genuine question. It was my understanding that if implemented correctly, with a random key etc., neither has been formally broken. 3DES has an effective strength of 2^112, no? That's still not practically reachable by brute force. Not that this means anyone should use them, of course; AES is a standard for a reason...
As you say, I had just assumed the migration cost was too high to move to something newer, but I don't think it necessarily means data stored there is unsafe?
I also created a PR to add support for Exoscale.
https://github.com/andrewgaul/object-store-comparison/issues...
I like this offering, but I'm not getting good signals on its seriousness. It may be something they're going to sunset soon. I would need some reassurance as to what's going on here.
And these are interestingly exactly the same reasons enterprises buy IBM and Oracle.
This is actually a big pet peeve of mine :)
First, they weren't in North America until recently. Having a server in France means high ping times for me and latency for the vast majority of my visitors. OVH started operations in Québec in 2013. So they've had less than three years to establish themselves. EC2 is 9 years old.
Second, it's hard to figure out what to buy. With EC2, they're all Xen instances and you decide on the right CPU/RAM configuration. DigitalOcean, Linode, Vultr, etc. all are easy. With OVH, what am I supposed to buy? Do I want a dedicated server or an infrastructure dedicated server? And then if I click for dedicated, I need to choose from Hosting, Enterprise, Infrastructure, Storage, Custom, or Game. I know computers - tell me the processor, RAM, and storage without breaking it into categories. So, I go with Hosting and half of the options are for "Delivery from September 30". Ok, that's more than a week out.

Maybe I want more flexibility like hourly billing on VPSs. I can go to Cloud -> VPS. And now I can choose SSD or Cloud with different prices. Why is the SSD so much cheaper? $3.50 vs $9 and they're both 1 core, 2GB of RAM, 100Mbps network link KVM boxes. Then I wonder if these are the same things as the RunAbove labs vs regular. The labs ones shared the processor cores, but this seems to indicate that both don't have the noisy neighbor problem. So I check RunAbove. Wow, everything has changed. Looks like they don't offer the SSD or Ceph instances anymore, but they have SATA-backed instances. So they're running all sorts of different combinations.

And should I be looking into the Kimsufi or SYS brands? Do they still exist? What if I want object storage? Ok, the US site takes me to RunAbove, which tells me that it's now part of OVH proper, which brings me to their UK site with apparently no way of loading it on the American site. Compare that to DigitalOcean where you just get a very simple, "here are the plans, there's no complex stuff with weird names or categories, buy what you need." Even Vultr manages simple with SSD VPS, SATA VPS, and Dedicated Cloud. Perfect. Most likely I want the SSD VPS, but maybe I need more storage or maybe I want metal servers sold to me like cloud servers. Easy.
And to be fair, OVH used to be a lot more complicated and a lot worse. It looks like they're streamlining a ton. But they should still simplify a lot more.
Third, OVH is terrible at marketing. I want to define what I mean by marketing. DigitalOcean is a king of marketing. You go to their site and you see brief comments from the creator of jQuery, the creator of RailsCasts, the creator of Redis, and a Rails core member. You might not use those technologies or even like them, but you recognise that DigitalOcean can't be total crap given that these are people with options and a reasonable amount of taste. DigitalOcean sponsors hackathons like woah. Giving students a dozen or so dollars in credit makes them well-known and an easy service to try. DigitalOcean's site inspires confidence in its simplicity. You don't feel like there's some hidden thing because it's just simple plans that increase rather linearly. Finally, try searching for VPS + some tech term. "VPS Ansible" has a DigitalOcean blog article as #3. "VPS elasticsearch" has DO with the top two spots. The point is that you see that and it's an indication that they're part of the community (supporting some free content) and kinda get it.
OVH, on the other hand, inspires none of those good feelings. OVH has a generic site that you can't tell apart from other generic sites. It has the kind of "throw everything at the user and see what sticks" design that I don't think users want. We want DigitalOcean to say "this! this is good!". OVH is like, we have a lot of different things and someone has written "enterprise" or "cloud" on some of them without really indicating how some options are more "enterprise" or "cloud". And there are stock images of network switches and RAM and such like a pizza place that has a stock picture of a pizza on their take-away menu that isn't their pizza. Do they get it?
I really wish OVH well. More providers means downward pressure on pricing which is good for me. I mean, 2GB of RAM VPS for $3.50? Awesome! Glad to see that graduate from RunAbove. But OVH still has a ways to go. Lots of the time you have to wait for servers. If I want a dedicated SSD box, they're quoting a 10 day wait for all except one model. The entire "hosting" range has quotes of 3-12+ days. "Enterprise" has one box for 120 second provision, two that are 3 days out, and two that are 10 days out. It seems like OVH is a place to get a good deal if you're willing to deal with a complicated process, waiting for a box, and them switching things up on you. But maybe OVH is stabilizing. I'm hoping their VPS offering will be a lot more stable than it has been. Seems like they're cutting down on using alternative brands like SYS and Kimsufi.
I can see OVH being a good company, but it's no surprise to me that they aren't as well known as AWS.
We are already looking for another datacenter, but mostly because we're running out of space in the current one due to our traditional business (online backup) doing so well.
So! If you can tolerate the loss of a datacenter, store in Backblaze. If you need geo-redundancy until Backblaze can offer it? Store in us-east-1 (which is geo-redundant between Virginia and Oregon).
All AWS AZs are physically separated facilities with redundancy on all their infrastructure, although they're obviously in the same general area.
us-east-1 is not geo-redundant. It is entirely on the east-coast, as the name suggests. Although S3 does have geo-redundancy in all regions.
You may have been thinking of "US Standard", but it is the same as "us-east-1".
http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_r...
All I could find on the S3 FAQ says "your objects are redundantly stored on multiple devices across multiple facilities." which seems to contradict the "one datacenter" claim.
Also, do you have a source that us-east-1 is geo-redundant between Virginia and Oregon? That was not my understanding of how it worked.
Or store it in Amazon AND store another copy in Backblaze. This isn't necessarily an "either/or" question. Having two copies with two different vendors in two separate regions is probably more reliable than having two copies inside the same vendor. For example, if Amazon has a large outage that affects both your regions, you can still access the copy in Backblaze.
Buyer beware when it comes to Backblaze.
The technology isn't bad, but their customer service is some of the worst I've ever seen. I was a Backblaze customer for three years and not once did I have what I'd consider a positive experience. If anything goes wrong they leave you hanging. They're not a company I'd ever trust with valuable data again.
We do plan to add file offset access and larger file support very soon, so you would be able to append a 1 MByte chunk to an existing file in Backblaze with a SHA-1 of only the 1 MByte chunk. That should allow you to stream?
All great feedback, by the way. We really want to hear about these shortcomings in our API right away.
Being able to append to a file in 1 MByte chunks (or larger) would be perfect - that is exactly the way Amazon S3 multipart uploads and google drive multipart uploads work.
Yeah, that would be sufficient.
Eventually it could make sense for Backblaze to partner with someone like DigitalOcean or Linode and offer low cost bulk storage and low cost virtualization colocated in the same datacenter: these services seem to be a perfect complement for each other.
What I'd really like is a deal with Amazon where we put a "virtual cross connect" from the Backblaze datacenter into Amazon's EC2 so you could use EC2 instances on B2 data without incurring a download charge (or not exposing that charge to our customers). But I don't know if Amazon is open to that kind of thing.
A real deal breaker is if you need to use an EC2 server to proxy the upload for any reason (content validation). The transfer into EC2 is free, but it's 9 cents for each GB out (18 months of storage cost).
@brianwski - any suggestions here?
To elaborate: I think these two would be able to become a viable competitor to AWS. If you think about it AWS launched with S3 and then EC2.
They could differentiate themselves by staying as a pure IaaS play. Then companies like Dropbox would not be afraid of DigitalBlazeOcean moving up the stack and competing as AWS has done in several instances (e.g. WorkDocs).
I would like to know more of the implementation of this and more information on policies to protect access to my data. And I would like to know where the data is stored. I suppose I got read the manual, but maybe some info tidbits could be included in the announcement.
Also, we are the only company we know of that releases our drive failure rates. We release them quarterly, here is the most recent failure analysis: https://www.backblaze.com/blog/hard-drive-reliability-stats-...
:-) We definitely plan to add an API to append to an existing file. The current largest file size is 5 GBytes, and we want to support much larger (imagine a 1 TByte encrypted disk image). That will be by appending chunks to files followed by a "commit" declaring the file as complete.
I think the reason most of us cloud providers don't like replacing parts of files is it helps our caching layer be much simpler, and it would change the SHA-1 checksum on the file which just means "more complexity". But it isn't out of the question, it might just come with a "cost" (like you can replace the span but it might take a while and then we provide you the final checksum of the whole file in the response).
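The append-chunks-then-commit flow described in the two comments above might look like this from a client's perspective. The `start`/`append`/`finish` calls are hypothetical (no such API existed at the time of writing); only their rough shape is suggested by the comments:

```python
import hashlib

CHUNK = 100 * 1024 * 1024  # e.g. 100 MB per appended chunk

def upload_large_file(path, start, append, finish):
    """Sketch of the proposed large-file flow: start a file, append chunks
    (each with its own SHA-1 so corruption is caught per chunk), then commit.
    `start`, `append`, and `finish` stand in for hypothetical API calls."""
    file_id = start(path)
    whole = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            whole.update(chunk)
            # per-chunk hash: a corrupted chunk can be retried on its own
            append(file_id, chunk, hashlib.sha1(chunk).hexdigest())
    # the commit declares the file complete; the server could return the
    # checksum of the assembled file for an end-to-end comparison
    return finish(file_id, whole.hexdigest())
```

This mirrors how S3 multipart uploads work (parts with individual checksums, then a final "complete" call), which is what the earlier comments in the thread asked for.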
If there's attention to detail with one thing, odds are you'll find it in other places, too.
Only on https://www.backblaze.com/b2/why-b2.html can I find anything to cite: "the B2 Cloud Storage service has layers of redundancy to ensure data is durable and available". What exactly that means, or what it translates to, is nowhere to be found. If you want corporations or developers to use your storage service for their precious data, I'd be a bit more specific.
Backblaze should expand their service to a cheaper cloud storage service similar to Amazon S3. They already have the infrastructure and the know-how.
And voilà ... here it is.
Disclaimer: I worked on the predecessor to AltaVault at Riverbed
I hope they add B2 support at some point.
Just curious as to why I would migrate from S3 for FE assets to Backblaze.
You can open yourself up to a large number of customers by making it easy to get started via PowerShell.
I saw the comment about getting drives shipped to you, which is pretty neat, but what about the other way? I have about 50 TB of data we'd like to store, but only 5mbps upstream. Can we ship drives to you?
I ask because I tried Backblaze a while back, and uploads from the UK were very slow.
> I ask because I tried Backblaze a while back, and uploads from the UK were very slow.
Curiously from here in Japan, I've managed to clock 80 MBit/s backing up to Backblaze. I presume it all has to do with what kind of international peering your ISP has.
What's the setup for file permissions? Can I have multiple people writing to the same bucket? Can I restrict deletion & write rules?
We're actively looking for feedback in this area, so as developers ask us for something like Amazon's IAM (AWS Identity and Access Management) we'll be filling that functionality out. Hopefully without adding too much complexity to the simple model we have now.
Personally I'd like to use some access management, and there's one case that I've not seen solved particularly well (though would appreciate anyone chiming in with things I've missed):
Distinct write and create permissions.
I'd like to be able to grant someone permission to create files but not allow them to modify or delete them later. I end up generally adding this externally.
I think B2 is really close to this, as you've got the file ids for multiple versions, so I can effectively ignore the filenames and use the file ids instead. It'd need a difference between "upload new version" and "delete version" though.
Hi Robert,
I don’t know whether or not Arq will be integrated with B2.
- Stefan
Can anyone compare that to other similar providers? While the storage is cheap, it seems more useful for cold storage.
So yeah, I'd agree with you. But for anyone prepared to use S3 for anything but cold storage, this is still a lot cheaper.
My suggestion would be to use this for cold storage + big cache boxes at a provider with low bandwidth charges. Especially if your "hot" objects make up a relatively small percentage.
Backblaze is the same cost as or cheaper than the cheapest tier of several other popular services like Amazon S3 and Microsoft Azure. We don't know of anybody with a lower cost for downloads.