So it appears to affect anyone who depends on IBM Cloud.
Maybe it helps with doing a sanity check before picking a provider. And, I guess, at a basic level it helps with accountability/transparency.
Do you have similar %'s of monitored cloud services that have gone off the air during other providers' outages?
(I figure either you’re in devops and too busy putting out fires to read this thread, or you’re not and your work is halted because of the incident, so you might have time to read and reply ;)
Two of the biggest advantages were:
Price for hardware. As a base price, their bare-metal gear was significantly cheaper than equivalent-specced AWS gear (if it was even possible to get something like that). We managed to snag quite a few 'interesting' configurations at various times that you just couldn't get at all in AWS: things like PCIe SSDs, very large RAM configs, or high-frequency, low-core-count CPUs.
Free international/regional transfer. We took significant advantage of this to move data around, regularly replicating TBs of data between data centers.
At various times management and dev teams would complain and say that we should move everything to AWS (or whatever cloud provider they'd just met with at a conference).
We consistently showed higher performance and lower cost by significant margins. On cost alone, we were paying a small fraction of what it'd cost on AWS, even after taking into consideration ways to reduce cost on AWS such as scaling, spot instances, and reserved instances.
We had a couple thousand bare metal servers, and barely used any of their API stuff.
As with any facility, there were occasional issues with electrical transfer switches, core router failures, fiber cuts, etc. Stuff happens, but we got pretty good communication, and things got resolved in a reasonable amount of time. Service got noticeably worse after IBM, but we were already planning to move to our acquirer's hosting, because that's what happens when you're acquired. Oh, and their load balancers had garbage uptime.
Bandwidth prices used to be pretty reasonable, but they've adopted AWS-style obscene pricing. At least they still let you use the private network for free (including to other datacenters).
MS and Google do provide those features though.
Over the past few years we have experienced quite a few network-related outages. Usually not to this extent; more often a failure of some piece of network gear that takes out either backend or frontend traffic from a particular data center. We seriously priced out a migration to another provider recently, but in the end what held us back was cross-AZ transfer costs on AWS. We found it would raise our operating costs significantly, so the matter was dropped.
We were much happier with the service and support we received prior to the IBM acquisition.
Currently on them because we have an OpenVPN-based infrastructure that is very challenging to migrate.
Lastly, the majority of our customers are in the Midwest or Texas, and the proximity of their Dallas DC was a huge performance win for us.
In small and mid-size organizations, the CSP gives you better pricing, or helps with your sales, etc.
In large organizations, IBM/Oracle bundle the existing products the company is already paying for anyway, or the account managers have great relationships with the decision makers, or the company has already signed big multi-year deals.
This is not just IBM, it applies to GCP/Azure/AWS as well.
I also really like CouchDB which IBM Cloudant is based on.
Is that enough for me to use IBM Cloud? No, not really.
I'm going to wait a bit to see if we get a status update, otherwise we'll be spinning up instances on AWS to failover (which will be enormously costly for bandwidth)
No status, no nothing, we're in the dark.
Literally this morning I was wondering whatever happened to it, like did it die a quiet death? Oh, it rebranded to IBM Cloud in 2017. Now this news.
I think there's an eponymous law named for this sort of thing.
https://cloud.ibm.com/status?selected=history
- 2020-06-10 02:19 UTC - RESOLVED - The network operations team adjusted routing policies to fix an issue introduced by a 3rd party provider and this resolved the incident
But when it worked, it worked. API was voodoo.
It is not that companies become consciously malicious or are incompetent to start with; it becomes a vicious cycle: as more and more poor management and engineering talent joins, the good people leave, and the cycle continues.
Acquisitions and mergers stave off the slow slide into irrelevance for a while, till the best of the new people leave too. Systemic cultural change is very, very hard to achieve in large organizations.
If they are receptive to feedback and clearly want to do better, I would be kind and explain why I had suggested it not be there in the first place and cite this as an example.
If they were being adamant or denying it was their fault, I'd probably be really quiet and just make subtle remarks about how it would have been better if they had listened.
(Was interested to see what you were up to these days, which is how I stumbled on it).
Seriously, they probably tested it and it worked in theory, just not in practice, and now they'll fix it for reals.
The idea that they could even get to this point probably seemed unfathomable. It does to me.
Or do we just accept, and make it the norm, that even the lowest level of organizational governance is corrupt?
I am serious about this, because how people perceive their own rights, their own roles, their own status, their own influence, and their organization's wrongdoing will, in my opinion, shape the long-run attitude toward each and every organization in society.
I know I was blowing the question out of proportion, but it bugged me enough to ask anyway.
But whether you can get away with that depends on culture.
It doesn't help that their status page is also hosted on IBM Cloud.
A better approach is to have it hosted on a different cloud platform. If you really care, you'll set it up on a different domain and with different nameservers as well, with a long-lived redirect (cached on CDNs) from the usual status.example.com or example.com/status.
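For illustration, a minimal sketch of that kind of long-lived redirect, run from infrastructure outside your main provider. The domains, port, and the one-day max-age are just placeholder assumptions, and it's plain Python stdlib rather than anything production-grade:

    # Tiny redirect service you'd host somewhere *other* than your primary cloud.
    # It sends requests for status.example.com to an independently hosted status
    # page, and marks the redirect as CDN-cacheable so edges can keep serving it
    # even if this origin goes down too.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    STATUS_URL = "https://example-status.some-other-provider.net/"  # hypothetical target

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(301)
            self.send_header("Location", STATUS_URL)
            # Long max-age so CDNs answer from cache during an outage.
            self.send_header("Cache-Control", "public, max-age=86400")
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), RedirectHandler).serve_forever()

In practice you'd likely just configure this as a redirect rule at the CDN edge, but the point is the same: the redirect target and its cache lifetime live outside the infrastructure whose status you're reporting.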
"Our cloud can never go completely down We are IBM, we have Watson..."
At least give me something I can point my customers at to show them this is not due to my incompetence.
The purpose of the signalling here is twofold.
1) If convincing enough (with details), you can keep current customers from moving to a competitor.
2) It also lets new customers see how you actually handle a crisis. If you manage the crisis well enough, you can point to this incident to prove you have the technical know-how to handle their needs.
If they don't say anything, or aren't transparent, then they can expect a mass exodus of customers.
Pingdom[1] showed a spike in website outages from 11k => 27k.
Sorry to be glib, I'm sure it's a tough time for people who were sold on their cloud platform and work on it!
Hope they get a root cause and a quick fix. I’m not a fan of their cloud service but I know people working on the outage and fix are stressed.
>> A 3rd party network provider was advertising routes which resulted in our WW traffic becoming severely impeded.
No IBM computer has ever made a mistake or distorted information. They are all, by any practical definition of the words, foolproof and incapable of error.
Fastly error: unknown domain: www.ebay.com. Please check that this domain has been added to a service. The cable TV channel is still independent.
IBM Cloud - unsafe
At least AWS signs their routes, I think.
If you can't even sign your own routes, it's hard to have a ton of pity.
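For context, "signing your routes" here means publishing RPKI ROAs so other networks can check that an announcement's origin AS is actually authorized for that prefix. A rough sketch of checking a prefix/origin pair via RIPEstat's public rpki-validation endpoint; the example ASN/prefix and the exact response field names are my assumptions from memory, so verify against their docs before relying on it:

    # Query RIPEstat's rpki-validation endpoint for a prefix/origin pair.
    # The status is typically "valid", "invalid", or "unknown" (no covering ROA).
    import json
    import urllib.request

    def rpki_status(origin_asn: str, prefix: str) -> str:
        url = (
            "https://stat.ripe.net/data/rpki-validation/data.json"
            f"?resource={origin_asn}&prefix={prefix}"
        )
        with urllib.request.urlopen(url, timeout=10) as resp:
            payload = json.load(resp)
        return payload["data"]["status"]  # field names assumed from memory

    if __name__ == "__main__":
        # RIPE NCC's own AS/prefix, a common example that should come back "valid".
        print(rpki_status("AS3333", "193.0.0.0/21"))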