Let’s say I’m RobinHood. Let’s pick an SLO. I think a three-nines (99.9%) monthly SLO is a good start; that budgets ~43 minutes of downtime per month. Maybe I can argue for a more aggressive SLO, but let’s pick this one, because I think it will keep users relatively happy as trades aren’t blocked for more than an hour at worst. I drive an agreement with stakeholders that if we fall out of this SLO, we drop all feature work and focus on hardening reliability.
RobinHood was out for a whole day. This is unacceptable. It points to a complete organizational fuck up - product and feature development have too much power and priority at the expense of reliability.
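To put rough numbers on that, here is a minimal sketch of the error-budget arithmetic. It assumes a 30-day month and a plain availability SLO (fraction of time the service is up). The function names are mine, purely for illustration, not anything RobinHood or a particular SRE tool uses.

```python
def error_budget_minutes(slo: float, window_days: float = 30.0) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    Assumption: the SLO is a fraction of wall-clock time (0.999 = three nines)
    and the window is a flat number of days.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


def budget_consumed(downtime_minutes: float, slo: float, window_days: float = 30.0) -> float:
    """How many times over the error budget a given amount of downtime is."""
    return downtime_minutes / error_budget_minutes(slo, window_days)


# Three nines over a 30-day month: roughly three quarters of an hour.
three_nines_budget = error_budget_minutes(0.999)

# A full-day outage measured against that same budget.
full_day_ratio = budget_consumed(24 * 60, 0.999)

print(f"monthly budget at 99.9%: {three_nines_budget:.1f} min")
print(f"a 24h outage uses {full_day_ratio:.1f}x the monthly budget")
```

Under those assumptions a single day-long outage burns through roughly 33 months' worth of three-nines budget, which is the scale of miss being argued about here.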
I’m not sure that RobinHood has ever heard of SLOs or reliability engineering. I really hope their leadership is smart enough to hire and empower the right people that will drive organizational change.
The users are not saying "We measured your five nines and I'm going to quit if you have 6 minutes more downtime."
Sure, they lose some users who get annoyed, but they have a 5.6 billion dollar company. Some users will go; a lot more are coming.
Your reliability target is a product decision. Maybe with the right features the market will tolerate a shitty, unreliable financial service that falls over for an entire day. Or maybe RobinHood will go from a 5.6 billion dollar company to a zero dollar company because users hate them.
Point is, high reliability is a choice based on priorities, and it's one RobinHood does not seem to care about. And I will certainly stay the fuck away from their platform.
This works in the acquisition phase, which I suspect Robinhood is nearing the end of.
Once their userbase moves into the retention or conversion phases (competitors have $0 trades now, too), mistakes like this are much more costly in the long term.
Nobody is debating whether people will continue using RH; that was never the issue. RH has massively damaged its reputation, and reputation _is_ everything.