I hate hearing this awful take, as if every IT organization has the same neat and tidy systems deployed as they do. As if they've never had to deal with third-party SaaS vendors whose certificate pinning requires a service ticket to change, or hardware devices and appliance-based software images that each have their own web interface for updating certs...
Yes, companies should have a plan for their minimum yearly certificate rotations. Yes, those companies should have a security plan to rotate affected certificates, but in those cases the business users are ok with an outage to remediate a real security issue.
But what happened here is that Digicert invalidated the entire domain's worth of certs. All those service.companyname.com certs or duplicates under that domain validation were affected in bulk. In some companies there could be thousands of certs under that domain. Digicert screwed up their system implementation and made their customers suffer.
"It's really disheartening that publicly trusted CAs just ignore their contractual obligations however they see fit."
It's also disheartening to see browsers in the CA consortium ignore the CA resolutions as well. Like how everyone voted for 2 year certs and Apple did their own thing anyways. Any punishment for Apple come? So why pick on the others?
Those SaaS vendors probably shouldn't be doing cert pinning to begin with. If you don't trust your root store, implement support for CAA or DANE; there's no need to roll your own workflow. Those hardware devices should either 1) not use publicly trusted certs, 2) renew their own certs, or 3) have an API to automatically update certs.
The only reason they're still getting away with it is because doing it manually once a year isn't horribly painful. If 90-day validity becomes the industry standard, pain-free certificate renewal turns into a must-have for all new contracts.
"several years"? The certs we are getting have one-year lifetimes. It used to be two years, but was reduced to one year some time ago (I don't remember exactly when).
Also, I don't think the problem is cert lifetimes, I think the problem is having so many certs expiring all at the same time. A lot of IT folks are coming off the major pain of the CrowdStrike crash. This is similar: You suddenly have a very large number of certificates that are going to stop working in less than 24 hours, and you have to respond.
Sure, you could say "Well, companies should be resourced to be able to handle that at any point." Except that's not the reality right now.
If the process is automated then revocation can be automatically handled as well (so long as ARI gains traction).
IMO they should just use HTTP challenges to avoid this whole thing, but it's a pretty common pattern I see with a lot of SaaS vendors, even major fintechs.
If clientportal.somebank.com is actually run by somesaas.com, they can define CNAME _acme-challenge.clientportal.somebank.com --> [some_key].domainvalidations.somesaas.com
When the SaaS vendor needs to request a new cert, they set the appropriate TXT record on [some_key].domainvalidations.somesaas.com.
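To make the delegation concrete, here's a rough Python sketch of what ends up in DNS. The TXT value follows the dns-01 rule from the ACME spec (base64url of the SHA-256 of the key authorization, padding stripped). The token, thumbprint, and `some_key` values are made-up placeholders, and the function names are mine, not from any ACME client.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """base64url without trailing '=' padding, as ACME uses."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """TXT record value for a dns-01 challenge.

    key authorization = token + "." + base64url(JWK thumbprint of the account key)
    """
    key_authorization = f"{token}.{account_thumbprint}"
    return b64url(hashlib.sha256(key_authorization.encode("ascii")).digest())


def challenge_name(domain: str) -> str:
    """Name the CA queries for the challenge."""
    return f"_acme-challenge.{domain}"


def delegated_target(some_key: str, validation_zone: str) -> str:
    """Name in the vendor's zone that the customer's CNAME points at."""
    return f"{some_key}.{validation_zone}"


# Hypothetical values, for illustration only.
cname_from = challenge_name("clientportal.somebank.com")
cname_to = delegated_target("abc123", "domainvalidations.somesaas.com")
txt_value = dns01_txt_value("placeholder-token", "placeholder-thumbprint")

print(f"{cname_from}. CNAME {cname_to}.")    # set once by the bank
print(f'{cname_to}. TXT "{txt_value}"')      # set by the vendor per issuance
```

The bank only touches DNS once; every later issuance or reissuance happens entirely in the vendor's zone, which is why vendors like this pattern.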
I think this tends to fall into "probably shouldn't have been using Web PKI". I can't immediately think of a reason why you'd need a publicly trusted certificate if you're pinning a specific public key.. at that point who cares who signed it?
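For anyone who hasn't looked at what a key pin actually is: it's typically just a hash of the server's SubjectPublicKeyInfo, so the check never consults the issuer at all. A minimal sketch, assuming you already have the DER-encoded SPKI bytes (extracting them from a certificate needs an X.509 library and isn't shown); the function names are hypothetical.

```python
import base64
import hashlib


def spki_pin(spki_der: bytes) -> str:
    """Base64 of the SHA-256 digest of the DER-encoded SubjectPublicKeyInfo."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def pin_matches(spki_der: bytes, allowed_pins: set) -> bool:
    """True if the presented key is one of the pinned keys.

    Note that nothing here looks at who signed the certificate.
    """
    return spki_pin(spki_der) in allowed_pins
```

A reissued cert that reuses the same key pair keeps the same pin; a reissue with a fresh key does not, which is where the service tickets come from.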
I do agree that there are real costs with rotating certificates that ultimately may make it impossible for an organization to complete that work in the revocation window. That is very much an area that needs more automation to be developed and, more importantly, actually adopted. I believe that's what ACME Renewal Information is attempting to address.
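Roughly how the client side of ARI works, as I understand the spec: the client periodically fetches a renewal-info document for each cert, picks a random time inside the suggested window, and renews right away if that time has already passed. The sketch below assumes the `suggestedWindow` / `start` / `end` field names from the ARI spec; the sample response and the function names are invented for illustration, and the HTTP fetch is left out.

```python
import random
from datetime import datetime, timedelta, timezone


def parse_rfc3339(ts: str) -> datetime:
    """Parse a UTC timestamp like 2025-01-02T04:00:00Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def pick_renewal_time(renewal_info: dict, rng: random.Random) -> datetime:
    """Pick a uniformly random instant inside the suggested window."""
    window = renewal_info["suggestedWindow"]
    start = parse_rfc3339(window["start"])
    end = parse_rfc3339(window["end"])
    span_seconds = (end - start).total_seconds()
    return start + timedelta(seconds=rng.uniform(0, span_seconds))


def should_renew_now(renewal_info: dict, now: datetime, rng: random.Random) -> bool:
    """Renew immediately if the chosen instant is not in the future."""
    return pick_renewal_time(renewal_info, rng) <= now


# Hypothetical response body; a real client would fetch this from the CA.
sample = {
    "suggestedWindow": {
        "start": "2025-01-02T04:00:00Z",
        "end": "2025-01-03T04:00:00Z",
    }
}
now = datetime.now(timezone.utc)
print(should_renew_now(sample, now, random.Random()))
```

The relevance to a mass revocation is that the CA can move the window into the past, and every client that is polling renews on its next check without anyone opening a ticket. That only helps for certs that are on ACME in the first place.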
"but in those cases the business users are ok with an outage to remediate a real security issue"
Ideally yes, but that might be the same point you find out the certificate was used in some critical system (let's say Air Traffic Control like a previous CA tried to claim). They still may very well not be okay with the revocation despite the security issue. _Those_ are the people that need to stop using these certificates and there's really no way to weed them out until a revocation actually needs to occur.
"Digicert screwed up their system implementation and made their customers suffer."
And those customers are right to be mad at DigiCert. They probably don't have a legal basis to challenge it, as the subscriber agreement explicitly permits immediate revocation without prior notice, but they can certainly take their business elsewhere.
"It's also disheartening to see browsers in the CA consortium ignore the CA resolutions as well. Like how everyone voted for 2 year certs and Apple did their own thing anyways. Any punishment for Apple come? So why pick on the others?"
Admittedly I'm not very familiar with the various root programs and the obligations they have with CAs, but it doesn't seem unreasonable that root programs would be free to impose stricter requirements than the BRs.
Though I do find it two-faced for Apple to vote for Ballot 193 only to then impose a stricter requirement. At the very least they should have abstained.
Inter-finance systems mostly, some government. Sometimes they pin the CA issuer, sometimes it's IP-based (although with dynamic cloud IPs that is disappearing), sometimes it's inside a VPN, and other times they pin just the certs themselves. The same service handles public users while making bidirectional API calls to other interfaces that are more locked down.
Not everyone is a monolithic copy-and-paste WordPress hosting site, a new cloud-native cash-rich startup, or a massive Google/Amazon/Microsoft with huge teams to orchestrate everything using their own architecture and systems they developed themselves. Private PKI? Even more orchestration layers for enrollment, especially in places with BYOD.
There is no point to low expiry certs anyways. If a server is hacked, the primary concern is what data were they able to exfiltrate and for how long - not that a keypair was maybe stolen to be used in a very complicated and unlikely attack to intercept some of the same data they already stole.
Your ATC comment seems to continue your theme that everyone should run a private PKI instead. Airports are full of interconnections between themselves, other airports, airlines, ground crews, satellite relays, and weather monitoring systems. So then all these parties need to do all the same things the public PKI does: root key signing, cert issuance logging, a secure interface for issuing certs, developing trust across all parties and making them install your root in all their systems... or, just use the public PKI services, which already do that. You are just reinventing the wheel and probably will get it wrong. Maybe for some strictly backend systems, or things like server out-of-band management, it works well, but not for anything involving multiple companies.
The CAs that work with large and complex businesses understand these complexities and voted for a 2-year duration. The owners of the browsers just wanted to further their own cloud bottom lines.
Not the OP you replied to, but I want to add some nuance: there's a vast solution space between using the WebPKI and rolling your own. The enterprise focused CAs have non-WebPKI CAs and CA-as-a-service offerings, both with way longer certificate lifetimes and way longer revocation periods.
If you don't need WebPKI-compatible certs (because you're not offering services to the general public) and your org cannot abide by the WebPKI rules requiring 24 hours max before revocation, you are doing something very wrong when you use the WebPKI.
It's now ostensibly an ecosystem for use by modern, updated clients - browsers and OSs - for TLS. clientAuth will be gone from the webPKI soon, too, I hope.
It's fast becoming a more fluid, shifting ecosystem. We'll be on 90-day leaf certs very soon, shorter after that. Roots and intermediates will have much reduced lifetimes. New guidelines and regulations change things rapidly. Mass revocation events like this one.
In the ATC example - all parts of that ecosystem should be managed to the point that distributing a private root is relatively easy. It shields them from events like this. As another commenter has pointed out - running a private CA (or what might be known as an 'ecosystem CA' like we see in IoT with Matter, airlines with CertiPath, wireless with WinnForum) can be done 'as-a-service' easily, be it from a cloud vendor or CA or similar provider.
If folks continue to use the web PKI for non-web purposes, then they have to be in a position to deal with challenges like short-lifetime certs, 24-hour revoke/reissuance windows, and frequently-updated trust stores.
Most of the agreements and T&Cs for public CAs already forbid use in 'critical' systems anyway, so you're effectively agreeing to these kinds of 24-hour changes from the start.