Why the CrowdStrike bug hit banks hard

(bitsaboutmoney.com)

164 points by sideway 1y ago | 238 comments

Maybe the IT departments at the affected orgs take solace in the fact that so many other orgs had issues that the heat is off, but in my opinion this was still a failure of IT itself. There's no reason that update should have been pushed automatically to the entire fleet. If Crowdstrike's software doesn't give you a way to roll out updates to a portion of your network before the entire fleet, it shouldn't be used.

candiddevmike 1y ago

The update bypassed the controls orgs had in place to defer/schedule updates, AFAIK.

jmuguy 1y ago

I've had trouble nailing down if that's the case from searching around online. And if that's true, that's absolutely on Crowdstrike, and that behavior should disqualify it from being used on critical systems. I imagine this incident will cause a lot of teams to consider just what can happen automatically on their systems.

wisemang 1y ago

It’s definitely the case. See Crowdstrike’s preliminary post incident review here: https://www.crowdstrike.com/falcon-content-update-remediatio...

That's the nature of “content updates” vs. a full product update. Though you may be right; perhaps they provide controls for those updates. I’ve never used their software, but it doesn't sound like it.

CydeWeys 1y ago

It's on CrowdStrike, but it's also on IT for even allowing installation of critical software like this that has a bypass at all. Updates shouldn't even be allowed to bypass IT's safe rollout procedures, at least not without IT signing off on it anyway.

kelnos 1y ago

If that's the case, that doesn't change GP's point: if Crowdstrike can bypass your org's controls on rolling out updates to its software, it shouldn't be used.

b4ckup 1y ago

Didn't they say in their incident report that they have a batched rollout strategy for software updates, but this was a config update and the update path for configs does not have such a mechanism in place?

bonestamp2 1y ago

Ya, so hopefully it's obvious to them that every rollout needs some kind of batching. I get that all devices within one org might need to have the same config, but in that case batch it out to different orgs over 2-3 days.

Maybe the more critical infrastructure and health care orgs are at the end of that rollout plan so they are at lower risk. It's not ideal if one sandwich shop in Idaho can't run their reports that day, but that's far better than shutting down the hospital next door. CrowdStrike could even compensate those one-system shops that are on the front line when something goes down.

Again, better to pay a sandwich shop a few thousand dollars for their lost day of sales than get sued by the people in the hospital who couldn't get their meds, x-rays, etc. in time.
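The wave-based rollout proposed above can be sketched as follows. This is a minimal illustration under stated assumptions, not how CrowdStrike or any vendor actually ships content: `Org`, `plan_waves`, and `roll_out` are hypothetical names, each wave stands in for one day of the 2-3 day schedule, and `crash_rate_for` is an assumed telemetry callback that reports the crash rate seen in a wave.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Org:
    name: str
    critical: bool  # e.g. hospitals, airlines, banks

def plan_waves(orgs: List[Org], num_waves: int = 3) -> List[List[Org]]:
    """Split orgs into rollout waves; critical orgs always land in the last wave."""
    waves: List[List[Org]] = [[] for _ in range(num_waves)]
    early = max(num_waves - 1, 1)  # waves available to non-critical orgs
    regular = [o for o in orgs if not o.critical]
    for i, org in enumerate(regular):
        waves[i % early].append(org)  # round-robin across the earlier waves
    waves[-1].extend(o for o in orgs if o.critical)
    return waves

def roll_out(
    waves: List[List[Org]],
    crash_rate_for: Callable[[List[Org]], float],  # hypothetical telemetry hook
    max_crash_rate: float = 0.001,
) -> Tuple[List[Org], bool]:
    """Deploy wave by wave and halt as soon as one wave looks unhealthy."""
    deployed: List[Org] = []
    for wave in waves:
        deployed.extend(wave)
        if crash_rate_for(wave) > max_crash_rate:
            return deployed, False  # later waves never receive the update
    return deployed, True

# Usage: a bad update that crashes every machine it reaches.
orgs = [Org("sandwich-shop", False), Org("hospital", True), Org("bakery", False)]
waves = plan_waves(orgs)
deployed, completed = roll_out(waves, crash_rate_for=lambda wave: 1.0)
```

Here the bad update reaches only the sandwich shop in the first wave, and the hospital in the last wave never receives it. That is the design choice the comment argues for: the blast radius is capped at the first wave, and critical sites only get updates that have already survived earlier waves.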

qaq 1y ago

Generally no one gates content updates, as they happen multiple times a day.

lucianbr 1y ago

Management decides to use Crowdstrike, not IT, and IT has no way to rollout updates in controlled fashion.

So not really a failure of IT, at least not for this reason.

hello_moto 1y ago

In big companies, it's the Management of the IT team.

I know, it's not really the DailyWTF material that most HNers are led to believe.

jmuguy 1y ago

My comment assumes that the IT department (including its executive) gets to make these sorts of decisions. Why wouldn't they?

mingus88 1y ago

In many mature orgs, corporate IT rolls up to the CIO and security will roll up to the CISO.

The CISO and security ops will demand to be completely independent from corp IT, for legit reasons, as the security team needs to treat IT as potential insider threat actors with elevated privileges.

They will also demand the ability to push out updates everywhere at any time in response to real-time threats, and per the previous point they will not coordinate or even announce these changes with IT.

There has always been an implicit conflict between security and usability, because of the inherent nature of security deny policies, but those policies also inherently conflict with conservative change management, such as IT slow-rolling changes through lower environments on fixed schedules and operating with transparency.

Retric 1y ago

Major purchases tend to be pushed up the ladder. It’s not uncommon for a CEO or non technical director etc to decide what IT systems to use.

patmcc 1y ago

IT doesn't steer the ship in banks (and bank-like orgs). IT gets a mandate from the real decision makers that they have to choose something that does x, y, z - see "Regulations which strongly suggest particular software purchases" in the article for examples of x, y, z.

So sure, IT gets to "decide" between CrowdStrike, SentinelOne, or Palo Alto (and maybe a couple of others). But they don't really have much choice: they can't use an OSS solution, or roll their own, or anything else. They have to pick one of a small number of existing solutions.

lucianbr 1y ago

Sure, if there's a security exec who made the decision, it may be their fault. I was thinking more of the rank and file, but that's just my bias.

toddmorey 1y ago

Was anyone else surprised how little disruption they personally experienced? I had braced for impact that weekend. But all my flights were perfectly on time, all my banking worked, providers worked, and sites & resources were available.

I don’t know if I somehow just have little exposure to Windows in my life or if there’s an untold resiliency story for the global internet in the face of such a massive outage.

All I can say is THANK YOU to all the unsung heroes who answered the call and worked their butts off. Infrastructure doesn’t work without you. We see you & we thank you!

davio 1y ago

I was unaffected on my work laptop. One of my coworkers is a long-timer and said that when the company first got laptops there was a huge "OMG leave your laptops on overnight" push to make sure updates were applied. I always at least put mine to sleep, if not shut it down, after work, so I guess I missed out.

doubled112 1y ago

I know at least one person who "survived" while her coworkers' laptops were down.

My first question was "do you shut your machine off at the end of the day?" She did, and that's probably why about half of her office was affected, and the other half was not.

Can't update it if it isn't on.

blackoil 1y ago

IIRC only 5% of Windows machines were affected. So it is very probable that most people just saw the news but it had no real impact on them. Some saw a minor but maybe memorable impact, like Indian airlines giving out handwritten boarding passes.

bostik 1y ago

Crowdstrike took out less than 1% of the global Windows installation base.

But they took out a far larger fraction of the installation base in regulated industries: the very industries that are tightly regulated because they are supposed to keep the wheels of society turning.

Supply chain risks are everywhere, and in regulated industries they are highly concentrated.

yowzadave 1y ago

I wish that were the case for me! My in-laws had their flight out of JFK delayed by 2-3 days, as did my daughter who was supposed to fly as an unaccompanied minor.

duderific 1y ago

I was flying back to the US from Mexico on United the day after the meltdown. Reading the news, I was obviously quite concerned about how it was going to go (I was traveling with my 10 and 6 year old kids). Amazingly, everything went off without a hitch; not even the slightest delay.

I asked the guy at the luggage counter, and he said the day before was pretty crazy, but they had everything straightened out by the next day.

marcosdumay 1y ago

People found a really quick workaround. Without one, it would have taken a couple more days to fix.

bdjsiqoocwk 1y ago

> all my banking worked

Vanguard.co.uk was down.

But yes, I echo your feelings. When you examine how complex everything is under the hood it's almost unbelievable that anything works.

mikeocool 1y ago

I had a 7am flight on Delta from LGA to MSP. Seeing all of the blue screens in the airport was pretty surreal and our flight was delayed four hours.

But yeah, other than that, the only issue we ran into was that the Jimmy John’s we stopped at for lunch outside of MSP was slammed because Delta had ordered hundreds of sandwiches for their staff.

I’ve definitely experienced much worse travel disruptions due to normal weather (though obviously we got real lucky compared to some Delta customers).

pantulis 1y ago

This is a good writeup, but to be fair it's not just a matter of banking regulations. Basically all big companies are under similar obligations regarding endpoint protection.

candiddevmike 1y ago

Should endpoint protection require kernel level access? At what point does it stop becoming protection and start becoming a liability? Obligatory who watches/protects the watchmen/protector...

notepad0x90 1y ago

Yes, it needs kernel access given the userspace APIs available in Windows. Period. Not a single person who knows how the tool works and the threats it protects against has said otherwise. Userspace can't disable or tamper with kernel space, but an admin/root process in userspace can tamper with a userspace agent.

candiddevmike 1y ago

FWIW I asked if it should require access, not what the current status quo is/what limitations exist within the OS.

greenn 1y ago

With the current model kernel level access is required. Real security products have to be able to operate above userland. Ideally in the future there can be a layer in between userland and kernel for this sort of thing. Maybe we use some of those extra protection rings?

tmm 1y ago

> Maybe we use some of those extra protection rings?

Maybe not. Intel is considering removing rings 1 and 2 for a future 64-bit only x86 architecture, because they "are unused by modern software".

https://www.intel.com/content/www/us/en/developer/articles/t...

ghusto 1y ago

Couldn't you just ask some OS APIs provided by something in kernelspace for what you need? In fact, isn't this how macOS does things?

jen20 1y ago

> With the current model kernel level access is required.

On Windows.

jabroni_salad 1y ago

If you don't do it, someone else will.

Unless the OS is locked down to the point that even its owner cannot do that. Actually, this is something I like about Operational Technology: you run into a lot of doodads where the elevation process requires turning a physical key, and the device's main functionality is disabled while it is in service mode. Ofc the doodad has to be engineered to operate reliably, perpetually, for years, and you can't really expect that from a desktop computer.

bluGill 1y ago

I have said for 20 years now that Microsoft Word should have a check on startup: if the current user is an administrator, it should put up a message saying that administrators are not allowed to use a word processor and should log in as someone else. This one change would solve a lot of problems.

Even on home machines where no user has a password, having to do something special to get into administrator mode will stop several attacks just because people will slow down and ask.

supriyo-biswas 1y ago

> At what point does it stop becoming protection and start becoming a liability?

If such outages were more frequent, then it could definitely become a liability. But such risks have to be balanced against the risk of being compromised and leaking customer data and other confidential trade secrets, and the risk posed by the latter is far higher, not to mention it's also more common.

adrr 1y ago

How else would you monitor a Windows box? The EU won't allow Microsoft to lock down its kernel and provide a macOS-type solution with APIs for trusted publishers.

imtringued 1y ago

From what I have heard, Microsoft is allowed to do that, they merely aren't allowed to be a competitor to the software that uses the API.

SkyPuncher 1y ago

Yes. Absolutely yes.

It's the only way to detect certain types of advanced threats.

SkyPuncher 1y ago

Basically all B2B companies are under some sort of obligation to have endpoint protection.

All of these requirements essentially become transitive across a company's entire supply chain.

* Big bank needs to comply with X, so do all of their vendors.

* Vendor wants to sell to big bank, so they comply with X. They also need all of their vendors to comply with X.

* So on and so on.
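The transitive effect in the list above is a reachability problem: everyone reachable from the big bank along "buys from" edges inherits requirement X. A minimal breadth-first sketch, assuming a hypothetical supplier graph stored as a dictionary (all company names are made up):

```python
from collections import deque
from typing import Dict, List, Set

def transitively_bound(suppliers: Dict[str, List[str]], root: str) -> Set[str]:
    """Return every company that inherits root's compliance requirement.

    suppliers maps each company to the vendors it buys from.
    """
    bound: Set[str] = set()
    queue = deque([root])
    while queue:
        company = queue.popleft()
        for vendor in suppliers.get(company, []):
            if vendor not in bound:  # the visited check also handles cycles
                bound.add(vendor)
                queue.append(vendor)
    return bound

suppliers = {
    "big_bank": ["saas_vendor", "payroll_co"],
    "saas_vendor": ["hosting_co"],
    "hosting_co": ["saas_vendor"],  # vendors can buy from each other
}
```

Calling `transitively_bound(suppliers, "big_bank")` returns all three vendors, including `hosting_co`, which never sells to the bank directly.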

----

Ultimately, there are a lot more options than CrowdStrike, but this is a case of "Nobody gets fired for buying IBM". Even if CrowdStrike isn't the "best", it's good enough. Because its use is sooo widespread, an issue with it often affects dozens and dozens of other companies when you're affected. One of the great things about this effect is everyone "goes down at the same time", so people don't tend to point fingers at you. In fact, they might not have any clue you're down because some other, more critical system is down internally and preventing them from accessing you.

I remember a similar situation happening a few years back. A big outage hit large parts of the internet. A pretty major part of our app got taken offline with this outage. This was a known risk and something that we accepted. We expected some backlash and inquiries if this situation should ever happen. It was a calculated risk to dedicate more effort towards building customer-facing value.

I think we got one inquiry. It was basically just an FYI. This person had so many things broken on their end that "one more thing" being broken was just a drop in the bucket.

pantulis 1y ago

Yes, this is a good summary of the situation. As a matter of fact, I guess there were quite a lot of systems and services that went down even though they were not using Crowdstrike themselves, but some part of their cloud supply chain was. I see Salesforce and Adobe were impacted in some way, probably due to the collateral Azure disruption.

On the other hand, count me surprised at the sales prowess of Crowdstrike, I did not know how big they were.

SoftTalker 1y ago

If not regulations, then demands by insurers for cyberattack insurance coverage.

taeric 1y ago

Any explanation that doesn't boil this down to "software required by corporate policy checklist not written by technical team" is almost certainly missing something here. This is almost definitionally policy capture by a security team and the all too common consequences that attach.

The section that goes over why this wasn't federally pushed is largely accurate, mind. Not all capture is at the federal level. It's why you can get frustrated with customer support for asking you a checklist of questions unrelated to the problem you called in about.

And the super frustrating thing is that these checklists are often very effective at what they exist for.

shadowgovt 1y ago

Just as a general comment on this whole affair:

This would be the third incident I'm familiar with of a file of entirely zeroes breaking something big.

Folks, as much as we wish it weren't true, null comes up all the damn time, and if you don't have tests trying to force-feed null into your system in novel and exciting ways, production will demonstrate them for you.

Never assume 'zero' (for whatever form zero takes in context) can't be an input.
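One cheap way to act on this advice is to make "all zeros" and other degenerate inputs part of the test suite for any parser. The sketch below uses a made-up binary format (a 4-byte magic value, a little-endian entry count, then that many 32-bit entries); `parse_content_file`, `MAGIC`, and the format itself are hypothetical and are not CrowdStrike's actual channel-file layout.

```python
import struct
from typing import List

MAGIC = b"CHNL"  # hypothetical 4-byte file signature

class ContentFileError(ValueError):
    """Raised when a content file is malformed."""

def parse_content_file(data: bytes) -> List[int]:
    """Parse: 4-byte magic, uint32 LE entry count, then `count` uint32 LE entries."""
    if len(data) < 8:
        raise ContentFileError("file too short for header")
    if not any(data):
        raise ContentFileError("file is entirely zero bytes")
    if data[:4] != MAGIC:
        raise ContentFileError("bad magic")
    (count,) = struct.unpack_from("<I", data, 4)
    expected = 8 + 4 * count
    if len(data) != expected:
        raise ContentFileError(f"expected {expected} bytes, got {len(data)}")
    return list(struct.unpack_from(f"<{count}I", data, 8))

def rejects_zero_filled_inputs(max_len: int = 64) -> bool:
    """Force-feed zero-filled buffers of every length up to max_len."""
    for n in range(max_len + 1):
        try:
            parse_content_file(bytes(n))  # bytes(n) is n zero bytes
        except ContentFileError:
            continue
        return False  # a zero-filled file was accepted
    return True
```

The explicit length check matters as much as the zero check: it guarantees the parser never reads past the end of the buffer, whatever count the header claims.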

tempodox 1y ago

As long as the botchers act with impunity, they won't “waste” resources on higher standards.

saltminer 1y ago

> This created a minor emergency for me, because it was an other-than-minor emergency for some contractors I was working with.

> Many contractors are small businesses. Many small businesses are very thinly capitalized. Many employees of small businesses are extremely dependent on receiving compensation exactly on payday and not after it. And so, while many people in Chicago were basically unaffected on that Friday because their money kept working (on mobile apps, via Venmo/Cash App, via credit cards, etc), cash-dependent people got an enormous wrench thrown into their plans.

I never really thought about not having to worry about cashflow problems as a privilege before, but it makes sense, considering having access to the banking system to begin with is a privilege. I remember my bank's website and app were offline, but card processing was unaffected - you could still swipe your cards at retailers. For me, the disruption was a minor annoyance since I couldn't check my balance, but I imagine many people were probably panicking about making rent and buying groceries while everything was playing out.

HeyLaughingBoy 1y ago

The really admirable thing about this is that Patrick acknowledged that it was "an other-than-minor emergency" for the contractors and took steps to ensure that they were paid rapidly. In a similar situation many people would have shrugged and taken an attitude of "sorry, bank's down. I'll pay you when it comes back up."

MadVikingGod 1y ago

While reading this I was struck with an interesting question: What risk does any particular software vendor pose to an industry at large?

For example (making up numbers here): if 75% of all airline computers have CrowdStrike Falcon installed, that seems like a very concentrated risk.

I actually wouldn't be surprised if, once we had this data, we saw really high concentrations of a small number of vendors in any industry.
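One standard way to put a number on this kind of vendor concentration is the Herfindahl-Hirschman Index (HHI), the sum of squared market shares used in antitrust analysis. The comment does not propose it; it is just one possible metric. A sketch using the comment's made-up 75% figure, with the remaining shares also invented for illustration:

```python
from typing import Sequence

def herfindahl_index(shares_pct: Sequence[float]) -> float:
    """Sum of squared market shares in percent: near 0 (fragmented) up to 10,000 (monopoly)."""
    if any(s < 0 for s in shares_pct) or sum(shares_pct) > 100 + 1e-9:
        raise ValueError("shares must be non-negative percentages summing to <= 100")
    return sum(s * s for s in shares_pct)

# Hypothetical airline endpoint-security market: one vendor on 75% of machines.
concentrated = herfindahl_index([75, 15, 10])     # 5625 + 225 + 100 = 5950
diversified = herfindahl_index([25, 25, 25, 25])  # 4 * 625 = 2500
```

Squaring the shares is what makes a single dominant vendor stand out: the 75% vendor alone contributes 5625 of the 5950 points.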

anticristi 1y ago

The EU DORA regulation (Digital Operational Resilience Act for Financial Entities) has explicit provisions to avoid concentration risks. I heard a story that a bank was forced to use Google Cloud, because two other banks were already on AWS and Azure.

saltminer 1y ago

Alternatively, if Oracle hikes the price on an industry-specific product by 75%, how much of that industry goes under?

adrr 1y ago

Did it really hit banks hard? Core banking systems don't run windows, they run on mainframes typically on IBM z/OS. I know it hit the financial firms hard and knocked out their trading systems but I don't know of any major bank losing their core bank system due to crowdstrike.

Australia got hit hard because they modernized their bank systems and now most are cloud based. I am not aware of any major bank running their core systems on the cloud or on windows.

tempodox 1y ago

> they modernized their bank systems

You mean they made them more vulnerable?

HeyLaughingBoy 1y ago

> Configuration bugs are a disturbingly large portion of engineering decisions which cause outages

I work in medical device software -- the stuff that runs on machines in hospital labs, ER's or at patient bedside.

The first "ohmigod do we need to recall this?" bug I remember was an innocuous piece of code that was inserted to debug a specific problem, but which was supposed to be disabled in the "non-debug" configuration.

Then somehow, the software update shipped with a change to the configuration file that enabled that code to run. Timing-critical debug code running on a real-time system with a hard deadline is a recipe for disaster.

Thankfully, we got out of that pretty easily before it affected more than a small handful of users, but things could have been a lot worse.
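One common defence against this failure mode is to require two independent switches for debug-only code: a value fixed when the release build is produced, and the runtime configuration. A shipped config change can then flip only one of them. A minimal sketch; the flag and config key names are hypothetical, and in C or C++ the build-time half would normally be a preprocessor symbol or `constexpr` rather than a Python constant:

```python
import json

# Hypothetical build-time flag: fixed when the release artifact is produced,
# never read from a file that a software or config update can change.
DEBUG_BUILD = False

def debug_tracing_enabled(config_text: str, debug_build: bool = DEBUG_BUILD) -> bool:
    """Debug-only code runs only if the build allows it AND the config asks for it."""
    config = json.loads(config_text)
    requested = config.get("enable_debug_tracing") is True  # hypothetical key
    return debug_build and requested
```

In a release build, a config file that sets `"enable_debug_tracing": true` has no effect, which is exactly the situation described above.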

kristaps 1y ago

The article specifically mentions US banks and as I personally didn't see any disruption over here - is there (anec)data on how popular CrowdStrike is in the US vs the EU?

Muromec 1y ago

Can't have disruption from CrowdStrike if you run on IBM mainframes with cobol coz your math only opens gates for new technologies once in 25 years.

MattSayar 1y ago

Oh wow an Anathem reference!

To answer the question, CrowdStrike is a global company with thousands of employees around the world. Not sure why the EU wasn't hit as hard.

Ekaros 1y ago

It might be a question of what type of disruption it is. Transfers and web banking are likely to work. Branch offices and ATMs might have issues. So if you try to do anything in person or negotiate anything with bank workers, there could be issues.

bob1029 1y ago

I feel like this only impacted the larger banks. I've heard absolutely no explosion noises coming from smaller institutions. The effect of regulations and their enforcement is felt differently across the spectrum.

There is something to be said for a diverse banking industry when it comes to this kind of problem. Also, this event is a powerful argument for keeping the core systems on unusual mainframe architectures. I think building a bank core on windows would be a really bad choice, but some vendors have already done this.

waihtis 1y ago

Regulations are a big reason why this happened, sure, but also it hit the companies with great security budgets more.

Hospitals, for instance, weren't that widely affected as they barely have any money to buy security tooling.

Silver linings and all that, I guess.

cookiengineer 1y ago

> Hospitals

Everybody seems to be quick to forget about WannaCry.

c0balt 1y ago

Wannacry was not an accident. It was inarguably an intentional attack against general IT infrastructure instead of a borked update.

waihtis 1y ago

That really just proves my point.

hpen 1y ago

We blame car manufacturers for defects from suppliers, but we don't blame platform manufacturers (Microsoft) for holes in their architecture?

cibyr 1y ago

You don't blame your car's manufacturer if it won't start because the monitoring dongle your insurance provider sent you in exchange for a discount drained the car's battery.

SirMittens 1y ago

I think that's the wrong analogy. A more correct one would be "Should we blame a car company for a broken engine, that was modified after it was sold to you?".

A kernel level driver from a 3rd party is something that you willingly add to the OS, it wasn't there.

Just because windows allow you to do it, doesn't mean you should.

I mean, you can apply some dangerous mods to your car's engine, but you probably shouldn't, and if you do, it's your responsibility, not the car company.

hpen 1y ago

Does crowdstrike void the warranty like an engine add on?

vel0city 1y ago

If you had a support contract with Microsoft for your Windows installs and CrowdStrike is breaking your system they'll tell you to go talk to CrowdStrike, yes.

vel0city 1y ago

If I add a NOS kit to my car and it blows up my engine, is that Honda's fault?

hpen 1y ago

Doesn't Honda say "don't do this or it's your fucking problem"

vel0city 1y ago

Right, so adding the NOS is making a third party addon that changes the behavior of the product outside the original designs of the product.

And installing a third-party kernel module (driver) is...a third party addon that changes the behavior of the product outside of the original designs of the product?

Honda didn't build the engine with NOS in mind. Microsoft didn't build the NT kernel for CrowdStrike. It is a third-party modification to the system the user chose to add on after taking delivery of the product that ultimately changes the behaviors of the system.

Arguing like Microsoft is liable for CrowdStrike's bad software is like arguing Honda is responsible for that NOS kit.

If I write a buggy kernel module that instantly kernel panics my Linux system, is Linus Torvalds responsible? Or am I responsible for the software I wrote?

Retr0id 1y ago

> For historical reasons, that area where almost everything executes is called “userspace.”

It's an old term at this point, but I don't think the reasons for it being called "userspace" have changed or become outdated since then, so I wouldn't call them historic per se.

Macha 1y ago

Things have gotten messier with virtualization, containerisation, hypervisors etc. The internet loves to produce pedants to argue the post should go into the finer points of these even when it's not relevant to the message. And so people like the author have a defensive reflex to throw in some language to bounce the pedants away.

SoftTalker 1y ago

I used to like Patrick's posts, but lately they are way too long and full of irrelevant minutia.

Decide who you're writing for, and write to that audience.

rozenmd 1y ago

> Decide who you're writing for, and write to that audience.

He has, and he does.

rescbr 1y ago

Some of his audience likes the irrelevant minutia.

arduanika 1y ago

Congrats, you've been screenshotted and tweeted by him!

"In which an HN commenter offers me writing advice but fails to understand the implication of second sentence"

https://x.com/patio11/status/1818757982706139297

shadowgovt 1y ago

Why is it called "userspace" when all it runs is some Docker containers hosting a web frontend's server, and no human being ever telnets into it? Where's the "user" in that story?

Where is the "user" when the machine is a Windows box stuffed behind a façade wall that displays airport directions, notifications, and ads on rotate?

anticristi 1y ago

I always understood "user" in "userspace" as "the user of the operating system kernel".

btbuildem 1y ago

The takeaway from this article seems to be: buy crowdstrike shares, because major corps are unable to make any changes, and will continue to pay licensing fees for this "service" for the foreseeable future.

tootie 1y ago

This is going to crush their sales pipeline and lead to at least a few attempting a migration off. Crowdstrike is unlikely to go out of business, but this is not a good time to buy.

alephnerd 1y ago

Safe Harbor: Don't follow random internet commentators' opinions on public markets. This is just an opinion and not advice.

I disagree. Long term, the fundamentals of CRWD continue to remain unabated.

Endpoint protection is still a critical need no matter what: for every bug like CRWD's, there's always a company you can point to whose operations were shut down due to an attack.

CRWD skimped on QA and customer support, but long term there aren't many other vendors that can provide a similar service, and CRWD is large enough to pull a PANW and M&A into entirely new segments (eg. DSPM with Flow Security, Observability/Data Lake with Humio, ASPM with Bionic) along with greenfield category makers like Charlotte AI for AI Security and AI EDR.

There will be short term pain for CRWD's Windows endpoint business with churn to MDE, SentinelOne, Tanium, etc but they have enough dry powder and a diversified security portfolio that they can safely recover within a year at most.

> crush their sales pipeline

With CRWD sized companies, most of their revenue comes from multi-year contracts and renewals.

They'll probably have a decently large layoff in the sales org, but enterprise sales tends to be fairly stable due to contract sizes along with riders about liability.

btbuildem 1y ago

> They'll probably have a decently large layoff in the sales org

Bingo. That's the buy signal

nappsec 1y ago

That depends what sort of timeline you're looking at. I wouldn't be surprised if the price fell more, but the markets are forward looking and long term they're a key player in the space.

nkassis 1y ago

SolarWinds comes to mind: they haven't fully recovered, but they are still around and kicking.

nerdponx 1y ago

Seems ideal. Get in while the price is discounted relative to the overall market.

dangus 1y ago

Just depends how far the stock falls and at what point it's undervalued.

candiddevmike 1y ago

The lawsuits alone are going to be eyewatering. But sure, buy those shares.

lucianbr 1y ago

Just spitballing, but I think the lawsuits will take years to come to any conclusion, and in the meantime Crowdstrike will continue to be paid and make a profit. And the conclusion is not really predictable.

davio 1y ago

Delta Air Lines is in the headlines saying they had a $500 million impact and have no choice but to sue.

dangus 1y ago

Do any of their customers have a case? I'm pretty sure their contracts would cover this kind of outage as an expected eventuality.

A lot of lawsuits are going to be thrown out, I think.

tempodox 1y ago

I admire your optimism.

deepsun 1y ago

I'm still amazed how the blame shifted from Microsoft to CrowdStrike. Yes, CrowdStrike update caused that -- but applications fail all the time. It was Microsoft's oversight to put it on Windows critical path.

And banks/airlines etc were hit hard because their _Windows_ didn't boot, not because of an application crash on a perfectly working Windows.

ctxc 1y ago

The application (Crowdstrike) was part of Windows' booting process.

Windows cannot simply "skip" failed drivers. Say Crowdstrike driver failed as a one time thing, Windows skipped it instead of retrying which led to the endpoint being vulnerable and a ransomware happens. We'd be saying the opposite now.

This is a high-impact ability Windows offers to applications - and applications should take responsibility and treat it as such.

I spoke to another EDR lead I know - they said they had provisions in place to read the dump if boot crashed, check if it was due to their driver and skip it if it was (and then send telemetry after startup so that it can be fixed, probably). Crowdstrike should have done the same.

One more thing to note is that we cannot say Windows shouldn't provide this ability - that becomes an anti-trust monopoly, because MS themselves are a competitor in this space.
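The approach the EDR lead describes (check after a crash whether your own driver was at fault, and if so skip the offending input and phone home) might look roughly like this. Everything here is an assumption for illustration: the driver name, the idea that the previous crash has already been summarized into a `CrashRecord`, and the choice to fall back to the newest content version older than the one loaded at crash time. Real dump analysis would happen in the driver or a boot-time service, not in Python.

```python
from dataclasses import dataclass
from typing import List, Optional

OUR_DRIVER = "example_sensor.sys"  # hypothetical driver file name

@dataclass
class CrashRecord:
    faulting_module: str  # module blamed in the previous crash dump
    content_version: int  # content version loaded when the crash happened

@dataclass
class BootDecision:
    content_version: Optional[int]  # None: start the sensor with no content
    report_to_vendor: bool          # send telemetry once the network is up

def decide_on_boot(last_crash: Optional[CrashRecord],
                   installed: List[int]) -> BootDecision:
    """Choose which content version to load on this boot."""
    newest = max(installed) if installed else None
    if last_crash is None or last_crash.faulting_module != OUR_DRIVER:
        return BootDecision(newest, report_to_vendor=False)
    # Our driver caused the last crash: skip the content that was loaded then.
    older = [v for v in installed if v < last_crash.content_version]
    return BootDecision(max(older) if older else None, report_to_vendor=True)
```

Falling back to older content, rather than disabling the driver outright, keeps the machine covered by earlier detections while the vendor investigates.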

mewpmewp2 1y ago

But then again ransomware would happen like you said if they skipped it? And ransomware sounds even worse.

burnished 1y ago

The difference is that if Windows does the skipping then you probably don't find out until it's too late; if the application does the skipping, there is the opportunity to set up alerting so you can fix whatever went wrong.

makeitdouble 1y ago

Windows could sure handle this kind of error better, but IMHO it would be a mistake to require Microsoft to absolutely block any path Windows could be crashing due to third party software.

We'd end in a situation similar to Mac OS where there's a single gatekeeper and whole industries are subjected to the will of the platform owner.

Enterprises have chosen Windows because of that flexibility and control, while having a business partner they don't get with linux. If anything the blame should fall on them for getting hosed even as they fully had the means to avoid that situation.

CydeWeys 1y ago

I don't think "Microsoft should lock down Windows so hard" is the solution we want here. I don't want my desktop OS to be a walled garden like iOS is. I want to be able to install software on it that does anything I need to be able to do -- and yes, having that capability to run software at the lowest possible level in the OS does also mean that that software has extra responsibility to be well-behaved, as the OS can't protect the system from it. But I still would rather have that option than not have it (and also I wouldn't use CrowdStrike).

klodolph 1y ago

How did Microsoft put it on the Windows critical path? (Informational question—I’m not following the issue super closely, but I thought CrowdStrike was a third-party system. Crowdstrike was wrong to put so much code in the kernel. Microsoft was reportedly legally bound to provide this access and allow third-party code to run in the kernel.)

shombaboor 1y ago

There was an interesting article about how these third parties lobbied to run in the kernel and Microsoft acquiesced about 20 years ago, which led us down this path. https://web.archive.org/web/20061023112233/http://software.s...

musjleman 1y ago

If you dig a little more about what this is talking about, Microsoft did not actually make any kernel related changes.

This was just Symantec and McAfee ranting about PatchGuard, and MS did not remove it.

dblohm7 1y ago

Microsoft added a feature to Windows that allows specially-signed antimalware drivers to be loaded extremely early in the boot sequence and be marked as non-optional. The idea is to give antimalware drivers the opportunity to load first, before anything else has had the chance to start.

Furthermore, if a driver is marked as optional and crashes, Windows can reboot with that optional driver disabled next time, preventing infinite crash/boot loops. Obviously that's no good if your antimalware driver gets disabled, so they can mark theirs as "required." Obviously in the CrowdStrike case, we got the worst of both worlds.
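The interaction described here can be shown with a toy model. This is not Windows' actual boot logic or API; `Driver`, `BootManager`, and the rule "a crashing optional driver is skipped on the next boot" are simplifications of the comment's description.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Driver:
    name: str
    required: bool  # a "required" driver is never auto-disabled

@dataclass
class BootManager:
    drivers: List[Driver]
    disabled: Set[str] = field(default_factory=set)

    def boot(self, crashing: Set[str]) -> bool:
        """Simulate one boot; return True if the system comes up."""
        for drv in self.drivers:
            if drv.name in self.disabled:
                continue
            if drv.name in crashing:
                if not drv.required:
                    self.disabled.add(drv.name)  # skip it next time
                return False  # this boot ends in a crash either way
        return True
```

A crashing optional driver costs one reboot. A crashing required driver that is fed the same bad file on every boot loops indefinitely, which is the "worst of both worlds" case.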

umanwizard1y ago

Microsoft is not who made the decision to put this on Windows' critical path; CrowdStrike was. Nothing stops you from running whatever dodgy third-party kernel modules you like on Linux or FreeBSD and they could easily cause the same sort of problem.

Bjartr1y ago

In fact, CrowdStrike has taken down Linux systems in much the same way in the past year (in April I think). It's just that the impact was less widespread.

deepsun1y ago

Partially agree. Linux yes, but *BSD systems have a microkernel architecture, so they should be more resilient to failures of individual components. Although I have no idea whether the full system would boot either, I'm pretty sure it could partially load, give the user more information, and make it easier to fix.

imiric1y ago

To be fair, AFAIK the CrowdStrike driver was WHQL-certified. The loophole is that the driver loaded files at runtime, which made it impossible to predict every failure scenario.

Maybe this is the loophole that needs closing. You can't claim a driver is certified for Windows if the manufacturer can push arbitrary files that change its behavior. Especially if that manufacturer has sloppy development practices.

I understand that a primary goal of endpoint monitoring software is to be able to quickly react to new threats, and that the turnaround time for Windows certification is surely unacceptable in this scenario, but this functionality can never be allowed to jeopardize the stability of the system it's supposed to protect. So it's ultimately on Microsoft to fix this for their users.

shadowgovt1y ago

Ironically, this is exactly the failure pattern that the move to Manifest V3 for Chrome extensions tries to prevent. You can't provide a guarantee to the end-user of pre-vetted safety when the application is downloading and executing arbitrary code from a third-party source. That's like expecting a static code verifier to prevent all runtime errors.

It is, perhaps, a guarantee that no vendor should be expected to make.

SoftTalker1y ago

> You can't provide a guarantee to the end-user of pre-vetted safety when the application is downloading and executing arbitrary code from a third-party source.

So a web browser can't be trusted or certified, ever. Unless JavaScript is disabled?

Cthulhu_1y ago

The article states that EU law forced Microsoft to allow CrowdStrike to run in kernel space, because otherwise MS would have a monopoly on kernel-level security solutions/integrations.

Macha1y ago

They probably had to, in the same way that banks had to use CrowdStrike. It's easy for banks to say "we use CrowdStrike, like everyone else" rather than implement a bespoke and accountable framework for risk assessment and mitigation for every type of endpoint use case (and argue that case to both the auditor and the regulator). In the same way, it's easier for Microsoft to say "see, they can run in kernel space" than to provide a bunch of API functions that achieve what's needed, convince all third-party vendors to use them, and put in place a process to convince an auditor that Microsoft's own security software will never use any knowledge or functionality from the OS outside those APIs.

ghusto1y ago

Exactly this. Microsoft did this poorly, so they were forced to allow others to do things poorly too.

fmbb1y ago

Did they have to?

Or did they choose to keep their own security software running in kernel space, thus forcing themselves to let others play by the same rules?

marcosdumay1y ago

They had to allow the same kind of access they have on their own "security" software.

Nothing in that means they need ring-0 access.

davidgerard1y ago

So why didn't MS lock it down in the US if it's an EU-local rule? Their excuse isn't plausible.

voytec1y ago

You're spreading cheap propaganda. Microsoft likely never had[0] an appropriate userland-level API in place, and their blaming the EU should not be repeated by someone calling themselves a journalist.

[0] https://www.youtube.com/watch?v=EGttFWntctU - I need to state here that I do not possess the level of knowledge the author of the video presents, and therefore I am unable to confirm the findings in the video

asr1y ago

Not MSFT’s fault: https://stratechery.com/2024/crashes-and-competition/

9999000009991y ago

I’ve used this analogy before.

If I sell you a bike and you remove the brakes, you can't sue me when you crash.

Any OS which allows users to do what they generally want to do, also allows users to fubar their own systems.

deepsun1y ago

Let me exaggerate a bit to show how bad that analogy is:

Let's say I've developed a laptop that bricks whenever you open a website with incorrectly formatted HTML.

Not sure how to adapt your bike analogy to this... Let's say you made a bike that's intended to be ridden outdoors, but breaks down whenever a user rides it indoors. Yeah, no one is supposed to ride it indoors. Not sure it's the best analogy, though.

UPDATE: let's say the bike breaks down completely whenever it's ridden in the rain.

9999000009991y ago

No one's forcing you to install kernel level software.

If I install some kernel-level anti-cheat software and it stops Windows from booting, I need to blame the game developers, not Microsoft.

You're free to install pretty much whatever you want on Windows.

Saris1y ago

What about the previous crowdstrike bugs that hit Linux systems in a similar fashion?

I don't understand how this has anything to do with Windows, Crowdstrike is the one who built the application.

deepsun1y ago

It has everything to do with Windows, because it's Windows who crashed.

Applications crash all the time. But in this case people weren't even able to load Windows to figure out what was wrong or which app had crashed.

Microsoft allowed a third party to self-update and didn't build a proper system of review and update control into the heart of its OS.

Saris1y ago

The same thing happened before with Linux, crowdstrike made systems unbootable.

So I don't understand why you're focusing on windows here. Linux allows anyone to update too, there's no review or control either.

Just because an OS allows you to break it, does not mean the maker of the OS is liable when you do break it.

btbuildem1y ago

Isn't corporate malware by definition on the "critical path"? The article outlines the reasons why that jank runs in kernel space, and why MS is unable to "downgrade" it to userspace.

ClumsyPilot1y ago

This is the comment I expected, begging to handover your freedoms to run software to a big carry.

If you replace parts in your BMW and put in some garbage or incompatible parts, it's your fault if it doesn't run.

You expect to sue your mechanic if he messed up, and for him to cover the full cost. For some reason, people do not expect CrowdStrike to pay for their stupidity, which is the root of the problem. The same goes for the management that installed CrowdStrike without due diligence.

deepsun1y ago

But it wasn't some garbage part in a car, it was an app. And apps fail all the time; the OS is expected to handle that, the same way a car is expected to handle rain.

vel0city1y ago

Buggy third-party kernel modules cause kernel panics all the time in Linux. You can easily write a kernel module to make a Linux system explode.

ClumsyPilot1y ago

> it was an app. And apps fail all the time

Exactly

The fact that developers do not take their responsibility as seriously as an average car mechanic brings shame on our entire industry.

ortusdux1y ago

Is there any merit to Microsoft's argument that the EU forced them into keeping their kernel accessible by 3rd parties?

https://www.theregister.com/2024/07/22/windows_crowdstrike_k...

kmeisthax1y ago

The EU's rules are that Microsoft can't hoard APIs away from competitors, not that they have to give competitors a kernel driver SDK. If Microsoft says Windows Defender needs a kernel driver, then CrowdStrike gets to ship a kernel driver, too.

Microsoft, interestingly enough, is working on a project to add an eBPF[0] runtime to the NT kernel. If they were to use this for their own security products then I doubt the EU would prohibit them from transitioning third-party security products to eBPF programs. Antitrust and competition law do not care about specific technical measures competitors use to compete, just that dominant companies are not shutting competitors out of markets.

[0] Formerly "extended Berkeley Packet Filter", eBPF lets you run safety-verified code in kernel space. Notably, the verifier isn't just a signing check; it can actually ensure the code won't crash the kernel directly.
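To make the footnote's distinction between signing and verifying concrete, here is a toy verifier loosely in the spirit of eBPF's. It rejects a program before it ever runs unless every jump goes forward (so execution must terminate) and every memory access hits a constant slot inside a fixed-size scratch area. The four-opcode instruction set and `SCRATCH_SLOTS` are invented for this illustration; the real eBPF verifier tracks register types and value ranges and is far more involved.

```python
# Hypothetical instruction set: ("load", slot), ("store", slot), ("jump", target), ("exit",)
SCRATCH_SLOTS = 8  # size of the fixed scratch area programs may touch


def verify(program: list[tuple]) -> list[str]:
    """Return a list of reasons to reject the program; empty means it passed."""
    errors = []
    if not program or program[-1] != ("exit",):
        errors.append("program must end with exit")
    for pc, insn in enumerate(program):
        op = insn[0]
        if op in ("load", "store"):
            slot = insn[1]
            if not 0 <= slot < SCRATCH_SLOTS:
                errors.append(f"pc {pc}: slot {slot} out of bounds")
        elif op == "jump":
            target = insn[1]
            # forward-only jumps mean the program counter only ever increases,
            # so every accepted program terminates
            if not pc < target < len(program):
                errors.append(f"pc {pc}: jump to {target} not allowed")
        elif op != "exit":
            errors.append(f"pc {pc}: unknown opcode {op!r}")
    return errors


good = [("load", 0), ("jump", 3), ("store", 1), ("exit",)]
bad = [("load", 20), ("jump", 0), ("exit",)]
print(verify(good))  # []
print(verify(bad))   # ['pc 0: slot 20 out of bounds', 'pc 1: jump to 0 not allowed']
```

The key property is that the check is about behaviour, not provenance: a perfectly signed `bad` program would still be refused.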

ghusto1y ago

Yes and no. As others have pointed out above, it is factually correct that they were forced by the EU to give access to kernelspace. However, it is also true that the only reason for that was that _they_ were using kernelspace for the same things (instead of creating a framework and API into the features needed).

davidgerard1y ago

No. They could have done an EU-only edition that behaved that way. But they didn't.

gciguy1y ago

Dave's Garage has a great video on this: https://www.youtube.com/watch?v=wAzEJxOo1ts

kmeisthax1y ago

Microsoft didn't write the Falcon sensor software nor did they put it in the kernel. In fact, Microsoft has been shouting to the heavens trying to shift the blame from CrowdStrike onto the European Commission, because they want people to irrationally hate antitrust so they can turn Windows into shitty iOS and monopolize the security market (and applications market) for it.

Furthermore, Microsoft does actually have some rules regarding what you can and can't put into a signed kernel driver. Specifically, they won't sign kernel code unless they've seen and tested it first. CrowdStrike deliberately circumvented this rule by implementing their own configuration format - really, just a fancy way of loading code into the kernel that Microsoft doesn't have signing control over.

If there is blame to be had here for Microsoft, maybe it's that their kernel code signing program doesn't scrutinize third-party configuration formats hard enough. I mean, if you sign a code loader, you're really signing all possible programs, making code signing irrelevant. And configuration is, more often than not, code in a trenchcoat. It's often Turing-complete, and almost certainly more complicated than the actual programming languages used to write the compiled code being signed off on.
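A minimal sketch of the "code in a trenchcoat" point, assuming a hypothetical rule format: the interpreter below is fixed (think of it as the signed part), but what it actually does is decided by rules loaded at runtime. A rule that references a field the sensor never supplies becomes an out-of-bounds access, which in kernel-mode C would be a crash rather than a Python exception. `match`, `validate` and `NUM_FIELDS` are illustrative names, not CrowdStrike's design.

```python
# Hypothetical config-driven matcher. The interpreter is "signed" and fixed;
# the rules are data pushed later and never seen by whoever signed it.
NUM_FIELDS = 3  # number of fields the sensor actually supplies per event


def match(rule: dict, fields: list[str]) -> bool:
    # Unchecked: trusts rule["field"] to be a valid index into fields.
    return rule["pattern"] in fields[rule["field"]]


def validate(rules: list[dict]) -> list[dict]:
    """Reject any rule that references a field the sensor does not supply."""
    for rule in rules:
        if not 0 <= rule["field"] < NUM_FIELDS:
            raise ValueError(f"rule references field {rule['field']}, only {NUM_FIELDS} supplied")
    return rules


event = ["cmd.exe", "/c whoami", "C:\\Windows"]
rules = [{"field": 1, "pattern": "whoami"}, {"field": 3, "pattern": "pipe"}]

try:
    for r in rules:
        match(r, event)   # the second rule reads fields[3]
except IndexError:        # in kernel-mode C this would be an out-of-bounds read
    print("unvalidated config crashed the interpreter")

try:
    validate(rules)
except ValueError as e:
    print("rejected before loading:", e)
```

Signing `match` says nothing about `rules`; only a check like `validate`, run on every config before it is loaded, covers the part that actually changes.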

But at the same time I imagine Microsoft tried this and got pushback. That might be why they feel (incorrectly) like they can blame the EU for this. Every third-party security solution does absolutely unspeakable things in kernel space that no one with actual computer science training would sign off on, using configuration to wrestle signing control away from Microsoft. Remember: Crowdstrike is designed to backdoor Windows systems so that their owners know if an attack has succeeded, not to make them more secure from attacks in the first place. Corporations are states[0], and states fundamentally suffer from poor legibility: they own and operate far too much stuff for a tribe[1] of humans to meaningfully control or remember.

The problem is that we have two different entities that both have the ability to stop this madness. When states run into this situation, they impose "joint and several liability", which means "I don't care how we precisely assign blame, I'm just going to say you all caused it and move on". In other words, it's Microsoft's fault and it's CrowdStrike's fault.

[0] ancaps fite me

[1] Maximally connected social graph with node degree below Dunbar's number.

supriyo-biswas1y ago

> because they want people to irrationally hate antitrust

One only needs to look at what's happening with Google's privacy sandbox to know the perils of antitrust with regard to introducing new interfaces. Even though Google has offered new interfaces and APIs that they themselves intend to migrate to (and take a ~20% revenue reduction), they've attracted the scrutiny of regulators who claim that this is a way of locking out competitors in the advertising space.

> [0] ancaps fite me

This part is simply inciting a flamewar, and something that you can do without in the spirit of the website guidelines[1].

[1] https://news.ycombinator.com/newsguidelines.html

kmeisthax1y ago

It's important to remember that every other browser dropped third-party cookie support years before Chrome did. Google dragged their feet on it until they could come up with a solution that would give Google the same level of tracking, because Google is an advertising company. So the competition authorities are telling Google - and only Google - that they can't drop third-party cookies anymore.

I've never actually heard anyone claim Privacy Sandbox[0] APIs would give third-party ad networks the same level of tracking as Google. But I imagine even if they did, the APIs would probably be a poor fit for competing ad networks, in the same way that, say, the iOS File Provider APIs are a terrible fit for Dropbox[1].

There are three different ways you can introduce a new standard or interface:

- You can go to or form a standards body with all the relevant market players and agree on a technical specification for that interface. This is preferred, and it's how the Web is usually done.

- You can take a competitor's interface people are already using and adopt that. This is how you get de-facto standards, and while they might have loads of technical problems[2], none of them give you an unfair market advantage.

- You can make your own interface and force competitors to adopt that. You get all the technical problems of a de-facto standard, but those are all problems your competition has to deal with, not you.

The difference is a matter of market advantage. Out of all the major browser vendors, only Google has dominance in online marketing. Microsoft and Apple would like to have a piece of that pie, but they all dropped third-party cookies without tying it to their own competing standards that they wanted to force other people to use.

[0] Hell of an Orwellian name

[1] For example, if you use Dropbox as your file storage, you can't pick folders. At all. On an operating system built by the company whose engineers are obsessed with bundles (directories that look and act like files instead of folders).

[2] laughs in SWF

hulitu1y ago

I think they said it was a windows driver, not a normal application. Running crap in kernel mode does not end well on any OS.

concerned_user1y ago

Yes, it is a driver which is signed and tested by Microsoft. The driver allows running arbitrary unsigned code. Why is that allowed?

cyberpunk1y ago

The driver is some kind of AV/signature detection hook, e.g. a system that checks every open() against a list of checksums and refuses to open known viruses. The 'update' was a borked definition file which triggered a bug in that system.

It's not code execution without signing, and I think probably they do want these files to be updated hands free.

The real problem was the lack of testing, rather than the actual mechanism I think.
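The kind of check described above can be sketched in user space as follows. `load_definitions`, `Blocklist` and `allow_open` are hypothetical names, and the definition file is assumed to be one hex SHA-256 digest per line. The interesting part is the update path: a definition file that fails to parse is rejected and the previous, known-good set stays in force.

```python
import hashlib


def load_definitions(text: str) -> set[str]:
    """Parse one hex SHA-256 digest per line; raise on anything malformed."""
    digests = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip().lower()
        if not line:
            continue
        if len(line) != 64 or any(c not in "0123456789abcdef" for c in line):
            raise ValueError(f"line {lineno}: not a SHA-256 digest")
        digests.add(line)
    if not digests:
        raise ValueError("definition file contains no signatures")
    return digests


class Blocklist:
    def __init__(self, initial: set[str]):
        self.known_bad = initial

    def update(self, text: str) -> bool:
        """Swap in new definitions only if they parse; otherwise keep the old set."""
        try:
            self.known_bad = load_definitions(text)
            return True
        except ValueError:
            return False

    def allow_open(self, contents: bytes) -> bool:
        """The check an open() hook would run: refuse files whose hash is listed."""
        return hashlib.sha256(contents).hexdigest() not in self.known_bad


evil = b"pretend this is malware"
bl = Blocklist({hashlib.sha256(evil).hexdigest()})
print(bl.allow_open(evil))       # False
print(bl.update("\x00" * 64))    # False: garbage update rejected
print(bl.allow_open(evil))       # False: old definitions still in force
```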

shadowgovt1y ago

This is the nugget of the issue. The code-signing process, in this case, was abused to verify something that, fundamentally, cannot give the guarantee "Doesn't crash your OS" because it is allowed to run arbitrary code in the form of novel commands in what is essentially a DSL. So if code-signing is supposed to be a guarantee from MS that "this code can't crash your system," it should never have been signed... But then MS would have been on the hook for blocking a competitor.

There is no guarantee the law is written soundly.

wolpoli1y ago

To get a driver signed by Microsoft, the developer of the driver is required to provide a full cert pass log from the Windows Hardware Lab Kit to dev center [0]. Do you have any article that says the CrowdStrike driver has been tested by Microsoft?

[0]: https://learn.microsoft.com/en-us/windows-hardware/drivers/i...

Joker_vD1y ago

...you want Microsoft to forbid you from running certain kinds of programs on your own machine, even if you really, really insist on it, do I understand you correctly?

Sohcahtoa821y ago

The Crowdstrike failure was not caused by running unsigned code.

hpen1y ago

This is a valid opinion and I don't know why you were downvoted (well, other than the Hacker News bubble mindset, or mindless-set).

How is Microsoft not to blame, it's their product? We wouldn't blame a Toyota supplier for a failure in a car, but we somehow segment that in the software world?

vel0city1y ago

Toyota chose the supplier, worked with them on the specs and designs, and put it in their OE car delivered to the customer. It has Toyota's name on it, it was bought at a Toyota dealership, is a part of Toyota's warranty.

Crowdstrike is entirely optional software that doesn't come from Microsoft. Microsoft doesn't market it. Microsoft had no hand in making it. Microsoft doesn't sell it. Microsoft had no hand in a user installing Crowdstrike.

Do you not see the obvious differences there?

Sohcahtoa821y ago

> How is Microsoft not to blame, it's their product?

Do you think Crowdstrike is a Microsoft product?

hpen1y ago

No. My point is that Microsoft allows the damn thing to be run in kernel space. Mac and Linux don't have this problem due to how THEY architected the system. Yes, I think that puts the blame on Microsoft.

jsmith991y ago

Yes, use a different operating system, one that gracefully handles null pointer dereferencing by third party kernel modules? /s

voytec1y ago

> Another way is if it has recently joined a botnet orchestrated from a geopolitical adversary of the United States after one of your junior programmers decided to install warez because the six figure annual salary was too little to fund their video game habit.

Fictional statements like this make me reluctant to read further, and inclined to ignore the source of such "news" in the future.

bdamm1y ago

It's obviously fictional, but let's call it contemporary drama based on a true story. I thought the point was well made. The author already noted this was a handwaving segment.

samspot1y ago

I got in trouble for something like this early in my career (running bittorrent over my work vpn).

davidgerard1y ago

what makes you think it was fictional?

also, bragging about your inability to read text seems an odd way to interact.

voytec1y ago

Bragging? Reluctant==unable?

__MatrixMan__1y ago

I like the technical stuff here.

I'm not so sure about this:

> money is core societal infrastructure, like the power grid and transportation systems are. It would be really bad if hackers working for a foreign government could just turn off money.

Sure, it would be inconvenient in the short term. But I think the current design is holding us back.

I suspect that most of us would have more to gain than to lose if we managed to shut off money-as-we-know-it and keep it off for long enough to iterate on alternatives. Any design that even tried to step beyond "well that's how we've always done it" would likely land somewhere better than what we're doing. Much has changed since Alexander Hamilton.

Joker_vD1y ago

In the early '90s, Russia essentially voided almost all of the Soviet money that remained in the monetary system (most of it was bank deposits, which simply vanished with zero compensation), setting a rather small upper limit on the amount of old Soviet roubles one person was allowed to exchange for the new Russian roubles.

Believe it or not, that really did not help the low and low-middle classes with their growing financial problems; and the upper-middle and top classes mostly operated in dollars (or less often, in deutschmarks) by this time anyhow, so that didn't inconvenience them much at all.

__MatrixMan__1y ago

Losing access to one currency but not others is quite a different thing, I don't think that would help anybody.

What I think would help is something that evolved in a less stable computing environment. Something which had to be partition tolerant. Such a thing would have to remain more closely coupled with the consent and merits of its participants because it would lack a reliable connection to a far away authority (currently used to uphold the wishes of extraneous parties to the transaction). Something like local-first software, but for money.
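One hedged way to picture "local-first software, but for money" is a CRDT-style ledger, here a PN-counter: each node only ever increments its own credit and debit totals, and two replicas that were partitioned can always be merged by taking element-wise maxima, in any order. This is an illustration assumed for this reply, not a design proposed in it, and it also shows the hard part: neither side of a partition can see the other's debits, so nothing here prevents an account from being overdrawn while the network is split.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """PN-counter CRDT: per-node running totals of credits and debits for one account.

    Each node must only ever increase its own entry; that is what makes
    element-wise max a correct merge.
    """
    credits: dict[str, int] = field(default_factory=dict)
    debits: dict[str, int] = field(default_factory=dict)

    def credit(self, node: str, amount: int) -> None:
        self.credits[node] = self.credits.get(node, 0) + amount

    def debit(self, node: str, amount: int) -> None:
        self.debits[node] = self.debits.get(node, 0) + amount

    def balance(self) -> int:
        return sum(self.credits.values()) - sum(self.debits.values())

    def merge(self, other: "Account") -> "Account":
        """Element-wise max: commutative, associative and idempotent."""
        def join(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
            return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}
        return Account(join(self.credits, other.credits), join(self.debits, other.debits))


shared = Account()
shared.credit("bank", 100)
# Partition: two replicas diverge, each spending from its own copy.
a, b = Account(dict(shared.credits)), Account(dict(shared.credits))
a.debit("shop_a", 80)
b.debit("shop_b", 80)
print(a.balance(), b.balance())   # 20 20
merged = a.merge(b)
print(merged.balance())           # -60: overdrawn once the partition heals
print(merged == b.merge(a))       # True: merge order does not matter
```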

gadders1y ago

In the short term people would probably starve to death.

Joker_vD1y ago

Probably not. A competent government could introduce temporary rationing for the most essential goods such as food. It happened throughout the whole of the 1917–1920 Russian revolution, with four or five kinds of paper money circulating, and the urban population managed to get through it, if only barely. That government was much less competent than the US government is today.

mminer2371y ago

I mean, millions still starved during the revolution, even with the American Relief Administration feeding 10% of the country.

imtringued1y ago

I agree there needs to be more competition, but that doesn't mean you need to get rid of the old way. It is better when two approaches run in parallel, compensating for each other's shortcomings.

__MatrixMan__1y ago

That would indeed be ideal: one as a backup for the other, and when both are functioning, choose the one that suits you best. I just think that it's outages that will convince us that we need this... stakeholders in the status quo certainly aren't going to do it.

vel0city1y ago

The uber-wealthy don't have most of their assets in currency. It's in stocks, houses, cars, boats, etc. Delete the dollars and it'll hurt them a bit, but in the end they still have their house(s).

But now all those people who were using currency to trade for housing now suddenly need to find a new way to trade for shelter.

Who got hurt worse here?

__MatrixMan__1y ago

I'm not going to try to lay down the exact parameters of what we'd come up with in money's place, but if it's going to be resilient in the face of far-away servers behaving badly then it would have to derive its legitimacy not from some shiny golden ledger of who owns which dollar, house, or car, but instead from who is behaving in a way that benefits the people around them.

So yeah, it could go as you say, but only if the wealthy are behaving in a way that justifies their outsized share while the renters are just spending from a pile of money that they got through less honorable means.

I don't think that's the most likely scenario though.

bonestamp21y ago

Again, better to pay a sandwich shop a few thousand dollars for their lost day of sales than get sued by the people in the hospital who couldn't get their meds, x-rays, etc in time.

qaq1y ago

Generally, no one gates content updates, as they happen multiple times a day.

lucianbr1y ago

Management decides to use CrowdStrike, not IT, and IT has no way to roll out updates in a controlled fashion.

So not really a failure of IT, at least not for this reason.
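For readers outside ops, "rolling out updates in a controlled fashion" usually means deployment rings: each host is deterministically assigned to a ring, and an update only reaches the next ring after the previous one has soaked without problems. A minimal sketch, with hypothetical ring names, sizes and host IDs:

```python
import hashlib

# Hypothetical ring boundaries: cumulative share of the fleet per ring.
RINGS = [("canary", 0.01), ("early", 0.10), ("broad", 1.00)]


def bucket(host_id: str) -> float:
    """Map a host to a stable value in [0, 1) so its ring never changes."""
    digest = hashlib.sha256(host_id.encode()).digest()
    # keep 53 bits so the division is exact and can never round up to 1.0
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def ring_of(host_id: str) -> str:
    b = bucket(host_id)
    for name, cutoff in RINGS:
        if b < cutoff:
            return name
    return RINGS[-1][0]


def eligible(host_id: str, released_rings: set[str]) -> bool:
    """An update is installed only on hosts whose ring has been released."""
    return ring_of(host_id) in released_rings


fleet = [f"host-{i}" for i in range(10_000)]
stage1 = sum(eligible(h, {"canary"}) for h in fleet)
print(stage1)  # roughly 1% of the fleet; promote to "early" only if these stay healthy
```

Hashing the host ID, rather than picking hosts at random per update, keeps the same machines in the canary ring every time, which makes a bad release's blast radius predictable.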

hello_moto1y ago

In big companies, it's the Management of the IT team.

I know, it's not really the DailyWTF material that most HNers have been led to believe.

jmuguy1y ago

My comment assumes that the IT department (including its executive) gets to make these sorts of decisions - why wouldn't they?

mingus881y ago

In many mature orgs, corporate IT rolls up to the CIO and security will roll up to the CISO

The CISO and security ops will demand to be completely independent from corp IT, for legit reasons, as the security team needs to treat IT as potential insider threat actors with elevated privileges.

Retric1y ago

Major purchases tend to be pushed up the ladder. It’s not uncommon for a CEO or non technical director etc to decide what IT systems to use.

patmcc1y ago

lucianbr1y ago

Sure, if there's a security exec who made the decision, it may be their fault. I was thinking more of the rank and file, but that's just my bias.

toddmorey1y ago

I don’t know if I somehow just have little exposure to Windows in my life or if there’s an untold resiliency story for the global internet in the face of such a massive outage.

All I can say is THANK YOU to all the unsung heroes who answered the call and worked their butts off. Infrastructure doesn’t work without you. We see you & we thank you!

davio1y ago

doubled1121y ago

I know at least one person who "survived" while her coworkers' laptops were down.

My first question was "do you shut your machine off at the end of the day?" She did, and that's probably why about half of her office was affected, and the other half was not.

Can't update it if it isn't on.

blackoil1y ago

bostik1y ago

Crowdstrike took out less than 1% of the global Windows installation base.

Supply chain risks are everywhere, and in regulated industries they are highly concentrated.

yowzadave1y ago

I wish that were the case for me! My in-laws had their flight out of JFK delayed by 2-3 days, as did my daughter who was supposed to fly as an unaccompanied minor.

duderific1y ago

I asked the guy at the luggage counter, and he said the day before was pretty crazy, but they had everything straightened out by the next day.

marcosdumay1y ago

People found a really quick workaround. It would have taken a couple more days to fix if there hadn't been one.

bdjsiqoocwk1y ago

> all my banking worked

Vanguard.co.uk was down.

But yes, I echo your feelings. When you examine how complex everything is under the hood it's almost unbelievable that anything works.

mikeocool1y ago

I had a 7am flight on Delta from LGA to MSP. Seeing all of the blue screens in the airport was pretty surreal and our flight was delayed four hours.

But yeah, other than that, the only issue we ran into was that the Jimmy John’s we stopped at for lunch outside of MSP was slammed because Delta had ordered hundreds of sandwiches for their staff.

I’ve definitely experienced much worse travel disruptions due to normal weather (though obviously we got real lucky compared to some Delta customers).

pantulis1y ago

This is a good writeup, but to be fair it's not just a matter of banking regulations. Basically all big companies are under similar obligations regarding endpoint protection.

candiddevmike1y ago

Should endpoint protection require kernel level access? At what point does it stop becoming protection and start becoming a liability? Obligatory who watches/protects the watchmen/protector...

notepad0x901y ago

candiddevmike1y ago

FWIW I asked if it should require access, not what the current status quo is/what limitations exist within the OS.

greenn1y ago

tmm1y ago

> Maybe we use some of those extra protection rings?

Maybe not. Intel is considering removing rings 1 and 2 for a future 64-bit only x86 architecture, because they "are unused by modern software".

https://www.intel.com/content/www/us/en/developer/articles/t...

ghusto1y ago

Couldn't you just ask some OS APIs provided by something in kernelspace for what you need? In fact, isn't this how macOS does things?

jen201y ago

> With the current model kernel level access is required.

On Windows.

jabroni_salad1y ago

If you don't do it, someone else will.

bluGill1y ago

Even on home machines where no user has a password, having to do something special to get into administrator mode will stop several attacks just because people will slow down and ask.

supriyo-biswas1y ago

> At what point does it stop becoming protection and start becoming a liability?

adrr1y ago

How else would you monitor a Windows box? The EU won't allow Microsoft to lock down their kernel and provide a macOS-type solution with APIs for trusted publishers.

imtringued1y ago

From what I have heard, Microsoft is allowed to do that, they merely aren't allowed to be a competitor to the software that uses the API.

SkyPuncher1y ago

Yes. Absolutely yes.

It's the only way to detect certain types of advanced threats.

SkyPuncher1y ago

Basically all B2B companies are under some sort of obligation to have endpoint protection.

All of these requirements essentially become transitive across a company's entire supply chain.

* Big bank needs to comply with X, so do all of their vendors.

* Vendor wants to sell to big bank, so they comply with X. They also need all of their vendors to comply with X.

* So on and so on.
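The transitive effect in that list is just reachability in the vendor graph. A small sketch with an invented supply chain (all company names are hypothetical) shows how one requirement at the top fans out to everyone downstream:

```python
from collections import deque

# Hypothetical supply chain: each company -> the vendors it buys from.
VENDORS = {
    "big_bank": ["payroll_saas", "card_processor"],
    "payroll_saas": ["cloud_host", "email_provider"],
    "card_processor": ["cloud_host"],
    "cloud_host": [],
    "email_provider": ["cloud_host"],
}


def must_comply(root: str, vendors: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk: everyone reachable from root inherits requirement X."""
    seen = {root}
    queue = deque([root])
    while queue:
        company = queue.popleft()
        for v in vendors.get(company, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


print(sorted(must_comply("big_bank", VENDORS)))
# ['big_bank', 'card_processor', 'cloud_host', 'email_provider', 'payroll_saas']
```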

----

I think we got one inquiry. It was basically just an FYI. This person had so many things broken on their end that "one more thing" being broken was just a drop in the bucket.

pantulis1y ago

On the other hand, count me surprised at the sales prowess of Crowdstrike, I did not know how big they were.

SoftTalker1y ago

If not regulations, then demands by insurers for cyberattack insurance coverage.

taeric1y ago

And the super frustrating thing is that these checklists are often very effective at what they exist for.

shadowgovt1y ago

Just as a general comment on this whole affair:

This would be the third incident I'm familiar with of a file of entirely zeroes breaking something big.

Never assume 'zero' (for whatever form zero takes in context) can't be an input.
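As a minimal sketch of that advice, assuming a made-up binary format (a 4-byte magic number followed by a little-endian record count, then fixed-size records): the parser checks the degenerate all-zero file explicitly and refuses a zero count, instead of letting zeros flow into later arithmetic or indexing. `MAGIC`, `HEADER` and `RECORD_SIZE` are hypothetical.

```python
import struct

MAGIC = b"CFG1"                    # hypothetical 4-byte file signature
HEADER = struct.Struct("<4sI")     # magic, record count (little-endian uint32)
RECORD_SIZE = 16                   # hypothetical fixed record size in bytes


def parse(blob: bytes) -> int:
    """Validate the file and return the number of records it holds."""
    if not any(blob):              # true for b"" and for a file of only zero bytes
        raise ValueError("file is empty or entirely zero bytes")
    if len(blob) < HEADER.size:
        raise ValueError("truncated header")
    magic, count = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError(f"bad magic {magic!r}")
    if count == 0:
        raise ValueError("record count is zero")
    if len(blob) != HEADER.size + count * RECORD_SIZE:
        raise ValueError("size does not match record count")
    return count


good = HEADER.pack(MAGIC, 2) + bytes(2 * RECORD_SIZE)
print(parse(good))  # 2
for bad in (bytes(64), b"", HEADER.pack(MAGIC, 0)):
    try:
        parse(bad)
    except ValueError as e:
        print("rejected:", e)
```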

tempodox1y ago

As long as the botchers act with impunity, they won't “waste” resources on higher standards.

saltminer1y ago

> This created a minor emergency for me, because it was an other-than-minor emergency for some contractors I was working with.

HeyLaughingBoy1y ago

MadVikingGod1y ago

While reading this I was struck with an interesting question: What risk does any particular software vendor pose to an industry at large?

For example (making up numbers here): if 75% of all airline computers have CrowdStrike Falcon installed, that seems like a very concentrated risk.

I actually wouldn't be surprised if, once we had this data, we saw really high concentrations of a small number of vendors in any industry.
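One rough way to put a number on that question is the Herfindahl-Hirschman Index (HHI), the sum of squared market shares that competition regulators use. Reusing it to measure how concentrated an industry's dependence on security vendors is, is an assumption of this sketch, and the shares below are made up, just like the 75% in the comment. Lumping "others" into one bucket overstates concentration slightly.

```python
def hhi(shares: dict[str, float]) -> float:
    """Herfindahl-Hirschman Index with shares in percent: 10,000 means a single vendor."""
    total = sum(shares.values())
    if abs(total - 100) > 1e-6:
        raise ValueError(f"shares sum to {total}, expected 100")
    return sum(s * s for s in shares.values())


def top_vendor(shares: dict[str, float]) -> tuple[str, float]:
    """The vendor whose single bad update would reach the largest share of machines."""
    return max(shares.items(), key=lambda kv: kv[1])


# Made-up numbers, echoing the comment's hypothetical 75%.
airlines = {"vendor_a": 75.0, "vendor_b": 15.0, "others": 10.0}
print(hhi(airlines))         # 5950.0
print(top_vendor(airlines))  # ('vendor_a', 75.0)
```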

anticristi1y ago

saltminer1y ago

Alternatively, if Oracle hikes the price on an industry-specific product by 75%, how much of that industry goes under?

adrr1y ago

Australia got hit hard because they modernized their bank systems and now most are cloud based. I am not aware of any major bank running their core systems on the cloud or on windows.

tempodox1y ago

> they modernized their bank systems

You mean they made them more vulnerable?

HeyLaughingBoy1y ago

> Configuration bugs are a disturbingly large portion of engineering decisions which cause outages

I work in medical device software -- the stuff that runs on machines in hospital labs, ER's or at patient bedside.

Thankfully, we got out of that pretty easily before it affected more than a small handful of users, but things could have been a lot worse.

kristaps1y ago

The article specifically mentions US banks and as I personally didn't see any disruption over here - is there (anec)data on how popular CrowdStrike is in the US vs the EU?

Muromec1y ago

Can't have disruption from CrowdStrike if you run on IBM mainframes with cobol coz your math only opens gates for new technologies once in 25 years.

MattSayar1y ago

Oh wow an Anathem reference!

To answer the question, CrowdStrike is a global company with thousands of employees around the world. Not sure why the EU wasn't hit as hard.

waihtis1y ago

Regulations are a big reason why this happened, sure, but also it hit the companies with great security budgets more.

Hospitals, for instance, weren't that widely affected as they barely have any money to buy security tooling.

Silver linings and all that, I guess.

cookiengineer1y ago

> Hospitals

Everybody seems to be quick to forget about WannaCry.

c0balt1y ago

WannaCry was not an accident. It was inarguably an intentional attack on general IT infrastructure rather than a borked update.

waihtis1y ago

That really just proves my point.

hpen1y ago

We blame car manufacturers for defects from suppliers, but we don't blame platform manufacturers (Microsoft) for holes in their architecture?

cibyr1y ago

You don't blame your car's manufacturer if it won't start because the monitoring dongle your insurance provider sent you in exchange for a discount drained the car's battery.

SirMittens1y ago

I think that's the wrong analogy. A more correct one would be "Should we blame a car company for a broken engine, that was modified after it was sold to you?".

A kernel-level driver from a third party is something that you willingly add to the OS; it wasn't there originally.

Just because Windows allows you to do it doesn't mean you should.

I mean, you can apply some dangerous mods to your car's engine, but you probably shouldn't, and if you do, it's your responsibility, not the car company.

hpen1y ago

Does CrowdStrike void the warranty like an engine add-on?

vel0city1y ago

If you had a support contract with Microsoft for your Windows installs and CrowdStrike is breaking your system they'll tell you to go talk to CrowdStrike, yes.

vel0city1y ago

If I add a NOS kit to my car and it blows up my engine, is that Honda's fault?

hpen1y ago

Doesn't Honda say "don't do this or it's your fucking problem"?

vel0city1y ago

Right, so adding the NOS is making a third party addon that changes the behavior of the product outside the original designs of the product.

And installing a third-party kernel module (driver) is...a third party addon that changes the behavior of the product outside of the original designs of the product?

Arguing like Microsoft is liable for CrowdStrike's bad software is like arguing Honda is responsible for that NOS kit.

If I write a buggy kernel module that instantly kernel panics my Linux system, is Linus Torvalds responsible? Or am I responsible for the software I wrote?

Retr0id1y ago

> For historical reasons, that area where almost everything executes is called “userspace.”

It's an old term at this point, but I don't think the reasons for it being called "userspace" have changed or become outdated since then, so I wouldn't call them historic per se.

SoftTalker1y ago

I used to like Patrick's posts, but lately they are way too long and full of irrelevant minutiae.

Decide who you're writing for, and write to that audience.

rozenmd1y ago

> Decide who you're writing for, and write to that audience.

He has, and he does.

rescbr1y ago

Some of his audience likes the irrelevant minutiae.

arduanika1y ago

Congrats, you've been screenshotted and tweeted by him!

"In which an HN commenter offers me writing advice but fails to understand the implication of second sentence"

https://x.com/patio11/status/1818757982706139297

shadowgovt1y ago

Why is it called "userspace" when all it runs is some Docker containers hosting a web frontend's server, and no human being ever telnets into it? Where's the "user" in that story?

Where is the "user" when the machine is a Windows box stuffed behind a façade wall that displays airport directions, notifications, and ads on rotate?

anticristi1y ago

I always understood "user" in "userspace" as "the user of the operating system kernel".

tootie1y ago

This is going to crush their sales pipeline and lead to at least a few attempting a migration off. Crowdstrike is unlikely to go out of business, but this is not a good time to buy.

alephnerd1y ago

Safe Harbor: Don't follow random internet commentators' opinions on public markets. This is just an opinion and not advice.

I disagree. Long term, CRWD's fundamentals remain intact.

Endpoint protection is still a critical need no matter what: for every bug like CrowdStrike's, there's always a company you can point to whose operations were shut down due to an attack.

> crush their sales pipeline

With CRWD sized companies, most of their revenue comes from multi-year contracts and renewals.

They'll probably have a decently large layoff in the sales org, but enterprise sales tends to be fairly stable due to contract sizes along with riders about liability.

btbuildem1y ago

> They'll probably have a decently large layoff in the sales org

Bingo. That's the buy signal.

nappsec1y ago

That depends on what sort of timeline you're looking at. I wouldn't be surprised if the price fell more, but the markets are forward-looking and long term they're a key player in the space.

nkassis1y ago

SolarWinds comes to mind: they haven't fully recovered, but they are still around and kicking.

nerdponx1y ago

Seems ideal. Get in while the price is discounted relative to the overall market.

dangus1y ago

Just depends how far the stock falls and at what point it's undervalued.

candiddevmike1y ago

The lawsuits alone are going to be eyewatering. But sure, buy those shares.

davio1y ago

Delta Air Lines is in the headlines saying they had a $500 million impact and have no choice but to sue.

dangus1y ago

Do any of their customers have a case? I'm pretty sure their contracts would cover this kind of outage as an expected eventuality.

A lot of lawsuits are going to be thrown out, I think.

tempodox1y ago

I admire your optimism.

deepsun1y ago

And banks/airlines etc were hit hard because their _Windows_ didn't boot, not because of an application crash on a perfectly working Windows.

ctxc1y ago

The application (Crowdstrike) was part of Windows' booting process.

This is a high-impact ability Windows offers to applications - and applications should take responsibility and treat it as such.

One more thing to note is that we cannot say Windows shouldn't provide this ability - that becomes an anti-trust monopoly, because MS themselves are a competitor in this space.

mewpmewp21y ago

But then again ransomware would happen like you said if they skipped it? And ransomware sounds even worse.

makeitdouble1y ago

Windows could sure handle this kind of error better, but IMHO it would be a mistake to require Microsoft to absolutely block any path Windows could be crashing due to third party software.

We'd end in a situation similar to Mac OS where there's a single gatekeeper and whole industries are subjected to the will of the platform owner.

musjleman1y ago

If you dig a little more into what this is talking about, Microsoft did not actually make any kernel-related changes.

This was just Symantec and McAfee ranting about PatchGuard, and MS did not remove it.

Bjartr1y ago

In fact, CrowdStrike has taken down Linux systems in much the same way in the past year (in April I think). It's just that the impact was less widespread.

imiric1y ago

To be fair, AFAIK the CrowdStrike driver was WHQL-certified. The loophole is that the driver loaded files at runtime, which made it impossible to predict every failure scenario.
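Even when every input can't be predicted, a loader can validate each runtime file against the schema it expects and, if validation fails, keep running on the last known-good version instead of crashing. The Python sketch below illustrates only that pattern; the JSON format, the `EXPECTED_FIELDS` value, and the `parse_rules` / `RuleStore` names are hypothetical assumptions, not a description of how the CrowdStrike driver actually works.

```python
import json
from typing import Any

# Hypothetical schema: every rule must have exactly this many fields.
EXPECTED_FIELDS = 3  # illustrative value only


def parse_rules(raw: str) -> list[list[Any]]:
    """Parse and validate a rule file, raising ValueError on anything malformed."""
    rules = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(rules, list) or not rules:
        raise ValueError("expected a non-empty list of rules")
    for i, rule in enumerate(rules):
        if not isinstance(rule, list) or len(rule) != EXPECTED_FIELDS:
            raise ValueError(f"rule {i}: expected {EXPECTED_FIELDS} fields")
    return rules


class RuleStore:
    """Holds the active rule set and only swaps in updates that validate."""

    def __init__(self, initial: list[list[Any]]) -> None:
        self.active = initial  # last known-good rules

    def apply_update(self, raw: str) -> bool:
        try:
            candidate = parse_rules(raw)
        except ValueError:
            # Reject the update and keep running on the previous rules.
            return False
        self.active = candidate
        return True


if __name__ == "__main__":
    store = RuleStore([[1, 2, 3]])
    print(store.apply_update("[[4, 5, 6]]"), store.active)
    print(store.apply_update("\x00\x00"), store.active)
```

The design choice is to fail closed on the update rather than on the host: a malformed file is dropped, and the system keeps its previous protection.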

shadowgovt1y ago

It is, perhaps, a guarantee that no vendor should be expected to make.

SoftTalker1y ago

> You can't provide a guarantee to the end-user of pre-vetted safety when the application is downloading and executing arbitrary code from a third-party source.

So a web browser can't be trusted or certified, ever. Unless JavaScript is disabled?

Cthulhu_1y ago

The article states that Microsoft HAD to allow CrowdStrike to run in kernelspace because of EU laws, because otherwise MS would have a monopoly on kernel-level security solutions / integrations.

ghusto1y ago

Exactly this. Microsoft did this poorly, so they were forced to allow others to do things poorly too.

fmbb1y ago

Did they have to?

Or did they choose to keep their own security software running in kernel space, thus forcing themselves to let others play by the same rules?

marcosdumay1y ago

They had to allow the same kind of access they have on their own "security" software.

Nothing in that means they need ring-0 access.

davidgerard1y ago

So why didn't MS lock it down in the US if it's an EU-local rule? Their excuse isn't plausible.

voytec1y ago

You're spreading cheap propaganda. Microsoft likely never had[0] an appropriate userland-level API in place, and their blaming the EU should not be repeated by someone calling themselves a journalist.

asr1y ago

Not MSFT’s fault: https://stratechery.com/2024/crashes-and-competition/

9999000009991y ago

I’ve used this analogy before.

If I sell you a bike and you remove the brakes, you can't sue me when you crash.

Any OS which allows users to do what they generally want to do, also allows users to fubar their own systems.

deepsun1y ago

Let me exaggerate a bit to show how bad that analogy is:

Let's say I've developed a laptop that bricks whenever you open a website with incorrectly formatted HTML.

UPDATE: let's say the bike breaks down completely whenever it's ridden in the rain.

9999000009991y ago

No one's forcing you to install kernel level software.

If I install some kernel-level anti-cheat software and it stops Windows from booting, I need to blame the game developers, not Microsoft.

You're free to install pretty much whatever you want on Windows.

Saris1y ago

What about the previous CrowdStrike bugs that hit Linux systems in a similar fashion?

I don't understand how this has anything to do with Windows, Crowdstrike is the one who built the application.

deepsun1y ago

It has everything to do with Windows, because it's Windows that crashed.

Applications crash all the time. But in this case people weren't even able to load Windows to figure out what was wrong or which app had crashed.

Microsoft allowed a third-party to self-update and didn't put a proper system of review and updates control to the heart of its OS.

Saris1y ago

The same thing happened before with Linux: CrowdStrike made systems unbootable.

So I don't understand why you're focusing on windows here. Linux allows anyone to update too, there's no review or control either.

Just because an OS allows you to break it, does not mean the maker of the OS is liable when you do break it.

btbuildem1y ago

Isn't corporate malware by definition on the "critical path"? The article outlines the reasons why that jank runs in kernel space, and why MS is unable to "downgrade" it to userspace.

ClumsyPilot1y ago

This is the comment I expected, begging to hand over your freedom to run software to a big corporation.

If you replace parts in your BMW and put in some garbage or incompatible parts, it's your fault if it doesn't run.

deepsun1y ago

But it wasn't some garbage parts in a car, it was an app. And apps fail all the time; the OS is expected to handle that, the same way a car is expected to handle rain, for example.

vel0city1y ago

Buggy third-party kernel modules cause kernel panics all the time in Linux. You can easily write a kernel module to make a Linux system explode.

ClumsyPilot1y ago

> it was an app. And apps fail all the time

Exactly

The fact that developers do not take their responsibility as seriously as an average car mechanic brings shame on our entire industry.

ortusdux1y ago

Is there any merit to Microsoft's argument that the EU forced them into keeping their kernel accessible by 3rd parties?

https://www.theregister.com/2024/07/22/windows_crowdstrike_k...

davidgerard1y ago

No. They could have done an EU-only edition that behaved that way. But they didn't.

gciguy1y ago

Dave's Garage has a great video on this: https://www.youtube.com/watch?v=wAzEJxOo1ts

kmeisthax1y ago

[0] ancaps fite me

[1] Maximally connected social graph with node degree below Dunbar's number.

supriyo-biswas1y ago

> because they want people to irrationally hate antitrust

> [0] ancaps fite me

This part is simply inciting a flamewar, and something that you can do without in the spirit of the website guidelines[1].

[1] https://news.ycombinator.com/newsguidelines.html

kmeisthax1y ago

There are three different ways you can introduce a new standard or interface:

- You can go to or form a standards body with all the relevant market players and agree on a technical specification for that interface. This is preferred, and it's how the Web is usually done.

[0] Hell of an Orwellian name

[2] laughs in SWF

hulitu1y ago

I think they said it was a Windows driver, not a normal application. Running crap in kernel mode does not end well on any OS.

concerned_user1y ago

Yes, it is a driver which is signed and tested by Microsoft. The driver allows running arbitrary unsigned code. Why is that allowed?

cyberpunk1y ago

It's not code execution without signing, and I think probably they do want these files to be updated hands free.

The real problem was the lack of testing, rather than the actual mechanism I think.
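Hands-free updates and testing are not mutually exclusive: a staged (ring-based) rollout ships each update to a small slice of the fleet first and widens only if that slice stays healthy. Below is a minimal Python sketch of deterministic ring assignment, assuming hypothetical cumulative ring sizes in `RINGS` and string host and update IDs; the health checks that would gate moving from one stage to the next are left out.

```python
import hashlib

# Hypothetical cumulative rollout rings: percent of the fleet reached per stage.
RINGS = [1, 10, 50, 100]


def host_bucket(host_id: str, update_id: str) -> int:
    """Map a host to a stable bucket in [0, 100) for a given update."""
    digest = hashlib.sha256(f"{update_id}:{host_id}".encode()).digest()
    # The modulo bias from a 64-bit value is negligible for 100 buckets.
    return int.from_bytes(digest[:8], "big") % 100


def should_receive(host_id: str, update_id: str, stage: int) -> bool:
    """True if the host falls inside the cumulative ring for this stage."""
    return host_bucket(host_id, update_id) < RINGS[stage]


if __name__ == "__main__":
    hosts = [f"host-{i}" for i in range(1000)]
    for stage, pct in enumerate(RINGS):
        count = sum(should_receive(h, "update-42", stage) for h in hosts)
        print(f"stage {stage} ({pct}%): {count} of {len(hosts)} hosts")
```

Hashing the update ID together with the host ID means a different subset of machines goes first for each update, while a given host's ring stays stable for the lifetime of one rollout.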

shadowgovt1y ago

There is no guarantee the law is written soundly.

wolpoli1y ago

[0]: https://learn.microsoft.com/en-us/windows-hardware/drivers/i...

Joker_vD1y ago

...you want Microsoft to forbid you from running certain kinds of programs on your own machine, even if you really, really insist on it, do I understand you correctly?

Sohcahtoa821y ago

The Crowdstrike failure was not caused by running unsigned code.

hpen1y ago

This is a valid opinion and I don't know why you were downvoted (well, other than the Hacker News bubble mindset (or mindless-set)).

How is Microsoft not to blame, it's their product? We wouldn't blame a Toyota supplier for a failure in a car, but we somehow segment that in the software world?

vel0city1y ago

Do you not see the obvious differences there?

Sohcahtoa821y ago

> How is Microsoft not to blame, it's their product?

Do you think Crowdstrike is a Microsoft product?

hpen1y ago

No. My point is that Microsoft allows the damn thing to be run in kernel space. Mac and Linux don't have this problem due to how THEY architected the system. Yes, I think that puts Microsoft to blame.

jsmith991y ago

Yes, use a different operating system, one that gracefully handles null pointer dereferencing by third party kernel modules? /s

voytec1y ago

Fictional statements like this make me reluctant to read further, and ignore source of such "news" in the future.

bdamm1y ago

It's obviously fictional, but let's call it contemporary drama based on a true story. I thought the point was well made. The author already noted this was a handwaving segment.

samspot1y ago

I got in trouble for something like this early in my career (running bittorrent over my work vpn).

davidgerard1y ago

what makes you think it was fictional?

also, bragging about your inability to read text seems an odd way to interact.

voytec1y ago

Bragging? Reluctant==unable?

__MatrixMan__1y ago

I like the technical stuff here.

I'm not so sure about this:

> money is core societal infrastructure, like the power grid and transportation systems are. It would be really bad if hackers working for a foreign government could just turn off money.

Sure, it would be inconvenient in the short term. But I think the current design is holding us back.

__MatrixMan__1y ago

Losing access to one currency but not others is quite a different thing, I don't think that would help anybody.

gadders1y ago

In the short term people would probably starve to death.

mminer2371y ago

I mean, millions still starved during the revolution, even with the American Relief Administration feeding 10% of the country.

imtringued1y ago

I agree there needs to be more competition, but that doesn't mean you need to get rid of the old way. It is better when two approaches run in parallel, so each can compensate for the other's shortcomings.

vel0city1y ago

The uber-wealthy don't have most of their assets in currency. It's in stocks, houses, cars, boats, etc. Delete the dollars and it'll hurt them a bit, but in the end they still have their house (or houses).

But now all those people who were using currency to trade for housing now suddenly need to find a new way to trade for shelter.

Who got hurt worse here?

__MatrixMan__1y ago

I don't think that's the most likely scenario though.
