Timeline
15:45 UTC on 29 October 2025 – Customer impact began.
16:04 UTC on 29 October 2025 – Investigation commenced following monitoring alerts being triggered.
16:15 UTC on 29 October 2025 – We began the investigation and started to examine configuration changes within AFD.
16:18 UTC on 29 October 2025 – Initial communication posted to our public status page.
16:20 UTC on 29 October 2025 – Targeted communications to impacted customers sent to Azure Service Health.
17:26 UTC on 29 October 2025 – Azure portal failed away from Azure Front Door.
17:30 UTC on 29 October 2025 – We blocked all new customer configuration changes to prevent further impact.
17:40 UTC on 29 October 2025 – We initiated the deployment of our ‘last known good’ configuration.
18:30 UTC on 29 October 2025 – We started to push the fixed configuration globally.
18:45 UTC on 29 October 2025 – Manual recovery of nodes commenced while gradual routing of traffic to healthy nodes began after the fixed configuration was pushed globally.
23:15 UTC on 29 October 2025 - PowerApps mitigation of dependency, and customers confirm mitigation.
00:05 UTC on 30 October 2025 – AFD impact confirmed mitigated for customers.
Very circular way of saying “the validator didn’t do its job”. This is AFAICT a pretty fundamental root cause of the issue.
It’s never good enough to have a validator check the content and hope that finds all the issues. Validators are great and can speed a lot of things up. But because they are independent code paths they will always miss something. For critical services you have to assume the validator will be wrong, and be prepared to contain the damage WHEN it is wrong.
Troubleshooting has completed
Troubleshooting was unable to automatically fix all of the issues found. You can find more details below.
>> We initiated the deployment of our ‘last known good’ configuration.
System Restore can help fix problems that might be making your computer run slowly or stop responding.
System Restore does not affect any of your documents, pictures, or other personal data. Recently installed programs and drivers might be uninstalled.
Confirm your restore point
Your computer will be restored to the state it was in before the event in the Description field below.
Looks like there was no monitoring and no alerts.
Which is kinda weird.
Azure Portal Access Issues
Starting at approximately 16:00 UTC, we began experiencing Azure Front Door issues resulting in a loss of availability of some services. In addition. customers may experience issues accessing the Azure Portal. Customers can attempt to use programmatic methods (PowerShell, CLI, etc.) to access/utilize resources if they are unable to access the portal directly. We have failed the portal away from Azure Front Door (AFD) to attempt to mitigate the portal access issues and are continuing to assess the situation.
We are actively assessing failover options of internal services from our AFD infrastructure. Our investigation into the contributing factors and additional recovery workstreams continues. More information will be provided within 60 minutes or sooner.
This message was last updated at 16:57 UTC on 29 October 2025
---
Update: 16:35 UTC:
Azure Portal Access Issues
Starting at approximately 16:00 UTC, we began experiencing DNS issues resulting in availability degradation of some services. Customers may experience issues accessing the Azure Portal. We have taken action that is expected to address the portal access issues here shortly. We are actively investigating the underlying issue and additional mitigation actions. More information will be provided within 60 minutes or sooner.
This message was last updated at 16:35 UTC on 29 October 2025
---
Azure Portal Access Issues
We are investigating an issue with the Azure Portal where customers may be experiencing issues accessing the portal. More information will be provided shortly.
This message was last updated at 16:18 UTC on 29 October 2025
---
Message from the Azure Status Page: https://azure.status.microsoft/en-gb/status
Starting at approximately 16:00 UTC, we began experiencing Azure Front Door issues resulting in a loss of availability of some services. We suspect that an inadvertent configuration change as the trigger event for this issue. We are taking two concurrent actions where we are blocking all changes to the AFD services and at the same time rolling back to our last known good state.
We have failed the portal away from Azure Front Door (AFD) to mitigate the portal access issues. Customers should be able to access the Azure management portal directly.
We do not have an ETA for when the rollback will be completed, but we will update this communication within 30 minutes or when we have an update.
This message was last updated at 17:17 UTC on 29 October 2025
They quickly updated the message to REMOVE the link. Comical at this point.
We already had to do it for large files served from Blob Storage since they would cap out at 2MB/s when not in cache of the nearest PoP. If you’ve ever experienced slow Windows Store or Xbox downloads it’s probably the same problem.
I had a support ticket open for months about this and in the end the agent said “this is to be expected and we don’t plan on doing anything about it”.
We’ve moved to Cloudflare and not only is the performance great, but it costs less.
Only thing I need to move off Front Door is a static website for our docs served from Blob Storage, this incident will make us do it sooner rather than later.
Bunch of on-call peeps over there that definitely know the instant something major goes down
I always go everywhere adequately prepared for beverages and food. Thanks to your comment, I have a new reason to do so. Take out coffees are actually far from guaranteed. Payment systems could go down, my bank account could be hacked or maybe the coffee shop could be randomly closed. Heck, I might even have an accident crossing the road. Anything could happen. Hence, my humble flask might not have the top beverage in it but at least it works.
We all design systems with redundancy, backups and whatnot, but few of us apply this thinking to our food and drink. Maybe get a kettle for the office and a backup kettle, in case the first one fails?
Here in The Netherlands, almost all trains were first delayed significantly, and then cancelled for a few hours because of this, which had real impact because today is also the day we got to vote for the next parlement (I know some who can't get home in time before the polls close, and they left for work before they opened).
If it’s a multi day event, it’s probably that way for a reason. Partially the same as the solution to above.
I really do feel the only viable future for clouds is hybrid or agnostic clouds.
Personally I am thinking more and more about hetzner, yes I know its not an apples to orange comparison. But its honestly so good
Someone had created a video where they showed the underlying hardware etc., I am wondering if there is something like https://vpspricetracker.com/ but with geek-benchmarks as well.
This video was affiliated with scalahosting but still I don't think that there was too much bias of them and they showed at around 3:37 a graph comparison with prices https://www.youtube.com/watch?v=9dvuBH2Pc1g
Now it shows how contabo has better hardware but I am pretty sure that there might be some other issues, and honestly I feel a sense of trust with hetzner I am not sure about others.
Either hetzner or self hosting stuff personally or just having a very cheap vps and going to hetzner if need be but hetzner already is pretty cheap or I might use some free service that I know of are good as well.
I have never had much confidence in Azure as a cloud provider. The vertical integration of all the things for a Microsoft shop was initially very compelling. I was ready to fight that battle. But, this fantasy was quickly ruined by poor execution on Microsoft's part. They were able to convince me to move back to AWS by simply making it difficult to provision compute resources. Their quota system & availability issues are a nightmare to deal with compared to EC2.
At this point I'd rather use GCP over Azure and I have zero seconds of experience with it. The number of things Microsoft gets right in 2025 can be counted single-handedly. The things they do get right are quite good, but everything else tends to be extremely awful.
I remember I at one point had expanded enough menus that it covered the entirety of the screen.
Never before have I felt so lost in a cloud product.
TBH, GCP is very good! More people should use it.
They think they have the market captured, but I think what their dwindling quality and ethics are really going to drive is adoption of self hosting, distributed computing frameworks. Nerds are the ones who drove adoption of these platforms, and we can eventually end if we put in the work.
Seriously with container technology, and a bit more work / adoption on distributed compute systems and file storage (IPFS,FileCoin) there is a future where we dont have to use big brothers compute platform. Fuck these guys.
Last week I couldn't pay for flowers for grandma's grave because smartphone-sized card terminal refused to work - it stuck on charging-booting loop so I had to get cash. Tho my partner thinks she actually wanted to get cash without a receipt for herself excluding taxes
There's a fairly large supermarket near me that has both kinds of outages.
Occasionally it can't take cards because the (fiber? cable?) internet is down, so it's cash only.
Occasionally it can't take cash because the safe has its own cellular connection, and the cell tower is down.
I was at Frank's Pizza in downtown Houston a few weeks ago and they were giving slices of pizza away because the POS terminal died, and nobody knew enough math to take cash. I tried to give them a $10 and told them to keep the change, but "keep the change" is an unknown phrase these days. They simply couldn't wrap their brains around it. But hey, free pizza!
And microsoft.com too - that's gotta hurt
- on a US tenant I am unable to access login.microsoftonline.com and the login flow stalls on any SSO authentication attempt.
- on a European tenant, probably germany-west, I am able to login and access the Azure portal.
I feel pretty justified in my previous decisions to move away from Azure. Using it feels like building on quicksand…
At this point I dont believe that any one of them is any better or reliable than the others.
I felt this way about AWS last week
Luckily, we moved off Azure Front Door about a year ago. We’d had three major incidents tied to Front Door and stopped treating it as a reliable CDN.
They weren’t global outages, more like issues triggered by new deployments. In one case, our homepage suddenly showed a huge Microsoft banner about a “post-quantum encryption algorithm” or something along those lines.
Kinda wild that a company that big can be so shaky on a CDN, which should be rock solid.
Error: visual-studio-code: Download failed on Cask 'visual-studio-code' with message: Download failed: https://update.code.visualstudio.com/1.105.1/darwin-arm64/st...
The root zone and www. do not: https://dnschecker.org/#A/microsoft.com (all resolvers return records)
And querying https://www.microsoft.com/ results in HTTP 200 on the root document, but the page elements return errors (a 504 on the .css/.js documents, a 404 on some fonts, Name Not Resolved on scripts.clarity.ms, Connection Timed Out on wcpstatic.microsoft.com and mem.gfx.ms). That many different kinds of errors is actually kind of impressive.
I'm gonna say this was a networking/routing issue. The CDN stayed up, but everything else non-CDN became unroutable, and different requests traveled through different paths/services, but each eventually hit the bad network path, and that's what created all the different responses. Could also have been a bad deploy or a service stopped running and there's different things trying to access that service in different ways, leading to the weird responses... but that wouldn't explain the failed DNS propagation.
I wonder if this is microsoft "learning" to "prevent" such an issue and instead triggered it...
"One often meets his destiny on the path he takes to avoid it" -- Master Oogway
2028: the year of migrating from a managed provider to the cloud
2029: the year of migrating from the cloud to your own metal in a rack
People keep thinking the solution to their problems is to do something new (that they don't fully understand).
TIL it's called Nirvana Fallacy
Moving a website quickly is never fun.
> There are currently no active events. Use Azure Service Health to view other issues that may be impacting your services.
Links to a page on Azure Portal which is down...
"We are investigating an issue with the Azure Portal where customers may be experiencing issues accessing the portal. More information will be provided shortly."
HTTPSConnectionPool(host='schemas.xmlsoap.org', port=443): Max retries exceeded with url: /soap/encoding/ (Caused by SSLError(CertificateError("hostname 'schemas.xmlsoap.org' doesn't match '*.azureedge.net'")))
A service we rely on that isn't even running on Azure is inaccessible due to this issue. For an asset that probably never changes. Wild for that to be the SPOF.160k+ results on GitHub: https://github.com/search?q=http%3A%2F%2Fschemas.xmlsoap.org...
It's only after the fact they are transparent about the impact
How did we get here? Is it because of scale? Going to market in minutes by using someone else's computers instead of building out your own, like co-location or dedicated servers, like back in the day.
> Big Tech lobbying is riding the EU’s deregulation wave by spending more, hiring more, and pushing more, according to a new report by NGO’s Corporate Europe Observatory and LobbyControl on Wednesday (29 October).
> Based on data from the EU’s transparency register, the NGOs found that tech companies spend the most on lobbying of any sector, spending €151m a year on lobbying — a 33 percent increase from €113m in 2023.
Gee whizz, I really do wonder how they end up having all the power!
Decentralisation is winning it seems.
I think the response lies in the surrounding ecosystem.
If you have a company it's easier to scale your team if you use AWS (or any other established ecosystem). It's way easier to hire 10 engineers that are competent with AWS tools than it is to hire 10 engineers that are competent with the IBM tools.
And from the individuals perspective it also make sense to bet on larger platforms. If you want to increase your odds of getting a new job, learning the AWS tools gives you a better ROI than learning the IBM tools.
In our highly interconnected world, decentralization paradoxically requires a central authority to enforce decentralization by restricting M&A, cartels, etc.
Pick your point on the scale
Stonks
And it's very clear from these updates that they're more focused on the portal than the product, their updates haven't even mentioned fixing it yet, just moving off of it, as if it's some third party service that's down.
Unsubstantiated idea: So the support contract likely says there is a window between each reporting step and the status page is the last one and the one in the legal documents giving them several more hours before the clauses trigger.
If Azure goes down and nobody feels it, does Azure really matter?
If Azure goes down, it's mostly affecting internal stuff at big old enterprises. Jane in accounting might notice, but the customers don't. Contrast with AWS which runs most of the world's SaaS products.
People not being able to do their jobs internally for a day tends not to make headlines like "100 popular internet services down for everyone" does.
You name them. Other good providers you have experience with?
There is no reason for an expensive cloud. Never has been, but decision makers tried to keep their pants dry.
It acts as a GSLB controller inside Kubernetes — doing DNS-level health checks, region awareness, and automatic failover between clusters when one goes down.
It integrates with ExternalDNS and supports multiple DNS providers (Infoblox, Route53, Azure DNS, NS1, etc.), so it can handle failover across both on-prem and cloud clusters.
It’s not a silver bullet for every architecture, but it’s one of the few OSS projects that make multi-region failover actually manageable in practice.
Even the national digital id service is down.
Can't help but smirk as my country is ramming through "Digital ID" right now
And more importantly nobody lose any reputation except AWS/Azure/Google.
The real reason is that outages are not your fault. Its the new version of "nobody ever got fired for buying IBM" - later it became MS, and now its any big cloud provider.
On the merits though, I agree, haven’t had any serious issues with Hetzner.
The Microsoft status page mostly referenced the portal outage, but it was more than that.
https://microsoft.com/deviceloginus
Seems like they migrated the non-Gov login but not the Gov one. C'mon Microsoft, I've got a deadline in a few days.
Azure Portal Access Issues
Starting at approximately 16:00 UTC, we began experiencing DNS issues resulting in availability degradation of some services. Customers may experience issues accessing the Azure Portal. We have taken action that is expected to address the portal access issues here shortly. We are actively investigating the underlying issue and additional mitigation actions. More information will be provided within 60 minutes or sooner.
This message was last updated at 16:35 UTC on 29 October 2025
----
Azure Portal Access Issues
We are investigating an issue with the Azure Portal where customers may be experiencing issues accessing the portal. More information will be provided shortly.
This message was last updated at 16:18 UTC on 29 October 2025
-- From the Azure status page
-Cloudflare for R2 (object storage) and CDN (Fastly+backblaze also available). -Two VPS/Server providers with a decent reputation and mid-size (using a comparison site like https://serversearcher.com or look directly into people like Hetzner or latitude) -PlanetScale or Neon for database if you don't co-locate it, though better to use someone like digital ocean, vultr or latitude who offer databases too)
But then who do we blame when things are down? If we manage our own infrastructure we have to stay late to fix it when it breaks instead of saying “sorry, Microsoft, nothing we can do” and magically our clients accepting that…
Lol
So if we look at these companies' bottom lines, all those big wigs are actually doing something right. Sales and lobbying capacity is way more effective than reliability or good engineering (at least in the short term).
Oh, that'll be why Scan & Go was down yesterday evening. I thought it was another instance of an iOS 26 update breaking their crappy code.
[0] https://corporate.asda.com/newsroom/2025/22/09/asda-announce...
Much of Xbox is behind that too.
"We’re investigating an issue impacting Azure Front Door services. Customers may experience intermittent request failures or latency. Updates will be provided shortly."
I have been having issues with GitHub and the winget tool for updates throughout the day as well. I imagine things are pulling from the same locations on Azure for some of the software I needed to update (NPM dependencies, and some .NET tooling).
(couldn't resist adding it. i acknowledge this comment adds no value to the discussion)
Be interesting to understand cause here. Pretty big impact on services we use
Best of luck to the teams responding to this incident.
For example when I try to log into our payroll provider Brightpay, it sends me here:
https://bpuk1prod1environment.blob.core.windows.net/host-pro...
Edit: As of 9:19 AM Pacific time, I'm now getting successful A responses but they can take several seconds. The web server at that address is not responding.
But seriously I thought it would be the console, not a CDN.
Microsoft CDN
There, that's it. You're selling it to (hopefully) technical people
Had some Frontdoor operations timing out, but now I'm straight up denied with "Message: All Changes to Azure Frondoor Configuration are blocked currently."
What a mess.
This message was last updated at 16:35 UTC on 29 October 2025”
And so is Microsoft: http://www.microsoft.com/
How can one of the richest companies in the world not offer a better service?
The actual stuff I was working on (App Insights, Function App) that was still open was operational.
>What is required to be able to use MyGet? ... MyGet runs its operations from the Microsoft Azure in the West Europe region, near Amsterdam, the Netherlands.
Doesn't seem to be too bad of an outage unless you were relying on Azure Front Door.
We had to bypass the Frontdoor
Any guess on what's causing it?
In hindsight, I guess the foresight of some organizations to go multi-cloud was correct after all.
I also got weird notification in VS2022 that my license key was upgraded to Enterprise, but we did not purchase anything.
There's a lot of outages this month!
if that's true then it's a sign that Azure's control / data plane separation is doing it's job! at least for now
Kind of mindboggling it's still sometimes DNS maybe.
This is not the first or second time this happened, multiple Hyperscaler failed one by one.
Go cloud!
I can at least login to Azure. But several MS sites are down.
(Coder is currently at the top of the experiment list. Any other suggestions?)
edit: it worked once, then died again. So I guess - some resolvers, or FD servers may be working!
QNBQ-5W8
Funnily enough, AI has been training on its own data as generated by users writing AI conversations back to the internet - there's a feedback loop at play.
Except that it is not!
Interesting times...
I noticed that winget is also down eg.
winget upgrade fabric
Failed in attempting to update the source: winget
An unexpected error occurred while executing the command:
InternetOpenUrl() failed.
0x80072ee7 : unknown errorThere is no way it’s DNS
It was DNS
What a terrible advise.
There's no way to tell, and after about 30 minutes, the release process on VS Code Marketplace failed with a cryptic message: "Repository signing for extension file failed.". And there's no way to restart/resume it.