Last week I transferred a domain used for a personal project from my old registrar to Cloudflare. After the transfer was finalized and the new NS records had propagated, everything resolved normally and worked fine. I then enabled DNSSEC, and after a while the domain would no longer resolve. Every DNS server I try - Google, Quad9, OpenDNS, even Cloudflare's own DNS on 1.1.1.1 - returns SERVFAIL. The excellent diagnostic tool on dnsviz.net tells me that the domain is returning bogus DNSKEY/DS/NSEC responses and has a bogus delegation status, with the error "no SEP matching the DS found".
I tried canceling the DNSSEC setup and waiting for over a day, with no effect. I re-enabled the DNSSEC setup and waited for 3 days, with no effect. Cloudflare's control panel has for several days now been saying that DNSSEC will be enabled "in the next 24 hours". My site cannot be reached, and Cloudflare's support cannot be reached.
I've been forced to migrate the project and its (few) users to a completely different domain. I cannot inconvenience users by bouncing them back and forth, so the domain Cloudflare ruined for me is now effectively lost, as is the "branding" of the project which was reflected in the domain's name.
How can I get their attention without paying for an Enterprise plan? I would like to think that basic functional service should be accessible even when using Cloudflare only as a registrar with fundamental DNS on a free plan.
Can you email me - silverlock at cloudflare - with your ticket ID and domain name so I can understand what broke?
Ultimately it looks like the existing DS records for your domain weren't removed (and you can see that in your DNSViz output). Still have some questions for "how" it was working beforehand (see the email for those).
For others: I'll let the OP share what details they would like to, as this is their domain.
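For anyone wondering what "no SEP matching the DS found" means mechanically, here is a minimal sketch of the check a validator performs: a DS record at the parent must match one of the child's DNSKEYs by key tag, algorithm, and digest (RFC 4034). The function names are made up for illustration, the key bytes are placeholders rather than anyone's real keys, and only SHA-256 digests (digest type 2) are handled.

```python
import base64
import hashlib
import struct


def name_to_wire(name):
    """Encode a domain name in canonical (lowercase) DNS wire format."""
    out = b""
    for label in name.lower().rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def dnskey_rdata(flags, protocol, algorithm, public_key_b64):
    """Build DNSKEY RDATA: flags (2 bytes), protocol, algorithm, key bytes."""
    return struct.pack("!HBB", flags, protocol, algorithm) + base64.b64decode(public_key_b64)


def key_tag(rdata):
    """Key tag as described in RFC 4034 Appendix B (not for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += (byte << 8) if i % 2 == 0 else byte
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def make_ds(owner, key):
    """Derive the (key_tag, algorithm, digest_type, digest_hex) DS tuple for a key."""
    rdata = dnskey_rdata(*key)
    digest = hashlib.sha256(name_to_wire(owner) + rdata).hexdigest()
    return (key_tag(rdata), key[2], 2, digest)


def ds_matches(owner, ds, dnskeys):
    """True if any DNSKEY in the zone corresponds to the DS at the parent."""
    tag, algorithm, digest_type, digest_hex = ds
    if digest_type != 2:
        raise ValueError("this sketch only handles SHA-256 (digest type 2)")
    return any(make_ds(owner, key) == (tag, algorithm, 2, digest_hex.lower())
               for key in dnskeys)


# Placeholder keys (NOT real): flags 257 marks a key-signing key (SEP bit set).
old_provider_key = (257, 3, 13, base64.b64encode(bytes(range(64))).decode())
new_provider_key = (257, 3, 13, base64.b64encode(bytes(range(1, 65))).decode())

# The parent still publishes a DS derived from the old provider's key.
stale_ds = make_ds("example.com", old_provider_key)

print(ds_matches("example.com", stale_ds, [old_provider_key]))  # True
print(ds_matches("example.com", stale_ds, [new_provider_key]))  # False
```

The second case is the stale-DS situation: the parent zone vouches for a key the new nameservers never serve, so validating resolvers treat every answer as bogus and return SERVFAIL.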
I'm nervous about migrating a domain to CF which is used for glue records, and want to have immediate support access if something goes wrong.
Roughly one hour after I e-mailed @elithrar, who kindly reached out and offered to expedite the issue, the broken DNSSEC records were partly fixed. The domain once again resolved through all major DNS resolvers, and public access was restored. At that point dnsviz.net told me that A, MX, etc. records were "insecure", though name resolution worked fine. A few minutes ago I took another look with dnsviz and it's now telling me that all records are secure. Everything looks normal again.
Thanks a bunch for helping out, @elithrar. I really appreciate that you were proactive.
If the problem had somehow fixed itself or if the support ticket had gotten any attention or feedback at all within a day or two instead of just being "snoozed" by support staff, I wouldn't have made any noise about it. After four days of complete silence a bit of "cry-baby consumer activism" seemed like the only resort.
If CF gets back to me with an update on why the domain dead-locked and why it took 4 days to untilt everything, I'll add that info as well.
I've been OP and this has been an update about my domain woes.
All plans come with support, even the free plans (community or email). The bot will deflect the request, but if you reply that you're still stuck, you will get a reply _eventually_ (due to heavy support load, it can take a while though).
The correct procedure would be:
* turn off DNSSEC on old registrar (and wait a day or two)
* update NS and/or migrate domain
* wait a while and make sure it works
* turn on DNSSEC in CF dash and update DNSSEC settings in the domain
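The ordering above can be sketched as a small decision function. This is only an illustration: `MigrationState`, its fields, and `next_step` are hypothetical names, the state would have to be filled in by hand or from your own checks, and the default wait of one day is an assumption taken from the "a day or two" above, not a guaranteed-safe value.

```python
from dataclasses import dataclass
from typing import Optional

ONE_DAY = 24 * 60 * 60


@dataclass
class MigrationState:
    old_ds_published: bool                       # parent still has the old provider's DS
    seconds_since_old_ds_removed: Optional[int]  # None if there never was a DS
    ns_moved: bool                               # NS now points at the new provider
    resolves_ok: bool                            # domain resolves through public resolvers
    new_ds_published: bool                       # new provider's DS is at the parent


def next_step(state, wait_seconds=ONE_DAY):
    """Return the next safe action, following the four-step order."""
    if state.old_ds_published:
        return "remove the old DS record at the old registrar"
    if not state.ns_moved:
        waited = state.seconds_since_old_ds_removed
        if waited is not None and waited < wait_seconds:
            return "wait for cached DS records to expire"
        return "update NS and/or migrate the domain"
    if not state.resolves_ok:
        return "wait and make sure it works before touching DNSSEC"
    if not state.new_ds_published:
        return "turn on DNSSEC at the new provider and publish its DS"
    return "done"


# The situation described in this thread: NS already moved, old DS still present.
broken = MigrationState(True, None, True, False, False)
print(next_step(broken))
```

Note that for the broken state the function still points at the stale DS first, which is consistent with how the problem was eventually diagnosed.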
It's not that DNSSEC doesn't work -- it's doing exactly what it's supposed to be doing.
Transport security is like HTTPS. DNSSEC is the equivalent of PGP signing every webpage. The former brings value to the end user, the latter not so much.
Even the US government has issued memo M-18-23 ("Shifting From Low-Value to High-Value Work") that rescinds the requirements for the government to implement DNSSEC.
WebPKI is a joke because it lacks name constraints and so isn't and can't be hierarchical. DNSSEC is a true PKI -- you can still have multiple roots if you like and don't trust ., but it's got name constraints, so whatever domain you graft an alternate PKI at, from there on down you get bound to that PKI. This is really, truly fantastic.
Add DANE and you have a complete replacement for WebPKI.
DNSCrypt is needed to increase confidentiality, it's cheaper than DNSQuic and such things. Unfortunately .'s and com's and major TLDs' NSes are unlikely to want to waste CPU cycles on any DNS confidentiality solution, and even if some TLDs did, unless clients use QName minimization, users gain no confidentiality -- . and the TLDs all have to adopt it.
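To make the QName minimization point concrete, here is a rough sketch of which name each server in the delegation chain gets to see, with and without minimization. It is a simplification under stated assumptions: `queries_seen` is a made-up helper, it pretends there is a zone cut at every label, and it ignores query types and caching.

```python
from typing import List, Tuple


def queries_seen(qname: str, minimise: bool) -> List[Tuple[str, str]]:
    """Return (server zone, name exposed to that server) pairs, root first."""
    labels = qname.rstrip(".").split(".")
    full_name = ".".join(labels) + "."
    seen = []
    for depth in range(len(labels)):
        # depth 0 is the root, depth 1 the TLD, and so on down the tree.
        zone = "." if depth == 0 else ".".join(labels[-depth:]) + "."
        if minimise:
            # Only reveal one more label than the zone being asked.
            exposed = ".".join(labels[-(depth + 1):]) + "."
        else:
            exposed = full_name
        seen.append((zone, exposed))
    return seen


for zone, exposed in queries_seen("www.example.com", minimise=True):
    print(f"{zone:<14} sees {exposed}")
```

Without minimization every server in the chain, including the root and the TLD, sees the full name, so encrypting only the hop to those servers would hide it from the network path but not from the servers themselves.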
The two, together, would be truly fantastic.
Source: me, having lost access to my domain for 4 days for reasons that are not yet fully clear to me.
My understanding is that people break things.
> Just culture is a concept related to systems thinking which emphasizes that mistakes are generally a product of faulty organizational cultures, rather than solely brought about by the person or persons directly involved. In a just culture, after an incident, the question asked is, "What went wrong?" rather than "Who caused the problem?".
Prominent (and very effective) example: Aviation safety.
DNSSEC is both easy to break and hard to fix. https://sockpuppet.org/blog/2015/01/15/against-dnssec/
Upgrading to a non-free plan?
You don't have to upgrade to enterprise, but even their $20/mo plan comes with support.
(Also, I hate to victim-blame here but using DNSSEC was a bad idea in the first place)
Slack had a 24-hour outage doing exactly that, so I don't think "revert to non-DNSSEC without issues" is trivial at all.
It does create a lot of operational risk, as you’ve discovered. It also checks a box if you’re building a system for the US Federal .gov.
tptacek has written about this at length on this site and other places.
Just comment on HN and they'll crawl out of the woodwork.
I registered a domain at Google Domains.
Then I configured the domain at CloudFlare.
At first it worked OK, then I started getting SERVFAIL.
I found the problem was there was still DNSSEC configuration set up at Google Domains. I deleted that and everything worked OK.
Cloudflare was not at fault in my case.
By paying for the cheapest plan, or any plan at all for that matter.
Cloudflare's kind policy of zero markup on domain registrations on a free plan is remarkably generous. OK, sure, the traffic data has an obvious value to them, but maybe the support environment could improve with, I dunno, a tiny 5% markup.
If this was that important then you should not have used the free plan.
Users have no way out of domain registration issues like these on their own until the domain can be transferred again, 30-45 days later. Those who decide to offer registrar services, even for free, must hold some liability towards the users and the ecosystem and offer some support to make sure their product actually works.
It's frankly disgusting.
I have had an issue with the Cloudflare infrastructure on my domain for WEEKS, giving me thousands of 503 Service Temporarily Unavailable errors per day (Cloudflare side, not the origin server), and nobody seems to care or be able to resolve it.
Removing the ability to create support tickets on the free plan doesn't help at all. I mean, I get why they're doing it, but asking on their community forum as an alternative is not an acceptable solution. Neither is going after Cloudflare employees on social media platforms hoping for a reply.
If I'm also going to pay for their services such as Zero Trust, the domain registrar and R2, why do I have to switch to a Pro plan just to open a support ticket? Perhaps a middle-ground solution like 1 free support ticket per month on a free plan would be a good compromise?
I still think they're giving an incredible service and value for free, but this sucks.
Recently I had to reach out to @CloudflareSupport on Twitter to get my several day old report of bad Warp+ routing on the forums looked at. It was eventually fixed but it was done in silence and really wasn't something that should've happened in the first place. Nor has there been a followup report on what went wrong.
It’s weird: if I were affected by a bug on Cloudflare, my first instinct would not be to start giving them money in order to be allowed to inform them about it.
We're on a pro plan and have had an outstanding support ticket since March 22nd, with the last Cloudflare response being 19 days ago.
I can't seem to get Cloudflare to talk directly to Backblaze (it's a domain mapping issue), and playing the middle-man in a back and forth between Cloudflare and Backblaze support seems to be a recipe for not getting things resolved promptly.
I know it's covid times and organizations may be short staffed but compare this to cloud66 support who implemented a whole code update to support a special edge case for a non-paying customer within 48 hours. That makes an almost 2 month old unresolved ticket seem a bit tired.
@jgrahamc I'll email you ticket details in case you'd like to take a look.
I've had nothing but problems with them personally
I know some people swear by them ... I'm in the "swear at them" camp
It's still pretty early days for DNSSEC; if you're going to use it, it's worthwhile to know a lot about it. Just look at the several Slack outages caused by their attempts to implement it. Eventually the tooling will catch up, and registrars will all give you warnings about moving DNS and registration and the importance of syncing up your keys, but we just aren't there yet.