Weird that https://www.cloudflarestatus.com/ isn't reporting this properly. It should be full of red blinking lights.
Something must have gone really wrong.
If a closing brace takes your whole infra down, my guess is that we'll see more of this.
I don't think anyone's is.
There's a reason Cloudflare has been really struggling to get into the traditional enterprise space and it isn't price.
At first blush it's getting harder to "defend" use of Cloudflare, but I'll wait until we get some idea of what actually broke. For the time being I'll save my outrage for the AI scrapers that drove everyone into Cloudflare's arms.
Akamai was historically only serving enterprise customers. Cloudflare opened up tons of free plans, new services, and basically swallowed much of that market during that time period.
They shouldn't need to do that unless they're really disorganised. CEOs are not there for day to day operations.
> Investigating - Cloudflare is investigating issues with Cloudflare Dashboard and related APIs.
> These issues do not affect the serving of cached files via the Cloudflare CDN or other security features at the Cloudflare Edge.
> Customers using the Dashboard / Cloudflare APIs are impacted as requests might fail and/or errors may be displayed.
Their own website seems down too https://www.cloudflare.com/
--
500 Internal Server Error
cloudflare
"Might fail"
which datacenter got flooded?
It's a scheduled maintenance, so SLA should not apply, right?
They seem to now, a few min after your comment
That's not how status pages work if they're implemented correctly. The real reason status pages aren't updated is SLAs. If you agree on a contract to have 99.99% uptime, your status page had better reflect that or it invalidates many contracts. This is why AWS also lies about its uptime and status page.
These services rarely experience outages according to their own figures, but rather 'degraded performance' or some other language that talks around the issue rather than acknowledging it.
It's like when buying a house you need an independent surveyor not the one offered by the developer/seller to check for problems with foundations or rotting timber.
Most of the time people will just get by and ignore even a full day of downtime as a minor inconvenience. Loss of revenue for the day - well, you most likely will have to eat that, because going to court and having lawyers fight over it will most likely cost you as much as just forgetting about it.
If your company goes bankrupt because AWS/Cloudflare/GCP/Azure is down for a day or two - guess what - you won't have money to sue them ¯\_(ツ)_/¯ and most likely will have a bunch of more pressing problems on your hands.
Netflix doesn't put in the contract that they will have high-quality shows. (I guess, don't have a contract to read right now.)
I'm sure there are gray areas in such contracts but something being down or not is pretty black and white.
Is it? Say you've got some big geographically distributed service doing some billions of requests per day with a background error rate of 0.0001%, what's your threshold for saying whether the service is up or down? Your error rate might go to 0.0002% because a particular customer has an issue so that customer would say it's down for them, but for all your other customers it would be working as normal.
This is so obviously not true that I'm not sure if you're even being serious.
Is the control panel being inaccessible for one region "down"? Is their DNS "down" if the edit API doesn't work, but existing records still get resolved? Is their reverse proxy service "down" if it's still proxying fine, just not caching assets?
it really isn't. We often have degraded performance for a portion of customers, or just down for customers of a small part of the service. It has basically never happened that our service is 100% down.
Reality is that in an incident, everyone is focused on fixing the issue, not updating status pages; automated checks fail or have false positives often too. :/
The compensation is peanuts. $137 off a $10,000 bill for 10 hours of downtime, or 98.63% uptime in a month, is well within the profit margins.
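As a rough check of that arithmetic, here is a minimal sketch. It assumes a 730-hour average month and a credit that is simply pro-rated by downtime; both are assumptions for illustration, since real SLA credit schedules are usually tiered and vary by contract. `SlaCredit`, `uptimePercent` and `proRataCredit` are hypothetical names.

```java
public class SlaCredit {
    // Assumed average month length in hours (365 * 24 / 12); not taken from any contract.
    static final double HOURS_PER_MONTH = 730.0;

    // Percentage of the month the service was up, given hours of downtime.
    static double uptimePercent(double downtimeHours) {
        return 100.0 * (1.0 - downtimeHours / HOURS_PER_MONTH);
    }

    // Credit if the bill is refunded in proportion to downtime (an assumption).
    static double proRataCredit(double monthlyBill, double downtimeHours) {
        return monthlyBill * downtimeHours / HOURS_PER_MONTH;
    }

    public static void main(String[] args) {
        System.out.printf("uptime: %.2f%%%n", uptimePercent(10));
        System.out.printf("credit: $%.2f%n", proRataCredit(10_000, 10));
    }
}
```

Under these assumptions, 10 hours of downtime on a $10,000 bill comes out at a credit of about $137, which is where a figure like the one above would come from.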
If communication disappears entirely during an outage, the whole operation suffers. And if that is truly how a company handles incidents, then it is not a practice I would want to rely on. Good operations teams build processes that protect both the system and the people using it. Communication is one of those processes.
There is no quicker way for customers to lose trust in your service than for it to be down and for them to not know that you're aware and trying to fix it as quickly as possible. One of the things Cloudflare gets right is the frequent public updates when there's a problem.
You should give someone the responsibility for keeping everyone up to date during an incident. It's a good idea to give that task to someone quite junior - they're not much help during the crisis, and they learn a lot about both the tech and communication by managing it.
"Cloudflare Dashboard and Cloudflare API service issues"
Investigating - Cloudflare is investigating issues with Cloudflare Dashboard and related APIs.
Customers using the Dashboard / Cloudflare APIs are impacted as requests might fail and/or errors may be displayed. Dec 05, 2025 - 08:56 UTC
500 Internal Server Error cloudflare
No need. Yikes.
(edit: it's working now (detecting downdetector's down))
This one is green: https://downdetectorsdowndetector.com
This one is not opening: https://downdetectorsdowndetectorsdowndetector.com
This one is red: https://downdetectorsdowndetectorsdowndetectorsdowndetector....
software was a mistake
Imagine how productive we'll be now!
We can now see which companies have failed in their performative systems design interviews.
Looking forward to the post-mortem.
On what? There are lots of CDN providers out there.
Left alone, corporations emerge that rival governments and are completely unaccountable. At least there is some accountability of governments to the people, depending on your flavour of government.
the problem is, below a certain scale you can't operate anything on the internet these days without hiding behind a WAF/CDN combo... with the cut-off mark being "we can afford a 24/7 ops team". even if you run a small niche forum no one cares about, all it takes is one disgruntled donghead that you ban to ruin the fun - ddos attacks are cheap and easy to get these days.
and on top of that comes the shodan skiddie crowd. some 0day pops up, chances are high someone WILL try it out in less than 60 minutes. hell, look into any web server log, the amount of blind guessing attacks (e.g. /wp-admin/..., /system/login, /user/login) or path traversal attempts is insane.
CDN/WAFs are a natural and inevitable outcome of our governments and regulatory agencies not giving a shit about internet security and punishing bad actors.
If you switch from CF to the next CF competitor, you've not improved this dependency.
The alternative here is complex or even non-existent. Complex would be some system that allows you to hotswap a CDN, or to have fallback DDoS protection services, or to build your own in-house, which, IMO, is the worst thing to do if your business is elsewhere. If you sell, say, pet food online, the dependency risk that comes with a vendor like CF is almost certainly less than the investment needed for, and the risk associated with, building DDoS protection or a CDN on your own; that is all investment that's not directed at selling more pet food or getting higher margins doing so.
Needs an ASN and a decent chunk of PI address space, though, so not exactly something a random startup will ever be likely to play with.
There are many alternatives
Of varying quality depending on the service. Most of the anti-bot/captcha crap seems to be equivalently obnoxious, but the handful of sites that use PerimeterX… I've basically sworn off DigiKey as a vendor since I keep getting their bullshit "press and hold" nonsense even while logged in.
I don't like that we're trending towards a centralized internet, but that's where we are.
It turns out so far, there isn't one. Other than contacting the CEO of Cloudflare rather than switching on a temporary mitigation measure to ensure minimal downtime.
Therefore, many engineers at affected companies would have failed their own systems design interviews.
Plus most people don't get blamed when AWS (or to a lesser extent Cloudflare) goes down, since everyone knows more than half the world is down, so there's not an urgent motivation to develop multi-vendor capability.
In some cases it is also a valid business decision. If you have 2 hours of downtime every 5 years, it may not have a significant revenue impact. Most customers think it's too much bother to switch to a competitor anyway, and even if it were simple the competition might not be better. Nobody gets fired for buying IBM.
The decision was probably made by someone else who moved on to a different company, so they can blame that person. It's only when down time significantly impacts your future ARR (and bonus) that leadership cares (assuming that someone can even prove that they actually lose customers).
It’s actually fairly easy to know which 3rd party services a SaaS depends on and map these risks. It’s normal due diligence for most companies to do so before contracting a SaaS.
If it turns out that this was really just random bad luck, it shouldn't affect their reputation (if humans were rational, that is...)
But if it is what many people seem to imply, that this is the outcome of internal problems/cuts/restructuring/profit-increase etc, then I truly very much hope it affects their reputation.
But I'm afraid it won't. Just like Microsoft continues to push out software that, compared to competitors, is unstable, insecure, frustrating to use, lacks features, etc., without it harming their reputation or even their bottom line too much. I'm afraid Cloudflare has a de facto monopoly (technically: a big moat) and by now can get away with offering poorer quality at increasing prices.
I've said to many people/friends that use Cloudflare to look elsewhere. When such a huge percentage of the internet flows through a single provider, and when that provider offers a service that allows them to decrypt all your traffic (if you let them install HTTPS certs for you), not only is that a hugely juicy target for nation-states but the company itself has too much power.
But again, what other companies can offer the insane amount of protection they can?
The issue is the uninformed masses being led to use Windows when they buy a computer. They don't even know how much better a system could work, and so they accept whatever is shoved down their throats.
Eh.... This is _kind_ of a counterfactual, tho. Like, we are not living in the world where MS did not do that. You could argue that MS was in a good place to be the dominant server and mobile OS vendor, and simply screwed both up through poor planning, poor execution, and (particularly in the case of server stuff) a complete disregard for quality as a concept.
I think someone who'd been in a coma since 1999 waking up today would be baffled at how diminished MS is, tbh. In the late 90s, Microsoft practically _was_ computers, with only a bunch of mostly-dying UNIX vendors for competition. And one reasonable lens through which to interpret its current position is that it's basically due to incompetence on Microsoft's part.
The problem is architectural.
it will randomly fail. there is no way it cannot.
there is a point where the cost to not fail simply becomes too high.
How do they not have better isolation of these issues, or redundancy of some sort?
"How do you know?"
"I'm holding it!"
Reddit was once down for a full day and that month they reported 99.5% uptime instead of 99.99% as they normally claimed for most months.
There is this amazing combination of nonsense going on to achieve these kinds of numbers:
1. Straight up fraudulent information on the status page. Reporting incidents as more minor than any internal monitors would claim.
2. If it's working for at least a few percent of customers it's not down. Degraded is not counted.
3. If any part of anything is working then it's not down. In the reddit example, even if the site was dead, as long as the image server was still 1% functional and answering some internal ping, the status was good.
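To put those percentages in perspective, here is a minimal sketch of the downtime budget behind an uptime figure. It assumes a 30-day month and that "down" means fully down for the whole period; `UptimeBudget` and its methods are hypothetical names used only for illustration.

```java
public class UptimeBudget {
    // Minutes in the reporting window; a 30-day month is assumed here.
    static final double MINUTES_PER_MONTH = 30 * 24 * 60;

    // Downtime (in minutes) that a given uptime percentage allows per month.
    static double allowedDowntimeMinutes(double uptimePercent) {
        return MINUTES_PER_MONTH * (1.0 - uptimePercent / 100.0);
    }

    // Uptime percentage implied by a given number of minutes fully down.
    static double uptimePercent(double downtimeMinutes) {
        return 100.0 * (1.0 - downtimeMinutes / MINUTES_PER_MONTH);
    }

    public static void main(String[] args) {
        System.out.printf("99.99%% allows %.1f min%n", allowedDowntimeMinutes(99.99));
        System.out.printf("99.5%% allows %.0f min%n", allowedDowntimeMinutes(99.5));
        System.out.printf("a full day down gives %.2f%%%n", uptimePercent(24 * 60));
    }
}
```

Under these assumptions a full day of downtime alone would put the month at roughly 96.7%, so a reported 99.5% only adds up if most of that day was counted as something other than "down".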
canva.com
chess.com
claude.com
coinbase.com
kraken.com
linkedin.com
medium.com
notion.so
npmjs.com
shopify.com (!)
and many more I won't add bc I don't want to be spammy.
Edit: Just checked all my websites hosted there (~12), they're all ok. Other people with small websites are doing well.
Only huge sites seem to be down. Perhaps they deal with them separately, the premium-tier of Cloudflare clients, ... and those went down, dang.
Can't get to the Dashboard though.
Nice thing about Cloudflare being down is that almost everything is down at once. Time for peace and quiet.
>We will be performing scheduled maintenance in ORD (Chicago) datacenter
>Traffic might be re-routed from this location, hence there is a possibility of a slight increase in latency during this maintenance window for end-users in the affected region.
Looks like it's not just Chicago that CF brought down...
I thought we were meant to learn something ... ?
If the in house tech team breaks something and fixes it, that's great from an engineer point of view - we like to be useful, but the person at the top is blamed.
If an outsourced supplier (one which the consultants recommend, look at Gartner Quadrants etc) fails, then the person at the top is not blamed, even though they are powerless and the outage is 10 times longer and 10 times as frequent.
Outsourcing is not about outcome, it's about accountability, and specifically avoiding it.
The previous one affected European users for >1h and made many Cloudflare websites nearly unusable for them.
Of course, vibe coding will always find a way to make something horribly broken but pretty.
So it seems like it's just the big ol' "throw this big orange reverse proxy in front of your site for better uptime!" part that's broken...
[0] Workers, Durable Objects, KV, R2, etc
Cynicism aside, something seems to be going wrong in our industry.
P.S. it’s a joke, guys, but you have to admit it’s at least partially what’s happening
.unwrap() literally means “I’m not going to handle the error branch of this result, please crash”.
For trapping a bad data load it's as simple as:
  try {
    data = loadDataFile();
  } catch (Exception e) {
    LOG.error("Failed to load new data file; continuing with old data", e);
  }
This kind of code is common in such codebases and it will catch almost any kind of error (except out of memory errors).
  try {
    data = loadDataFile();
  } catch (Exception e) {
    LOG.error("Failed to load new data file", e);
    System.exit(1);
  }
So the "bad data load" was trapped, but the programmer decided that either it would never actually occur, or that it is unrecoverable, so it is fine to .unwrap(). It would not be any less idiomatic if, instead of crashing, the programmer decided to implement some kind of recovery mechanism. It is that programmer's fault, and has nothing to do with Rust.
Also, if you use general try-catch blocks like that, you don't know if that try-catch block actually needs to be there. Maybe it was needed in the past, but something changed, and it is no longer needed, but it will stay there, because there is no way to know unless you specifically look. Also, you don't even know the exact error types. In Rust, the error type is known in advance.
> It is that programmer's fault, and has nothing to do with Rust.
It's Rust's fault. It provides a function in its standard library that's widely used and which aborts the process. There's nothing like that in the stdlibs of Java or .NET
> Also, if you use general try-catch blocks like that, you don't know if that try-catch block actually needs to be there.
I'm not getting the feeling you've worked on many large codebases in managed languages to be honest? I know you said you did but these patterns and problems you're raising just aren't problems such codebases have. Top level exception handlers are meant to be general, they aren't supposed to be specific to certain kinds of error, they're meant to recover from unpredictable or unknown errors in a general way (e.g. return a 500).
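For readers who haven't seen the pattern, here is a minimal sketch of a general top-level handler of the kind described above, written without any web framework. `TopLevelHandler`, `Response` and `withTopLevelHandler` are hypothetical names for illustration; a real server would hook this into whatever request pipeline it uses.

```java
import java.util.function.Function;

public class TopLevelHandler {
    // Hypothetical minimal response type: status code plus body.
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // Wraps a request handler so that any unexpected exception becomes a 500
    // response instead of propagating and terminating the process.
    static Function<String, Response> withTopLevelHandler(Function<String, Response> handler) {
        return request -> {
            try {
                return handler.apply(request);
            } catch (Exception e) {
                // Deliberately general: this is the last line of defence,
                // not a place to react to specific error types.
                System.err.println("Unhandled error for " + request + ": " + e);
                return new Response(500, "Internal Server Error");
            }
        };
    }

    public static void main(String[] args) {
        Function<String, Response> handler = withTopLevelHandler(request -> {
            if (request.equals("/bad")) {
                throw new IllegalStateException("unexpected data");
            }
            return new Response(200, "ok");
        });
        System.out.println(handler.apply("/good").status);
        System.out.println(handler.apply("/bad").status);
    }
}
```

The point of the sketch is only that the failing request gets an error response while the process keeps serving other requests; whether that is the right trade-off for a given failure is exactly what this subthread is arguing about.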
> The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed. There’s nothing wrong with it. It doesn’t acquire bugs just by sitting around on your hard drive.
> Back to that two page function. Yes, I know, it’s just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I’ll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn’t have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.
> Each of these bugs took weeks of real-world usage before they were found. The programmer might have spent a couple of days reproducing the bug in the lab and fixing it. If it’s like a lot of bugs, the fix might be one line of code, or it might even be a couple of characters, but a lot of work and time went into those two characters.
> When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.
From https://www.joelonsoftware.com/2000/04/06/things-you-should-...
Started after the GFC and the mass centralisation of infrastructure
Also, I don't think every one of their services was affected. I am using their proxy and pages service and both are still up.
Impossible not to feel bad for whoever is tasked with cleaning up the mess.
But my goodness, they're really struggling over the last couple weeks... Can't wait to read the next blog post.
I have a few domains on cloudflare and all of them are working with no issues so it might not be a global issue
Please avoid Imgur.
>We are sorry, something went wrong.
>Please try refreshing the page in a few minutes. If the problem persists, please visit status.cloud.microsoft for updates regarding known issues.
The status page of course says nothing
Even if you could, having two sets of TLS termination is going to be a pain as well.
Then I go to Hacker News to check. Lo and behold, it's Cloudflare. This is sort of worrying...
bunny.net
fastly.com
gcore.com
keycdn.com
Cloudfront
Probably some more I forgot now. CF is not the only option and definitely not the best option.
> Yeah, now we'll save everyone from DDoS, everything's perfect, we'll speed up your site,
... and host the providers selling DDoS services. https://privacy-pc.com/articles/spy-jacking-the-booters.html