Discord Incident

82 points by moelf 2 hours ago | 47 comments

rweichler 2 hours ago |
Times like this make me miss the IRC days. I was just able to reproduce a bug in a semi-open-source project, and Discord went down right in the middle of me sending my findings. Now there's nothing I can do about it. I can only wait.
aroman 2 hours ago |
I mean, when freenode would go down it was more or less the same thing, no?
Brian_K_White an hour ago |
Not remotely.
IRC is distributed and federated. Not only are there countless networks, each network has countless servers, and any group of servers that are up and can see each other can operate on their own, from a single server up to any subset or the whole network.
When a peering connection goes down and the network splits, maybe some people in the group disappear, or maybe from your point of view everyone else disappears.
Maybe the remaining subset of other users is already good enough because it's enough to continue what you were talking about and who you were talking with, or if not, you have the option to just try some other servers until you find where everyone else is. Where "server" is an actual separate instance of the server software operated by an independent person, hosted on whatever kind of hardware or VM they set up, connected to whatever network they are on, not what Discord calls a "server".
Even if the entire group of say freenode servers goes down somehow (even though that's not really possible) there is still undernet and 400 other nets. Even without prior coordination it would be essentially trivial for the users to all just go looking for, or create on the spot, the same channel on some other net, and basically everyone finds each other again almost effortlessly. And that's if something unbelievable actually happens, let alone the normal minor breaks that actually happen once in a while.
This is entirely different from being wholly at the mercy of the single entity Discord.
aroman an hour ago |
You're arguing against a claim I did not make.
Freenode had full-network outages periodically: DDoS attacks, infrastructure failures, etc. When those happened, the practical experience was the same... people waited it out. Nobody coordinated a mass migration to undernet or stood up alternative servers for a few hours. (It took much bigger issues - social/organizational/political, not technical - to catalyze the mass migration.)
You're making an argument about the virtues of decentralization - and I agree, decentralization is great! Just in practical reality, freenode (not IRC itself) had exactly the same failure mode as we just saw today.
nubinetwork 30 minutes ago |
> Nobody coordinated a mass migration to undernet or stood up alternative servers for a few hours.
There was always oftc...
Analemma_ 2 hours ago |
How would that have been different in the IRC days?
omoikane an hour ago |
There would have been a server split, and half of the people would be chatting among themselves wondering why some people suddenly got disconnected, while being unaware that there might be a server problem because they can still continue to chat. The other half of the people would think the same.
StableAlkyne an hour ago |
Don't forget everyone flooding both halves with "z0mg net split!"
:P
echelon an hour ago |
> "z0mg net split!" :P
The 00's were an interesting time in internet culture.
Internet slang like this disappeared almost completely once the whole world got access and platforms rooted out all the weird and niche communities.
skerit an hour ago |
Netsplits were fun. Especially if you were on the splitting part. You could get to know new people you got stuck with.
uproarchat an hour ago |
In the IRC days you could have been running your own server for you and friends. It would've taken something much worse, like your VPS dying or something upstream failing.
Aurornis an hour ago |
I feel like I have different memories about the instability of Internet services in the past than some people do.
Common IRC servers were not without problems. I think it was just more common to shrug it off and do something else until the problems went away.
cogman10 an hour ago |
The difference was it wasn't one global server for everyone. I think that's why the past feels like it was more stable.
Now, aws or cloudflare gets a hiccup and half the internet is nuked.
The old internet was far more federated so doing something else meant to me "Welp, anandtech is down, let's go to pcper, digg, tomshardware, slashdot, etc"
Sure stuff would go down, but it would be just that small community rather than most of chat for the internet.
filoleg 33 minutes ago |
Yeah, but (as a user) I would rather have one global server crash for 1-2hrs two or three times per year, as opposed to having each individual server randomly crash once a month for at least one hour each time.
When I sit down and try to remember what it was actually like to use the internet in the late 00s, the one thing that always comes up is "there is no way people today would tolerate it nearly as well as we did back then".
bayindirh an hour ago |
I've never seen long-running problems in big server-federations like DALNet. Our local "big" IRC servers were generally down for 10 minutes at most. They were not empty either.
Simple services recover faster. Federated infrastructure is much more resilient. We had slower computers, more considerate coders, and simpler software; so everything was snappier, even with 56K modems.
For example, navigate to https://git.sr.ht/~bayindirh/. No scripts, pure HTML. Running on a single server. Served instantly.
This is possible. We, as in the world, just ignore it for shinier stones.
Now, a small VPS in an AWS server lapses for 5 seconds, and half of the internet is toast. Centralization for the PWN!
piva00 an hour ago |
Yeah, netsplits were really common; so were nickserv and/or chanserv not working for long periods, which made popular channels hell without ops.
I think the centralisation is the issue, I could connect to a different IRC network with a community around the same topic/game. When Discord is down there's nowhere else to go.
bayindirh an hour ago |
Ah yes. chanserv and nickserv hiccups were bad. I remember that now, but they were not as catastrophic as outages we see today.
BoredPositron 40 minutes ago |
If there was a netsplit you just bunched together on one server. It was more decentralized and a bit more reliable in a way.
jstummbillig an hour ago |
I don't exactly know what you are comparing. No popular IRC network came anywhere near what we would find acceptable in terms of reliability today. It was an absolutely (wonderful) wildfire.
anyfoo 44 minutes ago |
Netsplits, where the entire IRC network would "split" into two (or more) effectively independent networks because a link between two servers went down, were extremely common. I don't know if daily, or weekly, but common enough to be perceived as normal and expected in any case.
In the earlier days of IRC, netsplits were sometimes used for channel takeover. If someone was on a split off part of the net where there were so few people in the channel that they could obtain op status, they could kill and ban the "legitimate" ops when the nets joined back together.
BoredPositron 41 minutes ago |
It was so much fun placing some eggdrops on servers that usually split to take over channels.
113 2 hours ago |
It's not a novel opinion but I'm tired of things being fucked all the time.
grim_io 2 hours ago |
Can't wait for Cloudflare to go the way of GitHub :)
this_user 2 hours ago |
It's gonna get a lot worse with all the AI slopcode that is about to be pushed directly to production.
majorchord an hour ago |
You mean all the AI "slop" that's finding and writing new kernel exploits every day? And submitting hundreds of previously-unknown security bugs in critical software?
suprjami an hour ago |
What are you saying here? An LLM helped humans do something right once therefore it's perfect to use in every other situation too?
dakolli an hour ago |
llms aren't doing any of that. Some smart people are using llms to find those vulns, there's a huge difference.
block_dagger an hour ago |
Your definition of "do" seems different than mine.
eowln an hour ago |
I predict in the future many will blame the poor overall quality of software and the poor uptime of services on AI, as if things weren’t terrible before AI.
pixl97 an hour ago |
Isn't consolidation great.
uproarchat 2 hours ago |
I keep second guessing whether or not to mention Uproar here out of fear we'll get roasted, because it's still got its blemishes, but here goes: https://uproar.chat
We're hoping to do better than discord, hopefully you get some use from it!
Backend is written in go, frontend is vanilla html/js/css, TOS and PP are readable in one breath each.
edit: looks like nobody can see this unless it's vouched. I guess because of the link and VPN.
dang 31 minutes ago |
(It doesn't look like you've been spamming HN so I restored the comment)
uproarchat 21 minutes ago |
Thanks Dang!
coreylane 2 hours ago |
im surprised the aws outage hasn't been bigger news today https://www.cnbc.com/2026/05/08/aws-outage-data-center-fandu...
jeffwask an hour ago |
It was one AZ. Kinda surprised those guys are built in a way where a single AZ failure takes them down.
dietr1ch an hour ago |
AWS makes it annoying to be resilient as AZs aren't transparent to their users, so I'm more surprised some were prepared for it.
It seems to me these days people are OK with AWS going down and just blaming it on AWS rather than on themselves for not being prepared for big outages.
"Oh, nothing we can do because AWS/Cloudflare is down"
baronvonsp 18 minutes ago |
> AWS makes it annoying to be resilient as AZs aren't transparent to their users
What does transparent mean here? AWS is super clear what resources are zonal and provides tons of guidance around making things multi-AZ. AZ outages aren't exactly frequent but they're reasonably likely.
Being susceptible to AZ (or region) outages is very much an architectural decision. Or a bug that needs to be fixed (I'm sure Coinbase didn't YOLO single-AZ, they've undoubtedly learned about some edge case that needs to be fixed). Sure it may not be worth the cost/complexity for some systems but resiliency is like job one for anything in the cloud that costs money when it's down.
dietr1ch 7 minutes ago |
> What does transparent mean here?
Transparent in the sense that a system/box can be: you can't see or know about it. (Yeah, I guess you could read "not transparent" as "obscure in disclosing how their system works", but that reading wouldn't make much sense here.)
> AWS is super clear what resources are zonal and provides tons of guidance around making things multi-AZ.
Yeah, they allow people to cheap out on zonal resources and then go down with their zone.
uproarchat 18 minutes ago |
For a while there was a joke that if us-east-1 goes down it's not as big a deal because everything is down.
ChrisArchitect an hour ago |
https://news.ycombinator.com/item?id=48057294
offmycloud an hour ago |
I'm getting Access Denied - You don't have permission to access "http://www.cnbc.com/2026/05/08/aws-outage-data-center-fandue..." on this server. (with an Edgesuite error code)
dang 32 minutes ago |
Thanks! - I've re-upped https://news.ycombinator.com/item?id=48058197 using the SCP mechanism (https://news.ycombinator.com/item?id=26998308).
huxflux an hour ago |
IRC unite!
n80sire an hour ago |
Likely due to aws incident.
ChrisArchitect an hour ago |
Can update to the direct link: https://discordstatus.com/incidents/4hpm4454hxtx
throwawayk7h an hour ago |
matrix.org alive and well