Would you like help?
- Get help with developing the software
- Just develop the software without help
[ ] Don't show me this tip again"
Maybe it wasn't as noticeable when Github had fewer features, but our CI runners and other automation using the API a decade ago always had weekly issues caused by Github being down/degraded.
If there was a prediction market for when GitHub experiences an outage every week, then you would make a lot of money.
there are tens of thousands of stupid scripts hosted on github itself that have scheduled programmatic pushes or pulls to repos via cron jobs with millions and millions of users -- yeah LLMs accelerate the fire but let's not pretend that GH was some bastion of real-user-dom somehow at some point.
1 - 10 ^ -N (multiply by 100 for percent)
So 9% is 0.09 for the calc
1 - 10 ^ -N = 0.09
So
10 ^ -N = 0.91
So
N = -log10 0.91
So 0.09 (9%) reliability is 0.0409586077 of a nine.
And running it thru... a tenth of a nine is 0.2056717653 or about 20.57% reliability
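For anyone who wants to rerun the arithmetic, here is a minimal Python sketch of the conversion, assuming the definition availability = 1 - 10^-N used above. The function names are made up for illustration and do not come from any library.

```python
import math


def nines_from_availability(availability: float) -> float:
    """Return N such that availability == 1 - 10**-N (availability as a fraction)."""
    if not 0.0 <= availability < 1.0:
        raise ValueError("availability must be in [0, 1)")
    return -math.log10(1.0 - availability)


def availability_from_nines(n: float) -> float:
    """Return the availability fraction corresponding to N nines."""
    return 1.0 - 10.0 ** (-n)


if __name__ == "__main__":
    print(nines_from_availability(0.09))  # roughly 0.041 of a nine
    print(availability_from_nines(0.1))   # roughly 0.2057, i.e. about 20.57%
    print(nines_from_availability(0.99))  # roughly 2 nines
```

Multiply the result of availability_from_nines by 100 to get a percentage.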
99.99
99.90
99.00
90.00
Currently consulting somewhere with 30 services per engineer. I cannot convince them this is hell. Maybe that makes it my personal hell.
One strategy to convince is to get someone less technical than you to sit by you while you try and trace everything from one error'd HTTP request from start to finish to diagnose the problem. If they see it takes half a day to check every call to every internal endpoint to 100% satisfy a particular request sometimes that can help.
Also sometimes they just think "this is a bunch of nerd stuff, why are you involving me?!" So it's not foolproof.
The real solution is probably to leave, but the market sucks at the moment. At least AI makes the 10-repos-per-tiny-feature thing easier.
In that every night you're playing murder mystery, and it's never fun.
how is such service spam different from unix "small functions that do one thing only" culture?
why in unix case it is usually/historically seen as nice, while in web case it makes stuff worse?
You will basically need to employ solutions for problems only caused by your microservices arch. E.g. take reading the logs for a single request. In a monolith, just read the logs. For the many-service approach, you need to work out how you're going to correlate that request across them all.
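To make the log-correlation point concrete, here is a rough Python sketch of one common approach: every service reuses a request ID from an incoming header (or mints one) and stamps it on each log line. The header name `X-Correlation-ID` and all function names are assumptions for illustration, not a standard any particular stack mandates.

```python
import contextvars
import logging
import uuid

# Hypothetical header name; pick whatever your services agree on.
CORRELATION_HEADER = "X-Correlation-ID"

# Holds the ID of the request currently being handled.
_correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = _correlation_id.get()
        return True


def begin_request(headers: dict) -> str:
    """Reuse the caller's ID if present, otherwise start a new one."""
    cid = headers.get(CORRELATION_HEADER) or uuid.uuid4().hex
    _correlation_id.set(cid)
    return cid


def outgoing_headers() -> dict:
    """Headers to add to every call this service makes to another service."""
    return {CORRELATION_HEADER: _correlation_id.get()}


def make_logger(name: str) -> logging.Logger:
    """Build a logger whose lines include the correlation ID."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(correlation_id)s %(name)s %(message)s"))
    handler.addFilter(CorrelationFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Even with this in place you still need somewhere to search all the services' logs by that ID, which is exactly the extra machinery a monolith doesn't need.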
Even the aforementioned network failures require a lot of design, and there's no standardization. Does the calling service retry? Does the callee have a durable queue and pick back up? What happens if a call/message gets 'too old'?
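As one possible answer to those questions (not the only one), here is a small Python sketch where the caller retries with exponential backoff and gives up once the message is older than a chosen limit. The names, defaults, and the choice to retry only on `ConnectionError` are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class MessageExpired(Exception):
    """Raised when a call is abandoned because the message got too old."""


def call_with_retry(
    send: Callable[[], T],
    created_at: float,
    max_age_s: float = 30.0,
    max_attempts: int = 3,
    base_delay_s: float = 0.1,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry send() on ConnectionError, but never after the message has expired."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        # Check the age before every attempt, including the first.
        if now() - created_at > max_age_s:
            raise MessageExpired(f"message older than {max_age_s}s, dropping")
        try:
            return send()
        except ConnectionError as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff: base, 2x base, 4x base, ...
                sleep(base_delay_s * 2 ** attempt)
    assert last_error is not None
    raise last_error
```

Every pair of services has to agree on choices like these, and the callee has to tolerate the same request arriving more than once.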
Also, from the other end, command line utils are typically made by entirely different people with entirely different philosophies/paradigms, so the encapsulation makes sense. That's not true when you're the one writing all the services, especially not at small-to-mid-size companies.
Plus, you already can do the single-concern thing in a monolith, just with modules/interfaces/etc.
That helps with Git, not so much with issues etc.
To explain this one-word comment for those unfamiliar, see previously:
GitHub will prioritize migrating to Azure over feature development (5 months ago) https://news.ycombinator.com/item?id=45517173
In particular:
> GitHub has recently seen more outages, in part because its central data center in Virginia is indeed resource-constrained and running into scaling issues. AI agents are part of the problem here. But it’s our understanding that some GitHub employees are concerned about this migration because GitHub’s MySQL clusters, which form the backbone of the service and run on bare metal servers, won’t easily make the move to Azure and lead to even more outages going forward.
I'm sure the people with the purse strings didn't care, though, and just wanted to funnel the GH userbase into Azure until the wheels fell off, then write off the BU. Bought for $7.5B, it used to make $250M, but now makes $2B, so they could offload it and make a profit. I wonder who'll buy it. Prob Google, Amazon, IBM, Oracle, or a hedge fund. They could choose not to sell it, but it'll end up a writeoff if the userbase jumps ship.
At any rate, it seems like GitHub is back up now, so we'll see how long that lasts.
What? No, no it's not. The entire discipline of Infrastructure and Systems engineering is dedicated to doing these sorts of things. There are well-worn paths to making stable changes. I've done a dozen massive infrastructure migrations, some at companies bigger than Github, and I've never once come close to this sort of instability.
This is a botched infrastructure migration, onto a frankly inferior platform, not something that just happens to everyone.
Artificial intelligence, Azure integration, many other things.
https://www.forbes.com/sites/bernardmarr/2025/07/08/microsof...
Edit: oh look, their site says all good, but I still have jobs stuck. What a pile of garbage.
I'm so sick of this.
FTFY. (I've read AWS word it like that)
Sorry, I realise this comment isn't up to HN's usual standards for thoughtfulness and it is perhaps a bit inflammatory but... look, I'd bet the majority of us on this site rely on GitHub and I can't be the only one becoming incredibly frustrated with its recent unreliability[0]?
(And, yes, I did enough basic data analysis to confirm that it IS indeed getting worse versus a year, two years, and three years ago, and is particularly bad since the start of this year.)
[0] EDIT: clearly not from looking at the rest of the comments in this discussion.
> And, yes, I did enough basic data analysis to confirm
Perhaps you'd consider showing us that analysis? That sounds like it would make a pretty substantive, thoughtful comment.
Gaze upon the tapestry in which github paints its failure with a thin copper red thread:
https://foja.applycreatures.com
Edit: it has a wonderful API, so I posted the link; it may tempt some to ditch MS/Azure hub.
https://trends.google.com/trends/explore?date=all&geo=GB&q=s...
IMO it's much better now.
(And the first thing to go was occasional 500's on github-hosted files.. the core service itself - git, PR, actions - were pretty stable until recently)
I just use an offline server, so I wouldn't notice if they had GitHub levels of availability.
So needless to say, if you depend on GitHub for critical business operations, you need to start thinking about what a world without GitHub looks like for your business and start working your way toward that. I know my confidence in GitHub's engineering leadership is at rock bottom.
Everywhere I’ve worked, if a migration is causing this much downtime then you kill the migration or slow it down. If every change has a 10% chance of bringing the site down, you only do a change every week or two until you can work out the kinks.
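To put rough numbers on that, here is a back-of-the-envelope Python sketch. It assumes each change independently has the same chance of causing an outage, which is a simplification, and the function names are invented for the example.

```python
def p_at_least_one_outage(p_per_change: float, changes: int) -> float:
    """Chance that at least one of `changes` independent changes causes an outage."""
    return 1.0 - (1.0 - p_per_change) ** changes


def expected_outages(p_per_change: float, changes: int) -> float:
    """Average number of outages across `changes` changes."""
    return p_per_change * changes


if __name__ == "__main__":
    # One risky change in the period versus one per day for a week.
    print(p_at_least_one_outage(0.10, 1))  # 10% chance of an outage
    print(p_at_least_one_outage(0.10, 7))  # roughly 52%
    print(expected_outages(0.10, 7))       # roughly 0.7 outages on average
```

Under those assumptions, shipping a 10%-risk change daily makes an outage in any given week more likely than not, which is why slowing down is the usual response.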
> In a message to GitHub’s staff, CTO Vladimir Fedorov notes that GitHub is constrained on capacity in its Virginia data center. “It’s existential for us to keep up with the demands of AI and Copilot, which are changing how people use GitHub,” he writes.
> The plan, he writes, is for GitHub to completely move out of its own data centers in 24 months. “This means we have 18 months to execute (with a 6 month buffer),” Fedorov’s memo says. He acknowledges that since any migration of this scope will have to run in parallel on both the new and old infrastructure for at least six months, the team realistically needs to get this work done in the next 12 months.
If you consider that six month parallel window to have started from the time of the October memo (written presumably at the start of October), then that puts us currently or past the point where they would have cut off their old DC and defaulted to Azure only.
Whether plans or timelines changed, I have no idea of course but the above does make for a convenient timeline that would explain the recent instability. Of course, it could also just be symptomatic of increased AI usage generally and the same problems might have surfaced at a software level regardless of whether they were in a DC or on Azure.
Putting that nuance aside, personally I like the idea that Azure is simply a giant pile of shit operated by a corporation with no taste.
[1]: https://thenewstack.io/github-will-prioritize-migrating-to-a...
if by chance the CTO reads this, as a user of GitHub I would find it really existential if GitHub continues functioning as a reliable hub for git workflows (hence the name), and I have the strong suspicion nobody except for the shareholders gives a lick about copilot or 'AI' if it makes the core service the site was designed for unusable
I wonder if the extended downtime is just due to the on-call engineers waiting for their azure auth tokens to refresh within azure's own damn network.
"The evidence is clear: Either you embrace AI, or get out of this career." -Github CEO
"Sooner than later, 80% of the code is going to be written by Copilot. And that doesn’t mean the developer is going to be replaced." -Github CEO
I can’t be specific but we are constantly complaining.
I think they may need to do that once again. Almost every product of theirs feels like a dumpster fire. GitHub is down constantly, Windows 11 is a nightmare and instead of patching things they're adding stupid features nobody asked for. I think they need to stop and really look closely at what they're prioritizing.
I like AI but actually not for coding because code quality is correlated to how well you understand the underlying systems you're building on, and AI is not really reasoning on this level at all. It's clearly synthesizing training data and it's useful in limited ways.
Seemingly the decline started with the Microsoft acquisition in 2018, and subsequent "unlimited private repository" change in 2019 (to match Gitlab's popular offer)
Did you hear about the screenwriters school in which the professors said to avoid AI for writing, but it's great for storyboards. And the storyboard school where the professors said the opposite?
The reality is that AI isn't actually "good" at anything. It produces passable ersatz facsimiles of work that can fool those not skilled in the art. The second reality of AI is that everyone is busy cramming it into their products at the expense of what their products are actually useful for.
Once people realise (1), and stop doing (2), the tech industry has a chance of recovering.
Another site was constantly getting DDoSed by Russians who were mad we took down their scams on forums; that had to go through Verisign back then, not sure who they're using now. They may have enough aggregate pipe that it doesn't matter at this point.
I've been considering it for a while, but I'm definitely now pitching a move away from GitHub at our organization.
They have not even bothered to implement Entra login when they have had their competitors' logins for years. Do they even know what their product is? Or are they just a middleman for slop?
https://gitlab.com/gabriel.chamon/ci-components/-/tree/main/...
Might catch 90% of problems before they make it into the real stack?
E.g. every step of GitHub's migration to Azure could be mimicked on the duplicate stack before it's implemented on the primary stack. Is this just considered too much work? (I doubt cost would be the issue, because even if it costs millions, it would pay for itself in reduced reputational damage from outages).
EDIT: downvotes - why? - I think this is a good idea (I'd do it for my sites if outages were an issue).
Because that's a monumental amount of work, and extraordinarily difficult to retrofit into a system that wasn't initially designed that way. Not to mention the unstated requirement of mirroring traffic to actually exercise that system (given the tendency of bugs to not show up until something actually uses the system).
Agree, but look at the alternative; GitHub is constantly being savaged by users who (quite reasonably) expect uptime. Ignoring impacts on morale and reputation, damage to their bottom line alone might be tens (hundreds?) of millions per year.
> mirroring traffic
yeah, I agree that's difficult, but it doesn't need to be exact to still be useful.
If you'd ever worked on a codebase as terrible as I imagine GH's internals are and looked at the git history, you'd find two things:
1) fixing it would require rolling back 100's-1000's of engineer-years of idiocy that make things like testing or refactoring untenable
2) many prior engineers got part of the way through such improvements before leaving or being kicked out. Their efforts mostly just made it worse, because now you never know what sort of terribleness to expect when you open an unfamiliar file.
Is all the recent GitHub downtime entirely attributable to GitHub AI Copilot related development? How hard can it be to reduce the blast radius of new AI features so they don't affect the core parts of hosting repositories? Because Copilot is everywhere, the UX has become bad and I had to click all over the place and on my profile to find repositories.