Everyone knows reading code is one-hundredth as fun as writing it, and while we have to accept some amount of reading as the "eating your vegetables" part of the job, FOSS project maintainers are often in a precarious enough position as it is re: job satisfaction. I think having to dramatically increase the proportion of reading to writing, while knowing full well that a bunch of what they are reading was created by some bozo with a CC subscription and little understanding of what they were doing, will lead to a bunch of them walking away.
It's been proposed that we start collaborating in specs, and just keep regenerating the code like it's CI, to get back to the feeling of collaboration without holding back on the energy and speed of agent coding.
I really like this thought. We used to take pride in elegant solutions and architectural designs. Now, in the era of shipping fast and AI, this has been disregarded. Redundancy is everywhere, spaghetti is normalized. AI code has always been unsettling for me and I think this is why.
I see a future where those that survive are doing mostly architecture work, and a few druids are hired by AI companies.
That's only true if the LLM understands the code in the same way you do - that is, it shares your expectations about architecture and structure. In my experience, once the architecture or design of an application diverges from the average path extracted from training data, performance seriously degrades.
You wind up with the LLM creating duplicate functions to do things that are already handled in code, or using different libraries than your code already does.
Typing speed is your bottleneck?
no, it isn't. unless the generated code is just a few lines long, and all you are doing is effectively autocompletion, you have to go through the generated code with a fine-toothed comb to be sure it actually does what you think it should do and there are no typos. if you don't, you are fooling yourself.
The reason I put scare quotes on "understand" is that we need to acknowledge that there are degrees of understanding, and that different degrees are required in different scenarios. For example, when you call syscall(), how well do you understand what is happening? You understand what's in the manpage; you know that it triggers a switch to kernel space, performs some task, returns some result. Most of us have not read the assembly code, we have a general concept of what is going on but the real understanding pretty much ends at the function call. Yet we check that in because that level of understanding corresponds to the general engineering standard.
In some cases, with AI, you can be reasonably sure the result is correct without deeply understanding it and still meet the bar. The bazel rule example is a good one. I prompt, "take this openapi spec and add build rules to generate bindings from it. Follow existing repo conventions." From my years of engineering experience, I already know what the result should look like, roughly. I skim the generated diff to ensure it matches that expectation; skim the model output to see what it referenced as examples. At that point, what the model produced is probably similar to what I would have produced by spending 30 minutes grepping around, reading build rules, et cetera. For this particular task, the model has saved me that time. I don't need to understand it perfectly. Either the code builds or it doesn't.
For other things, my standard is much higher. For example, models don't save me much time on concurrent code because, in order to meet the quality bar, the level of understanding required is much higher. I do need to sit there, read it, re-read it, chew on the concurrency model, et cetera. Like I said, it's situational.
There are many, many other aspects to quantifying the effects of AI on productivity; code quality is just one of them. It's very holistic and dependent on you, how you work, what domain you work in, the technologies you work with, the team you work on, so many factors.
so the exact same thing you should be doing in code reviews anyway?
and as i wrote in my other comment, reviewing the code of a junior developer includes the satisfaction of helping that developer grow through my feedback. AI will never grow. there is no satisfaction in reviewing its code. instead it feels like a Sisyphean task, because the AI will make the same mistakes over and over again, and make mistakes a human would be very unlikely to make. unlike with human code, with AI code you have to expect the unexpected.
If it were predictable like a transpiler, you wouldn't have to read it. You can think of it as a pure gain, but you are just not reading the code it's outputting.
The productivity gets siphoned to the AI companies owning the AI.
Which is harder, writing 200 lines of code or reading 200 lines of code someone else wrote?
I pretty firmly find the latter harder, which means for me AI is most useful for finessing a roughly correct PR rather than writing the actual logic from scratch.
Great comment. Understanding is mis-"understood" by almost everyone. :)
Understanding a thing equates to building a causal model of the thing. And I still do not see AI as having a causal model of my code even though I use it every day. Seen differently, code is a proof of some statement, and verifying the correctness of a proof is what a code-review is.
There is an analogue to Brandolini's bullshit asymmetry principle here. Understanding code is 10 times harder than reading code.
Only if the person doesn't want the AI to help in understanding how it works, in which case it doesn't matter whether they use AI or not (except that without it they couldn't push some slop out the door at all).
If you want that understanding, I find that AI is actually excellent with it, when given proper codebase search tools and an appropriately smart model (Claude Code, Codex, Gemini), easily browsing features that might have dozens of files making them up - which I would absolutely miss some details of in the case of enterprisey Java projects.
I think the next tooling revolution will probably be automatically feeding the model all of the information about how the current file fits within the codebase - not just syntax errors and automatically giving linter messages, but also dependencies, usages, all that.
In my eyes, the "ideal" code would be simple and intuitive enough to understand so that you don't actually need to spend hours to understand how a feature works OR use any sort of AI tool, or codebase visualization as a graph (dependency and usage tracking) or anything like that - it just seems that you can't represent a lot of problems like that easily, given time constraints and how badly Spring Boot et al fuck up any codebase they touch with accidental complexity.
But until then, AI actually helps, a lot. Maybe I just don't have enough working memory (or time) to go through 30 files and sit down and graph it out in a notebook like I used to, but in lieu of that an AI generated summary (alongside docs/code tests/whatever I can get, but seems like humans hate writing docs and ADRs, at least in the culture here) is good enough.
At the same time, AI will also happily do incomplete refactoring or not follow the standards of the rest of the codebase and invent abstractions where it doesn't need any, if you don't have the tooling to prevent it automatically, e.g. prebuild checks (or the ability to catch it yourself in code review). I think the issue largely is limited context sizes (without going broke) - if I could give the AI the FULL 400k SLoC codebase and the models wouldn't actually start breaking down at those context lengths, it'd be pretty great.
for prototypes and throwaway stuff where only the results count, it may be ok. but not for code that goes into a larger project. especially not FOSS projects where the review depends on volunteers.
1. These LLMs are smart and dumb at the same time. They make a phenomenal contribution in such a short time and also do a really dumb change that no one asked for. They break working code in irrational ways. I’ve been asking them to add so many tests for all the functions I care about. This acts as a first guard rail when they trip over themselves. Excessive tests.
2. Having a compiler like Rust’s helps to catch all sorts of mines that the LLMs are happy to leave.
3. The LLMs don’t have a proper working memory. Their context is often cluttered. I find that curating that context (what is being done, what was tried, what is the technical goal, specific requests etc) in concise yet “relevant for the time” manner helps to get them to not mess up.
Perhaps important open source projects that choose to accept AI generated PRs can have such excessive test suites, and run the PRs through them first as an idiot filter before manually reviewing what the change does.
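A rough sketch of that "tests first, humans second" gate, in Python. Everything here is hypothetical: the `PullRequest` record, its fields, and the rule that a PR with zero executed tests gets bounced are assumptions for illustration, not how any particular CI system or project works.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical record; a real CI system would supply these numbers."""
    number: int
    tests_passed: int
    tests_failed: int


def needs_human_review(pr: PullRequest) -> bool:
    """True only when the PR ran the suite and nothing failed."""
    ran_any_tests = (pr.tests_passed + pr.tests_failed) > 0
    return ran_any_tests and pr.tests_failed == 0


def triage(prs: list[PullRequest]) -> tuple[list[int], list[int]]:
    """Split PR numbers into (queue for maintainers, bounced to author)."""
    review: list[int] = []
    bounced: list[int] = []
    for pr in prs:
        if needs_human_review(pr):
            review.append(pr.number)
        else:
            bounced.append(pr.number)
    return review, bounced
```

Passing the suite only says the change didn't break what the tests cover; it says nothing about whether the change is wanted, so this can only ever be the cheap first pass before a person looks.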
About the README etc: we ship an SDK and a lot of people use our source code as docs or a prototyping environment. I think a lot about agents as consumers of the codebase and I want to help them navigate the monorepo quickly. That said, I'm not sure if the CONTEXT.md system I made for tldraw is actually that useful... new models are good at finding their way around and I also worry that we don't update them enough. I've found that bad directions are worse than no directions over time.
Now that writing the code is the easy part, we're just going to transition to having very few contributors, who are needed for their architectural skills, product vision, reasoned thinking, etc, rather than pure code-writing.
"The Ghostty project allows AI-assisted code contributions, which must be properly disclosed in the pull request."
https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTIN...
Mitchell Hashimoto (2025-12-30): "Slop drives me crazy and it feels like 95+% of bug reports, but man, AI code analysis is getting really good. There are users out there reporting bugs that don't know ANYTHING about our stack, but are great AI drivers and producing some high quality issue reports.
This person (linked below) was experiencing Ghostty crashes and took it upon themselves to use AI to write a python script that can decode our crash files, match them up with our dsym files, and analyze the codebase for attempting to find the root cause, and extracted that into an Agent Skill.
They then came into Discord, warned us they don't know Zig at all, don't know macOS dev at all, don't know terminals at all, and that they used AI, but that they thought critically about the issues and believed they were real and asked if we'd accept them. I took a look at one, was impressed, and said send them all.
This fixed 4 real crashing cases that I was able to manually verify and write a fix for from someone who -- on paper -- had no fucking clue what they were talking about. And yet, they drove an AI with expert skill.
I want to call out that in addition to driving AI with expert skill, they navigated the terrain with expert skill as well. They didn't just toss slop up on our repo. They came to Discord as a human, reached out as a human, and talked to other humans about what they've done. They were careful and thoughtful about the process.
People like this give me hope for what is possible. But it really, really depends on high quality people like this. Most today -- to continue the analogy -- are unfortunately driving like a teenager who has only driven toy go-karts. Examples: https://github.com/ghostty-org/ghostty/discussions?discussio... " ( https://x.com/mitchellh/status/2006114026191769924 )
> @zeroxBigBoss: .. It's not all AI, I have experience with Zig and MacOS, ..
> @mitchellh: I appreciate it! And my bad on the experience, I must have misunderstood or misremembered your messages
Use xcancel. At the very least, to see the entire thread.
Step 1: thought leader reveals Shocking(tm) AI achievement
Step 2: post gets traction
Step 3: additional context is revealed, dragging the original claim from the realm of the miraculous to "merely" useful.
I don't think Mitchell intentionally misrepresented/exaggerated, but the phenomenon is recurring. What's the logical explanation for the frequency?
Imo, an issue is that the majority of people who submit AI slop as PRs have different motivations than this person (developing a PR portfolio whatever that may mean), or are much less competent and eager to do actual work themselves (which AI use can worsen).
is this satire?
I wouldn't bet on it
SlopHub
This has always been the problem with github culture.
On the Linux and GCC mailing lists, a posted patch does not represent any kind of commitment whatsoever from the maintainers. That's how it should be.
The fact that github puts the number of open pull requests at the very top of every single page related to a project, in an extremely prominent position, is the sort of manipulative "driving engagement" nonsense you'd expect from social media, not serious engineering tools.
The fact that you have to pay github money in order to permanently turn off pull requests or issues (I mean turn off, not automatically close with a bot) is another one of these. BTW codeberg lets any project disable these things.
Skynet was evil and impressive in The Terminator. Skynet 3.0 in real life sucks - the AI slop annoys the hell out of me. I now need a browser extension that filters away ALL AI.
Then I just took my hosting private. I can’t be arsed to put in the effort when they don’t.
> If the job market is unfavourable to juniors, become senior.
That requires networking with a depth deep enough that other professionals are willing to critique your work.
So... open-source contributions, I guess?
This increases pressure on senior developers who are the current maintainers of open-source packages at the same time that AI is stealing the attention economy that previously rewarded open-source work.
Seems like we need something like blockchain gas on open-source PRs to reduce spam, incentivize open-source maintainers, and enable others to signal their support for suggestions while also putting money where their mouth is.
Don't love your job, job your love.
That’s just the regular LinkedIn nonsense. Very few people have the time and other resources to become seniors while unemployed. On top of that, it’s still unlikely that they’ll pass the HR filter without senior positions on their resumes, regardless of their actual knowledge.
You need a literary agent for just about all of them
A system where I can mark other people as trusted and see who they trust, so when I navigate to a web page or in this case, a Github pull request, my WoT would tell me if this is a trusted person according to my network.
Also, there needs to be some significant consequence to people who are bad actors and, transitively, to people who trust bad actors.
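To make the lookup side of that concrete, here is a minimal sketch in Python. The names (`trust_score`, `decay`, `max_hops`) and the rule that trust halves with every extra hop are assumptions chosen for illustration; the comment doesn't specify a scoring scheme.

```python
from __future__ import annotations

from collections import deque


def trust_score(
    trusts: dict[str, set[str]],
    me: str,
    target: str,
    decay: float = 0.5,
    max_hops: int = 3,
) -> float:
    """Score `target` by the shortest trust path from `me`.

    Direct trust scores 1.0, each extra hop multiplies by `decay`,
    and anyone further than `max_hops` edges away scores 0.0.
    """
    if target == me:
        return 1.0
    seen = {me}
    queue = deque([(me, 0)])  # (person, edges walked to reach them)
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not look past the hop limit
        for friend in trusts.get(person, set()):
            if friend in seen:
                continue
            if friend == target:
                return decay ** hops
            seen.add(friend)
            queue.append((friend, hops + 1))
    return 0.0
```

With `trusts = {"me": {"alice"}, "alice": {"bob"}}`, `alice` scores 1.0, `bob` scores 0.5, and a stranger scores 0.0. Breadth-first search is used so the score always reflects the shortest chain of trust.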
The hardest part isn’t figuring out how to cut off the low quality nodes. It’s how to incentivize people to join a network where the consequences are so high that you really won’t want to violate trust. It can’t simply be a free account that only requires a verifiable email address. It will have to require a significant investment in verifying real world identity, preventing multiple accounts, reducing account hijackings, etc. Those are all expensive and high friction.
It is the exact thing this system needs
AI slop is so cheap that it has created a blight on content platforms. People will seek out authentic content in many spaces. People will even pay to avoid the mass “deception for profit” industry (e.g. industries where companies bot ratings/reviews to profit and where social media accounts are created purely for rage bait / engagement farming).
But reputation in a WoT network has to be paramount. The invite system needs a “vouch” so there are consequences to you and your upstream vouch if there is a breach of trust (eg. lying, paid promotions, spamming). Consequences need to be far more severe than the marginal profit to be made from these breaches.
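One way the "consequences to you and your upstream vouch" idea could work, sketched in Python. The function name, the single-voucher model, and the choice to halve the penalty at each step up the chain are all assumptions for illustration; the comment only says upstream vouchers should pay too.

```python
from __future__ import annotations


def apply_breach(
    reputation: dict[str, float],
    vouched_by: dict[str, str],
    offender: str,
    penalty: float = 1.0,
    upstream_share: float = 0.5,
) -> None:
    """Deduct `penalty` from the offender, then a shrinking share of it
    from each person up the chain of vouches. Mutates `reputation`."""
    person = offender
    seen: set[str] = set()
    # `seen` guards against cycles in the vouch records;
    # the 0.01 floor stops once the penalty is negligible.
    while person is not None and person not in seen and penalty > 0.01:
        seen.add(person)
        reputation[person] = reputation.get(person, 0.0) - penalty
        person = vouched_by.get(person)  # None once we reach a root member
        penalty *= upstream_share
```

For example, if `root` vouched for `alice` and `alice` vouched for `spammer`, a breach with `penalty=4.0` costs the spammer 4.0, alice 2.0, and root 1.0. Making the penalty larger than anything a spammer could earn is a policy decision this sketch does not settle.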
If someone showed up on an at-proto powered book review site like https://bookhive.buzz and started trying to post nonsense reviews, or started running bots, it would be much more transparent what was afoot.
More explicit trust signalling would be very fun to add.
A curation network, one which uses SSL-style chain-of-trust (and RSS-style feeds maybe?) seems like it could be a solution, but I'm not able to advance the thought from just being an amorphous idea.
I don't think it's coming to an end. It's getting more difficult, yes, but not impossible. Currently I'm working on a game, and since I'm not an artist, I pay artists to create the art. The person I'm working closest with I have basically no idea who they are, except their name, email and the country they live in. Otherwise it's basically "they send me a draft > I review/provide feedback > Iterate until done > I send them money", and both of us know basically nothing of the other.
I agree that trust in the individual is becoming more important, but it's always been one of the most important things for collaborations or anything that involves other human beings. We've tried to move that trust to other systems, but it seems we're only able to move the trust to the people building and maintaining those systems, instead of getting rid of it completely.
Maybe "trust" is just here to stay, and we'll all be better off as soon as we start to realize this, and reconnect with the people around us and connect with the people on the other side of the world.
There is an individual who you trust to do good work, and who works well with you. They're not anonymous. Addressing the topic of this thread, you know (or should know) that it is not AI slop.
That is a significant amount of knowledge and trust in an individual, and the very point I thought the GP was making.
These are very important questions that cut to the heart of "what is art".
Unless AI companies already developed and launched plugins/extensions for people to do something that looks like hand drawn sketches inside of Clip Studio, and suddenly got a lot better at understanding prompts (including having inspiration of their own), then I'm pretty sure it's a human.
I don't think I'd get to see in-progress sketches, and it wouldn't be as good at understanding what I wanted changed. I've used various generative AI image generators (latest one Qwen Image 2511 and a whole bunch of others) and none of them, including with "prompt enhancements", can take very vague descriptions of "I want it to feel like X" or "I'm not sure about Y but something like Z" and turn it into something that looks acceptable. At least not yet.
And because I've spent a lot of time with various generative image making processes and models, I'm fairly confident I'd recognize if that was what was happening.
Movie/show reviews, product reviews, app/browser extension reviews, programming libraries, etc all get gamed. An entire industry of botting reviews has sprung up from PR companies brigading positive reviews for their clients.
The better AI gets at slop and controlling bots to create slop which is indistinguishable from human content, the less people will trust content on those platforms.
Your trust relationship with your artist almost certainly was based on something other than just contact info. Usually you review a portfolio, a professional profile, and you start with a small project to limit your downside risk. This tentative relationship and phased stages where trust is increased is how human trust relationships have always worked.
But for a long time, unrelated to AI. When Amazon was first available here in Spain (don't remember exactly what year, but before LLMs for sure), the amount of fraudulent reviews filling the platform was already noticeable at that point.
That industry you're talking about might have gotten new wings with LLMs, but it wasn't spawned by LLMs; it existed a long time before that.
> the less people will trust content on those platforms.
Maybe I'm jaded from using the internet from a young age, but both me and my peers basically have a built-in mistrust of random stuff we see on the internet, at least compared to our parents and our younger peers.
"Don't believe everything you see on the internet" has been a mantra almost for as long as the internet has existed. Maybe people forgot and needed a reminder, but it was never not true.
When snail mail had a cost floor of $0.25 for the price of postage, email was basically free. You might get 2-3 daily pieces of junk mail in your house’s mailbox, but you would get hundreds or thousands in your email inbox. Slop comes at scale. LLMs didn’t invent spam, but they are making it easier to create more variants of it, and possibly ones that convert better than procedurally generated pieces.
There’s a difference between your cognitive brain and your lizard brain. You can tell yourself that mantra, but still occasionally fall prey to spam content. The people who make spam have a financial incentive to abuse the heuristics/signals you use to determine the authenticity of a piece of content in the same way cheap knockoffs of Rolex watches, Cartier jewelry, or Chanel handbags have to make the knockoffs appear as authentic as possible.
Hence I suspect that quite a few of these interfaces that are now being spammed with AI crap will end up implementing something that represents a fee, a paywall, or a trustwall. That should keep armies of AI slop responses from being worthwhile.
How we do that without killing some communities is yet to be seen.
the web brought instant, infinite 'data'. we used to have limits, limits that would kinda ensure the reality of what is communicated. we should go back to that; it's efficient
I'd add science here too.
A strategy I sometimes use for external contributions is to immediately ask a question about the pull request. Ignoring PRs where I don't get a reply or the reply doesn't make sense potentially eliminates a lot of low quality contributions.
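The mechanical half of that strategy (no reply means the PR gets ignored) could be automated along these lines. This is only a sketch: the function name and the 14-day grace period are assumptions, and judging whether a reply "makes sense" still needs a human.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def should_ignore(
    asked_at: datetime,
    author_replied_at: datetime | None,
    now: datetime,
    grace: timedelta = timedelta(days=14),  # assumed window, tune per project
) -> bool:
    """True when a question was asked on the PR and the author has
    stayed silent for longer than the grace period."""
    if author_replied_at is not None:
        return False  # a human still has to judge the reply itself
    return now - asked_at > grace
```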
I wonder if a "no AI" rule is an overly blunt instrument. I can sympathise with it but babies and bathwater etc.
The current wave of "AI Coding Agents" are just wrappers around Vector DBs that fetch fuzzy context. They don't "understand" the codebase; they statistically guess the next token based on a cosine similarity match.
Of course they generate subtle bugs. They have no concept of topological consistency.
I realized this 3 months ago and stopped using standard agents. I built a local memory protocol (Remember-Me) that uses Wasserstein Distance to enforce strict consistency before the AI is allowed to write a line of code. If the memory doesn't mathematically fit the context topology, it rejects the edit.
We need to move from "Generative" coding to "Verifiable" coding, or this slop will drown every OSS maintainer.
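For readers unfamiliar with the "cosine similarity match" mentioned in this comment, the retrieval step it refers to looks roughly like the Python sketch below. The vectors are made-up toy embeddings and the function names are hypothetical. It illustrates the similarity measure only; it is not a description of how any particular coding agent works, nor of the Remember-Me protocol.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_match(query, snippets):
    """Return the name of the stored snippet whose embedding is closest
    in direction to the query embedding."""
    return max(snippets, key=lambda name: cosine_similarity(query, snippets[name]))
```

The match is "fuzzy" in the sense that it ranks by direction in embedding space, so the closest snippet is returned even when nothing stored is actually relevant.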