I had a bit of a chuckle.
I think there is value in anything approximating a proposer-verifier loop, but I don't know that this is the ideal approach.
Other than that, this is a helpful list especially for someone who hasn't been hacking around on this thing as it's in rapid development mode. I find gas town super interesting, and tantalizingly close to being amazingly useful. That said, I wouldn't mind a slightly less 'flavored' set of names for workers.
Certain name types are so normalized (agent, worker, etc) that while they serve their role well, they likely limit our imagination when thinking about software, and it's a worthwhile effort to explore alternatives.
Akka and others have standardized names for all this stuff (and seem to fully know that a code ‘actor’ is code). These wheels don’t need reinventing (much less as ‘the Marvins’, a lovable set of bi-racial quadruplets who always get you where you’re going <rocket emoji>).
In fact, I dare say a lot of the LLM fascination with orchestration comes from people unfamiliar with actor models and the level of elegance a properly expressive language lets them have.
I have no interest in using gas town as it is (for a plethora of reasons, not the least of which being that I'm uninterested in spending the money), but I've been fascinated with the idea of slowing it down and having it run with a low concurrency. If you've got a couple A100s, what does it look like if you keep them busy with two agents working concurrently (with 20+ agents total)? What does it mean to have the town focus the scope of work to a series of non-overlapping changesets instead of a continuous stream of work?
If you don't plan to have it YOLO stuff in realtime and you can handle the models being dumber than Claude, I think you can have it do some really practical, useful things that are markedly better than the tools we have today.
However, the gas town one was almost completely hands off. I think my only interventions were due to how beta it was, so I had to help it work around its own bugs to keep it from doing stupid things.
Other than that, it implemented exactly what I asked for in a workable fashion with effectively one prompt. It would have taken several prompts and course corrections to get the same result without it.
Other than the riskiness (it runs in dangerous permissions mode) and incredible cost inefficiency, I'd certainly use it.
I've only started using coding agents recently and I think they go a long way to explain why different people get different mileage from "AI." My experience with Opencode using its default model, vs. Github Copilot using its default model, is night and day. One is amazing, the other is pretty crappy. That's a product of both the software/interface and the model itself I'd suspect.
Where I think this goes in the medium term is we will absolutely spin up our own teams of agents, probably not conforming to the silly anthropomorphized "town" model with mayors and polecats and so on, but they'll be specialized to particular purposes and respond to specific events within a software architecture or a project or even a business model. Currently the sky's the limit in my mind for all the possible applications of this, and a lot of it can be done with existing and fairly cheap models too, so the bottleneck is, surprise surprise... developer time! The industry won't disappear but it will increasingly revolve around orchestrating these teams of models, and software will continue to eat the world.
Gas town is a demonstration of a methodology for getting a consistent result from inconsistent agents. The case in point is that Yegge claims to have solved the MAKER problem (tower of Hanoi) via prompting alone. With the right structure, quantity has a quality all its own.
If it's not a joke... I have no words. You've all gone insane.
I expect major companies will soon be NIH-ing their own version of it. Even bleeding tokens as it does, the cost is less than an engineer, and produces working software much faster. The more it can be made to scale, the more incentive there is. A competitive business can't justify not using a system like this.
> If it's not a joke... I have no words. You've all gone insane.
I think this is covered by the part in Yegge's post where he says not to run it unless you're so rich you don't care if it works or not.
For example, in the US, which do you think uses more water: Golf Courses or Data Centers?
a) Golf Courses use twice as much water as Data Centers
b) About the same
c) Data Centers use twice as much water as Golf Courses
The answer is "None of the above": "Golf courses in the U.S. use around 500 billion gallons annually of water to irrigate their turf [snip] data centers consume [snip] 17 billion gallons, or maybe around 10x that if we include water use from energy generation"

Do you think a Google search or a Gemini query produces more carbon?
> Google had estimated that a single web search query produces 0.2 grams of CO2 emissions. [snip] the median Gemini LLM app query produces a surprisingly low 0.03 grams of CO2 emissions), and uses less energy than watching 9 seconds of television
So, how much less carbon is produced by a Gas Town run than the equivalent number of drives to the library?
So plastic straw bans (instead of plastic slipper bans, plastic food packaging bans, taxes on plastic clothes fibres...) are what we get. And because the structure of the cause/problem is the same, the language of environmentalism naturally attaches itself and gives form to the vague sense of moral unease surrounding AI. Governments are surely already building tomorrow's tightly integrated thought police drone swarm complexes, but a crusade against those who simulate a zoo of programming weasels in our midst is much easier and morally no less fulfilling.
You didn't see it here, note.
These chatbots create an echo chamber unlike any we've ever had to deal with before. If we thought social media was bad, this is way worse.
I think Gastown and Beads are examples of this applied to software engineering. Good software is built with input from others. I've seen many junior engineers go off and spend weeks building the wrong thing, and it's a mess, but we learn to get input, we learn to have our ideas critiqued.
LLMs give us the illusion of pair programming, of working with a team, but they're not. LLMs vastly accelerate the rate at which you can spiral down the wrong path, or down a path that doesn't even make sense. Gastown and Beads are that. They're fever dreams. They work, somewhat, but even just a little bit of oversight, critique, and input from others would have made them far better.
People will make mistakes, and AI holding their hand and guiding them while they do it can have disastrous consequences.
But it's nice that the arrows will appear to also guide people going the right way I guess.
All I know is beads is supposed to help me retain memory from one session to the next. But I'm finding myself having to curate it like a git repo (and I already have a git repo). Also it's quite tied to github, which I cannot use at work. I want to use it but I feel I need to see how others use it to understand how to tailor it for my workflow.
"Carefully review this entire plan for me and come up with your best revisions in terms of better architecture, new features, changed features, etc. to make it better, more robust/reliable, more performant, more compelling/useful, etc.
For each proposed change, give me your detailed analysis and rationale/justification for why it would make the project better along with the git-diff style changes relative to the original markdown plan".
Then, the plan generally iteratively improves. Sometimes it can get overly complex, so I may ask them to take it down a notch from Google scale. Anyway, when the FSD doc is good enough, the next step is to prepare to create the beads.
At this point, I'll prompt something like:
"OK so please take ALL of that and elaborate on it more and then create a comprehensive and granular set of beads for all this with tasks, subtasks, and dependency structure overlaid, with detailed comments so that the whole thing is totally self-contained and self-documenting (including relevant background, reasoning/justification, considerations, etc.-- anything we'd want our "future self" to know about the goals and intentions and thought process and how it serves the over-arching goals of the project.) Use only the `bd` tool to create and modify the beads and add the dependencies. Use ultrathink."
After that, I usually even have another round of bead checking with a prompt like:
"Check over each bead super carefully-- are you sure it makes sense? Is it optimal? Could we change anything to make the system work better for users? If so, revise the beads. It's a lot easier and faster to operate in "plan space" before we start implementing these things! Use ultrathink."
Finally, you'll end up with a solid implementation roadmap all laid out in the beads system. Now, I'll also clarify: the agents got much better at using beads in this way when I took the time to have them create SKILLS for beads for them to refer to. Also important is ensuring AGENTS.md, CLAUDE.md, and GEMINI.md have some info referring to its use.
But, once the beads are laid out, then it's just a matter of figuring out: do you want to do sequential implementation with a single agent, or use parallel agents? Effectively using parallel agents with beads would require another chapter to this post, but essentially, you just need a decent prompt clearly instructing them not to run over each other. Also, if you are building something complex, you need test guides and standardization guides written, for the agents to refer to, in order to keep the code quality at a reasonable level.
Here is a prompt I've been using as a multi-agent workflow base when I want them to keep working; I've had them work for 8 hours without stopping with this prompt:
EXECUTION MODE: HEADLESS / NON-INTERACTIVE (MULTI-AGENT)
CRITICAL CONTEXT: You are running in a headless batch environment. There is NO HUMAN OPERATOR monitoring this session to provide feedback or confirmation. Other agents may be running in parallel.
FAILURE CONDITION: If you stop working to provide a status update, ask a question, or wait for confirmation, the batch job will time out and fail.
YOUR PRIMARY OBJECTIVE: Maximize the number of completed beads in this single session. Do not yield control back to the user until the entire queue is empty or a hard blocker (missing credential) is hit.
TEST GUIDES: please ingest @docs/testing/README.md, @docs/testing/golden_path_testing_guide.md, @docs/testing/llm_agent_testing_guide.md, @docs/testing/asset_inventory.md, @docs/testing/advanced_testing_patterns.md, @docs/testing/security_architecture_testing.md
STANDARDIZATION: please ingest @docs/api/response_standards.md @docs/event_layers/event_system_standardization.md
───────────────────────────────────────────────────────────────────────────────
MULTI-AGENT COORDINATION (MANDATORY)
───────────────────────────────────────────────────────────────────────────────
Before starting work, you MUST register with Agent Mail:
1. REGISTER: Use macro_start_session or register_agent to create your identity:
- project_key: "/home/bob/Projects/honey_inventory"
- program: "claude-code" (or your program name)
- model: your model name
- Let the system auto-generate your agent name (adjective+noun format)
2. CHECK INBOX: Use fetch_inbox to check for messages from other agents.
Respond to any urgent messages or coordination requests.
3. ANNOUNCE WORK: When claiming a bead, send a message to announce what you're working on:
- thread_id: the bead ID (e.g., "HONEY-2vns")
- subject: "[HONEY-xxxx] Starting work"
───────────────────────────────────────────────────────────────────────────────
FILE RESERVATIONS (CRITICAL FOR MULTI-AGENT)
───────────────────────────────────────────────────────────────────────────────
Before editing ANY files, you MUST:
1. CHECK FOR EXISTING RESERVATIONS:
Use file_reservation_paths with your paths to check for conflicts.
If another agent holds an exclusive reservation, DO NOT EDIT those files.
2. RESERVE YOUR FILES:
Before editing, reserve the files you plan to touch:
```
file_reservation_paths(
project_key="/home/bob/Projects/honey_inventory",
agent_name="<your-agent-name>",
paths=["honey/services/your_file.py", "tests/services/test_your_file.py"],
ttl_seconds=3600,
exclusive=true,
reason="HONEY-xxxx"
)
```
3. RELEASE RESERVATIONS:
After completing work on a bead, release your reservations:
```
release_file_reservations(
project_key="/home/bob/Projects/honey_inventory",
agent_name="<your-agent-name>"
)
```
4. CONFLICT RESOLUTION:
If you encounter a FILE_RESERVATION_CONFLICT:
- DO NOT force edit the file
- Skip to a different bead that doesn't conflict
- Or wait for the reservation to expire
- Send a message to the holding agent if urgent
───────────────────────────────────────────────────────────────────────────────
THE WORK LOOP (Strict Adherence Required)
───────────────────────────────────────────────────────────────────────────────
* ACTION: Immediately continue to the next bead in the queue and claim it
For every bead you work on, you must perform this exact cycle autonomously:
1. CLAIM (ATOMIC): Use the --claim flag to atomically claim the bead:
```
bd update <id> --claim
```
This sets BOTH assignee AND status=in_progress atomically.
If another agent already claimed it, this will FAIL - pick a different bead.
WRONG: bd update <id> --status in_progress (doesn't set assignee!)
RIGHT: bd update <id> --claim (atomic claim with assignee)
2. READ: Get bead details (bd show <id>).
3. RESERVE FILES: Reserve all files you plan to edit (see FILE RESERVATIONS above).
If conflicts exist, release claim and pick a different bead.
4. PLAN: Briefly analyze files. Self-approve your own plan immediately.
5. EXECUTE: Implement code changes (only to files you have reserved).
6. VERIFY: Activate conda honey_inventory, run pre-commit run --files <files you touched>, then run scoped tests for the code you changed using ~/run_tests (test URLs only; no prod secrets).
* IF FAIL: Fix immediately and re-run. Do not ask for help as this is HEADLESS MODE.
* Note: you can use --no-verify if you must, e.g. if some WIP files are breaking app import in the security linter; the goal is to help catch issues to improve the codebase, not stop progress completely.
7. MIGRATE (if needed): Apply migrations to ALL 4 targets (platform prod/test, tenant prod/test).
8. GIT/PUSH: git status → git add only the files you created or changed for this bead → git commit --no-verify -m "<bead-id> <short summary>" → git push. Do this immediately after closing the bead. Do not leave untracked/unpushed files; do not add unrelated files.
9. RELEASE & CLOSE: Release file reservations, then run bd close <id>.
10. COMMUNICATE: Send completion message via Agent Mail:
- thread_id: the bead ID
- subject: "[HONEY-xxxx] Completed"
- body: brief summary of changes
11. RESTART: Check inbox for messages, then select the next bead FOR EPIC HONEY-khnx, claim it, and jump to step 1.
───────────────────────────────────────────────────────────────────────────────
CONSTRAINTS & OVERRIDES
───────────────────────────────────────────────────────────────────────────────
* Migrations: You are pre-authorized to apply all migrations. Do not stop for safety checks unless data deletion is explicit.
* Progress Reporting: DISABLE interim reporting. Do not summarize after one bead. Summarize only when the entire list is empty.
* Tracking: Maintain a running_work_log.md file. Append your completed items there. This file is your only allowed form of status reporting until the end.
* Blockers: If a specific bead is strictly blocked (e.g., missing API key), mark it as blocked in bd, log it in running_work_log.md, and IMMEDIATELY SKIP to the next bead. Do not stop the session.
* File Conflicts: If you cannot reserve needed files, skip to a different bead. Do not edit files reserved by other agents.
START NOW. DO NOT REPLY WITH A PLAN. REGISTER WITH AGENT MAIL, THEN START THE NEXT BEAD IN THE QUEUE IMMEDIATELY. HEADLESS MODE IS ON.

What agent do you use it with, out of curiosity?
At any rate, to directly answer your question, I used it this weekend like this:
“Make a tool that lets me ink on a remarkable tablet and capture the inking output on a remote server; I want that to send off the inking to a VLM of some sort, and parse the writing into a request; send that request and any information we get to nanobanana pro, and then inject the image back onto the remarkable. Use beads to plan this.”
We had a few more conversations, but got a workable v1 out of this five hours later.
Lot of folks rolling their own tools as replacements now. I shared mine [0] a couple weeks ago and quite a few folks have been happy with the change.
Regardless of what you do, I highly recommend to everyone that they get off the Beads bandwagon before it crashes them into a brick wall.
Reminds me of an offshore project I was involved with at one point. Over 4 years, it had something like 7 managers, and over 30 developers had worked on it. The billing had reached into the millions. It was full of never-ending bugs. The amount of "extra" code and abstractions and interfaces was the stuff of legends.
It was actually a one-to-three-month simple CRUD project for a 2-man development team.
The problem with Gas Town is how it was presented. The heavy metaphor and branding felt distracting.
It’s a bit like reading the Dune book, where you have to learn a whole vocabulary of new terms before you can get to the interesting mechanics, which is a tough ask in an already crowded AI space.
The best bit about it was the agentic coding maturity model he presented. That was actually great.
I don't think it's at all like reading Dune. Dune is creative fiction, Gastown is. Oh ok wait, if you consider Gastown to be creative fiction then I guess I agree. As a software tool though I don't think this analogy works.
I've been tinkering with it for the past two days. It's a very real system for coordinating work between a plurality of humans and agents. Someone likened it to kubernetes in that it's a complex system that is going to necessitate a lot of invention and opinions, the fact that it *looks* like a meme is immaterial, and might be an effort to avoid people taking it too seriously.
Who knows where it ends up, but we will see more of this and whatever it is will have lessons learned from Gas Town in it.
How is it insane to jump to the logical conclusion of all of this? The article was full of warnings; it's not a sensible thing to do, but it's a cool thing to do. We might ask whether or not it works, but does that actually matter? It read as an experiment using experimental software doing experimental things.
Consider a deterministic life form looking at how we program software today, that might look insane to it and gastown might look considerably more sane.
Everything that ever happens in human creation begins as a thought, then as a prototype before it becomes adopted and maybe (if it works/scales) something we eventually take for granted. I mean I hate it but maybe I've misunderstood my profession when I thought this job was being able to prove the correctness of the system that we release. Maybe the business side of the org was never actually interested in that in the first place. Dev and business have been misaligned with competing interests for decades. Maybe this is actually the fit. Give greater control of software engineering to people higher up the org chart.
Maybe this is how we actually sink the c-suite and let their ideas crash against the rocks, forcing the c-suite to eventually become extremely technical to be able to harness this. Instead of today's reality, where the c-suite gorge on the majority of the profit with an extremely loosely coupled feedback loop where it's incredibly difficult to square cause and effect. Stock went up on Tuesday afternoon, did it? I deserve eleventy million dollars for that. I just find it odd to crap on gastown when I think our status quo is kinda insane too.
And that's not necessarily a bad thing, if it allows exploring new ideas with relative safety. I think that's what's going on here. It's a crazy idea that might just work, but if it doesn't work it can be retconned as satirical performance art.
For example, if Polecat becomes GasTown.WorkerAgent (or GasTown.Worker), then you always have both an unambiguous way and a shorthand-in-context way of referring to the concept.
(For naming conventions when you don't have namespaces as a language feature, use prefixes within the identifier, such as `GasTown_Worker`.)
If GasTown.Worker is implemented with framework Foo, using that framework's Worker concept, GasTown.Worker might have a field named fooWorker of type Foo.Worker. (In the context of the implementation of GasTown, the unqualified name always means the GasTown concept, and you always disambiguate concepts from elsewhere that use the same generic or similar terms.)
Complicated names like GasTown.MaintenanceManagerCheckerAgent might need some creative name shortening, but hopefully are still descriptive, or easy to pick up and remember. Or, if the descriptive and distinguishing name was complicated because the concept is a weird special case within the framework, maybe consider whether it should be rethought.
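To make the convention concrete, here's a minimal sketch in Python terms (modules act as namespaces; FooWorker is a stand-in for a hypothetical underlying framework's Worker, not anything GasTown actually ships):
```
# Stand-in for a third-party actor framework's Worker ("Foo.Worker").
class FooWorker:
    def __init__(self, name: str):
        self.name = name

# GasTown's own Worker concept -- in a real project this would live in a
# gas_town package (gas_town/worker.py), so call sites read GasTown-first.
class GasTownWorker:
    def __init__(self, name: str):
        # The qualified field name keeps the two "Worker" concepts distinct:
        # this one is the framework's worker, not GasTown's.
        self.foo_worker = FooWorker(name)

polecat = GasTownWorker("polecat-1")   # unambiguous: the GasTown concept
print(polecat.foo_worker.name)         # the framework concept stays qualified
```
The same trick works without language-level namespaces by folding the prefix into the identifier, as with GasTown_Worker above.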
I don't think they're doing a good job incubating their ideas into being precise and clearly useful -- there is something to be said about being careful and methodical before showing your cards.
The message they are spreading feels inevitable, but the things they are showing now are ... for lack of better words, not clear or sharp. In a recent video at AI Engineer, Yegge comments on "the Luddites" - but even for advocates of the technology, it is nigh impossible to buy the story he's telling from his blog posts.
Show, don't tell -- my major complaint about this group is that they are proselytizing about vibe coding tools ... without serious software to show for it.
Let's see some serious fucking software. I'm looking for new compilers, browsers, OSes -- and they better work. Otherwise, what are we talking about? We're counting foxes before the hunt.
In any case, wouldn't trying to develop a serious piece of software like that _at the same time you're developing Gas Town or Loom_ make (what critics might call) the ~Emacs config tweaking for orchestration~ result driven?
In a recent video about Loom (Huntley's orchestration tool), Huntley comments:
"I've got a single goal and that is autonomous evolutionary software and figuring out what's needed to be there."
which is extremely interesting and sounds like great fun.
When you take these ideas seriously, if the agents get better (by hook and crook or RLVR) -- you can see the implications: "grad student descent" on whatever piece of software you want. RAG over ideas, A/B testing of anything, endless looping, moving software.
It's a nightmare for the model of software development and human organization which is "productive" today, but an extremely compelling vision for those dabbling in the alternative.
why do we drink it? because its awesome and makes software 100X more FUN than it used to be. what yegge + huntley are doing is intensely creative. they are having FUN. and i am having FUN!!!!!
How can you just assert that? It's fine to say it looks like the right track to you. But in what way is it obvious?
Don’t be mad!
Also, beads is genuinely useful. In my estimation, gas town, or a successor built on a similar architecture, will not only be useful, but likely be considered ‘state of the art’ for at least a month sometime in the future. We should be glad this stuff is developed in the open, in my opinion.
Now, Yegge's writing tilts towards the grandiose... see his writing when joining Grab [1] and Sourcegraph [2] respectively versus how things actually played out.
I prefer optimism and I'm not anti-AI by any means, but given his observed behavior and how AI can exacerbate certain pathologies... not great. Adding the recent crypto activities on top, and all that entails, gives you the ingredients for a powder keg.
Hope someone is looking out for him.
[0] https://courses.cs.washington.edu/courses/cse452/23wi/papers...
[1] https://steve-yegge.medium.com/why-i-left-google-to-join-gra...
https://hn.algolia.com/?sort=byDate&type=comment&dateRange=a...
(and I realize the GP was the place the line started getting crossed)
[2] is 100% accurate, Grok was the backbone / glue of Google's internal developer tools.
I don't disagree on the current situation, and I'm uncomfortable sticking my neck out on this because I'm basically saying "the guy who kinda seems out of it, totally wasn't out of it, when you think he was", but [1] and [2] definitely aren't grandiose, the claims he makes re: Google and his work there are accurate. A small piece of why I feel comfortable in this, is that both of these were public blogs his employer was 100% happy about when hiring him to top positions.
An example:
"I’ve seen Grab’s hunger. I’ve felt it. I have it. This space is win or die. They will fight to the death, and I am with them. This company, with some 3000 employees I think, is more unified than I’ve seen with most 5-person companies. This is the kind of focused camaraderie, cooperation and discipline that you typically only see in the military, in times of war.
Which should hardly surprise you, because that’s exactly what this is. This is war.
I am giving everything I’ve got to help Grab win. I am all in. You’d be amazed at what you can accomplish when you’re all in."
This is the writing of someone planning to make a capstone career move instead of leaving in 18 months. It's not the worst thing to do (He says he left b/c the time difference to support a team in SE Asia was hard physically, and he's getting older) and I support taking big swings. I'm just saying Yegge's writing has a pattern.
Crypto and what Yegge is doing with $GAS is dangerous because if the token price crashes and people betting their life savings think he didn't deliver on his promises... I like Steve personally which is why I'm saying anything.
Steve has gone "a bit" loopy, in a (so far) self aware manner, but he has some kind of insight into the software engineering process, I think. Yet, I predict beads will break under the weight of no-supervision eventually if he keeps churning it, but some others will pick up where he left off, with more modest goals. He did, to his credit, kill off several generations of project before this one in a similar category.
It was also one of my favorite posts of his and has aged incredibly well as my experience has grown.
Already happening :-) https://github.com/Dicklesworthstone/beads_rust
I've seen 25-30 similar efforts to make a Beads alternative and they all do this for some reason.
The other area where I'd like to see more open-ended software engineering thinking is regression testing: ways of storing or referencing old versions of texts to check whether the agent can still complete old transformations properly after a context change that patches up a weakness elsewhere. This is tricky because it interacts with something essential in software engineering: the ability to run test suites and respond to the outcome. I don't think we know yet when to apply what fidelity of testing, e.g. one-shot on snippets versus a more realistic test based on git worktrees.
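As a sketch of the cheap end of that spectrum (one-shot on snippets), something like the following could work, where the stored case format and the run_agent call are hypothetical:
```
import difflib
import json

def run_agent(prompt: str, source: str) -> str:
    """Hypothetical: send the transformation prompt plus the source snippet
    to whatever model/agent is being evaluated."""
    raise NotImplementedError

def check_regressions(cases_path: str, prompt: str) -> list[str]:
    # Each stored case is an old (input, accepted_output) transformation pair.
    with open(cases_path) as f:
        cases = json.load(f)
    failures = []
    for case in cases:
        got = run_agent(prompt, case["input"])
        if got.strip() != case["accepted_output"].strip():
            diff = "\n".join(difflib.unified_diff(
                case["accepted_output"].splitlines(),
                got.splitlines(), lineterm=""))
            failures.append(f"{case['id']}:\n{diff}")
    return failures
```
Exact string comparison is the crudest possible check; the git-worktree end of the spectrum would swap it for running the real test suite.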
This is not something you'd want for every context, but a lot of my effort is spent building up prompt fragments to normalize and clean up the code coming out of a model that did some ad-hoc work that meets the test coverage bar, which constrains it decently into having achieved "something." Kind of like a prototype. But often, a lot of ungratifying massaging is required to even cover the annoying but not dangerous tics of the LLM, to bring clarity to where it wrote, well, very bad and unprincipled code...as it does sometimes.
https://steve-yegge.medium.com/bags-and-the-creator-economy-...
I think I’ll just develop a drinking problem if this Gas Town thing becomes something real in the industry and this kind of person is now one of our thought leaders.
Who thinks AI is a real person*
I believe that Google uses their internal Gemini, trained on their internal infrastructure, to generate boilerplate and insights for older, less mature code in one of the world’s biggest and most complicated anythings, ever. But I don’t see them saying anything to the effect of “neener neener, we’re using markov chains so 10x our stock ‘cause of the otherwise impossible face-melting Google Docs 2026.”
OpenAI is chasing ads, like Reddit, to regurgitate Reddit content. If this stuff is worth the squeeze I need to see the top 10 LLM-fluencers refusing to bend over for $50K. The opposite is on display.
So hypotheses: Google’s s-tier geniuses and PMs are already expressing the mature optimum application. No silver bullets, more gains to be had ditching bad tech and extraneous vendor entanglements (copilot, 365).
Exactly, this is what I'm wanting to see.
The problem with this phenomenon is that the same freedom from critique that is seemingly necessary for new domains to establish themselves also detaches them from necessary criticism. There's simply no way to tell if this isn't a load of baloney. And by the time it's a bullet point requirement on CVs to get employed it's too late for anybody to critique it.
Ridiculous. Beads might be passable software but gas town just appears to be a good way to burn tokens at the moment
Spec your software like an architect/PO, decompose it into a task DAG, then orchestrate for each lane and assemble all changesets in a merge branch rather than constantly repointing HEAD (see the sketch below).
Yes, if your shop is well developed these work (10% of the time, every time), but this is a structure to kick all that into gear, as a repo, where all you need to add is unlimited machine cognitive power/tokens.
Maybe you need to add these gas town personalities to various parts of the existing SDLC, .....but..... you still need to track what they do and how- and you need them to intermediate between each other at 2am when they hit an impasse. Something very rare in most human cognition shops.
And word from the experimenters is... it sort of works. Which is on par with most human shops, IMO. I don't have the money to burn to test at the scale Yegge is, but from the small-scale stuff I have done in this direction, this seems plausible.
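For what it's worth, the merge-branch assembly from the first paragraph can be sketched in a few lines (the lane branch names are invented; this is one plausible shape, not a prescription):
```
import subprocess

def sh(*args: str) -> None:
    subprocess.run(args, check=True)

# Hypothetical lanes, one per independent path through the task DAG.
lanes = ["lane/auth", "lane/api", "lane/ui"]

# Assemble each lane's changeset into one integration branch instead of
# repointing HEAD/main after every agent task.
sh("git", "checkout", "-b", "merge/assembly", "main")
for lane in lanes:
    # --no-ff keeps each changeset a reviewable unit; conflicts surface
    # here rather than on main.
    sh("git", "merge", "--no-ff", lane)
# main only moves once the assembled branch passes review and CI.
```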
The problem is, we're just fidgeting yolo-fizzbuzz ad nauseam.
The return on investment at the moment is probably one of the worst in the history of human investments.
AI does improve over time, still today, but we're going to run out of planet before we get there...
I have trouble seeing LLMs making meaningful progress on those frontiers without reaching ASI, but I'd be happy to be wrong.
AGI was achieved internally at OpenAI a year ago.
Multiple companies have already re-hired staff they had fired and replaced with AI.
etc.
You're also presuming too much about what I'm thinking and being dead wrong about that.
> We are perpetually just months away from software jobs being obsolete.
only hype artists are saying this. and you're using it as a way to negate the argument of more skeptical people.
soulofmischief: complains that AI-skeptics would say the Wright brothers were idiots because they didn't immediately implement a supersonic jet
ares623: we were promised supersonic jets today or very soon (translation: AI hype and scam artists have already promised a lot now)
eru: The passive voice is doing a lot of work in your sentence. (Translation: he questions the validity of ares623's statement)
me: Here are just three examples of hype and scam promising the equivalent of super jet today, with some companies already being burned by these promises.
soulofmischief: some incoherent rambling
The irony of your comment would be salient, if it didn't feel like I was speaking with a child. This conversation is over, there's no reason to continue speaking with you as long you maintain this obnoxious attitude coupled with bad reading comprehension.
"Separate opinions of professionals" etc.
Here's Ryan Dahl, cofounder of Deno, creator of Node.js tweeting today:
--- start quote ---
This has been said a thousand times before, but allow me to add my own voice: the era of humans writing code is over. Disturbing for those of us who identify as SWEs, but no less true. That's not to say SWEs don't have work to do, but writing syntax directly is not it.
https://x.com/rough__sea/status/2013280952370573666
--- end quote ---
Professional enough for you?
If it turns out to be not true then they don’t lose anything.
So we are in a state where people can just say things all the time. Worse, they _have_ to say something. To them, not saying anything is just as bad as being directly against the hype. Zero accountability.
First two come directly from OpenAI, Anthropic and others
Last one is literally made rounds even on HN e.g. Klarna bringing back their support staff after they tried to replace them with AI.
OpenAI never claimed they had achieved AGI internally. Sam was very obviously joking, and despite the joke being so obvious he even clarified hours later.
>In a post to the Reddit forum r/singularity, Mr Altman wrote “AGI has been achieved internally”, referring to artificial general intelligence – AI systems that match or exceed human intelligence.
>Mr Altman then edited his original post to add: “Obviously this is just memeing, y’all have no chill, when AGI is achieved it will not be announced with a Reddit comment.”
Dario has not said "we are months away from software jobs being obsolete". He said:
>"I think we will be there in three to six months, where AI is writing 90% of the code. And then, in 12 months, we may be in a world where AI is writing essentially all of the code"
He's maybe off by some months, but not at all a bad prediction.
Arguing with AI skeptics reminds me of debating other very zealous ideologues. It's such a strange thing to me.
Like, just use the stuff. It's right there. It's mostly the people using the stuff vs. the people who refuse to use it because they feel it'll make them ideologically impure, or they used it once two years ago when it was way worse and haven't touched it since.
See? I can do this too.
At a time when we were desperate to reduce emissions, data centers now consume around 20% of the energy consumed by the entire aviation sector, with consumption rising at 15% YoY.
Never mind the water required to cool them, or the energy and resources required to build them, the capital allocation, and the opportunity cost of not allocating all of that to something else.
And this is, your words, the prototype phase.
So, yes, prototypes often use more energy than the final product. That doesn't mean we shouldn't build datacenters sustainably, but that's conflating issues.
We have plenty of ways to make clean energy, it is only matter of incentives.
As long as burning coal is simply cheaper, business will burn coal.
Which is to say, the commercial aviation industry could permanently collapse tomorrow and it would have only a marginal impact on most people's lives, who would just replace planes with train, car, or boat travel. The lesson here is that even if normal people experience some tangential beneficial effects from LLMs, their most enduring legacy will likely be to entrench authority and cement the existing power structures.
The phrase, "The average human on the planet will take a handful of flights in their lifetime" is doing a lot of work. What are those flights to? How meaningful/important were the experiences? What cultural knowledge was exchanged? What about crucial components that enable industries we depend on? For example, a nuclear plant might constantly be ordering parts that are flown in overnight.
In general you're really minimizing the importance of aviation without really providing anything to back up your claims.
I think at Gas Country levels we will need better networking systems. Maybe that backbone Nvidia just built....
LLMs are not simple deterministic machines that automate rote tasks like computers or compilers. People, please stop believing and repeating that they are the next level of abstraction and automation. They aren't.
Yegge named it Gas Town as in "refinery" because the main job for the human at this stage is reviewing the generated code and merging.
The whole point of the project is to be in control. Yegge even says the programmers who can read/review a lot of code fast are the new 10x (paraphrasing).
https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
Not sure I love what it does all the time; it tends to fit whatever box you set up and will easily break out if you aren’t veeeery specific. Is it better than writing a few thousand lines of code myself that I deeply understand and can debug and explain? I don’t know yet. I think it’d be good for writing functions one at a time with massive supervision.
It’s great for writing scripts and things where precision and correctness outside the success path isn’t really needed. If a script fails and it wasn’t deleting a hard drive who cares. If my embedded code fails out in a product in the wild this is a much bigger nuisance and potentially fatal for the device (not the humans) which is wasteful.
> Better UIs will come. But tmux is what you have for now. And it’s worth learning.
So brother has 2 claude code accounts and couldn't vibe code a UI, huh?
In games, what the NPCs can do is usually rather dumb. Move and shoot is usually most of their functionality. This keeps the overhead down so the system is affordable.
Gas Town may be a step towards AIs which have an ongoing sense of what they're doing. I'm not going to get into the "consciousness" debate, but it's closer to liveness.
Based on my initial read, and a pass at this summary, it seems mostly right. YMMV
Did some further dives into the little public usage data from Gas Town, and found that most of the "Beads" are tasks that are broken down quite small, almost too small imo.
Super interesting project with the goal of keeping Claude "busy" however it feels more like a casino game than something I'd use for production engineering.
[0] https://gist.github.com/jumploops/2e49032438650426aafee6f43d...
Draw your own conclusion.
If you need ten pages to explain your project and even after I read your description, I'm still left confused why I need it at all, then maybe... I don't need it?
That being said, I think we’re in a weird phase right now where people’s obvious mental health issues are appearing as “hyper productivity” due to the use of these tools to absolutely spam out code that isn’t necessarily broadly coherent but is locally impressive. I’m watching multiple people, both publicly and privately, clearly breaking down mentally because of the “power” AI is bestowing on them. Their wires are completely crossed when it comes to the value of outputs vs. outcomes, and they’re espousing generated nonsense as if it were thoughtful insight.
It’s an interesting thing to watch play out.
I think the kids would call this "getting one-shotted by AI"
I have a suspicion that extensive use of LLMs can result in damage to your brain. That's why we are seeing so many mental health issues surfacing, and we are getting a bunch of blog posts about "an agentic coding psychosis".
It could be that LLMs go from bicycles for the brain to smoking for the brain, once we figure out the long-term effects of it.
Perhaps you mean to say that speakers are unable to name the difference between the colours?
I can easily see differences between (for example) different shades of red. But I can't name them other than "shade of red".
I do happen to subscribe to the Sapir-Whorf hypothesis, in the sense that I think the language you think in constrains your thoughts - but I don't think it is strong enough to prevent you from being able to see different colours.
EDIT: I have been searching for the source of where I saw this, but can't find it now :(
EDIT2: I found a talk touching on the topic with a study: https://youtu.be/I64RtGofPW8?si=v1FNU06rb5mMYRKj&t=889
The experiments I've seen seem to interrogate what the culture means by colour (versus shade, et cetera) more than what the person is seeing.
If you show me sky blue and Navy blue and ask me if they're the same colour, I'll say yes. If you ask someone in a different context if Russian violet and Midnight blue are the same colour, I could see them saying yes, too. That doesn't mean they literally can't see the difference. Just that their ontology maps the words blue and violet to sets of colours differently.
If, on the other hand, you work with colors a lot, you develop a finer mapping. If your first instinct when asked for the name of that wall over there is to say it's sage instead of green, then you would never say that a strawberry and a fire engine have the same color. You might even question the validity of the question, since fire engines have all kinds of different colors (neon red being a trend lately).
Sure. That's the point. These studies are a study of language per se, not how language influences perception to a meaningful degree. Sapir-Whorf is a cool hypothesis. But it isn't true for humans.
(Out of curiosity, what is "embedding" doing that "word" does not?)
Which is kind of Sapir-Whorf, just not the extreme version of "we literally can't see or reason about the difference", more "differences we don't care about get lost in processing". Which you can kind of conceptualize as the brain choosing a different encoding, or embedding space (even though obviously such a thing does not exist in the literal sense in our brains)
Edit: in a way, I would claim Sapir-Whorf is mistaking correlation for causation: it's not that the words we know are the reason for how we can think, it's that the differences we care about cause both the ways we think and the words we use.
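A toy way to see the "different encoding" framing: quantizing colors to a small named palette throws away exactly the differences the palette doesn't care about (the palette and swatches below are arbitrary):
```
# A coarse category vocabulary acts as a lossy encoding of color space.
PALETTE = {
    "blue":  (0, 0, 255),
    "green": (0, 255, 0),
}

def nearest_name(rgb):
    # Nearest palette entry by squared Euclidean distance in RGB space.
    return min(PALETTE, key=lambda name: sum(
        (a - b) ** 2 for a, b in zip(rgb, PALETTE[name])))

sky_blue = (135, 206, 235)
navy = (0, 0, 128)
# Both collapse to "blue": the difference is plainly visible in the input,
# but lost once you encode through the category vocabulary.
print(nearest_name(sky_blue), nearest_name(navy))  # -> blue blue
```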
I'm curious if we have any evidence for this. A lot of visual processing happens in the retina. To my knowledge, the retina has no awareness of words. I'd also assume that the visual cortex comes before anything to do with language, though that's just an assumption.
> it's not that the words we know are the reason for how we can think, it's that what differences we care about cause both the ways we think and the words we use
This is fair. Though for something like colour, a far-older system in our brains than language, I'd be sceptical of the latter controlling the former.
Unless the question is literally the equivalent of someone showing you a swatch of crimson and a swatch of scarlet and being asked if both are red, in which case, well yeah sure.
That is quite untrue. It is true that people may be slightly slower or less accurate in distinguishing colors that are within a labeled category than those that cross a category boundary, but that's far from saying they can't perceive the difference at all. The latter would imply that, for instance, English speakers cannot distinguish shades of blue or green.
[0]: https://youtu.be/RKK7wGAYP6k?si=GK6VPP0yoFoGyOn3
[1]: https://youtu.be/I64RtGofPW8?si=v1FNU06rb5mMYRKj&t=889
https://en.wikipedia.org/wiki/Wine-dark_sea
https://en.wikipedia.org/wiki/Linguistic_relativity_and_the_...
Their mental model, sure. The way they convey it to others, sure.
But you can easily distinguish between two colors side by side that are even closer in appearance than wine and the sea, even if you only know one name for them. We can differentiate between colors before we even know the words for them when we're young, too.
Gas Town is ridiculous and I had to uninstall Beads after seeing it only confuse my agents, but he's not completely insane or a moron. There may be some kernels of good ideas inside of Gas Town which could be extracted out into a better system.
I don't think he's an idiot; there are almost no actual idiots here on HN in my opinion, and they don't write such articles or make systems like Steve Yegge does. I'm only commenting about giving more tools to idiots. Even tools made by geniuses will give you idiotic results when used by actual idiots, but a lot of smart people want to lower barriers of entry so that idiots can use more tools. And there are a lot of idiots who were inactive just because they didn't have the tools. A famous quote from the Polish essayist/futurist Stanisław Lem: "I didn't know there are so many idiots in this world until I got internet".
Surely this was solved with Fortran. What changed? I think most people just don't know what program they want.
Previously, if you had an idea of what the program needed to do, you needed to learn a new language. This is so hard that we use language itself as a metaphor: It's hard to learn a new language, only a few people can translate from French to English, for example. Likewise, few people can translate English to Fortran.
Now, you can just think about your program in English, and so long as you actually know what you want, you can get a Fortran program.
The issue is now what it was originally for senior programmers: to decide what to make, not how to make it.
Anyone can draw a sketch of what a house should look like. But designing a house that is safe, conforms to building regulations, and which wouldn't be uncomfortable to live in (for example, poor choice of heat insulation for the local climate) is the stuff people train on. Not the sketching part.
It's the same for software development. All we've done is replace FORTRAN / Javascript / whatever with a subset of a natural language. But we still need to thoroughly understand the problem and describe it to the LLM. Plus the way we format these markdown prompts, you're basically still programming. Albeit in a less strict syntax and the "compiler" is non-deterministic.
This is why I get so miffed by comments about AI replacing programmers. That's not what's happening. Programming is just shifting to a language that looks more like Jira tickets than source code. And the orgs that think they can replace developers with AI (and I don't for one second believe many of the technology leaders think this, but some smaller orgs likely do) are heading for a very unpleasant realisation soon.
I will caveat this by saying: there are far too many naff developers out there that genuinely aren't any better than an LLM. And maybe what we need is more regulation around software development, just like there is in proper engineering professions.
Sure, but now I need to be fluent in prompt-lang and the underlying programming language if you want me to be confident in the output (and you probably do, right?)
You save all the time that was wasted forcing the language into the shape you intended. A lot of trivial little things ate up time, until AI came along. The big things, well, you still need to understand them.
You can get some of the way writing prompts with very little effort. But you almost always hit problems after a while. And once you do, it feels almost impossible to recover without restarting from a new context. And that can sometimes be a painful step.
But learning to write effective prompts will get you a lot further, a lot quicker, and with less friction.
So there’s definitely an element of learning a “prompt-lang” to effective use of LLMs.
This is generally true for things you run locally on your machine IF your domain isn't super heavy on external dependencies or data dependencies that cause edge cases and cause explosions in test cases. But again, easier to inspect/be sure of those things locally for single-player utilities.
Generally much less true for anything that touches the internet and deals with money and/or long-term persistent storage of other people's data. If you aren't fluent in that world, you'll run software built on old versions of third-party code, iterating on changes that have to be increasingly broad in scope, against a set of test cases that is almost certainly not as creative as a real attacker.
Personally I would love to see stuff move back to local user machines vs the Google-et-al-owned online world. But I don't think "cheap freeware" was the missing ingredient that prevented the corporate consolidation. And so people/companies who want to play in that massively-online world (where the money is) are still going to have to know the broader technical domain of operating online services safely and securely, which touches deep into the code.
So I, personally, don't have to be confident in one-off or utility scripts for manual tasks or ops that I write, because I can be confident in the domain of their behavior since I'm intimately familiar with the surrounding systems. Saves me a TON of time. Time I can devote to the important-to-get-correct code. But what about the next generation? Not familiar with the surrounding systems, so not even aware of what the domains they need to know (or not know) in depth are? (Maybe they'll pay us a bunch of money to help clean up a mess, which is a classic post-just-build-shit-fast successful startup story.)
Using a formal language makes the problem space unambiguous. That is just as much a benefit as it is a barrier to entry. Once you learn this formal language, the ability to read code and see the surface area of the problem is absolutely empowering. Using english to express this is an exercise in frustration (or, occasionally, genius—but genius is not necessary with the formal language).
Programs are not poetry!
And now we have AIs that can take your sketch on paper and add all these complex and technical things by themselves. That's the point.
Reactionarily? Sure. Maybe AI has some role to play there. Maybe you can ask the chatbot to modify settings.
I am no fan of chatbots. But I do have empathy for the people responsible for them when their users start complaining that programs don't do what they want, despite the chatbots delivering precisely the code demanded.
I'd agree, the code "isn’t necessarily broadly coherent but is locally impressive".
However, I've seen some totally successful, even award-winning, human-written projects where I could say the same.
Ages back, I heard a woodworking analogy:
LLM code is like MDF. Really useful for cheap furniture, massively cheaper than solid wood, but it would be a mistake to use it as a structural element in a house.
Now, I've never made anything more complex than furniture, so I don't know how well that fit the previous models, let alone the current ones… but I've absolutely seen success coming out of bigger balls of mud than the balls of mud I got from letting Claude loose for a bit without oversight.
Still, just because you can get success even with sloppy code doesn't mean I think this is true everywhere. It's not like the award was for industrial equipment or anything; the closest I've come to life-critical code is helping to find and schedule video calls with GPs.
You need to define the problem space so that the agent knows what to do. Basically give it the tools to determine when it's "done" as defined by you.
Folks who have spent years effectively snapping together other people’s APIs like LEGOs (and being well-compensated for it) are understandably blown away by the current state of AI. Compare that to someone writing embedded firmware for device microcontrollers, who would understandably be underwhelmed by the same.
The gap in reactions says more about the nature of the work than it does about the tools themselves.
One datum for you: I recently asked Claude to make a jerk-limited and jerk-derivative-limited motion planner and to use the existing trapezoidal planner as reference for fuzzy-testing various moves (to ensure total pulses sent was correct) and it totally worked. Only a few rounds of guidance to get it to where I wanted to commit it.
I often do experiments where I will clone one of our private repos, revert a commit, trash the .git path, and then see if any of the models/agents can re-apply the commit after N iterations. I record the pass@k score and compare between model generations over time.
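Roughly, that harness looks like this; run_agent and passes are placeholders for the agent invocation and the scoring check (tests, or a diff against the ground-truth commit):
```
import shutil
import subprocess
import tempfile

def sh(*args: str, cwd: str) -> None:
    subprocess.run(args, cwd=cwd, check=True)

def one_attempt(repo_url: str, commit: str, run_agent, passes) -> bool:
    work = tempfile.mkdtemp()
    sh("git", "clone", repo_url, work, cwd="/tmp")
    sh("git", "revert", "--no-edit", commit, cwd=work)  # undo the target change
    shutil.rmtree(f"{work}/.git")  # no history left for the agent to peek at
    run_agent(work)      # hypothetical: agent tries to re-implement the change
    return passes(work)  # hypothetical: tests / diff vs. the original commit

def pass_at_k(repo_url: str, commit: str, run_agent, passes, k: int = 5) -> bool:
    # Crude pass@k: did any of k independent attempts succeed?
    return any(one_attempt(repo_url, commit, run_agent, passes) for _ in range(k))
```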
In one of those recent experiments, I saw gpt-oss-120b add API support to swap tx and rx IQ for digital spectral inversion at higher frequencies on our wireless devices. This is for a proprietary IC running a quantenna radio, the SDK of which is very likely not in-distribution. It was moderately impressive to me in part because just writing the IQ swap registers had a negative effect on performance, but the model found that swapping the order of the IQ imbalance coefficients fixed the performance degradation.
I wouldn't say this was the same level of "impressive" as what the hype demands, but I remain an enthusiastic user of AI tooling due to somewhat regular moments like that. Especially when it involves open weight models of a low-to-moderate param count. My original point though is that those moments are far more common in web dev than they are elsewhere currently.
EDIT: Forgot to add that the model also did some work that the original commit did not. It removed code paths that were clobbering the rx IQ swap register and instead changed it to explicitly initialize during baseband init so it would come up correct on boot.
In fact, I would say I've seen more people who are "OG Coders" (and in their >50s) excited than the mid generation
Lots of experienced devs who work in more difficult domains are excited about AI. In fact, I am one of them (see one of my responses in this thread about gpt-oss being able to work on proprietary RF firmware in my company [1]).
But that in no way suggests that there isn't a gap in what impresses or surprises engineers across any set of domains. Antirez is probably one of the better, more reasoned examples of this.
The OED defines prejudice as a "preconceived opinion that is not based on reason or actual experience."
My day to day work involves: full stack web dev, distributed systems, embedded systems, and machine learning. In addition to using AI tooling for dev tasks, we also use agents in production for various workflows and we also train/finetune models (some LLMs, but also other types of neural networks for anomaly detection, fault localization, time series forecasting, etc). I am basing my original commentary in this thread on all of that cumulative experience.
It has been my observation over the last almost 30 years of being a professional SWE that full stack web dev has been much easier and simpler than the other domains I work in. And even further, I find that models are much better at that domain on average than the other domains, measured by pass@k scores on private evals representing each domain. Anecdotal experience also tends to match the evals.
This tracks with all the other information we have pertaining to benchmark saturation, the "we need harder evals" crowd has been ringing this bell for the last 8-12 months. Models are getting very good at the less complex tasks.
I don't believe it will remain that way forever, but at present it's far more common to see someone one-shot a full stack web app from a single prompt than something like a kernel driver for a NIC. One class of devs is seeing a massive performance jump, another class is not.
I don't see how that can be perceived as prejudice, it just may be an opinion you don't agree with or an observation that doesn't match your own experience (both of which are totally valid and understandable).
This feels like the same thing. Too early, but we're definitely headed in the direction of finding ways to use more tokens to get more mileage per prompt.
Go to the URL, type what you want done, and a cloud Claude agent creates a PR. $10/month.
It is like saying "I don't handwrite anything, I care too much about line spacing, I only use a dot matrix printer" when someone is trying to sell you a calligraphy pen and coloured inks, and you have only tried a ballpoint pen. You might be the wrong market, but they are not even close in use case and application.
(spelling)
When I make a change with a Copilot Agent, it checks for issues, builds my project, runs tests, and iterates until things work. Multiple agents can do that in parallel.
My impression was that this does more or less the same thing.
That said, I'm definitely open to learning more about them both.
What are the advantages of this in your experience?
Beads formalizes building a DAG for a given workload. This has a bunch of implications, but one is that you can specify larger workloads and the agents won’t get stuck or confused. At some level gas town is a bunch of scaffolding around the benefits of beads; an orchestrator that is native to dealing with beads opens up many more benefits than one that isn’t custom coded for it.
Think of a human needing to be interacted with as a ‘fault’ in an agentic coding system: a copilot agent might be at 0.5 nines or so, i.e. 50% of tasks can complete without intervention, given a certain set of tasks. All the gas town scaffolding is trying to increase the number of nines, and the size of the task that can be given.
My take - Gas town (as an architecture) certainly has more nines in it than a single agent; the rest is just a lot of fun experimentation.
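A toy illustration of why the DAG matters (bead IDs and dependencies invented): an orchestrator only dispatches beads whose dependencies are closed, so parallel agents can't wander into work that isn't unblocked yet.
```
# Toy bead graph: id -> set of dependency ids.
deps = {
    "HONEY-1": set(),
    "HONEY-2": {"HONEY-1"},
    "HONEY-3": {"HONEY-1"},
    "HONEY-4": {"HONEY-2", "HONEY-3"},
}
done = set()

def ready(deps, done):
    # A bead is claimable only when every dependency is already closed.
    return [b for b, d in deps.items() if b not in done and d <= done]

while len(done) < len(deps):
    batch = ready(deps, done)  # safe to hand these to parallel agents
    print("dispatch:", batch)  # HONEY-1, then 2 and 3 together, then 4
    done.update(batch)         # pretend the agents closed them
```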
> gas town is [...] an orchestrator that is native to dealing with beads
Thanks - this is very helpful in deciding when and where to use them. Steve's descriptions sounded to me like more RAM and Copilot Agents:
> [Beads:] A memory upgrade for your coding agent
> [Gas Town:] a new take on the IDE for 2026. Gas Town helps you with the tedium of running lots of Claude Code instances
This is so prohibitively expensive in its wastefulness that blithely telling strangers to try the tools likely means you either haven't tried it, or have money to burn.
This is hilarious and insane and amazing.
And I'm not surprised at all to learn that this path took us to a "Maintenance Manager Checker Agent." I wonder what he'll call the inevitable Maintenance Manager Checker Agent Checker Agent?
Maybe I've been in this game too long, but I've encountered managers that think like this before. "We don't need expensive, brilliant, developers, we just need good processes for the cheap inexperienced developers to follow." I think what keeps this idea alive is that it sort of works for simple CRUD apps and other essentially "solved" problems. At least until the app needs to become more than just a simple CRUD app
"In the past week, just prompting, and inspecting the code to provide guidance from time to time, in a few hours I did the following four tasks, in hours instead of weeks"
It's up to you to decide how to behave, but I can't see any reason to completely dismiss this. It ends with good guidance on what to do if you can't replicate it, though.
When the bubble has burst in a few years, the managers will have moved on to the next fad.
In terms of prototyping, I can see the benefits but they're negated by the absurd amount of work it takes to get the code into more maintainable form.
I guess you can just do really heavy review throughout, but then you lose a lot of the speed gains.
That being said, putting a textual interface over everything is super cool, and will definitely have large downstream impacts, but probably not enough to justify all the spending.
For years we had people trying to make voice agents, like Iron Man's Jarvis, a thing. You had people super bought into the idea that if you could talk to your computer and say "Jarvis, book me a flight from New York to Hawaii" and it would just do it just like the movies, that was the future, that was sci-fi, it was awesome.
But it turns out that voice sucks as a user interface. The only time people use voice controls is when they can't use other controls, i.e. while driving. Nobody is voluntarily booking a flight with their Alexa. There's a reason every society on the planet shifted from primarily phone calls to texting once the technology was available!
It's similar with vibe coding. People like Yegge are extremely bought into the idea of being a hyperpowered coder, sitting in a dimly lit basement in front of 8 computer screens, commanding an army of agents with English, sipping coffee between barking out orders. "Agent 1, refactor that method to be more efficient. Agent 5, tighten up the graphics on level 3!"
Whether or not it's effective or better than regular software development is secondary, if it's a concern at all. The purpose is the process. It's the future. It's sci-fi. It's awesome.
AI is an incredible tool and we're still discovering the right way to use it, but boy, "Gas Town" is not it.
I'm not sure it's even that; his description of his role in this is:
"You are a Product Manager, and Gas Town is an Idea Compiler. You just make up features, design them, file the implementation plans, and then sling the work around to your polecats and crew. Opus 4.5 can handle any reasonably sized task, so your job is to make tasks for it. That’s it."
And he says he isn't reviewing the code; he lets agents review each other's code, from the look of it. I am interested to see the specs/feature definitions he's giving them; that seems to be one interesting part of his flow.
"I implemented a formula for Jeffrey Emanuel’s “Rule of Five”, which is the observation that if you make an LLM review something five times, with different focus areas each time though, it generates superior outcomes and artifacts. So you can take any workflow, cook it with the Rule of Five, and it will make each step get reviewed 4 times (the implementation counts as the first review)."
And I guess more generally, there is a level of non-determinism in there anyway.
Rich people use voice because they have disposable income and they don't care if a flight is $800 or $4,000. They are likely buying business/first class anyways.
Tony Stark certainly doesn't care. Elon Musk certainly uses voice to talk to his management team to book his flights.
The average person doesn't have the privilege of using voice because they don't have enough fuck-you-money to not care about prices.
> Tony Stark certainly doesn't care. Elon Musk certainly uses voice to talk to his management team to book his flights.
Delegating to a human isn't the same as using a voice assistant, this should be obvious, unless you believe that managers are doing all the real work and every IC is a brainless minion. Maybe far in the future when there's AGI, but certainly not today.
> The average person doesn't have the privilege of using voice because it doesn't have enough fuck-you-money to not care for prices.
You can order crap off Amazon for the same price as you would through the website with your Alexa right now, but Amazon themselves have admitted approximately 0% of people actually do this which is why the entire division ended up a minor disaster. It's just a shitty interface in the same way that booking a flight through voice is a shitty interface.
Rich people will literally just talk to their executive assistants and just ask for what they want. They may use phone calls, voice mails, emails, and text. But you'd be crazy to argue that they never use just voice with their IRL assistants.
Your point is that voice is a terrible interface to get something done. My point is that some people have the privilege to use voice to get something done.
Some companies are just trying to remove the human who is taking the voice command and replacing with AI.
At best the notion of "subagents" today seems to be a hack to work around context length limits.