I really like how their subagents work; as a bonus, I get to choose which model runs in which agent. Sadly, I have to resort to the mess that Anthropic calls Claude Code.
I'd rather switch to OpenAI than give up my favorite harness.
For me it's $0.8/kWh during peak, $0.47 off peak, and super off peak of $0.15. I accidentally left a little mini 500W heater on all day, while I was out, costing > 5% of your whole month!
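For scale, a rough back-of-the-envelope check of that heater. This assumes "all day" means a full 24 hours (my assumption, the comment doesn't say), and since the split of hours across the three tiers isn't given, it only produces a floor and a ceiling:

```python
# Rough bounds on what a 500 W heater left on for a day costs.
# Assumption: "all day" = 24 hours. The tier schedule is unknown, so the
# whole day is priced at each single rate to get a lower and upper bound.
HEATER_KW = 0.5
HOURS = 24  # assumed, not stated in the comment

RATES_USD_PER_KWH = {"peak": 0.80, "off_peak": 0.47, "super_off_peak": 0.15}

energy_kwh = HEATER_KW * HOURS


def cost_at(rate_usd_per_kwh: float) -> float:
    """Cost in USD if every hour were billed at one rate."""
    return round(energy_kwh * rate_usd_per_kwh, 2)


costs = {tier: cost_at(rate) for tier, rate in RATES_USD_PER_KWH.items()}
for tier, usd in costs.items():
    print(f"{tier}: ${usd:.2f}")
```

Under those assumptions the day lands somewhere between roughly $1.80 and $9.60, depending on how the hours fall across the tiers.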
API = way more expensive, but you're allowed to use it on your own terms without Anthropic hindering you.
The belief is that the subscriptions are subsidized by them (or just cut heavily into profit margins), so for whatever reason they're trying to maintain control over the harness: maybe to gather more usage analytics, gain an edge over competitors and improve how well their models work with it, or perhaps to route certain requests to Haiku or Sonnet instead of using Opus for everything, to cut down on compute.
Given the ample usage limits, I personally just use Claude Code now with their 100 USD per month subscription because it gives me the best value - kind of sucks that they won't support other harnesses though (especially custom GUIs for managing parallel tasks/projects). OpenCode never worked well for me on Windows though, also used Codex and Gemini CLI.
You can point Claude Code at a local inference server (e.g. llama.cpp, vLLM) and see which model names it sends each request to. It's not hard to do a MITM against it either. Claude Code does send some requests to Haiku, but not the ones you're making with whatever model you have it set to - these are tool result processing requests, conversation summary / title generation requests, etc - low complexity background stuff.
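If you just want to see the model names without standing up a real inference server, a logging stub is enough for a first look. This is a minimal sketch, not a proxy: it assumes the client sends Anthropic-style JSON bodies with a top-level "model" field, it refuses every request instead of answering, and the names `model_from_body`, `LogModelHandler` and `serve` (and the port) are made up for the example:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def model_from_body(raw: bytes) -> str:
    """Return the "model" field of a JSON request body, or "<unknown>"."""
    try:
        body = json.loads(raw)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return "<unknown>"
    if isinstance(body, dict) and isinstance(body.get("model"), str):
        return body["model"]
    return "<unknown>"


class LogModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        print(f"{self.path} -> model={model_from_body(raw)}")
        # Stub only: record the request and refuse it rather than answer.
        payload = b'{"error": "logging stub, no model behind this endpoint"}'
        self.send_response(501)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Block and log every POST until interrupted (port is arbitrary)."""
    HTTPServer(("127.0.0.1", port), LogModelHandler).serve_forever()
```

Call `serve()` and point the client's base URL at `http://127.0.0.1:8080`. Since the stub never answers, you only see the first requests; to watch the background traffic over a whole session you'd read the same field out of a real server's request log or put a forwarding proxy in between.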
Now, Anthropic could simply take requests to their Opus model and internally route them to Sonnet on the server side, but then it wouldn't really matter which harness was used or what the client requests anyway, as this would be happening server-side.
Actually curious to hear what others think about why Anthropic is so set on disallowing 3rd party tools on subscriptions.
So they have to move up the stack to higher margin business solutions. Which is why they offer subsidized subscription plans in the first place. It’s a marketing cost. But they want those marketing dollars to drive customers up the stack, not toward commodity inference use cases.
One-price-per-month subscriptions (Claude Pro/Max @ $20/$100/$200 a month) use a different authentication mechanism, OAuth. The useful difference is that you get a lot more inference than the same money buys through the API, but they require you to use Claude Code as the client.
Some clients have made it simple to use your subscription key with them and they are getting cease and desist letters.
When setting your token limits, their economics calculations likely assume that those optimizations are going to work. If you're using a different agent, you're basically underpaying for your tokens.
Build the single pane of glass everyone uses. Offer it under cost. Salt the earth and kill everything else that moves.
Nobody can afford to run alternative interfaces, so they die. This game is as old as time. Remember Reddit apps? Alternative Twitter clients?
In a few years, CC will be the only survivor and viable option.
It also kneecaps attempts to distill Opus.
It’s not like moving from android to iOS.
(Ok, technically o1-pro is even more expensive, but I'm assuming that's a "please move on" pricing)
From what I've heard, the metrics Anthropic uses to detect unauthorized clients are pretty easy to sidestep if you look at the existing solutions out there. Better than getting your account banned.
EDIT: The system I bought last summer for $1980 and just took delivery of in October, Beelink GTR 9 Pro, is now $2999.... wow...
Never tried it for much coding though.
https://www.reddit.com/r/LocalLLaMA/comments/1rv690j/opencod...
?
> there isnt any telemetry, the open telemetry thing is if you want to get spans like the ai sdk has spans to track tokens and stuff but we dont send them anywhere and they arent enabled either
> most likely these requests are for models.dev (our models api which allows us to update the models list without needing new releases)
> opencode will proxy all requests internally to https://app.opencode.ai
> There is currently no option to change this behavior, no startup flag, nothing. You do not have the option to serve the web app locally, using `opencode web` just automatically opens the browser with the proxied web app, not a true locally served UI.
> https://github.com/anomalyco/opencode/blob/4d7cbdcbef92bb696...
I'm actually moving to containerised isolation. I realised the agents waste too much time trying to correctly install dependencies, not unlike a normal NixOS user.
I use it with Qwen 3.5 running locally when my daily limits run out on my other subscriptions.
The harness is great. Local models are just slow enough that the subscription models are easier to use. For most of my tasks these days, the model's capability is sufficient; it is just not as snappy.
I just did a one hour vibe session today, ripping out a library dependency and replacing it with another and pushing the library to pypi. I should take my task list and let the local model replicate the work and see how it works out.
I briefly dabbled with Aider some months back but never got any real work done with it. Without installing each one of these new tools I'm having trouble grokking what is changing about them that moves the LLM-assisted software dev experience forward.
There's also a request and a PR to add such an option, but it was closed due to "not adhering to community standards".
I'm not a US citizen, so both companies are the same, as far as I'm concerned.
Still, I feel like "will commit illegal mass murder against their own citizens" is a significant enough degree more evil. I think lots of corporations will help their government murder citizens of other countries, but very few would go so far as to agree to murder their own (fellow) citizens ... just to get a juicy contract.
But you're still choosing evil when you could try local models
We also have taboos against betraying/murdering/whatever people of other tribes, but those taboos are much weaker and get relaxed sometimes (eg. in war). My point is, it takes significantly more anti-social (ie. evil) behavior to betray your own tribe, in the deepest way possible, than it does to do horrible things to other tribes.
This is just as true for Russians murdering Ukrainians as for Ukrainians murdering Russians, or any other conflict group: almost all Russians would consider a Russian who helps kill Russians to be more evil than a Russian who kills Ukrainians (and vice versa).
https://www.washingtonpost.com/technology/2026/03/04/anthrop...
Coding is mostly "agentic" so I'm a bit puzzled.
In this case, if you have a server with an endpoint, you can run opencode when the endpoint is called and pass it the prompt. Opencode then thinks, plans and acts according to your request, possibly using tools, skills, calling endpoints, etc.
Do you have resources you can point to / mind sharing your setup? What were the biggest problems / delights doing this?
Many folks from other tools are only getting exposed to the same functionality they got used to, but it offers much more than other harnesses, especially for remote coding.
You can start a service via `opencode serve`; it can be accessed from anywhere and has a great experience on mobile apart from a few bugs. It's a really good way to work with your agents remotely, and it goes really well with Tailscale.
The WebUI that they have can connect to multiple OpenCode backends at once, so you may use multiple VPS-es for various projects you have and control all of them from a single place.
Lastly, there's a desktop app, but TBH I find it redundant when WebUI has everything needed.
Make no mistake though, it's not a perfect tool. My gripes with it:
- There are random bugs with loading/restoring state of the session
- Model/Provider selection switch across sessions/projects is often annoying
- I had a bug making Sonnet/Opus unusable from mobile phone because phone's clock was 150ms ahead of laptop's (ID generation)
- Sometimes the agent randomly gets stuck. It especially sucks for long/nested sessions
- WebUI on laptop just completely forgot all the projects one day
- `opencode serve` doesn't pick up new skills automatically, it needs to be restarted
At least you can easily turn off telemetry in Claude Code - just set CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC to 1.
You can use Claude Code with llama.cpp and vLLM too, right out of the box with no additional software necessary: just point ANTHROPIC_BASE_URL at your inference server of choice, with any value in ANTHROPIC_API_KEY.
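Put together, the environment looks roughly like this. A small sketch only: the three variable names are the ones mentioned in this comment, while the URL, port, placeholder key and the helper name `claude_code_env` are assumptions for illustration:

```python
import os


def claude_code_env(base_url: str, api_key: str = "placeholder") -> dict:
    """Copy the current environment and add the overrides for a local server."""
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = base_url  # your llama.cpp / vLLM endpoint
    env["ANTHROPIC_API_KEY"] = api_key    # any value, per the comment above
    env["CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC"] = "1"  # telemetry off
    return env


# Assumed address; use whatever host and port your server listens on.
env = claude_code_env("http://127.0.0.1:8080")
```

Launching is then something like `subprocess.run(["claude"], env=env)`, or simply exporting the same three variables in your shell before starting the CLI.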
Some people think that Anthropic could disable this at any time, but that's not really true - you can disable automatic updates and back up and reuse native Claude Code binaries, ensuring Anthropic cannot change your existing local Claude Code binary's behavior.
With that said, I like the idea of an open source TUI agent that doesn't spy on me without my consent and without a way to turn it off much better than a closed source TUI agent whose telemetry I can effectively neuter, but sadly, OpenCode is not the former. It's just another piece of VC-funded spyware that's destined for enshittification.
¹https://github.com/anomalyco/opencode/blob/4d7cbdcbef92bb696...
You'd be surprised how useless datasets become with like 10% garbage data when you don't know which data is garbage
I’m sure there’s a more elegant way to say this, but OpenCode feels like an open source Claude Code, while pi feels like an open source coding agent.
Pi is refreshingly minimal in terms of system prompts, but still works really well, and that makes me wonder whether other harnesses are overdoing it. Look at OpenCode's prompts, for instance: long, mostly based on feels and IMO unnecessary. I would've liked to just overwrite OC's system prompts with Pi's (to get other features that Pi doesn't have), but that isn't possible today (without maintaining a custom fork).
I used it recently inside a CI workflow in GitLab to automatically create ChangeLog.md entries for commits. That + Qwen 3.5 has been pretty successful. The job starts up Pi programmatically, points it at the commits in question, and tells it to explore and get all the context it needs within 600 seconds... and it works. I love that this is possible.
JS is not something that was developed with the CLI in mind, and on top of that the language does not lend itself to LLM generation, as it has pretty weak validation compared to e.g. Rust, or even C, or even Python.
Not to mention memory usage or performance.
I can’t find the tweet from Mario (the author), but he prefers the Typescript/npm ecosystem for non-performance critical systems because it hits a sweet spot for him. I admire his work and he’s a real polyglot, so I tend to think he has done his homework. You’ll find pi memory usage quite low btw.
Also, Python ones would allow self-modification too. I'm always puzzled (and worried) when JS is used outside of browsers.
I'm biased, as I find JS/TS a rather ugly language compared to basically anything else (PHP is a close second). Python is clean, C has performance, Rust is clean and has performance, Java has the biggest library and can run anywhere.
It’s simply one of the most productive languages. It actually has a very strong type system, while still being a dynamic language that doesn’t have to be compiled, leading to very fast iteration. It’s also THE language you use when writing UIs. Execution is actually pretty fast through the runtimes we have available nowadays.
The only other interpreted language is Python and that thoroughly feels like a toy in comparison (typing situation still very much in progress, very weak ORM situation, not even a usable package manager until recently!).
You download one .py, run it and uv automatically downloads and installs any requirements to a virtual environment and runs it
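For anyone who hasn't seen it, this works through the inline script metadata block from PEP 723 at the top of the file. A minimal sketch, assuming the file is saved as `hello.py` (hypothetical name) and using `rich` purely as an example dependency:

```python
# /// script
# requires-python = ">=3.9"
# dependencies = [
#   "rich",
# ]
# ///

# The comment block above is inline script metadata (PEP 723).
# `uv run hello.py` reads it, prepares an environment containing the
# listed packages, and then executes the file in that environment.


def greeting(name: str) -> str:
    return f"Hello, {name}!"


def main() -> None:
    try:
        from rich import print as show  # third-party, supplied by uv
    except ImportError:
        show = print  # plain `python hello.py` without rich still runs
    show(greeting("world"))


if __name__ == "__main__":
    main()
```

The fallback to the built-in `print` is only there so the same file also runs under a bare interpreter; with `uv run` the import succeeds because the dependency is installed for you.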
I'm not sure I agree with this. For my smaller tools with a UI I have been using Rust for business logic code and then platform native languages, mostly Swift/C#.
I feel like with a modern agentic workflow it is actually trivial to generate UIs that just call into an agnostic layer, and keeping things small and composable has been crucial for this.
That way I get platform native integration where possible and actual on the metal performance.
The simplicity of extending pi is in itself addictive, but even in its raw form it does the job well.
Before finding pi I had written a lot of custom stuff on top of all the provider specific CLI tools (codex, Claude, cursor-agent, Gemini), but now I don’t have to anymore (except if I want to use my Anthropic sub, which I will now cancel for that exact reason).
But the magic is that it knows how to modify itself, if you need a plan mode you can ask it to implement it :)
that is actually really nice
It's a pity it's written in TS, but at least it can draw from a big contributor pool.
There is https://eca.dev/ too, which might be worth considering. It is a UI-agnostic agent, a bit like LSP servers.
I sprinkle in some billed API usage to power my task-planner and reviewer subagents (both use GPT 5.4 now).
The ability to switch models is very useful and a great learning experience. GLM, Kimi and their free models surprised me. Not the best, not perfect, but still very productive. I would be a wary shareholder if I owned a stake in the frontier labs… that moat seems to be shrinking fast.
The big expensive models are great at planning tasks and reviewing the implementation of a task. They can better spot potential gotchas, performance or security gaps, subtle logic and nuance that cheaper models fail to notice.
The small cheap models are actually great (and fast) at generating decent code if they have the right direction up front.
So I do all the spec writing myself (with some LLM assistance), and I hand it to a Supervisor agent who coordinates between subagents. Plan -> implement -> review -> repeat until the planner says “all done”.
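In code, the loop is roughly the following. This is a sketch of the shape only, not the commenter's toolkit: the three roles are plain callables standing in for model calls, an empty task list is taken to mean "all done", and `supervise` plus the round cap are assumptions of the example:

```python
from typing import Callable, List, Tuple

Planner = Callable[[str, List[str]], List[str]]  # (spec, review notes) -> tasks
Implementer = Callable[[str], str]               # task -> result summary
Reviewer = Callable[[str, str], str]             # (task, result) -> review note


def supervise(
    spec: str,
    plan: Planner,
    implement: Implementer,
    review: Reviewer,
    max_rounds: int = 5,
) -> Tuple[bool, int]:
    """Run plan -> implement -> review until the planner returns no tasks.

    Returns (done, completed_rounds). Each role only receives the spec and
    the latest review notes, never the whole history.
    """
    notes: List[str] = []
    for completed in range(max_rounds):
        tasks = plan(spec, notes)
        if not tasks:  # the planner says "all done"
            return True, completed
        notes = [review(task, implement(task)) for task in tasks]
    return False, max_rounds  # gave up; a human should look at the spec


# Stand-in roles so the sketch runs without any model behind it.
done, rounds = supervise(
    "add a /health endpoint",
    plan=lambda spec, notes: [] if notes == ["looks good"] else [spec],
    implement=lambda task: f"patch for: {task}",
    review=lambda task, result: "looks good",
)
print(done, rounds)
```

Swapping a model is then just passing a different callable for one role, which is what makes mixing an expensive planner/reviewer with a cheap implementer easy to try.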
I switch up my models all the time (actively experimenting) but today I was using GPT 5.4 for review and planning, costing me about $0.4-$1 for a good sized task, and Kimi for implementation. Sometimes my spec takes 4-5 review loops and the cost can add up over an 8 hour day. Still cheaper than Claude Max (for now, barely).
Each agent retains a fairly small context window which seems to keep costs down and improves output. Full context can be catastrophic for some models.
As for the spec writing, this is the fun part for me, and I’ve been obsessing over this process, and the process of tracking acceptance criteria and keeping my agents aligned to it. I have a toolkit cooking; you can find it in my comment history (aiming to open source it this week).
I'm building a full stack web app, simple but with real API integrations with CC.
Moving so fast that I can barely keep a hold on what I'm testing and building at the same time, just using Sonnet. It's not bad at all. A lot of the specs develop as I'm testing the features, either as an immediate change or as a todo / GitHub issue.
How can you manage an agentic flow?
It's been a moving target for years at this point.
Both open and closed source models have been getting better, but not sure if the open source models have really been closing the gap since DeepSeek R1.
But yes: If the top closed source models were to stop getting better today, it wouldn't take long for open source to catch up.
They shouldn't, as long as your terminal emulator doesn't. Why do you think it's Wayland related?
And then the official docs: https://opencode.ai/docs/troubleshooting/#linux-wayland--x11...
> Linux: Wayland / X11 issues
> On Linux, some Wayland setups can cause blank windows or compositor errors.
> If you’re on Wayland and the app is blank/crashing, try launching with OC_ALLOW_WAYLAND=1.
> If that makes things worse, remove it and try launching under an X11 session instead.
OC_ALLOW_WAYLAND=1 didn't work for me (Ubuntu 24.04)
Suggesting to use a different display server to use a TUI (!!) seems a bit wild to me. I didn't put a lot of time into investigating this so maybe there is another reason than Wayland. Anyway I'm using Pi now
That issue points out that it is probably a dependency problem.
The other problem is that they let a package manager block the UI and either swallow hard errors or be unable to progress on soft errors. The errors are probably (hopefully) in some logs.
A dev oriented TUI should report unrecoverable errors on screen or at least direct you to the logs. It's not easy to get right, but if you dare to do it, it isn't rocket science either. They didn't dare.
It works perfectly fine on Niri, Hyprland and other Wayland WMs.
What problem do you have?
I didn't dig further
Seems like there are many GitHub issues about this, actually
https://github.com/anomalyco/opencode/issues/14336
If you respond twice to their theme query probes, the whole thing bricks. Or if you're slightly out of order. It's very delicate.
# Assumes OPENCODE_ROOT, OPENCODE_BINDIR and CURRENT_DIR are set earlier in the script.
exec bwrap \
  --unshare-pid \
  --unshare-ipc \
  --unshare-uts \
  --share-net \
  --bind "$OPENCODE_ROOT" "$OPENCODE_ROOT" \
  --bind "$CURRENT_DIR" "$CURRENT_DIR" \
  --bind "$HOME/.config/opencode/" "$HOME/.config/opencode/" \
  --ro-bind /bin /bin \
  --ro-bind /etc /etc \
  --ro-bind /lib /lib \
  --ro-bind /lib64 /lib64 \
  --ro-bind /usr /usr \
  --bind /run/systemd /run/systemd \
  --tmpfs /tmp \
  --proc /proc \
  --dev /dev \
  --setenv OPENCODE_EXPERIMENTAL_LSP_TOOL true \
  --setenv EDITOR emacs \
  --setenv PATH "$OPENCODE_BINDIR:/usr/bin:/bin" \
  --setenv HOME "$HOME" \
  -- \
  "opencode" "$@"

fence -t code -- opencode
https://www.youtube.com/live/z0JYVTAqeQM?si=oLvyLlZiFLTxL7p0
Assuming you pay per token, which seems like a really strange workflow to lock yourself into at this point. Neither paid monthly plans nor local models suffer from that issue.
I tried once to use APIs for agents but seeing a counter of money go up and eventually landing at like $20 for one change, made it really hard to justify. I'd rather pay $200/month before I'd be OK with that sort of experience.
I assume the usage varies based on prompt caching, but I could be wrong. Why would you assume prompt caching would have zero effect on the subscription usage?
Long tool outputs and command outputs: everything in my harness is spilled over to the filesystem. Context messages are truncated and split to the filesystem, with a breadcrumb for retrieving the full message.
Works really well.
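The idea is simple enough to sketch. This is not the commenter's harness, just one way to do it under assumed choices: the spill directory, the character threshold, the breadcrumb wording and the name `spill_if_long` are all made up here:

```python
import hashlib
import tempfile
from pathlib import Path

SPILL_DIR = Path(tempfile.gettempdir()) / "agent-spill"  # hypothetical location
MAX_CHARS = 2000  # arbitrary threshold for what stays in context


def spill_if_long(text: str, max_chars: int = MAX_CHARS,
                  spill_dir: Path = SPILL_DIR) -> str:
    """Return text unchanged if short; otherwise save it and return a stub.

    The stub keeps the first max_chars characters plus a breadcrumb line
    naming the file that holds the full output.
    """
    if len(text) <= max_chars:
        return text
    spill_dir.mkdir(parents=True, exist_ok=True)
    # Content hash as file name, so identical outputs share one file.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    path = spill_dir / f"{digest}.txt"
    path.write_text(text, encoding="utf-8")
    omitted = len(text) - max_chars
    return f"{text[:max_chars]}\n[truncated {omitted} chars; full output: {path}]"
```

The agent sees the short stub in its context and can read the file named in the breadcrumb if it actually needs the rest.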
IMO, the web UI is a killer feature - it’s got just enough to be an agent manager - without any fluff. I run it on my remote VMs and connect over HTTP.
More than that, it's an extremely large and complex TypeScript code base — probably larger and more complex than it needs to be — and (partly as a result) it's fairly resource inefficient (often uses 1GB of RAM or more. For a TUI).
On top of that, at least I personally find the TUI to be overbearing and a little bit buggy, and the agent to be so full of features that I don't really need — also mildly buggy — that it sort of becomes hard to use and remember how everything is supposed to work and interact.
Before coding agents it took quite a lot more experience before most people could develop and ship a successful product. The average years of experience of both core team and contributors was higher and this reflected in product and architecture choices that really have an impact, especially on non-functional requirements.
They could have had better design and architecture in this project if they had asked the AI for more help with it, but they did not even know what to ask or how to validate the responses.
Of course, lots of devs with more years of experience would do just as badly or worse. What we are seeing here, though, is a filter removed, which means a lot of projects now are the first real product anyone on the team has ever developed.
I would (incorrectly) assume that a product like this would be heavily tested via AI - why not? AI should be writing all the code, so why would the humans not invest in and require extreme levels of testing since AI is really good at that?
Like Rails/DHH was one phase, Git/GitHub another.
And right now it's kinda Claude Code. But they're so obviously really bad at development that it feels like an MLM scam.
I'm just describing the feeling I'm getting, perhaps badly. I use Claude, I recommended Claude for the company I worked at. But by god they're bloody awful at development.
It feels like the point where someone else steps in with a rock solid, dependable, competitor and then everyone forgets Claude Code ever existed.
CC leads and they follow.
this is what i notice with openclaw as well. there have been releases where they break production features. unfortunately this is what happens when code becomes a commodity: everyone thinks that shipping fast is the moat, but it comes at the expense of suboptimality since they know a fix can be implemented quickly in the next release.
I’m sure we’ll all learn a lot from these early days of agentic coding.
So far what I am learning (from watching all of this) is that our constant claims that quality and security matter seem to not be true on average. Depressingly.
I'm 13 years into this industry, this is the first I'm hearing of this.
But as agents move from prototypes to production, the calculus changes. Production systems need:
- Memory continuity across sessions
- Predictable behavior across updates
- Security boundaries that don't leak
The tools that prioritize these will win the enterprise market. The ones that don't will stay in the prototype/hobbyist space.
We're still in the "move fast" phase, but the "break things" part is starting to hurt real users. The pendulum will swing back.
The reason for this is that product development involves making decisions which can later be classified as good or bad decisions.
The good decisions must remain stable, while the bad decisions must remain open to change and therefore remain unstable.
The AI doesn't know anything about the user experience, which means it will inevitably change the good decisions as well.
Only for the non-pro users. After all, those users were happy to use excel to write the programs.
What we're seeing now is that more and more developers find they are happy with even less determinism than the Excel process.
Maybe they're right; maybe software doesn't need any coherence, stability, security or even correctness. Maybe the class of software they produce doesn't need those things.
I, unfortunately, am unable to adopt this view.
Also most of the long running enterprise projects I’ve seen - there was one that had been around for like 10 years and like about 75% of the devs I hadn’t even heard of and none of the original ones were in the project at all.
The thing had no less than three auditing mechanisms, three ways of interacting with the database, mixed naming conventions, like two validation mechanisms none of which were what Spring recommended and also configurations versioned for app servers that weren’t even in use.
This was all before AI, it’s not like you need it for projects to turn into slop and AI slop isn’t that much different from human slop (none of them gave a shit about ADRs or proper docs on why things are done a certain way, though Wiki had some fossilized meeting notes with nothing actually useful) except that AI can produce this stuff more quickly.
When encountered, I just relied on writing tests and reworking the older slop with something newer (with better AI models and tooling) and the overall quality improved.
I expect that from something guiding the market, but there have been times where stuff changes, and it isn't even clear if it is a bug or a permanent decision. I suspect they don't even know.
Not all code is fungible. "Irreverent code that kinda looks okay at first glance" might be a commodity, but well-tested, well-designed and well-understood code is what's valuable.
Code today can be as verbose and ugly as ever, because from here on out, fewer people are going to read it, understand and care about it.
What's valuable, and you know this I think, is how much money your software will sell for, not how fine and polished your code is.
Code was a liability. Today it's a liability that costs much, much less.
There are limits to what even AI can do to code within practical time limits. Using AI also costs money. So the easier it is to maintain and evolve a piece of software, the cheaper it will be for the owners of that application.
How much value are you going to be able to extract over its lifetime once your customers want to see some additional features or improvements?
How much expensive maintenance burden are you incurring once any change (human or LLM generated) is likely to introduce bugs you have no better way of identifying than shipping to your paying customers?
Maybe LLM+tooling is going to get there with producing a comprehensible and well tested system, but my anecdotal experience is not promising. I find that AI is great until you hit its limit on a topic, and then it will merrily generate tokens in a loop, suggesting the same won't-work fix forever.
The whole thing reminds me a bit of the many RAD tools that were supposed to 'solve' programming. While it was easy to start and produce something with those tools, at some point you started spending way too much time working around the limitations and wished you had started from scratch without them.
Code that has not been thoroughly tested is a greater liability, not a lesser one, the faster you can write it.
[1] https://museumoffailure.com/exhibition/wonka-chocolate-exper...
Have fun on windows - automatic no from me. https://github.com/anomalyco/opencode/issues?q=is%3Aissue%20...
There are some cases where hardware support on Linux is suboptimal, such as Nvidia cards and many fingerprint readers, but things are a LOT better now than they used to be. Most consumer laptops and desktops will run linux just fine.
Then Windows 95 came out and I actively hated it, but did think it was amazingly pretty - somehow this was the impetus for me to get a pc again, which I put Windows NT on. Which was profitable for freelance gigs in college. Soon after that, I dual booted it to Linux and spent most of my time in Slackware.
After that, I graduated and had enough money to buy a second rig, which I installed OS/2 Warp on, which was good for side gigs. And I really liked it. A lot. But my day job required that I have a Windows NT box to shell into the Solaris servers we ran. Then I got a better class of employer, and the next several let me run a Linux box to connect to our Solaris (or AIX) servers.
Next my girlfriend at the time got a PowerBook G4 and installed OS X on it. It was obviously amazing. Windows XP came out, and it was once again so much worse than Windows NT - and crashed so much more - which was odd as it was based on Windows NT. (yes 98 was before this but it was really bad). Anyhow, right about here the Linux box I was running at home, died. And it was obvious that I was not going to buy an XP box, so I bought my first Mac.
And it’s been the same for the last 25 years - every time I look at a Windows box it’s horrible. I pretty much always have a Linux box headless somewhere in the house, and one rented in the cloud, and a Mac for interacting with the world.
And like the parent I actively dislike Windows. And that’s interesting because I’ve liked most other operating systems I’ve used in my life, including MS-DOS. Modern Windows is uniquely bad.
On the other hand, it is actually useful that there is mostly a specific place where you find settings etc., as in Windows/Linux it tends to vary depending on the app where to find those (is there a bar on top of the window? Is there a button to expand a menu somewhere? Something else? Who knows).
You need to set an explicit "small model" in OpenCode to disable that.
I mean the default model being Grok, whatever - that everyone sets to their favorite.
But the hidden use of a different model is wow.
(I do mean this as a general principle, but also it was pointed out elsewhere in the thread that this is a particularly "high velocity" project as far as unexpected changes go.)
The small_model option configures a separate model for lightweight tasks like title generation. By default, OpenCode tries to use a cheaper model if one is available from your provider, otherwise it falls back to your main model.
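So pinning it explicitly would look something like this. A hedged sketch: `small_model` is the option quoted above, but the top-level `model` key, the `opencode.json` file name and both model IDs are assumptions on my part, so check them against the docs for your version:

```python
import json

# Hypothetical IDs: substitute whatever your provider actually exposes.
MAIN_MODEL = "example-provider/main-model"

config = {
    "model": MAIN_MODEL,
    # Point the lightweight tasks (titles etc.) at the same model, so
    # nothing gets sent to a model you did not pick yourself.
    "small_model": MAIN_MODEL,
}

rendered = json.dumps(config, indent=2)
print(rendered)  # paste into your config file (assumed: opencode.json)
```

The point is only that both keys are set explicitly, so no default gets a chance to choose a small model for you.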
I would expect that if you set a local model it would just use the same model. Or if for example you set GPT as main model, it would use something else from OpenAI. I see no mentions of Grok as default
on that version, it does not fall back to the main model. it silently calls opencode zen and uses gpt-5-nano, which is listed as having 30 day retention plus openai policy, which is plain text human review by openai AND 3rd party contractors.
i see they removed the title model on v1.2.23.
i was so annoyed i made an account here today
Using AI to generate all your code only really makes sense if you prioritize shipping features as fast as possible over the quality, stability and efficiency of the code, because that's the only case in which the actual act of writing code is the bottleneck.
Personally, I find this idea that "coding isn't the bottleneck" completely preposterous. Getting all of the API documentation, the syntax, organizing and typing out all of the text, finding the correct places in the code base and understanding the code base in general, dealing with silly compiler errors and type errors, writing a ton of error handling, dealing with the inevitable and ineradicable boilerplate of programming (unless you're one of those people that believe macros are actually a good idea and would meaningfully solve this), all are a regular and substantial occurrence, even if you aren't writing thousands of lines of code a day. And you need to write code in order to be able to get a sense for the limitations of the technology you're using and the shape of the problem you're dealing with in order to then come up with and iterate on a better architecture or approach to the problem. And you need to see your program running in order to evaluate whether its functionality and design are satisfactory and then to iterate on that. So coding is actually the upfront cost that you need to pay in order to even start properly thinking about a problem. So being able to get a prototype out quickly is very important. Also, I find it hard to believe that you've never been in a situation where you wanted to make a simple change or refactor that would have resulted in needing to update 15 different call sites to do properly, in a way that was just slightly variable enough or complex enough that editor macros or IDE refactoring capabilities wouldn't be capable of it.
That's not to mention the fact that if agentic coding can make deploying faster, then it can also make deploying the same amount at the same cadence easier and more relaxing.
Which one do you think companies prefer? Or if you're a consulting business, which one do you think your clients prefer?
I have yet to actually see a single example of the latter, though. OpenCode isn't an isolated case - every project with heavy AI involvement that I've personally examined or used suffers from serious architectural issues, tons of obvious bugs and quirks, or both. And these are mostly independent open source projects, where corporate interests are (hopefully) not an influence.
I will continue to believe it's not actually possible until I am proven wrong with concrete examples. The incentives just aren't there. It's easy to say "just mindlessly follow X principle and your software will be good", where X is usually some variation of "just add more tests", "just add more agents", "just spend more time planning" etc. but I choose to believe that good software cannot be created without the involvement of someone who has a passion for writing good software - someone who wouldn't want to let an LLM do the job for them in the first place.
That's a complete strawman of what I — or others trying to learn how to use coding agents to increase quality, like Simon Willison or the Oxide team — am saying.
> but I choose to believe that good software cannot be created without the involvement of someone who has a passion for writing good software - someone who wouldn't want to let an LLM do the job for them in the first place.
This is just a no true Scotsman. I prefer to use coding agents because they don't forget details, or get exhausted, or overwhelmed, or lazy, or give up, ever, whereas I might. Therefore, they allow me to do all of the things that improve code and software quality more extensively and thoroughly, like refactors, performance improvements, and tests among other things (because yes, there is no single panacea). Furthermore, I do still care about the clarity, concision, modularity, referential transparency, separation of concerns, local reasonability, cognitive load, and other good qualities of the code, because if those aren't kept up a) I can't review the code effectively or debug things as easily when they go wrong, b) the agent itself will struggle to make changes without breaking other things, and struggle to debug, c) those things often eventually affect the quality of the end state software.
Additionally, what you say is empirically false. Many people who do deeply value quality software and code quality, such as the creators of Flask, Redis, and SerenityOS/Ladybird, all use and value agentic coding.
Just because you haven't seen good quality software with a large amount of agentic influence doesn't mean it isn't possible. That's very closed-minded.
No, you are not allowed to see the excellent code that their masterful use of AI generated. You're just going to have to read their blog posts.
[1] https://github.com/badlogic/pi-mono/tree/main/packages/codin...
But we did a lot of work on improving the experience, both on UX, performance, and the actual reliability of the agent itself.
I would suggest you give it a try.
Also, non-interactive support, useful for some workflows:
I build VT Code with Tree-sitter for semantic understanding and OS-native sandboxing. It's still early, but I'm confident it's usable. I hope you'll give it a try.
One of the best features is they haven't been noticed by Anthropic yet so you can still use your Claude subscription.
Interesting you say this because I'd say the opposite is true historically, especially in the systems software community and among older folks. "Do one thing and do it well" seems to be the prevailing mindset behind many foundational tools. I think this is why so many are/were irked by systemd. On the other hand newer tools that are more heavily marketed and often have some commercial angle seem to be in a perpetual state of tacking on new features in lieu of refining their raison d'etre.
[0] https://www.reddit.com/r/LocalLLaMA/comments/1rv690j/opencod...
that #12446 PR hasn't even been resolved to won't merge and last change was a week ago (in a repo with 1.8k+ open PRs)
Must be a karmic response from “Free” /s
The choice isn't "telemetry or you're blindfolded", the other options include actually interacting with your userbase. Surveys exist, interviews exist, focus groups exist, fostering communities that you can engage is a thing, etc.
For example, I was recruited and paid $500 to spend an hour on a panel discussing what developers want out of platforms like DigitalOcean, what we don't like, where our pain points are. I put the dollar amount there only to emphasize how valuable such information is from one user. You don't get that kind of information from telemetry.
We all know it’s extremely, extremely hard to interact with your userbase.
> For example I was paid $500 an hour
Plus the time to find volunteers doubled that, so at $1000 an hour x 10 user interviews, a free software project can get feedback from 0.001% of its users. I dislike telemetry, but it’s a lie to say it’s optional.
- a company with no telemetry in either our downloadable or our cloud product.
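For what it's worth, the arithmetic behind that estimate can be checked with a quick sketch. The $1000 per interview and the 10 interviews are the figures from the comment; the implied user base is simply derived from the 0.001% number, and the function names are made up for illustration.

```python
def interview_budget(cost_per_interview_usd: int, interviews: int) -> int:
    """Total spend for a round of paid user interviews."""
    return cost_per_interview_usd * interviews


def implied_user_base(interviews: int, coverage_percent: float) -> float:
    """How many users there must be for `interviews` people to make up
    `coverage_percent` percent of the user base."""
    return interviews / (coverage_percent / 100)


# Figures taken from the comment: $500 paid + roughly $500 of recruiting time.
total_usd = interview_budget(1000, 10)
users = implied_user_base(10, 0.001)

print(f"budget: ${total_usd}, implied user base: about {round(users):,} users")
```

So the claim amounts to spending about $10,000 to hear from 10 people out of roughly a million users.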
On the contrary, your users will tell you what you need to know, you just have to pay attention.
> I dislike telemetry, but it’s a lie to say it’s optional.
The lie is believing it’s necessary. Software was successful before telemetry was a thing, and tools without telemetry continue to be successful. Plenty of independent developers ship zero telemetry in their products and continue to be successful.
OpenCode has been much more stable for me in the 6 months or so that I’ve been comparing the two in earnest.
On top of that, OpenCode Go was a complete scam. It was not advertised as having lower quality models when I paid, and glm5 was broken compared with another provider, returning gibberish and acting very dumb on the same prompt.
That being said, I do prefer OpenCode to Codex and Claude Code.
(I'm also hating on TS/JS: but some day some AI will port it to Rust, right?)
CC I have the least experience with. It just seemed buggy and unpolished to me. Codex was fine, but there was something about it that just didn't feel right. It seemed fine for code tasks, but just as often I want to do research or discuss the code base, and for whatever reason I seemed to get terse, less useful answers using Codex even when it's backed by the same model.
OpenCode works well, I haven't had any issues with bugs or things breaking, and it just felt comfortable to use right from the jump.
Tbf, this seems exactly like Claude Code, they are releasing about one new version per day, sometimes even multiple per day. It’s a bit annoying constantly getting those messages saying to upgrade cc to the latest version
It's annoying how I always get that "claude code has a native installer xyz please upgrade" message
The npm version tells you there’s a native installer and to use it instead
The native installer never says anything. Not sure how it gets updated
FWIW, in Kitty on Linux, SHIFT + mouse-select copies and SHIFT + middle-mouse-button pastes. This use of SHIFT and otherwise using standard Unix style copy/paste is common in a lot of TUIs (eg, weechat).
I then tried running other options like picoclaw/picocode etc but they were all really hard to manage/create
The UI/UX I want is that I can just put my free openrouter api key in and then I am ready to go to get access to free models like Arcee AI right now
After reading your comments and this thread, I tried Crush by charmbracelet again, and it gives me the UI/UX that I want.
I am definitely impressed by Crush and the Charm team. They are on HN and their tools work great for me. Highly recommended if you want something that can work on resource-constrained devices.
I do feel like Charm's TUIs are almost too beautiful, in the sense that running them over an SSH connection can lag, so when I tried to copy some things the delay made them harder to copy. But overall I am using Crush and I am happy for the most part :-)
Edit: That being said, just as I was typing this, Crush used up all the free requests I get from OpenRouter, so that might be a minor issue, but it's not really an issue on Crush's side. My point stands: Crush is worth checking out.
Kudos to the CharmBracelet team for making awesome golang applications!
That's (one of the reasons) why I'm favoring Codex over Claude Code.
Claude Code is an... Electron app (for a TUI? WTH?) and Codex is Rust. The difference is tangible: the former feels sluggish and does some odd redrawing when the terminal size changes, while the latter definitely feels more snappy to me (leaving aside that GPT's responses also seem more concise). At some point, I had both chewing concurrently on the same machine and same project, and Claude Code was using multiple GBs of RAM and 100% CPU whereas Codex was happy with 80 MB and 6%.
Performance _is_ a feature and I'm afraid the amounts of code AI produces without supervision lead to an amount of bloat we haven't seen before...
Also, that is so far outside my experience that I can’t tell if you are joking.
Codex seems to always do exactly what I ask of it, nothing more, and nothing less, and with as many shortcuts as possible.
The difference in feel between Codex and Claude Code is obvious.
The whole thing is vibed anyway, I'm sure they could get it done in a week or two for their quality standards.
What would make go more "accessible to contributors" than Rust?
Rust is my favorite, though. There are values beyond ease of contribution. I can't replicate the experience with a Rust project anymore, but I suspect it would have been tougher.
CC isn't foss in the first place, so the previous comment falls short.
This is the second time I've seen claims like this in the last 24 hours and I'm afraid I might have lost contact with reality.
There are more syntax features, more and more complex semantics, and while rustc and clippy do a great job of explaining like 90% of errors, the remaining 10% suuuuuck.
There’s also some choices imposed by the build system (like cargo allowing multiple versions of the same dep in a workspace) and by the macro system (axum has some unintuitive extractor ordering needs that you won’t find unless you know to look for them), and those things and the hurdles they present become intuitive after a time but just while getting started? Oof
Go could implement something like this with no dependencies outside the standard library. It would make sense to take on a few, but a comparable Rust project would have at least several dozen.
Also, Go can deliver a single binary that works on every Linux distribution right out of the box. In Rust, it's possible, but you have to compile statically against musl, and that is a far less well-trodden path with some significant differences from the glibc that most Rust libraries have been tested with.
Rust is accessible to everyone now that Claude Code and Opus can emit it at a high proficiency level.
Rust is designed so the error handling is ergonomic and fits into the flow of the language and the type system. Rust code will have a lower defect rate by default.
Plus it's faster and doesn't have a GC.
You can use Rust now even if you don't know the language. It's the best way to start learning Rust.
The learning curve is not as bad as people say. It's really gentle.
Rust is the best AI language. Bar none.
Erlang would offer similar benefits, because what we're doing with these things is more message passing than processing.
Rust is what I'd want agents writing for edge devices, things I don't want to have to monitor. Granted, our devices are edge devices to Anthropic, but they're more tightly coupled to their services.
Also, it'll be hard for them to lure good people to work on that thing. Absolutely no one is getting excited to write, vibe, or maintain Java.
The redraw glitches you’re referring to are actually signs of what I consider to be a pretty major feature, a reason to use `claude` instead of `codex` or `opencode`: `claude` doesn’t use the alternate screen, whereas the other two do. Meaning that it uses the standard screen buffer, meaning that your chat history is in the terminal (or multiplexer) scrollback. I much prefer that, and I totally get why they’ve put so much effort into getting it to work well.
In that context handling SIGWINCH has some issues and trickiness. Well worth the tradeoff, imo.
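To make the tradeoff concrete, here is a minimal sketch of the two pieces involved: the standard xterm escape sequences for the alternate screen, and a SIGWINCH handler that re-reads the terminal size. It is not taken from any of these tools; the function names are illustrative, and the handler only exists on Unix-like systems.

```python
import shutil
import signal
import sys

# Standard xterm private-mode sequences for the alternate screen buffer.
ENTER_ALT_SCREEN = "\x1b[?1049h"
LEAVE_ALT_SCREEN = "\x1b[?1049l"


def startup_sequence(use_alt_screen: bool) -> str:
    """What a TUI writes on launch. An empty string means it stays in the
    normal buffer, so its output ends up in the terminal's scrollback."""
    return ENTER_ALT_SCREEN if use_alt_screen else ""


def on_resize(signum, frame):
    """SIGWINCH handler: re-read the terminal size after a resize."""
    cols, rows = shutil.get_terminal_size()
    # In the normal buffer the app can only repaint its live region here;
    # lines already pushed into scrollback are out of its reach.
    sys.stderr.write(f"terminal is now {cols}x{rows}\n")


def install_resize_handler() -> bool:
    """Register the handler where SIGWINCH exists (Unix); report success."""
    if not hasattr(signal, "SIGWINCH"):
        return False
    signal.signal(signal.SIGWINCH, on_resize)
    return True


if __name__ == "__main__":
    sys.stdout.write(startup_sequence(use_alt_screen=False))
    install_resize_handler()
```

An alternate-screen app can simply clear and redraw everything on resize, which is why that approach tends to look cleaner but leaves nothing behind in scrollback.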
on my m1, claude is noticeably slower when starting, but it feels ok after that.
You can run a codex instance on machine A and connect the TUI to it from machine B. The same open source core and protocol is shared between the Codex app, VS Code and Xcode.
Is Claude Code like this too? I wonder if Pi is any better.
A big downside would be paying actual cost price for tokens but on the other hand, I wouldn't be tied to Google's model backend which is also extremely flaky and unable to meet demand a lot of the time. If I could get real work done with open models (no idea if that's the case yet) and switch providers when a given provider falls over, that would be great.
I'm very happy with Pi myself (running it on a small VPS so that I don't need to do sandboxing shenanigans).
Honestly, these models seem quite on par with Claude. Some days they seem slightly worse, some days I can't tell the difference.
AFAIK, the usage quota is comparable to the Claude $200 subscription.
That's a pretty big leap (5x), but still substantially cheaper than the average American hosting.
I wonder how much of this is because the maintainers are using OpenCode to vibe the code for OpenCode.
I'm looking forward to more folks building these kinds of tools with a stronger focus on portability via API or loading local models, as means of having a genuinely useful assistant or co-programmer rather than paying some big corp way too much money (and letting them use my data) for roughly the same experience.
I tried Opencode but it was just too much? Same with Crush, 10/10 pretty but lacking in features I need. LSP support was cool though.
I don't know how much it works in practice though.
It's fully open, fairly minimal, very extensible and (while getting very frequent updates) never has broken on me so far.
Been using it more and more in the last two months, switching more and more from codex to it now.
This is my experience with most AI tools that I spend more than a few weeks with. It's happening so often it's making me question my own judgement: "if everything smells of shit, check your own shoes." I left professional software engineering a couple of years ago, and I don't know how much of this is also just me losing touch with the profession, or being an old man moaning about how we used to do it better.
It reminds me of social media: there was a time where social media platforms were defined by their features, Vine was short video, snapchat was disappearing pictures, twitter was short status posts etc. but now they're all bloated messes that try to do everything.
The same looks to be happening with AI and agent software. They start off as defined by one feature, and then become messes trying to implement the latest AI approach (skills, or tools, or functions, or RAG, or AGENTS.md, or claws etc. etc.)
I think shitty AI software is a product of being in a bubble and the pressure to move fast and stay relevant. Just like there was a bunch of shitty blockchain software, and a bunch of shitty VR software, and a bunch of shitty mobile app software when they were booming.
I don't think pi has been around long enough to prove it's immune to this yet.
The amount of configuration updates, broken plugins… on top of what was already a difficult product to customise; it’s simply too much.
Why isn’t `opencode-workspace` the default, given that the base product is barely usable? Bah. I just reinstalled AGY and Mistral’s Vibe and got on with the work.
I’m old. It’s an open-source, gratis thing. I’m grateful for projects like OpenCode, but it was infuriating to configure a good set of plugins and prompts for spec-driven development, only to have it all stop working a few times because of something hard to debug.
That said, the runtime is so resource heavy that, even though the heavy computational workload is given to AI on a remote cluster of servers, it will bring an old-ish laptop to a stall.
I do wonder though...highly interactive TUIs are not novel. I would wager that AI + the attention of frontend devs have created an environment where you can make fancy terminal UIs without concern for how terminals generally work and if Electron is sitting in the background, it proves it.
Hugely grateful for what they do.
Edit: it's not. https://github.blog/changelog/2026-01-16-github-copilot-now-...
They must be eating insane amounts of $$$ for this. I wouldn't expect it to last
See https://models.dev for a comparison against the normal "vanilla" API.
What caused the switch was that we're building AI solutions for sometimes price-conscious customers, so I was already familiar with the pattern of "Use a superior model for setting a standard, then fine-tuning a cheaper one to do that same work".
So I brought that into my own workflows (kind of) by using Opus 4.6 to do detailed planning and one 'exemplar' execution (with 'over documentation' of the choices), then after that, use Opus 4.6 only for planning, then "throw a load of MiniMax M2.5s at the problem".
They tend to do 90% of the job well, then I sometimes do a final pass with Opus 4.6 again to mop up any issues, this saves me a lot of tokens/money.
This pattern wasn't possible with Claude Code, thus my move to Open Code.
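Roughly, the workflow described above could be sketched like this. The `strong` and `cheap` callables are hypothetical stand-ins for calls to an expensive and an inexpensive model; nothing here is an actual OpenCode or provider API.

```python
from typing import Callable, List

# A "model" here is any callable taking (role, prompt) and returning text.
Model = Callable[[str, str], str]


def run_tasks(tasks: List[str], strong: Model, cheap: Model,
              final_pass: bool = True) -> List[str]:
    """Plan every task with the strong model, let it execute the first task
    as a documented exemplar, and hand the remaining executions to the
    cheap model with the exemplar attached."""
    outputs: List[str] = []
    exemplar = None
    for task in tasks:
        plan = strong("plan", task)
        if exemplar is None:
            exemplar = strong("execute", plan)  # the one exemplar run
            outputs.append(exemplar)
        else:
            prompt = f"{plan}\n\nFollow this exemplar:\n{exemplar}"
            outputs.append(cheap("execute", prompt))
    if final_pass:
        # Optional clean-up pass by the strong model.
        outputs = [strong("review", out) for out in outputs]
    return outputs
```

The savings come from the strong model only seeing short planning and review prompts, while the bulk of the generated tokens come from the cheaper one.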
There are probably IDE plugins that feed prompts or context in based on your interaction with the editor.
- GH copilot API is a first class citizen with access to multiple providers’ models at a very good price with a pro plan
- no terminal flicker
- it seems really good with subagents
- I can’t see any terminal history inside my emacs vterm :(
I only boot my windows 11 gaming machine for drm games that don’t work with proton. Otherwise it’s hot garbage
The OpenCode docs suggest it's possible, but it only works with their extension (not in an already open VS Code terminal) with a very specific keyboard shortcut and only barely at that.
Reading through his X comments and GitHub comments he is behaving immaturely. I don't trust what he's saying here. Ripping out Claude API support was just throwing a tantrum. Weird given his age - he's old enough to be more mature.
Even as a CC user I’m glad someone is forcing the discussion.
My prediction: within two years ‘model neutrality’ will be a topic of debate. Creating lock-in through discount pricing is anti-competitive. The model provider is the ISP; the tool, the website.
That is not the point. That is a mere technicality.
You signed a contract. If you ignore the terms of the contract and use the product in a way that is explicitly prohibited, you're abusing the product. It is as simple as that.
They offer a separate product (API) if you don't like the terms of the contract.
Also, if you really want to get technical: the limits are under the assumption that caching works as intended, which requires control of the client. 3P clients suck at caching and increase costs. But that is not the overarching point.
> Creating lock-in through discount pricing is anti-competitive.
Literally everyone does this. OpenAI is doing this with Codex, far more than Anthropic is. It's not great but players much bigger than Anthropic are using discount pricing to create an anti-competitive advantage.
And yet, OpenAI have publicly said they welcome OpenCode users to use their subscription package. So how are they being anti-competitive "far more" than Anthropic?
It's a PR stunt. They'll eat the costs for a bit, once they've cornered the market they'll do the same thing as Anthropic.
Because that could be easily resolved by factoring % cache hits into the usage limits.
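As a sketch of what "factoring cache hits into the usage limits" could mean: count cached input tokens at a discount when charging against the quota. The weights below are placeholders chosen for illustration, not anyone's published accounting, and the function name is made up.

```python
def usage_units(input_tokens: int, cached_input_tokens: int,
                output_tokens: int,
                cache_read_weight: float = 0.1,   # assumed discount
                output_weight: float = 5.0) -> float:  # assumed multiplier
    """Quota units consumed by one request, with cache reads discounted."""
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")
    uncached = input_tokens - cached_input_tokens
    return (uncached
            + cached_input_tokens * cache_read_weight
            + output_tokens * output_weight)


# Same request size, different cache behaviour:
well_cached = usage_units(1000, 900, 0)   # client that reuses the prefix
poorly_cached = usage_units(1000, 0, 0)   # client that busts the cache
print(well_cached, poorly_cached)
```

Under a scheme like this, a client with poor caching would burn through its quota faster instead of being banned outright.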
> Literally everyone does this.
Never a strong justification, much as I like Anthropic in general.
Why is the 'Mercedes gas station' selling gas 85% cheaper but only to Mercedes drivers?
Why is the 'Apple electric company' selling cheaper electricity to households with Apple devices?
They're not the strongest analogies, I'll admit, but that's what it smells like to me.
Absolutely not, you are not thinking from a product perspective at all.
You might not want to capture cache % hits in usage limits because there may be some edge cases you want to support that have low hits even with an optimized client. Maybe your caching strategy isn't perfect yet, so you don't count hits to keep a good product experience going.
OSS clients that freeload on the subscription break your ability to support these use cases entirely. Now you have to count cache hits at the expense of everyone else. It is a classic case of some people ruining the experience for everyone.
> Why is the 'Apple electric company' selling cheaper electricity to households with Apple devices?
Why does Netflix not let you use your OSS hacked client of choice with your subscription?
I'm guessing that a model which only covers a single language might be more compact and efficient vs a model trained across many languages and non-programming data.
Now I’m using that to generate synthetic sets and clean it up, but man I’m struggling hah. Fun though.
If you want it to stick to better practices you have to write skills, provide references (example code it can read), and provide it with harnessing tools (linters, debuggers, etc) so the agent can iterate on its own output.
Give it a look, maybe it could inspire you: https://github.com/fulgidus/zignet
Bottom-line: fine-tuning looks like the best option atm
There is nothing open about it. Please do not abuse the term "open" like in OpenBSD.
> Please do not abuse the term "open" like in OpenBSD.
this is such a pet peeve of mine; all these "open" products (except when they're not)
OpenCode just has more bugs, it's incredibly derivative so it doesn't really do anything else than Codex.
The advantage of OpenCode is that it can use any underlying model, but that's a disadvantage because it breaks the native integration. If you use Opus + Claude Code, or Gpt-Codex + Codex App, you are using it the way it was designed to be used.
If you don't actually use different models, or plan to switch, or somehow value vendor neutrality strategically, you are paying a large cost without much reward.
This is in general a rule, vendor neutrality is often seen as a generic positive, but it is actually a tradeoff. If you just build on top of AWS for example, you make use of its features and build much faster and simpler than if you use Terraform.
To change that, you need to set a custom "small model" in the settings.
Imagine someone using it at work, where they are only allowed to use a GitHub Copilot Business subscription (which is supported in OpenCode). Now they have sent proprietary code to a third party, and don't even know they're doing it.
https://old.reddit.com/r/LocalLLaMA/comments/1rv690j/opencod...
They also don't let you run all local models, only specific ones whitelisted by another 3rd party: https://github.com/anomalyco/opencode/issues/4232
Everything you read on the internet seems exaggerated today. Especially true for reddit, and especially especially true for r/LocalLLaMA, which is a shadow of its former self. Today it's mostly sockpuppets pushing various tools and models, and other sockpuppets trying to push misinformation about their competitors' tools/models.
it will use whatever small model there is in your provider
we had a fallback where we provided free small models if your provider did not have one (gpt nano)
some configs fell back to this unexpectedly which upset people so we removed it
But for serious (“grown up”) use, stuff like this just doesn’t fly. At all. We have to know and be able to control exactly where data gets sent. You can’t just exfiltrate our data to random unvetted endpoints.
Given the hurt trust of the past, there also needs to be a communication campaign (“actually we’re secure now”), because otherwise people will keep going around claiming that OpenCode sends all of your data to Grok. This would really unnecessarily hurt the project in the long run.
More importantly, the current dev branch source for packages/opencode/src/session/summary.ts shows summarizeMessage() now only computes diffs and updates the message summary object; it does not make an LLM call there anymore. The current code path calls summarizeSession() and summarizeMessage(), and summarizeMessage() just filters messages, computes diffs, sets userMsg.summary.diffs, and saves the message.
https://github.com/anomalyco/opencode/blob/dev/packages/open...
The situation is ... pretty bad. But I don’t think this is particularly malicious or even a really well considered stance, but just a compromise in order to move fast and ship useful features.
To make it easily adoptable by anyone privacy conscious without hours of tweaking, there should be an effort to massively improve this situation. Luckily, unlike Claude Code, the project is open source and can be changed!
The model selection for title generation works as follows (prompt.ts:1956-1960):
1. If the title agent has an explicit model configured, that model is used.
2. Otherwise, it tries Provider.getSmallModel(providerID), which picks a "small" model from the same provider as the current session, using this priority list (provider.ts:1396-1402):
   - claude-haiku-4-5 / claude-haiku-4.5 / 3-5-haiku / 3.5-haiku
   - gemini-3-flash / gemini-2.5-flash
   - gpt-5-nano
   - (Copilot adds gpt-5-mini at the front; opencode provider uses only gpt-5-nano)
3. If no small model is found, it falls back to the same model currently being used for the session.
So by default, title generation uses a cheaper/faster small model from the same provider (e.g., Haiku if on Anthropic, Flash if on Google, nano if on OpenAI), and if none are available, it just uses whatever model the user is chatting with. You can also override this entirely by configuring a model on the title agent.
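The three-step fallback described above can be sketched in a few lines. This is a Python paraphrase of the comment, not the project's TypeScript; the function name and the substring matching are assumptions made for illustration.

```python
from typing import List, Optional

# Priority list as quoted in the comment above.
SMALL_MODEL_PRIORITY = [
    "claude-haiku-4-5", "claude-haiku-4.5", "3-5-haiku", "3.5-haiku",
    "gemini-3-flash", "gemini-2.5-flash",
    "gpt-5-nano",
]


def pick_title_model(session_model: str,
                     provider_models: List[str],
                     title_agent_model: Optional[str] = None,
                     priority: List[str] = SMALL_MODEL_PRIORITY) -> str:
    """Choose the model used to generate a chat title."""
    # 1. An explicitly configured title model always wins.
    if title_agent_model:
        return title_agent_model
    # 2. Otherwise take the highest-priority "small" model that the
    #    session's provider offers (substring match is an assumption).
    for wanted in priority:
        for model_id in provider_models:
            if wanted in model_id:
                return model_id
    # 3. No small model available: reuse the session's own model.
    return session_model
```

For example, a session on an Anthropic provider that lists both an Opus and a Haiku model would get its titles from Haiku, while a provider exposing only one local model falls through to step 3.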
Chat titles would work even when the local llama.cpp server hadn't started, and the requests never showed up in the llama.cpp logs. It used an external model I hadn't set up and had not intended to use.
It was only when I set `small_model` that I was able to route title generation to my own models.
Personally I think it's necessary to run opencode itself inside a sandbox, and if you do that you can see all of the rejected network calls it's trying to make even in local mode. I use srt and it was pretty straightforward to set up
Just my prompts, or everything the agent has in the context window?
Also, could you please provide a reference for this claim? Thank you
I do not understand the insistence on using JavaScript for command line tools. I don't use rust at all, but if I'm making a vibe coded cli I'm picking rust or golang. Not zig because coding agents can't handle the breaking changes. What better test of agentic coders' conviction in their belief in AI than to vibe a language they can't read.
I feel that if you want to build a coding agent / harness the first thing you should do is to build an evaluation framework to track performance for coding by having your internal metrics and task performance, instead I see most coding agents just fiddle with adding features that don't improve the core ability of a coding agent.
I considered creating a PR for that, but found that creating new agents instead worked fine for me.
The changes I've made locally are:
- Added a discuss mode with almost no tools except read file, ask tool, web search only based on heuristics + being able to switch from discuss to plan mode.
Experiments:
- hashline: it doesn't bring that much benefit over the default with gpt-5.4.
- tried scribe [0]: It seems worth it as it saves context space but in worst case scenarios it fails by reading the whole file, probably worth it but I would need to experiment more with it and probably rewrite some parts.
The nice thing about opencode is that it uses sqlite and you can do experiments and then go through past conversation through code, replay and compare.
Now I just started looking into OpenCode yesterday, but seems you can override the system prompts by basically overloading the templates used in for example `~/.opencode/agents/build.md`, then that'd be used instead of the default "Build" system prompt.
At least from what I gathered skimming the docs earlier, might not actually work in practice, or not override all of it, but seems to be the way it works.
Have to watch out for other plugins trying to do the same, though.
I really should look into more "native" Emacs options as I find using vterm a bit of a clunky hack. But I'm just not that excited about this stuff right now. I use it because I'm lazy, that's all. Right now I'm actually getting into woodwork.
I wonder why they used TypeScript and not a more resource-efficient language like C, C++, Rust, or Zig?
Since their code is generated by AI, human preferences shouldn't matter much, and AI is happy to work with any language.
- With love The Official Pink Eye #ThereIsNoOther
It started getting increasingly flaky with Anthropic's API recently, so I switched back to Claude Code for a couple of days. Oh my, what a night and day difference. Tokens, MCP use, everything.
For anyone reading at OpenAI, your support for OpenCode is the reason I now pay you 200 bucks a month instead.
But I don't use MCP, don't need anything complicated, and I'm not sure what OpenCode actually offers on top. The UI is slightly nicer (but oh so much heavier on resource usage), both projects' source code seems vibecoded and the architecture is held together with hopes and dreams, but in reality it's a minor difference really.
Also, didn't find a way in OpenCode to do the "Fast Mode" that Codex has available, is that just not possible or am I missing some setting? Not Codex-Spark but the mode that toggles faster inference.
All the background capability Claude Code now has makes things way more complex, and I saw a meaningful improvement with 4.6 versus 4.5, so I imagine other harnesses will take time to catch up.
I used Claude with a paid subscription and Codex as well, and settled on OpenCode with free models.
I have things to criticize about it, their approach to security and pulling in code being my main one, but over all it’s the most complete solution I’ve found.
They have a server/client architecture, a client SDK, a pretty good web UI and use pretty standard technologies.
The extensibility story is good and just seems like the right paradigms mostly, with agents, skills, plugins and providers.
They also ship very fast, both for good and bad, I’ve personally enjoyed the rapid improvements (~2 days from criticizing not being able to disable the default provider in the web ui to being able to).
I think OpenCode has a pretty bright future and so far I think that my issues with it should be pretty fixable. The amount of tasteful choices they’ve made dwarfs the few untasteful ones for me so far.
Just note that you need to either create any special features yourself or find an implementation by someone else. It’s pretty bare bones by default
"we see occasional complaints about memory issues in opencode
if you have this can you press ctrl+p and then "Write heap snapshot"
Upload here: https://romulus.warg-snake.ts.net/upload
Original post: https://x.com/i/status/2035333823173447885
What it does well: helps context switching by using one window to control many repos with many worktrees each.
What could it do better? It puts AI too much in control. What if I want to edit a function myself in the workspace I'm working on, or select a snippet and refer to it in the prompt? Without that, I feel it's missing a non-negotiable feature.
From architecture to system programming smoothly. We need to nail that.
It has beautified markdown output, many more subagents, and access to free models, unlike claude and codex. Best is opencode with GitHub opus 4.6, but the fun only lasts for a day, then you're out of tokens for a month.
I guess golang is better since we need goroutines that will basically wait for i/o and api calls.
It’s actually “dumber” than any of your suggestions - they just let the agent explore to build up context on its own. “ls” and “grep” are among the most used discovery tools. This works extraordinarily well and is pretty much the standard nowadays because it lets the agent be pretty smart about what context it pulls in.
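A minimal sketch of what such discovery tools amount to, assuming the harness exposes them as plain functions the model can call. The names `list_dir` and `grep` are illustrative and not any particular agent's tool API.

```python
import os
import re
from typing import List, Tuple


def list_dir(path: str) -> List[str]:
    """'ls': the sorted entries of one directory."""
    return sorted(os.listdir(path))


def grep(pattern: str, root: str) -> List[Tuple[str, int, str]]:
    """'grep -rn': (relative path, line number, line) for every match."""
    rx = re.compile(pattern)
    hits: List[Tuple[str, int, str]] = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            full = os.path.join(dirpath, name)
            try:
                with open(full, encoding="utf-8") as fh:
                    for lineno, line in enumerate(fh, 1):
                        if rx.search(line):
                            rel = os.path.relpath(full, root)
                            hits.append((rel, lineno, line.rstrip("\n")))
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
    return hits
```

The agent decides what to list or search for next based on the previous results, so the context it builds up is driven by the task rather than by a precomputed index.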
Since the homelab doesn't really have access to any risky data, I just gave OpenCode full Docker access and connect to it through Tailscale on my iPhone https://github.com/pprotas/homelab