Where are people finding time for these sorts of projects?
For example, one thing you can do is curate the context of an "immutable" conversation and then reuse it as a base context for other prompts.
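The idea looks roughly like this in runnable Python; the message shape assumes a chat-completions-style API, and the helper name is made up for illustration:

```python
# A curated, "immutable" base context: built once, then reused verbatim
# as the prefix for every follow-up prompt.
BASE_CONTEXT = [
    {"role": "system", "content": "You are reviewing a Rust codebase."},
    {"role": "user", "content": "Here is the project layout: ..."},
    {"role": "assistant", "content": "Understood. Key modules noted."},
]

def with_base(prompt: str) -> list:
    """Return a fresh message list: frozen base context + the new prompt."""
    return [*BASE_CONTEXT, {"role": "user", "content": prompt}]

messages = with_base("Review src/parser.rs for error handling.")
```

Because `with_base` copies rather than appends in place, the base context never accumulates drift from individual conversations.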
My take on a solution for this is https://ossature.dev — .smd spec markdown files + ossature audit / build that gives you DAG orchestration, SHA-traced increments, and tiny focused contexts.
Ossature swaps that for structured SMDs and optional AMDs. Multiple specs build a clean DAG that drops into an editable plan.toml so everything stays traceable without the mess.
Feel free to check the example projects on https://github.com/ossature/ossature-examples
Then just use Python.
Was wondering if using front matter instead of a "custom" encoding for the parseable data was considered?
But in general this is meta to the CLI agent.
So if you were to use the CLI to perform a review of some code, this tool would let you loop the output of that review back onto itself 5 times.
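A rough sketch of that loop; `loop_review` is a made-up name, and the `claude -p` print-mode invocation is my assumption about the CLI, so check your agent's docs before relying on it:

```python
import subprocess

def loop_review(run_agent, initial_prompt: str, passes: int = 5) -> str:
    """Feed the agent's review back into itself `passes` times."""
    review = run_agent(initial_prompt)
    for _ in range(passes):
        review = run_agent(f"Improve this review:\n\n{review}")
    return review

def claude_print_mode(prompt: str) -> str:
    """One-shot, non-interactive call (assumed `claude -p` print mode)."""
    out = subprocess.run(["claude", "-p", prompt],
                         capture_output=True, text=True, check=True)
    return out.stdout

# Usage (not executed here):
# print(loop_review(claude_print_mode, "Review src/ and list issues."))
```

The same shape works with any CLI agent that accepts a prompt and prints a response; only `claude_print_mode` needs swapping out.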
Claude already does that if you ask nicely.
My take? I like it. It's concise enough for me to try it out. And I love the webpage.
[1] https://github.com/rjcorwin/cook/blob/main/no-code/SKILL.md
Might work out fine on codex.
[0]: https://news.ycombinator.com/item?id=47262711 [1]: https://getcook.dev
I usually spawn 1 Mother Agent in a star topology with 3 subagents (Planner, Reviewer, Implementer) and then let them talk using Claude's built-in agent tool. But the best part, I think, is that a "do-nothing" setup wizard is part of the workflow.
https://github.com/mizioandOrg/claude-planner-reviewer-imple...
Did you have success running things in a pipeline and being asked for input in scenarios where agent->human input is needed?
In my setup there are two planes — manager and worker. On the manager plane, all primary agents form a mesh with p2p communication. Each designer connects to 1 or more workers in a star topology, since workers may have questions or get blocked while executing a plan.
The limitation of the built-in agent tool is it doesn't allow nested subagent spawning. But it's normal for a designer or researcher to need subagents — when a plan is done, I use a plan-review-leader agent to review it. If you try mother → planner → plan-review-leader → plan-vs-reality-validator, the nesting gets deep fast and blocks your manager from doing other work.
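One workaround for that nesting limit can be sketched as a toy model (agent names taken from the comment above; the dispatch logic is mine): each agent hands follow-up tasks back to the single manager instead of spawning its own children, so the topology stays one level deep and the manager is never blocked inside a nested call:

```python
from collections import deque

# Each agent declares which follow-up agents its output needs.
# Instead of planner spawning plan-review-leader itself (nested spawning,
# which the built-in agent tool forbids), the manager dispatches everything.
PIPELINE = {
    "planner": ["plan-review-leader"],
    "plan-review-leader": ["plan-vs-reality-validator"],
    "plan-vs-reality-validator": [],
}

def run_manager(start: str) -> list:
    """Dispatch tasks breadth-first from a single manager plane."""
    done, queue = [], deque([start])
    while queue:
        agent = queue.popleft()
        done.append(agent)             # stand-in for actually running the agent
        queue.extend(PIPELINE[agent])  # follow-ups return to the manager queue
    return done

order = run_manager("planner")
```

The chain still runs in the right order, but every hop is a fresh top-level spawn rather than a nested one.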
I wrote a blog post about this yesterday: https://dev.to/neil_agentic/ttal-more-than-a-harness-enginee...
The daemon is the Telegram bridge, the tmux router, the CI status deliverer, and the cleanup coordinator, all in one process. Unlike MoMa, which basically corresponds to a single manager and is similar to your plan-review-leader agent living in the manager plane if that agent were isolated, it allows communication across star topologies.
My earlier concern was that you might face a timeout if user input were needed during a pipeline run and the user were too slow to provide an answer through Telegram (I imagine during the night), but maybe even GitHub pipelines can be set to wait indefinitely.
I really like the setup. I faced exactly the same limit: Anthropic's built-in agent tooling for Claude Code doesn't allow nested agent spawning, which is what dictates the star topology in the first place.
I use a git worktree for every MoMa agent as well, and they all live in Linux screen sessions. Maybe I should consider switching from screen to tmux myself, since, as I understand it, all your agents in the top-level manager plane are also just tmux sessions.
On permissions: by default, when it runs instances of Claude, they inherit your Claude permissions. So if there is no permission to `rm -rf /`, Claude will just get denied and move on. With the Docker sandbox option (see the bottom of the page), it runs inside the sandbox with `--dangerously-skip-permissions` and gets more stuff done (my preferred option). The hard part is that you then need to set up the Docker sandbox with any dependencies your project needs: run `cook init` and edit the `.cook/Dockerfile` to set those up.
Until it doesn't, and it finds a way to work around the restriction. There are plenty of stories about that.
I wish that article went into more detail about that attack. But I believe it, given the extent to which permissions are easy to get wrong in your Claude settings. For example: https://www.youtube.com/watch?v=3CSi8QAoN-s&lc=UgwFNAh5fvDGJ...
* coolers whirring, gpus on fire, tokens flying, investors happy, developer goes for 6th break of the day
The way to think of it is: you tell Claude to tackle the problem 3 times; each time, it may or may not take a different approach, fixing or improving on things it did previously.
I recently made a sort of autoresearch loop with that approach. The script calls Claude Code to create a hypothesis, then write code based on it, then evaluate; rinse and repeat. I am still trying to figure out whether I am actually on to something or just burning tokens. The jury is still out.
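That loop could be sketched like this; `ask` stands in for whatever call is made to Claude Code and `evaluate` for the scoring step (both names are illustrative, not from the actual script):

```python
def autoresearch(ask, evaluate, rounds: int = 3):
    """Hypothesize -> implement -> evaluate, rinse and repeat.

    `ask(prompt)` returns the agent's text; `evaluate(code)` returns a
    numeric score. Keeps the best-scoring attempt seen so far.
    """
    best, history = None, []
    for _ in range(rounds):
        hypothesis = ask("Propose a hypothesis given: " + repr(history))
        code = ask("Write code testing this hypothesis: " + hypothesis)
        score = evaluate(code)
        history.append((hypothesis, score))  # feed past results forward
        if best is None or score > best[1]:
            best = (code, score)
    return best
```

Passing the score history back into the next hypothesis prompt is what makes it a loop rather than three independent shots; whether the model actually uses that feedback is, as the comment says, still an open question.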
I haven't used Python much, but I wouldn't be surprised if you can set up a sufficiently powerful REPL with it. I know Julia can do it very well, and it's a very similar language. Obviously there are powerful Lisps that do this very well too.
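For what it's worth, Python's standard library does ship an embeddable REPL in the `code` module, so a minimal version is easy (whether it is "sufficiently powerful" for a given workflow is another matter):

```python
import code

# code.InteractiveConsole runs a read-eval-print loop over any
# namespace you hand it; console.interact() would take over the tty,
# but push() lets you drive it programmatically.
namespace = {"answer": 42}
console = code.InteractiveConsole(locals=namespace)

console.push("doubled = answer * 2")  # executed inside the REPL
print(namespace["doubled"])           # the REPL mutates the shared namespace
```

Because the console shares a plain dict with the host program, you can inspect or seed state from either side, which is most of what a Julia- or Lisp-style embedded REPL buys you.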
My company’s tracking how much we use the damn thing, so I can’t avoid it. Its autocomplete is literally less useful than standard VSCode; the only time it’s consistently good is when it sees me do one thing to a line, sees repeated similar lines after it, and suggests the same edit on the next one, one at a time, and that’s only useful to me because I’ve never actually bothered to learn how to properly use a text editor. Even on codebases in the hundreds of lines it’s OOM-killing things on my 16GB laptop: it, plus goddamn Teams, were eating half the memory by themselves the other day, with Cursor sitting at almost 6GB alone. JFC. On the plus side, if this is what software from a company that should be full of experts at using these things looks like, I guess our jobs are safe from them, though not from recession and the ZIRP unwinding.
My 2 cents on the dagu.sh website, it should lead with the demo section (https://docs.dagu.sh/overview/#demo). That helped me connect what it was and how I might use it.
In general, I feel that removing the decision process (or relegating it to a language model) is not a good idea.
Then I use cook to iterate and explore during the AI led parts.
1. How do you handle worktree merge conflicts and/or integration validation issues?
2. Can I work straight from a list of requirements? I think I saw you support it…
3. I have my variant write a minimal explainer for every satisfied spec, aka receipts. It's pretty great: I often review the receipts, and if one is imperfect, I mark it as NEEDS_REWORK + notes, and it'll eventually just pick that up on a future iteration.
I also have a similar, yet different, approach: a Mother Agent (MoMa) planner-reviewer-implementer multi-agent pattern that orchestrates a feedback loop using Claude memory between agents.
https://news.ycombinator.com/item?id=47437012#47437013
I understood that you have a Judge agent that evaluates independent subagent solo executions and chooses the best solution based on the Ralph algorithm. Did you play with limits on how many solo agents it's worth spawning before you stop getting a better solution? And what is the limit on the number of solo-agent solutions the Judge can compare? (Obviously it must depend on the complexity and context cost of a solution.)
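Stripped to a toy sketch, the judge step is best-of-N selection (`solve` and `judge` here are hypothetical stand-ins; a real judge agent would compare full solutions inside one context window rather than score them independently):

```python
def best_of_n(solve, judge, n: int = 3):
    """Spawn `n` independent solo attempts and let a judge pick the winner.

    `solve(i)` produces attempt i's solution; `judge(solution)` returns a
    comparable score. The context-cost question in the comment above is
    exactly the cap on how large `n` can get before the judge can no
    longer hold all candidates at once.
    """
    attempts = [solve(i) for i in range(n)]
    return max(attempts, key=judge)
```

With independent attempts, returns diminish roughly as the chance of a new attempt beating the current best shrinks, which is why the "how many is enough" question has no fixed answer.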
I understand that part of the reason is that many of these harnesses are vibe-coded, so plenty is lost in terms of optimization. And, well, because LLMs code best in TypeScript.