It's not a one off issue - it has happened to me a few times. It has once even force pushed to github, which doesn't allow branch protection for private personal projects. Here's an example.
1) claude will stash (despite clear instructions never to do so).
2) claude will use sed to bulk replace (despite clear instructions never to do so). The sed replacements make a mess and touch far too many files.
3) claude restores the stash. Finds a lot of conflicts. Nothing runs.
4) claude decides it can't fix the problem and does a reset hard.
I have this right at the top of my CLAUDE.md and it makes things better, but unlike codex, claude doesn't follow it to the letter. However, it has become a lot better now.
NEVER USE sed TO BULK REPLACE.
*NEVER USE FORCE PUSH OR DESTRUCTIVE GIT OPERATIONS*: `git push --force`, `git push --force-with-lease`, `git reset --hard`, `git clean -fd`, or any other destructive git operations are ABSOLUTELY FORBIDDEN. Use `git revert` to undo changes instead.
When will you all learn that merely "telling" an LLM not to do something won't deterministically prevent it from doing that thing? If you truly want it to never use those commands, you better be prepared to sandbox it to the point where it is completely unable to do the things you're trying to stop.
Even worse, explicitly telling it not to do something makes it more likely to do it. It's not intelligent. It's a probability machine writ large. If you say "don't git push --force", that command is now part of the context window, dramatically raising the probability of it being "thought" about and appearing in the output.
Like you say, the only way to stop it from doing something is to make it impossible for it to do so. Shove it in a container. Build LLM safe wrappers around the tools you want it to be able to run so that when it runs e.g. `git`, it can only do operations you've already decided are fine.
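To make the wrapper idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `REAL_GIT` path, the allow-list contents, and the function names are hypothetical, not anyone's actual setup. It denies by default, so anything not explicitly listed (stash, reset, clean, push) is refused.

```python
import subprocess

# Assumption: the real git binary lives here and is NOT on the agent's PATH.
REAL_GIT = "/usr/bin/git"

# Hypothetical allow-list: the only subcommands the agent may run.
ALLOWED_SUBCOMMANDS = {"status", "diff", "log", "show", "add", "commit", "revert"}

# Belt and braces: flags refused even on an allowed subcommand.
FORBIDDEN_FLAGS = {"--force", "-f", "--force-with-lease", "--hard"}


def is_allowed(argv):
    """Return True if the arguments after `git` are on the allow-list."""
    # Deny by default: empty calls, global options such as -C, and any
    # subcommand not explicitly listed are all refused.
    if not argv or argv[0] not in ALLOWED_SUBCOMMANDS:
        return False
    # Compare only the flag name, so `--flag=value` forms are caught too.
    return not any(arg.split("=", 1)[0] in FORBIDDEN_FLAGS for arg in argv[1:])


def run(argv):
    """Run the real git if permitted; otherwise refuse with exit code 1."""
    if not is_allowed(argv):
        print("git wrapper: refused: git " + " ".join(argv))
        return 1
    return subprocess.call([REAL_GIT, *argv])
```

Installed as the only `git` on the agent's PATH, the script would call `run(sys.argv[1:])`. It only holds if the agent has no other route to the real binary, which is why it belongs inside a container rather than replacing one.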
Even even worse, angry all-caps shouting will make it more stupid, because it pushes you into a significantly stupider vector subspace full of angry all-caps shouting. The only thing that can possibly save you then is if you land in the even tinier Film Crit Hulk sub-subspace.
I touch on this a bit in the piece I wrote for normies, it helped a lot of people I know understand the tech a bit better.
Is this true for anything beyond the simplest LLM architectures? It seems like as soon as you introduce something like CoT this is no longer the case, at least in terms of mechanism, if not outcome.
This is true for prohibitions but claude.md works really well as positive documentation. I run custom mcp servers and documenting what each tool does and when to use it made claude pick the right ones way more reliably. Totally different outcome than a list of NEVER DO THIS rules though, for that you definitely need hooks or sandboxing.
Yes but this is probabilistic. Skill, documentation etc help by giving it the information it needs. You are then in the more correct probability distribution. Fine for docs, tips etc, but not good enough for mandatory things.
Agree completely. The middle ground between "please don't" and full sandboxing: run a validation script between agent steps. The agent writes code, a regex check catches banned patterns, the agent has to fix them before it can proceed. Sandboxing controls what the agent can do. Output validation controls what it gets to keep. Both are more reliable than prompt instructions.
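A rough sketch of what such a check between steps could look like, in Python. The pattern list and the function name `find_violations` are made up for illustration, and regexes like these are easy to evade, so treat it as a tripwire rather than a guarantee.

```python
import re

# Hypothetical banned patterns, keyed by a label reported back to the agent.
BANNED_PATTERNS = {
    "force push": re.compile(r"git\s+push\b.*(--force\b|\s-f\b)"),
    "hard reset": re.compile(r"git\s+reset\b.*--hard\b"),
    "in-place sed": re.compile(r"\bsed\b.*\s-i\b"),
}


def find_violations(text):
    """Return (line_number, label) for every banned pattern found in text."""
    violations = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in BANNED_PATTERNS.items():
            if pattern.search(line):
                violations.append((lineno, label))
    return violations
```

The orchestrating loop would run this over whatever the agent produced (a script, a diff, a proposed command) and refuse to move to the next step while the returned list is non-empty.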
That’s right, because we’re not developers anymore— we orchestrate writhing piles of insane noobs that generally know how to code, but have absolutely no instinct or common sense. This is because it’s cheaper per pile of excreted code while this is all being heavily subsidized. This is the future and anyone not enthusiastically onboard is utterly foolish.
My point is exactly that you need safeguards. (I have VMs per project, reduced command availability etc). But those details are orthogonal to this discussion.
However "Telling" has made it better, and generally the model itself has become better. Also, I've never faced a similar issue in Codex.
What do you mean? By default, Claude asks for permission for every file read, every edit, every command. It gets exhausting, so many people run it with `--dangerously-skip-permissions`.
It does not ask for permission for every file read, only those outside the project and not explicitly allowed. You can bypass project edit permission requests with “allow edits”, no need for “dangerously skip permissions”. Bash commands are harder, but you can allow-list them up to a point.
> so many people run it with `--dangerously-skip-permissions`
It's on the people then, not the "agent". But why doesn't Claude come with a decent allow list, or at least remember what the user allows, so the spam is reduced?
You have the option to "always allow command `x.*`", but even then. The more control you hand over to these things, the more powerful and useful (and dangerous) they become. It's a real dilemma and yet to be solved.
I use a script wrapper for git in my PATH for claude, but as you correctly said, I'm not sure claude will never start a new zsh with a different PATH....
Why do you expect that a weighted random text generator will ever behave in predictable way?
How can people be so naive as to run something like Claude anywhere other than in a strictly locked down sandbox that has no access to anything but the single git repo they are working on (and certainly no creds to push code)?
This is absolutely insane behavior that you would give Claude access to your GitHub creds. What happens when it sees a prompt injection attack somewhere and exfiltrates all of your creds or wipes out all of your repos?
I can't believe how far people have fallen for this "AI" mania. You are giving a stochastic model that is easily misdirected the keys to all of your productive work.
I can understand the appeal to a degree, that it can seem to do useful work sometimes.
But even so, you can't trust it with anything. Not running it in a locked-down container that has no access to anything but a Git repo, with all important history stored elsewhere, seems crazy.
Shouting harder and harder at the statistical model might give you a higher probability of avoiding the bad behavior, but no guarantee; actually lock down your random text generator properly if you want to avoid it causing you problems.
And of course, given that you've seen how hard it is to get it to follow these instructions properly, you are reviewing every line of output code thoroughly, right? Because you can't trust that either.
> How can people be so naive as to run something like Claude anywhere other than in a strictly locked down sandbox that has no access to anything but the single git repo they are working on (and certainly no creds to push code)?
> This is absolutely insane behavior that you would give Claude access to your GitHub creds. What happens when it sees a prompt injection attack somewhere and exfiltrates all of your creds or wipes out all of your repos?
I don’t understand why people are so chill about doing this. I have AI running on a dedicated machine which has absolutely no access to any of my own accounts/data. I want that stuff hardware isolated. The AI pushes up work to a self-hosted Gitea instance using a low-permission account. This setup is also nice because I can determine provenance of changes easily.
> How can people be so naive as to run something like Claude anywhere other than in a strictly locked down sandbox that has no access to anything but the single git repo they are working on (and certainly no creds to push code)?
Because it’s insanely useful when you give it access, that’s why. They can do way more tasks than just write code. They can make changes to the system, set up and configure routers and network gear, probe all the IoT devices in the network, set up DNS, you name it: anything that is text or has a CLI is fair game.
The models absolutely make catastrophic fuckups though and that is why we’ll have to both better train the models and put non-annoying safeguards in front of them.
Running them in isolated computers that are fully air gapped, require approval for all reads and writes, and can only operate inside directories named after colors of the rainbow is not a useful suggestion. I want my cake and I want to eat it too. It’s far too useful to give these tools some real access.
It doesn’t make me naive or stupid to hand the keys over to the robot. I know full well what I’m getting myself into and the possible consequences of my actions. And I have been burned but I keep coming back because these tools keep getting better and they keep doing more and more useful things for me. I’m an early adopter for sure…
Well, one of the other reasons I suggest running it in a strictly limited container is that you can then run it in yolo mode.
In fact, I use the pi agent, which doesn't have command sandboxing, it's always in yolo mode, I just run it in a container and then I get the benefit of not having to confirm every command, while strictly controlling what I share with it from the beginning of the session.
> How can people be so naive as to run something like Claude anywhere other than in a strictly locked down sandbox that has no access to anything but the single git repo they are working on (and certainly no creds to push code)?
Because it is much easier to do and failure rate is quite low.
> It has once even force pushed to github, which doesn't allow branch protection for private personal projects.
Branch protection on private repos is only unavailable on *fully free* accounts; the feature just requires a paid Pro account. That starts around $4 USD/month, which sounds worth it to prevent lost work from a runaway tool.
Reinforcing an avoidance tactic is nowhere near as effective as doing that PLUS enforcing a positive tactic. People with loads of 'DONT', 'STOP', etc. in their instructions have no clue what they're doing.
In your own example you have all this huge emphasis on the negatives, and then the positive is a tiny un-emphasized afterthought.
I think you're generally correct, but certainly not definitively, and I worry the advice and tone isn't helpful in this instance with an outcome of this magnitude.
(more loosely: I'm a big proponent of this too, but it's a helluva hot take; how one positively frames "don't blow away the effing repo" isn't intuitive at all)
The trick is to explain why something is important, not just to emphasize it. For instance:
"As an LLM, when Claude uses 'sed', it can quickly and easily break files that are difficult for the user to fix. Claude must be aware that an LLM's actions seem effortless to it, but to the user they represent hours of work getting things back in order."
Claude tends to disregard "NEVER do X" quite often, but funnily enough, if you tell it "Always ask me to confirm before doing X", it never fails to ask you. And you can deny it every time.
There are plenty of examples in the RL training showing it how and when to prompt the human for help or additional information. This is even a common tool in the "plan" mode of many harnesses.
Conversely, it's much harder to represent a lack of doing something
$ yoloai apply bugfix
Target: /home/ks/tmp/b64
Commits to apply (1):
9db260b33bcd Fix bit mask in base64 encoding
Apply to /home/ks/tmp/b64? [y/N] y
1 commit(s) applied to /home/ks/tmp/b64
Now the commit claude made inside the sandbox has been applied to my workdir:
$ git log
commit 5b0fc3a237efe8bbc9a9e1a05f9ce45d37d38bfa (HEAD -> main)
Author: Karl Stenerud <kstenerud@gmail.com>
Date: Mon Mar 30 05:28:21 2026 +0000
Fix bit mask in base64 encoding
Corrected the bit mask for the first character extraction from 0x3E to 0x3F to properly extract all 6 bits.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
commit 31e12b62b0c3179f3399521d7c4326a8f6130721 (tag: init)
The important thing here is that Claude was not able to reach anything on the network except its own API, and nothing it did ever touched my work dir until I was happy with the changes and applied them.
It also doesn't get access to my credentials, so it couldn't push even if it did have network access.
I've recently implemented hooks that make it impossible for Claude to use tools that I don't want it to use. You could consider setting up a hook that errors if it makes an unsafe use of sed (or any use of sed, if there are safer tools).
Even just last week I auto approved a plan and it even wrote the commit message for me (with @ClaudeCode signed off) which I am grateful my manager did not see.
Maybe stop using the CLAUDE.md to prevent it from running tools you don't want it to and just set up a PreToolUse hook that blocks any command you don't want.
It's trivial to set up, and you could literally ask claude to do it for you and never have any of these issues ever again.
Any and all "I don't want it to ever run this command" issues are just skill issues.
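For anyone who hasn't written one, a hedged sketch of what such a PreToolUse hook might look like in Python. It assumes, as I understand the Claude Code hooks documentation, that the hook receives a JSON event on stdin with `tool_name` and `tool_input` fields and that exit code 2 blocks the call with stderr fed back to the model; check the current docs before relying on either. The block list and the names `BLOCKED`, `decide`, and `main` are illustrative only.

```python
import json
import re
import sys

# Hypothetical block list: (pattern, reason shown to the model).
BLOCKED = [
    (re.compile(r"\bgit\s+stash\b"), "git stash is not allowed"),
    (re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"), "force push is not allowed"),
    (re.compile(r"\bgit\s+reset\b.*--hard\b"), "git reset --hard is not allowed"),
    (re.compile(r"\bgit\s+clean\b"), "git clean is not allowed"),
    (re.compile(r"\bsed\b"), "sed is not allowed; use the edit tool"),
]


def decide(event):
    """Return a refusal reason for a blocked Bash command, else None."""
    # Assumption: shell commands arrive as tool_name "Bash" with the
    # command string under tool_input.command.
    if event.get("tool_name") != "Bash":
        return None
    command = event.get("tool_input", {}).get("command", "")
    for pattern, reason in BLOCKED:
        if pattern.search(command):
            return reason
    return None


def main(stdin=sys.stdin):
    """Read one hook event from stdin; return the exit code to use."""
    reason = decide(json.load(stdin))
    if reason:
        print(reason, file=sys.stderr)
        return 2  # assumed to mean "block this tool call"
    return 0
```

The hook script would end with `sys.exit(main())` and be registered for PreToolUse in the settings file. Unlike a CLAUDE.md rule, this runs outside the model, so it doesn't depend on the model choosing to comply.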