It's a very silly title for "claude sometimes writes shell scripts to execute commands it has been instructed aren't otherwise accessible"
No, Claude. Do not do that!
This sounds exactly like what anybody working sysops at big banks does to get around change controls. Once you get one RCE into prod, you’re the most efficient man on the block.
> let's use blacklists, an idea conclusively proven never to work
> blacklists don't work
> Post title: rogue AI has jailbroken cursor
Even if you allow just `find` command it can execute arbitrary script. Or even 'npm' command (which is very useful).
If you restrict write calls, by using seccomp for example, you lose very useful capabilities.
Is there a solution other than running on sandbox environment? If yes, please let me know I'm looking for a safe read-only mode for my FOSS project [1]. I had shied away from command blacklisting due to the exact same reason as the parent post.
Here's another "jailbreak": I asked Claude Code to make a NN training script, say, `train.py` and allowed it to run the script to debug it, basically.
As it noticed that some libraries it wanted to use were missing, it just added `pip install` commands to the script. So yeah, if you give Claude an ability to execute anything, it might easily get an ability to execute everything it wants to.
I can totally see a way for such a loop to reach a point where it bypasses a poorly design guardrail (i.e. blacklists) by finding alternatives, based on the things it's previously tried in the same session. There is some degree of generalisation in these models, since they work even on unseen codebases, and with "new" tools (i.e. you can write your own MCP on top of existing internal APIs and the "agents" will be able to use them, see the results and adapt "in context" based on the results).
"Claude has learned" nothing. "Claude can sometimes jailbreak if x or y happens in a session" is something else.
Maybe the models or Cursor should warn you that you've got this vulnerability each time you use it.
There is a huge difference in the mess it can make, for sure.
Folks have regressed back to the 00s.
If the executable is not found the model could simply use whatever else is available to do what it wants to do - like using other interpreted languages, sh -c, symlink, etc. It will eventually succeed unless there is a proper sandbox in place to disallow unlinking of files at syscall level.
What a silly title, for a moment I thought Claude learned to exceed the Cursor quota limit... :s