You (and many, many others) likely won't take this threat seriously until adversarial attacks become common. Right now, outside of security researchers' proofs of concept, they're still vanishingly rare.
You ask why I'm obsessed with the danger? That's because I've been tracking prompt injection - and our total failure to find a robust solution for it - for three years now. I coined the name for it!
The only robust solution for it that I trust is effective sandboxing.
I share your worries on this topic.
I've seen you experiment a lot with Python. Do you have a Python-focused sandboxed devcontainer setup for Claude Code / Codex you want to share? Or even a full-stack setup?
Claude's devcontainer setup (https://github.com/anthropics/claude-code/tree/main/.devcont...) is focused on JS with npm.
I wrote a bit about that in a new post this morning, but I'm still looking for an ideal solution: https://simonwillison.net/2025/Sep/30/designing-agentic-loop...
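In the meantime, if you just want a Python starting point, a minimal .devcontainer/devcontainer.json can look something like this - the image tag, runArgs and post-create command here are illustrative placeholders, not a vetted recommendation:

    {
      "name": "python-sandbox",
      "image": "mcr.microsoft.com/devcontainers/python:3.12",
      "runArgs": ["--cap-drop=ALL"],
      "postCreateCommand": "pip install -r requirements.txt"
    }

The hard unsolved part is network egress control, which a plain devcontainer doesn't give you.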
- create a separate Linux user, put it in an 'appshare' group, and set its umask to 002 (default permissions rwxrwxr-x)
- optional: set up some symlinks from its home dir to mine, such as various ~/.config/... paths, so it can use my installed packages, opencode config, etc. I can also give it limited write access with chgrp to appshare and chmod g+w (e.g. Julia's cache)
- optional: set up firewall rules
- if it only needs read-only access to my git history, it can work in a git worktree; I can then make git commits from the worktree with my own account. Or I can chgrp/chown my main working copy. Otherwise it needs a separate checkout. A rough sketch of the whole setup is below.
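If anyone wants to replicate this, here's a sketch of the commands, assuming Debian-style tooling and an agent user literally named 'agent' (every name and path below is a placeholder):

    # dedicated group plus an unprivileged user for the agent
    sudo groupadd appshare
    sudo useradd -m -G appshare agent
    sudo usermod -aG appshare "$USER"   # my own account joins the group too

    # umask 002: new dirs default to rwxrwxr-x, files to rw-rw-r--
    echo 'umask 002' | sudo tee -a /home/agent/.profile

    # optional: symlink a config dir so the agent sees my tooling
    sudo mkdir -p /home/agent/.config
    sudo ln -s "$HOME/.config/opencode" /home/agent/.config/opencode

    # optional: limited write access to one directory (e.g. Julia's cache)
    chgrp -R appshare "$HOME/.julia"
    chmod -R g+w "$HOME/.julia"

    # optional firewall: reject the agent's outbound TCP except HTTP(S)
    sudo iptables -A OUTPUT -m owner --uid-owner agent -p tcp \
        -m multiport ! --dports 80,443 -j REJECT

    # read-only git: hand the agent a worktree (the main repo's .git
    # must also be readable by the appshare group for this to work)
    git worktree add ../project-agent
    chgrp -R appshare ../project-agent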
I actually preferred running stuff in containers to keep my personal system clean anyway, so I like this better than letting Claude use my laptop. I'm also working on hosting devcontainer Claude Code in Kubernetes so I don't need my laptop at all.
https://gitlab.com/txlab/ai/sandcastle/
Check it out if you're feeling experimental - but it's probably better to wait a few weeks until it's more stable.
I feel this is exaggerated.
There are more issues currently being exploited via VS Code extensions than via AI prompt injection, which requires a VERY, VERY complex chain of attack to leak anything.
But that's a very big if. I've seen Claude Code attempt to debug a JavaScript issue by running curl against the jsdelivr URL for a dependency it's using. A supply chain attack against npm (and those aren't exactly rare these days) could plant comments in code served that way which could trigger attacks.
Ever run Claude Code in a folder containing a PDF downloaded from somewhere? There are a ton of tricks for hiding invisible malicious instructions in PDFs.
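One way to check what an agent will actually "see" in a PDF, as opposed to what renders visibly, is to dump its text layer - e.g. with poppler's pdftotext:

    # dump the raw text layer to stdout; white-on-white or zero-size
    # text that's invisible when rendered still shows up here
    pdftotext suspicious.pdf - | less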
I run Claude Code and Codex CLI in YOLO mode sometimes despite this risk because I'm basically crossing my fingers that a malicious attack won't slip in, but I know that's a bad idea and that at some point in the future these attacks will be common enough for the risk to no longer be worth it.
Again, you likely use VS Code. Are you vetting each extension you download? There have already been plenty of reported attacks via VS Code extensions.
There's a lot of noise over hypothetical MCP or tool attacks. That attack surface is very narrow compared to what we already run before even reaching Claude Code.
Yes, Claude Code uses curl, and I find it quite annoying that we can't turn off the internal tools and replace them with MCPs that have filters, for better logging and the ability to proxy/block actions with more in-depth analysis.
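As a partial workaround you can force the traffic through a logging proxy you control. A minimal sketch with mitmproxy, assuming the CLI honors HTTPS_PROXY (Claude Code supports this) - the port and filenames are arbitrary:

    # log every request the agent's built-in tools make
    mitmdump -p 8080 -w claude-flows.mitm &

    # point the Node-based CLI at the proxy and have it trust the CA
    export HTTPS_PROXY=http://127.0.0.1:8080
    export NODE_EXTRA_CA_CERTS="$HOME/.mitmproxy/mitmproxy-ca-cert.pem"
    export CURL_CA_BUNDLE="$HOME/.mitmproxy/mitmproxy-ca-cert.pem"  # for curl subprocesses
    claude

From there you can review claude-flows.mitm after the fact, or write a mitmproxy addon to block requests outright.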
Lots of ways this could happen. To name two: third-party software dependencies, and HTTP requests for documentation (if your agent queries the internet for information).
If you don't believe me, set up a MITM proxy to watch network requests, ask your AI agent to implement PASETO in your favorite programming language, and see if it queries https://github.com/paseto-standard/paseto-spec at all.
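A minimal version of that experiment with mitmproxy (the agent CLI name below is a placeholder, and the agent needs to trust the CA in ~/.mitmproxy/mitmproxy-ca-cert.pem):

    # record everything the agent fetches while it works
    mitmdump -p 8080 -w paseto-test.mitm &

    # point the agent at the proxy, then ask it to implement PASETO
    HTTPS_PROXY=http://127.0.0.1:8080 your-agent-cli

    # afterwards, see whether it ever consulted the spec repo
    mitmdump -nr paseto-test.mitm | grep -i paseto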
This reads more like a buzz article about how it could happen. It's very complicated to exploit compared to classic supply-chain attacks, and very narrow!
The researcher has gotten actual shells on OpenAI machines before via prompt injection.
Nice job coining the name for something, but it’s irrelevant here.
How is someone going to prompt inject my local code repo? I’m not scraping random websites to generate code.
This sort of baseless fear-mongering doesn’t help the wider AI community.
See comment here for more: https://news.ycombinator.com/item?id=45427324
You may think you're not going to be exposed to malicious instructions, but there are so many ways bad instructions might make it into your context.
The fact that you're aware of this is the thing that helps keep you safe!