You (and many, many others) likely won't take this threat seriously until adversarial attacks become common. Right now, outside of security researchers' proofs of concept, they're still vanishingly rare.
You ask why I'm obsessed with the danger? That's because I've been tracking prompt injection - and our total failure to find a robust solution for it - for three years now. I coined the name for it!
The only robust solution for it that I trust is effective sandboxing.
I share your worries on this topic.
I've seen you experiment a lot with Python. Do you have a Python-focused sandboxed devcontainer setup for Claude Code / Codex you want to share? Or even a full-stack setup?
Claude's devcontainer setup (https://github.com/anthropics/claude-code/tree/main/.devcont...) is focused on JS with npm.
I wrote a bit about that in a new post this morning, but I'm still looking for an ideal solution: https://simonwillison.net/2025/Sep/30/designing-agentic-loop...
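In the meantime, if you just want a Python starting point, a minimal .devcontainer/devcontainer.json can look something like this - the image tag, runArgs and post-create command here are illustrative placeholders, not a vetted recommendation:

    {
      "name": "python-sandbox",
      "image": "mcr.microsoft.com/devcontainers/python:3.12",
      "runArgs": ["--cap-drop=ALL"],
      "postCreateCommand": "pip install -r requirements.txt"
    }

The hard unsolved part is network egress control, which a plain devcontainer doesn't give you.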
- create a separate Linux user, put it in an 'appshare' group, and set its umask to 002 (default permissions rwxrwxr-x)
- optional: set up some symlinks from its home dir to mine, such as various ~/.config/... paths, so it can use my installed packages, opencode config, etc. I can also give it limited write access with chgrp to appshare and chmod g+w (e.g. Julia's cache)
- optional: set up firewall rules
- if it only needs read-only access to my git history, it can work in a git worktree; I can then make git commits from the worktree with my own account. Or I can chgrp/chown my main working copy. Otherwise it needs a separate checkout. A rough sketch of the whole setup is below.
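If anyone wants to replicate this, here's a sketch of the commands, assuming Debian-style tooling and an agent user literally named 'agent' (every name and path below is a placeholder):

    # dedicated group plus an unprivileged user for the agent
    sudo groupadd appshare
    sudo useradd -m -G appshare agent
    sudo usermod -aG appshare "$USER"   # my own account joins the group too

    # umask 002: new dirs default to rwxrwxr-x, files to rw-rw-r--
    echo 'umask 002' | sudo tee -a /home/agent/.profile

    # optional: symlink a config dir so the agent sees my tooling
    sudo mkdir -p /home/agent/.config
    sudo ln -s "$HOME/.config/opencode" /home/agent/.config/opencode

    # optional: limited write access to one directory (e.g. Julia's cache)
    chgrp -R appshare "$HOME/.julia"
    chmod -R g+w "$HOME/.julia"

    # optional firewall: reject the agent's outbound TCP except HTTP(S)
    sudo iptables -A OUTPUT -m owner --uid-owner agent -p tcp \
        -m multiport ! --dports 80,443 -j REJECT

    # read-only git: hand the agent a worktree (the main repo's .git
    # must also be readable by the appshare group for this to work)
    git worktree add ../project-agent
    chgrp -R appshare ../project-agent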
I actually preferred running stuff in containers to keep my personal system clean anyway, so I like this better than letting Claude use my laptop. I'm also working on hosting devcontainer Claude Code in Kubernetes so I don't need my laptop at all.
https://gitlab.com/txlab/ai/sandcastle/
Check it out if you're feeling experimental - but it's probably better to wait a few weeks until it's more stable.
I feel this is exaggerated.
There are more issues currently being exploited via VS Code extensions than via AI prompt injection, which requires a VERY, VERY complex chain of attack to leak anything.
But that's a very big if. I've seen Claude Code attempt to debug a JavaScript issue by running curl against the jsdelivr URL for a dependency it's using. A supply chain attack against npm (and those aren't exactly rare these days) could plant comments in code served that way which could trigger attacks.
Ever run Claude Code in a folder containing a PDF downloaded from somewhere? There are a ton of tricks for hiding invisible malicious instructions in PDFs.
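One way to check what an agent will actually "see" in a PDF, as opposed to what renders visibly, is to dump its text layer - e.g. with poppler's pdftotext:

    # dump the raw text layer to stdout; white-on-white or zero-size
    # text that's invisible when rendered still shows up here
    pdftotext suspicious.pdf - | less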
I run Claude Code and Codex CLI in YOLO mode sometimes despite this risk because I'm basically crossing my fingers that a malicious attack won't slip in, but I know that's a bad idea and that at some point in the future these attacks will be common enough for the risk to no longer be worth it.
Again, you likely use VS Code. Are you vetting each extension you download? There have already been plenty of reported attacks via VS Code extensions.
There's a lot of noise over hypothetical MCP or tool attacks. That attack surface is very narrow compared to what we already run before even reaching Claude Code.
Yes, Claude Code uses curl, and I find it quite annoying that we can't turn off the internal tools and replace them with MCPs that have filters, for better logging and the ability to proxy/block actions with more in-depth analysis.
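As a partial workaround you can force the traffic through a logging proxy you control. A minimal sketch with mitmproxy, assuming the CLI honors HTTPS_PROXY (Claude Code supports this) - the port and filenames are arbitrary:

    # log every request the agent's built-in tools make
    mitmdump -p 8080 -w claude-flows.mitm &

    # point the Node-based CLI at the proxy and have it trust the CA
    export HTTPS_PROXY=http://127.0.0.1:8080
    export NODE_EXTRA_CA_CERTS="$HOME/.mitmproxy/mitmproxy-ca-cert.pem"
    export CURL_CA_BUNDLE="$HOME/.mitmproxy/mitmproxy-ca-cert.pem"  # for curl subprocesses
    claude

From there you can review claude-flows.mitm after the fact, or write a mitmproxy addon to block requests outright.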
Lots of ways this could happen. To name two: third-party software dependencies, and HTTP requests for documentation (if your agent queries the internet for information).
If you don't believe me, set up a MITM proxy to watch network requests, ask your AI agent to implement PASETO in your favorite programming language, and see if it queries https://github.com/paseto-standard/paseto-spec at all.
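A minimal version of that experiment with mitmproxy (the agent CLI name below is a placeholder, and the agent needs to trust the CA in ~/.mitmproxy/mitmproxy-ca-cert.pem):

    # record everything the agent fetches while it works
    mitmdump -p 8080 -w paseto-test.mitm &

    # point the agent at the proxy, then ask it to implement PASETO
    HTTPS_PROXY=http://127.0.0.1:8080 your-agent-cli

    # afterwards, see whether it ever consulted the spec repo
    mitmdump -nr paseto-test.mitm | grep -i paseto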
This reads more like a buzz article about how it could happen. It's very complicated to exploit compared to classic supply-chain attacks, and very narrow!
The researcher has gotten actual shells on OpenAI machines before via prompt injection.
Nice job coining the name for something, but it’s irrelevant here.
How is someone going to prompt inject my local code repo? I’m not scraping random websites to generate code.
This sort of baseless fear-mongering doesn’t help the wider AI community.
See comment here for more: https://news.ycombinator.com/item?id=45427324
You may think you're not going to be exposed to malicious instructions, but there are so many ways bad instructions might make it into your context.
The fact that you're aware of this is the thing that helps keep you safe!