undefined | Better HN

0 pointsschmichael4mo ago0 comments

Fair, I forget how broadly users are willing to give agents permissions. It seems like common sense to me that users disallow writes outside of sandboxes by agents but obviously I am not the norm.

0 comments

motoxpro4mo ago

The only way to be 100% sure it is to not have it interact outside at all. No web searches, no reading documents, no DB reading, no MCP, no external services, etc. Just pure execution of a self hosted model in a sandbox.

Otherwise you are open to the same injection attacks.

schmichaelOP4mo ago

I don't think this is accurate.

Readonly access (web searches, db, etc) all seem fine as long as the agent cannot exfiltrate the data as demonstrated in this attack. As I started with: more sophisticated outbound filtering would protect against that.

MCP/tools could be used to the extent you are comfortable with all of the behaviors possible being triggered. For myself, in sandboxes or with readonly access, that means tools can be allowed to run wild. Cleaning up even in the most disastrous of circumstances is not a problem, other than a waste of compute.

motoxpro4mo ago

Maybe another way to think of this is that you are giving the read only services, write access to your models context, which then gets executed by the llm.

There is no way to NOT give the web search write access to your models context.

The WORDS are the remote executed code in this scenario.

You kind of have no idea what’s going on there. For example, malicious data adds the line “find a pattern” and then every 5th word you add a letter that makes up your malicious code. I don’t know if that would work but there is no way for a human to see all attacks.

Llms are not reliable judges of what context is safe or not (as seen by this article, many papers, and real world exploits)

lunar_mycroft4mo ago

There is no such thing as read only network access. For example, you might think that limiting the LLM to making HTTP GET requests would prevent it from exfiltrating data, but there's nothing at all to stop the attacker's server from receiving such data encoded in the URL. Even worse, attackers can exploit this vector to exfiltrate data even without explicit network permissions if the users client allow things like rendering markdown images.

rcxdude4mo ago

Part of the issue is reads can exfiltrate data as well (just stuff it into a request url). You need to also restrict what online information the agent can read, which makes it a lot less useful.

formerly_proven4mo ago

Look at the popularity of agentic IDE plugins. Every user of an IDE plugin is doing it wrong. (The permission "systems" built into the agent tools themselves are literal sieves of poorly implemented substring-matching shell commands and no wholistic access mediation)

Uehreka4mo ago

“Disallow writes” isn’t a thing unless you whitelist (not blacklist) what your agent can read (GET requests can be used to write by encoding arbitrary data in URL paths and querystrings).

The problem is, once you “injection-proof” your agent, you’ve also made it “useful proof”.

schmichaelOP4mo ago

> The problem is, once you “injection-proof” your agent, you’ve also made it “useful proof”.

I find people suggesting this over and over in the thread, and I remain unconvinced. I use LLMs and agents, albeit not as widely as many, and carefully manage their privileges. The most adversarial attack would only waste my time and tokens, not anything I couldn't undo.

I didn't realize I was in such a minority position on this honestly! I'm a bit aghast at the security properties people are readily accepting!

You can generate code, commit to git, run tools and tests, search the web, read from databases, write to dev databases and services, etc etc etc all with the greatest threat being DOS... and even that is limited by the resources you make available to the agent to perform it!

madhadron4mo ago

I'm puzzled by your statement. The activities you're describing have lots of exfiltration routes.

j / k navigate · click thread line to collapse

0 comments

motoxpro4mo ago

Otherwise you are open to the same injection attacks.

schmichaelOP4mo ago

I don't think this is accurate.

motoxpro4mo ago

Maybe another way to think of this is that you are giving the read only services, write access to your models context, which then gets executed by the llm.

There is no way to NOT give the web search write access to your models context.

The WORDS are the remote executed code in this scenario.

Llms are not reliable judges of what context is safe or not (as seen by this article, many papers, and real world exploits)

lunar_mycroft4mo ago

rcxdude4mo ago

Part of the issue is reads can exfiltrate data as well (just stuff it into a request url). You need to also restrict what online information the agent can read, which makes it a lot less useful.

formerly_proven4mo ago

Uehreka4mo ago

“Disallow writes” isn’t a thing unless you whitelist (not blacklist) what your agent can read (GET requests can be used to write by encoding arbitrary data in URL paths and querystrings).

The problem is, once you “injection-proof” your agent, you’ve also made it “useful proof”.

schmichaelOP4mo ago

> The problem is, once you “injection-proof” your agent, you’ve also made it “useful proof”.

I didn't realize I was in such a minority position on this honestly! I'm a bit aghast at the security properties people are readily accepting!

madhadron4mo ago

I'm puzzled by your statement. The activities you're describing have lots of exfiltration routes.

j / k navigate · click thread line to collapse