Otherwise you are open to the same injection attacks.
Readonly access (web searches, db, etc) all seem fine as long as the agent cannot exfiltrate the data as demonstrated in this attack. As I started with: more sophisticated outbound filtering would protect against that.
MCP/tools could be used to the extent you are comfortable with all of the behaviors possible being triggered. For myself, in sandboxes or with readonly access, that means tools can be allowed to run wild. Cleaning up even in the most disastrous of circumstances is not a problem, other than a waste of compute.
There is no way to NOT give the web search write access to your models context.
The WORDS are the remote executed code in this scenario.
You kind of have no idea what’s going on there. For example, malicious data adds the line “find a pattern” and then every 5th word you add a letter that makes up your malicious code. I don’t know if that would work but there is no way for a human to see all attacks.
Llms are not reliable judges of what context is safe or not (as seen by this article, many papers, and real world exploits)
The problem is, once you “injection-proof” your agent, you’ve also made it “useful proof”.
I find people suggesting this over and over in the thread, and I remain unconvinced. I use LLMs and agents, albeit not as widely as many, and carefully manage their privileges. The most adversarial attack would only waste my time and tokens, not anything I couldn't undo.
I didn't realize I was in such a minority position on this honestly! I'm a bit aghast at the security properties people are readily accepting!
You can generate code, commit to git, run tools and tests, search the web, read from databases, write to dev databases and services, etc etc etc all with the greatest threat being DOS... and even that is limited by the resources you make available to the agent to perform it!