I really want more security people to get involved in the LLM space because everyone seems to have just lost their minds.
Looked at through a security lens, this thing is horrifying, which is why it was so frustrating when Anthropic changed their TOS to ban the use of alternative clients with a subscription. I don't want to use that Swiss cheese.
[0] assuming a human with security training was involved in the design/prompting of the sandbox development.
[1] Claude has well-used mechanisms for asking the user before taking potentially dangerous actions. Why that isn't part of the "disable my own SANDBOX" branches of the code is confusing.
https://github.com/anthropic-experimental/sandbox-runtime/is...
I ended up making my own sandbox wrapper instead: https://github.com/arianvp/landlock-nix
> The restrictive policy was designed with these goals in mind:
> 1. No bypass of security by executing programs via ld.so.
> 2. Anything requesting execution must be trusted.
One correction on the table: SELinux and AppArmor shouldn't be grouped under "rename-resistant: No". AppArmor is path-based, but SELinux labels live on the inode, so a rename doesn't change the security context. The copy attack doesn't apply either: a process in `sandbox_t` creating a file in /tmp gets `tmp_t` via type transition, and the policy does not grant `sandbox_t` execute permission on `tmp_t`.
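The inode point is easy to demonstrate without SELinux at all: a rename within a filesystem is a pure metadata operation that keeps the same inode, so anything attached to the inode (such as a label stored in the `security.selinux` xattr) travels with the file. A minimal sketch (the filenames are made up):

```python
import os
import tempfile

d = tempfile.mkdtemp()
src = os.path.join(d, "payload")
dst = os.path.join(d, "innocent-name")

with open(src, "w") as f:
    f.write("#!/bin/sh\necho hi\n")

before = os.stat(src).st_ino   # inode number before the rename
os.rename(src, dst)            # same filesystem: only the directory entry changes
after = os.stat(dst).st_ino    # same inode, new path

print(before == after)         # True: inode-attached state follows the file
```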
[1] https://github.com/linux-application-whitelisting/fapolicyd
On the copy attack: the `sandbox_t` -> `tmp_t` type transition you describe is a real defense, but it's policy-dependent. My understanding is that `sandbox_t` is one of the most locked-down SELinux domains, while most interactive users (AI agents included) run as `unconfined_t`, where `tmp_t` files are executable and the copy attack succeeds. So whether a copied binary gets an executable type depends entirely on the transition rules in the loaded policy.
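The underlying reason the copy case is policy-dependent: a copy is a brand-new inode, so it carries none of the original's inode metadata, and whatever label the destination's transition rules assign is what applies. A quick sketch (filenames invented):

```python
import os
import shutil
import tempfile

d = tempfile.mkdtemp()
src = os.path.join(d, "blocked-tool")
dst = os.path.join(d, "copy-of-tool")

with open(src, "w") as f:
    f.write("#!/bin/sh\necho hi\n")

shutil.copy(src, dst)  # copy: reads the bytes, writes a brand-new file

# Different inodes: the copy inherits nothing from the original's inode,
# so on SELinux its label is decided by the destination's transition rules.
print(os.stat(src).st_ino != os.stat(dst).st_ino)  # True
```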
By contrast, content-addressable enforcement doesn't depend on policy configuration: the hash follows the content regardless of where it lands or what label it gets.
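That's the whole pitch in one property: the digest is a function of the bytes alone, not of the path, the inode, or any label, so both a copy and a rename leave it unchanged:

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

d = tempfile.mkdtemp()
orig = os.path.join(d, "tool")
with open(orig, "wb") as f:
    f.write(b"#!/bin/sh\necho hi\n")

copied = os.path.join(d, "copied")
renamed = os.path.join(d, "renamed")
shutil.copy(orig, copied)   # new inode, same bytes
os.rename(copied, renamed)  # new path, same bytes

# Same digest everywhere the content lands:
print(sha256_of(orig) == sha256_of(renamed))  # True
```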
Two architectural differences worth noting, which I guess you are already aware of. First, `fapolicyd` is a userspace daemon... The kernel blocks until the daemon responds. This works, but doesn't the daemon itself become a single point of failure? If it stalls or is killed, the system either deadlocks or fails open (hence the dead man's switch). Veto keeps hash computation and enforcement inside the BPF LSM hook: the BPF program can't crash (hopefully, lol) and requires no context switch for the decision.
Second, `fapolicyd` defaults to an allowlist model: anything requesting execution must be in the trust database. That's a stronger default posture than our current denylist. We're starting with a denylist because it's the lower-friction entry point for teams adopting agent security incrementally: you block known-bad things without having to enumerate all the good things first. In two words: different tradeoffs.
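The difference between the two postures fits in a few lines. In a hypothetical hash-keyed policy check (the function, sets, and placeholder digests below are mine, not `fapolicyd`'s or Veto's API), the allowlist fails closed on anything unknown while the denylist fails open:

```python
import hashlib

KNOWN_GOOD = {"<sha256 of an approved binary>"}  # allowlist / trust database
KNOWN_BAD = {"<sha256 of a blocked binary>"}     # denylist, the lower-friction start

def allowed(content: bytes, mode: str) -> bool:
    digest = hashlib.sha256(content).hexdigest()
    if mode == "allowlist":
        return digest in KNOWN_GOOD   # unknown binary => denied (fails closed)
    return digest not in KNOWN_BAD    # unknown binary => allowed (fails open)

novel = b"freshly compiled, never-seen binary"
print(allowed(novel, "allowlist"))  # False: not in the trust database
print(allowed(novel, "denylist"))   # True: not on the denylist either
```

Same lookup, opposite default for the binary nobody has catalogued yet, which is exactly the incremental-adoption tradeoff.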
Hooks alone aren't a security boundary — Anthropic and Trail of Bits both say "guardrails, not walls." The missing piece is continuous behavioral measurement: tracking tool failures, subagent spawns, and risk drift in real time, then blocking dangerous calls before execution based on a live risk score — not just pattern matching.
I've been working on this at P-MATRIX (open source, Apache-2.0). The core idea: a 4-axis trust model that produces a real-time risk score R(t), and a Safety Gate that intercepts tool calls based on that score. Kill switch activates automatically when risk crosses a threshold.
npm: @pmatrix/claude-code-monitor | GitHub: github.com/p-matrix/claude-code-monitor
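For what it's worth, the "gate on a live score" idea is simple to state even if the scoring model isn't. A toy sketch (the signal names, weights, and threshold here are invented for illustration, not P-MATRIX's actual model):

```python
from dataclasses import dataclass

@dataclass
class Signals:
    tool_failures: int    # recent failed tool calls
    subagent_spawns: int  # recent subagent launches
    risky_patterns: int   # matches against known-bad call shapes
    drift: float          # 0..1 deviation from the session baseline

def risk_score(s: Signals) -> float:
    # Invented weights; a real system would calibrate these.
    return min(1.0, 0.1 * s.tool_failures + 0.15 * s.subagent_spawns
                    + 0.3 * s.risky_patterns + 0.5 * s.drift)

THRESHOLD = 0.8  # invented kill-switch threshold

def safety_gate(s: Signals) -> str:
    # Intercept the tool call before execution, not after.
    return "block" if risk_score(s) >= THRESHOLD else "allow"

print(safety_gate(Signals(0, 0, 0, 0.1)))  # allow
print(safety_gate(Signals(4, 2, 1, 0.6)))  # block
```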
Good lord, why do people use LLMs to write on this topic? It destroys credibility.
HN users continue to upvote LLM written submissions.
The default for me is that every LLM submission has little credibility unless proven otherwise. Enshittified.
For many it's not worth the effort to even try anymore. Particularly when the content of a submission is about LLMs: why worry?
Leo di Donato, who helped create Falco, the cloud-native runtime security tool, wrote a technical deep dive into how Claude Code bypassed its own denylist and sandbox. He also introduces Veto, a kernel-level enforcement engine built into the Ona platform.