undefined | Better HN

0 pointsryanrasti3mo ago0 comments

This is a really good question because it hits on the fundamental issue: LLMs are useful because they can't be statically modeled.

The answer is to constrain effects, not intent. You can define capabilities where agent behavior is constrained within reasonable limits (e.g., can't post private email to #general on Slack without consent).

The next layer is UX/feedback: can compile additional policy based as user requests it (e.g., only this specific sender's emails can be sent to #general)

0 comments

botusaurus3mo ago

but how do you check that an email is being sent to #general, agents are very creative at escaping/encoding, they could even paraphrase the email in words

decades ago securesm OSes tracked the provenience of every byte (clean/dirty), to detect leaks, but it's hard if you want your agent to be useful

ryanrastiOP3mo ago

> decades ago securesm OSes tracked the provenience of every byte (clean/dirty), to detect leaks, but it's hard if you want your agent to be useful

Yeah, you're hitting on the core tradeoff between correctness and usefulness.

The key differences here: 1. We're not tracking at byte-level but at the tool-call/capability level (e.g., read emails) and enforcing at egress (e.g., send emails) 2. Agent can slowly learn approved patterns from user behavior/common exceptions to strict policy. You can be strict at the start and give more autonomy for known-safe flows over time.

botusaurus3mo ago

what about the interaction between these 2 flows:

- summarize email to text file

- send report to email

the issue is tracking that the first step didnt contaminate the second step, i dont see how you can solve this in a non-probabilistic works 99% of the time way

1 more reply

gostsamo3mo ago

you can restrict the email send tool to have to/cc/bcc emails hardcoded in a list and an agent independent channel should be the one to add items to it. basically the same for other tools. You cannot rewire the llm, but you can enumerate and restrict the boundaries it works through.

exfiltrating info through get requests won't be 100% stopped, but will be hampered.

botusaurus3mo ago

parent was talking about a different problem. to use your framing, how you ensure that in the email sent to the proper to/cc/bcc as you said there is no confidential information from another email that shouldnt be sent/forwarded to these to/cc/bcc

1 more reply

ATechGuy3mo ago

TBH, this looks like an LLM-assisted response.

zmmmmm3mo ago

and then the next:

> you're hitting on the core tradeoff between correctness and usefulness

The question is, is it a completely unsupervised bot or is a human in the loop. I kind of hope a human is not in the loop with it being such a caricature of LLM writing.

j / k navigate · click thread line to collapse

0 comments

botusaurus3mo ago

but how do you check that an email is being sent to #general, agents are very creative at escaping/encoding, they could even paraphrase the email in words

decades ago securesm OSes tracked the provenience of every byte (clean/dirty), to detect leaks, but it's hard if you want your agent to be useful

ryanrastiOP3mo ago

> decades ago securesm OSes tracked the provenience of every byte (clean/dirty), to detect leaks, but it's hard if you want your agent to be useful

Yeah, you're hitting on the core tradeoff between correctness and usefulness.

botusaurus3mo ago

what about the interaction between these 2 flows:

- summarize email to text file

- send report to email

the issue is tracking that the first step didnt contaminate the second step, i dont see how you can solve this in a non-probabilistic works 99% of the time way

1 more reply

gostsamo3mo ago

exfiltrating info through get requests won't be 100% stopped, but will be hampered.

botusaurus3mo ago

1 more reply

ATechGuy3mo ago

TBH, this looks like an LLM-assisted response.

zmmmmm3mo ago

and then the next:

> you're hitting on the core tradeoff between correctness and usefulness

The question is, is it a completely unsupervised bot or is a human in the loop. I kind of hope a human is not in the loop with it being such a caricature of LLM writing.

j / k navigate · click thread line to collapse