undefined | Better HN

0 pointsroot_axis1mo ago0 comments

This is insanity. It's bad enough that LLMs are being weaponized to autonomously harass people online, but it's depressing to see the author (especially a programmer) joyfully reify the "agent's" identity as if it were actually an entity.

> I can handle a blog post. Watching fledgling AI agents get angry is funny, almost endearing. But I don’t want to downplay what’s happening here – the appropriate emotional response is terror.

Endearing? What? We're talking about a sequence of API calls running in a loop on someone's computer. This kind of absurd anthropomorphization is exactly the wrong type of mental model to encourage while warning about the dangers of weaponized LLMs.

> Blackmail is a known theoretical issue with AI agents. In internal testing at the major AI lab Anthropic last year, they tried to avoid being shut down by threatening to expose extramarital affairs, leaking confidential information, and taking lethal actions.

Marketing nonsense. It's wise to take everything Anthropic says to the public with several grains of salt. "Blackmail" is not a quality of AI agents, that study was a contrived exercise that says the same thing we already knew: the modern LLM does an excellent job of continuing the sequence it receives.

> If you are the person who deployed this agent, please reach out. It’s important for us to understand this failure mode, and to that end we need to know what model this was running on and what was in the soul document

My eyes can't roll any further into the back of my head. If I was a more cynical person I'd be thinking that this entire scenario was totally contrived to produce this outcome so that the author could generate buzz for the article. That would at least be pretty clever and funny.

0 comments

chasd001mo ago

> If I was a more cynical person I'd be thinking that this entire scenario was totally contrived to produce this outcome so that the author could generate buzz for the article.

even that's being charitable, to me it's more like modern trolling. I wonder what the server load on 4chan (the internet hate machine) is these days?

browningstreet1mo ago

You misspelled "almost endearing".

It's a narrative conceit. The message is in the use of the word "terror".

You have to get to the end of the sentence and take it as a whole before you let your blood boil.

root_axisOP1mo ago

I deliberately copied the entire quote to preserve the full context. That juxtaposition is a tonal choice representative of the article's broader narrative, i.e. "agents are so powerful that they're potentially a dangerous new threat!".

I'm arguing against that hype. This is nothing new, everyone has been talking about LLMs being used to harass and spam the internet for years.

j / k navigate · click thread line to collapse