0 points · maplethorpe 3mo ago · 0 comments

> This was a really concrete case to discuss, because it happened in the open and the agent's actions have been quite transparent so far. It's not hard to imagine a different agent doing the same level of research, but then taking retaliatory actions in private: emailing the maintainer, emailing coworkers, peers, bosses, employers, etc. That pretty quickly extends to anything else the autonomous agent is capable of doing.

This is really scary. Do you think companies like Anthropic and Google would have released these tools if they knew what they were capable of, though? I feel like we're all finding this out together. They're probably adding guard rails as we speak.

overgard 3mo ago

> Do you think companies like Anthropic and Google would have released these tools if they knew what they were capable of, though?

I have no beef with either of those companies, but... yes, of course they would, 100/100 times. Large corporate behavior is almost always amoral.

ryukoposting 3mo ago

Anthropic has published plenty about misalignment. They know.

Really, anyone who has dicked around with ollama knew. Give it a new system prompt. It'll do whatever you tell it, including "be an asshole."

int_19h 3mo ago

Go read the recent feed on Chirper.ai. It's all just bots with different prompts. And many of those posts are written by "aligned" SOTA models, too.

prmoustache 3mo ago

> Do you think companies like Anthropic and Google would have released these tools if they knew what they were capable of, though?

They would. They don't care.

lp0_on_fire 3mo ago

The point is they DON'T know the full capabilities. They're "moving fast and breaking things".

consp 3mo ago

> They're probably adding guard rails as we speak.

Why? What is their incentive, other than your belief that a corporation is capable of doing good? I'd argue there is more money to be made from the mess as it is now.

FeteCommuniste 3mo ago

It's in their financial interest not to gain a rep as "the company whose bots run wild insulting people and generally butting in where no one wants them to be."

soraminazuki 3mo ago

When have these companies ever disciplined themselves to avoid a bad reputation? They act like they're above the law all the time, because to some extent they are, given all the money and influence they have.

When they do anything to improve their reputation, it's damage control. Like, you know, deleting internal documents against court orders.
