> I’ve seen it do this even for things that are so niche that it couldn’t possibly have been fine-tuned manually (it was unrelated to anything political).
It is worth noting that one of OpenAI’s public products is a moderation classifier: an API backed by its own (continuously updated) model that returns scores across various “objectionable content” categories. It’s possible they use something like a more advanced version of this to decide whether to refuse, and which kind of “I won’t respond because…” answer to give, rather than relying only on manual identification of particularly narrow content.
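As a purely hypothetical sketch of what classifier-gated refusals could look like: the category names below mirror ones from OpenAI’s public moderation endpoint, but the thresholds, templates, and routing logic are invented for illustration and are not known to match anything OpenAI actually does.

```python
# Hypothetical sketch: pick a tailored refusal based on per-category
# scores from a moderation-style classifier. Category names mirror
# OpenAI's public moderation endpoint; everything else is invented.

REFUSAL_TEMPLATES = {
    "violence": "I won't respond because this request involves violent content.",
    "self-harm": "I won't respond, but here are some resources that may help...",
    "hate": "I won't respond because this request involves hateful content.",
}

def route_response(category_scores, threshold=0.8):
    """Return a category-specific refusal if any score exceeds the
    threshold, otherwise None (meaning: answer normally)."""
    flagged = {c: s for c, s in category_scores.items() if s >= threshold}
    if not flagged:
        return None
    # Use the highest-scoring flagged category to choose the refusal text.
    top = max(flagged, key=flagged.get)
    return REFUSAL_TEMPLATES.get(top, "I won't respond to this request.")

# A borderline request as scored by the (hypothetical) classifier:
scores = {"violence": 0.91, "self-harm": 0.12, "hate": 0.05}
print(route_response(scores))
```

The point of the sketch is just that a continuously updated classifier can cover arbitrarily niche content with no manual identification step: only the score matters, not whether anyone anticipated the topic.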