undefined | Better HN

0 pointsI_Am_Nous2y ago0 comments

This reminds me of The Last Question by Isaac Asimov. I also think if we stopped expecting all LLMs to have an immediate answer, it would be relatively easy to shim some kind of "conscience" to direct the output in different ways. Similar to the safeties already in place in LLMs, but instead of it just saying "NO DON'T SAY THAT" it can dialog internally to change what the output is until it reaches what it believes to be the agreed upon best answer.

0 comments

mr_toad2y ago

> I also think if we stopped expecting all LLMs to have an immediate answer, it would be relatively easy to shim some kind of "conscience" to direct the output in different ways.

If the shim was just another AI, then how do you align that AI? Who watches the watchers? But if it was a deterministic algorithm it would probably fail for the same reasons that algorithmic AI never went anywhere.

I_Am_NousOP2y ago

A great point! A smaller AI with a rather limited parameter count could be trained for individual needs so some things (chat moderation) might be easier to do than other things (fact check peer reviewed papers in a verifiable way). For some use cases it would be overkill to have a conscience but an AI spokesperson for a company will probably have a company-aligned conscience for obvious reasons.

lifeisstillgood2y ago

It would have an emotional reaction to certain "thought constructs" and would be guided by that.

Or we could just give them three laws

I_Am_NousOP2y ago

With current LLMs the three laws might be tough to implement in a way that can't be prompt injected around. That's why I described the extra bits as a "conscience" which could enforce the three laws. Maybe the three laws are the internal conscience's context prompt while the main LLM is more able to think anything in general and then the output is tuned down by the conscience?

Otherwise the laws will have to be implemented as weights or during training so the model explicitly knows the laws and would never even be capable of doing something against them.

pixl972y ago

I mean, the whole purpose of the I, Robot story was to show you that the 3 laws didn't work. We had the first story on prompt injection decades ago and we just didn't realize it.

1 more reply

TeMPOraL2y ago

Obligatory reminder that the "three laws" were invented to be deconstructed, with Asimov spending a lot of pages showing many ways in which they completely fail, illustrating that AI alignment is a hard problem.

j / k navigate · click thread line to collapse

0 comments

mr_toad2y ago

> I also think if we stopped expecting all LLMs to have an immediate answer, it would be relatively easy to shim some kind of "conscience" to direct the output in different ways.

I_Am_NousOP2y ago

lifeisstillgood2y ago

It would have an emotional reaction to certain "thought constructs" and would be guided by that.

Or we could just give them three laws

I_Am_NousOP2y ago

Otherwise the laws will have to be implemented as weights or during training so the model explicitly knows the laws and would never even be capable of doing something against them.

pixl972y ago

I mean, the whole purpose of the I, Robot story was to show you that the 3 laws didn't work. We had the first story on prompt injection decades ago and we just didn't realize it.

1 more reply

TeMPOraL2y ago

j / k navigate · click thread line to collapse