This is what confuses me, though: people don't write things like "What is the most relevant sentence in this book?"
I have a vague understanding of the mechanisms here, but I don't think I get how it goes from "the most relevant sentence" in the prompt to an attention vector that "points to" the right place. I would have thought that was beyond what a model could learn just by completing training data.
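For what it's worth, here's my rough mental model of the pointing part, as a toy sketch (made-up one-hot keys and a hand-picked query, not anything resembling a real model's learned weights): a query scores every key position, and softmax turns those scores into a distribution that concentrates on the best match.

```python
import numpy as np

def attention_weights(q, K):
    # Scaled dot-product scores for one query against all key positions,
    # then a numerically stable softmax over those scores.
    scores = K @ q / np.sqrt(q.shape[0])
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Five token positions with one-hot keys; the query overlaps key 3 most.
K = np.eye(5)
q = np.array([0.1, 0.0, 0.2, 2.0, 0.1])

w = attention_weights(q, K)
print(int(w.argmax()))  # the head "points at" position 3
```

So mechanically the "pointer" is just a softmax over similarity scores; my question is how training on next-token prediction ever shapes the query/key projections so that the scores line up with "relevance" in the human sense.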
I also realize that the model has no ability to "introspect" on itself, but I don't know what's stopping it from producing a chain-of-thought output that gets there some other way.
Do you think you could get it to reveal the attention vector at some point, e.g. by repeatedly asking it for the Nth most relevant word and working backwards?