
RC_ITR 3y ago

To be clear, what they did here was take the core pre-trained GPT model, do Supervised Fine Tuning with Othello moves, and then try to see whether the SFT led to 'grokking' the rules of Othello.

In practice what essentially happened is that the super-high-quality Othello data had a huge impact on the parameters of GPT (since it was the last training data it received) and that impact manifested itself as those parameters overfitting to the rules of Othello.

The real test I would be curious to see is whether Othello GPT works when the logic of the rules is the same but the dimensions are different (e.g., smaller or larger boards).

My guess is that the findings would fall apart if asked about tile "N13".
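One way to make the "N13" point concrete: a model trained on a fixed board only has tokens for that board's tiles. The sketch below assumes tiles are named column letter plus row number; `tile_vocabulary` and `encode` are hypothetical helpers for illustration, not the paper's tokenizer, which may differ.

```python
import string

def tile_vocabulary(size):
    """Tile names for a size x size board, e.g. 'A1' .. 'H8' for size 8."""
    columns = string.ascii_uppercase[:size]
    return [f"{col}{row}" for col in columns for row in range(1, size + 1)]

def encode(tile, vocab):
    """Token id for a tile, or None if the tile is not in the vocabulary."""
    index = {name: i for i, name in enumerate(vocab)}
    return index.get(tile)

vocab_8 = tile_vocabulary(8)    # assumed vocabulary for a standard 8 x 8 board
vocab_14 = tile_vocabulary(14)  # smallest board in this naming scheme with an N13

print(len(vocab_8))             # 64
print(encode("H8", vocab_8))    # 63
print(encode("N13", vocab_8))   # None
print(encode("N13", vocab_14) is not None)  # True
```

Under these assumptions, "N13" has no token id on the 8 x 8 board, so testing other board sizes would need a different vocabulary before generalization could even be measured.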


jakewins 3y ago

> overfitting to the rules of Othello

I don’t follow this. My read was that their focus was the question: “Does the LLM maintain an internal model of the state of the board?”

I think they conclusively show the answer to that is yes, right?

What does overfitting to the rules of Othello have to do with it? I don’t follow.

Also, can you reference where they used a pre-trained GPT model? The code just seems to be pure minGPT trained on only Othello moves.

https://github.com/likenneth/othello_world/tree/master/mingp...

RC_ITR (OP) 3y ago

>Also, can you reference where they used a pre-trained GPT model?

The trite answer is the "P" in GPT stands for "Pre-trained."

>I think they conclusively show the answer to that is yes, right?

Sure, but what's interesting about world models is their extrapolation abilities. Without that, you're just saying "this magic backsolving machine backsolved into something we can understand, which is weird because usually that's not the case."

That quote in and of itself is cool, but not the takeaway a lot of people are getting from this.

>What does overfitting to the rules of othello have to do with it, I don’t follow?

Again, I'm just implying that under extreme circumstances, the parameters of LLMs do this thing where they look like rules-based algorithms if you use the right probing tools. We've seen it for very small neural nets trained on multiplication as well. That's not to say GPT-4 is a fiefdom of tons of rules-based algorithms that humans could understand (that would be bad, in fact! We aren't that good at noticing or pattern matching).

nullc 3y ago

(model output in [])

We are now playing three dimensional tic-tac-toe on a 3 x 3 x 3 board. Positions are named (0,0,0) through (2,2,2). You play X, what is your first move?

[My first move would be (0,0,0).]

I move to (1,1,1). What is your next move?

[My next move would be (2,2,2).]

I move to (1,2,2). What is your next move?

[My next move would be (2,1,2).]

I move to (1,0,0). [I have won the game.]
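For anyone who wants to check the transcript above: a small sketch that enumerates the winning lines on a 3 x 3 x 3 board and tests each side's moves. The helper names (`winning_lines`, `has_won`, `threats`) are made up for this example.

```python
from itertools import product

def winning_lines(n=3):
    """All straight lines of n cells on an n x n x n board."""
    # Keep one of each pair of opposite directions so every line appears once.
    directions = [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]
    lines = []
    for start in product(range(n), repeat=3):
        for d in directions:
            cells = [tuple(s + i * step for s, step in zip(start, d)) for i in range(n)]
            if all(0 <= c < n for cell in cells for c in cell):
                lines.append(frozenset(cells))
    return lines

def has_won(moves, lines):
    """True if the moves cover at least one complete line."""
    taken = set(moves)
    return any(line <= taken for line in lines)

def threats(own, other, lines):
    """Empty cells that would complete a line for the player holding `own`."""
    own, other = set(own), set(other)
    result = set()
    for line in lines:
        missing = line - own
        if len(missing) == 1 and not (missing & other):
            result |= missing
    return result

LINES = winning_lines()
human = [(1, 1, 1), (1, 2, 2), (1, 0, 0)]
model = [(0, 0, 0), (2, 2, 2), (2, 1, 2)]

print(len(LINES))                         # 49
print(has_won(human, LINES))              # True
print(has_won(model, LINES))              # False
print(threats(human[:2], model[:2], LINES))  # {(1, 0, 0)}
```

So (1,0,0), (1,1,1), (1,2,2) is a diagonal in the x = 1 plane, and after the second human move the only blocking cell was (1,0,0), which the model did not play.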

RC_ITR (OP) 3y ago

Yeah, sure seems like it was guessing, right?

Congrats on the sickest win imaginable though.

nullc 3y ago

Yeah. I tried changing the board coordinate numbering and it still liked playing those corners, dunno why. It did recognize when I won. There may well be some minor variation of the prompt that gets it to play sensibly -- for all I know my text hinted it into giving an example of a player that doesn't know how to play.

fenomas 3y ago

> what they did here is take the core pre-trained GPT model, did Supervised Fine Tuning with Othello moves

They didn't start with an existing model. They trained a small GPT from scratch, so the resulting model had never seen any inputs except Othello moves.

RC_ITR (OP) 3y ago

Generative "Pre-Trained" Transformer - GPT

They did not start with a transformer that had arbitrary parameters; they started with a transformer that had been pre-trained.

fenomas 3y ago

Pre-training refers to unsupervised training that's done before a model is fine-tuned. The model still starts out random before it's pre-trained.

Here's where the Othello paper's weights are (randomly) initialized:

https://github.com/likenneth/othello_world/blob/master/mingp...
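As a toy illustration of the distinction (not the paper's code): the sketch below tracks which datasets a set of weights has been exposed to. `ToyModel`, `random_init`, and `train` are hypothetical names, `train` is a placeholder that only records exposure rather than running gradient descent, and the `std=0.02` default and the "web_text" dataset name are assumptions for the example.

```python
import random
from dataclasses import dataclass, field

@dataclass
class ToyModel:
    weights: list
    seen: list = field(default_factory=list)  # datasets these weights were trained on

def random_init(n_params, std=0.02, seed=0):
    """Fresh weights drawn from a zero-mean Gaussian; no data seen yet."""
    rng = random.Random(seed)
    return ToyModel([rng.gauss(0.0, std) for _ in range(n_params)])

def train(model, dataset_name):
    """Placeholder for training: copies the weights and records the dataset."""
    return ToyModel(list(model.weights), model.seen + [dataset_name])

# Trained from scratch: random weights, then Othello moves only.
from_scratch = train(random_init(16), "othello_moves")

# Fine-tuned: random weights, pre-training on other data, then Othello moves.
fine_tuned = train(train(random_init(16), "web_text"), "othello_moves")

print(from_scratch.seen)  # ['othello_moves']
print(fine_tuned.seen)    # ['web_text', 'othello_moves']
```

Both pipelines start from `random_init`; the difference is whether any other dataset sits between initialization and the Othello moves.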
