I pointed it out above; even though it is text, the ASCII representation is just the same numbers in a different base - base 2^8 - ('325' is '3' * (2^8)^2 + '2' * (2^8) + '5' = 51 * 2^16 + 50 * 2^8 + 53); it should approximate those polynomial functions very well.
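In code, that base-2^8 reading of '325' is just a positional expansion over the ASCII byte values:

```python
# "325" read as a base-256 number built from its ASCII byte values.
s = "325"
value = 0
for ch in s:
    value = value * 256 + ord(ch)  # shift left one base-2^8 digit, add the byte

# Same thing written out positionally: 51 * 2^16 + 50 * 2^8 + 53
print(value)  # 3355189
```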
Hmm. I’m not sure what you mean. Temperature controls randomness; low temperature gives you the most probable, least random result. It’s what chess engines do during tournaments, for example.
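To make the temperature point concrete, here is a toy softmax-with-temperature sketch (the logits are made up; real models apply this over the whole vocabulary):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / T: low T sharpens the distribution toward the argmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 1.0))   # probability spread across tokens
print(softmax_with_temperature(logits, 0.1))   # nearly all mass on the top token
```

At low temperature, sampling from this distribution is almost indistinguishable from always taking the argmax, which is the "least random" behavior described above.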
The other parts seem unlikely. It has no knowledge of bases, except insofar as they appear in the training set. I saw this in our GPT chess work — even with strange tokenization, it learned chess notation well.
Sorry, I thought it was clear. An untrained neural network is just random noise: it multiplies inputs by random weights over and over (plus normalization) until it reaches the output. When you train it on inputs whose outputs are the result of applying some polynomial to those inputs, the weights can be set so that the output very closely approximates that polynomial. It never needs to know the base, and less randomness helps because the computations inside the network match the function you want to approximate very well. Still, it is not that simple: outputting the correct ASCII representation is a challenge, for example when a carry is involved (100009999999999 + 1). However, the emergence of decent arithmetic from a neural network itself should not be shocking.
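A minimal NumPy sketch of that claim, purely my own toy example (the network size, learning rate, and polynomial are all made up, and this is of course not GPT itself): an untrained net with random weights is noise, and gradient descent sets the weights so the output approximates the polynomial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: outputs are a polynomial applied to the inputs.
X = rng.uniform(-1.0, 1.0, size=(256, 1))
y = 3 * X**2 + 2 * X                          # the polynomial to approximate

# Untrained network: random weights, so its output is just noise.
W1 = rng.normal(0.0, 1.0, size=(1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0.0, 1.0, size=(32, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)                  # hidden layer
    return h, h @ W2 + b2                     # linear output layer

_, pred = forward(X)
loss_before = np.mean((pred - y) ** 2)

# Plain full-batch gradient descent on mean squared error.
lr = 0.02
for _ in range(5000):
    h, pred = forward(X)
    g_out = 2.0 * (pred - y) / len(X)         # dMSE/dpred
    g_h = (g_out @ W2.T) * (1.0 - h**2)       # backprop through tanh
    W2 -= lr * (h.T @ g_out); b2 -= lr * g_out.sum(0)
    W1 -= lr * (X.T @ g_h);   b1 -= lr * g_h.sum(0)

_, pred = forward(X)
loss_after = np.mean((pred - y) ** 2)
print(loss_before, loss_after)                # loss drops sharply after training
```

The network never "knows" it is doing polynomial arithmetic; the weights simply end up encoding a good approximation of the function.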
Does GPT know about ASCII? My understanding was that these models use a dictionary of (initially) random vectors as input and learn their own text representation.
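For what it's worth, the usual setup can be sketched like this (a toy illustration with an invented vocabulary and sizes, not any specific model's code): each token id indexes a row in a learned embedding matrix, so the ASCII byte values themselves never enter the network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up vocabulary mapping token strings to ids; real tokenizers
# (e.g. BPE) learn their own, often multi-character, tokens.
vocab = {"3": 0, "2": 1, "5": 2, "325": 3}

# The embedding table starts as random vectors and is learned in training.
embed_dim = 4
embedding_table = rng.normal(size=(len(vocab), embed_dim))

def embed(tokens):
    """Look up each token's learned vector; ASCII codes play no role here."""
    ids = [vocab[t] for t in tokens]
    return embedding_table[ids]

vectors = embed(["3", "2", "5"])
print(vectors.shape)  # (3, 4): one learned vector per token
```

So whether '325' arrives as three digit tokens or one multi-digit token depends on the tokenizer, not on any ASCII encoding visible to the model.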
In that case, I would say that GPT's performance in arithmetic is something that we see because we are looking for it or want to find it, but it is not there. It is an illusion. If we have no theory of why an arithmetic capability would emerge from GPT, then there is no scientific discovery; at most there is a field survey, a taxonomist's work, but no understanding is generated.