undefined | Better HN

0 pointsuoaei1y ago0 comments

You're missing the picture again.

Stepping one level out in the metacognition hierarchy is the key. "Learning to learn" as it were. It is only the relative ease of implementation and deployment of feedforward models like Transformers that makes it seem like we have reached an optimum but we desperately need to move beyond it before it's entrenched too thoroughly.

0 comments

PeterisP1y ago

Okay, but it does seem that this hack is in the entirely opposite direction; a pure transformer is more towards "learning to learn" than any special preprocessing to explicitly encode a different representation of numbers.

We probably do have to move beyond transformers, but not in the direction of such hacks, but rather towards even more general representations that could encode the whole class of all such alternate representations and then learn from data which of them work best.

uoaeiOP1y ago

You seem to be making my point just fine. What was your confusion, then?

j / k navigate · click thread line to collapse

0 comments

PeterisP1y ago

uoaeiOP1y ago

You seem to be making my point just fine. What was your confusion, then?

j / k navigate · click thread line to collapse