The penultimate layer of the LLM can be thought of as the one that figures out ‘given S1..Sn, what concept am I trying to express now?’. The final layer is the function mapping that concept to ‘what token should I output next?’.
The fact that the LLM has to figure all that out again from scratch as part of generating every token, rather than maintaining a persistent ‘plan’, doesn’t make the essence of what it’s doing any different from what you claim you’re doing.
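To make the split concrete, here’s a toy sketch of that picture. Everything here (the `penultimate` collapse, the random weights) is a hypothetical stand-in, nothing like a real transformer; the point is only the shape of the computation: a ‘concept’ vector is re-derived from the prefix at every step, and one final projection turns it into a next-token choice.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D = 8, 4  # toy vocabulary size and hidden width

# Hypothetical stand-ins for learned weights (random here):
embed = rng.normal(size=(VOCAB, D))   # token embeddings
W_out = rng.normal(size=(D, VOCAB))   # the 'final layer': hidden state -> logits

def penultimate(prefix):
    """Stand-in for everything up to the last layer: collapse the
    prefix S1..Sn into one 'concept' vector, recomputed each call."""
    h = np.zeros(D)
    for tok in prefix:
        h = np.tanh(h + embed[tok])
    return h

def next_token(prefix):
    h = penultimate(prefix)   # 'what concept am I expressing now?'
    logits = h @ W_out        # 'what token should I output next?'
    return int(np.argmax(logits))

# Generation: the only state carried between steps is the token
# sequence itself; the 'intent' is re-derived from it every time.
seq = [0]
for _ in range(5):
    seq.append(next_token(seq))
```

Note that `next_token` takes nothing but the prefix: calling it again on the same prefix reproduces the same choice, which is the ‘no persistent plan’ point in miniature.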