Heh. This is very true. I think perhaps the thing I'm most amazed by is that simple next-token prediction seems to work unreasonably well for a great many tasks.
I just don't know how well that will scale to more complex tasks. With simple next-token prediction there is little mechanism for the model to iterate, revise, or refine as it goes -- once a token is emitted, it's final.
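To make that concrete, here's a toy sketch of a greedy decoding loop (everything here is made up -- toy_next_token_logits is just a stand-in for a real model's forward pass, and the vocabulary is fake):

    import random

    VOCAB = ["the", "cat", "sat", "on", "mat", "."]

    def toy_next_token_logits(context):
        # Stand-in for a real LM forward pass; deterministic toy scores.
        random.seed(len(context))
        return [random.random() for _ in VOCAB]

    def greedy_decode(prompt, max_new_tokens=8):
        tokens = list(prompt)
        for _ in range(max_new_tokens):
            logits = toy_next_token_logits(tokens)
            # The argmax token is committed; there is no later step
            # where the model goes back and revises it.
            tokens.append(VOCAB[logits.index(max(logits))])
        return tokens

    print(greedy_decode(["the"]))

Each step picks one token and moves on; nothing in the loop ever touches what's already been written.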
There have been some experiments with things like speculative generation (where multiple candidate branches are evaluated in parallel) to get a bit of a lookahead effect and help the LLM avoid locking itself into dead ends, but they don't seem super popular overall -- people mostly just prefer to increase the power and accuracy of the base model and keep chugging forward.
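The branch-keeping idea is basically beam search; here's a toy version (the scoring function is fake, and to be fair, "speculative decoding" in the papers is really a latency trick with a small draft model, which is a different thing -- this just illustrates keeping branches alive):

    import math, random

    VOCAB = ["the", "cat", "sat", "on", "mat", "."]

    def toy_logprobs(context):
        # Stand-in for a real LM; toy scores that vary with the context.
        random.seed(len(context) * 31 + len(" ".join(context)))
        raw = [random.random() for _ in VOCAB]
        total = sum(raw)
        return [math.log(r / total) for r in raw]

    def beam_search(prompt, beam_width=3, steps=5):
        # Keep several candidate continuations alive in parallel, so one
        # bad token choice can't lock the whole decode into a dead end.
        beams = [(0.0, list(prompt))]
        for _ in range(steps):
            candidates = []
            for score, toks in beams:
                for tok, lp in zip(VOCAB, toy_logprobs(toks)):
                    candidates.append((score + lp, toks + [tok]))
            candidates.sort(key=lambda c: c[0], reverse=True)
            beams = candidates[:beam_width]  # prune to the best branches
        return beams

    for score, toks in beam_search(["the"]):
        print(round(score, 2), " ".join(toks))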
I can't help feeling like a fundamental shift toward something more akin to a diffusion-based approach would help here. I just want some sort of mechanism where the model can "think" longer about harder problems. Present an LLM with a simple chess position or a complex one and ask it to generate the next move, and it responds in the same amount of time either way. That alone should tell us that LLMs are not intelligent, they are not "thinking", and they will be insufficient for this going forward.
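Here's the shape of what I'm imagining, as a toy (not a real diffusion model -- the scorer and target sentence are made up -- it just shows "revise the whole draft, and spend more passes on harder problems"):

    VOCAB = ["the", "cat", "sat", "on", "mat", "."]
    TARGET = ["the", "cat", "sat", "on", "the", "mat"]

    def score(draft):
        # Made-up scorer: count positions matching a target sentence.
        # A real model would score fluency/correctness instead.
        return sum(a == b for a, b in zip(draft, TARGET))

    def refine(draft, steps):
        # One revision per pass: any position may be rewritten, and a
        # harder problem can simply be given more passes -- unlike a
        # decoder that spends fixed compute per token.
        for _ in range(steps):
            best = draft
            for i in range(len(draft)):
                for tok in VOCAB:
                    cand = draft[:i] + [tok] + draft[i + 1:]
                    if score(cand) > score(best):
                        best = cand
            draft = best
        return draft

    noisy = ["mat"] * 6
    print(refine(noisy, steps=2))  # partial fix: not enough "thinking"
    print(refine(noisy, steps=5))  # more passes, better answer

More passes, better answer, and a cheap problem could stop early. That variable-compute knob is exactly what the fixed-cost token loop lacks.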
I believe Yann LeCun is right -- simply scaling LLMs is not going to get us to AGI. We need a fundamental structural shift to something new, but until we stop seeing such insane advances in generation quality from LLMs (looking at you, Claude!!), I don't think we'll move beyond them. We have to get bored with LLMs first.