Yann LeCun, Pioneer of AI, Thinks Today's LLM's Are Nearly Obsolete

(newsweek.com)

124 points | alphadelphi | 1y ago | 140 comments

As LLMs do things thought to be impossible before, LeCun adjusts his statements about LLMs, but at the same time his credibility goes lower and lower. He started by saying that LLMs were just predicting words using a probabilistic model, basically like a better Markov chain. It was already pretty clear that this was not the case as even GPT3 could do summarization well enough, and there is no probabilistic link between the words of a text and the gist of the content, yet he was still saying that at the time of GPT3.5, I believe. Then he adjusted this view when talking publicly with Hinton, saying "I don't deny there is more than just probabilistic thing...". Next he started saying: no longer simply probabilistic, but they can only regurgitate things they saw in the training set, often explicitly telling people that novel questions could NEVER be solved by LLMs, with examples of prompts that were failing at the time, and so forth. Now reasoning models can solve problems they never saw, and o3 did huge progresses on ARC, so he adjusted again: for AGI we will need more. And so forth.

So at this point it does not matter what you believe about LLMs: in general, to trust LeCun words is not a good idea. Add to this that LeCun is directing an AI lab that as the same point has the following huge issues:

1. Weakest ever LLM among the big labs with similar resources (and smaller resources: DeepSeek).

2. They say they are focusing on open source models, but the license is among the least open of the available open-weight models.

3. LLMs, and the new AI wave in general, put CNNs, a field where LeCun worked (but that he didn't start himself), a lot more in perspective, and now they are just a chapter in a book that is composed mostly of other techniques.

Btw, other researchers that were in the LeCun side, changed side recently, saying that now "is different" because of CoT, that is the symbolic reasoning they were blabling before. But CoT is still regressive next token without any architectural change, so, no, they were wrong, too.

sorcerer-mar1y ago

> there is no probabilistic link between the words of a text and the gist of the content

How could that possibly be true?

There’s obviously a link between “[original content] is summarized as [summarized content]”

DrBenCarson1y ago

It’s not true

The idea that meaning is not impacted by language yet is somehow exclusively captured by language is just absolutely absurd

Like saying X+Y=Z but changing X or Y won’t affect Z

neom1y ago

Language is a symbolic system. From an absolute or spiritual standpoint, meaning transcends pure linguistic probabilities. Language itself emerges as a limited medium for the expression of consciousness and abstract thought. Indeed, to say meaning arises purely from language (as probability alone), or to deny that language influences meaning at all, are both overly simplistic extremes.

bitethecutebait1y ago

... meaning is not always impacted by the specificity or sensitivity of language while sometimes indeed exclusively captured by it, although the exclusivity is more of a time-dependent thing as one could imagine a silent, theatrical piece that captures the very same meaning but the 'phantasiac' is probably constructing the scene(s) out of words ... but then again ... there either was, is or will be at least one Savant to whom this does not apply ... and maybe 'some' deaf and blind person, too ...

aerhardt1y ago

Yea I'm lost there. If we took n bodies of text x_1 ... x_n, and k different summaries each y_1i ...y_kn , there are many statistical and computational treatments with which you would be able to find extremely strong correlations between y and x...
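A minimal sketch of the kind of statistical treatment described above, assuming a toy corpus: bag-of-words cosine similarity between each text and each summary. The texts, summaries, and helper names (`bag_of_words`, `cosine`) are made up for illustration; a real analysis would use far larger corpora and stronger models.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; a deliberately crude representation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy data: two texts x_i and one summary y_i for each.
texts = [
    "The cat chased the mouse around the kitchen until the mouse escaped under the fridge.",
    "The central bank raised interest rates again to slow inflation across the economy.",
]
summaries = [
    "A cat chased a mouse, but the mouse escaped.",
    "The bank raised rates to fight inflation.",
]

# similarity[i][j] compares text i with summary j.
similarity = [
    [cosine(bag_of_words(x), bag_of_words(y)) for y in summaries]
    for x in texts
]

for i, row in enumerate(similarity):
    print(f"text {i}:", [round(s, 3) for s in row])
```

Even with word counts alone, each text scores higher against its own summary than against the other one, which is the sort of correlation between x and y the comment is pointing at.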

mbesto1y ago

I wanna believe everything you say (because you generally are a credible person) but a few things don't add up:

1. Weakest ever LLM? This one is really making me scratch my head. For a period of time Llama was considered to be THE best. Furthermore, it's the third most used on OpenRouter (in the past month): https://openrouter.ai/rankings?view=month

2. Ignoring DeepSeek for a moment, Llama 2 and 3 require a special license from Meta if the products or services using the models have more than 700 million monthly active users. OpenAI, Claude and Gemini are not only closed source, but require a license/subscription to even get started.

kristianp1y ago

I've found the llama 3 served by meta.ai to be quite weak for coding prompts; it gets confused by more complex tasks. Maybe it's a smaller model? I agree it's weaker than others of its generation.

redlock1y ago

Doesn't OpenRouter ranking include pricing?

Not really a good measure of quality or performance but of cost effectiveness

mbesto1y ago

I mean it literally says on the page:

"Shown are the sum of prompt and completion tokens per model, normalized using the GPT-4 tokenizer."

Also, it ranks the use of Llama that is provided by cloud providers (for example, AWS Lambda).

I get that OpenRouter is imperfect but it's a good proxy to objectively make a claim that an LLM is "the weakest ever"

gcr1y ago

Why is changing one’s mind when confronted with new evidence a negative signifier of reputation for you?

bko1y ago

Because he has a core belief and based on that core belief he made some statements that turned out to be incorrect. But he kept the core belief and adjusted the statements.

So it's not so much about his incorrect predictions, but that these predictions were based on a core belief. And when the predictions turned out to be false, he didn't adjust his core beliefs, but just his predictions.

So it's natural to ask, if none of the predictions you derived from your core belief come true, maybe your core belief isn't true.

mdp20211y ago

I have not followed all of LeCun's past statements, but -

if the "core belief" is that the LLM architecture cannot be the way to AGI, that is more of an "educated bet", which does not get falsified when LLMs improve but still suggest their initial faults. If seeing that LLMs seem constrained in the "reactive system" as opposed to a sought "deliberative system" (or others would say "intuitive" vs "procedural" etc.) was an implicit part of the original "core belief", then it still stands in spite of other improvements.

danielmarkbruce1y ago

If you need basically rock solid evidence of X before you stop saying "this thing cannot do X", then you shouldn't be running a forward looking lab. There are only so many directions you can take, only so many resources at your disposal. Your intuition has to be really freakishly good to be running such a lab.

He's done a lot of amazing work, but his stance on LLMs seems continuously off the mark.

SJC_Hacker1y ago

The list of great minds who thought that "new fangled thing is nonsense" and later turned out to be horribly wrong is quite long and distinguished

nurettin1y ago

I'm going to wear the tinfoil hat: a firm is able to produce a sought-after behavior a few months later and throws people off. Is it more likely that the firm (worth billions at this point) is engineering these solutions into the model, or is it because of emergent neural network architectural magic?

I'm not saying that they are being bad actors, just saying this is more probable in my mind than an LLM breakthrough.

antirez1y ago

Because there was plenty of evidence that the statements were either not correct or not based on enough information at the time they were made. And to be wrong because of personal biases, and then not clearly state you were wrong when new evidence appeared, is not the trait of a good scientist. For instance: the strong summarization abilities were already something that, alone, without any further information, was enough to seriously doubt the stochastic parrot mental model.

jxjnskkzxxhx1y ago

I don't see the contradiction between "stochastic parrot" and "strong summarisation abilities".

Where I'm skeptical of LLM skepticism is that people use the term "stochastic parrot" disparagingly, as if they're not impressed. LLMs are stochastic parrots in the sense that they probabilistically guess sequences of things, but isn't it interesting how far that takes you already? I'd never have guessed. Fundamentally I question the intellectual honesty of anyone who pretends they're not surprised by this.

jaggederest1y ago

Here's a fun example of that kind of "I've updated my statements but not assessed any of my underlying lack of understanding" - it's a bad look on any kind of scientist.

https://x.com/AukeHoekstra/status/1507047932226375688

mdp20211y ago

> strong summarization abilities

Which LLMs have shown you "strong summarization abilities"?

Analemma_1y ago

This is all true, and I'd also add that LeCun has the classic pundit problem of making his opposition to another group too much of his identity, to the detriment of his thinking. So much of his persona and ego is tied up in being a foil to both Silicon Valley hype-peddlers and AI doomers that he's more interested in dunking on them than being correct. Not that those two groups are always right either, but when you're more interested in getting owns on Twitter than having correct thinking, your predictions will always suffer for it.

That's why I'm not too impressed even when he has changed his mind: he has admitted to individual mistakes, but not to the systemic issues which produced them, which makes for a safe bet that there will be more mistakes in the future.

mordymoop1y ago

“Changing your mind” doesn’t really look like what LeCun is doing.

If your model of reality makes good predictions and mine makes bad ones, and I want a more accurate model of reality, then I really shouldn’t just make small provisional and incremental concessions gerrymandered around whatever the latest piece of evidence is. After a few repeated instances, I should probably just say “oops, looks like my model is wrong” and adopt yours.

This seems to be a chronic problem with AI skeptics of various sorts. They clearly tell us that their grand model indicates that such-and-such a quality is absolutely required for AI to achieve some particular thing. Then LLMs achieve that thing without having that quality. Then they say something vague about how maybe LLMs have that quality after all, somehow. (They are always shockingly incurious about explaining this part. You would think this would be important to them to understand, as they tend to call themselves “scientists”.)

They never take the step of admitting that maybe they’re completely wrong about intelligence, or that they’re completely wrong about LLMs.

Here’s one way of looking at it: if they had really changed their mind, then they would stop being consistently wrong.

Maxatar1y ago

He hasn't fundamentally changed his mind. What he's doing is taking what he fundamentally believes and finding more and more elaborate ways of justifying it.

QuantumGood1y ago

When you limit to one framing "changing one's mind", it helps if you point it out, acknowledging that other framings can be possible, otherwise it risks seeming (not necessarily being) manipulative, and you are at least overlooking a large part of the domain. Harvard Decision group called these two of the most insidious drivers of poor decisions "frame blindness" and poor "frame choice". Give more than one frame a chance.

wat100001y ago

LLMs literally are just predicting tokens with a probabilistic model. They’re incredibly complicated and sophisticated models, but they still are just incredibly complicated and sophisticated models for predicting tokens. It’s maybe unexpected that such a thing can do summarization, but it demonstrably can.

Workaccount21y ago

The rub is that we don't know if intelligence is anything more than "just predicting next output".

sangnoir1y ago

I think we do.

nurettin1y ago

Sometimes seeing something that resembles reasoning doesn't really make it reasoning.

What makes it "seem to get better" and what keeps throwing people like LeCun off is the training bias, the prompts, the tooling and the billions spent cherry picking information to train on.

What LLMs do best is language generation, which leads to, but is not, intelligence. If you want someone who was right all along, maybe try Wittgenstein.

pllbnk1y ago

> It was already pretty clear that this was not the case as even GPT3 could do summarization well enough, and there is no probabilistic link between the words of a text and the gist of the content, <...>

I am not an expert by any means but have some knowledge of the technicalities of LLMs, and my limited knowledge allows me to disagree with your statement. The models are trained on an ungodly amount of text, so they become very advanced statistical token prediction machines with magic randomness sprinkled in to make the outputs more interesting. After that, they are fine-tuned on very believable dialogues, so their statistical weights are skewed in a way that when subject A (the user) says something, subject B (the LLM-turned-chatbot) has to say something back which statistically should make sense (which it almost always does since they are trained on it in the first place). Try to paste random text - you will get a random reply. Now try to paste the same random text and ask the chatbot to summarize it - your randomness space will be reduced and it will be turned into a summary because the fine-tuning gave the LLM the "knowledge" of what summarization _looks like_ (not what it _means_).

Just to prove that you are wrong: ask your favorite LLM if your statement is correct and you will probably see it output that it is not.
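For readers unfamiliar with what "statistical token prediction with randomness sprinkled in" means mechanically, here is a rough sketch of temperature sampling over a next-token distribution. The vocabulary and scores are invented for illustration and do not come from any real model; `softmax` and `sample_next_token` are hypothetical helper names.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one token according to the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Made-up scores a model might assign to candidates after "The cat sat on the".
vocab = ["mat", "sofa", "roof", "moon"]
logits = [3.0, 2.0, 1.0, -1.0]

rng = random.Random(0)  # fixed seed so runs are repeatable
for t in (0.2, 1.0, 2.0):
    probs = softmax(logits, t)
    draws = [sample_next_token(vocab, logits, t, rng) for _ in range(5)]
    print(f"T={t}: probs={[round(p, 3) for p in probs]} draws={draws}")
```

At low temperature nearly all probability mass sits on the top-scoring token; raising the temperature flattens the distribution, which is where the variety in outputs comes from.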

aprilthird20211y ago

> As LLMs do things thought to be impossible before

Like what?

Your timeline doesn't sound crazy outlandish. It sounds pretty normal and lines up with my thoughts as AI has advanced over the past few years. Maybe more conservative than others in the field, but that's not a reason to dismiss him entirely, any more than the hypesters should be dismissed entirely because they were overpromising and underdelivering.

> Now reasoning models can solve problems they never saw

This is not the same as a novel question though.

> o3 did huge progresses on ARC

Is this a benchmark? o3 might be great, but I think the average person's experience with LLMs matches what he's saying: it seems like there is a peak and we're hitting it. It also matches what Ilya said about training data being mostly gone and new architectures (not improvements to existing ones) needing to be the way forward.

> LeCun is directing an AI lab that as the same point has the following huge issues

Second point has nothing to do with the lab and more to do with Meta. Your last point has nothing to do with the lab at all. Meta also said they will have an agent that codes like a junior engineer by the end of the year and they are clearly going to miss that prediction, so does that extra hype put them back in your good books?

ksec1y ago

>Btw, other researchers that were in the LeCun side, changed side recently, saying that now "is different" because of CoT, that is the symbolic reasoning they were blabling before. But CoT is still regressive next token without any architectural change, so, no, they were wrong, too.

Sorry I am a little lost reading the last part about regressive next token and it is still wrong. Could someone explain a little bit? Edit: Explained here further down the thread. ( https://news.ycombinator.com/item?id=43594813 )

I personally went from AI skeptic (it won't ever replace all humans, at least not in the next 10 - 20 years) to AI scared, simply because of the reasoning capability it gained. It is not perfect, far from it, but I can immediately infer where both algorithmic improvements and hardware advances could bring us in 5 years. And that is not including any new breakthrough.

timewizard1y ago

> So at this point it does not matter what you believe about LLMs: in general, to trust LeCun words is not a good idea.

One does not follow from the other. In particular I don't "trust" anyone who is trying to make money off this technology. There is way more marketing than honest science happening here.

> and o3 did huge progresses on ARC,

It also cost huge money. The cost increase to go from 75% to 85% was two orders of magnitude greater. This cost scaling is not sustainable. It also only showed progress on ARC1, which it was trained for, and did terribly on ARC2 which it was not trained for.

> Btw, other researchers that were in the LeCun side, changed side recently,

Which "side" researchers are on is the least useful piece of information available.

thesz1y ago

> there is no probabilistic link between the words of a text and the gist of the content

Using n-gram/skip-gram model over the long text you can predict probabilities of word pairs and/or word triples (effectively collocations [1]) in the summary.

[1] https://en.wikipedia.org/wiki/Collocation

Then, by using (beam search and) an n-gram/skip-gram model of summaries, you can generate the text of a summary, guided by preference of the words pairs/triples predicted by the first step.

belter1y ago

But have we established that LLMs don't just interpolate and that they can create?

Are we able to prove it with output that's

1) algorithmically novel (not just a recombination)

2) coherent, and

3) not explainable by training data coverage.

No handwaving with scale...

fragmede1y ago

Why is that the bar though? Imagine LLMs as a kid that has a box of lego with a hundred million blocks in it, and it can assemble those blocks into any configuration possible. The kid doesn't have access to ABS plastic pellets and a molding machine, and so can't make new pieces; does that really make us think that the kid just interpolates and can't create?

belter1y ago

Actually yes...If the kid spends their whole life in the box and never invents a new block, that’s just combinatorics. We don’t call a chess engine ‘creative’ for finding novel moves, because we understand the rules. LLMs have rules too, they’re called weights.

I want LLMs to create, but so far, every creative output I’ve seen is just a clever remix of training data. The most advanced models still fail a simple test: restrict the domain, for example, "invent a cookie recipe with no flour, sugar, or eggs" or "name a company without using real words". Suddenly, their creativity collapses into either nonsense (violating constraints) or trivial recombination: ChocoNutBake instead of NutellaCookie.

If LLMs could actually create, we’d see emergent novelty, outputs that couldn’t exist in the training data. Instead, we get constrained interpolation.

Happy to be proven wrong. Would like to see examples where an LLM output is impossible to map back to its training data.

daveguy1y ago

o3's progress on ARC was not zero-shot. It was based on fine-tuning to the particular data set. A major point of ARC is that humans do not need any fine-tuning beyond having the problem explained to them. And a few humans working on it together after minimal explanation can achieve 100%.

o3 doing well on ARC after domain training is not a great argument. There is something significant missing from LLMs being intelligent.

I'm not sure if you watched the entire video, but there were insightful observations. I don't think anyone believes LLMs aren't a significant breakthrough in HCI and language modelling. But it is many layers with many winters away from AGI.

Also, understanding human and machine intelligence isn't about sides. And CoT is not symbolic reasoning.

charcircuit1y ago

>LeCun is directing an AI lab that [built LLMs]

No he's not.

deepfriedchokes1y ago

Everything is possible with math. Just ask string theorists.

gsf_emergency_21y ago

Recent talk: https://www.youtube.com/watch?v=ETZfkkv6V7Y

LeCun, "Mathematical Obstacles on the Way to Human-Level AI"

Slide (Why autoregressive models suck)

https://xcancel.com/ravi_mohan/status/1906612309880930641

hatefulmoron1y ago

Maybe someone can explain it to me, but isn't that slide sort of just describing what makes solving problems hard in general? That there are many more decisions which put you on an inevitable path of failure?

"Probability e that any produced [choice] takes us outside the set of correct answers .. probability that answer of length n is correct: P(correct) = (1-e)^{n}"
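To get a feel for the quoted formula, here is a small numerical sketch. The per-token error rates and answer lengths are arbitrary illustrative values, not figures from the talk, and the formula itself assumes errors are independent across tokens.

```python
def p_correct(e, n):
    """Probability that all n tokens stay inside the set of correct answers,
    assuming each token independently goes wrong with probability e."""
    return (1.0 - e) ** n

# Illustrative values only; neither e nor n comes from the slide.
for e in (0.001, 0.01, 0.05):
    row = ", ".join(f"n={n}: {p_correct(e, n):.4f}" for n in (10, 100, 1000))
    print(f"e={e}: {row}")
```

Under that independence assumption the success probability decays exponentially with answer length. If errors are correlated, or if a later step can recover from an earlier one, the formula no longer applies as written.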

somenameforme1y ago

I think he's focusing on the distinction between facts and output for humans and drawing a parallel to LLMs.

If I ask you something that you know the answer to, the words you use and that fact itself are distinct entities. You're just giving me a presentation layer for fact #74719.

But LLMs lack any comparable pool to draw from, and so their words and their answer are essentially the same thing.

conradev1y ago

The routing decision that an MoE model makes increases its chances of success by constraining its future paths.

greesil1y ago

The "assuming independent errors" is doing a lot of heavy lifting here

gibsonf11y ago

The error with that is that human reasoning is not mathematical. Math is just one of the many tools of reason.

csdvrx1y ago

Intransitive preferences are well known to experimental economists, but a hard pill to swallow for many, as they break a lot of algorithms (which depend on transitivity) and require more robust tools like https://en.wikipedia.org/wiki/Paraconsistent_logic

> just one of the many tools of reason.

Read https://en.wikipedia.org/wiki/Preference_(economics)#Transit... then read https://pmc.ncbi.nlm.nih.gov/articles/PMC7058914/ and you will see there's a lot of data suggesting that indeed, it's just one of the many tools!

I think it's similar to how many dislike the non-deterministic output of LLMs: when you use statistical tools, a non-deterministic output is a VERY nice feature to explore conceptual spaces with abductive reasoning: https://en.wikipedia.org/wiki/Abductive_reasoning

It's a tool I was using at a previous company, mixing LLMs, statistics and formal tools. I'm surprised there aren't more startups mixing LLM with z3 or even just prolog.

gsf_emergency_21y ago

Thanks for the links, the "tradeoff" aspect of paraconsistent logic is interesting. I think one way to achieve consensus with your debate partner might be to consider that the language rep is "just" a nondeterministic decompression of "the facts". I'm primed to agree with you but

https://news.ycombinator.com/item?id=41892090

(It's very common, esp. with educationally traumatized Americans, e.g., to identify Math with "calculation"/"approved tools" and not "the concepts")

"No amount of calculation will model conceptual thinking" <- sounds more reasonable?? (You said you were ok with nondeterministic outputs? :)

Sorry to come across as patronizing

sho_hn1y ago

Did you read the slide? It doesn't make the argument you are responding to, you just seem to have been prompted by "Math".

csdvrx1y ago

A more generous take on the previous post is that the dominant paradigm of Math (consistent logic, which depends on many things like transitive preference) is wrong, and that another type of Math could work.

If you look at the slide, the subtree of correct answers exists, what's missing is just a way to make them more prevalent instead of less.

Personally, I think LeCun is just leaping to the wrong conclusion because he's sticking to the wrong tools for the job.

djoldman1y ago

The idolatry and drama surrounding LeCun, Hinton, Schmidhuber, etc. is likely a distraction. This includes their various predictions.

More interesting is their research work. JEPA is what LeCun is betting on:

https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-jo...

redox991y ago

LeCun has been very salty about LLMs ever since ChatGPT came out.

csdvrx1y ago

> Returning to the topic of the limitations of LLMs, LeCun explains, "An LLM produces one token after another. It goes through a fixed amount of computation to produce a token, and that's clearly System 1—it's reactive, right? There's no reasoning," a reference to Daniel Kahneman's influential framework that distinguishes between the human brain's fast, intuitive method of thinking (System 1) and the method of slower, more deliberative reasoning (System 2).

Many people believe that "wants" come first, and are then followed by rationalizations. It's also a theory that's supported by medical imaging.

Maybe LLMs are a good emulation of system-2 (their performance suggests they are), and what's missing is system-1, the "reptilian" brain, based on emotions like love, fear, aggression, (etc.).

For all we know, the system-1 could use the same embeddings, and just work in parallel and produce tokens that are used to guide the system-2.

Personally, I trust my "emotions" and "gut feelings": I believe they are things "not yet rationalized" by my system-2, coming straight from my system-1.

I know it's very unpopular among nerds, but it has worked well enough for me!

kadushka1y ago

There are LLMs which do not generate one token at a time: https://arxiv.org/abs/2502.09992

They do not reason significantly better than autoregressive LLMs. Which makes me question “one token at a time” as the bottleneck.

Also, LeCun has been pushing his JEPA idea for years now - with not much to show for it. With his resources one could hope we would see the benefits of that over the current state of the art models.

financypants1y ago

From the article: LeCun has been working in some way on V-JEPA for two decades. At least it's bold, and everyone says it won't work until one day it might.

sho_hn1y ago

Re the "medical imaging" reference, many of those are built on top of one famous study recording movement before conscious realization, which isn't as clear-cut as the version that entered popular knowledge: https://www.theatlantic.com/health/archive/2019/09/free-will...

I know there are other examples, and I'm not attacking your post; mainly it's a great opportunity to link this IMHO interesting article that interacts with many debates on HN.

csdvrx1y ago

> IMHO interesting article that interacts with many debates on HN.

It's paywalled

ilaksh1y ago

I think what that shows is that in order for the fast reactions to be useful, they really have to incorporate holistic information effectively. That doesn't mean that slower conscious rational work can't lead to more precision, but does suggest that immediate reactions shouldn't necessarily be ignored. There is an analogy between that and reasoning versus non-reasoning with LLMs.

gessha1y ago

When I took cognitive science courses some years ago, one of the studies that we looked at was one where emotion-responsible parts of the brain were damaged. The result was reduction or complete failure to make decisions.

https://pmc.ncbi.nlm.nih.gov/articles/PMC3032808/

bitethecutebait1y ago

there's a bunch of stuff imperative to his thriving that has become obsolete to others 15 years ago ... maybe it's time for a few 'sabbatical' years ...

ejang01y ago

"[Yann LeCun] believes [current] LLMs will be largely obsolete within five years."

onlyrealcuzzo1y ago

Obsolete by?

This seems like a broken clock having a good chance of being right.

There's so much progress, it wouldn't be that surprising if something quite different completely overtakes the current trend within 5 years.

mdp20211y ago

> Obsolete by

By NN models overcoming the pivot over representing language - according to LeCun in the article. It could be the Joint Embedding Predictive Architecture - we will see.

> There's so much progress, it wouldn't be that surprising

LeCun's point looks like a denunciation of an excessive focus on the LLM idea ("it works, so let's expand that" vs "it probably will not achieve the level of a satisfactory general model, so let us directly try to go beyond it").

timewizard1y ago

Obsolete by price. This technology only scales linearly. All the investment in it had a different growth expectation. I suspect this level of investment will eventually collapse.

re-thc1y ago

> believes [current] LLMs will be largely obsolete within five years

Well yes in that ChatGPT 4 (current) will be replaced by ChatGPT 5 (future) etc...

GMoromisato1y ago

I remember reading Douglas Hofstadter's Fluid Concepts and Creative Analogies [https://en.wikipedia.org/wiki/Fluid_Concepts_and_Creative_An...]

He wrote about Copycat, a program for understanding analogies ("abc is to 123 as cba is to ???"). The program worked at the symbolic level, in the sense that it hard-coded a network of relationships between words and characters. I wonder how close he was to "inventing" an LLM? The insight he needed was that instead of hard-coding patterns, he should have just trained on a vast set of patterns.

Hofstadter focused on Copycat because he saw pattern-matching as the core ability of intelligence. Unlocking that, in his view, would unlock AI. And, of course, pattern-matching is exactly what LLMs are good for.

I think he's right. Intelligence isn't about logic. In the early days of AI, people thought that a chess-playing computer would necessarily be intelligent, but that was clearly a dead-end. Logic is not the hard part. The hard part is pattern-matching.

In fact, pattern-matching is all there is: That's a bear, run away; I'm in a restaurant, I need to order; this is like a binary tree, I can solve it recursively.

I honestly can't come up with a situation that calls for intelligence that can't be solved by pattern-matching.

In my opinion, LeCun is moving the goal-posts. He's saying LLMs make mistakes and therefore they aren't intelligent and aren't useful. Obviously that's wrong: humans make mistakes and are usually considered both intelligent and useful.

I wonder if there is a necessary relationship between intelligence and mistakes. If you can solve a problem algorithmically (e.g., long-division) then there won't be mistakes, but you don't need intelligence (you just follow the algorithm). But if you need intelligence (because no algorithm exists) then there will always be mistakes.

andoando1y ago

I've been thinking about something similar for a long time now. I think abstraction of patterns is the core requirement of intelligence.

But what's critical, and what I think is missing, is a knowledge representation of events in space-time. We need something more fundamental than text or pixels; we need something that captures space and transformations in space itself.

aprilthird20211y ago

> In fact, pattern-matching is all there is: That's a bear, run away; I'm in a restaurant, I need to order; this is like a binary tree, I can solve it recursively.

This is not correct. It does not explain creativity at all. It cannot solely be based on pattern matching. I'm not saying no AI is creative, but this logic does not explain creativity

marliechiller1y ago

Is creativity not just the application of a pattern in an adjacent space?

aprilthird20211y ago

No, lol

guhidalg1y ago

I wouldn't call pattern matching intelligence, I would call it something closer to "trainability" or "educatable" but not intelligence. You can train a person to do a task without understanding why they have to do it like that, but when confronted with a new never-before-seen situation they have to understand the physical laws of the universe to find a solution.

Ask ChatGPT to answer something that no one on the internet has done before and it will struggle to come up with a solution.

throw3108221y ago

Pattern matching leads to compression: once you have identified a pattern, you can compress the original information by some amount by replacing it with the identified pattern. Patterns are symbols of the information that was there originally, so manipulating patterns is the same as manipulating symbols. Compressing information by finding hidden connections, then operating on abstract representations of the original information, reorganising this information according to other patterns... this sounds a lot like intelligence.
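A rough way to see the compression point in code, as a sketch only: a general-purpose compressor shrinks data that contains a repeating pattern and cannot shrink random bytes. The helper name `compression_ratio` and the sample inputs are made up for illustration, and zlib is only a stand-in for "finding patterns", not a claim about how LLMs work.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more pattern found)."""
    return len(zlib.compress(data)) / len(data)

# Hypothetical inputs: one highly regular, one with no pattern to exploit.
patterned = b"abcabcabc" * 1000
random_bytes = os.urandom(len(patterned))

print("patterned:", compression_ratio(patterned))
print("random:   ", compression_ratio(random_bytes))
```

The patterned input compresses to a small fraction of its size, while the random input stays at roughly its original size.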

GMoromisato1y ago

Exactly! And once you compress a pattern, it can become a piece of a larger pattern.

andoando1y ago

What precludes pattern matching from understanding the physical laws? You see a ball hit a wall, and it bounces back. Congratulations, you learned the abstract pattern:

x->|

x<-|

GeorgeTirebiter1y ago

What is Dark Matter? How to eradicate cancer? How to have world peace? I don't quite see how pattern-matching, alone, can solve questions like these.

SpicyLemonZest1y ago

Cancer eradication seems like a clear example of where highly effective pattern matching could be a game changer. That's where cancer research starts: pattern matching to sift through the incredibly large space of potential drugs and find the ones worth starting clinical trials for. If you could get an LLM to pattern-match whether a new compound is likely to work as a BTK inhibitor (https://en.wikipedia.org/wiki/Bruton%27s_tyrosine_kinase), or screen them for likely side effects before even starting synthesis, that would be a big deal.

kadushka1y ago

So, how do we solve questions like these? How about collecting a lot of data and looking for patterns in that data? In the process, scientists typically produce some hypotheses, test them by collecting more data and finding more patterns, and try to correlate these patterns with some patterns in existing knowledge. Do you agree?

If yes, it seems to me that LLMs should be much better at that than humans, and I believe the frontier models like o3 might already be better than humans, we are just starting to use them for these tasks. Give it a couple more years before making any conclusions.

strogonoff1y ago

Pattern-matching can produce useful answers within the confines of a well-defined system. However, the hypothetical all-encompassing system for such a solver to produce hypothetical objective ground truth about an arbitrary question is not something we have—such a system would be one which we ourselves are part of and hence unavailable to us (cf. the incompleteness conundrum, map vs. territory, and so forth).

Your unsolved problems would likely involve the extremes of maps that you currently think in terms of. Maps become less useful as you get closer to undefined extreme conditions within them (a famous one is us humans ourselves, and why so many unsolved challenges to various degrees of obviousness concern our psyche and physiology—world peace, cancer, and so on), and I assume useful pattern matching is similarly less effective. Data to pattern-match against is collected and classified according to a preexisting model; if the model is wrong (which it is), the data may lead to spurious matches with wrong or nonsensical answers. Furthermore, if the answer has to be in terms of a new system, another fallible map hitherto unfamiliar to human mind, pattern-matching based on preexisting products of that very mind is unlikely to produce one.

GMoromisato1y ago

My premise is that pattern-matching unlocks human-level artificial intelligence. Just because LLMs haven't cured cancer yet doesn't mean that LLMs will never be as intelligent as humans. After all, humans haven't cured cancer yet either.

What is intelligence?

Is it reacting to the environment? No, a thermostat can do that.

Is it being logical? No, the simplest program can do that.

Is it creating something never seen before? No, a random number generator can do that.

We can even combine all of the above into a program and it still wouldn't be intelligent or creative. So what's the missing piece? The missing piece is pattern-matching.

Pattern-matching is taking a concrete input (a series of numbers or a video stream) and extracting abstract concepts and relationships. We can even nest patterns: we can match a pattern of concepts, each of which is composed of sub-patterns, and so on.

Creativity is just pattern matching the output of a pseudo-random generator against a critique pattern (is this output good?). When an artist creates something, they are constantly pattern matching against their own internal critic and the existing art out there. They are trying to find something that matches the beauty/impact of the art they've seen, while matching their own aesthetic, and not reproducing an existing pattern. It's pattern-matching all the way down!

Science is just a special form of creativity. You are trying to create a model that reproduces experimental outcomes. How do you do that? You absorb the existing models and experiments (which involves pattern-matching to compress into abstract concepts), and then you generate new models that fit the data.

Pattern-matching unlocks AI, which is why LLMs have been so successful. Obviously, you still need logic, inference, etc., but that's the easy part. Pattern-matching was the last missing piece!

grandempire1y ago

Is this the guy who tweets all day and gets in online fights?

dyarosla1y ago

No he obviously quit twitter /s

asdev1y ago

outside of text generation and search, LLMs have not delivered any significant value

falcor841y ago

I personally have greatly benefitted from LLMs helping me reason about problems and make progress on many diverse issues across professional, recreational and mental health difficulties. I think that asking whether it's just "text generation and search" rather than something that transcends it is as meaningful as asking whether an airplane really "flies" or just "applies thrust and generates lift".

aprilthird20211y ago

I personally benefit from AI autocomplete when coding. I think that value isn't worth what people say it is. But I'm often wrong about what I think the value of something is (e.g. social media) and its true value. I unfortunately often have people try to justify their thinking or beliefs by just pasting a ChatGPT output or screenshot to "prove" it.

gwd1y ago

Or perhaps:

“The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.”

― Edsger W. Dijkstra, in https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD867...

asdev1y ago

this is just a form of search

baumy1y ago

Text generation and search are the drivers for some trillions of dollars worth of economic activity around the world.

mdp20211y ago

> trillions of dollars

That is monetary value. The poster may have meant "delivery" value - which has been limited (and tainted with hype).

> Text generation

Which «text generation», apart from code generation (quite successful in some models), would amount to «trillions of dollars worth of economic activity» at the current stage? I cannot see it at the moment.

an_guy1y ago

I cannot see any major use case apart from code generation and autocompletion. Maybe summarizing text and learning new things but that can be achieved by any search engine.

A good search engine could probably take over llm.

Archonical1y ago

Would you mind elaborating on how LLMs have not delivered any significant value outside of text generation and search?

throwuxiytayq1y ago

Outside of moving bits around, computers have not delivered any significant value.

ObnoxiousProxy1y ago

this statement is just patently wrong, and even those are still significant value. LLMs have been significantly impacting software engineering and software prototyping.

SirensOfTitan1y ago

Do you have any evidence to suggest that LLMs have increased productivity in software? And if so what is the effect size?

I don’t really see any increase in the quality, velocity, creativity of software I’m either using or working on, but I have seen a ton of candidates with real experience flunk out on interviews because they forget the basics and blame it on LLMs. I’ve honestly never seen candidates struggle with syntax the way I’m seeing them do so the last couple years.

I find LLMs very useful for general overviews of domains but I’ve not really seen any use that is a clear unambiguous boost to productivity. You still have to read the code and understand it which takes time, but “vibe coding” prevents you from building cognitive load until that point, making code review costly.

I feel like a lot of the hype in this space so far is aspirational, pushed by VCs, or a bunch of engineers who aren’t A/B testing: like for every hour you spend rubber ducking with an LLM, how much could you have gotten thought out with a pen and paper? In fact, usually I can go and enjoy my life more without an LLM: I write down the problem I’m considering and then go on a walk, and come back and have a mind full of new ideas.

logicchains1y ago

It's hard to see where you're coming from. With o1 Pro or Gemini 2.5 Pro I can give it a detailed text specification of a module I want to write and it'll write code to do exactly that, with fewer errors than I'd make on a first try. It'll also generate unit tests for it. Writing a text specification then reviewing some code is, for me, way, way faster than writing all that same code from scratch.

Are you working on code that's heavily coupled to other code, such that you rarely get to write independent logic components from scratch, and IP restrictions prevent you from handing large chunks of existing code to the LLM?

gh0stcat1y ago

Yeah, even though people say they are shipping faster, the software my work pays for is not seemingly getting better at any faster rate.

antirez1y ago

1. Weakest ever LLM among the big labs with similar resources (and smaller resources: DeepSeek).

2. They say they are focusing on open source models, but the license is among the less open than the available open weight models.

sorcerer-mar1y ago

> there is no probabilistic link between the words of a text and the gist of the content

How could that possibly be true?

There’s obviously a link between “[original content] is summarized as [summarized content]”

DrBenCarson1y ago

It’s not true

The idea that meaning is not impacted by language yet is somehow exclusively captured by language is just absolutely absurd

Like saying X+Y=Z but changing X or Y won’t affect Z

neom1y ago

1 more reply

bitethecutebait1y ago

aerhardt1y ago

mbesto1y ago

I wanna believe everything you say (because you generally are a credible person) but a few things don't add up:

kristianp1y ago

I've found the Llama 3 served by meta.ai to be quite weak for coding prompts; it gets confused by more complex tasks. Maybe it's a smaller model? I agree it's weaker than others of its generation.

redlock1y ago

Doesn't OpenRouter ranking include pricing?

Not really a good measure of quality or performance but of cost effectiveness

mbesto1y ago

I mean it literally says on the page:

"Shown are the sum of prompt and completion tokens per model, normalized using the GPT-4 tokenizer."

Also, it ranks the use of Llama that is provided by cloud providers (for example, AWS Lambda).

I get that OpenRouter is imperfect but it's a good proxy to objectively make a claim that an LLM is "the weakest ever"

gcr1y ago

Why is changing one’s mind when confronted with new evidence a negative signifier of reputation for you?

bko1y ago

Because he has a core belief and based on that core belief he made some statements that turned out to be incorrect. But he kept the core belief and adjusted the statements.

So it's natural to ask, if none of the predictions you derived from your core belief come true, maybe your core belief isn't true.

mdp20211y ago

I have not followed all of LeCun's past statements, but -

1 more reply

danielmarkbruce1y ago

He's done a lot of amazing work, but his stance on LLMs seems continuously off the mark.

SJC_Hacker1y ago

The list of great minds who thought that "new fangled thing is nonsense" and later turned out to be horribly wrong is quite long and distinguished

3 more replies

nurettin1y ago

I'm not saying that they are being bad actors, just saying this is more probable in my mind than an LLM breakthrough.

1 more reply

antirez1y ago

jxjnskkzxxhx1y ago

I don't see the contradiction between "stochastic parrot" and "strong summarisation abilities".

2 more replies

jaggederest1y ago

Here's a fun example of that kind of "I've updated my statements but not assessed any of my underlying lack of understanding" - it's a bad look on any kind of scientist.

https://x.com/AukeHoekstra/status/1507047932226375688

1 more reply

mdp20211y ago

> strong summarization abilities

Which LLMs have shown you "strong summarization abilities"?

Analemma_1y ago

mordymoop1y ago

“Changing your mind” doesn’t really look like what LeCun is doing.

They never take the step of admitting that maybe they’re completely wrong about intelligence, or that they’re completely wrong about LLMs.

Here’s one way of looking at it: if they had really changed their mind, then they would stop being consistently wrong.

Maxatar1y ago

He hasn't fundamentally changed his mind. What he's doing is taking what he fundamentally believes and finding more and more elaborate ways of justifying it.

QuantumGood1y ago

wat100001y ago

Workaccount21y ago

The rub is that we don't know if intelligence is anything more than "just predicting next output".

sangnoir1y ago

I think we do.

1 more reply

nurettin1y ago

Sometimes seeing something that resembles reasoning doesn't really make it reasoning.

What makes it "seem to get better" and what keeps throwing people like LeCun off is the training bias, the prompts, the tooling and the billions spent cherry-picking information to train on.

What LLMs do best is language generation, which leads to, but is not, intelligence. If you want someone who was right all along, maybe try Wittgenstein.

pllbnk1y ago

Just to prove that you are wrong: ask your favorite LLM if your statement is correct and you will probably see it output that it is not.

aprilthird20211y ago

> As LLMs do things thought to be impossible before

Like what?

> Now reasoning models can solve problems they never saw

This is not the same as a novel question though.

> o3 did huge progresses on ARC

> LeCun is directing an AI lab that as the same point has the following huge issues

ksec1y ago

timewizard1y ago

> So at this point it does not matter what you believe about LLMs: in general, to trust LeCun words is not a good idea.

One does not follow from the other. In particular I don't "trust" anyone who is trying to make money off this technology. There is way more marketing than honest science happening here.

> and o3 did huge progresses on ARC,

> Btw, other researchers that were in the LeCun side, changed side recently,

Which "side" researchers are on is the least useful piece of information available.

thesz1y ago

> there is no probabilistic link between the words of a text and the gist of the content

Using n-gram/skip-gram model over the long text you can predict probabilities of word pairs and/or word triples (effectively collocations [1]) in the summary.

[1] https://en.wikipedia.org/wiki/Collocation

Then, by using (beam search and) an n-gram/skip-gram model of summaries, you can generate the text of a summary, guided by preference for the word pairs/triples predicted by the first step.
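For illustration only, here is a minimal sketch of the first ingredient mentioned above: counting adjacent word pairs (bigrams) in a text to surface frequent collocations. The function names `bigram_counts` and `top_collocations` and the sample sentence are hypothetical. It does not implement skip-grams, the summary-side model, or beam search, so it is nowhere near a working summarizer.

```python
from collections import Counter

def bigram_counts(text: str) -> Counter:
    """Count adjacent word pairs after lowercasing and whitespace splitting."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

def top_collocations(text: str, k: int = 3) -> list:
    """Return the k most frequent adjacent word pairs."""
    return [pair for pair, _ in bigram_counts(text).most_common(k)]

sample = "the cat sat on the mat the cat ran"
print(top_collocations(sample, 1))
```

A real pipeline along these lines would still need a model relating source pairs to summary pairs, which is the part that carries the probabilistic link being argued about.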

belter1y ago

But have we established that LLMs don't just interpolate and that they can create?

Are we able to prove it with output that's

1) algorithmically novel (not just a recombination)

2) coherent, and

3) not explainable by training data coverage.

No handwaving with scale...

fragmede1y ago

belter1y ago

If LLMs could actually create, we’d see emergent novelty, outputs that couldn’t exist in the training data. Instead, we get constrained interpolation.

Happy to be proven wrong. Would like to see examples where an LLM output is impossible to map back to its training data.

1 more reply

daveguy1y ago

o3 doing well on ARC after domain training is not a great argument. There is something significant missing from LLMs being intelligent.

Also, understanding human and machine intelligence isn't about sides. And CoT is not symbolic reasoning.

charcircuit1y ago

>LeCun is directing an AI lab that [built LLMs]

No he's not.

deepfriedchokes1y ago

Everything is possible with math. Just ask string theorists.

gsf_emergency_21y ago

Recent talk: https://www.youtube.com/watch?v=ETZfkkv6V7Y

LeCun, "Mathematical Obstacles on the Way to Human-Level AI"

Slide (Why autoregressive models suck)

https://xcancel.com/ravi_mohan/status/1906612309880930641

hatefulmoron1y ago

"Probability e that any produced [choice] takes us outside the set of correct answers .. probability that answer of length n is correct: P(correct) = (1-e)^{n}"
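To get a feel for the formula quoted from the slide, here is a small sketch that evaluates P(correct) = (1-e)^n for a few lengths. The function name `p_correct` and the example value e = 0.01 are assumptions chosen for illustration, not numbers from the talk, and the whole calculation rests on every token erring independently with the same probability.

```python
def p_correct(e: float, n: int) -> float:
    """Probability that an n-token answer stays correct, assuming each token
    independently leaves the set of correct answers with probability e."""
    return (1.0 - e) ** n

# Hypothetical per-token error rate of 1%.
for n in (10, 100, 1000):
    print(n, p_correct(0.01, n))
```

Under that independence assumption the probability decays exponentially with length, which is the slide's argument. If errors are correlated or the model can recover from them, the formula no longer applies.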

somenameforme1y ago

I think he's focusing on the distinction between facts and output for humans and drawing a parallel to LLMs.

If I ask you something that you know the answer to, the words you use and that fact itself are distinct entities. You're just giving me a presentation layer for fact #74719.

But LLMs lack any comparable pool to draw from, and so their words and their answer are essentially the same thing.

conradev1y ago

The routing decision that an MoE model makes increases its chances of success by constraining its future paths.

greesil1y ago

The "assuming independent errors" is doing a lot of heavy lifting here

gibsonf11y ago

The error with that is that human reasoning is not mathematical. Math is just one of the many tools of reason.

csdvrx1y ago

> just one of the many tools of reason.

It's a tool I was using at a previous company, mixing LLMs, statistics and formal tools. I'm surprised there aren't more startups mixing LLMs with Z3 or even just Prolog.

gsf_emergency_21y ago

https://news.ycombinator.com/item?id=41892090

(It's very common, esp. with educationally traumatized Americans, e.g., to identify Math with "calculation"/"approved tools" and not "the concepts")

"No amount of calculation will model conceptual thinking" <- sounds more reasonable?? (You said you were ok with nondeterministic outputs? :)

Sorry to come across as patronizing

1 more reply

sho_hn1y ago

Did you read the slide? It doesn't make the argument you are responding to, you just seem to have been prompted by "Math".

csdvrx1y ago

If you look at the slide, the subtree of correct answers exists, what's missing is just a way to make them more prevalent instead of less.

Personally, I think LeCun is just leaping to the wrong conclusion because he's sticking to the wrong tools for the job.

2 more replies

djoldman1y ago

The idolatry and drama surrounding LeCun, Hinton, Schmidhuber, etc. is likely a distraction. This includes their various predictions.

More interesting is their research work. JEPA is what LeCun is betting on:

https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-jo...

redox991y ago

LeCun has been very salty of LLMs ever since ChatGPT came out.

csdvrx1y ago

Many people believe that "wants" come first, and are then followed by rationalizations. It's also a theory that's supported by medical imaging.

Maybe LLMs are a good emulation of system-2 (their performance suggests they are), and what's missing is system-1, the "reptilian" brain, based on emotions like love, fear, aggression, (etc.).

For all we know, the system-1 could use the same embeddings, and just work in parallel and produce tokens that are used to guide the system-2.

Personally, I trust my "emotions" and "gut feelings": I believe they are things "not yet rationalized" by my system-2, coming straight from my system-1.

I know it's very unpopular among nerds, but it has worked well enough for me!

kadushka1y ago

There are LLMs which do not generate one token at a time: https://arxiv.org/abs/2502.09992

They do not reason significantly better than autoregressive LLMs. Which makes me question “one token at a time” as the bottleneck.

Also, LeCun has been pushing his JEPA idea for years now, with not much to show for it. With his resources one could hope we would see the benefits of that over the current state-of-the-art models.

financypants1y ago

from the article: LeCun has been working in some way on V-JEPA for two decades. At least it's bold, and, everyone says it won't work until one day it might

sho_hn1y ago

I know there are other examples, and I'm not attacking your post; mainly it's a great opportunity to link this IMHO interesting article that interacts with many debates on HN.

csdvrx1y ago

> IMHO interesting article that interacts with many debates on HN.

It's paywalled

ilaksh1y ago

gessha1y ago

https://pmc.ncbi.nlm.nih.gov/articles/PMC3032808/

bitethecutebait1y ago

there's a bunch of stuff imperative to his thriving that has become obsolete to others 15 years ago ... maybe it's time for a few 'sabbatical' years ...

ejang01y ago

"[Yann LeCun] believes [current] LLMs will be largely obsolete within five years."

onlyrealcuzzo1y ago

Obsolete by?

This seems like a broken clock having a good chance of being right.

There's so much progress, it wouldn't be that surprising if something quite different completely overtakes the current trend within 5 years.

mdp20211y ago

> Obsolete by

By NN models overcoming the pivot over representing language - according to LeCun in the article. It could be the Joint Embedding Predictive Architecture - we will see.

> There's so much progress, it wouldn't be that surprising

timewizard1y ago

Obsolete by price. This technology only scales linearly. All the investment in it had a different growth expectation. I suspect this level of investment will eventually collapse.

re-thc1y ago

> believes [current] LLMs will be largely obsolete within five years

Well yes in that ChatGPT 4 (current) will be replaced by ChatGPT 5 (future) etc...

GMoromisato1y ago

I remember reading Douglas Hofstadter's Fluid Concepts and Creative Analogies [https://en.wikipedia.org/wiki/Fluid_Concepts_and_Creative_An...]

In fact, pattern-matching is all there is: That's a bear, run away; I'm in a restaurant, I need to order; this is like a binary tree, I can solve it recursively.

I honestly can't come up with a situation that calls for intelligence that can't be solved by pattern-matching.
