So at this point it does not matter what you believe about LLMs: in general, trusting LeCun's words is not a good idea. Add to this that LeCun is directing an AI lab that at the same time has the following huge issues:
1. Weakest ever LLM among the big labs with similar resources (and smaller resources: DeepSeek).
2. They say they are focusing on open source models, but the license is among the least open of the available open-weight models.
3. LLMs, and in general the whole new AI wave, put CNNs, a field where LeCun worked (but that he didn't start himself), a lot more in perspective: now it's just a chapter in a book that is composed mostly of other techniques.
Btw, other researchers who were on LeCun's side changed sides recently, saying that now "it's different" because of CoT, which is the symbolic reasoning they were babbling about before. But CoT is still autoregressive next-token prediction without any architectural change, so, no, they were wrong, too.
How could that possibly be true?
There’s obviously a link between “[original content] is summarized as [summarized content]”
The idea that meaning is not impacted by language yet is somehow exclusively captured by language is just absolutely absurd
Like saying X+Y=Z but changing X or Y won’t affect Z
1. Weakest ever LLM? This one is really making me scratch my head. For a period of time Llama was considered to be THE best. Furthermore, it's the third most used on OpenRouter (in the past month): https://openrouter.ai/rankings?view=month
2. Ignoring DeepSeek for a moment, Llama 2 and 3 require a special license from Meta if the products or services using the models have more than 700 million monthly active users. OpenAI, Claude and Gemini are not only closed source, but require a license/subscription to even get started.
Not really a good measure of quality or performance but of cost effectiveness
So it's not so much about his incorrect predictions, but that these predictions were based on a core belief. And when the predictions turned out to be false, he didn't adjust his core beliefs, but just his predictions.
So it's natural to ask, if none of the predictions you derived from your core belief come true, maybe your core belief isn't true.
He's done a lot of amazing work, but his stance on LLMs seems continuously off the mark.
If your model of reality makes good predictions and mine makes bad ones, and I want a more accurate model of reality, then I really shouldn’t just make small provisional and incremental concessions gerrymandered around whatever the latest piece of evidence is. After a few repeated instances, I should probably just say “oops, looks like my model is wrong” and adopt yours.
This seems to be a chronic problem with AI skeptics of various sorts. They clearly tell us that their grand model indicates that such-and-such a quality is absolutely required for AI to achieve some particular thing. Then LLMs achieve that thing without having that quality. Then they say something vague about how maybe LLMs have that quality after all, somehow. (They are always shockingly incurious about explaining this part. You would think this would be important to them to understand, as they tend to call themselves “scientists”.)
They never take the step of admitting that maybe they’re completely wrong about intelligence, or that they’re completely wrong about LLMs.
Here’s one way of looking at it: if they had really changed their mind, then they would stop being consistently wrong.
What makes it "seem to get better" and what keeps throwing people like LeCun off is the training bias, the prompts, the tooling and the billions spent cherry-picking information to train on.
What LLMs do best is language generation which leads to, but is not intelligence. If you want someone who was right all along, maybe try Wittgenstein.
I am not an expert by any means, but I have some knowledge of the technicalities of LLMs, and my limited knowledge allows me to disagree with your statement. The models are trained on an ungodly amount of text, so they become very advanced statistical token prediction machines with magic randomness sprinkled in to make the outputs more interesting. After that, they are fine-tuned on very believable dialogues, so their statistical weights are skewed in a way that when subject A (the user) says something, subject B (the LLM-turned-chatbot) has to say something back which statistically should make sense (which it almost always does, since they are trained on it in the first place). Try to paste random text - you will get a random reply. Now try to paste the same random text and ask the chatbot to summarize it - your randomness space will be reduced and it will be turned into a summary, because the fine-tuning gave the LLM the "knowledge" of what summarization _looks like_ (not what it _means_).
Just to prove that you are wrong: ask your favorite LLM if your statement is correct and you will probably see it output that it is not.
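For readers who want the "token prediction with randomness sprinkled in" part spelled out, here is a minimal sketch in Python of temperature sampling over next-token scores. The vocabulary, the logits and the function names are made up for illustration; a real model produces the logits from its weights, which this sketch does not attempt to show.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(vocab, logits, temperature, rng):
    """Pick one token at random, weighted by the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical vocabulary and scores, chosen only for the demo.
    vocab = ["cat", "dog", "mat"]
    logits = [2.0, 1.0, 0.0]
    rng = random.Random(0)
    for temperature in (0.2, 1.0, 2.0):
        picks = [sample_token(vocab, logits, temperature, rng) for _ in range(10)]
        print(temperature, picks)
```

At low temperature the highest-scoring token dominates; at high temperature the choice gets closer to uniform, which is the "more interesting" randomness mentioned above.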
Like what?
Your timeline doesn't sound crazy outlandish. It sounds pretty normal and lines up with my thoughts as AI has advanced over the past few years. Maybe more conservative than others in the field, but that's not a reason to dismiss him entirely any more than the hypesters should be dismissed entirely because they were over promising and under delivering?
> Now reasoning models can solve problems they never saw
This is not the same as a novel question though.
> o3 did huge progresses on ARC
Is this a benchmark? O3 might be great, but I think the average person's experience with LLMs matches what he's saying, it seems like there is a peak and we're hitting it. It also matches what Ilya said about training data being mostly gone and new architectures (not improvements to existing ones) needing to be the way forward.
> LeCun is directing an AI lab that as the same point has the following huge issues
Second point has nothing to do with the lab and more to do with Meta. Your last point has nothing to do with the lab at all. Meta also said they will have an agent that codes like a junior engineer by the end of the year and they are clearly going to miss that prediction, so does that extra hype put them back in your good books?
Sorry I am a little lost reading the last part about regressive next token and it is still wrong. Could someone explain a little bit? Edit: Explained here further down the thread. ( https://news.ycombinator.com/item?id=43594813 )
I personally went from AI skeptic ( it won't ever replace all humans, at least not in the next 10 - 20 years ) to AI scared, simply because of the reasoning capability it gained. It is not perfect, far from it, but I can immediately infer where both algorithm improvements and hardware advances could bring us in 5 years. And that is not including any new breakthrough.
One does not follow from the other. In particular I don't "trust" anyone who is trying to make money off this technology. There is way more marketing than honest science happening here.
> and o3 did huge progresses on ARC,
It also cost huge money. The cost increase to go from 75% to 85% was two orders of magnitude greater. This cost scaling is not sustainable. It also only showed progress on ARC1, which it was trained for, and did terribly on ARC2 which it was not trained for.
> Btw, other researchers that were in the LeCun side, changed side recently,
Which "side" researchers are on is the least useful piece of information available.
Using n-gram/skip-gram model over the long text you can predict probabilities of word pairs and/or word triples (effectively collocations [1]) in the summary.
[1] https://en.wikipedia.org/wiki/Collocation
Then, by using (beam search and) an n-gram/skip-gram model of summaries, you can generate the text of a summary, guided by preference of the words pairs/triples predicted by the first step.
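A toy sketch of the two steps above, in Python. Several things here are assumptions made for illustration: step 1 is reduced to "the set of word pairs seen in the long text", the summary model is a plain bigram model trained on three invented summaries, and the two are combined by adding a fixed `bonus` to the log-probability of preferred pairs. The comment does not say how the preference should be weighted, so treat this as one possible reading.

```python
import math
from collections import Counter, defaultdict


def bigrams(tokens):
    """Return the adjacent word pairs of a token list."""
    return list(zip(tokens, tokens[1:]))


def train_bigram_model(sentences):
    """Count next-word frequencies per word over tokenized sentences."""
    model = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, nxt in bigrams(padded):
            model[prev][nxt] += 1
    return model


def beam_search(model, preferred, beam_width=3, max_len=8, bonus=1.0):
    """Generate one sentence, rewarding word pairs found in the source text.

    The score is a log-probability plus `bonus` per preferred pair, so it is
    a ranking heuristic, not a probability.
    """
    beams = [(0.0, ["<s>"])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            nexts = model.get(seq[-1], {})
            total = sum(nexts.values())
            for word, count in nexts.items():
                step = math.log(count / total)
                if (seq[-1], word) in preferred:
                    step += bonus
                candidates.append((score + step, seq + [word]))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            (finished if seq[-1] == "</s>" else beams).append((score, seq))
    best = max(finished or beams, key=lambda c: c[0])
    return [w for w in best[1] if w not in ("<s>", "</s>")]


if __name__ == "__main__":
    # Invented data: one "long text" and a tiny corpus of example summaries.
    source = "the cat sat on the mat and the cat slept on the mat all day".split()
    summaries = [["the", "cat", "slept"], ["the", "dog", "barked"], ["a", "cat", "sat"]]
    model = train_bigram_model(summaries)
    print(" ".join(beam_search(model, set(bigrams(source)))))
```

Note that nothing in this sketch checks whether the output is a faithful summary; it only produces summary-shaped text biased toward the source's collocations.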
Are we able to prove it with output that's
1) algorithmically novel (not just a recombination)
2) coherent, and
3) not explainable by training data coverage.
No handwaving with scale...
o3 doing well on ARC after domain training is not a great argument. There is something significant missing from LLMs being intelligent.
I'm not sure if you watched the entire video, but there were insightful observations. I don't think anyone believes LLMs aren't a significant breakthrough in HCI and language modelling. But it is many layers with many winters away from AGI.
Also, understanding human and machine intelligence isn't about sides. And CoT is not symbolic reasoning.
No he's not.
LeCun, "Mathematical Obstacles on the Way to Human-Level AI"
Slide (Why autoregressive models suck)
"Probability e that any produced [choice] takes us outside the set of correct answers .. probability that answer of length n is correct: P(correct) = (1-e)^{n}"
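To get a feel for the quoted formula, here is a small Python sketch that evaluates P(correct) = (1-e)^n for a few answer lengths. The function name and the value e = 0.01 are illustrative assumptions, not taken from the slide. The formula itself rests on the premise that every token independently has the same chance e of going wrong, and that premise is exactly what is being disputed in this thread.

```python
def p_correct(e: float, n: int) -> float:
    """Probability that an n-token answer stays correct, per the quoted formula.

    Assumes each token independently leaves the set of correct answers with
    probability e (the slide's premise, not an established fact).
    """
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return (1.0 - e) ** n


if __name__ == "__main__":
    # e = 0.01 is a made-up value, chosen only to show the exponential decay.
    for n in (10, 100, 1000):
        print(f"n={n:5d}  P(correct)={p_correct(0.01, n):.6f}")
```

Under these assumptions the probability decays exponentially in n, so even a small per-token error rate makes long answers almost surely wrong. If errors are not independent, or the model can recover from a bad token, the formula no longer applies.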
If I ask you something that you know the answer to, the words you use and that fact itself are distinct entities. You're just giving me a presentation layer for fact #74719.
But LLMs lack any comparable pool to draw from, and so their words and their answer are essentially the same thing.
> just one of the many tools of reason.
Read https://en.wikipedia.org/wiki/Preference_(economics)#Transit... then read https://pmc.ncbi.nlm.nih.gov/articles/PMC7058914/ and you will see there's a lot of data suggesting that indeed, it's just one of the many tools!
I think it's similar to how many dislike the non-deterministic output of LLM: when you use statistical tools, a non-deterministic output is a VERY nice feature to explore conceptual spaces with abductive reasoning: https://en.wikipedia.org/wiki/Abductive_reasoning
It's a tool I was using at a previous company, mixing LLMs, statistics and formal tools. I'm surprised there aren't more startups mixing LLM with z3 or even just prolog.
More interesting is their research work. JEPA is what LeCun is betting on:
https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-jo...
Many people believe that "wants" come first, and are then followed by rationalizations. It's also a theory that's supported by medical imaging.
Maybe LLMs are a good emulation of system-2 (their performance suggests they are), and what's missing is system-1, the "reptilian" brain, based on emotions like love, fear, aggression, (etc.).
For all we know, the system-1 could use the same embeddings, and just work in parallel and produce tokens that are used to guide the system-2.
Personally, I trust my "emotions" and "gut feelings": I believe they are things "not yet rationalized" by my system-2, coming straight from my system-1.
I know it's very unpopular among nerds, but it has worked well enough for me!
They do not reason significantly better than autoregressive LLMs. Which makes me question “one token at a time” as the bottleneck.
Also, LeCun has been pushing his JEPA idea for years now - with not much to show for it. With his resources one could hope we would see the benefits of that over the current state of the art models.
I know there are other examples, and I'm not attacking your post; mainly it's a great opportunity to link this IMHO interesting article that interacts with many debates on HN.
It's paywalled
This seems like a broken clock having a good chance of being right.
There's so much progress, it wouldn't be that surprising if something quite different completely overtakes the current trend within 5 years.
By NN models overcoming the pivot over representing language - according to LeCun in the article. It could be the Joint Embedding Predictive Architecture - we will see.
> There's so much progress, it wouldn't be that surprising
LeCun's point looks like a denunciation of an excessive focus on the LLM idea ("it works, so let's expand that" vs "it probably will not achieve the level of a satisfactory general model, so let us directly try to go beyond it").
Well yes in that ChatGPT 4 (current) will be replaced by ChatGPT 5 (future) etc...
He wrote about Copycat, a program for understanding analogies ("abc is to 123 as cba is to ???"). The program worked at the symbolic level, in the sense that it hard-coded a network of relationships between words and characters. I wonder how close he was to "inventing" an LLM? The insight he needed was that instead of hard-coding patterns, he should have just trained on a vast set of patterns.
Hofstadter focused on Copycat because he saw pattern-matching as the core ability of intelligence. Unlocking that, in his view, would unlock AI. And, of course, pattern-matching is exactly what LLMs are good for.
I think he's right. Intelligence isn't about logic. In the early days of AI, people thought that a chess-playing computer would necessarily be intelligent, but that was clearly a dead-end. Logic is not the hard part. The hard part is pattern-matching.
In fact, pattern-matching is all there is: That's a bear, run away; I'm in a restaurant, I need to order; this is like a binary tree, I can solve it recursively.
I honestly can't come up with a situation that calls for intelligence that can't be solved by pattern-matching.
In my opinion, LeCun is moving the goal-posts. He's saying LLMs make mistakes and therefore they aren't intelligent and aren't useful. Obviously that's wrong: humans make mistakes and are usually considered both intelligent and useful.
I wonder if there is a necessary relationship between intelligence and mistakes. If you can solve a problem algorithmically (e.g., long-division) then there won't be mistakes, but you don't need intelligence (you just follow the algorithm). But if you need intelligence (because no algorithm exists) then there will always be mistakes.
But what's critical, and I think what's missing, is a knowledge representation of events in space-time. We need something more fundamental than text or pixels; we need something that captures space and transformations in space itself.
This is not correct. It does not explain creativity at all. It cannot solely be based on pattern matching. I'm not saying no AI is creative, but this logic does not explain creativity
Ask ChatGPT to answer something that no one on the internet has done before and it will struggle to come up with a solution.
x->|
x|
x<-|
If yes, it seems to me that LLMs should be much better at that than humans, and I believe the frontier models like o3 might already be better than humans, we are just starting to use them for these tasks. Give it a couple more years before making any conclusions.
Your unsolved problems would likely involve the extremes of maps that you currently think in terms of. Maps become less useful as you get closer to undefined extreme conditions within them (a famous one is us humans ourselves, and why so many unsolved challenges to various degrees of obviousness concern our psyche and physiology—world peace, cancer, and so on), and I assume useful pattern matching is similarly less effective. Data to pattern-match against is collected and classified according to a preexisting model; if the model is wrong (which it is), the data may lead to spurious matches with wrong or nonsensical answers. Furthermore, if the answer has to be in terms of a new system, another fallible map hitherto unfamiliar to human mind, pattern-matching based on preexisting products of that very mind is unlikely to produce one.
What is intelligence?
Is it reacting to the environment? No, a thermostat can do that.
Is it being logical? No, the simplest program can do that.
Is it creating something never seen before? No, a random number generator can do that.
We can even combine all of the above into a program and it still wouldn't be intelligent or creative. So what's the missing piece? The missing piece is pattern-matching.
Pattern-matching is taking a concrete input (a series of numbers or a video stream) and extracting abstract concepts and relationships. We can even nest patterns: we can match a pattern of concepts, each of which is composed of sub-patterns, and so on.
Creativity is just pattern matching the output of a pseudo-random generator against a critique pattern (is this output good?). When an artist creates something, they are constantly pattern matching against their own internal critic and the existing art out there. They are trying to find something that matches the beauty/impact of the art they've seen, while matching their own aesthetic, and not reproducing an existing pattern. It's pattern-matching all the way down!
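The generate-and-critique loop described above can be sketched in a few lines of Python. Everything concrete here is hypothetical: the "art" is a four-note melody, the critic is a toy rule that likes small steps between notes, and `known` stands in for the existing art that must not be reproduced. Whether a loop like this deserves the name creativity is the point under debate, not something the code settles.

```python
import random

# Hypothetical "notes": a major scale expressed in semitones.
SCALE = [0, 2, 4, 5, 7, 9, 11]


def random_melody(rng, length=4):
    """Pseudo-random generator: pick notes uniformly from the scale."""
    return tuple(rng.choice(SCALE) for _ in range(length))


def critic(melody):
    """Toy critique pattern: count adjacent notes at most 2 semitones apart."""
    return sum(1 for a, b in zip(melody, melody[1:]) if abs(a - b) <= 2)


def create(generate, score, known, threshold, rng, max_tries=10_000):
    """Return the first candidate that is new and passes the critic, else None."""
    for _ in range(max_tries):
        candidate = generate(rng)
        if candidate in known:
            continue  # reproduces an existing pattern, so reject it
        if score(candidate) >= threshold:
            return candidate
    return None


if __name__ == "__main__":
    known = {(0, 2, 4, 5)}  # the "existing art" in this toy world
    print(create(random_melody, critic, known, threshold=3, rng=random.Random(0)))
```

All of the interesting work is hidden in `critic`: the loop is trivial, and the quality of the output depends entirely on how good the critique pattern is.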
Science is just a special form of creativity. You are trying to create a model that reproduces experimental outcomes. How do you do that? You absorb the existing models and experiments (which involves pattern-matching to compress into abstract concepts), and then you generate new models that fit the data.
Pattern-matching unlocks AI, which is why LLMs have been so successful. Obviously, you still need logic, inference, etc., but that's the easy part. Pattern-matching was the last missing piece!
“The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.”
― Edsger W. Dijkstra, in https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD867...
That is monetary value. The poster may have meant "delivery" value - which has been limited (and tainted with hype).
> Text generation
Which «text generation», apart from code generation (quite successful in some models), would amount to «trillions of dollars worth of economic activity» at the current stage? I cannot see it at the moment.
I don’t really see any increase in the quality, velocity, creativity of software I’m either using or working on, but I have seen a ton of candidates with real experience flunk out on interviews because they forget the basics and blame it on LLMs. I’ve honestly never seen candidates struggle with syntax the way I’m seeing them do so the last couple years.
I find LLMs very useful for general overviews of domains but I’ve not really seen any use that is a clear unambiguous boost to productivity. You still have to read the code and understand it which takes time, but “vibe coding” prevents you from building cognitive load until that point, making code review costly.
I feel like a lot of the hype in this space so far is aspirational, pushed by VCs, or a bunch of engineers who aren’t A/B testing: like for every hour you spend rubber ducking with an LLM, how much could you have gotten thought out with a pen and paper? In fact, usually I can go and enjoy my life more without an LLM: I write down the problem I’m considering and then go on a walk, and come back and have a mind full of new ideas.