Stochastic Parrots: Frequently Unasked Questions

(medium.com)

44 points · olalonde · 11d ago · 39 comments

hellohello2 8d ago

"Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind."

Modelling text describing the world is not modelling (some aspect) of the world?

Modelling the probability that a reader likes or dislikes a piece of text is not modelling (some aspect) of a reader's state of mind?

tootie 8d ago

No? There's no model involved. It's all just probabilistic. LLMs understand what you're thinking as well as a mood ring.

aoeusnth1 8d ago

The model is the thing which is learned in order to make the probabilistic prediction with low entropy.
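
A minimal sketch of that point, assuming a toy corpus and an add-one-smoothed bigram model (both are illustrative choices, not anything taken from the article or from how LLMs are actually trained). It compares the average bits per token of a uniform guess with that of a predictor that has learned next-token counts. The names `train_bigram`, `cross_entropy_bits`, `uniform_prob` and `bigram_prob` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, purely illustrative.
CORPUS = "the cat sat on the mat . the dog sat on the rug .".split()


def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def cross_entropy_bits(tokens, prob):
    """Average number of bits per predicted token under prob(prev, nxt)."""
    pairs = list(zip(tokens, tokens[1:]))
    return -sum(math.log2(prob(p, n)) for p, n in pairs) / len(pairs)


counts = train_bigram(CORPUS)
vocab = sorted(set(CORPUS))


def uniform_prob(prev, nxt):
    # No learned structure: every token in the vocabulary is equally likely.
    return 1.0 / len(vocab)


def bigram_prob(prev, nxt):
    # Learned structure: add-one smoothed estimate of P(nxt | prev).
    total = sum(counts[prev].values())
    return (counts[prev][nxt] + 1) / (total + len(vocab))


uniform_bits = cross_entropy_bits(CORPUS, uniform_prob)
bigram_bits = cross_entropy_bits(CORPUS, bigram_prob)
print(f"uniform: {uniform_bits:.3f} bits/token")
print(f"bigram:  {bigram_bits:.3f} bits/token")
```

On this corpus the bigram predictor needs fewer bits per token than the uniform one. The only difference between the two is the table of learned counts, which is the "model" in the sense used above.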

roenxi 8d ago

It isn't possible to have "just probabilistic" (maybe a philosophical exception could be made for a uniform random distribution or whatever provides the little dose of randomness required to get nondeterministic results). Probabilities are always in context of a model. LLMs model language but language itself is a model of something else. My money would have been on language modelling nonsense, but that is quite clearly not the case. Turns out it models the world and so do LLMs.

hellohello2 8d ago

The literal definition of a model is "an informative representation of an object, person, or system". I think you mean something else though, what are you trying to express exactly?

afthonos 8d ago

Nothing about an LLM is “just”. In what precise sense do you mean it is probabilistic?

siegecraft 8d ago

> Most things we historically do with computing are not well approximated by extruding synthetic text.

I don't understand this point. I feel like almost everything associated with computing is extruding synthetic text.

majormajor 8d ago

It seems like a criticism that's actually a hint at a bigger point. The entire appeal/hype is due to the promise of doing things that historically computers have not done well.

That's captured elsewhere - attempts to create "synthetic human behavior" - but mostly around ethics vs practical function or consumer appeal.

Even just a "stochastic parrot" can be extremely valuable if the parrot is fast enough and can connect enough dots in a human-reasoning-style to say things like "what could come after a description of a problem, some background info, and a question about what could have caused the problem? Probably a relevant hypothesis that fits the background facts and the problem description" and then generate a high-probability-fitting sequence of text to spit out.

There doesn't need to be any more intent in that than just "predict what would be the next text that would be similarly connected to the previous in the same way text in the model training process would." It doesn't need to be intending to solve the problem if the hit rate is good enough such that predicting how someone else would describe the solution is often the same as actually "intending" to solve it...

Nor does the ability to predict things stochastically mean that there isn't any symbolic way to do the same. Quite possibly the stochastic process is just a brute-force rough approximation of what a true symbolic model could do. IMO the success of the stochastic approach is exactly in line with the existence of some sort of underlying structure/system. (Though such a system would have to be incredibly complex to support all the crazy things we do with language.)

advisedwang 8d ago

Just to name some of the main things I think of computers doing, especially with a historical lens: analyzing data, processing transactions, simulating dynamics of physical systems, controlling electronic parts of devices, providing entertainment, encoding/decoding audio/video/text. I think these are the kinds of things that Dr Bender is saying are not well suited to textual tools.

loandbehold 8d ago

Sounds like the increasing capabilities of LLMs over the last 5 years proved her 2021 paper wrong, but instead of admitting that she was wrong she's trying to change/reinterpret what she wrote in 2021.

NooneAtAll3 8d ago

> Another common trope in the discourse around this phrase is to claim that stochastic parrot is an insult (or even a slur). On one reading, that would require LLMs to be the kind of thing that can take or feel offense, which they clearly aren’t.

isn't that circular reasoning?

"I can call anyone not smart enough to take offense because as I said those anyone aren't smart enough to take offense"?

(also disregarding that being offended has been shifted into "protection of the (perceived) weak (or of the group of your allegiance)" rather than "protection of self" for quite some time now)

---

but generally I always felt that this tension around the phrase was somewhat of a prescriptive/descriptive difference, or maybe of the "level of detail in the model" type

just because a fuller understanding of the process exists doesn't mean other descriptions/models of the process are invalid or not useful

newtonian gravity doesn't describe time dilation - and yet most of the time it is enough on its own, so it's successfully taught in schools and undergraduate courses

if the output of an LLM can be modeled (by intuition) as "some other being" for many practical uses *and the model works* - then automatically blaming others for "using a less precise model" and warning about it feels... strange

getnormality 8d ago

I think "stochastic parrot" misses the mark as a characterization of LLMs, but so does "artificial intelligence." They're both somewhat helpful and somewhat misleading in complementary ways.

Maybe that's the best one can do when describing something very new and strange. A series of vivid, incompatible metaphors might be the best guide for a while. "Intelligence" as we normally understand it is a significant overstatement, while "parrot" is a massive understatement.

tibbar 8d ago

I mean, we're pretty deep into Westworld/Blade Runner-style scifi at this point. It's actually a crazy, mind-bending question to try to grasp what is going on with chatclaudini at this point. Regardless of what labels we choose or properties we choose to affirm, we're far too deep into uncanny valley for it to be very helpful.

libraryofbabel 8d ago

It would have been nice to see some version of “I am very surprised by how far LLMs have come since I wrote the stochastic parrots paper, here is how I have revised my thinking.” But there is nothing like that and the author is just doubling down or trying to correct perceived “misinterpretations” of her work.

Meanwhile you have multiple Fields Medalists (Tao, Gowers) saying they’re very impressed by LLMs’ mathematical reasoning, something that the stochastic parrots thesis (if it has any empirically-predictive content at all) would predict was impossible. I doubt Tao and Gowers thought much of LLMs a few years ago either. But they changed their minds. Who do you want to listen to?

I think it’s time to retire the Stochastic Parrots metaphor. A few years ago a lot of us didn’t think LLMs would ever be capable of doing what they can do now. I certainly didn’t. But new methods of training (RLVR) changed the game and took LLMs far beyond just reducing cross entropy on huge corpuses of text. And so we changed our opinions. Shame Emily Bender hasn’t too.

Sigh.

seatsh 8d ago

Gowers, Tao and Lichtman are especially impressed by the funding of math.inc and the AI for Math Fund, a joint venture of Renaissance Philanthropies and XTX Markets.

Renaissance Philanthropies is a front for VC companies.

They never publish allocated computational resources, prior art or any novel algorithm that is used in the LLMs. For all we know, all accounts that are known to work on math stunts get 20% of total compute.

In other words, they ignore prior art, do not investigate and just celebrate if they get a vibe math result. It isn't science, it is a disgrace.

leonidasv 8d ago

What a hill to die on.

_wire_ 11d ago

Lovely article well worth attention by virtue of its regard for the cultural traits of terminology and its inflections, while also debunking the pervasive lore that "AI" devices are doing anything but the merest resemblance of thinking.

It's rare to read an author who can directly face Brandolini's Law of misinformation asymmetry and not only hold her own against the bullshit but overcome it.

CamperBob2 8d ago

TIL that the "merest resemblance of thinking" is enough to take gold at IMO.

radkZ 8d ago

Automated theorem provers are not new, in fact they are very old. One of the most automated is ACL2, which uses the well studied waterfall method (unrelated to waterfall development).

LLMs certainly use something similar, except they understand text as input. LLMs, especially used for marketing stunts, have way more computing power available than any theorem prover ever had. They probably do random restarts if a proof fails which amounts to partially brute forcing.

Lawrence Paulson correctly complained about some of the hype that Lean/LLMs are getting.

ACL2 even uses formulaic text output that describes the proof in human language, despite being all in Common Lisp and not a mythical clanker.

They do not think and use old and well established algorithms or perhaps novel ones that were added.

scotty79 8d ago

And also create novel math proofs.

radkZ 8d ago

This is the first submission in a year that gives me some hope for humanity. It shows that linguistics is not obsolete. Maybe the last people capable of thinking will be linguists.
