undefined | Better HN

0 pointskromem2y ago0 comments

That's not the case. It's very much in the realm of "we don't know what's going on in the network."

Rather than a binary it's much more likely that it's a mix of factors going into results that includes basic reasoning capabilities developed from the training data (much like board representations and state tracking abilities developed feeding board game moves into a toy model in Othello-GPT) as well as statistic driven autocomplete.

In fact often when I've seen GPT-4 get hung up with logic puzzle variations such as transparency, it tends to seem more like the latter is overriding the former, and changing up tokens to emoji representations or having it always repeat adjectives attached to nouns so it preserves variation context gets it over the hump to reproducible solutions (as would be expected from a network capable of reasoning) but by default it falls into the pattern of the normative cases.

For something as complex as SotA neural networks, binary sweeping statements seem rather unlikely to actually be representative...

0 comments

nsagent2y ago

As an PhD student in NLP who's graduating soon, my perspective is that language models do not demonstrate "reasoning" in the way most people colloquially use the term.

These models have no capacity to plan ahead, which is a requirement for many "reasoning" problems. If it's not in the context, the model is unlikely to use it for predicting the next token. That's why techniques like chain-of-thought are popular; they cause the model to parrot a list of facts before making a decision. This increases the likelihood that the context might contain parts of the answer.

Unfortunately, this means the "reasoning" exhibited by language models is limited: if the training data does not contain a set of generalizable text applicable to a particular domain, a language model is unlikely to make a correct inference when confronted with a novel version of a similar situation.

That said, I do think adding reasoning capabilities is an active area of research, but we don't have a clear time horizon on when that might happen. Current prompting approaches are stopgaps until research identifies a promising approach for developing reasoning, e.g. combining latent space representations with planning algorithms over knowledge bases, constraining the logits based on an external knowledge verifier, etc (these are just random ideas, not saying they are what people are working on, rather are examples of possible approaches to the problem).

In my opinion, language models have been good enough since the GPT-2 era, but have been held back by a lack of reasoning and efficient memory. Making the language models larger and trained on more data helps make them more useful by incorporating more facts with increased computational capacity, but the approach is fundamentally a dead end for higher level reasoning capability.

kromemOP2y ago

Congrats on the upcoming PhD!

I'm curious where you are drawing your definition or scope for 'reasoning' from?

For example, in Shuren The Neurology of Reasoning (2002) the definition selected was "the ability to draw conclusions from given information."

While I agree that LLMs can only process token to token and that juggling context is critical to effective operation such that CoT or ToT approaches are necessary to maximize the ability to synthesize conclusions, I'm not quite sure what the definition of reasoning you have in mind is such that these capabilities fall outside of it.

The typical lay audience suggestion that LLMs cannot generate new information or perspectives outside of the training data isn't the case, as I'm sure you're aware, and synthesizing new or original conclusions from input is very much within their capabilities.

Yes, this has to happen within a context window and occurs on a token by token basis, but that seems like a somewhat arbitrary distinction. Humans are unquestionably better at memory access and running multiple subprocesses on information than an LLM.

But if anything, this simply suggests that continuing to move in the direction of multiple pass processing of NLP tasks with selective contexts and a variety of fine tuned specializations of intermediate processing is where practical short term gains might lie.

As for the issue of new domains outside of training data, I'm somewhat surprised by your perspective. Hasn't one of the big research trends over the past twelve months been that in context learning has proven more capable than was previously expected? I'd agree that a zero shot evaluation of a problem type that isn't represented in a LLMs training data is setting it up for failure, but the capacity to extend in context examples outside of training data has proven relatively more successful, no?

jimmaswell2y ago

> These models have no capacity to plan ahead, which is a requirement for many "reasoning" problems. If it's not in the context, the model is unlikely to use it for predicting the next token. That's why techniques like chain-of-thought are popular; they cause the model to parrot a list of facts before making a decision. This increases the likelihood that the context might contain parts of the answer.

Is it not possible that this is essentially how our brains do it too? Attempt to plan by branching out to related ideas until they contain an answer. Any of these statements that AI can't be on track to reason like a human because of X seem to come with an implication that we have such a good model of the human brain that we know it doesn't X. But I'm not an expert on neuroscience so in many of these cases maybe that implication is true.

freejazz2y ago

>Is it not possible that this is essentially how our brains do it too?

Is that how you think? Just curious

2 more replies

visarga2y ago

> if the training data does not contain a set of generalizable text applicable to a particular domain, a language model is unlikely to make a correct inference when confronted with a novel version of a similar situation.

True. But look at the Phi-1.5 model - it punches 5x above its weight limit. The trick is in the dataset:

> Our training data for phi-1.5 is a combination of phi-1’s training data (7B tokens) and newly created synthetic, “textbook-like” data (roughly 20B tokens) for the purpose of teaching common sense reasoning and general knowledge of the world (science, daily activities, theory of mind, etc.). We carefully selected 20K topics to seed the generation of this new synthetic data. In our generation prompts, we use samples from web datasets for diversity. We point out that the only non-synthetic part in our training data for phi-1.5 consists of the 6B tokens of filtered code dataset used in phi-1’s training (see [GZA+ 23]).

> We remark that the experience gained in the process of creating the training data for both phi-1 and phi-1.5 leads us to the conclusion that the creation of a robust and comprehensive dataset demands more than raw computational power: It requires intricate iterations, strategic topic selection, and a deep understanding of knowledge gaps to ensure quality and diversity of the data. We speculate that the creation of synthetic datasets will become, in the near future, an important technical skill and a central topic of research in AI.

https://arxiv.org/pdf/2309.05463.pdf

Synthetic data has its advantages - less bias, more diverse, scalable, higher average quality. But more importantly, it can cover all the permutations and combinations of skills, concepts, situations. That's why a small model just 1.5B like Phi was able to work like a 7B model. Usually at that scale they are not coherent.

Der_Einzige2y ago

Are you going to school in Langley, Virginia?

NovemberWhiskey2y ago

NSA is more commonly associated with Fort Meade, MD, for what that's worth.

Kim_Bruning2y ago

> These models have no capacity to plan ahead

How would you describe the behavior of "GPT Advanced Data Analysis"?

j / k navigate · click thread line to collapse