chaxor 3y ago

What do you think about the papers showing mathematical proofs that GNNs (i.e. GATs/transformers) are dynamic programmers and therefore perform algorithmic reasoning?

The fact that these systems can extrapolate well beyond their training data by learning algorithms is quite different than what has come before, and anyone stating that they "simply" predict next token is severely shortsighted. Things don't have to be 'brain-like' to be useful, or to have capabilities of reasoning, but we have evidence that these systems have aligned well with reasoning tasks, perform well at causal reasoning, and we also have mathematical proofs that show how.

So I don't understand your sentiment.

rdedev 3y ago

To be fair, LLMs are predicting the next token. It's just that to get better and better predictions, they need to understand some level of reasoning and math. However, it feels to me that a lot of this reasoning is brute-forced from the training data. For example, ChatGPT gets some things wrong when adding two very large numbers. If it really knew the algorithm for adding two numbers, it shouldn't be making those mistakes in the first place. I guess the same goes for issues like hallucinations. We can keep pushing the envelope using this technique, but I'm sure we will hit a limit somewhere.

chaxor (OP) 3y ago

Of course it predicts the next token. Every single person on earth knows that, so it's not worth repeating at all.

As for the fact that it gets things wrong sometimes - sure, this doesn't say it actually learned every algorithm (in whichever model you may be thinking about). But the nice thing is that we now have this proof via category theory, and it allows us to both frame and understand what has occurred, and to consider how to align the systems to learn algorithms better.

rdedev 3y ago

The fact that it sometimes fails simple algorithms for large numbers but shows good performance on other, more complex algorithms with simple inputs suggests to me that something on a fundamental level is still insufficient.

starlust2 3y ago

You're focusing too much on what the LLM can handle internally. No, LLMs aren't good at math, but they understand mathematical concepts and can use a program or tool to perform calculations.

Your argument is the equivalent of saying humans can't do math because they rely on calculators.

In the end what matters is whether the problem is solved, not how it is solved.

(assuming that the how has reasonable costs)

zamnos 3y ago

Insufficient for what? Humans regularly fail simple algorithms for small numbers, never mind large numbers and complex algorithms.

glitcher 3y ago

> Of course it predicts the next token. Every single person on earth knows that, so it's not worth repeating at all.

What's a token?

visarga 3y ago

A token is either a common word or a common enough word fragment. Rare words are expressed as multiple tokens, while frequent words are a single token. Together they form a vocabulary of 50k up to 250k entries. It is possible to write any word or text as a combination of tokens. In the worst case 1 token can be 1 char, say, when encoding a random sequence.

Tokens exist because transformers don't work on bytes or words. This is because it would be too slow (bytes), the vocabulary too large (words), and some words would appear too rarely or never. The token system allows a small set of symbols to encode any input. On average you can approximate 1 token = 1 word, or 1 token = 4 chars.

So tokens are the data type of input and output, and the unit of measure for billing and context size for LLMs.
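
To make the description above concrete, here is a minimal sketch of a tokenizer. It is an illustration only: the vocabulary `TOY_VOCAB` is made up, and the greedy longest-match rule is an assumption chosen for brevity (production tokenizers typically learn merge rules such as byte-pair encoding instead). It shows a frequent word mapping to one token, a rarer word splitting into several fragments, and an unfamiliar string falling back to one token per character.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenizer with a single-character fallback."""
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # Nothing in the vocabulary matched: emit one character as a token.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy vocabulary; real ones hold tens of thousands of entries.
TOY_VOCAB = {"the", " the", " token", "token", "iz", "ation", " cat", "s"}

print(tokenize("the tokenization", TOY_VOCAB))  # common word vs. word fragments
print(tokenize("xqzv", TOY_VOCAB))              # worst case: 1 token per char
```

Joining the returned tokens always reproduces the input, which is the property that lets a small symbol set encode any text.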

uh_uh 3y ago

Both of these statements can be true:

1. ChatGPT knows the algorithm for adding two numbers of arbitrary magnitude.

2. It often fails to use the algorithm in point 1 and hallucinates the result.

Knowing something doesn't mean it will get it right all the time. Rather, an LLM is almost guaranteed to mess up some of the time due to the probabilistic nature of its sampling. But this alone doesn't prove that it only brute-forced task X.
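
A rough back-of-the-envelope sketch of the sampling point, under a strong simplifying assumption that is not from the thread: each output token is sampled correctly with the same probability, independently of the others. The function name and the 0.99 figure are illustrative only.

```python
def exact_answer_probability(per_token_accuracy, num_tokens):
    """Chance that every token of an answer is sampled correctly.

    Assumes (unrealistically) independent, equally reliable tokens.
    """
    return per_token_accuracy ** num_tokens


# Even a model that is right 99% of the time per token slips on long answers.
for n in (1, 10, 50):
    print(n, round(exact_answer_probability(0.99, n), 3))
```

Under these assumptions, occasional wrong sums are expected even from a model that has the right procedure, so failures alone do not distinguish "knows the algorithm" from "memorized examples".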

visarga 3y ago

> If it really knew the algorithm for adding two numbers, it shouldn't be making those mistakes in the first place.

You're using it wrong. If you asked a human to do the same operation in under 2 seconds without paper, would the human be more accurate?

On the other hand if you ask for a step by step execution, the LLM can solve it.

tedunangst 3y ago

I never told the LLM it needed to answer immediately. It can take its time and give the correct answer. I'd prefer that, even.

ipaddr 3y ago

2 seconds? What model are you using?

flangola7 3y ago

GPT-3.5 is that fast.

catchnear4321 3y ago

am i bad at authoring inputs?

no, it’s the LLMs that are wrong.

throwuwu 3y ago

Create two random 10-digit numbers and sit down and add them up on paper. Write down every bit of inner monologue that you have while doing this, or just speak it out loud and record it.

ChatGPT needs to do the same process to solve the same problem. It hasn’t memorized the addition table up to 10 digits and neither have you.
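
The "inner monologue" for long addition can be sketched as follows. This is only an illustration of the schoolbook carry procedure with each step written out; the function name `add_with_steps` and the step wording are made up, and nothing here claims to describe what an LLM does internally.

```python
def add_with_steps(a, b):
    """Add two non-negative integers digit by digit, recording each step."""
    xs, ys = str(a)[::-1], str(b)[::-1]  # least significant digit first
    digits, steps, carry = [], [], 0
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        steps.append(
            f"{x} + {y} + carry {carry} = {total}: "
            f"write {total % 10}, carry {total // 10}"
        )
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
        steps.append(f"final carry: write {carry}")
    return int("".join(reversed(digits))), steps


result, monologue = add_with_steps(4821937460, 9175048233)
for line in monologue:
    print(line)
print(result)
```

Each step only needs the single-digit addition table plus one carry, which is why writing the steps out makes 10-digit sums tractable without memorizing them.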

agentultra 3y ago

And LLMs will never be able to reason about mathematical objects and proofs. You cannot learn the truth of a statement by reading more tokens.

A system that can will probably adopt a different acronym (and gosh that will be an exciting development... I look forward to the day when we can dispatch trivial proofs to be formalized by a machine learning algorithm so that we can focus on the interesting parts while still having the entire proof formalized).

chaxor (OP) 3y ago

You should read some of the papers referred to in the above comments before making that assertion. It may take a while to realize the overall structure of the argument, how the category theory is used, and how this is directly applicable to LLMs, but if you are in ML it should be obvious. https://arxiv.org/abs/2203.15544

agentultra 3y ago

There are methods of proof that I'm not sure dynamic programming is fit to handle, but this is an interesting paper. However, even if it can only solve particular induction proofs, that would be a big help. Thanks for sharing.

zootreeves 3y ago

You know the algorithm for arithmetic. Are you telling me you could sum any large numbers on the first attempt, without any working, in less than a second, 100% of the time?

joaogui1 3y ago

I don't get the sudden fixation on time; the model is also spending a ton of compute and energy to do it.

jmcgeeney 3y ago

I could with access to a computer.

starlust2 3y ago

If you get to use a tool, then so does the LLM.

agentofoblivion 3y ago

Give me a break. Very interesting theoretical work and all, but show me where it's actually being used to do anything of value, beyond publication fodder. You could also say MLPs are proved to be universal approximators, and can therefore model any function, including the one that maps sensory inputs to cognition. But the disconnect between this theory and reality is so great that it's a moot point. No one uses MLPs this way for a reason. No one uses GATs in systems that people are discussing right now either. GATs rarely even beat GCNs by any significant margin in graph benchmarks.

chaxor (OP) 3y ago

Are you saying that the new mathematical theorems that were proven using GNNs from DeepMind were not useful?

There were two very noteworthy (Perhaps Nobel prize level?) breakthroughs in two completely different fields of mathematics (knot theory and representation theory) by using these systems.

I would certainly not call that "useless", even if they're not quite Nobel-prize-worthy.

Also, "No one uses GATs in systems people discuss right now" ... Transformers are GATs (with PE) ... So, you're incredibly wrong.

agentofoblivion 3y ago

You’re drinking from the academic marketing Kool-Aid. Please tell me: where are these methods being applied in AI systems today?

And I’m so tired of this “transformers are just GNNs” nonsense that Petar (who happens to have invented GATs and has a vested interest in overstating their importance) has been pushing. Transformers are GNNs in only the most trivial way: if you make the graph fully connected and allow everything to interact with everything else. I.e., not really a graph problem. Not to mention that the use of positional encodings breaks the very symmetry that GNNs were designed to preserve. In practice, no one is using GNN tooling to build transformers. You don’t see PyTorch Geometric or DGL in any of the code bases. In fact, you see the opposite: people exploring transformers to replace GNNs in graph problems and getting SOTA results.

It reminds me of people who are into Bayesian methods always swooping in after some method has success and saying, “yes, but this is just a special case of a Bayesian method we’ve been talking about all along!” Yes, sure, but GATs have had 6 years to move the needle, and they’re nowhere to be found within the modern AI systems that this thread is about.
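
The two technical claims in the comment above (attention as message passing on a fully connected graph, and positional encodings breaking permutation symmetry) can be checked on a toy example. This is a minimal sketch under loud assumptions: there are no learned query/key/value projections, the node vectors serve as queries, keys, and values at once, and `add_positions` is a made-up positional encoding. It is not how any real transformer or GNN library is implemented.

```python
import math


def attention(nodes, adjacency):
    """One unprojected attention step: node i averages its neighbours' vectors
    with softmax weights from scaled dot products."""
    scale = math.sqrt(len(nodes[0]))
    out = []
    for i, query in enumerate(nodes):
        neighbours = [j for j in range(len(nodes)) if adjacency[i][j]]
        scores = [
            sum(a * b for a, b in zip(query, nodes[j])) / scale
            for j in neighbours
        ]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * nodes[j][d] for w, j in zip(weights, neighbours)) / total
            for d in range(len(query))
        ])
    return out


def add_positions(nodes):
    """Hypothetical positional encoding: shift the first coordinate by index / 2."""
    return [[v[0] + i / 2] + v[1:] for i, v in enumerate(nodes)]


def is_permutation_equivariant(nodes, perm, encode=lambda xs: xs):
    """Does shuffling the inputs merely shuffle the outputs the same way?"""
    full = [[1] * len(nodes) for _ in nodes]  # complete graph: all pairs connected
    base = attention(encode(nodes), full)
    shuffled = attention(encode([nodes[p] for p in perm]), full)
    return all(
        math.isclose(a, b, abs_tol=1e-9)
        for i, p in enumerate(perm)
        for a, b in zip(shuffled[i], base[p])
    )


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(is_permutation_equivariant(x, [2, 0, 1]))                 # no positions
print(is_permutation_equivariant(x, [2, 0, 1], add_positions))  # with positions
```

Passing a sparser `adjacency` to `attention` restricts each node to its graph neighbours, which is the sense in which the fully connected case is the degenerate one.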

joaogui1 3y ago

The paper shows the equivalence for specific networks; it doesn't say that every GNN (and, by extension, every transformer) is a dynamic programmer. Also, the models are explicitly trained on that task, in a regime quite different from ChatGPT's. What the paper shows and the possibility of LLMs being able to reason are pretty much completely independent of each other.

pdonis 3y ago

> What do you think about the papers showing mathematical proofs that GNNs (i.e. GATs/transformers) are dynamic programmers and therefore perform algorithmic reasoning?

Do you have a reference?

felipemnoa 3y ago

> What do you think about the papers showing mathematical proofs that GNNs (i.e. GATs/transformers) are dynamic programmers and therefore perform algorithmic reasoning?

Do you mind linking to one of those papers?

uh_uh 3y ago

I just don't get how the average HN commenter thinks (and gets upvoted) that they know better than e.g. Ilya Sutskever who actually, you know, built the system. I keep reading this "it just predicts words, duh" rhetoric on HN which is not at all believed by people like Ilya or Hinton. Could it be that HN commenters know better than these people?

RandomLensman 3y ago

That is the wrong discussion. What are their regulatory, social, or economic policy credentials?

uh_uh 3y ago

I'm not suggesting that they have any. I was reacting to srslack above making _technical_ claims about why LLMs can't be "generalized and adaptable intelligence", a view that is not shared by said technical experts.

hervature 3y ago

No one is claiming to know better than Ilya. Just recognition of the fact that such a license would benefit these same individuals (or their employers) the most. I don't understand how HN can be so angry about a company that benefits from tax law (Intuit) advocating for regulation while also supporting a company that would benefit from an AI license (OpenAI) advocating for such regulation. The conflict of interest isn't even subtle. To your point, why isn't Ilya addressing the committee?

uh_uh 3y ago

2 reasons:

1. He's too busy building the next generation of tech that HN commenters will be arguing about in a couple months' time.

2. I think Sam Altman (who is addressing the committee) and Ilya are pretty much on the same page on what LLMs do.

dmreedy 3y ago

I am reminded of the Mitchell and Webb "Evil Vicars" sketch.

"So, you've thought about eternity for an afternoon, and think you've come to some interesting conclusions?"

shafyy 3y ago

The thing is, experts like Ilya Sutskever are so deep in that shit that they are heavily biased (from a tech and social/economic perspective). Furthermore, many experts are wrong all the time.

I don't think the average HN commenter claims to be better at building these systems than an expert. But to criticize, especially on economic, social, and political levels, one doesn't need to be an expert on LLMs.

And finally, what the motivation of people like Sam Altman and Elon Musk is should be clear to everybody with half a brain by now.

NumberWangMan 3y ago

I honestly don't question Altman's motivations that much. I think he's blinded a bit by optimism. I also think he's very worried about existential risks, which is a big reason why he's asking for regulation. He's specifically come out and said in his podcast with Lex Fridman that he thinks it's safer to invent AGI now, when we have less computing power, than to wait until we have more computing power and the risk of a fast takeoff is greater, and that's why he's working so hard on AI.

collaborative 3y ago

He's just cynical and greedy. Guy has a bunker with an airstrip and is eagerly waiting for the collapse he knows will come if the likes of him get their way.

They claim to serve the world, but secretly want the world to serve them. Scummy 101

uh_uh 3y ago

srslack above was making technical claims about why LLMs can't be "generalized and adaptable intelligence". To make such statements, it surely helps if you are a technical expert at building LLMs.

agentofoblivion 3y ago

Maybe I'm not "the average HN commenter" because I am deep in this field, but I think the overlap of what these famous experts know, and what you need to know to make the doomer claims is basically null. And in fact, for most of the technical questions, no one knows.

For example, we don't understand fundamentals like these:

- "intelligence": how it relates to computing, what its connections/dependencies to interacting with the physical world are, its limits, etc.
- emergence, and in particular an understanding of how optimizing one task can lead to emergent ability on other tasks
- deep learning: what the limits and capabilities are. It's not at all clear that "general intelligence" even exists in the optimization space the parameters operate in.

It's pure speculation on behalf of those like Hinton and Ilya. The only thing we really know is that LLMs have had surprising ability to perform on tasks they weren't explicitly trained for, and even this amount of "emergent ability" is under debate. Like much of deep learning, that's an empirical result, but we have no framework for really understanding it. Extrapolating to doom and gloom scenarios is outrageous.

NumberWangMan 3y ago

I'm what you'd call a doomer. Ok, so if it is possible for machines to host general intelligence, my question is, what scenario are you imagining where that ends well for people?

Or are you predicting that machines will just never be able to think, or that it'll happen so far off that we'll all be dead anyway?

agentofoblivion 3y ago

My primary argument is that we not only don't have the answers, but don't even really have well-posed questions. We're talking about "General Intelligence" as if we even know what that is. Some people, like Yann LeCun, don't think it's even a meaningful concept. We can't even agree which animals are conscious, whatever that means. Because we have so little understanding of the most basic of questions, I think we should really calm down, and not get swept away by totally ridiculous scenarios, like viruses that spread all over the world and kill us all when a certain tone is rung, or a self-fabricating organism with crystal blood cells that blots out the sun, as were recently proposed by Yudkowsky as possible scenarios on EconTalk.

A much more credible threat is humans who get other humans excited and take damaging action. Yudkowsky said that an international coalition banning AI development, and enforcing it on countries that do not comply (regardless of whether they were part of the agreement), was among the only options left for humanity to save itself. He clarified this meant a willingness to engage in a hot war with a nuclear power to ensure enforcement. I find this sort of thinking a far bigger threat than continuing development on large language models.

To more directly answer your question, I find the following scenarios as plausible as, or more plausible than, Yudkowsky's sound viruses or whatever:

1. We are no closer to understanding real intelligence than we were 50 years ago, and we won't create an AGI without fundamental breakthroughs; therefore any action taken now on current technology is a waste of time and potential economic value.

2. We can build something with human-like intelligence, but additional intelligence gains are constrained by the physical world (e.g., by needing to run physical experiments), and therefore the rapid gain of something like "super-intelligence" is not possible, even if human-level intelligence is.

3. We jointly develop tech to augment our own intelligence with AI systems, so we'll have the same super-human intelligence as autonomous AI systems.

4. If there are advanced AGIs, there will be a large diversity of them, and they will at the least compete with and constrain one another.

But, again, these are wild speculations just like the others, and I think the real message is: no one knows anything, and we shouldn't be taking all these voices seriously just because they have some clout in some AI-relevant field, because what's being discussed is far outside the realm of real-life AI systems.

henryfjordan 3y ago

So what if they kill us? That's nature, we killed the wooly mammoth.
