circuit10 3y ago

Well, hardware and parameter counts are scaling exponentially, so it seems very feasible that it could happen very soon. Of course it's possible that we'll hit a wall somewhere, but it seems that just scaling current models up could be enough to get to the point where they can self-improve or gain more compute for themselves.

matthewfcarlson 3y ago

We've been out of exponential territory for a few years now (https://en.wikipedia.org/wiki/Moore%27s_law). Yes, we are still bounding forward at a crazy pace, but I think the pace is slowing down somewhat

rolisz 3y ago

Hardware isn't scaling exponentially anymore (Moore's law is dead). Parameter count isn't really scaling exponentially anymore either. GPT3 had 175B parameters 3 years ago. There are some attempts at training 1-trillion-parameter models, but they are not better than GPT3.

lhl 3y ago

While I agree we probably aren't getting exponentially increasing parameter counts (GPT4 is by all accounts 1T parameters and, of course, it is significantly better than GPT3), we are still seeing lots of improvements - 3.5 is much better than 3, based "just" on InstructGPT/RLHF training. Models are getting better as well - LLaMA 30B beats/matches GPT-3 on raw eval benchmarks at 1/6 the parameter count.

We're also seeing lots of optimizations with new models (RoPE/RoPER embedding, Swish/GeLU activation, Flash Attention, etc.), but I think some of the most interesting gains we'll be seeing soon are from inference-optimized training (-70% parameters for +100% compute) [1] combined with sparsity pruning (-50% size w/ almost no loss in accuracy) [2] and quantization [3], which will lead to significantly smaller models performing well.

[1] https://www.harmdevries.com/post/model-size-vs-compute-overh...

[2] https://arxiv.org/abs/2301.00774

[3] https://openreview.net/forum?id=tcbBPnfwxS
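
As a rough back-of-the-envelope sketch of how the reductions mentioned above could stack, the snippet below simply multiplies them together. The 30B-parameter fp16 baseline, the 4-bit quantization target, and the idea that the three techniques compose independently (with no storage overhead for sparse indices) are illustrative assumptions, not figures taken from the linked sources.

```python
def stacked_footprint_gb(
    base_params: float = 30e9,  # assumed baseline: 30B parameters
    base_bits: int = 16,        # assumed baseline precision: fp16
    param_cut: float = 0.70,    # "-70% parameters" from the comment
    sparsity: float = 0.50,     # "-50% size" from the comment
    quant_bits: int = 4,        # assumed quantization target
) -> tuple[float, float]:
    """Return (baseline_gb, reduced_gb), using 1 GB = 1e9 bytes."""
    baseline_gb = base_params * base_bits / 8 / 1e9
    # Assumption: the reductions are independent and simply multiply.
    remaining_params = base_params * (1 - param_cut) * (1 - sparsity)
    reduced_gb = remaining_params * quant_bits / 8 / 1e9
    return baseline_gb, reduced_gb


baseline_gb, reduced_gb = stacked_footprint_gb()
print(f"baseline: {baseline_gb:.2f} GB, reduced: {reduced_gb:.2f} GB")
```

Under these assumptions the weights shrink from about 60 GB to about 2.25 GB; whether accuracy survives all three steps applied together is a separate question the arithmetic cannot answer.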

JohnFen 3y ago

What I doubt is that the current approach can lead to AGI at all, regardless of scale. But I'm just speculating along with everyone else. We will see.

blibble 3y ago

as Moore's law is dead it's hard to see more exponential scaling

they're also not going to find another 2, 4, 8, 16 ... internets' worth of content to parasitise

circuit10 (OP) 3y ago

It’s still exponential, but a little slower. (edit: wait, is that still exponential if it slows down?) Anyway we only need to get to human level (or maybe a bit less) and we’re not that far off (maybe 10 or 20 years at current rates of progress?)
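
On the question in the edit: growth stays exponential as long as the ratio between successive periods is constant, however small that ratio is. If the ratio itself keeps shrinking, the curve is no longer exponential. A minimal sketch with made-up series (none of these numbers are measurements of hardware or model scaling):

```python
import math


def growth_ratios(series):
    """Ratio of each value to the one before it."""
    return [b / a for a, b in zip(series, series[1:])]


def is_exponential(series, rel_tol=1e-9):
    """True if every period-over-period ratio is (nearly) the same."""
    ratios = growth_ratios(series)
    return all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios)


def decelerating_series(n, first_ratio=2.0, damping=0.5):
    """Hypothetical series whose growth ratio decays toward 1 each step."""
    values = [1.0]
    ratio = first_ratio
    for _ in range(n - 1):
        values.append(values[-1] * ratio)
        ratio = 1.0 + (ratio - 1.0) * damping
    return values


fast = [2.0 ** t for t in range(6)]    # doubles every period
slower = [1.5 ** t for t in range(6)]  # slower, but constant ratio
decelerating = decelerating_series(6)  # ratio shrinks every period

print(is_exponential(fast), is_exponential(slower), is_exponential(decelerating))
```

So "exponential, but a little slower" is coherent if the doubling time got longer once and then held steady; it stops being exponential if the doubling time keeps stretching.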

Not all types of AI need external training data; you can train on how effectively a goal is achieved
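
A toy illustration of learning from a goal signal rather than from a dataset: the loop below keeps a random tweak only when it scores better on a reward function. The reward, the target value 3.0, and the hill-climbing method are all hypothetical stand-ins chosen for brevity; real reinforcement learning systems are far more involved.

```python
import random


def reward(x: float) -> float:
    """Hypothetical goal: get x as close to 3.0 as possible."""
    return -(x - 3.0) ** 2


def hill_climb(steps: int = 2000, step_size: float = 0.1, seed: int = 0) -> float:
    """Improve x using only the reward signal, with no training data."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    x = 0.0
    best = reward(x)
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, step_size)
        score = reward(candidate)
        if score > best:  # keep the tweak only if it does better
            x, best = candidate, score
    return x


learned = hill_climb()
print(f"learned x = {learned:.3f}")
```

The only feedback the loop ever sees is the score for each attempt, which is the sense in which no external corpus is needed.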

blibble 3y ago

> maybe 10 or 20 years at current rates of progress?

how can the rate be maintained?

exponential chip scaling is over, and they've parasited, sorry, trained on the entirety of accessible human knowledge

the rate may drop to zero

the exponent may even go negative once LLMs start ingesting their own hallucinations

visarga 3y ago

> they've parasited, sorry, trained on the entirety of accessible human knowledge

I see this as a new development in language: it used to be restricted to meat neural nets and books, and now it can also be consumed and created by LLMs. A new self-replication path was opened for language. Language is an evolutionary system; it's alive. Without language, humans are mere shadows of what they can be. Language turns a baby into a modern adult, and a randomly initialised neural net into ChatGPT.

The magic was always in the language, not in the neural network. We should care more about the size and quality of the training dataset than the model. Any model would do; all model tweaks are more or less the same. The data is the origin of all the abilities. But we cannot own abilities, so it should be fair game to learn abilities and facts even from copyrighted data. Novel and creative training examples should not be reproduced by LLMs, but mere facts and skills should be general enough not to be owned by anyone.

circuit10 (OP) 3y ago

The training data thing is a problem mainly for LLMs, so it might be a limitation if we purely scale up LLMs, but there are other types of AI around too

Chip scaling still seems to be going pretty fast, and we may discover new ways to make better use of the chips we currently have, like better methods of quantisation, or just using more of them, which could get us just far enough to reach the self improvement threshold

So we could end up hitting a wall with chip scaling or something but I don’t think it’s that likely
