I still hold the opinion that we’re going to need to move to spiking neural network (SNN) models in the future to keep growing the networks. Spiking networks require lots of storage, but far less compute. They also propagate additional information in the _timing_ of the spikes, not just the values. There is a lot of low-hanging fruit in SNNs, and I think people are still trying to copy biological systems too closely.
Unfortunately, the main issue with SNNs is that no one has figured out a way to train them as effectively as ANNs.
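As a minimal sketch of the idea that spike timing carries information, here is a toy leaky integrate-and-fire (LIF) neuron. The constants and the function itself are illustrative assumptions, not taken from any particular SNN framework or paper:

```python
# Toy leaky integrate-and-fire (LIF) neuron: a common SNN building block.
# All constants here are illustrative, chosen only to make the demo readable.

def lif_spike_times(input_current, threshold=1.0, leak=0.9, dt=1.0):
    """Simulate one LIF neuron; return the time steps at which it spikes."""
    v = 0.0          # membrane potential
    spikes = []
    for t, i in enumerate(input_current):
        v = leak * v + i * dt   # leaky integration of the input
        if v >= threshold:      # fire and reset
            spikes.append(t)
            v = 0.0
    return spikes

# A stronger input drives earlier and more frequent spikes: information
# lives in *when* the neuron fires, not just in an activation value.
weak = lif_spike_times([0.2] * 20)    # fires late and rarely
strong = lif_spike_times([0.6] * 20)  # fires early and often
```

Note that the neuron is event-driven: between spikes it only leaks and accumulates, which is where the "lots of storage, far less compute" trade-off comes from.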
As someone just trying to learn more about the implications of new research, I find myself resorting to /r/machinelearning, or even twitter threads, to get timely and informed discussions. That's a shame, given what HN sets out to be.
One way or another, we need a 1000x increase in efficiency to be able to run these models on edge hardware with full privacy, outside the control of the big corporations.
Funny that Gary Marcus is pleading on Twitter to get Dall-E 2 access in order to formulate his response. He isn't getting access yet. https://twitter.com/GaryMarcus/status/1513215530366234625
That kind of gate-keeping is possible because the costs of training and running inference on these models are too high today.
Is this fundamental, or just a problem with mapping these models to our current serially-bottlenecked compute architectures? Could a move to “hyperconverged infrastructure in-the-small” — striping DRAM or NVMe and tiny RISC cores together on a die, where each CPU gets its own storage (or, you might say, where each small cluster of storage cells has its own tiny CPU attached), such that one stick has millions of independent+concurrent [+slow+memory-constrained] processors — resolve these difficulties?
I'm extremely optimistic about how transformers can recursively speed up progress in multiple areas of science. Transformers are reaching a point where they can demonstrate reasoning abilities within the ballpark of what you might expect from a human, and for certain qualities they far exceed what any human is capable of. One of those is depth of knowledge: transformers (e.g. RETRO) can incorporate a library of knowledge far larger than any human can. Soon we will improve and harness this ability to the point where it may be pointless to formulate a scientific hypothesis without first "consulting" a large language model that has processed the entire library of scientific publications.
GPT-3-type models are very good at selecting for arbitrary qualities from among a list of options. Generating a list of 10 potential answers, then running prompts on the candidates to select for quality, accuracy, style, and so forth resembles the cyclic formulation of ideas in humans. The process used to generate essays and articles (draft, edit, revise, simplify, repeat until satisfied) can be implemented trivially. Those processes will transfer to larger models, and approaches like RETRO reduce resource requirements by orders of magnitude.
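The generate-then-select and draft-edit-revise loops described above can be sketched in a few lines. Here `generate`, `score`, and `revise` are hypothetical stand-ins for model calls, not any real API; the toy stubs at the bottom just make the loop runnable:

```python
import random

def generate_and_select(task, generate, score, n=10):
    """Generate n candidate answers, then keep the one the scorer ranks highest."""
    candidates = [generate(task) for _ in range(n)]
    return max(candidates, key=score)

def draft_edit_revise(task, generate, revise, passes=3):
    """Draft once, then run a fixed number of revision passes over the text."""
    text = generate(task)
    for _ in range(passes):
        text = revise(text)
    return text

# Toy stand-ins so the loops are runnable; a real system would call a model
# for each of these (one prompt to draft, another to score or critique).
rng = random.Random(0)
toy_generate = lambda task: f"{task}: draft {rng.randint(0, 99)}"
toy_score = lambda text: int(text.rsplit(" ", 1)[-1])  # "quality" = the number
best = generate_and_select("summarize X", toy_generate, toy_score, n=10)
```

Passing the scorer and reviser as plain callables is the point: the same loop works whether the critic is a second prompt to the same model, a different model, or a hand-written rule.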
"Cognitive architecture" seems to be an accurate descriptor for the use of multiple models and logic layers in many-shot, many-model development.
It may not be human-level with zero-shot output, but how many humans produce human-level output in their stream of consciousness? The act of consideration, recursing over an idea and refining it, is achievable with these models in a way that humans can debug and tweak from cycle to cycle.
Multipass "consideration" and revision methodologies can capture almost any meta-cognitive process used by humans, whether it's the Socratic method, the AP style guide, or an arbitrary jumble of rules derived from 4chan posters.
This type of methodology, doing meta-cognitive programming by linking together different models, is awesome. It constructs low-resolution imitations of brains: GPT-3 and BERT and the like, chained together, can do things that no individual model can achieve. A predicate logic layer can document and explain decision history, and the other modules start to resemble something like the subconscious mind.
I think the next step in NLP will be a drastic innovation on today's learning model.
The Socratic paper is not about “higher intelligence”, it’s about demonstrating useful behaviour purely by connecting several large models via language.
"Stochastic parrot" is a derogatory term and I've never seen anyone who actually understands the technology use that phrase unironically. If anything, it's a shibboleth for bias or ignorance.
Anyone who thinks this REALLY doesn't know how language models work. A properly trained LM will only parrot something back because of a lack of diversity in the training data. This does happen in some cases (e.g., the GPL license or something similar), but those are pretty rare cases.
People on HN seem to think this a lot, but they are just wrong.
It's the first thing anyone learns, and it's easy to do.
It's really unfortunate, but that's why you see so many people on HN who dismiss new technologies in ML (especially in NLP, since everyone can understand the output; that's less true in e.g. protein folding).
Overall, this term says "limited to the intelligence of a parrot," which is false: models can solve math and coding problems, generate passable art, translate and converse in hundreds of languages, and beat us at all board and card games. When was a parrot able to do that?
To me, it is more proof of "stochastic parrot" behavior: the model has seen most of the math information available on the internet, and even with significant computational power it can solve only 58% of elementary-school-level questions, probably those with clear examples in the training data, and can't generalize beyond them.