Not just small datasets and limited computing power, but also very few libraries to help you out. Although you could download something like xerion from ftp.cs.toronto.edu and join their mailing list, it was generally a case of retyping examples or implementing algorithms from printed textbooks. And it was all in C, presumably for performance reasons, while most of the symbolic AI folks came from Lisp or Prolog backgrounds.
* First off, there was no major issue with computation. Adding more units or more layers isn't that much more expensive. Vanishing gradients and poor regularization were the real challenges, and they meant that increasing network size rarely improved performance empirically. This was a well-known problem up until the mid-to-late 2000s.
* There was a major 'AI winter' going on in the 90s after neural networks failed to live up to their hype in the 80s. Computer vision and NLP researchers - the fields that have most famously benefited from huge neural networks recently - largely abandoned neural networks in the 90s. My undergrad PI at a computer vision lab told me in no uncertain terms he had no interest in neural networks, but was happy to support my interest in them. My grad school advisors had similar takes.
* A lot of the problems that did benefit from neural networks in the 90s/early 2000s just needed a non-linear model, but did not need huge neural networks to do well. You can very roughly consider the first layer of a 2-layer neural network to be a series of classifiers, each tackling a different aspect of the problem (e.g. the first neuron of a spam model may activate if you have never received an email from the sender, the second if the sender is tagged as spam a lot, etc). These kinds of problems didn't need deep, large networks, and 10-50 neuron 2-layer networks were often more than enough to fully capture the complexity of the problem. Nowadays many practitioners would throw a GBM at problems like that and can get away with O(100) shallow trees, which isn't very different from what the small neural networks were doing back then.
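That "first layer as a set of small classifiers" picture can be sketched in a few lines. This is a toy illustration with made-up spam features and hand-set weights (nothing here is a real spam model), just to show how each hidden unit acts like a little detector whose votes the output unit combines:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tiny_spam_net(features, W1, b1, w2, b2):
    """A 2-layer net: each hidden unit acts like a small classifier
    over the inputs; the output unit combines their votes."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)

# Hypothetical binary features: [never_seen_sender, sender_often_flagged]
W1 = [[4.0, 0.0],   # hidden unit 1 "fires" on unknown senders
      [0.0, 4.0]]   # hidden unit 2 "fires" on frequently flagged senders
b1 = [-2.0, -2.0]
w2 = [3.0, 3.0]     # output unit sums the two detectors' votes
b2 = -3.0

p_spam = tiny_spam_net([1.0, 1.0], W1, b1, w2, b2)  # both detectors active
p_ham  = tiny_spam_net([0.0, 0.0], W1, b1, w2, b2)  # neither active
```

With 10-50 hidden units you get 10-50 such detectors, which really was enough capacity for many of those problems.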
Combined, this means that the researchers who really could have used larger neural networks abandoned them, and almost everyone else was fine with the small networks that were readily available. The recent surge in AI is being fueled by smarter approaches and more computation, but arguably much more important is the ton of data the internet made available. That last point is the real story IMO.
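The vanishing-gradient point above has a back-of-the-envelope demonstration: the derivative of a sigmoid is at most 0.25, so with one such factor per layer the gradient signal shrinks exponentially with depth. This is an idealized sketch (unit weights, best-case activations), not a full backprop trace:

```python
def sigmoid_deriv_max():
    # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)), maximized at z = 0
    return 0.25

def gradient_scale(depth):
    """Upper bound on the gradient factor contributed by `depth`
    sigmoid layers, assuming best-case activations at every layer."""
    return sigmoid_deriv_max() ** depth

shallow = gradient_scale(2)   # 0.0625 -- still trainable
deep = gradient_scale(20)     # ~1e-12 -- effectively no learning signal
```

Which is roughly why a 2-layer network trained fine in 1995 and a 20-layer one didn't, until tricks like residual connections and batch norm came along.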
https://direct.mit.edu/books/book/4424/Parallel-Distributed-...
I found the two Rumelhart & McClelland books, just a single copy on the shelf at Cody's Books, soon after publication. I worked through the examples, and was immediately convinced that this low-level approach was a way forward.
For some reason, none of the stressed out Comp Sci professors wanted to listen to a weirdo undergraduate, a lousy student.
I'm glad I was there at a reboot of AI, but my timing was lousy.
[1] Residual nets (2015): https://arxiv.org/abs/1512.03385
[2] Batch normalization (2015): https://arxiv.org/abs/1502.03167
LeCun, Bottou, et al., in "Efficient BackProp" (1998), described techniques for improving backprop algorithms.
If anyone could have been at the forefront of this wave, it could've been him.
And now the landscape has utterly changed and no one is even convinced they need "AGI". Just a continually refined LLM hooked up to tools and other endpoints.
Why does DOOM and clever programming on a NeXT imply what you assert?
Now the last major innovation in the space came from Epic Games / Unreal Engine.
Once that little detail gets solved, who’s to say that “refined LLM hooked up to tools and other specialized LLMs” won’t be it? Sure could be.
But it also could not be! AGI has been right around the corner my whole life and even longer. 50 years at least. Every new AI discovery is on the verge of AGI until a few years later it hits a wall. Research is hard like that.
Talk about having "fuck you" money but just not willing to say "fuck you".
“Data” was so much smaller then. I had a minuscule hard drive if any, no internet, 8-bit graphics but nothing photo-realistic, glimpses of Windows and OS/2, and barely a mouse. In retrospect, it was like embedded programming.
System X, in 2004 was the 7th most powerful computer in the world. It was 1100 PowerPC 970 Macs with 2200 cores and claimed an Rmax of 12k GFlops. https://www.top500.org/system/173736/
An M1 MacBook Air hits 900 GFlops ( https://news.ycombinator.com/item?id=26333369 ). A dozen or so MacBook Airs - about what you'd expect in a grade school computer lab - matches the 7th most powerful computer system in the world from two decades ago.
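A quick sanity check on that arithmetic, using the Rmax figures quoted above:

```python
import math

system_x_gflops = 12_000   # System X Rmax, ~12k GFlops (2004)
m1_air_gflops = 900        # rough M1 MacBook Air figure from the linked thread

# How many Airs to match the 2004 machine's peak Linpack number?
airs_needed = math.ceil(system_x_gflops / m1_air_gflops)  # 14
```

So it's 13-14 Airs rather than exactly twelve, but the point stands: one small computer lab now outruns a 2004 top-10 supercomputer.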
I've also noticed this, and want to ask: who are these people? Do they not have (~80-billion-neuron) brains? (And that's neurons, with by most estimates thousands of synapses each; so you're actually talking on the order of tens to hundreds of trillions of neural network parameters before you reach parity with biological examples.)
Another factor was that SVMs were all the rage back then, because they had nice math and fit the computational resources of a contemporary workstation.
The oldest NN I was exposed to was an image upscaler (mostly used for deinterlacing) called nnedi, which goes back to ~2007: http://web.archive.org/web/20130127123511/http://forum.doom9...
nnedi3 is actually quite respectable today
https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d...
https://en.wikipedia.org/wiki/Universal_approximation_theore...
Probably too many low-probability events chained together.
But I think they discovered most of the interesting things that small networks can do? For example, TD-Gammon from 1992: https://en.wikipedia.org/wiki/TD-Gammon .
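TD-Gammon's core trick was temporal-difference learning, with a small network as the value function. A minimal TD(0) value-table update on a toy three-state chain, just to show the update rule; this is illustrative only, not Tesauro's network version:

```python
def td0_chain(episodes=500, alpha=0.1):
    """TD(0) on a tiny deterministic chain: states 0 -> 1 -> 2 (terminal).
    Reaching the terminal state pays reward 1; with no discounting,
    the true value of both non-terminal states is 1.0."""
    V = [0.0, 0.0, 0.0]
    for _ in range(episodes):
        s = 0
        while s != 2:
            s_next = s + 1
            reward = 1.0 if s_next == 2 else 0.0
            # TD(0) update: nudge V[s] toward reward + V[s_next]
            V[s] += alpha * (reward + V[s_next] - V[s])
            s = s_next
    return V

values = td0_chain()  # both V[0] and V[1] approach the true value 1.0
```

TD-Gammon replaced the table with a ~40-hidden-unit network and self-play, which is remarkable for 1992 hardware.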
You got a BASIC code snippet for training and inference and, most of all, there is an explicit use-case for digital filter approximation! At the time NNs were treated as one tool among others, not an "answer-to-everything" type of thing.
I know Deep Learning opened new possibilities, but a lot of the time CNNs/RNNs/Transformers are definitely not needed: working on the data instead and using "linear" models can go really far (my 2 cents).
[1]: https://www.dspguide.com [2]: http://www.dspguide.com/ch26.htm
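The "work on the data, then use a linear model" point in concrete form: a target that is nonlinear in x becomes linear after a hand-engineered feature, and closed-form least squares recovers it exactly. Toy data, pure Python, not from the dspguide chapter:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Toy data: y = 3 * x^2 + 1 is nonlinear in x...
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x ** 2 + 1 for x in xs]

# ...but linear in the engineered feature z = x^2.
zs = [x ** 2 for x in xs]
slope, intercept = fit_line(zs, ys)  # recovers slope 3, intercept 1
```

That one feature transform did the job a hidden layer would otherwise be learning.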
I wonder how LLM’s avoid that?
Does anyone have suggestions (and links to code!) for what would be a cool demo? I’m thinking of a haar classifier to show some object recognition/face detection, but would appreciate more options!
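For a demo along those lines: the speed trick behind Haar classifiers is the integral image, which makes any rectangle sum O(1) regardless of size. A minimal pure-Python sketch of one two-rectangle "edge" feature (a real detector like OpenCV's cascade boosts thousands of such features, so treat this as the kernel of the idea only):

```python
def integral_image(img):
    """Summed-area table with a zero-padded first row/column, so that
    sat[y][x] holds the sum of img over rows 0..y-1 and columns 0..x-1."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat

def rect_sum(sat, x, y, w, h):
    """Sum of the w*h rectangle with top-left corner (x, y), in O(1)."""
    return sat[y + h][x + w] - sat[y][x + w] - sat[y + h][x] + sat[y][x]

def haar_edge_feature(sat, x, y, w, h):
    """Two-rectangle Haar feature: left half minus right half.
    Large magnitude means a strong vertical edge in the window."""
    half = w // 2
    return rect_sum(sat, x, y, half, h) - rect_sum(sat, x + half, y, half, h)

# Tiny "image" with a bright left half and a dark right half:
img = [[9, 9, 1, 1],
       [9, 9, 1, 1]]
sat = integral_image(img)
edge = haar_edge_feature(sat, 0, 0, 4, 2)  # 36 - 4 = 32, strong edge response
```

For a ready-made version of the full detector, OpenCV ships pretrained Haar cascades for faces that make a crowd-pleasing live-webcam demo.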
edit: yes, almost certainly Neural Networks for Pattern Recognition (1995), thx!
The random forest guy you mean is/was Leo Breiman. His student Adele Cutler deserves some of the credit there too.
This was a thing in the late '90s/early 2000s.
There really wasn't the compute power around at the time, and as others have pointed out there wasn't the training data, or the cameras.
One side, holding a pipe, 'well actually, back in 1954, I put together an analog variant of a neuron perceptron built out of old speaker cables and car parts, strung it across the living room and it could say 10 words and fetch my slippers'. 'Really', 'Yes, Indubitably'.
The other side, It's all, 'REEEEEEEEEE'
"Elmer and Elsie, or the "tortoises" as they were known, were constructed between 1948 and 1949 using war surplus materials and old alarm clocks."
"The robots were designed to show the interaction between both light-sensitive and touch-sensitive control mechanisms which were basically two nerve cells with visual and tactile inputs."
I meant to draw a relationship between Psychology and Machine Learning.
Psychology, the study of the mind, with questionable scientific methods and a replication problem.
And
Machine Learning (which takes the mind as its model), with questionable scientific methods, a replication problem, and the addition of corporate hype machines.
Often in the last few months we stand in awe of what AI achieves, but it produces questionable results and has a lot of problems. Machine learning is worshiped.
And yet, often in the last few months, posts on Psychology are railed against, and the field gets called one full of con-men and BS-artists.
Why the duality? Both are young fields and stretching. Rapidly making progress, hitting dead ends, and changing course. The scientific method isn't a straight path. But Psychology doesn't seem to be given much leeway to make errors and course correct.
I just find it hitting a peak right now, because the study of the Human Mind (wet net) and the Machine Mind (electric net) seem to be hitting a lot of the same issues. There are so many parallels in how they are spoken of, and so many common problems in how they are framed within each field.
Wonder how long until we just openly talk about a field of Psychology of Machines, where we use the same tools to try and understand what the Neural Nets are thinking.