Not just small datasets and limited computing power, but also very few libraries to help you out. Although you could download something like xerion from ftp.cs.toronto.edu and join their mailing list, it was generally a case of retyping examples or implementing algorithms from printed textbooks. And it was all in C, presumably for performance reasons, while most of the symbolic AI folks came from Lisp or Prolog backgrounds.
* First off, there was no major issue with computation. Adding more units or more layers isn't that much more expensive. Vanishing gradients and poor regularization were the real challenges, and they meant that increasing network size rarely improved performance empirically. This was a well-known problem up until the mid-to-late 2000s.
* There was a major 'AI winter' going on in the 90s after neural networks failed to live up to their hype in the 80s. Computer vision and NLP researchers - the fields that have most famously benefited from huge neural networks recently - largely abandoned neural networks in the 90s. My undergrad PI at a computer vision lab told me in no uncertain terms he had no interest in neural networks, but was happy to support my interest in them. My grad school advisors had similar takes.
* A lot of the problems that did benefit from neural networks in the 90s/early 2000s just needed a non-linear model, but did not need huge neural networks to do well. You can very roughly consider the first layer of a 2-layer neural network to be a series of classifiers, each tackling a different aspect of the problem (e.g. the first neuron of a spam model may activate if you have never received an email from the sender, the second if the sender is tagged as spam a lot, etc). These kinds of problems didn't need deep, large networks, and 10-50 neuron 2-layer networks were often more than enough to fully capture the complexity of the problem. Nowadays many practitioners would throw a GBM at problems like that and can get away with O(100) shallow trees, which isn't very different from what the small neural networks were doing back then.
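That "first layer as a set of small classifiers" picture can be sketched in a few lines. This is a toy illustration with made-up spam features and hand-set weights (nothing here is a real spam model), just to show how each hidden unit acts like a little detector whose votes the output unit combines:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tiny_spam_net(features, W1, b1, w2, b2):
    """A 2-layer net: each hidden unit acts like a small classifier
    over the inputs; the output unit combines their votes."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)

# Hypothetical binary features: [never_seen_sender, sender_often_flagged]
W1 = [[4.0, 0.0],   # hidden unit 1 "fires" on unknown senders
      [0.0, 4.0]]   # hidden unit 2 "fires" on frequently flagged senders
b1 = [-2.0, -2.0]
w2 = [3.0, 3.0]     # output unit sums the two detectors' votes
b2 = -3.0

p_spam = tiny_spam_net([1.0, 1.0], W1, b1, w2, b2)  # both detectors active
p_ham  = tiny_spam_net([0.0, 0.0], W1, b1, w2, b2)  # neither active
```

With 10-50 hidden units you get 10-50 such detectors, which really was enough capacity for many of those problems.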
Combined, this means that the researchers who really could have used larger neural networks abandoned them, and almost everyone else was fine with the small networks that were readily available. The recent surge in AI is being fueled by smarter approaches and more computation, but arguably much more important is the ton of data the internet made available. That last point is the real story IMO.
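The vanishing-gradient point above has a back-of-the-envelope demonstration: the derivative of a sigmoid is at most 0.25, so with one such factor per layer the gradient signal shrinks exponentially with depth. This is an idealized sketch (unit weights, best-case activations), not a full backprop trace:

```python
def sigmoid_deriv_max():
    # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)), maximized at z = 0
    return 0.25

def gradient_scale(depth):
    """Upper bound on the gradient factor contributed by `depth`
    sigmoid layers, assuming best-case activations at every layer."""
    return sigmoid_deriv_max() ** depth

shallow = gradient_scale(2)   # 0.0625 -- still trainable
deep = gradient_scale(20)     # ~1e-12 -- effectively no learning signal
```

Which is roughly why a 2-layer network trained fine in 1995 and a 20-layer one didn't, until tricks like residual connections and batch norm came along.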
https://direct.mit.edu/books/book/4424/Parallel-Distributed-...
I found the two Rumelhart & McClelland books, just a single copy on the shelf at Cody's Books, soon after publication. I worked through the examples, and was immediately convinced that this low-level approach was a way forward.
For some reason, none of the stressed out Comp Sci professors wanted to listen to a weirdo undergraduate, a lousy student.
I'm glad I was there at a reboot of AI, but my timing was lousy.
[1] Residual nets (2015): https://arxiv.org/abs/1512.03385
[2] Batch normalization (2015): https://arxiv.org/abs/1502.03167
LeCun, Bottou, et al., in "Efficient BackProp" (1998), described techniques for improving backprop algorithms.
If anyone could have been at the forefront of this wave, it could've been him.
And now the landscape has utterly changed and no one is even convinced they need "AGI". Just a continually refined LLM hooked up to tools and other endpoints.
Why does DOOM and clever programming on a NeXT imply what you assert?
Now the last major innovation in the space came from Epic Games / Unreal Engine.
Once that little detail gets solved, who’s to say that “refined LLM hooked up to tools and other specialized LLMs” won’t be it? Sure could be.
But it also could not be! AGI has been right around the corner my whole life and even longer. 50 years at least. Every new AI discovery is on the verge of AGI until a few years later it hits a wall. Research is hard like that.
Talk about having "fuck you" money but just not willing to say "fuck you".
“Data” was so much smaller then. I had a minuscule hard drive if any, no internet, 8-bit graphics but nothing photo-realistic, glimpses of Windows and OS/2, and barely a mouse. In retrospect, it was like embedded programming.
System X, in 2004 was the 7th most powerful computer in the world. It was 1100 PowerPC 970 Macs with 2200 cores and claimed an Rmax of 12k GFlops. https://www.top500.org/system/173736/
An M1 MacBook Air hits 900 GFlops ( https://news.ycombinator.com/item?id=26333369 ). A dozen or so MacBook Airs - about what you'd expect in a grade school computer lab - matches the 7th most powerful computer system in the world from two decades ago.
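A quick sanity check on that arithmetic, using the Rmax figures quoted above:

```python
import math

system_x_gflops = 12_000   # System X Rmax, ~12k GFlops (2004)
m1_air_gflops = 900        # rough M1 MacBook Air figure from the linked thread

# How many Airs to match the 2004 machine's peak Linpack number?
airs_needed = math.ceil(system_x_gflops / m1_air_gflops)  # 14
```

So it's 13-14 Airs rather than exactly twelve, but the point stands: one small computer lab now outruns a 2004 top-10 supercomputer.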
I've also noticed this, and want to ask: who are these people? Do they not have (~80-billion-neuron) brains? (And that's neurons, with by most estimates thousands of synapses each; so you're actually talking on the order of tens to hundreds of trillions of neural network parameters before you reach parity with biological examples.)
Another factor was that SVMs were all the rage back then, because they had nice math and fit the computational resources of a contemporary workstation.
The oldest NN I was exposed to was an image upscaler (mostly used for deinterlacing) called nnedi, which goes back to ~2007: http://web.archive.org/web/20130127123511/http://forum.doom9...
nnedi3 is actually quite respectable today
https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d...
https://en.wikipedia.org/wiki/Universal_approximation_theore...
Probably too many low-probability events chained together.
But I think they discovered most of the interesting things that small networks can do? For example, TD-Gammon from 1992: https://en.wikipedia.org/wiki/TD-Gammon .
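TD-Gammon's core trick was temporal-difference learning, with a small network as the value function. A minimal TD(0) value-table update on a toy three-state chain, just to show the update rule; this is illustrative only, not Tesauro's network version:

```python
def td0_chain(episodes=500, alpha=0.1):
    """TD(0) on a tiny deterministic chain: states 0 -> 1 -> 2 (terminal).
    Reaching the terminal state pays reward 1; with no discounting,
    the true value of both non-terminal states is 1.0."""
    V = [0.0, 0.0, 0.0]
    for _ in range(episodes):
        s = 0
        while s != 2:
            s_next = s + 1
            reward = 1.0 if s_next == 2 else 0.0
            # TD(0) update: nudge V[s] toward reward + V[s_next]
            V[s] += alpha * (reward + V[s_next] - V[s])
            s = s_next
    return V

values = td0_chain()  # both V[0] and V[1] approach the true value 1.0
```

TD-Gammon replaced the table with a ~40-hidden-unit network and self-play, which is remarkable for 1992 hardware.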
You got a BASIC code snippet for training and inference and, most of all, there is an explicit use-case for digital filter approximation! At the time NNs were treated as one tool among others, not an "answer-to-everything" type of thing.
I know Deep Learning opened new possibilities, but a lot of the time CNNs/RNNs/Transformers are definitely not needed: working on the data instead and using "linear" models can go really far (my 2 cents).
[1]: https://www.dspguide.com [2]: http://www.dspguide.com/ch26.htm
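The "work on the data, then use a linear model" point in concrete form: a target that is nonlinear in x becomes linear after a hand-engineered feature, and closed-form least squares recovers it exactly. Toy data, pure Python, not from the dspguide chapter:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Toy data: y = 3 * x^2 + 1 is nonlinear in x...
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x ** 2 + 1 for x in xs]

# ...but linear in the engineered feature z = x^2.
zs = [x ** 2 for x in xs]
slope, intercept = fit_line(zs, ys)  # recovers slope 3, intercept 1
```

That one feature transform did the job a hidden layer would otherwise be learning.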
I wonder how LLM’s avoid that?
Does anyone have suggestions (and links to code!) for what would be a cool demo? I’m thinking of a haar classifier to show some object recognition/face detection, but would appreciate more options!
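For a demo along those lines: the speed trick behind Haar classifiers is the integral image, which makes any rectangle sum O(1) regardless of size. A minimal pure-Python sketch of one two-rectangle "edge" feature (a real detector like OpenCV's cascade boosts thousands of such features, so treat this as the kernel of the idea only):

```python
def integral_image(img):
    """Summed-area table with a zero-padded first row/column, so that
    sat[y][x] holds the sum of img over rows 0..y-1 and columns 0..x-1."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat

def rect_sum(sat, x, y, w, h):
    """Sum of the w*h rectangle with top-left corner (x, y), in O(1)."""
    return sat[y + h][x + w] - sat[y][x + w] - sat[y + h][x] + sat[y][x]

def haar_edge_feature(sat, x, y, w, h):
    """Two-rectangle Haar feature: left half minus right half.
    Large magnitude means a strong vertical edge in the window."""
    half = w // 2
    return rect_sum(sat, x, y, half, h) - rect_sum(sat, x + half, y, half, h)

# Tiny "image" with a bright left half and a dark right half:
img = [[9, 9, 1, 1],
       [9, 9, 1, 1]]
sat = integral_image(img)
edge = haar_edge_feature(sat, 0, 0, 4, 2)  # 36 - 4 = 32, strong edge response
```

For a ready-made version of the full detector, OpenCV ships pretrained Haar cascades for faces that make a crowd-pleasing live-webcam demo.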
edit: yes, almost certainly Neural Networks for Pattern Recognition (1995), thx!
The random forest guy you mean is/was Leo Breiman. His student Adele Cutler deserves some of the credit there too.
This was a thing in the late '90s/early 2000s.
There really wasn't the compute power around at the time, and as others have pointed out there wasn't the training data, or the cameras.
One side, holding a pipe, 'well actually, back in 1954, I put together an analog variant of a neuron perceptron built out of old speaker cables and car parts, strung it across the living room and it could say 10 words and fetch my slippers'. 'Really', 'Yes, Indubitably'.
The other side, It's all, 'REEEEEEEEEE'
"Elmer and Elsie, or the "tortoises" as they were known, were constructed between 1948 and 1949 using war surplus materials and old alarm clocks."
"The robots were designed to show the interaction between both light-sensitive and touch-sensitive control mechanisms which were basically two nerve cells with visual and tactile inputs."
I meant to draw a relationship between Psychology and Machine Learning.
Psychology, the study of the mind, with questionable scientific methods and a replication problem.
And
Machine Learning (which takes the mind as its model), with questionable scientific methods, a replication problem, and the addition of corporate hype machines.
Often in the last few months we stand in awe of what AI achieves, but it produces questionable results and has a lot of problems. Machine learning is worshiped.
And yet, often in the last few months, posts on Psychology are railed against, and the field gets called one full of con-men and BS-artists.
Why the duality? Both are young fields and stretching. Rapidly making progress, hitting dead ends, and changing course. The scientific method isn't a straight path. But Psychology doesn't seem to be given much leeway to make errors and course correct.
I just find it hitting a peak right now, because the study of the Human Mind (wet net) and the Machine Mind (electric net) seem to be hitting a lot of the same issues. There are so many parallels in how they are spoken of, and so many common problems in how they are framed within each field.
Wonder how long until we just openly talk about a field of Psychology of Machines, where we use the same tools to try and understand what the Neural Nets are thinking.