MNIST Handwritten Digit Classifier – beginner neural network project (opens in new tab)

(github.com)

205 pointsironislands9y ago28 comments

28 comments

Nice. Recreating these methods in simple code for yourself is definitely the way to check you understand it. This demo looks nice, clean, and straightforward. (Although I'd rename or comment variables x and y, or give some sort of guidance on what way around the weight matrices are within the code itself.)

It's also worth checking out existing neural net code-bases to see what tricks they have. The fine details usually aren't in papers, and they're not all in the text-books either.

The first potential problem that jumped out at me in this code was the initialization:

    self.weights = [np.array([0])] + [np.random.randn(y, x)
            for y, x in zip(sizes[1:], sizes[:-1])]

If the number of units in a layer is H, the typical size of the input into the layer above will be √(H). For large H, the sigmoid will usually saturate, and the gradients will underflow to zero, making it impossible to learn anything. There are some tricks to avoid the numerical problems, but even if you avoid numerical-underflow, things probably aren't going to work well.

I'd multiply those initial weights by a small constant divided by the square-root of weights going into the same neuron. For multiple layers you might consider layer-by-layer pre-training. For other architectures, like recurrent nets, definitely find a reference on how to do the initialization.

PS I would definitely add a test routine to check that the gradients from back-propagation agree with a finite difference approximation. It's so easy to get gradient code wrong, and it's so easy to test.

tyrael719y ago

'It's also worth checking out existing neural net code-bases to see what tricks they have. The fine details usually aren't in papers, and they're not all in the text-books either.'

Given that you are a person who is highly-qualified to answer, I am genuinely curious why do you think that is? Reimplementing algorithms from scratch is an efficient way to learn, understand the underlying concepts and attempt improvements in a research context.

imurray9y ago

A lot of machine-learning papers are eight pages. Speech conference papers (heavy users of neural nets) are often only four. Some details aren't part of the main message, so don't make it in. Often code is available, and initialization and other tweaks can be found in there (even if you aren't going to use their code).

That said, there are also whole papers, even collected volumes, on initialization and other practical details.

Textbooks aren't always up-to-date with the latest practical knowledge, as deep-learning practice is moving quickly. Or they simply don't want to clutter their high-level maths descriptions with code-level implementation details. Teaching stuff is all about tradeoffs. I'm sure several books do mention the scale of weights for simple feed-forward weights though, as it's not an implementation-level detail, and it's probably been well known since the 1980s.

Joof9y ago

I'll weigh in; papers aren't necessarily worded to convey new information in an ideal manner (especially to newbies). They are worded so that expert researchers are able to reproduce them, especially the parts that constitute whatever their contribution is to the field.

As for textbooks, I imagine that the field is moving too fast; half the stuff I use has only existed for the past year or two.

ironislandsOP9y ago

Hi ! Thanks for the valuable suggestion :-) Your points make much sense to me. I am caught up with some other work but I surely intend to make amendments latest by next 20 days.

tyrael719y ago

Can someone explain how this repo is so popular/ why it's so popular here? This is a basic implementation of a relatively simple algorithm that you learn when initially starting with ML/DL.

I do not want in any way to sound critical and am genuinely curious about the dynamics of why people would find this interesting given it's reduced complexity.

thibaut_barrere9y ago

One hypothesis is that there is a trend of non ML-familiar developers currently working on getting more grasp on ML. Such repositories provide something that e.g. web developers can take a look at, with reduced friction.

Disclaimer: I'm such a developer! (currently going through the last bits of https://www.coursera.org/learn/machine-learning) - but I've noticed other around me recently.

netvarun9y ago

I up-voted this since I found it really helpful while working through Michael Nielsen's book on neural networks[1]. (I had chanced upon this repo a couple of months ago)

The code is actually based on the original code from the book (e.g.: can be seen from the variable names like 'nabla') , but written in a more succinct manner. Since I am relatively new to Python, I found it easier to follow this repo's code than the code in the book and used it as my reference implementation.

It's missing quite a few things like calculating accuracy, regularization, etc. but they are quite straightforward to implement.

[1] Neural Networks and Deep Learning by Michael Nielsen - http://neuralnetworksanddeeplearning.com

huskyr9y ago

You said it yourself: when you want to learn something complex (like ML), it's usually easy to start with something with reduced complexity.

At least, that's why i clicked the link.

superuser29y ago

We aren't all working on machine learning. There will always be some people who are getting into it for the first time.

mlquestions9y ago

Hehe, you're probably of similar mind to PaulHoule who commented on my Ask HN thread - https://news.ycombinator.com/edit?id=12198327

rch9y ago

https://xkcd.com/1053/

d4rth_s1d10us9y ago

Haha nice comic ! :-D

sapphireblue9y ago

I have an old one written some time ago by myself too, in node.js/javascript: https://github.com/crystalline/dnnjs It is a simple multilayer perceptron with ReLU nonlinearity, it can achieve 1.7% error on MNIST which is bad compared to convnets but good enough for multilayer perceptron. Training a model is as simple as running "node node-mnist.js" in terminal.

liamconnell9y ago

I'm seeing a lot of comments about the lack of non-MNIST neural network tutorials. Well here's mine. It uses financial data and the goal is to build a trading strategy. Its a work in progress and comments/criticism is welcome.

https://github.com/LiamConnell/deep-algotrading

vamega9y ago

Thanks. This looks really great!

badminton19y ago

If you want to jump into this sort of thing, I highly recommend Google's Udacity course for deep learning. https://classroom.udacity.com/courses/ud730

RockyMcNuts9y ago

it's very good, but personally I'd start with the Andrew Ng and Hinton courses

https://www.coursera.org/learn/machine-learning https://www.coursera.org/learn/neural-networks

I think the Udacity course is best if you know principles of machine learning and want to apply them in a more professional toolchain and learn Tensorflow

1 more reply

iopq9y ago

I thought Neural Networks nowadays use ReLU instead of Sigmoid? Especially in the context of deep learning

asib9y ago

Looks like this implementation is based (in part) on the Stanford ML course, which teaches nnets using sigmoid activation.

Given that it's intended to introduce to beginners how nnets work, the choice of activation is an aside anyway - the real meat is back/forwardprop.

ironislandsOP9y ago

Absolutely, though I'll put in softmax, tanh and ReLU activation functions in near time. It isn't that difficult. Putting in more docstrings is also one of my todo.

naveen999y ago

I found the Arrayfire examples in c++11 very approachable: https://github.com/arrayfire/arrayfire/blob/devel/examples/m...

The softmax_regression and logistic regression examples are even easier.

There are bindings for nodejs, python, other languages.

But it's so nice to be able to follow the definitions of each symbol and function in visual studio, not to mention being able to step through the imperative code.

And it's fast.

melonakos9y ago

Thank you for the kind words. If there is anything we can do to help, let us know.

*I work at ArrayFire.

naveen999y ago

Will do. Thank you for Arrayfire.

IshKebab9y ago

I kind of with people would stop using MNIST for NN tutorials. There are a billion of them already. Do something different.

orthoganol9y ago

Neural nets even have their own "universality theorem", meaning any function at all, any sort of process out there in the world, anything, could in theory be approximated by a neural net. Yet everyone just does MNIST :)

detaro9y ago

Suggestions for other interesting starter projects? Doing something with NN is on the learning list...

IshKebab9y ago

The difficulty is always getting a dataset.

I've started trying to get a network to recognise different vowels ("aaahhhhhh", "eeeeee", "ooooooo", etc.). Relatively easy to generate data - you just need your voice and a microphone. Downside is all the NN systems are much more set up for images than sound.

Or what about neural net fingerprint recognition. There must be databases of fingerprints somewhere. Or irises.

Recognise a type of wood from images of its grain?

Or activity recognition from accelerometer data. I think Pebble recently open sourced their recogniser and it was surprisingly not a neural network. I'm sure a neural network could do better. Might be hard to get a decent amount of data here but this could be a good incentive to do exercise!

j / k navigate · click thread line to collapse

28 comments

imurray9y ago

It's also worth checking out existing neural net code-bases to see what tricks they have. The fine details usually aren't in papers, and they're not all in the text-books either.

The first potential problem that jumped out at me in this code was the initialization:

    self.weights = [np.array([0])] + [np.random.randn(y, x)
            for y, x in zip(sizes[1:], sizes[:-1])]

tyrael719y ago

'It's also worth checking out existing neural net code-bases to see what tricks they have. The fine details usually aren't in papers, and they're not all in the text-books either.'

imurray9y ago

That said, there are also whole papers, even collected volumes, on initialization and other practical details.

Joof9y ago

As for textbooks, I imagine that the field is moving too fast; half the stuff I use has only existed for the past year or two.

ironislandsOP9y ago

Hi ! Thanks for the valuable suggestion :-) Your points make much sense to me. I am caught up with some other work but I surely intend to make amendments latest by next 20 days.

tyrael719y ago

Can someone explain how this repo is so popular/ why it's so popular here? This is a basic implementation of a relatively simple algorithm that you learn when initially starting with ML/DL.

I do not want in any way to sound critical and am genuinely curious about the dynamics of why people would find this interesting given it's reduced complexity.

thibaut_barrere9y ago

Disclaimer: I'm such a developer! (currently going through the last bits of https://www.coursera.org/learn/machine-learning) - but I've noticed other around me recently.

netvarun9y ago

I up-voted this since I found it really helpful while working through Michael Nielsen's book on neural networks[1]. (I had chanced upon this repo a couple of months ago)

It's missing quite a few things like calculating accuracy, regularization, etc. but they are quite straightforward to implement.

[1] Neural Networks and Deep Learning by Michael Nielsen - http://neuralnetworksanddeeplearning.com

huskyr9y ago

You said it yourself: when you want to learn something complex (like ML), it's usually easy to start with something with reduced complexity.

At least, that's why i clicked the link.

superuser29y ago

We aren't all working on machine learning. There will always be some people who are getting into it for the first time.

mlquestions9y ago

Hehe, you're probably of similar mind to PaulHoule who commented on my Ask HN thread - https://news.ycombinator.com/edit?id=12198327

rch9y ago

https://xkcd.com/1053/

d4rth_s1d10us9y ago

Haha nice comic ! :-D

sapphireblue9y ago

liamconnell9y ago

https://github.com/LiamConnell/deep-algotrading

vamega9y ago

Thanks. This looks really great!

badminton19y ago

If you want to jump into this sort of thing, I highly recommend Google's Udacity course for deep learning. https://classroom.udacity.com/courses/ud730

RockyMcNuts9y ago

it's very good, but personally I'd start with the Andrew Ng and Hinton courses

https://www.coursera.org/learn/machine-learning https://www.coursera.org/learn/neural-networks

I think the Udacity course is best if you know principles of machine learning and want to apply them in a more professional toolchain and learn Tensorflow

1 more reply

iopq9y ago

I thought Neural Networks nowadays use ReLU instead of Sigmoid? Especially in the context of deep learning

asib9y ago

Looks like this implementation is based (in part) on the Stanford ML course, which teaches nnets using sigmoid activation.

Given that it's intended to introduce to beginners how nnets work, the choice of activation is an aside anyway - the real meat is back/forwardprop.

ironislandsOP9y ago

Absolutely, though I'll put in softmax, tanh and ReLU activation functions in near time. It isn't that difficult. Putting in more docstrings is also one of my todo.

naveen999y ago

I found the Arrayfire examples in c++11 very approachable: https://github.com/arrayfire/arrayfire/blob/devel/examples/m...

The softmax_regression and logistic regression examples are even easier.

There are bindings for nodejs, python, other languages.

But it's so nice to be able to follow the definitions of each symbol and function in visual studio, not to mention being able to step through the imperative code.

And it's fast.

melonakos9y ago

Thank you for the kind words. If there is anything we can do to help, let us know.

*I work at ArrayFire.

naveen999y ago

Will do. Thank you for Arrayfire.

IshKebab9y ago

I kind of with people would stop using MNIST for NN tutorials. There are a billion of them already. Do something different.

orthoganol9y ago

detaro9y ago

Suggestions for other interesting starter projects? Doing something with NN is on the learning list...

IshKebab9y ago

The difficulty is always getting a dataset.

Or what about neural net fingerprint recognition. There must be databases of fingerprints somewhere. Or irises.

Recognise a type of wood from images of its grain?

j / k navigate · click thread line to collapse