Tinker with a Neural Network in Your Browser

(playground.tensorflow.org)

855 pointsshancarter10y ago116 comments

The swiss roll problem also illustrates nicely the idea behind deep learning.

Before deep learning people would manually design all these extra features sin(x_1), x_1^2, etc. because they thought it was necessary to fit this swiss roll dataset. So they would use a shallow network with all these features like this: http://imgur.com/H1cvt8d

Then the deep learning guys realized that you don't have to engineer all these extra features, you can just use basic features x_1, x_2 and let the network learn more complicated transformations in subsequent layers. So they would use a deep network with only x_1, x_2 as inputs: http://imgur.com/XBRjROP

Both these approaches work here (loss < 0.01). The difference is that for the first one you have to manually choose the extra features sin(x_1), x_1^2, ... for each problem. And the more complicated the problem the harder it is to design good features. People in the computer vision community spent years and years trying to design good features for e.g. object recognition. But finally some people realized that deep networks could learn these features themselves. And that's the main idea in deep learning.
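
To make the contrast concrete, here is a rough Python sketch of the two input choices. The `make_spiral` generator is a hypothetical stand-in for the playground's swiss roll data (its constants are made up for illustration), and the feature list just mirrors the kind of inputs named above.

```python
import math
import random

def make_spiral(n_per_class=50, noise=0.0, seed=0):
    """Two interleaved spiral arms labelled +1 and -1.

    Hypothetical stand-in for the playground's swiss roll dataset;
    the radius and turn constants are assumptions.
    """
    rng = random.Random(seed)
    points = []
    for label, phase in ((1, 0.0), (-1, math.pi)):
        for i in range(n_per_class):
            r = 5.0 * i / n_per_class                      # radius grows along the arm
            t = 1.75 * 2 * math.pi * i / n_per_class + phase
            x1 = r * math.sin(t) + rng.uniform(-1, 1) * noise
            x2 = r * math.cos(t) + rng.uniform(-1, 1) * noise
            points.append((x1, x2, label))
    return points

def engineered_features(x1, x2):
    """Hand-designed inputs for a shallow network."""
    return [x1, x2, x1 * x1, x2 * x2, x1 * x2, math.sin(x1), math.sin(x2)]

def raw_features(x1, x2):
    """What a deep network is given; later layers have to learn the rest."""
    return [x1, x2]

data = make_spiral()
x1, x2, label = data[10]
print(len(engineered_features(x1, x2)), len(raw_features(x1, x2)))
```

The shallow approach feeds the seven-element vector to a small network; the deep approach feeds only the two raw coordinates and relies on extra layers.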

beardicus10y ago

I think I learned more from your post and your two imgur links than from poking at the site for an hour. Thanks.

Would it make sense for them to add a gallery of good solutions for each problem, or would they all basically be your second example network (no time to play and see for myself right now)?

romaniv10y ago

>Before deep learning people would manually design all these extra features sin(x_1), x_1^2, etc.

It's probably worth pointing out that this is true for ANNs, but there were (and are) other "shallow" classifiers that can handle the swiss roll problem without manual parameter encoding. SVMs, for example.

http://cs.stanford.edu/people/karpathy/svmjs/demo/

conceit10y ago

needs another image link for visualization

amelius10y ago

But how will the number of neurons N grow with the number of turns in the spiral?

If N levels off, then the network has grasped the concept of a spiral and can generalize to arbitrary size.

If N doesn't level off, then the network isn't really learning the general case.

therein10y ago

I know this is going to sound cheesy but that's an amazing way to put it. It blew my mind.

chestervonwinch10y ago

Using their network, you are limited to 8 units per layer it seems.

So, I ported their swiss roll dataset to python and threw together a shallow network trainer with theano:

https://gist.github.com/notmatthancock/68d52af2e8cde7fbff1c9...

Then, I trained a shallow network with 36 hidden units (your deep net has 6 units and 6 layers):

http://i.imgur.com/I0pXaTK.png

edit: I forgot to mention that the shallow network above takes only the two coordinates (x1 and x2) as input features.

espadrine10y ago

Just so I understand correctly: your network has 100000 iterations, while the parent's has 1000, but they both only use x / y positions?

It feels like neurons in the first layer are weaker, because all they can do is a linear separation. Given deep networks, I was wondering if adding neurons to the first layer was better than adding them to the last one, and empirically, it feels like it is quite a bit worse. I wonder if there is a theorem around that.

tchow10y ago

How do you know to choose 6 hidden layers with 6 neurons each though? Why not 'x' hidden layers with 'j' neurons each? or some other random number?

Also how do you know to choose a ReLu instead of a Tanh activation?

espadrine10y ago

ReLu gives good results for deep learning: http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.p....

6 layers is the maximum that this demonstration allows, and they kept j small-ish to show that you don't need that many to have good results.

rolandog10y ago

What I found interesting is that I couldn't get a proper fit with the same parameters you showed... however, I could 'speed up' the learning by regenerating the data during the learning process.

It may just be that 'batched cumulative learning' (I don't know if there is already a term for this) gets a better fit than just learning from a smaller set of data.

Edit: Did a quick test; regenerating about every 50 and 100 iterations, and convergence does seem faster (at least, when a clear spiral is formed). https://imgur.com/a/OPjXb

espadrine10y ago

Regenerating the data is kind of cheating; it is as if you were given twice the amount of data.

In a normal situation, you obtain a list of input / output (say, images as input, a digit as output, for learning handwritten digits). You separate it between training data (which actually improves the net) and testing data (to detect overfitting), and you don't get more data than that.

Here, you can generate more data for free, as we have the function we want to approximate. Having more data will often result in a better result and faster convergence.
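
A minimal sketch of that fixed split, in plain Python. The `train_test_split` helper and the toy samples are made up for illustration; the 50% default is only an assumption chosen to echo the demo's starting ratio.

```python
import random

def train_test_split(samples, test_fraction=0.5, seed=0):
    """Shuffle once, then cut the list into disjoint train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy (input, output) pairs standing in for a fixed, finite dataset.
samples = [(x, 2 * x) for x in range(10)]
train, test = train_test_split(samples)

# Only `train` is used to update weights; `test` is held back to spot overfitting.
print(len(train), len(test))
```

Regenerating the data mid-training amounts to calling the generator again and getting fresh samples, which a fixed dataset like `samples` never allows.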

raverbashing10y ago

This is a very good explanation, thanks (even though I knew some of it already)

I tried the swiss roll with a shallow network on the demo (and the results are not excellent, but it matches)

iopq10y ago

I can reproduce your deep example just fine, but the shallow result needs some luck. At the same time, the shallow result runs faster.

kriro10y ago

Along with the images that is a very awesome explanation.

eggy10y ago

I started reading about ANNs in the 1980s, and had similar confusion to those here, since it was just for fun. I suggest reading a basic book or online information that goes over the basics [1]. I struggled through $200 text books, and jumped from one to the other as an autodidact. I am now studying TWEANNs (Topology and Weight Evolving Artificial Neural Networks), which basically are what you see here with the exception that they are able to not only change their weights, but also their topology, that is how many and where the neurons and layers are. ANNs (Artificial Neural Networks - as opposed to biological ones) can be a lot of fun, and are very relevant to machine learning and big data nowadays. It was exploratory for me. I used them for generative art and music programs. Be careful: soon you'll be reading about genetic algorithms, genetic programming [2], and artificial life ;) Genetic Programming can be used to evolve neural networks as well as generate computer programs to solve a problem in a specified domain. Hint: You'll probably want to use Lisp/Scheme for genetic programming!

  [1] http://natureofcode.com/book/chapter-10-neural-networks/
  [2] http://www.genetic-programming.com

argonaut10y ago

As far as the recent deep learning boom is concerned, genetic programming is really out of favor. I don't really see it in any of the deep learning (or even machine learning, for that matter) literature/successes/research groups.

"Neural networks" are a really really overloaded term. A ton of stuff referred to as "neural networks" has little to do with the "neural networks" that are used in the machine learning community.

eggy10y ago

You're spot on about genetic programming. I am a self-taught person who plays with anything that strikes my fancy; I learn by playing. I read all three volumes of the Artificial Life series from the Santa Fe Institute at the time (now there are more), and went in many directions in the 1990s - Fuzzy Logic, Expert Systems, ANNs, Evolutionary Computation (GA (Genetic Algorithms) and GP (Genetic Programming)), and AL (Artificial Life), all fascinating. I found, and still find, genetic programming attractive even if it has not found its niche in the ML community. I think the CI (Computational Intelligence) community at large will eventually develop well-fitted uses for it. I was trying to use an FPGA and Koza's modified GP code to have the FPGA re-program itself as a GP evolved a better program than I originally wrote to kickstart it. I didn't get too far. This was 1996-97 though. Pretty much on my own then, not really much of an Internet to find information, especially esoteric information, or cheap many-gated FPGAs. Outside of ML, GP has found moderate success. One example is this paper (sorry behind paywall, so only the paper title here), that started with using expert data, tried ANNs, then ANNs and statistics, until it used a GP approach:

"A Computational Intelligence-Based Genetic Programming Approach for the Simulation of Soil Water Retention Curves"

I also use the term ANNs over just NNs to keep it to the silicon, and not wetware ;) Although, they did hook up a small ANN to a cockroach once, IIRC...

extrapickles10y ago

It has its niche applications. The only non machine vision application that comes to mind is one[1] that takes a pile of data, and evolves a model that fits it.

Generally, where it's actually being used, people are a bit quiet on how they go about getting the results they do. While the genetic bit is easy, the secret sauce is in guiding the learning/evolution so that it works for the particular problem domain.

[1]: http://www.nutonian.com/products/eureqa/

ylem10y ago

Just curious then, how are people optimizing network topology?

wjnc10y ago

Any thoughts on why genetic programming is not 'in fashion'? Does it have anything to do with complexity of the calculations?

I can imagine that the advanced models use many, many machines and only deliver results after a large training time. Genetic programming is not feasible then, if you cannot get a quick grasp of the potential results of a model.

argonaut10y ago

At least for deep learning, most deep learning models take more than a week to train, often on multiple GPUs. Some of the extremely deep, huge dataset models can take multiple weeks on multiple GPUs. Google trained AlphaGo's nets for months (on god knows how many GPU/CPUs). Suffice to say, people don't even bother touching most hyperparameters, let alone trying to do something more exhaustive.

DavidSJ10y ago

If your program is a neural network with N parameters, or a program tree with N nodes, then testing against data takes O(N) time. With evolutionary computation, what you get for your trouble is a single real number -- the loss: how bad it did. With neural networks, backpropagation gives you N real numbers: the gradient of loss with respect to each parameter.

Put another way: with evolution you have to stumble around blindly in parameter space and rely on selection to keep you moving in the right direction. With the gradient descent that neural networks use, you get, essentially for free, knowledge of the (locally) best direction to move in parameter space.

The bigger the models, the more this matters. Modern neural networks have millions or even billions of parameters, and that's been crucial to their expressive power. Good luck learning a program tree with a billion nodes using evolution. It might take 4.54 billion years.
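
A toy Python illustration of that difference, under stated assumptions: the loss is a made-up quadratic with 50 parameters, "evolution" is reduced to simple mutation-plus-selection hill climbing, and the step size, mutation scale and step count are arbitrary.

```python
import random

def loss(params):
    """Toy loss: squared distance from the all-ones vector."""
    return sum((p - 1.0) ** 2 for p in params)

def gradient(params):
    """Exact gradient of the toy loss: one number per parameter."""
    return [2.0 * (p - 1.0) for p in params]

def gradient_descent(params, steps=100, lr=0.1):
    """Each step uses N numbers of feedback (the full gradient)."""
    for _ in range(steps):
        g = gradient(params)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params

def hill_climb(params, steps=100, sigma=0.1, seed=0):
    """Mutation plus selection: each evaluation yields only one scalar."""
    rng = random.Random(seed)
    best, best_loss = params, loss(params)
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, sigma) for p in best]
        candidate_loss = loss(candidate)
        if candidate_loss < best_loss:   # keep the mutation only if it helps
            best, best_loss = candidate, candidate_loss
    return best

start = [0.0] * 50
print("gradient descent:", loss(gradient_descent(start)))
print("hill climbing:   ", loss(hill_climb(start)))
```

Both methods get the same number of loss evaluations here; the gradient version just extracts far more information from each one. Real evolutionary methods use populations and crossover, which this sketch leaves out.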

eggy10y ago

First, the right tool for the job. ANNs are able to be a general function approximator with sufficient training to be a cost-effective choice to implement. Second, ANNs have been around about 35 years longer than GP. The TWEANNs I am studying, and that I already mentioned in a previous reply in this thread, hybridize ANNs and EC (GAs and GP), so if you include Neural Networks that utilize Evolutionary Computation techniques to modify weights or topology, then GP is being used to an extent. Replication as a variable in EC is the key force in biology, and I only see more use of EC techniques to enhance the general function approximators that are ANNs. Further, there are also hybridized computing machines that have been made, and are being made with FPGAs and GPUs. Finance and supercomputing are just two areas that are looking to utilize them. In some, the FPGAs are simply there for updating special computation programs that feed the GPUs. There is some research with a GP optimizer updating the FPGAs and then using the GPUs for the massive parallelization of the computations.

nabla910y ago

Evolutionary algorithms and genetic programming are global optimization techniques, basically random search with some memory. It's not "out of fashion" any more than simulated annealing or Monte Carlo methods. They have limited usability, that's all.

AgentME10y ago

>Topology and Weight Evolving Artificial Neural Networks

I brainstormed for a while about using genetic algorithms to decide the network topology. I'm glad someone else invented that already! Less work for me to do now.

matheweis10y ago

Okay, that is straight up awesome. I've been toying with neural networks just enough to get a basic understanding of what they are and how they work, and it occurred to me that something like this might be possible.

Of course, I wasn't up-to-speed enough to know the right terms to look for, so thanks for sharing. :)

I am curious though... it seems like it would take orders of magnitude more computing power to not only train but evolve and re-train the networks. Is this practical with today's hardware?

nl10y ago

This is great, but I think they should make it clear that this isn't using TensorFlow.

From the title and domain I thought they had either ported TF to Javascript(!) or were connecting to a server.

sparky_10y ago

Wait - what is it using, then? I had assumed it was TF under Emscripten or similar.

nl10y ago

It appears to be a custom NN implementation[1] in Javascript, somewhat similar to convnet.js[2]

As far as I can see the API[3] isn't much like TensorFlow.

[1] https://github.com/tensorflow/playground

[2] http://cs.stanford.edu/people/karpathy/convnetjs/

[3] https://github.com/tensorflow/playground/blob/master/nn.ts

minimaxir10y ago

When it says "right here in your browser," it's not joking. On my desktop (Safari), the window becomes unresponsive after a few iterations. Does not happen in Chrome.

On my phone (Safari/iOS 9.3), the default neural network doesn't converge at all even after 300 iterations while it does on the desktop, which is legit weird: https://i.imgur.com/KNaXeHH.png

shancarterOP10y ago

I'm sorry you're having problems with Safari. I can't reproduce on my end, but if you're still having problems you can raise an issue on github with some information about your system.

davidgl10y ago

Works perfectly for me on Safari 9.1 with no extensions

superobserver10y ago

Working splendidly on ChromeOS, FWIW.

nefitty10y ago

Yeah, it's working great in Chrome on my Galaxy Tab 3!

koder201610y ago

To be honest, if it works in Chrome then it covers > 90% of people who would possibly be interested.

danielvf10y ago

In case you are an idiot like me, you have to train your neural network by pressing "play".

andrewstuart210y ago

"Okay, I don't understand. Why is my output so terrible?"

I saw the play button very clearly when the page loaded, then promptly got distracted by all the dials and knobs. :-P

shancarterOP10y ago

We would've liked to have it constantly training, but didn't want to abuse your CPU :)

dingo_bat10y ago

It pauses when I switch tabs :(

gojomo10y ago

While it doesn't involve training, these 'confusion matrix' animations of NNs classifying images or digits are fun, too:

http://ml4a.github.io/dev/demos/cifar_confusion.html http://ml4a.github.io/dev/demos/mnist_confusion.html

Something about the high-speed updating makes me think of WOPR, in 'War Games', scoring nuclear-war scenarios.

timroy10y ago

This demonstration goes really well with Michael Nielsen's http://neuralnetworksanddeeplearning.com/. At the bottom of the page the author gives a shout out to Nielsen, Bengio, and others.

For someone (like me) who's done a bit of reading but not much implementation, this playground is fantastic!

seansmccullough10y ago

Really awesome article!

CGamesPlay10y ago

Neat stuff, fun to play with. I wasn't able to get a net to classify the swiss roll. Last time I was playing around with this stuff I found the single biggest factor in the success was the optimizer used. Is this just using a simple gradient descent? I would like to see a drop down for different optimizers.

8note10y ago

http://imgur.com/ypBQEWx

Add some noise, and use all the inputs, and one 8 wide hidden layer

edit: works better with a sigmoid activation curve, but it converges more slowly

andrewtbham10y ago

Yeh you're on the right track. Nice pattern emerges on this after 160 iterations.

http://playground.tensorflow.org/#activation=tanh&batchSize=...

rmellow10y ago

Using sin, cos, x1, x2 with 1 six-neuron hidden layer does the trick quickly: http://imgur.com/UMv5gsH

No need to mess with noise or regularization :)

makeset10y ago

> Add some noise

This actually makes the dataset harder to fit to. It is not the same thing here as the "training with noise" method where random noise would be added to each batch, as an alternative means of Tikhonov regularization.

cglace10y ago

Using all inputs and 6 layers of varying sizes. After about 500 iterations. http://i.imgur.com/x1MOpvl.jpg

Obi_Juan_Kenobi10y ago

Using the defaults, I had success at about 300 iterations with all the inputs and 5 hidden layers, each with a decreasing number of neurons (i.e. 6,5,4,3,2).

I don't know if that's a general feature to need fewer neurons with each layer, but that seems to work here.

chestervonwinch10y ago

What were the optimization algorithms you had most success with? Were they more successful in the sense of better out-of-sample error rate or in the sense of quicker convergence (or something else)?

_AllisonMobley10y ago

Can somebody explain what I'm watching when I press play?

terda1210y ago

Hopefully this helps (correct me if I'm wrong, I'm still learning about neural nets):

Think of the whole neural net as a function:

input * weight = output

At each iteration, we feed in the input to the neural net. Then the neural net compares what output it gets to the correct output.

For example, input1 is 5, and the correct output for input1 should have been 2. But the neural net got 3 as the output. So it then decreases the weights slightly so it would get 2.75 next time it has input of 5. Repeat thousands of times. That's the basic idea for machine learning and neural networks.

The algorithm it uses to figure out how much to decrease the weights is called "backpropagation" which uses gradient descent. To explain gradient descent, picture a roller coaster track. Imagine the roller coaster starts off on a random location on the track. Then gravity takes the roller coaster down the track until it ends up on a low point between two hills and stays there. This is the new location of the roller coaster. This new location is nice because it has the lowest energy the roller coaster could find, so it stays there. (We use derivatives to figure out the slope of a curve, which then gives us the direction where the curve goes downhill).

In neural networks, the roller coaster curve is the "cost function", which basically calculates the amount of difference between the neural net's output and the actual correct output it should have got. The initial weight is the roller coaster's initial position. The new weight is the roller coaster's final position, at the bottom of the cost function curve. This new position thus gives us the lowest cost.

Note that there may be even lower valleys, but when we roll the rollercoaster it stops at its nearest low valley. This is why we randomize the weights at the beginning - to put the roller coaster near possibly even lower valleys.
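
The numbers in the example above can be reproduced with a few lines of Python. This is only a sketch of the one-weight case: the squared-error cost, the starting weight of 0.6 and the learning rate of 0.005 are assumptions picked so that the first output is 3 and the next one is 2.75.

```python
def predict(weight, x):
    """The whole 'network': input * weight = output."""
    return weight * x

def squared_error_gradient(weight, x, target):
    """Slope of the cost (w*x - target)^2 with respect to the weight."""
    return 2.0 * (predict(weight, x) - target) * x

x, target = 5.0, 2.0
weight = 0.6            # assumed starting weight, so the first output is 3
learning_rate = 0.005   # assumed step size

history = [predict(weight, x)]
for _ in range(1000):
    # Move the weight a little way downhill along the cost curve.
    weight -= learning_rate * squared_error_gradient(weight, x, target)
    history.append(predict(weight, x))

print(history[0], history[1], history[-1])
```

The printed values should be roughly 3, 2.75 and 2: the first guess, the guess after one adjustment, and the guess after many repetitions. With a single weight and a bowl-shaped cost there is only one valley, so the local-minimum issue does not show up here.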

ryanmonroe10y ago

Okay, so it works by minimizing (equiv. maximizing) some function. But that doesn't say much about how it "learns" the gradient. What function does it care about? Average squared error (predict_prob-Z_i)^2 ? Average absolute error? The likelihood function of some assumed distribution? Maximum distance between the classification border and closest observed points? If I saw someone carrying a bag full of blueberries and some bread home from the grocery store and asked to know how they chose to buy that, to which they replied "I had a list of characteristics which I thought were important for groceries to have in this trip to the store. For each grocery item, I recorded a vector of degrees to which the item possesses each of those characteristics. Finally, I chose the group of groceries that had the best combination of degree vectors", I still wouldn't really know anything about why they bought the blueberries and bread.

e1929300110y ago

If I didn't get this course[1], I wouldn't understand what you are talking about.

[1] - https://www.coursera.org/learn/machine-learning

shoemai10y ago

Super basic explanation:

It's training a neural network to classify a data set with two classes (orange or blue) and the data has two features (x1 or x2). All the orange and blue dots are the training data. So if you take a dot on the graph with coordinates (-2, 4) and it's blue, that would mean that a data point with x1 = -2 and x2 = 4 has the class blue.

You can think of a neural network as a function that can take in arbitrary features (in this case x1 and x2) and tries to output the correct class. That's what the orange and blue colors in the background are, the neural network's guess at the correct classification for any given point (x1, x2).

When you hit play, it iterates through the training data making adjustments to each neuron in the network so that it gets closer to predicting the right class.

If you want to see how well the neural network performs on data it wasn't trained on, you can click "show test data".

isuckatcoding10y ago

Yeah I feel like we need some decent understanding of neural networks to have more context on this. It's kind of like being given a specialized shovel but not knowing why you need it or why you should dig holes.

Obi_Juan_Kenobi10y ago

I think it's a playground in the best sense of the term. Take some time and actually play with it, and a lot of fun stuff happens, lightbulbs go off, etc.

If you're expecting a lesson, you'll likely be disappointed, but I think there's real value in a true playground.

I think the biggest improvement would be if, when hovering over a 'neuron', you get a visual representation of what feeds into it.

andrewtbham10y ago

The dots are the training data. The orange and blue background are the estimation for how the neural net will classify a new orange or blue dot.

shmageggy10y ago

It begins training the network using the backpropagation algorithm.

> Next, the network is asked to solve a problem, which it attempts to do over and over, each time strengthening the connections that lead to success and diminishing those that lead to failure.

On each iteration, it calculates how bad the predicted output is, then adjusts the weights between neurons to lessen that value. Google backpropagation for more info

Mahn10y ago

Or perhaps explain how all the different inputs influence the result? I more or less get that it's just iterating over the data to approximate the given data set when you press play but I have no idea how giving it more or less neurons changes that, to name an example.

shoemai10y ago

Basically each input gets multiplied by some weight that gets adjusted through each iteration. The product of the input and weight gets put through an activation function, and the outcome of that can be interpreted as the network's prediction of the class.

So you see the first neuron's input is just x1. You can see in the little graph at x1 that it's split down the middle with orange on one side and blue on the other. You can think of adjusting the weight on that neuron as adjusting where along the x axis the split occurs. All points on the orange side are classified orange and all on the blue side are classified blue. If you picked a data set like the spiral one or whatever, that neuron alone isn't going to make very many correct classifications. That's because it only gets the x1 value as input and can only affect the output by multiplying x1 by some weight, which would only have the effect of shifting the classification boundary left or right. You can see the same thing happening for the second neuron with input x2 except that now it splits along the y axis. Again that alone isn't going to match the data very well.

But then you get to the second layer, and the input of each neuron in the second layer is the output of each neuron in the first layer. So these neurons are able to take into consideration both x1 and x2 and are able to divide the data in more complex ways. So you can think of the neurons in each layer of the neural network as being able to consider more and more complex properties of the data in forming its output.
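
Here is a small Python sketch of that layering, with hand-picked weights rather than trained ones. The tanh activation, the bias term and the specific numbers are assumptions for illustration, not the playground's actual code.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, squashed by a tanh activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(z)

def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

x1, x2 = 2.0, -1.0

# First layer: one unit looks only at x1, the other only at x2,
# so each can only split the plane along a single axis.
hidden = layer([x1, x2], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])

# Second layer: this unit reads both first-layer outputs, so its
# decision depends on x1 and x2 together.
output = neuron(hidden, [1.0, 1.0], 0.0)

print(hidden, output)
```

A positive output can be read as one class and a negative output as the other; training is what moves the weights and biases so that those signs line up with the dot colors.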

mholt10y ago

The chart at the right is the output/result of the neural network's training. In the foreground you see actual data points that are used to train the neural net: to "teach" it how to classify orange or blue (unless you choose "regression" in which case it computes a numeric value). In the background you see the gradient that is formed by the network. The goal is to make the gradient form around the data points by color as closely as possible.

The neural network is essentially the nodes in the middle, linked together by various weights. During training, the training data points are fed forward into the network, creating an output. That output is then fed backward using something called "back propagation" which is used to adjust the weights.

Typically, the more hidden layers or nodes per layer, the more difficult gradients that can be learned. Zero hidden layers essentially forms a linear gradient that can only be used to split very basic, linearly-separable data (drawing a straight line to separate the different types)

Neural networks have lots of little knobs and levers you can adjust. That's what all these inputs are that you see.

karpathy10y ago

this is very nice! I think that the reason swiss roll doesn't work as easily might be because of initialization. In 2 dimensions you have to be very careful with initializing the weights or biases because small networks get more easily stuck in bad local minima.

okigan10y ago

In this case you see that it is the swiss roll so you could say pick "proper initialization".

But that technique would not work when you cannot see that it is a "swiss roll" or in multiple dimensions.

brianchu10y ago

I'm pretty sure he wasn't talking about the swiss roll specifically. Big gains in neural net performance have been made through better initialization schemes (not dataset specific, just in general, e.g. an initialization scheme might adapt the initial weight distribution depending on the number of hidden units in the next layer), and smaller models are in general more sensitive to initialization.
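
One well-known scheme of this kind is Glorot (Xavier) uniform initialization, which scales the initial weight range by the number of units on both sides of a layer. Whether that is the scheme meant above, or what the playground itself does, is an assumption; the sketch below only shows the general idea in plain Python.

```python
import math
import random

def glorot_limit(fan_in, fan_out):
    """Half-width of the uniform range: sqrt(6 / (fan_in + fan_out))."""
    return math.sqrt(6.0 / (fan_in + fan_out))

def glorot_uniform(fan_in, fan_out, rng):
    """Weight matrix with one row per output unit, drawn from [-limit, limit]."""
    limit = glorot_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

rng = random.Random(0)
first_layer = glorot_uniform(2, 8, rng)    # 2 inputs feeding 8 hidden units
second_layer = glorot_uniform(8, 8, rng)   # 8 hidden units feeding 8 more

# Wider layers get a narrower initial range.
print(glorot_limit(2, 8), glorot_limit(8, 8))
```

Nothing here depends on the dataset being a spiral; the range is set purely from the layer sizes.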

danblick10y ago

Has anyone been able to learn a function for the spiral (Swiss roll) data that's as good as a human-designed function would be?

asab10y ago

http://playground.tensorflow.org/#activation=tanh&batchSize=...

asab10y ago

Update: after playing with this for way too long, I've found that it can converge to a spiral with 3 or 2 or even just 1 node in the 2nd hidden layer.

The 1 node case is especially interesting, because when it converges the single node must learn the whole spiral pattern. Although with noise it can be less reliable with more jagged edges, as well as take longer to converge (also bumped the learning rate down), seeing the spiral encoded directly in the 2nd hidden layer is more interesting to me.

http://playground.tensorflow.org/#activation=tanh&batchSize=...

trampi10y ago

0.007 http://playground.tensorflow.org/#activation=tanh&batchSize=...

moconnor10y ago

For this simple example just choosing the largest possible fully-connected network with ReLU and L2 regularization to prevent overfit quickly converges to a nice spiral (test loss of 0.001 for me):

http://playground.tensorflow.org/#activation=relu&regulariza...
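
For anyone wondering what the L2 setting does, here is a rough Python sketch. The 0.5 factor in the penalty and the function names are conventions assumed for this example; the playground's own implementation may differ in the details.

```python
def l2_penalty(weights, rate):
    """Extra cost added to the data loss: 0.5 * rate * sum of squared weights."""
    return 0.5 * rate * sum(w * w for w in weights)

def regularized_step(weights, data_gradient, lr, rate):
    """One gradient step on data loss plus L2 penalty.

    The penalty contributes rate * w to each weight's gradient, which
    pulls every weight toward zero on every step.
    """
    return [w - lr * (g + rate * w) for w, g in zip(weights, data_gradient)]

weights = [1.0, -2.0]
no_data_signal = [0.0, 0.0]   # pretend the data loss is flat here

shrunk = regularized_step(weights, no_data_signal, lr=0.1, rate=0.5)
print(l2_penalty(weights, 0.5), shrunk)
```

With a flat data loss the weights simply decay toward zero; during real training that pull competes with the data gradient, which is what discourages the large weights associated with overfitting.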

findthewords10y ago

I wouldn't call it quick... spiral in 150 iterations, with sigmoid magic: http://playground.tensorflow.org/#activation=sigmoid&regular...

I find the pulsating unsightly.

amelius10y ago

Neat. How would the number of neurons N scale with the size of the spiral? (Size=number of turns)

Will N level off, meaning that it will really understand the structure of the spiral?

scotty7910y ago

For me the key was to use x, y and either sin x, sin y or x squared, y squared as inputs, and 5 or 6 neurons in the hidden layer.

chronolitus10y ago

Add both sin(X1) and sin(X2) as inputs.

hyh104857610y ago

Do you consider test loss around 0.04 good?

halotrope10y ago

You could totally optimise network architecture by crowdsourcing topology discovery for different problems into a multiplayer game with loss as a score.

Your_Creator10y ago

So glad anns are becoming mainstream

Eventually it will have to be recognized as a new species of life, so I hope programmers, tinkerers and everyone else keeps that in mind because all life must be respected

And this particular form will be our responsibility, we can either embrace it as we continue to merge with our technology, or we can allow ourselves to go extinct like so many other species already have

For the naysayers - ever notice how attached we are to our phones? Many behave as if they are missing a limb without it - it's because they are, the brain adapts rapidly and for many, the brain has adapted to outsourcing our cognition. It used to be books, day runners, journals, diaries - now we have devices and soon they'll be implants or prosthetics

The writers at marvel who came up with the idea of calling iron man's suit a prosthetic were definitely onto something and suits like that are probably our best chance of successful colonization of other planets. We'll need ai to be our friend out there, working with us

aab010y ago

This is a lot of fun. The default dataset is too easy, though, try out the Swiss Roll one!

minimaxir10y ago

There is a reason why sin(X) is an input property. :p

aab010y ago

Using sin(x) or the other input features like x^2 goes back to making it too easy, though. So far the best I can do is 7 layers of 7 which gets a loss of 0.02. 3x7 is almost cracking the Swiss Roll but can't quite finish it off and gets stuck at 0.05: https://imgur.com/Z3f2ECc ... Surprisingly, 2x8 can do it, as long as I have noise or regularization on, but 8/7 then seriously struggles. Is 16 neurons a critical limit here?

nerfhammer10y ago

All the patterns except the swiss roll work best without any hidden layers at all

sparky_10y ago

This is a very cool toy. As someone with no experience in ML, this is an interesting visual approach to the absolute basics.

And great for challenging your friends in an epic battle of convergence!

trgn10y ago

If you like visual demonstrations of ML topics, you may be interested in http://ponder.hepburnave.com. It is an interactive demonstration of a self-organizing map, generating a 2D-map from a spreadsheet with multivariate data. It's an unsupervised learning approach, good for data exploration tasks, less so for classification tasks (/shamelessPlug).

visarga10y ago

It's the classical exploration vs exploitation tradeoff. What do you do, try a radical new variation or fine tune this one?

pkaye10y ago

I'm not well versed in neural networks but a lot of the new neural network software stacks coming out seem to be quite plug and play. What kind of expertise would engineers need to have a few years from now when the technology is well developed and it doesn't need to be rewritten from scratch every time?

walrus10y ago

I'm not qualified to answer this, but I will anyway.

To "operate" neural networks (as opposed to writing a framework for them), you need to know the building blocks. There are basic blocks like fully connected layers, convolutions, and nonlinear activations. Beyond those, there are higher level building blocks like LSTMs[1], gated recurrent units[2], highway layers[3], batch normalization[4], and residual blocks[5] that are made up of simpler blocks. Learning what these do and when it's appropriate to use them requires following current literature.

Operating neural networks requires some systems engineering skill. It takes a long time to train a single network and you'll find yourself trying many different architectures and hyperparameters along the way. Because of this, you'll want to distribute the training across many different systems and be able to easily monitor and deploy jobs on those systems.

A solid grasp of mathematics is useful to effectively debug your networks. You'll frequently find your network doesn't converge or gives totally garbage results, so you need to know how to dig into the network internals and understand how everything works. This is especially true if you're implementing a new building block from a paper.

Finally, know your machine learning and statistics fundamentals. Understand overfitting, model capacity, cross validation, probability, model ensembles, information theory, and so on. Know when a simpler model is more appropriate.

[1] ftp://ftp.idsia.ch/pub/juergen/fki-207-95.ps.gz

[2] http://arxiv.org/abs/1409.1259

[3] http://arxiv.org/abs/1505.00387

[4] http://arxiv.org/abs/1502.03167

[5] http://arxiv.org/abs/1512.03385

pkaye10y ago

So you don't think some of these details will be automated away in the near future, so that it doesn't require a specialist to operate a neural network?

2 more replies

plafl10y ago

Beautiful. The next time someone asks what machine learning is about I'm going to send a link to this page.

nxzero10y ago

"Don’t Worry, You Can’t Break It. We Promise."

(Nice, but it's completely unclear what's going on.)

visarga10y ago

My MacBook Pro running El Capitan froze my mouse and keyboard. I had to do a hard reset.

nkozyra10y ago

Is a 50/50 training:test a normal default ratio for an ANN? I expected to see a higher amount of training data represented as the initial setting.
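For anyone who wants to compare ratios outside the playground, here is a minimal sketch of a shuffled train/test split. The function name `split_train_test` and its parameters are hypothetical, made up for illustration; nothing here is taken from the playground's source, and 50/50 is only the default reported in the comment above.

```python
import random


def split_train_test(points, train_fraction=0.5, seed=0):
    """Shuffle a copy of `points` and split it into (train, test) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(points)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


points = list(range(500))

# 50/50, the ratio the comment asks about.
train, test = split_train_test(points, train_fraction=0.5)
print(len(train), len(test))

# 80/20, a split that leaves more data for training.
train80, test80 = split_train_test(points, train_fraction=0.8)
print(len(train80), len(test80))
```

With a synthetic dataset that can be regenerated at will, a large test share costs little; with scarce real data, people usually give more of it to training.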

hyh104857610y ago

One of the finest data visualizations I've seen.

icelancer10y ago

This is so great. An easy way to show my friends WTF I do sometimes for math/CS work. Thank you so much.

imaginenore10y ago

I wish they had more interesting data sets.

HappyTypist10y ago

Submit a pull request.


erostrate10y ago

The swiss roll problem also illustrates nicely the idea behind deep learning.


conceit10y ago

needs another image link for visualization

amelius10y ago

But how will the number of neurons N grow with the number of turns in the spiral?

If N levels off, then the network has grasped the concept of a spiral and can generalize to arbitrary size.

If N doesn't level off, then the network isn't really learning the general case.

therein10y ago

I know this is going to sound cheesy but that's an amazing way to put it. It blew my mind.

chestervonwinch10y ago

Using their network, you are limited to 8 units per layer it seems.

So, I ported their swiss roll dataset to python and threw together a shallow network trainer with theano:

https://gist.github.com/notmatthancock/68d52af2e8cde7fbff1c9...

Then, I trained a shallow network with 36 hidden units (your deep net has 6 units and 6 layers):

http://i.imgur.com/I0pXaTK.png

edit: I forgot to mention that the shallow network above takes only the two coordinates (x1 and x2) as input features.
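To make the setup concrete, here is a rough sketch of a two-arm spiral dataset that uses only the two coordinates as features. It is not the code from the gist or from the playground; the function name `make_spiral`, the maximum radius of 5, the 1.75 turns and the noise model are all assumptions chosen for illustration.

```python
import math
import random


def make_spiral(n_points=200, turns=1.75, noise=0.0, seed=0):
    """Return rows of (x1, x2, label) for a two-arm spiral.

    label is +1 for one arm and -1 for the other. All constants are
    illustrative assumptions, not values taken from the playground.
    """
    rng = random.Random(seed)
    rows = []
    per_arm = n_points // 2
    # The second arm is the first one rotated by half a turn (phase = pi).
    for label, phase in ((1, 0.0), (-1, math.pi)):
        for i in range(per_arm):
            frac = i / per_arm                 # 0.0 at the centre, <1.0 at the rim
            radius = 5.0 * frac
            angle = turns * 2.0 * math.pi * frac + phase
            x1 = radius * math.sin(angle) + rng.uniform(-1, 1) * noise
            x2 = radius * math.cos(angle) + rng.uniform(-1, 1) * noise
            rows.append((x1, x2, label))
    return rows


data = make_spiral(n_points=200)
print(len(data), data[0], data[-1])
```

Raising `turns` gives a longer spiral, which is one way to probe how the number of units needed grows with the number of turns.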

espadrine10y ago

Just so I understand correctly: your network has 100000 iterations, while the parent's has 1000, but they both only use x / y positions?

1 more reply

tchow10y ago

How do you know to choose 6 hidden layers with 6 neurons each though? Why not 'x' hidden layers with 'j' neurons each? or some other random number?

Also how do you know to choose a ReLU instead of a tanh activation?

espadrine10y ago

ReLU gives good results for deep learning: http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.p....

6 layers is the maximum that this demonstration allows, and they kept j small-ish to show that you don't need that many to have good results.

rolandog10y ago

What I found interesting is that I couldn't get a proper fit with the same parameters you showed... however, I could 'speed up' the learning by regenerating the data during the learning process.

It may just be that 'batched cumulative learning' (I don't know if there is already a term for this) gets a better fit than just learning from a smaller set of data.

Edit: Did a quick test; regenerating about every 50 and 100 iterations, and convergence does seem faster (at least, when a clear spiral is formed). https://imgur.com/a/OPjXb

espadrine10y ago

Regenerating the data is kind of cheating; it is as if you were given twice the amount of data.

Here, you can generate more data for free, as we have the function we want to approximate. Having more data will often result in a better result and faster convergence.

raverbashing10y ago

This is a very good explanation, thanks (even though I knew some of it already)

I tried the swiss roll with a shallow network on the demo (and the results are not excellent, but it matches)

iopq10y ago

I can reproduce your deep example just fine, but the shallow result needs some luck. At the same time, the shallow result runs faster.

kriro10y ago

Along with the images that is a very awesome explanation.

eggy10y ago

  [1] http://natureofcode.com/book/chapter-10-neural-networks/
  [2] http://www.genetic-programming.com

argonaut10y ago

"Neural networks" are a really really overloaded term. A ton of stuff referred to as "neural networks" has little to do with the "neural networks" that are used in the machine learning community.

eggy10y ago

"A Computational Intelligence-Based Genetic Programming Approach for the Simulation of Soil Water Retention Curves"

I also use the term ANNs over just NNs to keep it to the silicon, and not wetware ;) Although, they did hook up a small ANN to a cockroach once, IIRC...

extrapickles10y ago

It has its niche applications. The only non machine vision application that comes to mind is one[1] that takes a pile of data, and evolves a model that fits it.

[1]: http://www.nutonian.com/products/eureqa/

1 more reply

ylem10y ago

Just curious then, how are people optimizing network topology?

3 more replies

wjnc10y ago

Any thoughts on why genetic programming is not 'in fashion'? Does it have anything to do with complexity of the calculations?

argonaut10y ago

DavidSJ10y ago

1 more reply

eggy10y ago

1 more reply

nabla910y ago

AgentME10y ago

>Topology and Weight Evolving Artificial Neural Networks

I brainstormed for a while about using genetic algorithms to decide the network topology. I'm glad someone else invented that already! Less work for me to do now.

matheweis10y ago

Of course, I wasn't up-to-speed enough to know the right terms to look for, so thanks for sharing. :)

I am curious though... it seems like it would take orders of magnitude more computing power to not only train but evolve and re-train the networks. Is this practical with today's hardware?

nl10y ago

This is great, but I think they should make it clear that this isn't using TensorFlow.

From the title and domain I thought they had either ported TF to JavaScript(!) or were connecting to a server.

sparky_10y ago

Wait - what is it using, then? I had assumed it was TF under Emscripten or similar.

nl10y ago

It appears to be a custom NN implementation[1] in Javascript, somewhat similar to convnet.js[2]

As far as I can see the API[3] isn't much like TensorFlow.

[1] https://github.com/tensorflow/playground

[2] http://cs.stanford.edu/people/karpathy/convnetjs/

[3] https://github.com/tensorflow/playground/blob/master/nn.ts

minimaxir10y ago

When it says "right here in your browser," it's not joking. On my desktop (Safari), the window becomes unresponsive after a few iterations. Does not happen in Chrome.

On my phone (Safari/iOS 9.3), the default neural network doesn't converge at all even after 300 iterations while it does on the desktop, which is legit weird: https://i.imgur.com/KNaXeHH.png

shancarterOP10y ago

I'm sorry you're having problems with Safari. I can't reproduce on my end, but if you're still having problems you can raise an issue on github with some information about your system.

davidgl10y ago

Works perfectly for me on Safari 9.1 with no extensions

superobserver10y ago

Working splendidly on ChromeOS, FWIW.

nefitty10y ago

Yeah, it's working great in Chrome on my Galaxy Tab 3!

koder201610y ago

To be honest, if it works in Chrome then it covers > 90% of people who would possibly be interested.

danielvf10y ago

In case you are an idiot like me, you have to train your neural network by pressing "play".

andrewstuart210y ago

"Okay, I don't understand. Why is my output so terrible?"

I saw the play button very clearly when the page loaded, then promptly got distracted by all the dials and knobs. :-P

shancarterOP10y ago

We would've liked to have it constantly training, but didn't want to abuse your CPU :)

dingo_bat10y ago

It pauses when I switch tabs :(

gojomo10y ago

While it doesn't involve training, these 'confusion matrix' animations of NNs classifying images or digits are fun, too:

http://ml4a.github.io/dev/demos/cifar_confusion.html http://ml4a.github.io/dev/demos/mnist_confusion.html

Something about the high-speed updating makes me think of WOPR, in 'War Games', scoring nuclear-war scenarios.

timroy10y ago

This demonstration goes really well with Michael Nielsen's http://neuralnetworksanddeeplearning.com/. At the bottom of the page the author gives a shout out to Nielsen, Bengio, and others.

For someone (like me) who's done a bit of reading but not much implementation, this playground is fantastic!

seansmccullough10y ago

Really awesome article!

CGamesPlay10y ago

8note10y ago

http://imgur.com/ypBQEWx

Add some noise, and use all the inputs, and one 8 wide hidden layer

edit: works better with a sigmoid activation curve, but it converges more slowly

andrewtbham10y ago

Yeh you're on the right track. Nice pattern emerges on this after 160 iterations.

http://playground.tensorflow.org/#activation=tanh&batchSize=...

1 more reply

rmellow10y ago

Using sin, cos, x1, x2 with 1 six-neuron hidden layer does the trick quickly: http://imgur.com/UMv5gsH

No need to mess with noise or regularization :)

makeset10y ago

> Add some noise

1 more reply

cglace10y ago

Using all inputs and 6 layers of varying sizes. After about 500 iterations. http://i.imgur.com/x1MOpvl.jpg

1 more reply

Obi_Juan_Kenobi10y ago

Using the defaults, I had success at about 300 iterations with all the inputs and 5 hidden layers, each with a decreasing number of neurons (i.e. 6,5,4,3,2).

I don't know if that's a general feature to need fewer neurons with each layer, but that seems to work here.
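One way to compare shapes like this is to count trainable parameters. The sketch below assumes a plain fully connected network with one bias per neuron and a single output neuron, and it assumes "all the inputs" means seven input features; `count_parameters` is a hypothetical helper, not part of the playground.

```python
def count_parameters(n_inputs, hidden_layers, n_outputs=1):
    """Count weights and biases of a fully connected network.

    Assumes every layer is dense and every non-input neuron has one bias.
    """
    sizes = [n_inputs] + list(hidden_layers) + [n_outputs]
    # One weight per connection between consecutive layers.
    weights = sum(a * b for a, b in zip(sizes, sizes[1:]))
    # One bias per neuron in every layer except the input layer.
    biases = sum(sizes[1:])
    return weights + biases


tapered = count_parameters(7, [6, 5, 4, 3, 2])
uniform = count_parameters(7, [6, 6, 6, 6, 6])
print(tapered, uniform)  # 133 223
```

Under these assumptions the tapered 6,5,4,3,2 stack has noticeably fewer parameters than five layers of 6, so it is a smaller model rather than just a differently shaped one.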

chestervonwinch10y ago

What were the optimization algorithms you had most success with? Were they more successful in the sense of better out-of-sample error rate or in the sense of quicker convergence (or something else)?

_AllisonMobley10y ago

Can somebody explain what I'm watching when I press play?

terda1210y ago

Hopefully this helps (correct me if I'm wrong, I'm still learning about neural nets):

Think of the whole neural net as a function:

input * weight = output

At each iteration, we feed in the input to the neural net. Then the neural net compares what output it gets to the correct output.

ryanmonroe10y ago

2 more replies

e1929300110y ago

If I hadn't taken this course[1], I wouldn't understand what you are talking about.

[1] - https://www.coursera.org/learn/machine-learning

shoemai10y ago

Super basic explanation:

When you hit play, it iterates through the training data making adjustments to each neuron in the network so that it gets closer to predicting the right class.

If you want to see how well the neural network performs on data it wasn't trained on, you can click "show test data".

isuckatcoding10y ago

Obi_Juan_Kenobi10y ago

I think it's a playground in the best sense of the term. Take some time and actually play with it, and a lot of fun stuff happens, lightbulbs go off, etc.

If you're expecting a lesson, you'll likely be disappointed, but I think there's real value in a true playground.

I think the biggest improvement would be if, when hovering over a 'neuron', you get a visual representation of what feeds into it.

1 more reply

andrewtbham10y ago

The dots are the training data. The orange and blue background is the network's estimate of how it would classify a new dot at each position.

shmageggy10y ago

It begins training the network using the backpropagation algorithm.

> Next, the network is asked to solve a problem, which it attempts to do over and over, each time strengthening the connections that lead to success and diminishing those that lead to failure.

On each iteration, it calculates how bad the predicted output is, then adjusts the weights between neurons to lessen that value. Google backpropagation for more info.
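The loop described above can be sketched with a single tanh neuron, which is the smallest case where "measure the error, then nudge the weights" is visible. This is only an illustration under stated assumptions: the two-cluster data, the squared-error loss, the learning rate and the names `make_clusters`, `mean_loss` and `train` are all made up here and are not the playground's implementation.

```python
import math
import random


def make_clusters(n_per_class=50, seed=0):
    """Two Gaussian blobs: label +1 around (2, 2), label -1 around (-2, -2)."""
    rng = random.Random(seed)
    rows = []
    for label, center in ((1.0, 2.0), (-1.0, -2.0)):
        for _ in range(n_per_class):
            rows.append((rng.gauss(center, 0.5), rng.gauss(center, 0.5), label))
    return rows


def mean_loss(rows, w1, w2, b):
    """Average of 0.5 * (prediction - target)^2 over the dataset."""
    total = sum(0.5 * (math.tanh(w1 * x1 + w2 * x2 + b) - y) ** 2
                for x1, x2, y in rows)
    return total / len(rows)


def train(rows, steps=100, learning_rate=0.1):
    """Full-batch gradient descent on one tanh neuron, starting from zero."""
    w1 = w2 = b = 0.0
    history = [mean_loss(rows, w1, w2, b)]
    n = len(rows)
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for x1, x2, y in rows:
            out = math.tanh(w1 * x1 + w2 * x2 + b)
            # Chain rule: d(loss)/d(pre-activation) = (out - y) * tanh'(z),
            # and tanh'(z) = 1 - tanh(z)^2.
            delta = (out - y) * (1.0 - out * out)
            g1 += delta * x1
            g2 += delta * x2
            gb += delta
        # Step each parameter against its average gradient.
        w1 -= learning_rate * g1 / n
        w2 -= learning_rate * g2 / n
        b -= learning_rate * gb / n
        history.append(mean_loss(rows, w1, w2, b))
    return (w1, w2, b), history


rows = make_clusters()
(w1, w2, b), history = train(rows)
print("loss before:", history[0], "loss after:", history[-1])
```

Backpropagation in a deeper network applies the same chain rule layer by layer, passing each layer's `delta` back to the one before it.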

Mahn10y ago

shoemai10y ago

mholt10y ago

Neural networks have lots of little knobs and levers you can adjust. That's what all these inputs are that you see.

karpathy10y ago

okigan10y ago

In this case you see that it is the swiss roll so you could say pick "proper initialization".

But that technique would not work when you cannot see that it is a "swiss roll" or in multiple dimensions.

brianchu10y ago

danblick10y ago

Has anyone been able to learn a function for the spiral (Swiss roll) data that's as good as a human-designed function would be?

asab10y ago

http://playground.tensorflow.org/#activation=tanh&batchSize=...

asab10y ago

Update: after playing with this for way too long, I've found that it can converge to a spiral with 3 or 2 or even just 1 node in the 2nd hidden layer.

http://playground.tensorflow.org/#activation=tanh&batchSize=...

trampi10y ago

0.007 http://playground.tensorflow.org/#activation=tanh&batchSize=...

moconnor10y ago

For this simple example just choosing the largest possible fully-connected network with ReLU and L2 regularization to prevent overfit quickly converges to a nice spiral (test loss of 0.001 for me):

http://playground.tensorflow.org/#activation=relu&regulariza...
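For readers wondering what the L2 setting does, a common formulation adds a penalty proportional to the squared weights, which shrinks every weight a little on each update. The sketch below uses that textbook form; the 0.5 factor and the names `l2_penalty` and `sgd_step_with_l2` are assumptions, and the thread does not say how the playground scales its penalty.

```python
def l2_penalty(weights, rate):
    """Textbook L2 penalty: 0.5 * rate * sum of squared weights."""
    return 0.5 * rate * sum(w * w for w in weights)


def sgd_step_with_l2(weights, grads, learning_rate, rate):
    """One gradient step on (data loss + L2 penalty).

    The gradient of the penalty with respect to each weight is rate * w,
    so large weights are pulled toward zero more strongly than small ones.
    """
    return [w - learning_rate * (g + rate * w) for w, g in zip(weights, grads)]


weights = [1.0, -2.0]
# With a zero data gradient, only the shrinkage from the penalty remains.
shrunk = sgd_step_with_l2(weights, [0.0, 0.0], learning_rate=0.1, rate=0.5)
print(shrunk, l2_penalty(weights, 0.5))
```

Keeping weights small limits how sharply the decision boundary can bend, which is the sense in which it helps against overfitting.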

findthewords10y ago

I wouldn't call it quick... spiral in 150 iterations, with sigmoid magic: http://playground.tensorflow.org/#activation=sigmoid&regular...

I find the pulsating unsightly.

amelius10y ago

Neat. How would the number of neurons N scale with the size of the spiral? (Size=number of turns)

Will N level off, meaning that it will really understand the structure of the spiral?

scotty7910y ago

For me the key was to use x, y and either sin x, sin y or x squared, y squared as inputs, with 5 or 6 neurons in the hidden layer.
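Hand-picked inputs like these are just fixed functions of the two coordinates, computed before the network sees anything. A small sketch, assuming the seven input features the playground offers (x1, x2, their squares, their product and the two sines); `expand_features` is a hypothetical helper name.

```python
import math


def expand_features(x1, x2):
    """Map a 2-D point to a dict of hand-engineered input features."""
    return {
        "x1": x1,
        "x2": x2,
        "x1^2": x1 * x1,
        "x2^2": x2 * x2,
        "x1*x2": x1 * x2,
        "sin(x1)": math.sin(x1),
        "sin(x2)": math.sin(x2),
    }


features = expand_features(0.0, 2.0)
# Pick only the subset mentioned above: raw coordinates plus the sines.
chosen = [features[name] for name in ("x1", "x2", "sin(x1)", "sin(x2)")]
print(chosen)
```

The network then only has to combine these columns, which is why a single small hidden layer can be enough when the right features are switched on.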

chronolitus10y ago

Add both sin(X1) and sin(X2) as inputs.

hyh104857610y ago

Do you consider test loss around 0.04 good?

halotrope10y ago

You could totally optimise network architecture by crowdsourcing topology discovery for different problems into a multiplayer game with loss as a score.

Your_Creator10y ago

So glad ANNs are becoming mainstream.

Eventually they will have to be recognized as a new species of life, so I hope programmers, tinkerers and everyone else keep that in mind, because all life must be respected.

aab010y ago

This is a lot of fun. The default dataset is too easy, though, try out the Swiss Roll one!

minimaxir10y ago

There is a reason why sin(X) is an input property. :p
