For extraordinary claims ('intelligence'), the burden of proof is on those making the claim, not on others to prove the negative.
The sine function is nonlinear.
You sprinkle in a tiny nonlinearity (e.g. x^2 instead of x) and suddenly you can get infinite complexity by weighted composition - which is also the reason why we're so helpless with nonlinear functions and immediately reach for linear approximations (cf. gradient descent).
y = b + x1 + x1^2 + x1^3 + ... + x1 * x2 + (x1 * x2)^2 + ... + x2 + x2^2 + ...
By that point you're effectively making a Taylor approximation of the latent function in a linear feature space, which is also a universal approximator.
So the commenter above is wrong -- neural networks are indeed just glorified linear regression from this point of view.
The main difference is that this kitchen-sink regression is computationally inefficient, whereas neural nets are extremely efficient computationally.
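The "glorified linear regression" point above can be made concrete: expand the input into polynomial features, then fit with ordinary least squares. A minimal numpy sketch (the target function, degree, and grid are illustrative choices, not from the thread):

```python
import numpy as np

# "Kitchen sink" regression: expand x into polynomial features,
# then fit the nonlinear latent function (here sin) with plain
# *linear* least squares over those features.
x = np.linspace(-np.pi, np.pi, 200)
y = np.sin(x)  # the nonlinear latent function

degree = 9
X = np.vander(x, degree + 1)  # columns x^9, x^8, ..., x, 1
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coef
max_err = np.max(np.abs(pred - y))
print(max_err)  # a degree-9 polynomial tracks sin closely on [-pi, pi]
```

In one dimension this works fine; the inefficiency shows up when the input has many dimensions and the cross terms (x1 * x2, etc.) multiply combinatorially.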
I'm not an expert, but the motivation seems more like this:
- Linear regression and SVM sometimes work. But they apply to very few problems.
- We can fit those models using gradient descent. Alternatives to gradient descent do exist, but they become less useful as the above models get varied and generalised.
- Empirically, if we compose with some simple non-linearities, we get very good results on otherwise seemingly intractable problems like OCR. See kernel SVMs and Kriging.
- Initially, one might choose this non-linearity from a known list. And then fit the model using specialised optimisation algorithms. But gradient descent still works fine.
- To further improve results, the choice of non-linearity must itself be optimised. Call the non-linearity F. We break F into three parts: F' o L o F'', where L is linear, and F' and F'' are "simpler" non-linearities. We recursively factorise the F' and F'' in a similar way. Eventually, we get a deep feedforward neural network. We cannot use fancy algorithms to fit such a model anymore.
- Somehow, gradient descent, despite being a very generic optimisation algorithm, works much better than expected at successfully fitting the above model. We have derived Deep Learning.
Deep learning has been a series of *engineering* successes stacked on top of each other rather than theory being applied rigorously.
It's hard to scale training of "dumb" approximators like a kitchen sink regression, and controlling overfitting becomes a nightmare.
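The scaling problem is easy to quantify: the number of monomials of total degree <= d in n variables is C(n + d, d), so the kitchen-sink feature count blows up combinatorially while a neural layer's parameter count grows only polynomially. A quick stdlib check (the specific n, d values are just examples):

```python
from math import comb

# Feature count for polynomial regression of degree <= d in n variables:
# the number of monomials is C(n + d, d).
for n, d in [(2, 3), (10, 3), (100, 3), (100, 5)]:
    print(f"n={n:>3} d={d}: {comb(n + d, d):>12,} features")

# 100 inputs at degree 3 already needs 176,851 features, and degree 5
# needs 96,560,646 -- while a single 32-unit tanh layer over the same
# 100 inputs has only 100*32 + 32 weights to fit.
```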
(Edit: this whole thread feels like a setup for this punchline)