maxander 8y ago

The article is making a simpler point than that. If I show you the table:

And so on for some arbitrary number of rows, you can look at the table all you want but you will not perceive "A+C=B". It's just not written there. To get A+C=B you have to generate something else in addition to the table, namely a hypothesis. But generating a hypothesis is a creative act, not an empirical one.


LolWolf 8y ago

In this case, any linear model with a reasonable, minimizable loss (a convex one, say) will learn the correct relationship.

Here's a quick gist [0] that does it with least squares and learns it exactly. (Also, for B in the third row, did you perhaps intend 35 instead of 45?)

This simple regression model learns the weights (-1, 1) exactly; equivalently, it learns -A + B = C.

------

[0] https://gist.github.com/guillean/6f3ff05fa99b2b377fcf309bdc4...
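
For readers who cannot open the gist, here is a minimal sketch of the same idea in plain Python. It is not the linked gist's code. The original table is not reproduced in this thread, so the rows below are made-up values that satisfy A + C = B, and the function name fit_least_squares is hypothetical.

```python
# Hypothetical data: (A, B) pairs invented for illustration,
# with C filled in so that A + C = B holds in every row.
rows = [(10, 25), (20, 35), (7, 19), (3, 4)]
data = [(a, b, b - a) for a, b in rows]  # (A, B, C)

def fit_least_squares(data):
    """Fit C ~ w_a*A + w_b*B (no intercept) via the 2x2 normal equations."""
    s_aa = sum(a * a for a, b, c in data)
    s_ab = sum(a * b for a, b, c in data)
    s_bb = sum(b * b for a, b, c in data)
    s_ac = sum(a * c for a, b, c in data)
    s_bc = sum(b * c for a, b, c in data)
    det = s_aa * s_bb - s_ab * s_ab
    if det == 0:
        raise ValueError("A and B are collinear; the weights are not unique")
    # Cramer's rule on [[s_aa, s_ab], [s_ab, s_bb]] @ [w_a, w_b] = [s_ac, s_bc]
    w_a = (s_ac * s_bb - s_ab * s_bc) / det
    w_b = (s_aa * s_bc - s_ab * s_ac) / det
    return w_a, w_b

w_a, w_b = fit_least_squares(data)
print(w_a, w_b)
```

On noise-free rows like these the fit recovers the weights (-1, 1); a row that breaks the pattern would pull the weights away from those values.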

tlb 8y ago

If you connect A and B as the input to a linear neural net, and train against C, it'll very quickly arrive at weights of [-1, +1] and be able to correctly predict C given A and B. Whether or not it represents it in notation humans are familiar with, it has learned it for the practical purpose of being able to compute the function.
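
A rough sketch of that training loop, assuming a single linear unit with two weights, no bias, mean squared error, and plain gradient descent. The data, learning rate, and step count are illustrative choices, not values from the thread.

```python
# Hypothetical training rows (A, B, C) with C = B - A; values are made up.
rows = [(1.0, 2.0), (2.0, 1.0), (0.5, 3.0), (3.0, 0.5)]
data = [(a, b, b - a) for a, b in rows]

def train_linear_unit(data, lr=0.05, steps=500):
    """Gradient descent on mean squared error for C ~ w_a*A + w_b*B."""
    w_a, w_b = 0.0, 0.0  # start from zero weights
    n = len(data)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for a, b, c in data:
            err = w_a * a + w_b * b - c  # prediction error for this row
            grad_a += 2.0 * err * a / n
            grad_b += 2.0 * err * b / n
        w_a -= lr * grad_a
        w_b -= lr * grad_b
    return w_a, w_b

w_a, w_b = train_linear_unit(data)
print(round(w_a, 3), round(w_b, 3))
```

The learning rate has to suit the scale of the inputs; with much larger values of A and B the same setting would diverge unless the data were rescaled first.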

goatlover 8y ago

But how would a neural network know to connect data for mass with data for the measured speed of light? Why would a neural network be looking for an equation for energy conversion in the first place? If you just provide tons of raw data from instruments, what does that mean? What do you do with it?

Sure, a human can clean the data and put it into a format that gives meaningful results. But if we're just talking about an AI learning from raw data with no supervision, where does it even start?

mannykannot 8y ago

As someone who finds this interesting but does not know enough to take a position, I think a bigger question would be how it comes up with the abstract concept of energy.

I am aware that AlphaGo Zero came up with various strategic abstractions of the game that are recognized by competent players, and some novel ones, but I do not know where this program and its self-play training stand in the dichotomy of this debate.


PeterisP 8y ago

Given a sufficiently large table, an agent can certainly perceive "A+C=B" without acting (testing a hypothesis), through the table's compressibility: the most compact (least complex) representation of the data replaces the many individual data points in one column with a learned rule for calculating them.

There's related research on how children learn language, namely on how much observed evidence (based on cases where every word a child has heard in their life was monitored and counted) a child needs before switching from a "lookup table" approach for certain features to a "rule-based" approach, and then to a correct "rule + exceptions" understanding. The switch to rules is detectable through overregularization: applying a systematic rule even where the actual language, including examples the child has heard, has an exception to it. The experiments point towards a rule being learned if and only if a "compressed representation" is beneficial from an information-theoretic point of view.
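
As a toy illustration of the compression argument, the sketch below compares two ways of storing a table: all three columns verbatim, or two columns plus a rule plus a list of exceptions. Counting stored values is a deliberately crude stand-in for description length, and RULE_COST, make_table, and the table contents are all assumptions made for this example.

```python
RULE_COST = 3  # assumed flat cost of writing down a rule such as "C = B - A"

def description_length(table, rule=None):
    """Count stored values, with or without a rule for column C."""
    n = len(table)
    if rule is None:
        return 3 * n  # store A, B and C for every row
    exceptions = sum(1 for a, b, c in table if rule(a, b) != c)
    # Store A and B, the rule once, and (row index, value) per exception.
    return 2 * n + RULE_COST + 2 * exceptions

def rule(a, b):
    return b - a

def make_table(n):
    """Made-up rows (A, B, C) that satisfy A + C = B."""
    return [(a, 2 * a + 5, a + 5) for a in range(n)]

big = make_table(100)
big[50] = (50, 105, 99)  # one row that breaks the rule
small = make_table(2)

for name, table in (("big", big), ("small", small)):
    print(name, description_length(table), description_length(table, rule))
```

Under these assumptions the rule pays for itself on the 100-row table even with one exception (205 stored values against 300), but not on the 2-row table (7 against 6), which mirrors the claim that a rule is adopted only once it compresses the data.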

perl4ever 8y ago

It seems to me that A+C=B and E=mc^2 are both written in, or consistent with, that table. But widening the context produces something different in each from a probabilistic perspective. A+C=B is 100% certain with narrow context, but falsified with a little more context, while E=mc^2 is less certain with narrow context, but increasingly certain with wider context.*

*Like, this table is on a computer connected to a global network, which is based on electromagnetism, which is intimately related to relativity. The more you understand what the table is, the more certain you can be of E=mc^2.

AstralStorm 8y ago

Yes, but this is just total energy. Try kinetic energy instead. (Even Newton's E_k = 0.5 * mv^2, let alone Lorentz's special relativity or general relativity.)

Now try the model with kinetic energy plus rest energy. A network unaware of time will be unable to figure it out, especially the differential in velocity.

What you need to actually devise such laws is a generalization of conflict-driven clause learning, with some good rule to pick models, name them and enumerate them. E.g. defining a minimum generalizing set of logic clauses with support for undecidable and uncomputable functions (which means deciding when to give up). This is essentially the inverse of a MAX-SAT solver. A minimax logic representation, so to speak.

irishsultan 8y ago

Well, that hypothesis would be wrong in this particular case (check the middle row).

Which doesn't make your point wrong of course, and this is for a simple function.
