A B C
1 4 3
20 45 15
8 15 7
And so on for some arbitrary number of rows. You can look at the table all you want, but you will not perceive "A+C=B". It's just not written there. To get A+C=B you have to generate something else in addition to the table, namely a hypothesis, but this is a creative act, not an empirical one.
Here's a quick gist[0] doing it using least-squares, and learning it exactly (also, for B in the third row you may have intended 35 instead of 45?).
This simple regression model learns exactly the weights (-1, 1)—equivalently, it learns -A + B = C.
------
[0] https://gist.github.com/guillean/6f3ff05fa99b2b377fcf309bdc4...
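For readers who can't open the link, here is a minimal sketch of what such a least-squares fit could look like. It is not the code from the gist, which isn't reproduced here; the function name `fit_c_from_a_b` is made up for illustration, and it assumes the table row "20 45 15" was meant to read "20 35 15", as suggested above. It solves the 2x2 normal equations with exact fractions, so "learning it exactly" is literal rather than up to floating-point error.

```python
from fractions import Fraction

# Rows of (A, B, C) from the table, ASSUMING B=35 in the second data row
# (the table as posted has 45, which breaks A + C = B for that row).
ROWS = [(1, 4, 3), (20, 35, 15), (8, 15, 7)]


def fit_c_from_a_b(rows):
    """Least-squares weights (w_a, w_b) for C ~ w_a*A + w_b*B, no intercept.

    Solves the normal equations
        [sum(A*A) sum(A*B)] [w_a]   [sum(A*C)]
        [sum(A*B) sum(B*B)] [w_b] = [sum(B*C)]
    with Cramer's rule, using exact rational arithmetic.
    """
    saa = sum(Fraction(a * a) for a, b, c in rows)
    sab = sum(Fraction(a * b) for a, b, c in rows)
    sbb = sum(Fraction(b * b) for a, b, c in rows)
    sac = sum(Fraction(a * c) for a, b, c in rows)
    sbc = sum(Fraction(b * c) for a, b, c in rows)

    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("A and B columns are collinear; weights are not unique")

    w_a = (sac * sbb - sab * sbc) / det
    w_b = (saa * sbc - sab * sac) / det
    return w_a, w_b


weights = fit_c_from_a_b(ROWS)
print(f"w_a = {weights[0]}, w_b = {weights[1]}")
```

On the corrected rows this recovers the weights -1 and 1, i.e. -A + B = C. Note that the hypothesis class (a linear combination of A and B with no intercept) is chosen by whoever writes the code, which is the point being argued in the parent comment.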
Sure, a human can clean the data and put it into a format that gives meaningful results. But if we're just talking about an AI learning from raw data with no supervision, where does it even start?
I am aware that AlphaGo Zero came up with various strategic abstractions of the game that are recognized by competent players, and some novel ones, but I do not know where this program and its self-play training stand in the dichotomy of this debate.
There's related research on how children learn language: namely, how much observed evidence (based on cases where every word a child has heard in their life was monitored and counted) a child needs before switching from a "lookup table" approach for certain features to a "rule-based" approach, and then to a correct "rule+exceptions" understanding. The switch to rules is detectable through overregularization, where the child applies a systematic rule even when the actual language, including examples the child has heard, has an exception to it. The experiments point towards a rule being learned if and only if a "compressed representation" is beneficial from an information-theoretic point of view.
*Like, this table is on a computer connected to a global network, which is based on electromagnetism, which is intimately related to relativity. The more you understand what the table is, the more certain you can be of E=mc^2.
Now consider the model with kinetic energy plus rest energy. A network unaware of time will be unable to figure it out, especially the differential in velocity.
What you need to actually devise such laws is generalizing conflict-driven clause learning with some good rule to pick models, name them and enumerate them. E.g. defining a minimum generalizing set of logic clauses, with support for undecidable and uncomputable functions (which means deciding when to give up). This is essentially the inverse of a MAX-SAT solver. A minimax logic representation, so to speak.
Which doesn't make your point wrong of course, and this is for a simple function.