Yes, and I would like to know how similar the dataset(s) were. Suppose the models were trained only on greedy algorithms and I then provided a dynamic programming problem in the test set: (how) would the model solve it?
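To make that gap concrete, here is a toy, purely illustrative example (the coin denominations and amounts are my own hypothetical choice, not anything from the models' actual data): a coin-change instance where the greedy strategy gives a suboptimal answer and dynamic programming is needed. If the training distribution only ever rewarded the greedy pattern, it is not obvious why the DP pattern would fall out of it.

```python
# Illustrative only: a classic case where greedy fails and DP is required.
# Denominations {1, 3, 4} and amount 6 are hypothetical, chosen to expose the gap.

def greedy_coin_count(coins, amount):
    # Always take the largest coin that still fits.
    count = 0
    for c in sorted(coins, reverse=True):
        take = amount // c
        count += take
        amount -= take * c
    return count if amount == 0 else None

def dp_coin_count(coins, amount):
    # Minimum number of coins via dynamic programming over sub-amounts.
    INF = float("inf")
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return best[amount] if best[amount] != INF else None

coins = [1, 3, 4]
print(greedy_coin_count(coins, 6))  # 3 coins (4+1+1) -- suboptimal
print(dp_coin_count(coins, 6))      # 2 coins (3+3)   -- optimal
```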
> And yet, many humans who participate in these contests are unable to do so (although I guess the issue here is that Github is not properly indexed and searchable for humans?).
Indeed, so we don't know what "difficult" means for <human+indexed Github>, and hence we cannot compare it to <model trained on Github>.
My point is, whenever I see a new achievement of deep learning, I have no frame of reference (apart from my personal biases) for how "trivial" or "awesome" it is. I would like to have a quantity that measures this - I call it generalization difficulty.
Otherwise the datasets and models just keep getting larger, and we still have no idea what these models are actually capable of.
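Just to sketch what I mean by a "quantity" (this is a naive stand-in, not a proposal for how such a measure should actually be defined): one crude proxy would be how far each test item sits from its nearest training item in some embedding space. The embedding function and the assumption that distance tracks "difficulty" are both things I'm making up for illustration.

```python
# Toy proxy for "generalization difficulty": mean nearest-neighbour distance
# from test items to the training set in an assumed embedding space.
import numpy as np

def generalization_difficulty(train_emb: np.ndarray, test_emb: np.ndarray) -> float:
    # For each test point, distance to its closest training point; average over the test set.
    dists = np.linalg.norm(test_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 32))            # stand-in for embedded training problems
in_dist = rng.normal(size=(100, 32))           # test set drawn from the same distribution
shifted = rng.normal(loc=3.0, size=(100, 32))  # test set shifted away from the training data

print(generalization_difficulty(train, in_dist))   # smaller: "easy" generalization
print(generalization_difficulty(train, shifted))   # larger: "harder" generalization
```

Obviously a real measure would need to account for what the model can interpolate versus what it must genuinely extrapolate, but even a crude number like this would be better than comparing benchmark scores with no sense of train/test overlap.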