>Any errors the NN is making are not worth learning about.
But that's the whole point of this method: to understand what errors the NN might be making. It's also quite possible the NN's errors aren't really errors at all, if there are mistakes or noise in the labels.
This technique has been called "dark knowledge" and is really interesting. See http://www.kdnuggets.com/2015/05/dark-knowledge-neural-netwo... They train much simpler models to reach the same accuracy as much bigger models, just by training the small model to match the bigger model's predicted probabilities on the same data. In fact you can get crazy results like this:
>When they omitted all examples of the digit 3 during the transfer training, the distilled net gets 98.6% of the test 3s correct even though 3 is a mythical digit it has never seen.
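The core trick behind results like that is training the student on the teacher's *soft* outputs rather than hard labels. A minimal sketch of that idea, assuming the temperature-scaled softmax and the T^2 loss scaling described in the distillation paper (function names and the example logits here are mine, for illustration):

```python
import math

def softmax_with_temperature(logits, T=1.0):
    # Higher T flattens the distribution, exposing the teacher's
    # "dark knowledge": how plausible each *wrong* class looks.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, T=4.0):
    # Cross-entropy between the teacher's softened distribution and
    # the student's softened distribution, scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    p = softmax_with_temperature(teacher_logits, T)
    q = softmax_with_temperature(student_logits, T)
    return -T * T * sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Hypothetical example: the teacher is confident in class 0 but gives
# class 2 more mass than class 1. That ranking among the "wrong"
# classes is exactly the signal the hard label throws away, and it is
# why a student can learn about a class it never sees directly.
teacher = [10.0, 1.0, 4.0]
student = [8.0, 2.0, 3.0]
soft_targets = softmax_with_temperature(teacher, T=4.0)
loss = distillation_loss(teacher, student, T=4.0)
```

With T=1 the teacher's output is nearly one-hot; raising T spreads probability onto the similar-looking classes, which is where the information about "what a 3 resembles" lives even when no 3s appear in the transfer set.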