
0 points · aabaker99 · 4y ago · 0 comments

1. Gradient descent almost always finds a non-optimal local minimum (it is not guaranteed to find a global minimum).
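
A minimal sketch of the point above, assuming a hand-picked one-dimensional function (the function `f`, the learning rate, and the starting points are illustrative choices, not anything from the original post). Plain gradient descent on a non-convex function settles into whichever minimum its starting point leads to, which need not be the global one:

```python
# Illustrative non-convex function with two minima (an assumption for this sketch).
def f(x):
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Derivative of f.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=5000):
    """Run plain gradient descent from x0 and return the final point."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_right = gradient_descent(2.0)   # starts right of the hump
x_left = gradient_descent(-2.0)   # starts left of the hump

print("from  2.0:", x_right, f(x_right))
print("from -2.0:", x_left, f(x_left))
```

Both runs stop where the gradient vanishes, but the run started at 2.0 ends at a minimum with a higher value of `f` than the run started at -2.0. Nothing in the update rule tells it that a lower point exists elsewhere.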

agnosticmantis · 4y ago

Isn’t the current best practice to train highly over-parametrized models to zero training error? That’d be a global optimum, no?

Unless we’re talking about the optimum of test error.

aabaker99 (OP) · 4y ago

If you find a zero of a non-negative function, I would call that a global minimum, yes.
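
To make the zero-training-error case concrete, here is a rough sketch assuming a tiny linear model with more parameters (5) than training points (3). The data, learning rate, and step count are made up for illustration. The squared-error loss is non-negative, so driving it to (numerically) zero means a global minimum of the training loss has been reached:

```python
# Toy over-parametrized setup: 3 training points, 5 weights (hypothetical data).
X = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 1],
]
y = [1.0, -2.0, 0.5]


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w):
    # Half the sum of squared residuals; never negative.
    return 0.5 * sum((predict(w, x) - t) ** 2 for x, t in zip(X, y))


w = [0.0] * 5
lr = 0.1
for _ in range(500):
    residuals = [predict(w, x) - t for x, t in zip(X, y)]
    # Gradient of the loss with respect to each weight.
    grad = [sum(r * x[j] for r, x in zip(residuals, X)) for j in range(5)]
    w = [wj - lr * gj for wj, gj in zip(w, grad)]

final_loss = loss(w)
print("training loss:", final_loss)
```

This only says something about the training loss. Whether the same weights do well on held-out data is a separate question.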

aledalgrande · 4y ago

Yeah, but depending on the data you might get even worse results. Selecting the right subset so that it is representative is really important.

jhgb · 4y ago

Would a random sample be representative? Statistically this seems to be the case for any large N. In fact it's not clear to me that any other sample would be more representative.

aledalgrande · 4y ago

Many public datasets have skewed classes, so if you sample at random you're not gonna get a good result. And N might not be big enough anyway.
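
One common way to handle skewed classes is stratified sampling, sketched below. It is not necessarily what the commenter had in mind, and the helper `stratified_sample` plus the 990/10 class split are hypothetical. Each class is sampled in proportion to its size, with at least one item per class, so a rare class cannot vanish from the subset the way it can in a small plain random sample:

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, k, seed=0):
    """Return roughly k indices, drawn per class in proportion to class size.

    Every class contributes at least one index, so the result can be
    slightly larger than k.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    picked = []
    for idxs in by_class.values():
        n = max(1, round(k * len(idxs) / len(labels)))
        picked.extend(rng.sample(idxs, min(n, len(idxs))))
    return picked


# Hypothetical skewed dataset: 990 items of class 0, 10 of class 1.
labels = [0] * 990 + [1] * 10

plain = random.Random(0).sample(range(len(labels)), 50)
strat = stratified_sample(labels, 50)

print("plain random:", Counter(labels[i] for i in plain))
print("stratified:  ", Counter(labels[i] for i in strat))
```

The plain random sample may or may not contain any class-1 items, depending on the draw. The stratified one always does.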
