aabaker99
4y ago
1. Gradient descent almost always finds a non-optimal local minimum (it is not guaranteed to find a global minimum).
agnosticmantis
4y ago
Isn’t the current best practice to train highly over-parameterized models to zero training error? That’d be a global optimum, no?
Unless we’re talking about the optimum of the test error.
aabaker99
OP
4y ago
If you find a zero of a non-negative function, I would call that a global minimum, yes.
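To make that concrete, here is a minimal sketch (not anything from the thread itself): plain gradient descent on a tiny over-parameterized linear model with 5 weights and 3 training points. The data, learning rate, and function names are all made up for illustration. Mean squared error is non-negative, so driving it to (numerically) zero means the training loss is at a global minimum.

```python
import random

def mse_loss(w, X, y):
    """Mean squared error of the linear model w on (X, y); always >= 0."""
    n = len(X)
    return sum(
        (sum(wi * xi for wi, xi in zip(w, row)) - t) ** 2
        for row, t in zip(X, y)
    ) / n

def gradient(w, X, y):
    """Gradient of mse_loss with respect to w."""
    n = len(X)
    g = [0.0] * len(w)
    for row, t in zip(X, y):
        residual = sum(wi * xi for wi, xi in zip(w, row)) - t
        for j, xj in enumerate(row):
            g[j] += 2.0 * residual * xj / n
    return g

def train(X, y, steps=2000, lr=0.05, seed=0):
    """Plain gradient descent from a random starting point."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(len(X[0]))]
    for _ in range(steps):
        g = gradient(w, X, y)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

# Hypothetical toy data: 3 examples, 5 features (more parameters than data).
X = [
    [1.0, 0.0, 2.0, -1.0, 0.5],
    [0.0, 1.0, -1.0, 2.0, 1.5],
    [1.0, 1.0, 0.0, 0.5, -1.0],
]
y = [1.0, -2.0, 0.5]

w = train(X, y)
final_loss = mse_loss(w, X, y)
print("training loss after gradient descent:", final_loss)
```

This only says something about the training loss. A linear least-squares problem is convex, so it sidesteps the local-minimum question entirely; it is a sanity check of the "zero of a non-negative function" argument, not evidence about deep networks or about test error.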
aledalgrande
4y ago
Yeah, but depending on the data you might get even worse results. Selecting the right subset to be representative is really important.
jhgb
4y ago
Would a random sample be representative? Statistically this seems to be the case for any large N. In fact it's not clear to me that any other sample would be more representative.
aledalgrande
4y ago
Many public datasets have skewed classes, so if you sample at random you're not gonna get a good result. And N might not be big enough anyway.
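A rough sketch of the difference being discussed, on a made-up dataset of 980 "common" and 20 "rare" labels: a uniform random subset of 50 can easily contain no rare examples at all, while a stratified subset with proportional allocation always keeps at least one. The helper names and the class split are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter, defaultdict

def random_sample(labels, k, rng):
    """Pick k indices uniformly at random, ignoring class labels."""
    return rng.sample(range(len(labels)), k)

def stratified_sample(labels, k, rng):
    """Pick roughly k indices, allocating slots to each class in
    proportion to its frequency, with at least one slot per class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    picked = []
    for label, idxs in by_class.items():
        quota = max(1, round(k * len(idxs) / len(labels)))
        picked.extend(rng.sample(idxs, min(quota, len(idxs))))
    return picked

rng = random.Random(0)
labels = ["common"] * 980 + ["rare"] * 20  # hypothetical 98% / 2% skew

rand_idx = random_sample(labels, 50, rng)
strat_idx = stratified_sample(labels, 50, rng)

rand_counts = Counter(labels[i] for i in rand_idx)
strat_counts = Counter(labels[i] for i in strat_idx)

print("random subset:    ", dict(rand_counts))
print("stratified subset:", dict(strat_counts))
```

With a 2% rare class and a subset of 50, a uniform sample misses the rare class entirely roughly a third of the time (0.98^50 is about 0.36). That is consistent with both comments above: a random sample is representative in expectation and for large N, but for a small subset of a skewed dataset the variance can be large enough to matter.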