Computationally, OLS amounts to applying the pseudoinverse of the design matrix, which for dense systems costs O(samples * parameters^2). For some GD implementations a single step is O(samples * parameters), so there could be an asymptotic benefit per step, but it's hard to imagine a case where that benefit is actually realized (GD needs many steps to converge), let alone makes a practical difference.
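A minimal sketch of the comparison, using NumPy (the sizes and step count here are illustrative, not from the original comment): the direct solve does O(n * p^2) work once, while each GD step does O(n * p) work but has to be repeated many times.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 10                        # samples, parameters
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + 0.01 * rng.normal(size=n)

# Direct OLS (pseudoinverse route): ~O(n * p^2) for dense X, done once.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient descent on the squared loss: ~O(n * p) per step,
# but it takes many steps to approach the OLS solution.
beta_gd = np.zeros(p)
lr = 1.0 / np.linalg.norm(X, 2) ** 2   # step size from the largest singular value
for _ in range(5000):
    grad = X.T @ (X @ beta_gd - y)     # O(n * p) work per iteration
    beta_gd -= lr * grad

print(np.allclose(beta_ols, beta_gd, atol=1e-3))  # both recover the same fit
```

So even when the per-step cost favors GD, the total work (steps times cost per step) usually doesn't.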
And in any case nobody uses GD for regressions for statistical analysis purposes. In practice, Newton-Raphson or other more sophisticated schemes (with much higher computation, memory, and IO demands per step) with much nicer convergence properties are used.
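For a concrete instance of the kind of scheme meant here, below is a hedged sketch of Newton-Raphson (equivalently IRLS) for a logistic regression, hand-rolled in NumPy; the data and iteration count are made up for illustration. Each step forms the p-by-p Hessian (O(n * p^2)) and solves a linear system, so it is much heavier per step than GD, but convergence is quadratic near the optimum:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(X @ beta_true)))).astype(float)

beta = np.zeros(p)
for _ in range(25):                        # Newton-Raphson / IRLS iterations
    mu = 1 / (1 + np.exp(-(X @ beta)))     # predicted probabilities
    w = mu * (1 - mu)                      # Hessian weights
    grad = X.T @ (y - mu)                  # score (gradient of log-likelihood)
    hess = X.T @ (X * w[:, None])          # O(n * p^2) per step, plus a p x p solve
    beta += np.linalg.solve(hess, grad)

mu = 1 / (1 + np.exp(-(X @ beta)))
print(np.linalg.norm(X.T @ (y - mu)))      # gradient norm ~ 0 at the MLE
```

A handful of Newton steps typically reaches machine-precision optimality, which is why statistical packages prefer it despite the heavier per-step cost.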