Please don't quote me on that, as it was academic work in a given language and a given library and might not be representative of the whole ecosystem.
But in a nutshell, on OK-ish CPUs (Xeons a few generations old), we started seeing problems past a few thousand points with a few dozen features.
And training wasn't the only slow part, inference was too: since we used the whole sampled chain of the weight distributions' parameters, memory consumption was a sight to behold, and inference time quickly went through the roof whenever we didn't subsample the chain.
And all that was on standard NNs, so no complexity added by e.g. convolution layers.
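To give an idea of why that happens (a minimal NumPy sketch, not the actual setup I used; the network shape, chain length, and thinning factor are all made up for illustration): posterior-predictive inference means one forward pass per retained MCMC draw, so both memory and time scale linearly with the chain length, and thinning the chain buys a proportional speedup at the cost of a noisier estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a tiny 1-hidden-layer net whose weights were
# sampled with MCMC. Each posterior draw is one full set of weights.
n_samples, n_features, n_hidden = 4000, 30, 50  # full chain: 4000 draws

# Fake "posterior samples" standing in for a real chain.
W1 = rng.normal(size=(n_samples, n_features, n_hidden))
b1 = rng.normal(size=(n_samples, n_hidden))
W2 = rng.normal(size=(n_samples, n_hidden, 1))
b2 = rng.normal(size=(n_samples, 1))

def predict(x, idx):
    """Posterior-predictive mean: one forward pass per kept draw."""
    h = np.tanh(x @ W1[idx] + b1[idx][:, None, :])  # (k, batch, hidden)
    y = h @ W2[idx] + b2[idx][:, None, :]           # (k, batch, 1)
    return y.mean(axis=0)                           # average over draws

x = rng.normal(size=(8, n_features))

full = predict(x, np.arange(n_samples))         # every draw: slow, heavy
thin = predict(x, np.arange(0, n_samples, 40))  # every 40th draw: 100 passes
```

Keeping the full chain in memory for every layer is what blows up the footprint; thinning (or summarizing the posterior some other way) is usually the escape hatch.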