Please don't quote me on that, as it was academic work in a given language and a given library and might not be representative of the whole ecosystem.
But in a nutshell, on OK-ish CPUs (Xeons a few generations old), we started seeing problems past a few thousand points with a few dozen features.
And training wasn't the only slow part, inference was too: since we used the whole sampled chain of the weight distributions' parameters, memory consumption was a sight to behold, and inference time quickly went through the roof whenever we didn't subsample the chain.
And all that was on standard NNs, so no complexity added by e.g. convolution layers.
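To give an idea of why that happens (a minimal NumPy sketch, not the actual setup I used; the network shape, chain length, and thinning factor are all made up for illustration): posterior-predictive inference means one forward pass per retained MCMC draw, so both memory and time scale linearly with the chain length, and thinning the chain buys a proportional speedup at the cost of a noisier estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a tiny 1-hidden-layer net whose weights were
# sampled with MCMC. Each posterior draw is one full set of weights.
n_samples, n_features, n_hidden = 4000, 30, 50  # full chain: 4000 draws

# Fake "posterior samples" standing in for a real chain.
W1 = rng.normal(size=(n_samples, n_features, n_hidden))
b1 = rng.normal(size=(n_samples, n_hidden))
W2 = rng.normal(size=(n_samples, n_hidden, 1))
b2 = rng.normal(size=(n_samples, 1))

def predict(x, idx):
    """Posterior-predictive mean: one forward pass per kept draw."""
    h = np.tanh(x @ W1[idx] + b1[idx][:, None, :])  # (k, batch, hidden)
    y = h @ W2[idx] + b2[idx][:, None, :]           # (k, batch, 1)
    return y.mean(axis=0)                           # average over draws

x = rng.normal(size=(8, n_features))

full = predict(x, np.arange(n_samples))         # every draw: slow, heavy
thin = predict(x, np.arange(0, n_samples, 40))  # every 40th draw: 100 passes
```

Keeping the full chain in memory for every layer is what blows up the footprint; thinning (or summarizing the posterior some other way) is usually the escape hatch.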