> Secondly, the reduction of noise by merging is at best the sqrt (number of frames)
That bound assumes the (typically Gaussian) noise is independent from frame to frame and sits on top of a static image. Arguably, one could exploit the slight shakiness of a handheld shot to get an image with even less noise.
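For reference, the sqrt(N) figure is what plain averaging gives: N frames with independent, zero-mean noise of standard deviation sigma leave sigma/sqrt(N) in the mean. A quick simulation sketch, assuming a static all-zero image, i.i.d. Gaussian noise per pixel and per frame, and simple averaging as the "merge" (the function name and parameters are made up for illustration):

```python
import random
import statistics

def residual_noise(num_frames, sigma=1.0, pixels=20000, seed=0):
    """Average `num_frames` noisy copies of a static (all-zero) image and
    return the standard deviation of the noise left in the average."""
    rng = random.Random(seed)
    merged = [
        sum(rng.gauss(0.0, sigma) for _ in range(num_frames)) / num_frames
        for _ in range(pixels)
    ]
    return statistics.pstdev(merged)

# Compare measured residual noise with the sigma / sqrt(N) prediction.
for n in (1, 4, 16):
    print(n, round(residual_noise(n), 3), round(1 / n ** 0.5, 3))
```

The measured values should track 1/sqrt(N) closely; the point is that with a static scene, more frames only buy you that square-root improvement.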
As a thought experiment, consider thousands of shots of a perfectly static scene taken with an idealized, noiseless camera that moves a tiny amount between shots. You could keep improving the resolution of the composite image quite a bit, until warping from the camera displacement became noticeable.
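A rough one-dimensional sketch of that thought experiment, under heavy simplifying assumptions: a point-sampling sensor, shifts that are exact steps on a finer (hypothetical) grid and known in advance, no noise, and no warping. Each frame on its own only sees every `factor`-th point of the scene, but slotting the frames back in at their offsets recovers the fine grid. This is plain shift-and-add multi-frame super-resolution, not any particular camera's pipeline; `capture`, `shift_and_add`, and the test scene are invented for illustration.

```python
import math

def capture(scene, factor, shift):
    """Simulate one low-res frame: a point-sampling sensor that keeps every
    `factor`-th scene sample, starting at sub-pixel offset `shift`."""
    return scene[shift::factor]

def shift_and_add(frames, shifts, factor, length):
    """Place each low-res frame back onto the high-res grid at its known shift,
    averaging wherever several frames land on the same high-res position."""
    total = [0.0] * length
    count = [0] * length
    for frame, shift in zip(frames, shifts):
        for i, value in enumerate(frame):
            pos = shift + i * factor
            total[pos] += value
            count[pos] += 1
    # Positions that no frame covered stay unknown (None).
    return [t / c if c else None for t, c in zip(total, count)]

factor = 4
scene = [math.sin(0.7 * x) for x in range(32)]  # hypothetical fine-detail scene
shifts = list(range(factor))  # assume offsets 0..factor-1 are known exactly
frames = [capture(scene, factor, s) for s in shifts]
recovered = shift_and_add(frames, shifts, factor, len(scene))
```

With a single frame, three quarters of the fine grid stays unknown; with all four offsets, `recovered` matches `scene`. In a real handheld burst the shifts are fractional, must be estimated by registration, and come with noise and lens blur, which is where the practical limits come from.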
Recent techniques along these lines are actually being used in cryo-electron microscopy to produce extremely high-resolution images of proteins.