That's another good question and (as ever) the devil is in the engineering detail. I work with noise professionally (well, with signal in a low-SNR environment, where the SNR per acquisition of interest is definitely << 1...) and those sorts of questions are good to ask, but the answer depends a lot on the exact noise statistics. Your approach requires that the average value of the noise be the same in both cases (read out N times and average digitally, vs. read out once and average "analogue"). In practice, because different noise sources have different distributions and image values are non-negative, I doubt this holds. The advantage of stacking is that it helps a lot with motion, and it also helps with an instrument of finite dynamic range (i.e. you don't blow out the highlights and can do gain compression).
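To make the "same average noise" point concrete, here is a toy Monte Carlo sketch (my own illustration: the clip-at-zero readout, the Poisson-plus-Gaussian noise model and all numbers are assumptions, not a model of any particular sensor). It compares one long exposure against N short frames summed digitally, adding read noise at every readout. Without clipping, both totals are unbiased, but the stack pays the read noise N times (variance roughly S + Nσ² vs. S + σ²). Once each frame is clamped to be non-negative, the per-frame clipping no longer averages out and the stacked sum is biased upward.

```python
import math
import random
import statistics


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def read_frame(signal: float, read_sigma: float, rng: random.Random, clip: bool) -> float:
    """One hypothetical readout: Poisson shot noise plus Gaussian read noise.

    With clip=True, negative values are clamped to zero to mimic a
    non-negative image (an assumption; real sensors usually add a bias offset).
    """
    value = poisson(signal, rng) + rng.gauss(0.0, read_sigma)
    return max(value, 0.0) if clip else value


def estimate(total_signal, n_reads, read_sigma, clip, trials=20000, seed=1):
    """Return (mean, variance) of the total-signal estimate over many trials.

    n_reads == 1 is a single long exposure; n_reads > 1 splits the same
    exposure into n_reads short frames and sums them digitally.
    """
    rng = random.Random(seed)
    per_frame = total_signal / n_reads
    totals = [
        sum(read_frame(per_frame, read_sigma, rng, clip) for _ in range(n_reads))
        for _ in range(trials)
    ]
    return statistics.fmean(totals), statistics.pvariance(totals)


# Hypothetical low-SNR case: 2 photoelectrons in total, 3 e- read noise per readout.
S, SIGMA = 2.0, 3.0
for clip in (False, True):
    single = estimate(S, 1, SIGMA, clip)
    stacked = estimate(S, 16, SIGMA, clip)
    print(f"clip={clip}: single mean/var={single}, 16-frame stack mean/var={stacked}")
```

With these numbers the unclipped variances should land near 11 and 146 (S + σ² and S + 16σ²), while the clipped 16-frame stack reports a mean far above the true 2 photoelectrons.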
> Are smaller sensors also faster to read, given the lower capacitance?
This Stanford paper with a model of a CMOS sensor [1] is rather old, but it gives quite a good explanation of where the readout time comes from. The capacitance across the active area is C_{pd}, but the minimum readout time is dominated by the capacitance of the readout bias, C_T, across the ADC line. As a result, it scales with transistor feature size (fig. 5), independent of the sensor area. Of course, as 'moar megapixelz' came along, this got higher and other designs were explored to mitigate it; a paper from Rochester [2] states that removing it buggers up the noise statistics unless you do "clever things" (which they describe in detail).
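For intuition on why C_T rather than C_pd sets the pace, a toy single-pole settling model may help. This is my own simplification, not the model in [1]; the transconductance g_m, the half-LSB settling criterion and all values are hypothetical. If the column line behaves like an RC with τ = C_T/g_m, settling to within half an LSB of an n-bit ADC needs t = τ(n+1)ln 2, and a column-parallel sensor pays that once per row. The photodiode capacitance never enters.

```python
import math


def settling_time(c_column: float, gm: float, adc_bits: int) -> float:
    """Time for a single-pole column line to settle within half an LSB.

    Toy model: tau = C_T / g_m, and requiring exp(-t / tau) <= 2**-(adc_bits + 1)
    gives t = tau * (adc_bits + 1) * ln 2. C_pd does not appear anywhere.
    """
    tau = c_column / gm
    return tau * (adc_bits + 1) * math.log(2)


def frame_readout_time(rows: int, c_column: float, gm: float, adc_bits: int) -> float:
    """Column-parallel readout: all columns settle together, rows go in sequence."""
    return rows * settling_time(c_column, gm, adc_bits)


# Hypothetical numbers: 1 pF column line, 100 uS transconductance, 12-bit ADC, 3000 rows.
t_row = settling_time(1e-12, 100e-6, 12)
t_frame = frame_readout_time(3000, 1e-12, 100e-6, 12)
print(f"per row: {t_row * 1e9:.1f} ns, per frame: {t_frame * 1e6:.1f} us")
# prints: per row: 90.1 ns, per frame: 270.3 us
```

In this sketch, doubling C_T doubles the frame time, while adding more rows (more megapixels) scales it linearly.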
> That is a good point that I hadn't considered, thanks.
I should procrastinate more productively, but thank you!
[1] https://isl.stanford.edu/~abbas/group/papers_and_pub/tcas1.p...
[2] https://sci-hub.se/10.1109/ISCAS.2008.4541803