So the vulnerability likely isn't something nobody thought of; it's just that nobody seriously expected the CPU vendors to make the mistake of speculating across multiple loads and actually leaving observable modifications in the caches.
Note that even merely speculating across multiple loads could lead to observable side effects, since measuring memory bandwidth could differentiate between loads of accessible addresses and loads of addresses that silently page-fault. [1]
An interesting question is whether the CPU would also speculate on loads from mapped PCI device regions, as that could also be detectable in many different ways.
[1] https://eprint.iacr.org/2016/613.pdf
> Both hardware thread systems (SMT and TMT) expose contention within the execution core. In SMT, the threads effectively compete in real time for access to functional units, the L1 cache, and speculation resources (such as the BTB). This is similar to the real-time sharing that occurs between separate cores, but includes all levels of the architecture. [...] SMT has been exploited in known attacks (Sections 4.2.1 and 4.3.1)