But once you're up in that "I can't even fit this in a 4-8U sled" territory (whatever that is in a given decade), you're probably doing some kind of map/reduce thing, so there's a strong incentive to have a column-major layout. If you can periodically sort by some important column, so much the better (O(log n) binary search), but mostly you've got a bunch of mappers (which you work hard to place for locality relative to the DFS replicas where the disks live, maybe on the same machine, maybe under the same top-of-rack switch or whatever) zipping through different columns or column sets and producing candidate conceptual "rows" to feed into your shuffle/sort/reduce pipeline, which handles the joins and sorts and stuff like that.
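To make that concrete, here's a toy sketch (not any real system; all names are made up for illustration): each column is stored as its own array, a mapper scans only the columns it needs and emits keyed candidate "rows", and a shuffle/sort/reduce stage groups and aggregates them. The sorted-column point lookup at the end is the O(log n) binary-search case mentioned above.

```python
from bisect import bisect_left
from collections import defaultdict

# Column-major "table": each column is its own array (in a real DFS these
# would be separate files/blocks living near the mapper).
user_id = [3, 1, 4, 1, 5]
country = ["us", "de", "us", "fr", "us"]
spend   = [10.0, 5.0, 7.5, 2.0, 1.0]

def mapper(row_range):
    """Scan just the columns we need; emit (key, value) pairs."""
    for i in row_range:
        if country[i] == "us":            # predicate touches one column
            yield user_id[i], spend[i]    # project only what's needed

def shuffle_sort_reduce(pairs):
    """Group by key (the 'shuffle'), sort, then reduce each group."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return {k: sum(vs) for k, vs in sorted(groups.items())}

# Two "mappers" covering disjoint row ranges, one reduce over their output.
pairs = list(mapper(range(0, 3))) + list(mapper(range(3, 5)))
print(shuffle_sort_reduce(pairs))         # {3: 10.0, 4: 7.5, 5: 1.0}

# If an important column is kept sorted, a point lookup is O(log n):
sorted_ids = sorted(user_id)
print(bisect_left(sorted_ids, 4))         # index of 4 in the sorted column
```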
I don't know how Google does it, but I think almost everyone else started with something like the Hadoop ecosystem, and many with something like Hive/HQL to give a SQL-like way to express that job, especially for ad-hoc queries (long-lived, rarely-changing overnight jobs might get optimized into some lower-level representation).
Around the time I was getting out of that game, Spark was starting to get really big, due to some combination of RAM getting really abundant and a re-think of what was by then a pretty old cost model. I have no idea what people are doing now.
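Roughly the cost-model shift in question: intermediate data can live in RAM and be reused across jobs instead of being re-read from disk on every pass. A minimal PySpark sketch, with the path and column names made up for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cache-sketch").getOrCreate()

events = spark.read.parquet("/data/events")            # hypothetical columnar input
us = events.filter(F.col("country") == "us").cache()   # pin the filtered set in memory

# Both aggregations reuse the cached data rather than rescanning disk.
us.groupBy("user_id").agg(F.sum("spend").alias("total")).show()
us.groupBy("user_id").count().show()

spark.stop()
```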
I'd love it if someone with up-to-date knowledge about how this stuff works these days chimed in.
That said, they've kind of introduced it with the Search Optimization Service, which is like an index across the whole table for fast lookups, but even that is automatically maintained on your behalf.
The indices are nice, but the bigger selling feature for me is that if you have many services, and each service's data is in the warehouse, you can join across all of them.