When the hardware scales, the tricks to wring maximum performance out of it change.
When you come back to build the new version of your game on the next-gen console, sure, you can now add all those features — but the processing pipelines are different now, and the disk performance and memory-to-cache ratios have changed. Getting your hyper-optimized code to work on the new platform takes a ton of effort, so you either run it in some kind of emulation mode, sacrificing some performance for productivity, or you rewrite it.
The same happens with new generations of server hardware. Your clever hack to maximize NUMA locality of data to each core becomes a liability when the next hardware generation comes out and on-die caches are bigger. Decisions about what should live in RAM get invalidated by faster SSDs.
Maybe you can build a service this way — hardware first. Pick a server platform for a couple of years, build to that capacity, ship, and then start designing the next-gen service to run on a new set of hardware?
To some extent this is how database or virtualization systems software is written. And if I’m not mistaken, Twitter actually developed its own database stack to optimally handle its particular data storage model, and I assume that was done pretty close to the metal.