Function calls push registers, allocate a stack frame, and push a return address. By breaking your inner loops into more functions you've now generated megabytes of pointless memory thrash... on a console. Additionally, at the time this stuff was written, compilers and toolchains weren't necessarily standardized, or were customized to accommodate certain use cases, so they needed more hand-holding to generate performant code. That's just a fact. I'm not saying you're wrong, but you're definitely viewing this with hindsight.
Function calls do none of those things if they’re inlined, and compilers have been able to inline functions for a long time. Perhaps this code is even older than that.
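To make the inlining point concrete, here's a rough sketch in C. The names (`square`, `sum_squares_calls`, `sum_squares_manual`, `check_versions_agree`) are made up for illustration and aren't from the code being discussed. The first loop calls a small helper per element, the second has the helper expanded by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: small enough that an optimizing compiler
 * will normally expand it at the call site instead of emitting a call. */
static inline uint32_t square(uint32_t x)
{
    return x * x;
}

/* Inner loop written with one function call per element. */
uint32_t sum_squares_calls(const uint32_t *data, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += square(data[i]);
    return total;
}

/* The same loop with the helper expanded by hand. */
uint32_t sum_squares_manual(const uint32_t *data, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i] * data[i];
    return total;
}

/* Sanity check: both versions must compute the same result. */
void check_versions_agree(const uint32_t *data, size_t n)
{
    assert(sum_squares_calls(data, n) == sum_squares_manual(data, n));
}
```

With optimization on (e.g. `-O2` on GCC or Clang), the two loops will usually compile to the same machine code, with no call, stack frame, or return address for `square`. With optimization off, or on an old or customized toolchain that doesn't inline, the call typically stays, which is where the per-call overhead from the parent comment comes in. `inline` is only a hint, so the way to know for sure is to look at the generated assembly.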