Case in point: a matrix library I used to use did a full row/column pass on every lookup. We put a caching layer between it and our code and cut the lookups required by 30%. We were processing the same amount of data and getting the same results, but in far less time. That layer also reduced memory requirements, so we could process larger datasets faster on the same hardware. That's just one example.
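A minimal sketch of that kind of intermediate layer, assuming a hypothetical `slow_column_sum` standing in for the library call that scanned a full column every time (names and structure here are illustrative, not the actual library):

```python
def slow_column_sum(matrix, col):
    # Hypothetical stand-in for the library call that did a full
    # column pass on every single lookup.
    return sum(row[col] for row in matrix)

class CachedMatrix:
    """Thin layer between our code and the library: memoize
    per-column results so repeated lookups skip the full pass."""

    def __init__(self, matrix):
        self._matrix = matrix
        self._cache = {}

    def column_sum(self, col):
        if col not in self._cache:
            # Only the first lookup for a column pays for the pass.
            self._cache[col] = slow_column_sum(self._matrix, col)
        return self._cache[col]

m = CachedMatrix([[1, 2], [3, 4], [5, 6]])
print(m.column_sum(0))  # first call does the full pass
print(m.column_sum(0))  # repeat call is a dictionary hit
```

The point isn't the caching itself so much as where it lives: because it's a layer between the library and the calling code, neither side had to change.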
Your choice of CPU and other hardware isn't always the limiting factor. Even the language you choose has an impact: some languages and solutions carry more data-processing overhead than others to reach the same final result.
Even the way your program's Makefile is structured, or how its modules are composed, can affect build performance. I remember a code generator we included that regenerated a massive amount of code on every run because its input files were marked as changed, and a trivial file change was enough to trigger the whole thing. We improved it by a ridiculous amount simply by hashing its inputs and comparing the hashes before running the generator. Skipping the generator when nothing had actually changed cut our 30-minute builds by 5-10 minutes, on the same hardware.
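The hash-and-compare trick can be sketched like this, assuming a stamp file that records the last digest and a `run_generator` callable standing in for the real code generator (both hypothetical names):

```python
import hashlib
import os

def hash_inputs(paths):
    """Combined SHA-256 digest over the generator's input files."""
    h = hashlib.sha256()
    for path in sorted(paths):  # sort for a stable, order-independent digest
        with open(path, "rb") as f:
            h.update(f.read())
    return h.hexdigest()

def maybe_regenerate(inputs, stamp_file, run_generator):
    """Run the code generator only when its inputs actually changed.

    Returns True if the generator ran, False if it was skipped.
    """
    digest = hash_inputs(inputs)
    if os.path.exists(stamp_file):
        with open(stamp_file) as f:
            if f.read().strip() == digest:
                return False  # inputs unchanged: skip the expensive step
    run_generator()
    # Record the digest so the next build can compare against it.
    with open(stamp_file, "w") as f:
        f.write(digest)
    return True
```

Comparing content hashes instead of timestamps is what makes this robust: a `touch` or a checkout that rewrites files with identical contents no longer forces a full regeneration.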