Big O/algorithmic complexity is interesting in that it tends to abstract away the architecture of the underlying processor. A copy is a copy is a copy. Arithmetic is arithmetic is arithmetic. Regardless of what instructions/processing units the underlying processor has. Regardless of data access. Regardless of parallelization. Regardless of memory complexity. All we use are brutish units of “work” — despite not all “workers” being equal.
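To make the abstraction concrete, here is a minimal Python sketch (my own illustration, not from the text above): both traversals below perform exactly the same n² additions, so Big O counts them as identical, yet on hardware with contiguous row-major storage (C arrays, NumPy) the first walk is cache-friendly and the second is not, which can mean a large constant-factor gap in practice.

```python
# Two ways to sum an n x n matrix. Both are O(n^2) "units of work",
# so Big O treats them as identical -- but with row-major storage the
# row-order walk touches memory sequentially, while the column-order
# walk strides across rows and loses cache locality.

def sum_row_major(m):
    total = 0
    for row in m:              # walk memory in storage order
        for x in row:
            total += x
    return total

def sum_col_major(m):
    n = len(m)
    total = 0
    for j in range(n):         # same additions, worse access pattern
        for i in range(n):
            total += m[i][j]
    return total

matrix = [[i * 4 + j for j in range(4)] for i in range(4)]
assert sum_row_major(matrix) == sum_col_major(matrix) == sum(range(16))
```

(Python's list-of-lists muddies the cache effect somewhat; the point is that the "work count" is identical while the memory behavior is not.)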
It reminds me a bit of scientific management: treating your "workers" as interchangeable units that carry out whatever has been decided to be the most efficient movements and processes, completely disregarding the individual characteristics of each worker. For a humanist example, consider the quirks of individual physiology: the size, length, and composition of the skeletal system, with its muscles, bones, tendons, and so on, mean that specific movements and workloads suit each body best. The same goes for psychology: each brain is its own work-producing organ, uniquely suited to and most efficient at particular workloads. Scientific management ignores all of this in favor of an all-encompassing, abstract-average "best workstyle" that makes no note of these characteristics, and simply decides by external metrics like "time to pick up a box using form X, Y, or Z."
The same parallel can be drawn to computers: different processors and collections of hardware (essentially different "bodies and brains") have different quirks and different workloads they perform best at. GPUs, for example, are much more useful for vectorized/parallel workloads where each individual operation is relatively simple (e.g. matrix arithmetic). You can run the standard iterative matrix-multiplication algorithm on a CPU in O(n³) time, but your data-access costs will be significant. Run a parallel algorithm on a GPU instead, with high-bandwidth memory close to the compute units, and the parallel running time drops to polylogarithmic: with enough processors, each of the n² dot products collapses into a reduction tree of depth O(log n).
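A minimal sketch of the two cost models (Python, sequential and purely illustrative): the triple loop does Θ(n³) scalar operations, while the n-term sums it contains could each be evaluated as a balanced reduction tree whose rounds are independent, which is where the logarithmic parallel depth comes from.

```python
# Sequential matrix multiply: three nested loops, Theta(n^3) scalar ops.
def matmul(a, b):
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c

# The inner n-term sum can instead be evaluated as a reduction tree:
# pair up terms and add, halving the count each round. Sequentially
# this is still O(n) additions, but the additions within a round are
# independent, so a parallel machine finishes in ceil(log2(n)) rounds.
def tree_sum(xs):
    rounds = 0
    while len(xs) > 1:
        xs = [xs[i] + (xs[i + 1] if i + 1 < len(xs) else 0)
              for i in range(0, len(xs), 2)]
        rounds += 1
    return xs[0], rounds

total, depth = tree_sum(list(range(8)))
assert (total, depth) == (28, 3)   # 8 terms -> 3 rounds
assert matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
```

The tree here runs one round at a time in plain Python; on a GPU, each round would be a single parallel step across thousands of lanes.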
This is where CS research really shines: not running away from the realities of physical systems, but embracing them.