Still, what you normally do, and can assume, is that you're using plain "CPU" memory with fairly predictable (I think) latencies and a transparent caching layer.
When different things are mapped into the address space, that's an abstraction the programmer (or the user) consciously made. It should be possible to figure out the performance characteristics there.
Of course, many programs run on various machines, each with its own performance characteristics. You should still be able to optimize for any one of them by querying the hardware and selecting an appropriate implementation, if you want to put in the work.
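To make the "query the hardware and pick an implementation" idea concrete, here's a rough sketch in C. Everything in it is made up for illustration: the names `sum_scalar`, `sum_unrolled`, `cpu_has_avx2` and `select_sum` are hypothetical, the AVX2 check is just one example of a hardware query (it relies on the GCC/Clang builtin `__builtin_cpu_supports` on x86 and falls back to "no" elsewhere), and the "tuned" variant is only an unrolled loop standing in for real machine-specific code.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline implementation: works everywhere. */
uint64_t sum_scalar(const uint32_t *v, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for a machine-specific variant: four independent accumulators.
 * A real version might use SIMD intrinsics; this one is plain C. */
uint64_t sum_unrolled(const uint32_t *v, size_t n) {
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Hypothetical hardware query. Assumes GCC or Clang on x86;
 * on any other compiler or architecture it reports "not available". */
int cpu_has_avx2(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* Pick an implementation once, based on what the machine reports. */
sum_fn select_sum(void) {
    return cpu_has_avx2() ? sum_unrolled : sum_scalar;
}
```

You'd call `select_sum()` once at startup, keep the returned function pointer, and use that everywhere. Both variants give the same result, so the choice only affects speed, and whether the "tuned" one actually wins on a given machine is something you'd have to measure.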
I don't think assembler/C is such a big problem here. But then again, I'm not a low-level guy (in this sense), at least for now.