Sure, if you are counting cache misses or optimizing for a specific compute unit. Few people do that outside of AI, crypto, AV codecs, simulation, etc. And if you are doing that, you may need to specialize per arch anyway, because you’re probably using vector units.
I was speaking to whether the code will work. The answer is almost always yes. Higher-level optimizations, such as algorithm choice or where allocations are performed, are also architecture-neutral, and that is where most optimization work happens.
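To make the algorithm-choice point concrete, here is a rough sketch in Python. The function names are made up for illustration, and it assumes the items are hashable. Both functions return the same answer on any CPU; the second one just does far less work as the input grows, and nothing about that depends on the architecture.

```python
def has_duplicate_quadratic(items):
    """Compare every pair of items: O(n^2) comparisons in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_linear(items):
    """Single pass with a hash set: O(n) expected time.

    Assumes the items are hashable (an assumption of this sketch).
    """
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

Swapping the first for the second is the kind of change that tends to matter more than any per-arch tuning, and it carries over unchanged when you move between architectures.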