I wanted to write a memcpy() routine for a microcontroller. I wrote a naive version that copied from src to dst one byte at a time. More efficient algorithms exist, and they typically copy a 32-bit word at a time.
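To make the two approaches concrete, here is a rough sketch of what they might look like. This is not the exact code from my project; the names copy_bytes and copy_words are made up for illustration, and the word-wise version assumes a target where aligned 32-bit accesses are the fast path.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive version: copy one byte at a time. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/*
 * Hand-optimised sketch: when both pointers are 4-byte aligned, copy
 * 32-bit words first, then finish any remaining bytes one at a time.
 * Caveat: reading arbitrary memory through a uint32_t pointer is not
 * strictly portable C (strict aliasing), so real implementations lean
 * on compiler-specific guarantees here.
 */
void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d % sizeof(uint32_t) == 0 &&
        (uintptr_t)s % sizeof(uint32_t) == 0) {
        uint32_t *dw = (uint32_t *)d;
        const uint32_t *sw = (const uint32_t *)s;

        while (n >= sizeof(uint32_t)) {
            *dw++ = *sw++;
            n -= sizeof(uint32_t);
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* Tail bytes, or the whole buffer if the pointers were unaligned. */
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}
```

Both functions produce the same result; the second just does less work per byte when the alignment allows it.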
The interesting thing is what happened when I turned on compiler optimisations. When I examined the generated assembly (even though my knowledge of assembly is poor), I discovered that the compiler had made the same optimisations you would find in a more complex C implementation. It obviously thought to itself "I see what you're doing here", and put in a better version.
So the moral of the story is: your compiler is likely to be able to figure out a lot by itself.