During dynamic linking, glibc picks a memcpy implementation which seems most appropriate for the current machine. We have about 13 different implementations just for x86-64. We could add another one for current(ish) AMD CPUs, select a different existing implementation for them, or change the default for a configurable cutover point in a parameterized implementation.