On all modern Intel/AMD CPUs, an optimized memcpy/memset built from normal instructions is much faster than "rep movsb"/"rep stosb" only within certain ranges of the copy/fill size.
For memcpy, normal instructions can be roughly twice as fast for copy sizes under 1 kB, but they are always slower for very big copies.
For an optimized memcpy/memset, one must choose between normal instructions and "rep movsb"/"rep stosb" for each copy/fill, depending on the CPU model and on the copy/fill size.
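A minimal sketch of that per-call choice, in C with GCC/Clang inline assembly. The names (copy_normal, copy_rep_movsb, copy_dispatch, rep_movsb_threshold) and the 1024-byte crossover are hypothetical placeholders, not measured values; a real implementation would set the threshold (or several range limits) per CPU model after benchmarking, and would use a tuned SIMD routine where this sketch uses a simple 8-byte loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical crossover size in bytes. A real value has to be measured
   for each CPU model; 1024 is only a placeholder. */
static size_t rep_movsb_threshold = 1024;

/* Copy with "rep movsb" on x86 (GCC/Clang inline asm).
   On other targets this falls back to the library memcpy. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* RDI = destination, RSI = source, RCX = byte count.
       The ABI guarantees the direction flag is clear on entry. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
}

/* Stand-in for an optimized copy made of normal load/store instructions.
   A tuned version would use wide SIMD registers and handle alignment. */
static void copy_normal(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 8) {
        unsigned long long w;
        memcpy(&w, s, 8);   /* fixed-size memcpy compiles to a plain load */
        memcpy(d, &w, 8);   /* ... and a plain store */
        d += 8;
        s += 8;
        n -= 8;
    }
    while (n > 0) {         /* copy the remaining 0..7 bytes */
        *d++ = *s++;
        n--;
    }
}

/* Choose the copy method for this call from the copy size. */
static void *copy_dispatch(void *dst, const void *src, size_t n)
{
    if (n < rep_movsb_threshold)
        copy_normal(dst, src, n);
    else
        copy_rep_movsb(dst, src, n);
    return dst;
}

/* Self-check: returns 1 if copy_dispatch produces the same bytes as memcpy. */
static int copy_dispatch_matches_memcpy(size_t n)
{
    static unsigned char src[4096], a[4096], b[4096];

    assert(n <= sizeof src);
    for (size_t i = 0; i < n; i++)
        src[i] = (unsigned char)(i * 31u + 7u);
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    copy_dispatch(a, src, n);
    memcpy(b, src, n);
    return memcmp(a, b, sizeof a) == 0;
}
```

The same structure would apply to memset with "rep stosb". Since the best method also depends on the CPU model, rep_movsb_threshold is a variable rather than a constant, so that it could be set once at startup (for example after a CPUID check).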