Indeed, in some cases you want the unrolls. The 6502 is good in the twisties, but if you're trying to do any kind of copy or fill then the percentage of meaningful cycles is disappointingly low, and unrolling may be necessary. Also, if you're trying to keep in sync with some other piece of hardware, then just doing it a step at a time can be much easier.
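To put rough numbers on "disappointingly low", here is a small Python sketch of the arithmetic. It assumes the plainest indexed loop (LDA abs,X / STA abs,X / DEX / BNE), the documented NMOS 6502 cycle counts with no page crossings, and it counts only the load and store as "meaningful"; that split and the function name are my own illustration, not anything from the code under discussion.

```python
# Documented NMOS 6502 cycle counts, assuming no page crossings.
CYCLES = {
    "LDA abs,X": 4,
    "STA abs,X": 5,
    "LDA abs": 4,
    "STA abs": 4,
    "DEX": 2,
    "BNE taken": 3,
}


def useful_fraction(useful, overhead):
    """Fraction of the per-byte cycles spent in the 'useful' instructions."""
    u = sum(CYCLES[name] for name in useful)
    o = sum(CYCLES[name] for name in overhead)
    return u / (u + o)


# Looped copy: LDA src,X / STA dst,X / DEX / BNE loop
copy_loop = useful_fraction(["LDA abs,X", "STA abs,X"], ["DEX", "BNE taken"])

# Looped fill (A already holds the value): STA dst,X / DEX / BNE loop
fill_loop = useful_fraction(["STA abs,X"], ["DEX", "BNE taken"])

# Cycles per byte, looped versus fully unrolled with absolute addressing.
copy_looped_cycles = sum(CYCLES[n] for n in ["LDA abs,X", "STA abs,X", "DEX", "BNE taken"])
copy_unrolled_cycles = CYCLES["LDA abs"] + CYCLES["STA abs"]
fill_looped_cycles = sum(CYCLES[n] for n in ["STA abs,X", "DEX", "BNE taken"])
fill_unrolled_cycles = CYCLES["STA abs"]

print(f"copy loop: {copy_loop:.0%} useful, {copy_looped_cycles} vs {copy_unrolled_cycles} cycles/byte")
print(f"fill loop: {fill_loop:.0%} useful, {fill_looped_cycles} vs {fill_unrolled_cycles} cycles/byte")
```

Under those assumptions the looped copy spends 9 of its 14 cycles per byte on the load and store, and the looped fill 5 of 10. Fully unrolled with absolute addressing, they drop to 8 and 4 cycles per byte respectively, at the cost of three bytes of code per load or store.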
I have written a lot of this sort of code and I am quite familiar with the 6502 tradeoffs. But for printing 15 chars by calling a ROM routine, I stand by my comments.