It's about where you read the data from, not how much data gets read. For example, a 256-bit AVX read is aligned if the address being read from is a multiple of 32 bytes; otherwise it's unaligned and can run slightly slower, and slower still if it happens to straddle two cache lines. The same applies to writes.
It's less of an issue than it used to be, since newer CPU architectures have steadily reduced the penalty for unaligned access, but it's still there.
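To make the definition concrete, here is a minimal sketch in C of the two checks described above: whether an address is a multiple of the access width, and whether an access crosses a cache line boundary. The helper names `is_aligned` and `straddles_cache_line` are made up for illustration, and the 64-byte cache line is an assumption (typical for current x86 CPUs, but not guaranteed everywhere).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, which is typical on current x86 CPUs. */
#define CACHE_LINE_BYTES 64u

/* Hypothetical helper: true if ptr is a multiple of `alignment`.
 * `alignment` must be a non-zero power of two (e.g. 32 for a 256-bit access). */
bool is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical helper: true if the `size` bytes starting at ptr
 * touch more than one cache line. */
bool straddles_cache_line(const void *ptr, size_t size)
{
    if (size == 0) {
        return false;
    }
    uintptr_t first_line = (uintptr_t)ptr / CACHE_LINE_BYTES;
    uintptr_t last_line  = ((uintptr_t)ptr + size - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

With these, a 32-byte read at address 64 is aligned, a read at 72 is unaligned but stays inside one cache line, and a read at 104 is unaligned and spans two lines. In intrinsics terms, `_mm256_load_ps` requires the 32-byte-aligned case, while `_mm256_loadu_ps` accepts any address.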