Sure, but the point is that for the most significant limb, there is no point in having redundant bits because whatever you put in them will be discarded after normalization.
Ah, in that case, you're right, it would make sense to use all 64 bits for the top limb. Still, making them all equal-sized can have benefits if you use SIMD or similar techniques to operate on them uniformly. One project of mine has been trying to work with large integers in CUDA by distributing their limbs across a warp.
Maybe some use cases want to know if there was overflow? In which case having more space for carry bits to accumulate makes it easy to give an accurate answer