The fact that an arithmetic right shift of a negative integer floors the result actually makes the correction easier than it would be with division, which truncates towards zero.
The bit shifted out always has a positive weight, regardless of whether the shift was applied to a negative or a positive number.
Tradition aside (the convention grew out of an essentially arbitrary initial choice), programming would in many cases have been easier if signed division had been defined to always produce a non-negative remainder, instead of a remainder with the same sign as the dividend.
Actually I was curious to see if GCC would be smart enough to automatically choose what's the best optimization depending on the underlying architecture, but it doesn't appear to be the case.
For x86_64 (with -O3 or -Os):
avg_64bits:
.LFB0:
    .cfi_startproc
    movl    %edi, %edi
    movl    %esi, %esi
    leaq    (%rdi,%rsi), %rax
    shrq    %rax
    ret
    .cfi_endproc

avg_patented_do_not_steal:
.LFB1:
    .cfi_startproc
    movl    %edi, %eax
    movl    %esi, %edx
    andl    %esi, %edi
    shrl    %eax
    shrl    %edx
    andl    $1, %edi
    addl    %edx, %eax
    addl    %edi, %eax
    ret
Clearly just casting to 64 bits seems to produce denser code.

For ARM32 (-O3 and -Os):
avg_64bits:
    push    {fp, lr}
    movs    r3, #0
    adds    fp, r1, r0
    adc     ip, r3, #0
    mov     r0, fp
    mov     r1, ip
    movs    r1, r1, lsr #1
    mov     r0, r0, rrx
    pop     {fp, pc}

avg_patented_do_not_steal:
    and     r3, r1, #1
    ands    r3, r3, r0
    add     r0, r3, r0, lsr #1
    add     r0, r0, r1, lsr #1
    bx      lr
A lot more register spilling in the 64-bit version, since it decides to do a true 64-bit add using two registers and an adc.

My code, for reference:
#include <stdint.h>

uint32_t avg_64bits(uint32_t a, uint32_t b) {
    uint64_t la = a;
    uint64_t lb = b;
    return (la + lb) / 2;
}

uint32_t avg_patented_do_not_steal(uint32_t a, uint32_t b) {
    return (a / 2) + (b / 2) + (a & b & 1);
}