Here is a complete, simplified Kahan summation test. It does indeed work with -O3 but fails with -Ofast, so something else must have been going on in my real program at -O3. The -Ofast failure makes sense: -Ofast enables -ffast-math, which lets the compiler reassociate floating-point expressions, so (kahan_t - sum) - kahan_y can be simplified algebraically to zero and the compensation disappears. The original point still stands, though: 'volatile' can be a workaround for some optimization problems (you may want the rest of your program to benefit from -Ofast without breaking certain parts).
Changing the three kahan_* variables to volatile makes this work (slowly) with -Ofast.
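#include <stdio.h>

int main(int argc, char **argv) {
    int i;
    double sample, sum;
    double kahan_y, kahan_t, kahan_c;

    // initial values
    sum = 0.0;
    kahan_c = 0.0;  // running compensation starts at zero
    sample = 1.0;   // start with "large" value

    for (i = 0; i <= 1000000000; i++) {  // add 1 large value plus 1 billion small values
        // Kahan summation algorithm
        kahan_y = sample - kahan_c;           // next value, corrected by the carried error
        kahan_t = sum + kahan_y;              // low-order bits of kahan_y may be lost here
        kahan_c = (kahan_t - sum) - kahan_y;  // recover what was lost
        sum = kahan_t;
        // pre-load next small value
        sample = 1.0E-20;
    }
    printf("sum: %.15f\n", sum);  // exact answer is 1 + 1e9 * 1e-20 = 1.00000000001
    return 0;
}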
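
For reference, here is a minimal sketch of the volatile variant mentioned above, pulled out into a helper so the workaround stays confined to one function. The name kahan_sum_volatile and its parameters are my own, purely for illustration; the assumption is that marking the three temporaries volatile is enough to keep the compiler from simplifying the compensation step away.

```c
/* Hypothetical helper (not part of the original test): adds `first` once,
   then `rest` n times, using Kahan summation. */
double kahan_sum_volatile(double first, double rest, long n) {
    /* volatile makes every read and write of these go through memory, so
       the compiler should not be able to fold (t - sum) - y down to zero,
       even when reassociation is allowed by -ffast-math */
    volatile double y, t, c = 0.0;
    double sum = 0.0;
    double sample = first;
    long i;

    for (i = 0; i <= n; i++) {
        y = sample - c;     /* next value, corrected by the carried error */
        t = sum + y;        /* low-order bits of y may be lost here */
        c = (t - sum) - y;  /* recover what was lost */
        sum = t;
        sample = rest;      /* every later iteration adds the small value */
    }
    return sum;
}
```

Calling kahan_sum_volatile(1.0, 1.0E-20, 1000000000L) corresponds to the test above. Only this one function pays the cost of the extra memory traffic; the rest of the program can still be built with -Ofast.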