undefined | Better HN

0 pointsAnotherGoodName2y ago0 comments

The inputs here are random which is the problem and why this isn't demonstrating that. Create an input of all 's' and compare it.

0 comments

torstenvl2y ago

Better than random input, but still only ~half as fast as using sete

    [19:13:34 user@boxer ~/src/looptest] $ diff -u bench.c bench-alls.c      
    --- bench.c 2023-07-06 16:04:16.000000000 -0400
    +++ bench-alls.c 2023-07-06 19:13:34.000000000 -0400
    @@ -17,7 +17,7 @@
       int num_rand_calls = number / CHAR_BIT + 1;
       unsigned char *buffer = malloc(num_rand_calls * CHAR_BIT);
       for (int i = 0; i < num_rand_calls; i++) {
    -    buffer[i] = rand();
    +    buffer[i] = 's'; //rand();
       }
       return buffer;
     }
    [19:13:37 user@boxer ~/src/looptest] $ gcc -O3 bench-alls.c loop2.s -o l2
    [19:13:42 user@boxer ~/src/looptest] $ gcc -O3 bench-alls.c loop4.s -o l4
    [19:13:47 user@boxer ~/src/looptest] $ time ./l2 1000 1
    250001000
    ./l2 1000 1  0.69s user 0.00s system 99% cpu 0.699 total
    [19:13:55 user@boxer ~/src/looptest] $ time ./l4 1000 1
    250001000
    ./l4 1000 1  1.28s user 0.00s system 99% cpu 1.290 total

Jumps are slower.

Guvante2y ago

Microbenchmarks are hard. You aren't doing any meaningful work that could benefit from speculatively executing instead of stalling for the conditional value.

Similarly you might be busting the pipeline by chaining together the jumps so close together.

Not saying your point is wrong, just saying your proof isn't super solid.

1 more reply

gpderetta2y ago

In this benchmark the only loop carried dependency is over the res variable (edit: and of course the index). The jump doesn't break these dependencies, so for this specific problem, the additional latency of the cmov doesn't matter as it is always perfectly pipelined and cmov will always come up on top. But if the input of cmov depended on a previous value, then potentially a branch could be better given an high enough prediciton rate.

j / k navigate · click thread line to collapse