Do you think this is true because C code has traditionally relied so heavily on branching?
Put another way, perhaps branch-free programming has greater speed potential, but the tools we use have been heavily optimized for branch-laden code. Maybe putting sufficient resources into optimizing branch-free code (and stripping out the optimizations aimed at branching) would yield faster execution.
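To make the distinction concrete, here is a minimal C sketch of the same operation written both ways. The function names are just illustrative, and the branch-free version assumes two's-complement integers. Also, an optimizing compiler may well compile the "branchy" version to a conditional move anyway, which is part of why this is hard to reason about from the source alone.

```c
/* Branchy version: the comparison can feed a conditional jump
   (though an optimizing compiler may emit a conditional move instead). */
static int max_branchy(int a, int b)
{
    if (a < b)
        return b;
    return a;
}

/* Branch-free version: turn the comparison result (0 or 1) into a mask
   of all zeros or all ones, then select with bitwise operations.
   Assumes two's-complement int (required by C23, universal in practice). */
static int max_branchless(int a, int b)
{
    int mask = -(a < b);              /* 0 if a >= b, -1 (all bits set) if a < b */
    return (a & ~mask) | (b & mask);  /* picks a when mask is 0, b when mask is -1 */
}
```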
I don't know the answer, just musing. I am curious, though.