The nbody sim, at least, is forced to use the same algorithm. It seems unlikely that an optimised PyPy (non-BLAS) implementation beats an optimised C implementation by 20x.
That's because you assume an algorithm implemented on top of the C implementation of Python is equivalent to a C implementation of the algorithm. You don't understand how CPython works.
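To make that concrete, here is a rough sketch using the standard `dis` module. The function `kinetic_energy` is a made-up stand-in for one line of an nbody inner loop, not code from the benchmark. On CPython, that one arithmetic expression becomes a sequence of bytecode instructions, and the interpreter loop dispatches each one on boxed float objects, whereas a C compiler would typically reduce the same expression to a handful of machine instructions.

```python
import dis


def kinetic_energy(mass, vx, vy, vz):
    # Hypothetical helper: one arithmetic expression of the kind an
    # nbody inner loop evaluates for every body on every step.
    return 0.5 * mass * (vx * vx + vy * vy + vz * vz)


# Collect the bytecode CPython compiles the function into. Each entry
# is one trip through the interpreter's dispatch loop.
instructions = list(dis.get_instructions(kinetic_energy))

for ins in instructions:
    print(ins.opname, ins.argrepr)

print(len(instructions), "bytecode instructions for one expression")
```

The exact instruction names and count vary between CPython versions, so treat the listing as illustrative rather than as a measurement.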