Better HN

pjscott 5y ago

It's not entirely surprising that a carefully optimized C program using explicit SSE intrinsics, plus a fancy trick involving a low-precision square root instruction fixed up with two iterations of Newton's method, would be fast. :-)
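For anyone who hasn't seen the trick: an SSE reciprocal-square-root instruction returns only a rough estimate of 1/sqrt(x) (about 12 bits), and each Newton-Raphson step y <- y * (1.5 - 0.5 * x * y^2) roughly doubles the number of correct bits, so two steps get most of the way to double precision. Below is a minimal sketch in Rust. As an assumption, a single-precision computation stands in for the hardware estimate so the code stays portable, and the function names are hypothetical, not taken from either benchmark program.

```rust
/// One Newton-Raphson step for y ~= 1/sqrt(x):
/// y' = y * (1.5 - 0.5 * x * y * y)
fn newton_rsqrt_step(x: f64, y: f64) -> f64 {
    y * (1.5 - 0.5 * x * y * y)
}

/// Approximate 1/sqrt(x) for positive, finite x within f32 range.
/// ASSUMPTION: an f32 estimate stands in for the low-precision
/// hardware reciprocal-sqrt instruction used by the C program.
fn fast_rsqrt(x: f64) -> f64 {
    let estimate = (x as f32).sqrt().recip() as f64;
    // Two refinement steps, each roughly doubling the correct bits.
    let y = newton_rsqrt_step(x, estimate);
    newton_rsqrt_step(x, y)
}

fn main() {
    for &x in &[0.25_f64, 2.0, 10.0, 12345.678] {
        let exact = 1.0 / x.sqrt();
        let approx = fast_rsqrt(x);
        let rel_err = ((approx - exact) / exact).abs();
        println!("x = {x}: approx = {approx:.17}, relative error = {rel_err:e}");
    }
}
```

The real instruction works on packed single-precision values and is less accurate than this stand-in, which is why the C program needs two refinement steps rather than one.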

What impresses me is that the Rust version didn't do any of that stuff, just wrote very boring, straightforward code -- and got the same speed anyway. Some impressive compilation there!

neopallium 5y ago

Someone wrote a six-part blog post [0] about porting that n-body benchmark from C to Rust. They went from a straight line-by-line port in unsafe Rust, keeping the same SSE-based design, to clean Rust code with no unsafe and no explicit SSE that was faster than the original C code with hand-optimized SSE.

It is a great example of how the Rust compiler can auto-vectorize code.

0. http://cliffle.com/p/dangerust/6/
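As an illustration of the "boring" style (a sketch, not code from the blog post or the benchmark): much of the n-body work is a branch-free loop over a fixed-size array, here turning the 10 squared pair distances of a 5-body system into dt / d^3. Loops of this shape are what LLVM's auto-vectorizer handles well, but whether it actually emits SIMD instructions depends on the target and compiler flags, so the assembly is worth checking. The magnitudes function and PAIRS constant are hypothetical.

```rust
/// Number of body pairs in a 5-body system: 5 * 4 / 2.
const PAIRS: usize = 10;

/// Hypothetical helper: turn each pair's squared distance d2 into the
/// factor dt / d^3 = dt / (d2 * sqrt(d2)) used to scale velocity updates.
/// Plain safe Rust: a fixed-length array, no branches, and no dependence
/// between iterations, which is the loop shape LLVM's auto-vectorizer
/// handles well. Whether SIMD is actually emitted depends on target and flags.
fn magnitudes(d2: &[f64; PAIRS], dt: f64) -> [f64; PAIRS] {
    let mut mag = [0.0; PAIRS];
    for (m, &x) in mag.iter_mut().zip(d2.iter()) {
        *m = dt / (x * x.sqrt());
    }
    mag
}

fn main() {
    let d2 = [1.0, 4.0, 9.0, 16.0, 25.0, 1.0, 4.0, 9.0, 16.0, 25.0];
    let mag = magnitudes(&d2, 0.01);
    println!("{:?}", mag);
}
```

Nothing in the function mentions SIMD; any vectorization comes from the compiler seeing independent iterations over a known-length array.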

creato 5y ago

I would be shocked if those two programs performed the same. C and Rust are using the same compiler backend; better aliasing information probably isn't going to make that big of a difference.

Are you sure the Rust performance data isn't for one of the other implementations that use the same crufty tricks as the C version?

e.g.: https://benchmarksgame-team.pages.debian.net/benchmarksgame/..., https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
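To make the aliasing point concrete: in Rust, a &mut reference is guaranteed not to overlap any other live reference, so a function like the hypothetical add_scaled below tells the compiler that dst and src are distinct memory. In C, the same signature with plain pointers may alias unless the author adds restrict. rustc can forward this guarantee to LLVM as noalias; how much it changes the generated code depends on the compiler version and the code, which is exactly what is being debated here.

```rust
/// dst += k * src, component-wise (hypothetical helper).
/// Because dst is &mut and src is &, the borrow checker guarantees they
/// do not overlap, so the compiler may keep src's values in registers
/// across the writes to dst. The C signature
/// void add_scaled(double *dst, const double *src, double k)
/// gives no such guarantee without `restrict`.
fn add_scaled(dst: &mut [f64; 3], src: &[f64; 3], k: f64) {
    for (d, s) in dst.iter_mut().zip(src.iter()) {
        *d += k * s;
    }
}

fn main() {
    let mut velocity = [1.0, 2.0, 3.0];
    let accel = [0.5, 0.5, 0.5];
    add_scaled(&mut velocity, &accel, 2.0);
    println!("{:?}", velocity); // [2.0, 3.0, 4.0]
}
```

Calling add_scaled(&mut v, &v, 1.0) on the same array does not compile, because the borrow checker rejects the overlapping borrows; that rejection is the guarantee the optimizer gets to rely on.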

pjscott (OP) 5y ago

It's pretty startling, but yes, that non-crufty version is "Rust #8" that tops the n-body performance lists:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

It looks like the autovectorizer did a really good job on this one.

creato 5y ago

It also looks like this version is using an algorithm that none of the others use: it precomputes the distance pairs for all bodies. Some of the others precompute the vectors between the pairs, but that's not the expensive part of computing the distance.
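A minimal sketch of the "precompute all pairs" structure described above, assuming 5 bodies with positions stored as [f64; 3]: one pass fills an array of displacement vectors for every (i, j) pair with i < j, and a second flat pass reduces each to a squared distance. The names are hypothetical and this is not the Rust #8 source; it only shows that the working arrays are sized by the pair count rather than the body count.

```rust
const BODIES: usize = 5;
/// Number of unordered pairs (i, j) with i < j.
const PAIRS: usize = BODIES * (BODIES - 1) / 2;

/// First pass: displacement vector for every pair, in the fixed order
/// (0,1), (0,2), (0,3), (0,4), (1,2), ..., (3,4).
fn pair_displacements(pos: &[[f64; 3]; BODIES]) -> [[f64; 3]; PAIRS] {
    let mut dx = [[0.0; 3]; PAIRS];
    let mut k = 0;
    for i in 0..BODIES {
        for j in (i + 1)..BODIES {
            for c in 0..3 {
                dx[k][c] = pos[i][c] - pos[j][c];
            }
            k += 1;
        }
    }
    dx
}

/// Second pass: squared distance for every pair, a flat loop over PAIRS.
fn pair_distances_squared(dx: &[[f64; 3]; PAIRS]) -> [f64; PAIRS] {
    let mut d2 = [0.0; PAIRS];
    for (out, v) in d2.iter_mut().zip(dx.iter()) {
        *out = v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
    }
    d2
}

fn main() {
    let pos = [
        [0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0],
        [0.0, 2.0, 0.0],
        [0.0, 0.0, 3.0],
        [1.0, 1.0, 1.0],
    ];
    let d2 = pair_distances_squared(&pair_displacements(&pos));
    println!("{} pairs: {:?}", PAIRS, d2);
}
```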

As an aside: this made me notice this n-body simulation only has 5 bodies! This is a pretty strange case that makes these O(n^2) optimizations practical. The n-body simulations I've been familiar with in the past had thousands of bodies, where this approach probably isn't a good idea.
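The quadratic growth here is just the pair count. A quick worked check of why per-pair arrays stay tiny for this benchmark but not for a simulation with thousands of bodies:

```latex
\text{pairs}(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad \text{pairs}(5) = 10,
\qquad \text{pairs}(1000) = 499\,500
```

With 5 bodies the precomputed arrays hold 10 entries; with a thousand bodies they would hold about half a million entries per time step.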

FartyMcFarter 5y ago

Good point! It would be interesting to find out where the Rust version gets most of its speed from.

neopallium 5y ago

The Rust compiler can auto-vectorize loop code.

This blog post shows how to write simple idiomatic Rust code that will allow the compiler to auto-vectorize:

http://cliffle.com/p/dangerust/6/

jcelerier 5y ago

But C and C++ compilers can also autovectorize quite well. For instance, I had some SSE and AVX algorithms where, in C++, the compiler ended up doing the same or a slightly better job.

So maybe it's just the C benchmark being a cargo cult.

amelius 5y ago

Yes, it could be mostly LLVM doing the heavy lifting here, for all we know.
