Rust avoids those from the start by making slices idiomatic.
Another thing I commonly see is the use of suboptimal containers (like arrays with linear search), just because there's no better alternative at hand (the standard library doesn't offer one and dependency management is messy). Which also makes it less surprising that code in higher-level languages might perform better.
The reason that C programs often don't perform as well as an equivalent Rust program[1] is that it's so incredibly hard to do anything at all in C (especially anything reliable) that one can usually only do the simplest thing possible, and this typically means simple data structures, simple algorithms, and arrays.
[1] Suppose the programs are independently created at the same time by programmers of equal ability and are idiomatic in their languages. If a C program is rewritten in Rust, it is likely to be faster, but that would likely also be true if it were rewritten in C.
Bryan Cantrill talks[1] about exactly this: in his C version he was using AVL trees because they are easy to implement, while his Rust version was using B-trees just because he could. And as a result, his naive first attempt in Rust outperformed his long-optimized C code.
Reliability is very often the name of the game with C, and part of the reason you might see it written in such a simplistic fashion. In embedded systems, we often follow very strict coding conventions that impose hard requirements on how a C program is to be written. This includes everything from how memory is to be allocated, the maximum number of local variables, the maximum number of arguments, various limits on a function, constraints on handling errors, etc. We do this because it minimizes potential hazards, especially when working with limited memory and system resources.
C is an unforgiving language in that mistakes can occur silently and on mission critical hardware these mistakes can cost more than just your project milestones. These programs are very specialized and often have complex algorithms associated with them. Most of the systems I've worked with include various kinds of feedback control systems, and accompanying algorithms. If you're familiar with control systems, you'll know these algorithms are certainly not trivial by any means.
The challenge with C is that, as a developer, your C code needs to be perfect. Bugs just aren't an option like they are in other environments. Once a specialized piece of hardware ships it needs to work as intended under a myriad of conditions without powering off for the next 30 years (not always the case, but more often than you might think). Sometimes this kind of software is going to be put in a position it may never have been designed for. The best we can do is try to add various check/support mechanisms both in software and in hardware, and keep the software as safe and hazard-free as possible.
In terms of performance, again coming from an embedded environment, there is almost always a spec we are aiming for. It needs to do X tasks in N time, for example. Of course, high-level design questions like whether to poll or wait for interrupt, or how to broker data from shared resources, are decided long before you get to the question of how should I, or if I even should, compare two strings. It is nevertheless advisable to go with a trivial solution if it satisfies the requirements.
Is C time-consuming to write? Is C a "hard" language? These are subjective and depend on the nature of the project. I would certainly never use C with only libc to write a webserver that's going to be serving SaaS Co's backend API.
I assume most C++ programs heavily use more of these advanced data structures, correct?
In many programs in many domains the sizes of these data structures will rarely exceed this limit.
I have no idea whether that matters or is even easy to measure...
Doesn't that mean that null-terminated strings are idiomatic C? That is, my understanding of the term idiomatic is that it is defined by whatever is most natural to users of a language, regardless of whether it is the most performant.
It would be the equivalent of teaching people to write books without encouraging them to read anything.
To break myself of the habit I started reading some well regarded programs for fun. And oh boy, have I learned a lot from doing so. One of my first discoveries was this beauty in the Redis source code:
https://github.com/redis/redis/blob/3.0/src/sds.h
The idea is to have a string struct that stores its length and content. But the pointer passed around is a pointer to the (null terminated) contents field in the struct. The string is efficient for internal calls (the length can be queried by subtracting from the pointer). But the pointer is also an idiomatic null-terminated C string pointer, compatible with the standard library and everything else. (typedef char *sds;)
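A stripped-down sketch of that layout (the real sds header has more fields and several size-optimized variants, so treat this as an illustration of the trick, not Redis's actual code):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* The length header sits immediately before the character data, and
 * the handle passed around points at the data itself, so it doubles
 * as a plain NUL-terminated C string. */
typedef char *sds;

struct sdshdr {
    size_t len;
    char   buf[];   /* flexible array member: NUL-terminated contents */
};

sds sdsnew(const char *init) {
    size_t len = strlen(init);
    struct sdshdr *h = malloc(sizeof *h + len + 1);
    if (!h) return NULL;
    h->len = len;
    memcpy(h->buf, init, len + 1);  /* copy including the trailing NUL */
    return h->buf;                  /* hand out the contents pointer */
}

/* O(1) length: step back from the contents pointer to the header. */
size_t sdslen(sds s) {
    struct sdshdr *h = (struct sdshdr *)(s - offsetof(struct sdshdr, buf));
    return h->len;
}

void sdsfree(sds s) {
    free(s - offsetof(struct sdshdr, buf));
}
```

Any function expecting a `const char *` accepts an `sds` unchanged, which is the whole point.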
Dovecot is also a gem to read if you're looking for inspiration. The way it manages memory pools is delightful - and I'm sure much more performant than idiomatic rust. (That is, without reaching for arena allocator crates and alternate implementations of Box and Vec).
When I once tested std::sort against qsort (sorting 4-byte integers) I measured a 2x difference. So yes, definitely non-trivial, but it won't get much worse than that.
Have you ever seen a program that was slow because of a slow sorting routine?
If you ever need a fast sort (~ never) then the last thing you should do is use std::sort anyway. You should figure out what your data looks like and hand roll an implementation. For example, a radix sort is often possible to use, easy to implement, and much faster than std::sort.
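For instance, here is a minimal LSD radix sort for 32-bit unsigned keys (a sketch, not tuned): four counting passes over 8-bit digits, no comparisons at all.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stable least-significant-digit radix sort: each pass distributes
 * elements by one byte of the key, so after four passes the array is
 * fully sorted. O(n) per pass, no comparison function in sight. */
void radix_sort_u32(uint32_t *a, size_t n) {
    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (!tmp) return;
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[257] = {0};
        for (size_t i = 0; i < n; i++)           /* histogram this byte */
            count[((a[i] >> shift) & 0xFFu) + 1]++;
        for (int b = 0; b < 256; b++)            /* prefix sums -> offsets */
            count[b + 1] += count[b];
        for (size_t i = 0; i < n; i++)           /* stable scatter */
            tmp[count[(a[i] >> shift) & 0xFFu]++] = a[i];
        memcpy(a, tmp, n * sizeof *a);
    }
    free(tmp);
}
```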
What is meant here? Fast? Reliable? Secure? Memory efficient? Power efficient? Easy to write? Easy to maintain? Quick to compile? Easy to debug?
I have no illusions about the reliability/security/correctness of real-world C code, especially since it's usually not compiled with a memory-safe compiler or run with memory-safe libraries and runtime environments (though it's often sandboxed to limit the damage.) It's relatively easy to introduce memory errors which are not detected by the compiler or runtime.
Certainly many algorithms and data structures (in C and other languages) exhibit tradeoffs including things like speed vs. memory use vs. code size vs. complexity, etc..
But C compilers are pretty fast, which I really like. Then there are/were environments like Turbo Pascal or Think C, which seem to have been amazingly compact while offering a rapid edit-compile-debug cycle as well as decent runtime speed and code size.
I agree, C is a really poor choice for string or other human readable content handling.
But if you have to read-shift-mask-write bytes to hardware control registers, it is a pretty easy language to use, and faster than assembly language in 90% of the cases, with modern compilers. That last note was less true in the mid 1980s, but by the mid-1990s, compiler optimization had gotten to be quite good.
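A sketch of what read-shift-mask-write looks like; the register address, field name, and layout here are invented for illustration:

```c
#include <stdint.h>

#define CTRL_REG_ADDR  0x40021000u               /* hypothetical peripheral */
#define CLKDIV_SHIFT   4
#define CLKDIV_MASK    (0x7u << CLKDIV_SHIFT)    /* 3-bit field, bits 6:4 */

/* Pure helper: clear the field, then insert the new value. */
static inline uint32_t set_clkdiv(uint32_t reg, uint32_t div) {
    reg &= ~CLKDIV_MASK;
    reg |= (div << CLKDIV_SHIFT) & CLKDIV_MASK;
    return reg;
}

/* On real hardware the read-modify-write goes through a volatile
 * pointer so the compiler can't elide or reorder the access:
 *   volatile uint32_t *ctrl = (volatile uint32_t *)CTRL_REG_ADDR;
 *   *ctrl = set_clkdiv(*ctrl, 5);
 */
```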
Cellular modem code that runs on DSPs is written in C. Maybe someday that will be written in Rust. We'll see.
for (int i=0; i < 10; i++)
We all instantly know this means loop 10 times, from 0 through 9. But on first read it takes a little work to figure out what's happening.

There is a culture of bad C code from the 90s - especially C code written in OOP-y ways where that was never necessary. Like calling malloc() + free() for every little thing instead of using more structured memory management. A lot of that code was written in C not with an efficiency or elegance mindset but simply because C was the language that you wrote programs in.
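One common "structured" alternative to a malloc/free pair per object is a bump-pointer arena: one allocation up front, many cheap sub-allocations, one free at the end. A minimal sketch (names are mine; no growth, no per-object free):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    char  *base;
    size_t used, cap;
} arena;

int arena_init(arena *a, size_t cap) {
    a->base = malloc(cap);      /* the one real allocation */
    a->used = 0;
    a->cap  = cap;
    return a->base != NULL;
}

void *arena_alloc(arena *a, size_t size) {
    size = (size + 15) & ~(size_t)15;   /* round up for 16-byte alignment */
    if (a->used + size > a->cap) return NULL;
    void *p = a->base + a->used;        /* just bump a pointer */
    a->used += size;
    return p;
}

void arena_release(arena *a) {          /* frees everything at once */
    free(a->base);
    a->base = NULL;
    a->used = a->cap = 0;
}
```

Allocation is a bump and a bounds check, and teardown is a single free, which is also why this style is popular in embedded and server code.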
Do not tie computational operations to string operations in C.
Pick one of many string manipulation libraries to do any serious work with them.
[1] https://github.com/rust-lang/rust/issues/54878#issuecomment-...
Still, real world performance will probably benefit from this fix so it's a positive change regardless.
Where I expect Rust to really shine performance-wise is at the larger scale of real code, where it affords code that copies less often. In C, the programmer often isn't sure in a particular function whether or not they own a value, so they just have to take a copy, or the compiler can't work out aliasing, etc. Ensuring at scale that you don't take extra copies, don't have an aliasing problem, and don't copy things just to "be sure" in multithreaded situations is hard, and it drains a lot of performance.
Most C++ STL algorithms take iterators, which in many benchmarks are just pointers because the benchmarks often only deal with arrays. Hell, most C standard APIs (e.g. memcpy) take multiple pointers.
I've done it when we really needed the performance for an inner loop (particle system). It can be a real bastard to keep the no-aliasing constraint holding in a large, multi-person codebase, and the error cases are really gnarly to chase down.
Compare that to Rust which has this knowledge built in since it naturally falls out of the ownership model.
LLVM implements only coarse per-function aliasing information needed by C, and doesn't properly preserve fine-grained aliasing information that Rust can provide.
And how do you know this? Have you actually measured the difference?
My CPU-heavy program (which takes ~30 seconds to run on a 32-core Threadripper) gains ~15% extra performance if I turn it on.
That's like saying C programmers could just write memory safe code if they felt like this would help, so clearly memory safety is not important.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
On a quick inspection:
- The Rust code is about twice as long.
- The Rust code has CPU feature detection and SSE intrinsics, while the C code is more idiomatic.
- The lookup table is larger in the Rust code.
I wouldn't be surprised if there's something similar happening here.
I like PyPy as an example: on the surface, implementing a Python runtime in Python and expecting performance gains seems crazy. PyPy manages to outperform CPython because although a C implementation should theoretically be faster, realistically the increased expressiveness of Python lets the PyPy devs opt into optimizations the CPython devs find out of reach.
I don't know C or Rust well enough to comment on these specific scenarios, but if two technologies can be fast, and one makes that speed accessible while the other encourages implementors to leave performance on the table, that's much more useful information to me than seeing a back-and-forth where folks compete to implement optimizations I'll never make in my own projects.
1. http://www.ats-lang.org/ 2. http://web.archive.org/web/20121218042116/http://shootout.al... 3. https://stackoverflow.com/questions/26958969/why-was-the-ats...
In this case it seems like benchmark code is allowed to use intrinsics, which can degenerate into a situation where a benchmark in language X is more "glorified x86 Assembly code" than actual code in language X.
This is not very useful for comparing languages IMO. Especially since all of Rust, C, C++ can use this strategy and become almost identical in both code and performance.
It is not idiomatic Haskell at all and loses all of the touted benefits.
Is there a separate benchmark that only accepts idiomatic code?
Am I missing something? Is there a list somewhere of the people who have contributed to this?
These benchmarks are almost always either useless or a scam - you either end up rewriting the same implementation n times or you don't utilize the capabilities of the language; either way you're not really measuring much of anything intrinsic to the language itself. Rust and C both have the same backends, and if you really care about performance you're going to take it to the max anyway, so inference by the compiler isn't that important.
• https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
• https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
• https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
It's possible that someone has already submitted both algorithms for both languages, and different approaches won for language-specific reasons.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
None of them have SSE intrinsics or are quite as long as the Rust version.
I find it doubtful that SSE intrinsics wouldn't help the C version, if they are indeed helping the Rust version. This seems fairly easy to check as the Rust version has a non-SSE fallback code path - I'd do it myself but am not able to at the moment.
Well, it looks like the submission process[0] and the maintainer[1] do, actually.
[0]: https://www.reddit.com/r/rust/comments/kpqmrh/rust_is_now_ov... [1]: https://www.reddit.com/r/rust/comments/kpqmrh/rust_is_now_ov...
In addition to what you've already mentioned, if I'm not mistaken, the programs also take different multi-threading approaches. One of the tasks in this benchmark involves reversing three very large strings. My C program took a fairly simple approach: it reverses each string serially but processes multiple strings in parallel. The Rust program does the opposite: it parallelizes the reversal of each string but only processes one string at a time. The algorithm the Rust program uses is more complicated, which results in a considerably larger program that is also a bit less flexible regarding input (it assumes each line of input is 60 characters, whereas my program will work with lines of any length), but it achieves better CPU utilization, less wall-clock time, and less memory use. I've been meaning to submit a faster C program using this faster algorithm but have simply been too busy with other things.
If I had to write a performant library at work, I too might rely on CPU-specific assembly wrappers in my code. But IMO, such code has no place in a general-purpose cross-language benchmark site.
My guess is that other people would take your "performant library at work" premise as justification for including that code.
C code that pretends it does not need to care about its platform is not idiomatic; it's just suboptimal.
- Code that most C books/courses would teach you how to write
- Portable C code (arguably portability is one of C's biggest successes!)
- Code that you'd expect to find in the K&R book
These are the kind of things that make working in one language different from another.
I'd actually like to see the SSE version in C so I can compare the two implementations and how much grief you have to go through.
Some years ago I submitted an n-body implementation that used the crunchy crate to unroll the inner loop. This was rejected as being non-idiomatic and obscuring what compilers are/aren't capable of. Rust is currently leading the benchmark because someone added the flag -C llvm-args='-unroll-threshold=500' which achieves the same effect. Why one of these is acceptable and the other isn't is beyond me, and all of this makes the whole project very discouraging.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
What impresses me is that the Rust version didn't do any of that stuff, just wrote very boring, straightforward code -- and got the same speed anyway. Some impressive compilation there!
1. https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
What I don't fully understand is: "GCC has the option -fstrict-aliasing which enables aliasing optimizations globally and expects you to ensure that nothing gets illegally aliased. This optimization is enabled for -O2 and -O3 I believe." (source: https://stackoverflow.com/a/7298596)
Doesn't this mean that C++ programs compiled in release mode behave as if all pointers are marked with __restrict?
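As I understand it, not quite: -fstrict-aliasing is a type-based rule (pointers to *different* types are assumed not to alias), while __restrict is a per-pointer promise that even same-typed pointers don't overlap. A sketch of the type-based assumption (the function name is mine):

```c
#include <stdint.h>

/* Under -fstrict-aliasing the compiler may assume the store through
 * fp cannot modify *ip, because float and int32_t are incompatible
 * types, so the return can be constant-folded to 1. Two int32_t*
 * parameters, by contrast, may still alias each other unless marked
 * restrict. Calling this with actually-aliased pointers would be UB. */
int32_t type_based(int32_t *ip, float *fp) {
    *ip = 1;
    *fp = 2.0f;   /* assumed not to touch *ip */
    return *ip;   /* foldable to 1 without a reload */
}
```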
I don't think there is a fair comparison between Rust and C: C is just a higher-level assembler, and if the programmer knows what he's doing, he can use the hardware to 100% of its potential. That is the reason C is still used in all the embedded applications where you have ridiculously low-power microcontrollers and you must squeeze out the best performance.
That is the difference between C and Rust to me: for each fast Rust program you are guaranteed that you can write an equivalent performant program in C (or assembly). Worst case scenario you use inline assembly in C and you get that.
But the converse is not true for Rust: if I give you a heavily optimized C program, you cannot always produce an equivalent version in Rust.
Also, these optimizations are not always what you want. In C you can choose the level of optimization, and most of the time, at least for the programs I write, I choose a low level of optimization. The reason is that a lot of the time performance is not the only thing that matters; what may matter most is the stability of the code (and code compiled with optimizations is more likely to contain bugs) or the ability to debug (and thus the readability of the assembly output of the compiler).
Rust emits horrible assembly code that is impossible to debug or to check for correctness. You just have to hope that the compiler doesn't contain bugs. For the same reason, Rust is the ideal language for writing viruses, since it's difficult to reverse engineer.
[1]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
A simpler example of this is Fortran, which can be faster for numerical loads because it disallows aliasing of function arguments. C, on the other hand, must pay the price of being conservative with how it treats arguments, just in case they overlap.
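C99's restrict qualifier lets a C function opt into the Fortran-style guarantee per pointer. A minimal sketch (the function name is mine):

```c
#include <stddef.h>

/* With restrict, the compiler may assume x and y never overlap -- the
 * same guarantee Fortran gives for dummy arguments -- which enables
 * vectorization without runtime overlap checks. Passing overlapping
 * pointers here would be undefined behavior. */
void saxpy(size_t n, float a,
           const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];   /* x[i] loads can't be clobbered by y stores */
}
```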
Rust is a good lang though. I am glad something else is pushing up there for the top spots. And more competition in performance is a good thing.
You don't always have to pick your sides. I just want to be able to write good code and be happy writing it :)
OpenMP is only available for C, C++, and Fortran, which is why most programs won't use it. However, most other programming languages have their own ways of doing multi-threading, and many of the programs for this benchmark do make use of them.
The rules at https://benchmarksgame-team.pages.debian.net/benchmarksgame/... request that submitters use the same algorithms as existing programs and I try to follow that rule. I believe all the binary-trees programs are using recursive functions so naturally I did the same to avoid breaking the rule about using different algorithms.
But yes, some of these do provide hints for the compiler, e.g. constexpr, ownership semantics per unique_ptr. There's nothing stopping a human from writing equivalent C, so my suspicion is that the performance gap is primarily due to the benchmark implementation.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
I know that using PCRE2 in a Python program isn't typical, but the Benchmarks Game does allow using other libraries, and many of the previously submitted programs have been doing this for a long time already. The pidigits benchmark is the other benchmark that is heavily dependent on the libraries being used, as is illustrated by there being a nearly ten-way tie for second place among a bunch of programs that all use GMP.
(Looking at just C gcc vs Rust) https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
I tried running the C code with 2 and 4 threads, wall-time and CPU time don't change much in either case which is strange (this is cygwin with gcc 10.2.0):
2 threads:
$ /usr/bin/gcc -pipe -Wall -O3 -fomit-frame-pointer -march=ivybridge -fopenmp fasta.c -o fasta.gcc-2.gcc_run && time ./fasta.gcc-2.gcc_run 25000000 | md5sum
fd55b9e8011c781131046b6dd87511e1 *-
real 0m0.724s
user 0m1.468s
sys 0m0.108s
4 threads: $ /usr/bin/gcc -pipe -Wall -O3 -fomit-frame-pointer -march=ivybridge -fopenmp fasta.c -o fasta.gcc-2.gcc_run && time ./fasta.gcc-2.gcc_run 25000000 | md5sum
fd55b9e8011c781131046b6dd87511e1 *-
real 0m0.670s
user 0m1.514s
sys 0m0.046s

I know that Python 3 can do some runtime stuff that Node.js can't, but I wonder whether that's worth so much performance. Maybe the answer is that you would include C modules in Python if you need the speed, but I don't know if that's a good answer to the problem.
Examples? This is too interesting to just throw in then hand wave away.
There are probably others as well, but this is the advantage I'm familiar with. C's ambiguity makes it harder to achieve some optimizations that can really matter on modern CPUs.
The tracking bug is https://github.com/rust-lang/rust/issues/54878
And then there are the whole LLVM and GCC based eco-systems.
It is gratifying to see C++ identified, here, as the hands-down fastest implementation language, but odd to see Rust performance still compared, in the headline, to C, as if that were the goal. The headline should say that Rust speed is approaching C++ speed. In principle, Rust speed should someday exceed C++'s, in some cases, because it leaves behind some boat anchors C++ must retain for backward compatibility. In particular, if compiler optimizers could act on what they have been told about language semantics, they should be able to do optimizations they could not do on C++ code. As it is, Rust relies on optimizers coded for C and C++.
Rust may never reliably beat C++, because Rust has itself committed to details that interfere with optimization, principally in its standard library: The C++ Standard Library offers more knobs for tuning, while the Rust libraries are simpler to use.
Some of the reasons that C++ is so reliably faster than C are subtle and, to some, surprising. As noted elsewhere in this thread, the compiler knows more about what the language is doing, and can act on that knowledge, but C and C++ compilers share their optimizer, so that makes less difference than one might guess. Mainly, the C++ code does many things you might do in C, but in exactly one way that the optimizer can recognize easily.
The biggest reason why C++ is so much faster than C is that C++ can capture optimizations in libraries and reliably deliver those optimizations to library users. The result is that people who write C++ libraries pay a great deal of attention to performance because it pays, and that attention directly benefits all users of the library, including other libraries.
Because you can capture more semantics in C++ libraries, C++ programs naturally use better algorithms than C programs can. C programs much more frequently use pointer-chasing data structures because those are easier to put into a C library, or quicker to open-code, where the corresponding C++ program will use a library that has been hand-tuned for performance without giving up anything else.
Rust gets to claim many of the same benefits, because it is also more expressive than C, and Rust libraries are often as carefully tuned. Rust is not as expressive as C++, yet, and has many, many fewer people working to produce optimal libraries, but it is doing well with what it has.
One example: a C rewrite of the popular chess engine Stockfish (written in C++) is significantly faster (10%-30%, depending on who measures it, at which time, and on what hardware). This is a piece of code that had already been heavily optimized for many years, as speed is critical for the engine's performance. One guy (although a very talented one) rewrote it in C and got the speed gains. Another guy (also very talented) rewrote it in assembly and got even bigger speed gains (see the CFish and asmFish projects).
It looks to me like you are comparing badly written C to decently written C++ and claiming speed gains. I use C for my own software (a solver for a popular game) and I wouldn't dream of using things like "pointer-chasing structures". If you need something like a hash table in your performance-critical code, you have to code it yourself anyway, as using a generic hash and memory handling algorithm is a recipe for being slow.
I suspect surprising factors are in play.
For instance, in real-world usage, you might do a bit more redundant copying in C in places you would avoid it with the borrow checker in Rust.
Conversely, enforcing RAII in Rust might cost performance in some scenarios relative to C.
The smallest "hello world" binary rustc has ever produced was 137 bytes. https://github.com/tormol/tiny-rust-executable
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
2) The home page (click the banner) has links which compare benchmarks across two language implementations.
3) Each of those comparison pages has links which compare all the programs, for all the language implementations, for each benchmark.
If the Rust implementation is faster than C's, kudos go to the compiler, not to the language.
Both C and Rust are capable of just inlining optimal assembly, so comparing "pure speed potential" is pointless.
Space.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Why is Julia using 250x the memory? (And it's still fast.)
If I wrap things in a function and run it a second time in the repl, this is what I get:
$ JULIA_LLVM_ARGS="-unroll-threshold=500" ~/julia-b00e9f0bac/bin/julia -O3 --check-bounds=no
julia> include("nbody.jl")
run (generic function with 1 method)
julia> @time run(50000000)
-0.169075164
-0.169059907
2.044478 seconds (294.25 k allocations: 12.599 MiB, 8.89% compilation time)
julia> @time run(50000000)
-0.169075164
-0.169059907
1.867936 seconds (12 allocations: 1.750 KiB)
So, the second time there's no compilation overhead, just like in Rust, C, C++.

If you are willing to spend the time to perform at the highest level, Rust can bring you there.
I don't get those. What's the difference?
Basically, it's toy programs written in each of these languages. They measure how fast each one executes. This methodology does have its limitations, which that page is upfront about.
Also is there also a way to filter out posts with the word "Rust" in it so I don't see them?
You are correct that the C feature is often banned. Rust doesn’t even support it at all.
There are finite limits to the complexity cost developers are willing to pay for performance. Because the cost in each of these three languages is different to express some thing, sometimes there will be a threshold where in one or more of these languages most developers will choose a less optimal design. This manifests as practical performance differences in real software even though in theory they are equally expressive with enough effort.
This comes with another tradeoff. Efficient expressiveness relative to software performance comes at a cost of language complexity. C is a simple language that has enough efficient expressiveness for simple software architectures. C++ is at the opposite extreme: you can do mind-boggling magic with its metaprogramming facilities that can express almost anything optimally, but god help you if you are trying to learn how to do this yourself. Rust sits in the middle; much more capable than C, not as expressive as C++.
This suggests the appropriate language is partly a function of the investment a developer is willing to make in learning a language and what they need to do with it. Today, I use modern C++ even though it is a very (unnecessarily) complex language and write very complex software. Once you pay the steep price of learning C++ well, you acutely feel the limitations of what other languages can't express easily when it comes to high performance software design.
I used to write a lot of C. It still has critical niches but not for high-performance code in 2021, giving up far too much expressiveness. C++17/20 is incredibly powerful but few developers really learn how to wield that power, though usage of it has been growing rapidly in industry as scale and efficiency have become more important. Rust is in many ways an heir apparent to C and/or Java, for different reasons. Or at least, that is how I loosely categorize them in my head, having a moderate amount of contact with all three. They all have use cases where you probably wouldn't want to use the others.
Yep. Rust's safety means that in some cases, you can be more aggressive because you know the compiler has your back. And in other cases, it makes harder things tractable. The example of Stylo is instructive; Mozilla tried multiple times to pull the architecture off in C++, but couldn't manage to do it until Rust.
Heh, well. The downvotes are clearly trying to tell me something, though I'm not sure what. I can guess. I assumed that this was something widely agreed upon at this point but clearly I assumed incorrectly.
I think ATS is in that category too but it was removed from the benchmarks game at some point. It's also vastly more esoteric and complicated than Rust is from my limited knowledge. Perhaps someday Zig will get there as well. I wouldn't have expected that this niche would ever see so much exploration, as C and C++ were the only game in town for so long.
(*) of course part of the output is also looking at the Rust code and evaluating style and `unsafe`-wise what the price for winning was.