Yes. Today, I integrated two parts of a 3D graphics program. One refreshes the screen and lets you move the viewpoint around. The other loads new objects into the scene. Until today, all the objects were loaded, then the graphics window went live. Today, I made those operations run in parallel, so the window comes up with just the sky and ground, and over the next few seconds, the scene loads, visibly, without reducing the frame rate.
This took about 10 lines of code changes in Rust. It worked the first time it compiled.
How did you do that in Rust? Doesn't one of those threads have to own the scene at a time? Or is there a way to make that exclusive ownership more granular?
I'm using Rend3, which is a 3D graphics library for Rust that uses Vulkan underneath. Rend3 takes care of memory allocation in the GPU, which Vulkan leaves to the caller, and it handles all the GPU communication. The Rend3 user has to create all the vertex buffers, normal buffers, texture maps, etc., and send them to Rend3 to be sent to the GPU. It's a light, safe abstraction over Vulkan.
This is where Rust's move semantics and ownership transfer help. The thread that's creating objects to be displayed builds the big vertex buffers, etc., and then asks Rend3 to turn them into a "mesh object", "texture object", or "material object". That involves some locking in Rend3, mostly around GPU memory allocation. Then the loader puts them together into an "object" and tells Rend3 to add it to the display list. This puts it on a work queue. At the beginning of the next frame, the render loop reads the work queue, adds and deletes items from the display list, and resumes drawing the scene.
Locking is brief, just the microseconds needed for adding things to lists. The big objects are handed off across threads, not recopied. Adding objects does not slow down the frame rate. That was the trouble with the previous system: redraw and new-object processing ran in the same thread, so incoming updates stole time from the redraw cycle.
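The handoff described above can be sketched with a standard-library channel; the `Mesh` type and buffer size here are hypothetical stand-ins, not Rend3's actual API. Sending a value moves only the Vec's (pointer, len, capacity) header across the channel; the heap data itself is never copied:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a large vertex buffer.
struct Mesh {
    vertices: Vec<[f32; 3]>,
}

// Loader thread: build the big buffer, then hand ownership off.
fn loader(tx: mpsc::Sender<Mesh>) {
    let mesh = Mesh { vertices: vec![[0.0, 0.0, 0.0]; 100_000] };
    tx.send(mesh).unwrap(); // move, not copy: heap data stays in place
}

// Called at the start of each frame: drain whatever the loader has
// finished. try_iter never blocks, so the frame rate is unaffected.
fn drain_work_queue(rx: &mpsc::Receiver<Mesh>) -> usize {
    rx.try_iter().map(|m| m.vertices.len()).sum()
}

fn main() {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || loader(tx)).join().unwrap();
    println!("vertices received: {}", drain_work_queue(&rx));
}
```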
If this was in C++, I'd be spending half my time in the debugger. In Rust, I haven't needed a debugger. My own code is 100% safe Rust.
Arc is an atomic reference counter that allows shared ownership across threads. And the nested Mutex enforces exclusive access: only one mutable borrow at a time.
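A minimal sketch of that shape (the names are made up): the Arc is cloned once per thread, and the Mutex guard scopes the brief critical section:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Append to a shared display list; the lock is held only for the push.
fn add_object(list: &Mutex<Vec<String>>, name: &str) {
    list.lock().unwrap().push(name.to_string());
}

fn main() {
    // Arc: shared ownership across threads; Mutex: exclusive access.
    let display_list = Arc::new(Mutex::new(Vec::new()));

    let for_loader = Arc::clone(&display_list);
    let loader = thread::spawn(move || add_object(&for_loader, "teapot"));
    loader.join().unwrap();

    assert_eq!(display_list.lock().unwrap().len(), 1);
}
```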
Rust "works badly with memory mapped files" doesn't mean, "Rust can't use memory mapped files." It means, "it is difficult to reconcile Rust's safety story with memory maps." ripgrep for example uses memory maps because they are faster sometimes, and its safety contract[3] is a bit strained. But it works.
[1] - https://github.com/BurntSushi/fst/
[2] - https://github.com/BurntSushi/imdb-rename
[3] - https://docs.rs/grep-searcher/0.1.7/grep_searcher/struct.Mma...
You can mmap files in Rust just fine, but it’s generally as dangerous as it is in C.
These issues about multiple processes and distributed systems are framework- and OS-level concerns. Rust helps you build fast concurrent solutions to those problems, but you're correct that it cannot solve problems exterior to the application runtime. How is that a deficiency in Rust?
The most important design decision when writing a parallel algorithm is deciding below what amount of data the parallelism is not worth it.
I don't get the obsession with parallel code in low-level languages, by the way. If you have an architecture where you can afford real parallelism, you can afford higher-level languages anyway.
In embedded applications you usually don't have the possibility of running parallel code, and even in low-level software (the classic UNIX utilities, for example), using a single thread is fine for simplicity and robustness.
Threads also are not as portable as they seem: different operating systems manage threads in different ways, and some don't support threads at all.
This isn't an "obsession." It's engineering.
[1] - I make this claim loosely. Absence of evidence isn't evidence of absence and all that. But if I saw ripgrep implemented in, say, Python and it matched speed in the majority of cases, I would learn something.
Depends on which of the classic utilities you are talking about.
Many of them are typically IO bound. You might not get much out of throwing more CPU at them.
The primary reason C libraries do this is not safety but ABI compatibility. Rust eschews dynamic linking, which is why it doesn't bother. Common Lisp, for instance, does the same thing as C, for similar reasons: the layout of structures may change, and existing code in the image has to be able to deal with it.
> Rust by default can inline functions from the standard library, dependencies, and other compilation units. In C I'm sometimes reluctant to split files or use libraries, because it affects inlining
This is again because C is conventionally dynamically linked, and Rust statically linked. If you use LTO, cross-module inlining will happen.
Rust provides ABI compatibility against its C ABI, and if you want you can dynamically link against that. What Rust eschews is the insane fragile ABI compatibility of C++, which is a huge pain to deal with as a user:
https://community.kde.org/Policies/Binary_Compatibility_Issu...
I don't think we'll ever see as comprehensive an ABI out of Rust as we get out of C++, because exposing that much incidental complexity is a bad idea. Maybe we'll get some incremental improvements over time. Or maybe C ABIs are the sweet spot.
However, as the parent comment noted, you can enable LTO when compiling C. Since Rust is almost always statically linked, it essentially always gets LTO-style optimization.
If you have an API that allows the caller to instantiate a structure on the stack and pass a reference to it to your function, then the caller must now be recompiled when the size of that structure changes. If that API now resides in a separate dynamic library, then changing the size of the structure is an ABI-breaking change, regardless of the language.
If instead you’re referring to the fact that all the fields of a struct aren’t explicitly obvious when you have such a value, well I don’t really agree that it’s always what you want. A great thing about pattern matching with exhaustiveness checks is that it forces you to acknowledge that you don’t care about new record fields (though the Common Lisp way of dealing with this probably involves CLOS instead).
[1] some implementations may use NaN-boxing to get around this
Heap allocations, yes; pointer indirections no.
A structure is referenced by pointer no matter what. Remember that the stack is accessed via a stack pointer.
The performance cost is that there are no inline functions for a truly opaque type; everything goes through a function call. Indirect access through functions is the cost, which is worse than a mere pointer indirection.
An API has to be well designed in this regard; it has to anticipate the likely performance-critical use cases and avoid perpetrating a design in which the application has to make millions of API calls in an inner loop. Opaqueness is more abstract, so it keeps designers on their toes to create good abstractions, instead of "oh, the user has access to everything, so they have all the rope they need".
Opaque structures don't have to cost heap allocations either. An API can provide a way to ask "what is the size of this opaque type" and the client can then provide the memory, e.g. by using alloca on the stack. This is still future-proof against changes in the size, compared to a compile-time size taken from a "sizeof struct" in some header file. Another alternative is to have some worst-case size represented as a type. An example of this is the POSIX struct sockaddr_storage in the sockets API. Though the individual sockaddrs are not opaque, the concept of providing a non-opaque worst-case storage type for an opaque object would work fine.
There can be half-opaque types: part of the structure can be declared (e.g. via some struct type that is documented as "do not use in application code"). Inline functions use that for direct access to some common fields.
Sure, there are libraries which have `init(&struct, sizeof(struct))`. This adds extra ABI fragility, and doesn't hide fields unless the lib maintains two versions of a struct. Some libraries that started with such ABI end up adding extra fields behind internal indirection instead of breaking the ABI. This is of course all solvable, and there's no hard limit for C there. But different concerns nudge users towards different solutions. Rust doesn't have a stable ABI, so the laziest good way is to return by value and hope the constructor gets inlined. In C the solution that is both accepted as a decent practice and also the laziest is to return malloced opaque struct.
I'd like to point out that this is not always the case. Some libraries, especially those with embedded systems in mind, allow you to provide your own memory buffer (which might live on the stack), where the object should be constructed. Others allow you to pass your own allocator.
This made me laugh
Heartbleed wasn't caused by reusing buffers; it was caused by not properly sanitizing the length field from untrusted input and reading past the buffer's allocated size, thus allowing the attacker to read memory that wasn't meant for them.
... In rust I'd just declare an enum for this. Enums in Rust can store data. In this way they are like a safe union.
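For illustration, a sketch of such an enum, loosely modeled on the heartbeat scenario discussed above (the types and field names are made up): the compiler tracks which variant is live, so there is no way to misinterpret the bytes or read past the payload:

```rust
// A safe tagged union: each variant carries its own data.
enum Packet {
    Heartbeat { payload: Vec<u8>, declared_len: u16 },
    Close,
}

fn respond(p: &Packet) -> Option<Vec<u8>> {
    match p {
        // We can only echo bytes actually in the payload; slicing
        // past the end would panic rather than leak memory.
        Packet::Heartbeat { payload, declared_len } => {
            let n = (*declared_len as usize).min(payload.len());
            Some(payload[..n].to_vec())
        }
        Packet::Close => None,
    }
}

fn main() {
    // A lying declared_len yields at most the real payload.
    let p = Packet::Heartbeat { payload: vec![1, 2, 3], declared_len: 65535 };
    assert_eq!(respond(&p), Some(vec![1, 2, 3]));
}
```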
The issue with this is that 'clever' compilers can optimise out any memset calls you do.
I did a deep dive into this topic lately when exploring whether to add a language feature to Zig for this purpose. I found that, although finicky, LLVM is able to generate the desired machine code if you give it a simple enough while-loop continue expression[1]. So I think it's reasonable not to have a computed-goto language feature.
More details here, with lots of fun godbolt links: https://github.com/ziglang/zig/issues/8220
> C++, D, and Go have throw/catch exceptions, so foo() might throw an exception, and prevent bar() from being called. (Of course, even in Zig foo() could deadlock and prevent bar() from being called, but that can happen in any Turing-complete language.)
Well, you could bite the bullet and carefully make Zig non-Turing complete. (Or at least put Turing-completeness behind an escape hatch marked 'unsafe'.)
That's how Idris and Agda etc do it.
Languages like Idris and Agda are different because sometimes code isn’t executed at all. A proof may depend on knowing that some code will terminate without running it.
As you said though, this is finicky, and if you need this optimization for performance then you don’t want to rely on compiler heuristics.
However, in this specific instance at least, this isn't as optimal as it could be. What this is basically doing is creating a jump table to find out which branch it should go down. But, because all the functions have the same signature, and each branch does the same thing, what it could have done instead is create a jump table for the function to call. At that point, all it would need to do is use the Inst's discriminant to index into the jump table.
I'm not sure what it would look like in Zig, but it's not that hard to get that from Rust[1]. The drawback of doing it this way is that it now comes with the maintenance overhead of ensuring the order and length of the jump table exactly matches the enum, otherwise you get the wrong function being called, or an out-of-bounds panic. You also need to explicitly handle the End variant anyway because the called function can't return for its parent.
I don't know Zig, but from what I understand it has some pretty nice code generation, so maybe that could help with keeping the array and enum in step here?
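In Rust, the function-pointer table described above might look like the following sketch (a hypothetical three-instruction VM, not the code from the linked example); keeping `JUMP_TABLE` in step with the enum is exactly the maintenance hazard mentioned:

```rust
// A hypothetical three-instruction VM with same-signature handlers.
#[derive(Clone, Copy)]
enum Inst {
    Inc = 0,
    Double = 1,
    End = 2,
}

type Handler = fn(i64) -> i64;

fn inc(acc: i64) -> i64 { acc + 1 }
fn double(acc: i64) -> i64 { acc * 2 }

// Indexed by the enum's discriminant. Order and length must match
// the enum exactly, or the wrong handler gets called.
const JUMP_TABLE: [Handler; 2] = [inc, double];

fn run(program: &[Inst]) -> i64 {
    let mut acc = 0;
    for &inst in program {
        match inst {
            // End is handled explicitly: a handler can't return
            // on behalf of its caller.
            Inst::End => break,
            _ => acc = JUMP_TABLE[inst as usize](acc),
        }
    }
    acc
}

fn main() {
    let program = [Inst::Inc, Inst::Inc, Inst::Double, Inst::End];
    assert_eq!(run(&program), 4); // (0 + 1 + 1) * 2
}
```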
This reminds me of when I used to write supercomputing codes. Lots of programming language nerds would wonder why we didn't use functional models to simplify concurrency and parallelism. Our code was typically old-school C++ (FORTRAN was already falling out of use). The truth was that 1) the software architecture was explicitly single-threaded — some of the first modern thread-per-core designs — to maximize performance, obviating any concerns about mutability and concurrency, and 2) the primary performance bottlenecks tended to be memory bandwidth, of which functional programming paradigms tend to be relatively wasteful compared to something like C++. Consequently, C++ was actually simpler and higher performance for massively parallel computation, counterintuitively.
My experience with process-based parallelism is that yes on Linux it's basically isomorphic to thread-based parallelism. It's just so much more code to do the same thing.
In Rust adding a new special-purpose background thread with some standard-library channels is 30 lines of code and I can probably even access the same logging system from the other thread.
If I wanted to do that with processes I need to:
- Coordinate a shared memory file over command line arguments or make sure everything is fork-safe
- Find a library for shared-memory queues
- Deal with making sure that if either process crashes the other process goes down with it in a reasonable way.
- Make sure all my monitoring/logging is also hooked up to the other process.
If I want to use a shared memory data-structure with atomics I need to either not use pointers or live dangerously and try and memory-map it at the exact same offsets in each process and ensure I use a special allocator for things in the shared file.
Yes you can do all the same things with both approaches, I just find threads take way less code. It's not too bad if all your processes are doing the same thing, and you also need to scale to many servers anyhow. It's more annoying if you want to have a bunch of different types of special background processes.
I think what's nice about rust is that, because it makes it difficult to write thread-unsafe code, it's naturally easier to add threading at some point in the future without too much pain. As a result, more applications can benefit from having access to multiple CPU cores. I don't think that's quite the same thing as pure performance per watt, though. That really comes down to how the code was written, and how well the compiler can optimize it. Rust may have some advantages there over C, since it constrains what you can do so much that the compiler has a smaller state space to optimize over. Someone who knows what they're doing in C, though, could likely write very efficient code that effectively uses parallelism, and may gain an edge over rust simply by cleverly leveraging the relative lack of training wheels. For high performance compute, rust vs. C may be a wash. For consumer facing applications, though, the more programs that can use multiple cores to run faster (even if less efficiently), the better.
Do you happen to have a link to code that does this? This sounds similar to a problem I have right now and I’d love to see what solution you’ve arrived at.
Not my experience at all. One big problem is that most languages in 2021 have very, very poor support for thread-based parallelism. It’s crazy how many languages make it hard to do basic data parallel tasks. That steers people toward writing single threaded code and/or trying to rely on process-based parallelism which is basically strictly worse.
I’ve been writing parallel code at the largest scales most of my career. The state-of-the-art architectures are all, effectively, single-threaded with latency-hiding. This model has a lot of mechanical sympathy with real silicon which is why it is used. It is also pleasantly simple in practice.
Unfortunately, that covers only a certain subsection of problems; usually you want to be able to use parallel computation at the function-call level. That's where the parallelism support of Rust or Go shines: at each point in the program flow, you can decide to go parallel.
Why is that worse?
I very seldom use threads for concurrency; it creates monolithic binaries that are hard to maintain, configure, and understand.
I much prefer a process based architecture with mmap'd shared memories for interprocess communications.
Heavily multithreaded code is difficult to write correctly. Do it wrong and you wind up with race conditions, data corruption, deadlocks because a thread pool or other resource is exhausted, and thread leaks because you didn't shut something down correctly. The problems go on and on.
For example, I consider glyph drawing "performance optimized". It requires massive parallelism just to display text smoothly on a high-definition screen.
But most people will never see it, because they use a library that they call that does all the work for them and do not need to care about that.
The difference is tremendous. We are talking 100x more efficiency just using GPUs alone. You can get 1,000x or 10,000x with hardware acceleration (electronic chip design), at the cost of rigidity, expense, and time to market.
It is so big that it is a different level. It is not performance alone; some things are so inefficient that they are just not practical (like spending a million dollars on your energy bill in order to solve a problem).
The same happens, of course, with 3D, audio and video recognition, sensor I/O, and artificial intelligence.
Rust lets you prototype lots of code in a parallel way on the CPU, even for things that will run on an FPGA or ASIC in the future. It lets you transition in smaller steps: CPU -> GPU -> FPGA -> ASIC.
Why?
Edit: Thanks for all the replies. It seems this applies to data-parallel workloads only. I'd use a GPU for this. An RTX 3090 has around ~10,000 CUDA cores (~10,000 simultaneous operations) vs. just ~10 cores for CPUs.
This creates a new problem: how do you balance load across cores? What if the workload is not evenly distributed across the data held by each core? Real workloads are like this! Fortunately, over the last decade, architectures and techniques for dynamic cross-core load balancing have become smooth and efficient while introducing negligible additional inter-core coordination. At this point, it is a mature way of designing extremely high-throughput software.
Functional programming (especially, say, actor systems) is better for organizing mental models of concurrency when your concurrency is coupled with communication between the components. For HPC, you're typically optimizing for Gustafson scaling (versus Amdahl scaling), where you are running multiple copies of the same computationally costly, linearly organized code with no coupling between instances except statistical aggregation of results, so there is no particular benefit to functional-style concurrency.
(And some FPLs, like Julia, are perfectly good at HPC anyway.)
The codes I worked on were complex graph analysis, spatiotemporal behavioral analysis, a bit of geospatial environmental modeling, and in prehistoric times thermal and mass transport modeling. These codes (pretty much anything involving reality) are intrinsically tightly coupled across compute nodes. Low-latency interconnects eventually gave way to latency-hiding software architectures but at no point did we use map-reduce as that would have been insanely inefficient given the sparsity and unpredictability of the interactions between nodes.
These were the prototype software architectures for later high-performance databases. Every core is handling thousands or millions of independent shards of the larger computational model, which makes latency-hiding particularly efficient.
It is about how easily a programmer who deals with a certain subtask in a system can utilize more cores for that task. I'm not talking about supercomputing but about a smartphone or a typical PC. There most cores usually sit idle, but if the user triggers an action, you want to be able to use as many cores as will speed up the computation. Language support for parallelism makes a huge difference there. In Go I can write a function to do a certain computation, and quite often it is trivial to spread several calls across goroutines.
It's one of the secrets exploited by the M1 chip, seen in how many more cache lines the CPU's LFB can fill concurrently compared to Intel chips and that these are now 128 byte cache lines instead of 64 byte cache lines.
I do not accept this premise. Things are increasingly multithreaded.
Looks like you're compiling C code with -O2. Does Rust build set -O3 on clang? Did you try -O3 with C? I know it's not guaranteed to be faster, just curious.
https://doc.rust-lang.org/cargo/reference/profiles.html#rele...
>"Clever" memory use is frowned upon in Rust. In C, anything goes. For example, in C I'd be tempted to reuse a buffer allocated for one purpose for another purpose later (a technique known as HEARTBLEED).
Ha!
>It's convenient to have fixed-size buffers for variable-size data (e.g. PATH_MAX) to avoid (re)allocation of growing buffers. Idiomatic Rust still gives a lot of control over memory allocation, and can do basics like memory pools, combining multiple allocations into one, preallocating space, etc., but in general it steers users towards "boring" use of memory.
Since I write a lot of memory-constrained embedded code this actually annoyed me a bit with Rust, but then I discovered the smallvec crate: https://docs.rs/smallvec/1.5.0/smallvec/
Basically, with it you can give your vectors a static (not on the heap) size, and it will automatically reallocate on the heap if a vector grows beyond that bound. It's the best of both worlds in my opinion: it lets you remove a whole lot of small useless allocs, but you still have all the convenience and API of a normal Vec. It might also help slightly with performance by removing useless indirections.
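To show the idea without pulling in the crate, here is a std-only sketch of the small-buffer optimization that smallvec implements (the 8-byte inline capacity is an arbitrary choice): data lives inline until it outgrows the fixed capacity, then spills to the heap:

```rust
// Simplified small-buffer optimization: inline until full, then heap.
enum SmallBuf {
    Inline { data: [u8; 8], len: usize },
    Heap(Vec<u8>),
}

impl SmallBuf {
    fn new() -> Self {
        SmallBuf::Inline { data: [0; 8], len: 0 }
    }

    fn push(&mut self, byte: u8) {
        if let SmallBuf::Inline { data, len } = self {
            if *len < data.len() {
                data[*len] = byte;
                *len += 1;
                return;
            }
            // Outgrew the inline storage: one heap allocation, copy
            // the inline bytes over, then fall through to the Vec.
            let mut v = Vec::with_capacity(*len + 1);
            v.extend_from_slice(&data[..*len]);
            *self = SmallBuf::Heap(v);
        }
        if let SmallBuf::Heap(v) = self {
            v.push(byte);
        }
    }

    fn len(&self) -> usize {
        match self {
            SmallBuf::Inline { len, .. } => *len,
            SmallBuf::Heap(v) => v.len(),
        }
    }

    fn spilled(&self) -> bool {
        matches!(self, SmallBuf::Heap(_))
    }
}

fn main() {
    let mut b = SmallBuf::new();
    for i in 0..10 {
        b.push(i); // the first 8 pushes allocate nothing
    }
    assert_eq!(b.len(), 10);
    assert!(b.spilled());
}
```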
Unfortunately this doesn't help with Strings since they're a distinct type. There is a smallstring crate which uses the same optimization technique but it hasn't been updated in 4 years so I haven't dared use it.
The good thing about having a decent type system is that I expect that transitioning to smartstrings should be painless! Thank you for that.
There is this habit in both academia and industry where people say "as fast as C" and justify this by comparing to a tremendously slow C program, but don't even know they are doing it. It's the blind leading the blind.
The question you should be asking yourself is, "If all these claims I keep seeing about X being as fast as Y are true, then why does software keep getting slower over time?"
(If you don't get what I am saying here, it might help to know that performance programmers consider malloc to be tremendously slow and don't use it except at startup or in cases when it is amortized by a factor of 1000 or more).
I wouldn't call that a first approximation. Take ripgrep as an example. In a checkout of the Linux kernel with everything in my page cache:
    $ time rg zqzqzqzq -j1
    real 0.609
    user 0.315
    sys 0.286
    maxmem 7 MB
    faults 0

    $ time rg zqzqzqzq -j8
    real 0.116
    user 0.381
    sys 0.464
    maxmem 9 MB
    faults 0
This alone, to me, says "to a first approximation, the speed of your program in 2021 is determined by the number of cores it uses" would be better than your statement. But I wouldn't even say that, because performance is complicated and it's difficult to generalize. Using Rust made it a lot easier to parallelize ripgrep.
> C allows you to do bulk memory operations, Rust does not (unless you turn off the things about Rust that everyone says are good). Thus C is tremendously faster.
Talk about nonsense. I do bulk memory operations in Rust all the time. Amortizing allocation is exceptionally common in Rust. And it doesn't turn off anything. It's used in ripgrep in several places.
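A small example of what amortized allocation looks like in safe Rust (the processing step itself is invented for illustration): one buffer is reused across iterations, and `clear()` resets the length while keeping the capacity, so after warm-up the loop allocates nothing:

```rust
// Count 'A's in uppercased lines, reusing a single scratch buffer.
fn process_lines(lines: &[&str]) -> usize {
    let mut buf: Vec<u8> = Vec::with_capacity(1024);
    let mut total = 0;
    for line in lines {
        buf.clear(); // length to 0, capacity retained: no realloc
        buf.extend_from_slice(line.as_bytes());
        buf.make_ascii_uppercase();
        total += buf.iter().filter(|&&b| b == b'A').count();
    }
    total
}

fn main() {
    // "abc" -> "ABC" has one 'A'; "banana" -> "BANANA" has three.
    assert_eq!(process_lines(&["abc", "banana"]), 4);
}
```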
> There is this habit in both academia and industry where people say "as fast as C" and justify this by comparing to a tremendously slow C program, but don't even know they are doing it. It's the blind leading the blind.
I've never heard anyone refer to GNU grep as a "tremendously slow C program."
> The question you should be asking yourself is, "If all these claims I keep seeing about X being as fast as Y are true, then why does software keep getting slower over time?"
There are many possible answers to this. The question itself is so general that I don't know how to glean much, if anything, useful from it.
You chose an embarrassingly parallel problem, which most programs are not, so you cannot generalize this example across most software. When you try to parallelize a structurally complicated algorithm, the biggest issue is contention. I was leaving this out because it really is a second-order problem: most software today would get faster if you just cleaned up its memory usage than if you just tried to parallelize it. (Of course it'd get even faster if you did both, but memory is the first-order effect.)
> There are many possible answers to this.
How come so few people are concerned with the answers to that question and which are true, but so many people are concerned with making performance claims?
As I've pointed out in the article, Rust does give you precise control over memory layout. Heap allocations are explicit and optional. In safe code. You don't even need to avoid any nice features (e.g. closures and iterators can be entirely on stack, no allocations needed).
Move semantics enables `memcpy`ing objects anywhere, so they don't have a permanent address, and don't need to be allocated individually.
In this regard Rust is different from e.g. Swift and Go, which claim to have C-like speed, but will autobox objects for you.
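A sketch of the point about explicit allocation (the `Matrix` type is invented for illustration): nothing here touches the heap unless the caller asks for a Box, and returning by value is a move, not a hidden allocation:

```rust
// Plain value type: lives wherever the caller puts it.
struct Matrix {
    cells: [f32; 16],
}

fn identity() -> Matrix {
    let mut m = Matrix { cells: [0.0; 16] };
    for i in 0..4 {
        m.cells[i * 4 + i] = 1.0;
    }
    m // moved out (at worst a memcpy); usually built in the caller's frame
}

fn main() {
    let on_stack = identity();          // lives in main's frame
    let on_heap = Box::new(identity()); // heap only when explicitly asked for
    assert_eq!(on_stack.cells[0], 1.0);
    assert_eq!(on_heap.cells[5], 1.0);
}
```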
Rust is now getting support for custom local allocators à la C++, including in default core types like Box<>, Vec<>, and HashMap<>. It's an unstable feature, hence not yet part of stable Rust, but it's absolutely being worked on.
Compared to all the religious texts I've read about Rust, this is a huge breath of fresh air.
Thanks for sharing! Bookmarking this.
No, it's not, especially if you have multiple binaries. There are hacks, like using a multi-call single binary (forget about file-based privilege separation), or using an unmaintained fork of cargo to build a Rust toolchain capable of dynamically linking libstd. See: https://users.rust-lang.org/t/link-the-rust-standard-library... and https://github.com/johnthagen/min-sized-rust
I'd be interested in any up-to-date trick to do better than this.
https://github.com/antoyo/rustc_codegen_gcc https://github.com/Rust-GCC/gccrs https://github.com/sapir/gcc-rust/
I remember making an argument on a mailing list against using alloca on the grounds that there's usually a stack-blowing bug hiding behind it. As I revisited the few examples I remembered of it being used correctly, I strengthened my argument by finding more stack-blowing bugs hiding behind uses of alloca.
When I ran my simple fuzz test in Rust, it segfaulted, crashing in 'safe' code. I thought for a moment there might be something wrong with the compiler (hahaha, no). Sure enough, there was a bug in one of my far-too-clever unsafe blocks that was corrupting memory. That was in turn causing a crash later in the program's execution.
That was one of my first big "aha" moments for rust - in rust because segfaults (should be) impossible in safe code, I only needed to study the code in my ~30 lines of unsafe code to find the bug. (Compared to 150+ lines of regular code). I had some similar bugs when I wrote the C version earlier, and they took all day to track down because in C memory corruption can come from anywhere.
I don't tend to think of Rust as "portable assembly", and this is indeed one of the points where I think it differs the most from C. I think of "portable assembly" as being applicable to C, because it is some version of a "minimal" level of abstraction for a high-level language. Rust is very much a tool for abstraction, and one of the USPs of rust is that the compiler abstracts away the low-level details of memory management in a way which is not as costly as other automatic memory management strategies.
Maybe it's due to lack of experience, but with C code it's fairly easy to look at a block of code and imagine approximately which assembly would be generated. With highly abstract Rust code, like with template-heavy C++ code, I don't feel like that at all.
Rust does not abstract away memory management. For example, it never heap allocates anything implicitly. It inserts destructors, but does so predictably at end of scopes, in a specified order.
Rust heavily uses iterators with closures, but these get aggressively inlined, and you can rely on them optimizing down to a basic loop. For code generation they're not too different from a fancy C macro.
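For instance, these two functions are semantically identical, and with optimizations enabled the iterator version compiles down to essentially the hand-written loop (the example is mine, not from the article):

```rust
// Iterator pipeline: closures are inlined away by the optimizer.
fn sum_even_squares(v: &[i64]) -> i64 {
    v.iter().filter(|&&x| x % 2 == 0).map(|&x| x * x).sum()
}

// The loop the pipeline boils down to.
fn sum_even_squares_loop(v: &[i64]) -> i64 {
    let mut total = 0;
    for &x in v {
        if x % 2 == 0 {
            total += x * x;
        }
    }
    total
}

fn main() {
    let v = [1, 2, 3, 4];
    assert_eq!(sum_even_squares(&v), sum_even_squares_loop(&v));
    assert_eq!(sum_even_squares(&v), 20); // 4 + 16
}
```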
And if in doubt, there's https://rust.godbolt.org/ (don't forget to add -O to flags)
The fact that Rust specialises its generic code according to the type it's used with is not some inherent disadvantage of generics. That's what they're supposed to do. By choosing not to specialise, you're actively making the decision to make your code slower. Rust has mechanisms for avoiding generic specialisation. They're called trait objects and they work brilliantly.
When you use void* in your data structures in C, you're not winning anything when compared to Rust. You're just producing slower code that mimics the behaviour of Rust's trait objects, but more dangerously.
Code 'bloat' (otherwise known as 'specialising your code correctly to make it run faster') is not a reason to not use Rust in 2021, so please stop pretending that it is.
> Rust has mechanisms for avoiding generic specialisation. They're called trait objects and they work brilliantly.
As someone who uses a lot of Rust: they are sort of the red-headed stepchild. As a minimum to make them properly usable, we need a way of passing one object with multiple different traits.
What do you mean?
    fn foo<T: TraitA + TraitB>(x: T) { x.something(); }

Supertraits?
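If the complaint is about trait objects specifically: `&dyn TraitA + TraitB` isn't allowed for two non-auto traits, but the usual workaround is a blanket-implemented supertrait (a sketch with made-up trait names):

```rust
trait TraitA {
    fn a(&self) -> i32;
}
trait TraitB {
    fn b(&self) -> i32;
}

// Combiner supertrait, blanket-implemented for anything with both.
trait TraitAB: TraitA + TraitB {}
impl<T: TraitA + TraitB> TraitAB for T {}

struct Thing;
impl TraitA for Thing {
    fn a(&self) -> i32 { 1 }
}
impl TraitB for Thing {
    fn b(&self) -> i32 { 2 }
}

// One trait object, dynamic dispatch over both traits.
fn combined(x: &dyn TraitAB) -> i32 {
    x.a() + x.b()
}

fn main() {
    assert_eq!(combined(&Thing), 3);
}
```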
You can do that in Java (with byte arrays) or in Common Lisp, so what is the point here? It is not practice in Java, Lisp nor in C and C++.
> It's convenient to have fixed-size buffers for variable-size data (e.g. PATH_MAX) to avoid (re)allocation of growing buffers
This is because OS/Kernel/filesystem guarantee path max size.
> Idiomatic Rust still gives a lot of control over memory allocation, and can do basics like memory pools, ... but in general it steers users towards "boring" use of memory.
The same is done by sane C libraries (e.g. glib).
> Every operating system ships some built-in standard C library that is ~30MB of code that C executables get for "free", e.g. a "Hello World" C executable can't actually print anything, it only calls the printf shipped with the OS.
printf is not shipped with the OS but with the libc runtime. It doesn't have to be a runtime dependency (the author should learn why libc is a shared library and not the usual statically linked library), and you can use minimal implementations (musl) if you want static binaries with minimal size.
So you are saying Rust doesn't call (g)libc at all and directly invoke kernel interrupts? Sure, you can avoid this print "overhead" in C with 3-4 lines of inline assembly, but, why?
> Rust by default can inline functions from the standard library, dependencies, and other compilation units.
So can C compilers.
> In C I'm sometimes reluctant to split files or use libraries, because it affects inlining and requires micromanagement of headers and symbol visibility.
Functions don't have to be in headers to be inlined.
> C libraries typically return opaque pointers to their data structures, to hide implementation details and ensure there's only one copy of each instance of the struct. This costs heap allocations and pointer indirections. Rust's built-in privacy, unique ownership rules, and coding conventions let libraries expose their objects by value, so that library users decide whether to put them on the heap or on the stack. Objects on the stack can be optimized very aggressively, and even optimized out entirely.
WTF? Stopped reading after this.
I find this post random nonsense, and I'd urge the author to read a serious C book.
> > For example, in C I'd be tempted to reuse a buffer allocated for one purpose for another purpose later (a technique known as HEARTBLEED).
> You can do that in Java (with byte arrays) or in Common Lisp, so what is the point here? It is not practice in Java, Lisp nor in C and C++.
C is a really old language with ancient libraries that are still widely used even though they are simply bad by modern standards. For that reason, I roll my eyes when people say something is not practice in C or talk about "sane" C libraries. A big part of working with C is dealing with ancient insanity.
You can make much stronger statements about what is idiomatic in Rust (and to some extent Java) simply because it's newer and more cohesive.
> > It's convenient to have fixed-size buffers for variable-size data (e.g. PATH_MAX) to avoid (re)allocation of growing buffers
> This is because OS/Kernel/filesystem guarantee path max size.
I think you've got that backwards. There's an advertised max path size because people wanted to stick paths in fixed-size buffers rather than deal with dynamic allocation. PATH_MAX is fairly arbitrary considering that there are certainly ways of creating and opening files which have paths exceeding that limit. I found this doc talking about this: https://eklitzke.org/path-max-is-tricky
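To make the point concrete, here's a small sketch (not from the linked doc) showing that a heap-growing path type like Rust's `PathBuf` has no inherent length cap, so a path longer than a typical Linux PATH_MAX of 4096 bytes can be built without issue:

```rust
use std::path::PathBuf;

fn main() {
    // A PathBuf grows on the heap, so path length is not capped by the
    // program. A typical Linux PATH_MAX is 4096 bytes, but nothing stops
    // us from building a longer path in memory (and deeply nested
    // directories can make such paths reachable via repeated openat).
    let mut p = PathBuf::new();
    for _ in 0..1000 {
        p.push("subdir"); // each component after the first adds 7 bytes
    }
    let len = p.as_os_str().len();
    assert!(len > 4096);
    println!("constructed a path of {} bytes", len);
}
```

The fixed-size-buffer convenience the quote mentions is exactly what this sidesteps, at the cost of a heap allocation.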
> printf is not shipped with the OS, but with libc runtime.
"The OS" doesn't mean "the kernel". Read...anything...even the lackluster wikipedia article about operating systems...and you'll see stuff like GUIs described as part of the OS. They (generally) don't mean those are in the kernel. You can also see this for example in the GNU GPL; they call out "system libraries", which certainly includes libc.
> So you are saying Rust doesn't call (g)libc at all and directly invoke kernel interrupts? Sure, you can avoid this print "overhead" in C with 3-4 lines of inline assembly, but, why?
Rust's own standard library uses libc's system call wrappers but not stdio. It has its own libraries for buffer management and formatting which provide the safety one would expect of Rust, know how to integrate with Rust's Display trait for formatting arbitrary Rust data structures, etc. You could call libc::printf yourself if you wanted to, but that's not idiomatic. I wrote some Rust code calling libc::vsnprintf just the other day, because I got a format string + va_list from C in a log callback.
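For anyone unfamiliar with the Display trait mentioned above, here's a minimal sketch (the `Point` type is just an example) of how Rust formats arbitrary user types through its own formatting machinery rather than libc's stdio:

```rust
use std::fmt;

// A user-defined type made printable via Rust's Display trait.
// println! goes through core::fmt, not libc's printf.
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    let p = Point { x: 3, y: 4 };
    // Format arguments are type-checked at compile time; there is no
    // printf-style format-string/vararg mismatch to get wrong.
    println!("p = {}", p); // prints: p = (3, 4)
}
```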
"Expert C Programming" [1]. Not up to date, but written from a C compiler writer standpoint. A lot of references to why C (and libs) are the way they are.
[1] https://www.amazon.com/Expert-Programming-Peter-van-Linden/d...
Unchecked malloc returns - ouch, I count 12 (out of 56) without a check. Thanks for pointing this out.
Which is why so many people are creating formal verification languages and spending years in research to fix those ... That just isn't true. It's a very complex problem that shows up everywhere from hardware (cache-coherency protocols) to the OS (atomics, locks) to higher-level constructs (commit-rollback in databases).
Consequently
> But the biggest potential is in ability to fearlessly parallelize majority of Rust code, even when the equivalent C code would be too risky to parallelize. In this aspect Rust is a much more mature language than C.
This couldn't be more wrong either. Rust doesn't help you write synchronization primitives safely, because it doesn't reason about synchronization constructs like locks, condition variables, or atomics. You need formal verification to be fearless.
Memory safety is just a small part and is a much easier problem than ensuring the absence of race conditions.
If it was that simple, Tokio wouldn't need to formally verify their implementation with an external tool and it wouldn't have found dozens of well hidden bugs.
C programming patterns have more-or-less equivalents in Rust. OTOH non-trivial C++ OOP or template usage is alien and hard to adapt to Rust.
Rust has 1 (one) way to initialize an object. No constructors, initializer lists, or rules-of-<insert number>. Move semantics are built-in, without move/copy constructors/NRVO/moved-out-of state. No inheritance. No object slicing. Methods are regular functions. No SFINAE (generics are equivalent to concepts, and dumber, e.g. no variadics). Iterators require only implementing a single method. Operator overloading is all done in the style of the spaceship operator.
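A short sketch of three of those claims in one place (the `Counter` type here is the standard illustrative example, not anyone's production code): one way to construct, built-in moves, and an iterator defined by a single method:

```rust
// One way to construct: a struct literal (conventionally wrapped in `new`).
struct Counter {
    count: u32,
}

impl Counter {
    fn new() -> Self {
        Counter { count: 0 } // no constructor overloads or initializer lists
    }
}

// An iterator needs only `next`; map, filter, sum, etc. come for free
// as provided methods on the Iterator trait.
impl Iterator for Counter {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.count < 5 {
            self.count += 1;
            Some(self.count)
        } else {
            None
        }
    }
}

fn main() {
    let c = Counter::new();
    let c2 = c; // a move: a bitwise transfer, no move constructor runs
    // `c` is now statically unusable; there is no "moved-from" state to observe.
    let total: u32 = c2.sum();
    assert_eq!(total, 15); // 1 + 2 + 3 + 4 + 5
}
```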
It's not the same kind of complexity.
For fuck's sake.
Maximum speeds are already explored. I wanted to discuss an aspect that's not typically covered by pure benchmarks: what can you expect from normal day-to-day use of these languages. Not fine-tuned hot loops, but a "median" you can expect when you just need to get shit done.
If I tried to write a benchmark code to represent average, practical, idiomatic, but less-than-maximally optimized code, I don't think anyone would believe me that's a fair comparison. So I describe problems and patterns instead, and leave it to readers to judge how much applies to their problems and programming style.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Also sub-maximum speeds — start at the bottom of the measurements and work up from the 5.37s g++ program to the 0.72s g++ program :-)
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
My experience using Rust vs C aligns with yours as well.
Pahaha
Am I in the minority having this opinion?
To make static analysis robust in C you need to start reliably tracking ownership and forbid type-erasing constructs. This typically means adding smart pointers, some kind of borrow checking or garbage collection, generics to replace void*, maybe tagged unions, and a new standard library that embraces these features.
It's going to bring most of Rust's complexity and require major code changes anyway, but you won't even get the benefits of a newer language.
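For readers less familiar with the Rust side, here's a sketch (types invented for illustration) of two of the features listed above: a tagged union and a generic function, which are what would replace C's `struct { int tag; union ... }` and `void *` interfaces:

```rust
// A tagged union: the compiler forces every variant to be handled, where
// C would pair an int tag with a union and check nothing.
enum Value {
    Int(i64),
    Text(String),
}

fn describe(v: &Value) -> String {
    match v {
        Value::Int(n) => format!("int: {}", n),
        Value::Text(s) => format!("text: {}", s),
    }
}

// A generic function replacing a C `void *` + element-size interface;
// the element type is checked and monomorphized at compile time.
fn first<T>(items: &[T]) -> Option<&T> {
    items.first()
}

fn main() {
    let vals = vec![Value::Int(42), Value::Text("hi".to_string())];
    assert_eq!(describe(&vals[0]), "int: 42");
    assert_eq!(first(&[1, 2, 3]), Some(&1));
}
```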
C++, OTOH, you could probably port most of Rust's concepts into (with some extra language changes for various reasons I don't want to get into). However, since almost no existing C++ code would typecheck in the "safe" subset without modifications, it would effectively be a different language anyway. And to be clear, this isn't necessarily because people are routinely doing dangerous stuff in C++ -- the whole Rust ecosystem has grown up around the borrow checker, which means some very basic things people use in most other languages aren't done. Here are some examples of things typical Rust code does differently from typical C++ code because they would make safety checks much harder to perform, beyond the obvious aspects of lifetime annotations and genuinely unsafe patterns like accessing globals (sorry, it just is):
* far less use of accessors, especially mutable ones (because Rust can't track split field ownership)
* Rust tends to split up big "shared context" structures depending on function use, rather than logical relationships, for much the same reason (Rust conservatively assumes that all fields are used when a context object gets passed to a function, as long as any pointer to the structure remains, even if the function only accesses a few of the fields).
* Rust almost never uses internal or cyclic pointers. It's safe to do it with boxed data or data that doesn't move, and there are safe type mechanisms around that, but it's cumbersome since it has to be visible to the typechecker, so people usually don't bother.
* single-threaded mutation through multiple pointers into the same data structure, which may even be aliased. Again, often safe (though not always), and in the safe cases there are generally safe types to enable it in Rust, but since it's not the default and requires pre-planning for all but the simplest cases, people usually don't bother.
* Rust types are always annotated with thread safety information. This is usually done by default, but if it weren't it would be a huge amount of boilerplate. The reason this works is that in the cases where people are doing unsafe stuff, the type system automatically opts out and requires them to opt in. Libraries have been built around this assumption. Even if we were to port such a mechanism over to C++, the lack of these explicit annotations would mean that in practice it just wouldn't work that well--you would have to do a very detailed thread safety analysis of basically any existing library to try to assign types.
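The first bullet, about accessors, is easy to demonstrate. A minimal sketch (the `Context` type is invented for illustration): borrowing fields directly lets the compiler see that the borrows are disjoint, while a mutable accessor borrows all of `self` and would conflict:

```rust
struct Context {
    name: String,
    scores: Vec<u32>,
}

impl Context {
    // A mutable accessor borrows *all* of self, not just one field.
    fn scores_mut(&mut self) -> &mut Vec<u32> {
        &mut self.scores
    }
}

fn main() {
    let mut ctx = Context {
        name: "run1".to_string(),
        scores: vec![1, 2],
    };

    // Direct field access: the compiler sees two disjoint borrows. Fine.
    let name = &ctx.name;
    ctx.scores.push(3);
    println!("{} has {} scores", name, ctx.scores.len());

    // Doing the same through the accessor would be rejected, because
    // `ctx.scores_mut()` mutably borrows the whole struct:
    //
    //     let name = &ctx.name;
    //     ctx.scores_mut().push(4); // error[E0502]: cannot borrow `ctx`
    //     println!("{}", name);     // ...while `name` is still live
    ctx.scores_mut().push(4); // fine here, no other borrow is live
    assert_eq!(ctx.scores.len(), 4);
}
```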
Often, complying with these kinds of rules is what people coming to Rust struggle with--not so much local lifetime issues which the compiler can usually figure out, but how to structure the entire program to make life easy for the borrow checker. However, complying comes with a big benefit--it allows safety analysis to proceed purely locally in almost all cases. The reason that static analyzers don't just "do what Rust does" is that they're dealing with programs that aren't structured that way and need to perform far more global analysis to catch most of the interesting memory safety bugs that pop up in mature C++ codebases, especially the ones that evade code review.
So--do I think it would be great to port this stuff over to C++ (or C, hypothetically)? Absolutely--I still prefer Rust as a language, but at the end of the day, memory safety you could layer on top of existing C code would be a huge win for everyone. But I don't see it happening, because Rust's solution requires serious code restructuring. If people are going to have to rewrite their old programs anyway to work with a tractable static analysis, and not be able to use almost any existing libraries, it's not clear how much more benefit they'd get from using this subset than from just switching to Rust.
And most pertinently, this critique was written by someone who genuinely loves programming in Rust. Shows you that Rust users aren't blinded to the faults of the language. You shouldn't think that Rust users are fanboys just because you see push back to low effort, low knowledge critiques.
That's assuming too much. BTW, I read in this thread a comment from a well-known Nim dev working in multithreading (with much knowledge of the subject), and it was downvoted to oblivion.
That is putting the bar impossibly high. I would expect most of the criticism to come from people who hate to program in Rust, which is fine as long as the criticism is well argued.
I have a number of specific critiques of Rust, chief being that APIs and implementations are bound too tightly. &[String] and &[&str] are logically similar but changing from one to the other in your implementation might mean a breaking API change.
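One common mitigation for exactly this critique (a sketch, with invented function names) is to accept a generic `AsRef<str>` bound instead of a concrete slice type, so the caller's storage choice no longer leaks into the signature:

```rust
// Taking a concrete slice type bakes the caller's storage into the API:
// switching this to &[&str] later is a breaking change.
fn join_concrete(items: &[String]) -> String {
    items.join(", ")
}

// A generic bound accepts &[String], &[&str], and more, so the
// implementation can change its own storage without breaking callers.
fn join_generic<S: AsRef<str>>(items: &[S]) -> String {
    items
        .iter()
        .map(|s| s.as_ref())
        .collect::<Vec<_>>()
        .join(", ")
}

fn main() {
    let owned = vec!["a".to_string(), "b".to_string()];
    let borrowed = ["a", "b"];
    assert_eq!(join_concrete(&owned), "a, b");
    assert_eq!(join_generic(&owned), "a, b");
    assert_eq!(join_generic(&borrowed), "a, b"); // same call, different storage
}
```

The trade-off is that the generic version monomorphizes per caller type and makes the signature noisier, which is arguably the "bound too tightly" complaint resurfacing in a different form.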
The people implementing the libraries you use (e.g. Rayon) may have to use TSAN, of course.
A more useful comparison would be to modern C++.
Given that RCU is a complex wait-free data structure (though I don't fully understand it), I suspect it may not necessarily be possible to implement it without unsafe blocks, purely in terms of the standard library concurrency types (atomics and Arc can be used without unsafe, but themselves contain unsafe blocks). The general goal is to create an abstraction which encapsulates unsafe blocks such that it's impossible for outside users calling safe functions to violate memory safety. Of course, libraries sometimes have bugs that need to be fixed.
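To illustrate the "safe abstraction over shared state" idea without any unsafe at all, here's a crude stand-in for RCU built only from std types (the `Snapshot` type is invented for illustration): readers take a brief lock to clone an `Arc`, then read the snapshot with no lock held, while a writer publishes a whole new value. Real RCU is wait-free for readers, which this is not -- it only shows the shape of the API:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Readers clone the current Arc under a short lock, then read lock-free;
// a writer swaps in a whole new Arc. Old snapshots are freed when the
// last reader drops its Arc -- the reclamation half of the RCU idea.
struct Snapshot<T> {
    current: Mutex<Arc<T>>,
}

impl<T> Snapshot<T> {
    fn new(value: T) -> Self {
        Snapshot { current: Mutex::new(Arc::new(value)) }
    }
    fn read(&self) -> Arc<T> {
        self.current.lock().unwrap().clone()
    }
    fn publish(&self, value: T) {
        *self.current.lock().unwrap() = Arc::new(value);
    }
}

fn main() {
    let shared = Arc::new(Snapshot::new(vec![1, 2, 3]));
    let reader = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || shared.read().len())
    };
    shared.publish(vec![4, 5, 6, 7]);
    let len_seen = reader.join().unwrap();
    assert!(len_seen == 3 || len_seen == 4); // one snapshot or the other
    assert_eq!(shared.read().len(), 4);
}
```

Crates like arc-swap replace the Mutex with an atomic pointer swap to get closer to real RCU read performance, and that is where the unsafe blocks live.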
Even more surprising that it got to the front page.
Do people really have such low standards of quality on Hacker News too?
Billions of cars with multi-billion ECUs, practically every device running an OS, and several NASA rovers disagree.
"Rust enforces thread-safety of all code and data, even in 3rd party libraries, even if authors of that code didn't pay attention to thread safety. Everything either upholds specific thread-safety guarantees, or won't be allowed to be used across threads."
If you write a library, and use e.g. thread-unsafe `Rc` or not-sure-if-safe raw pointers anywhere in your structs, the compiler will stop me from using your library in my threaded code.
This is based on a real experience. I've written a single threaded batch-processing code, and then tried to make it parallel. The compiler told me that I used a GitHub client, which used an HTTP client, which used an I/O runtime, which in this configuration stored shared state in an object without a Mutex. Rust pointed out exactly the field in 3rd party code that would cause a data race. At compile time.
It's not marketing speak.
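The `Rc` case described above is easy to reproduce in a few lines. A minimal sketch: `Rc`'s reference count is not atomic, so `Rc<T>` is `!Send` and the compiler rejects moving it into another thread, while the atomically counted `Arc` is accepted:

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    // This version does not compile -- even if the Rc were buried deep
    // inside some third-party library's struct, the check still applies:
    //
    //     let data = std::rc::Rc::new(vec![1, 2, 3]);
    //     thread::spawn(move || data.len());
    //     // error[E0277]: `Rc<Vec<i32>>` cannot be sent between threads safely
    //
    // Arc is Send + Sync, so the same code with Arc compiles and runs:
    let data = Arc::new(vec![1, 2, 3]);
    let handle = {
        let data = Arc::clone(&data);
        thread::spawn(move || data.len())
    };
    assert_eq!(handle.join().unwrap(), 3);
}
```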
This is the same as someone telling you that you will never lose any money by investing in a certain asset.
The fact that C is used in so many places speaks for itself about its usefulness. And the majority of C programmers do this by writing software, instead of jumping on every forum to attack other languages and writing extended blog posts just to convince people that they "should" switch to the language they like.
Also, if you believe bounds checking is the most difficult thing in software development, it just means that you haven't dealt with a sufficiently complex system yet, or you just pretend you have.
Similarly, if you think naively putting pthread_mutex_lock and unlock around a data structure is hard, it just means you haven't touched the scenarios where C programmers resort to non-trivial locking mechanisms.
As the article mentions, C is 50 years old. The fact that it's still used is evidence of its usefulness, sure. It has outlasted almost all of its peers.
Rust has been stable for under 6 years. In that time, it's been adopted by a slew of major companies, and people have used their free time to write some extremely good software in it. So by that metric, Rust's usefulness speaks for itself, too.
- Regardless of whether it is true or not, this seldom works in the long term. I'm simply pointing this observation out.
In fact, a language as a tool is never about more features; it is about the minimum features that maximize utility, and Rust is already in the domain of "feature-rich" languages.
I didn't read it, because it might present outdated knowledge.
No, it does not. If Rust programmers don't have discipline in C, other people do.
And don't drag out some random CVE numbers again. Those concern a fraction of existing C projects, many of which were started between 1980 and 2000.
It is an entirely different story if a project is started with sanitizers, Valgrind and best practices.
I'm not against Rust, except that they managed to take OCaml syntax and make it significantly worse. It's just ugly and looks like design by committee.
But the evangelism is exhausting. I also wonder why corporations are pushing Rust. Is it another method to take over C projects that they haven't assimilated yet?
I don't think it's ugly because it's design-by-committee, I think they intentionally made it ugly so that it's familiar to C++ people.
> I also wonder why corporations are pushing Rust.
You said it yourself: undisciplined people can't write C without introducing memory-related bugs, and it's much easier to hire undisciplined people than disciplined people.
> It is an entirely different story if a project is started with sanitizers, Valgrind and best practices.
Do you have an example of a project that is (a) built in such a way, (b) large, and (c) has a good track record on memory safety?
Some people are hard learners.
My best guess is that people who are "stuck" working in C or C++ wish they could use Rust at their jobs.
Or that others would make the leap and get over the learning curve.
For anything other than writing kernels and drivers, managed languages are a much more productive option.
And TBH I rarely see other popular languages do similar things either, including very popular ones like Python, Java, or Go.
And have you ever observed that a thing called "C evangelism" actually exists?
I want to be able to write code without having to be "disciplined" about how I access memory. Means I can be more "disciplined" about business logic.
What are the agreed upon tools and best practices in the C community as of right now?
Recruiting.
All benchmarks should be delivered in the form of a graph and histogram. I had to close a PR recently where the "optimization" was 1% of a standard deviation away from the mean, without even running either implementation!