https://github.com/Amanieu/hashbrown
https://blog.waffles.space/2018/12/07/deep-dive-into-hashbro...
I've hand-rolled and/or reused hash tables that use various probing methods (removing both allocations and extra indirections) and the performance improvement is non-trivial (for my workloads anyhow).
I've never gone for performance; at most I use `std::map<int, pointer>` (which is an RB-tree).
You can implement RQ without a hashmap.
It is only used to track the symbol number (uint32), so introducing hashing sounds a bit wasteful. You could also use a plain vector, but since the symbol numbers come from the network, that might result in a lot of reordering or pointless allocation if you are not careful.
--edit: typos
RaptorQ is essentially a Gaussian elimination of a matrix, so it all depends on the block size (=> matrix size). The algorithm has basically cubic complexity in the number of symbols in a block. RFC 6330 is designed to work on files, which are divided into blocks with a certain number of symbols, and the bytes are interleaved.
This implementation does not do the (complex and almost pointless) interleaving, which is fine, even OpenRQ does not.
The bench seems to be done on a... 10 kB file? It all fits in the L2 cache. We are not told the symbol size (which determines the block size!), and I assume all of this fits in a 10x10 matrix.
You are benchmarking operations on a matrix that is (more or less) a 10x10 byte matrix.
The biggest part of this benchmark might well be the generation of repair symbols (was it even done?), since that requires multiple XORs of the above-mentioned symbols.
This is much closer to micro-benchmarking than an actual benchmark, imho. It would have been more interesting to see what happens with files at least larger than the L3 cache.
You can also cache intermediate results (which he does not do), which is especially useful for encoding, but only when working on matrices >= 100x100; otherwise, searching the cache, fetching from memory (my implementation optionally did LZ4 compression/decompression), and doing a matrix multiplication is slower than just recomputing the matrix.
Still, it's nice to see implementations of the RFC, which is a real pain to read...
If anyone is qualified enough to take a crack at it, I can try to arrange for sponsorship as well.
O(n) sounds impressive, although I've heard such claims about RQ too...
RQ complexity is O(n^3), but in the size of the internal matrix, not the input. It all depends on the block size you choose.
The numbers are impressive, too. Encoding 40 Mbytes in 0 usec? I'll have to check the theory behind that.
The concept of a fountain seems interesting but what is a good use case? Variable strength error correction?
Receivers would listen for as many of these packets as they can, as and when they can (the transmission can be "lossy"), and if a receiver picks up 100 unique packets (any 100 unique packets, in any order), it'll have a 99.9% (or something along those lines) chance of decoding the message successfully. Each extra unique packet adds a 9 to the odds.
That's why the "fountain". It's a data fountain, and you can grab any quantity of data from it at any point and still have a good chance of reconstructing the message.
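A toy model of the numbers above (the 99.9% figure and the "each extra packet adds a 9" rule are the commenter's ballpark for k = 100 source symbols, not exact bounds from RFC 6330) can be sketched as:

```rust
// Toy model: with k source symbols, receiving k + extra unique packets
// decodes with probability roughly 1 - 10^-(3 + extra), per the comment's
// ballpark numbers (NOT the RFC's exact failure-probability bounds).
fn decode_prob(extra: u32) -> f64 {
    1.0 - 1e-3 * 0.1_f64.powi(extra as i32)
}

fn main() {
    assert!((decode_prob(0) - 0.999).abs() < 1e-12); // exactly k packets
    assert!((decode_prob(1) - 0.9999).abs() < 1e-12); // one extra packet
    println!("k+2 packets: {}", decode_prob(2));
}
```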
In case you don't find what you're looking for, here are all lectures on "Information Theory, Pattern Recognition, and Neural Networks": http://videolectures.net/course_information_theory_pattern_r...
I'd be remiss not to mention that the lectures are by the late David MacKay: https://news.ycombinator.com/item?id=11500221
ISBN 9780521782807
Most of the optimizations listed have nothing to do with unsafe; they are algorithmic changes that you could apply in any language, and to safe code just as well as to unsafe code. Even if we pretend all the unsafe used there is necessary, and interpret the source in the worst possible light for Rust (that the optimization steps would be impossible without unsafe), I think we'd only be talking a 4x to 5x slowdown from this article, not 25x.
But - I strongly suspect not all that unsafe is necessary, and the post certainly doesn't claim it is.
The first code snippet uses unsafe to skip bounds checks. There are plenty of other ways to get the optimizer to skip them for you in safe code, or at the very least hoist them out of the loop. This seems like a good read with some concrete examples:
https://coaxion.net/blog/2018/01/speeding-up-rgb-to-grayscal...
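For instance, one of the simpler safe patterns (a sketch, not code from the linked post or the article) is to iterate both slices together, so there is nothing for the optimizer to bounds-check in the first place:

```rust
// Sketch of a bounds-check-free safe version: zip() stops at the shorter
// slice, so no indexing (and no bounds checking) is ever needed.
pub fn add_assign(octets: &mut [u8], other: &[u8]) {
    for (a, b) in octets.iter_mut().zip(other.iter()) {
        *a ^= *b; // addition in GF(2^8) is XOR
    }
}

fn main() {
    let mut a = vec![1u8, 2, 3];
    let b = vec![1u8, 1, 1];
    add_assign(&mut a, &b);
    assert_eq!(a, vec![0, 3, 2]);
}
```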
Explicitly using SIMD code like the second code snippet does might require unsafe for the transmuted slicing under the hood (implemented as raw pointer dereferences in this case), but it's possible to build safe abstractions on top of that. That blog post links this, for example, although it didn't end up using it (EDIT: Actually, it initially didn't, but now does): https://github.com/AdamNiederer/faster . It does use unsafe internally: https://github.com/AdamNiederer/faster/search?q=unsafe&unsco...
The fragment of code in the article with `get_unchecked_mut()` shouldn't be necessary. It's a simple case that LLVM should be able to optimize. And if it didn't, it could be helped either by iterating both slices with `zip()` or by the trick `let slice = &slice[0..len];`, which proves to LLVM that you have the required length so it doesn't need to be checked again.
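The re-slicing trick mentioned above can be sketched like this (function name and data are illustrative, not from the article):

```rust
// Sketch of the `&slice[..len]` trick: re-slicing once up front proves the
// length to LLVM, so the indexed loop below compiles without a bounds check
// on every iteration.
pub fn xor_prefix(dst: &mut [u8], src: &[u8], len: usize) {
    let dst = &mut dst[..len]; // panics here if too short; checked once
    let src = &src[..len];
    for i in 0..len {
        dst[i] ^= src[i];
    }
}

fn main() {
    let mut d = [0xffu8, 0x0f, 0xf0, 0xaa];
    let s = [0x0fu8, 0x0f, 0x0f, 0x0f];
    xor_prefix(&mut d, &s, 3);
    assert_eq!(d, [0xf0, 0x00, 0xff, 0xaa]); // last byte untouched
}
```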
But overall the article seems in line what you'd expect from Rust: you have low-level control over memory layout and safety checks, and you can make trade-offs to squeeze maximum performance out of an algorithm if you need to.
I don't see why they used unsafe in `add_assign`; an `assert!(octets.len() == other.len())` would likely have elided the bounds checks.
Without these SIMD optimizations, I would imagine you'd see at least a 2x slowdown.
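The assert-based approach suggested above might look like this (a sketch, not the article's actual `add_assign`):

```rust
// Sketch: the up-front length assertion lets LLVM prove both indexing
// operations in the loop are in bounds, so no unsafe is needed and the
// per-iteration bounds checks can be elided.
pub fn add_assign(octets: &mut [u8], other: &[u8]) {
    assert_eq!(octets.len(), other.len());
    for i in 0..octets.len() {
        octets[i] ^= other[i]; // GF(2^8) addition is XOR
    }
}

fn main() {
    let mut a = [0b1010u8, 0b0101];
    add_assign(&mut a, &[0b1111u8, 0b1111]);
    assert_eq!(a, [0b0101, 0b1010]);
}
```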
Qualcomm has asserted that these Raptor code-related patents (an early one of which was filed in 2004) are standards-essential and must be licensed from Qualcomm.[1][2]
The patent linked below would expire in 2024. However, there is a slew of continuation applications that expire much later than 2024.
Update: additionally, there is at least one earlier-dated patent filed in 1999, which expired in February. [3]
[1] https://datatracker.ietf.org/ipr/2554/