But for MSVC and GCC we're told 16 characters, as each has a 16-byte buffer. However, they still need the obligatory terminating zero byte (ASCII NUL), so surely the table should show 15.
The most popular string in C++, the one which makes SSO almost obligatory as a design feature for the language's string object, is the empty string. On a modern (64-bit) machine Clang can store it inline in a local 24-byte object; MSVC and GCC need 32 bytes. Without SSO, an empty string would likely mean a 24-byte object plus a heap allocation. That's hard to swallow.
I wonder if anyone has ever tried an implementation that prioritizes even more minimal memory usage for small strings?
Of the three implementations discussed, the smallest (on a 64-bit system) is 24 bytes.
How about 8 bytes: the union of a single pointer and a char[8]?
For strings of 6 or fewer characters, it uses the first six bytes to store the characters, the seventh for the null terminator, and the last byte to store the length, but with a trick (see below).
For strings of 7 or more characters, it uses all 8 bytes to store the address of a larger block of memory holding the capacity, size, and string bytes.
The trick is to use the last byte as a way to know whether the bytes are encoding a short string or a pointer. Since pointers will never be odd, we just store an odd number in that last byte. For example, you could store `(size << 1) | 0x1`.
So if the last bit is 1, it's a short string. The size is `bytes[7] >> 1`, and the `data()` pointer is just the address of the object itself.
If the last bit is 0, treat the whole data as a pointer to a structure that encodes the capacity and size and then string bytes as usual.
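To make the trick concrete, here is a rough sketch of such a class (hypothetical, not production code: copy/move are omitted, and it assumes a little-endian 64-bit platform where heap pointers have a zero top byte, as canonical x86-64 user-space addresses do, so the tag byte of a stored pointer is always even):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Sketch of the 8-byte string proposed above. The last byte is either an
// odd tag encoding the inline size, or the (even) top byte of a pointer.
class TinyString {
    union {
        char  small[8];  // inline: up to 6 chars + NUL + tag byte
        char* big;       // heap block: [capacity][size][bytes...]
    };

    // Reading small[7] while big is active inspects the pointer's top byte;
    // char access to the object representation is tolerated in practice.
    bool is_small() const { return small[7] & 0x1; }

public:
    explicit TinyString(const char* s) {
        size_t n = std::strlen(s);
        if (n <= 6) {
            std::memcpy(small, s, n + 1);                      // chars + NUL
            small[7] = static_cast<char>((n << 1) | 0x1);      // odd tag
        } else {
            size_t cap = n + 1;
            char* block = static_cast<char*>(
                std::malloc(2 * sizeof(size_t) + cap));
            reinterpret_cast<size_t*>(block)[0] = cap;         // capacity
            reinterpret_cast<size_t*>(block)[1] = n;           // size
            std::memcpy(block + 2 * sizeof(size_t), s, n + 1); // bytes
            big = block;  // top byte assumed 0 (even) on x86-64
        }
    }

    ~TinyString() { if (!is_small()) std::free(big); }
    TinyString(const TinyString&) = delete;  // omitted for brevity

    size_t size() const {
        return is_small() ? static_cast<unsigned char>(small[7]) >> 1
                          : reinterpret_cast<size_t*>(big)[1];
    }

    const char* data() const {
        return is_small() ? small : big + 2 * sizeof(size_t);
    }
};
```

The whole object is still `sizeof(char*)` == 8 bytes, so a vector of these is dramatically denser than a vector of 24- or 32-byte strings.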
Generally, when you compact non-serialized structures in memory it’s for cache-related performance improvements, so unless the common case is overwhelmingly small strings there’s probably no benefit. It’s worth testing to see where this is an improvement, but my guess is that it pays off only in extremely limited regimes. That said, bit manipulation instead of a traditional byte scan could yield improvements, and likewise it may be worth going to larger SSO sizes to enable SIMD and cover more strings inline.
If you’re interested in data structure optimizations like this, there’s a CppCon talk about Facebook’s string implementation that has some clever tricks.
So for-loop conditions like
for (size_t i = 0; i < str.size(); i++)
are cheaper to calculate?

As much as these snippets make clang look heavier, I wonder what it compiles to in practice when the compiler can make better inferences. If you can prove the state of the `is_small` bit, those branches disappear. Even at runtime, which implementation is more performant? Real-world profiling may favor clang, with branch prediction and speculative execution. Then again, speculation has become a dirty word lately.[1]
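Roughly, the contrast looks like this (a sketch, not the real library layouts; the tag convention here, low bit set meaning "long", is an assumption for illustration):

```cpp
#include <cstddef>

// libstdc++/MSVC-style: the size lives in a dedicated member whether the
// string is short or long, so size() is a single unconditional load.
struct EagerSizeString {
    char*  ptr;
    size_t len;      // always valid
    char   buf[16];  // inline buffer for short strings
    size_t size() const { return len; }  // branchless
};

// libc++-style: the short form packs its size into a flag byte, so size()
// must branch (or use bit tricks) on the short/long flag.
struct PackedSizeString {
    union {
        struct { char flag_and_size; char data[23]; } s;  // short form
        struct { size_t cap; size_t len; char* ptr; } l;  // long form
    };
    // Assumed tag: low bit of the first byte set => long form.
    bool is_long() const { return s.flag_and_size & 0x1; }
    size_t size() const {
        return is_long() ? l.len
                         : static_cast<unsigned char>(s.flag_and_size) >> 1;
    }
};
```

In a loop, an optimizer can usually hoist either `size()` out of the condition, but the packed form gives it more work to prove.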
[1] Get it? "Dirty" because of the cache. I'm sorry, that pun was entirely unintentional.
In the Microsoft world, BSTR from COM/OLE does this. Though I think the length prefix is 4 bytes, and the payload is 16-bit wchars.
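For those who haven't met it: the real API is SysAllocString/SysStringLen/SysFreeString from `<oleauto.h>`, and the BSTR pointer hands you the payload while the 4-byte length prefix (a byte count) sits just before it. A portable imitation of that layout, with hypothetical `fake_*` names since the real functions are Windows-only:

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

using bstr_char = char16_t;  // stand-in for Windows' 16-bit wchar_t

// Layout: [4-byte length prefix in BYTES][payload chars][NUL terminator],
// with the returned pointer aimed just past the prefix, BSTR-style.
bstr_char* fake_SysAllocString(const bstr_char* src, uint32_t n_chars) {
    uint32_t n_bytes = n_chars * sizeof(bstr_char);
    char* block = static_cast<char*>(
        std::malloc(sizeof(uint32_t) + n_bytes + sizeof(bstr_char)));
    std::memcpy(block, &n_bytes, sizeof(uint32_t));  // length prefix
    bstr_char* payload =
        reinterpret_cast<bstr_char*>(block + sizeof(uint32_t));
    std::memcpy(payload, src, n_bytes);
    payload[n_chars] = u'\0';
    return payload;  // caller sees only the payload pointer
}

uint32_t fake_SysStringLen(const bstr_char* b) {  // length in characters
    uint32_t n_bytes;
    std::memcpy(&n_bytes,
                reinterpret_cast<const char*>(b) - sizeof(uint32_t),
                sizeof(uint32_t));
    return n_bytes / sizeof(bstr_char);
}

void fake_SysFreeString(bstr_char* b) {
    std::free(reinterpret_cast<char*>(b) - sizeof(uint32_t));
}
```

The payoff is the same as SSO's size member: length queries are O(1) loads, while the trailing NUL keeps the payload usable where a plain wide C string is expected.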
When placing a bet on real workload performance, I’d take those attributes every day of the week and twice on Sundays.
Not sure about "smaller": the cache line is 32 bytes or larger on all modern 64-bit CPUs.
Though really, the performance bottlenecks in string-heavy code tend to be in the heap, not the string library itself.
Clang's developers are not stupid. If they did that instead of copying what GCC or MSVC did (both predate Clang), there is a good reason, and my guess is that it is better aligned with how modern architectures and optimizers work. Anyway, it's a tradeoff, but on a modern high-performance computer I would intuitively side with Clang.
Modern 64-bit CPUs have cache lines no smaller than 32 bytes.