One interesting corollary is that moving short strings in an implementation that does this could actually be ever so slightly (negligibly) slower than moving long ones (since byte copies are slower than word copies). But generally, this is a free lunch optimisation and can save you hundreds of megs of memory when writing programs dealing with millions of short strings.
[1] http://llvm.org/svn/llvm-project/libcxx/trunk/include/string - search for "union"
Also (pedantic):
#define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1))
sizeof(char) is always 1, so that division is superfluous.http://stackoverflow.com/q/4562249/25981
sizeof(char) is always defined to be one. This can't be altered by a conforming compiler.
comeon peopleeeee, a bit of an arbitrary internal standard, no?
i understand the point about "CONCLUSION: it doesn't matter for a few strings!" but.... comeonnnn it must matter on some level, otherwise why is Rails such a pain in the ass to optimize? these things must add up...
And even if you'd assume web development as the only purpose, there's a lot of strings that are shorter than 23 characters: Header for request and responses, form fields passed by the client (usernames, passwords, ...), field names passed in hashes, table and column names, template and file names, URLs or even the occasional, totally rare string in a json structure. It's an optimization with major gain and little loss.
Conclusion: "Don’t worry! I don’t think you should refactor all your code to be sure you have strings of length 23 or less."
struct RString {
struct RBasic basic;
union {
struct {
long len;
char *ptr;
union {
long capa;
VALUE shared;
} aux;
} heap;
char ary[];
} as;
};
/* apologies if I messed up the syntax here */
#define RSTRING_EMBED_LEN_MAX (sizeof(((RString*)(0))->as) - 1)
Then you can even use the padding the compiler added, if any, plus you can add more things to heap and the embed length will grow automatically.On a related note, I've found a much less comprehensive (but still useful) guide to Python internals: http://tech.blog.aknin.name/category/my-projects/pythons-inn....
Also, I would guess the performance gains (from skipping malloc) would wash out the longer your average string gets -- even if the huge stack use doesn't kill your performance for some other reason (blowing the d-cache?).
Ruby experts can correct me if I'm wrong, but when Ruby sees a name like "str2" it looks it up in a table, which points it to the RString structure. From there, it can follow the pointer to the actual array of characters, which in this case is only stored once.
In fact, the diagram is simply wrong. This was rectified by the author in an article two weeks later:
http://patshaughnessy.net/2012/1/18/seeing-double-how-ruby-s...
VALUE seems to be unsigned int defined via "typedef uintptr_t VALUE;" and "typedef unsigned __int64 uintptr_t;"
But why is it calculated like that I don't get. Anyone can explain?
What I don't know is why they don't simply use sizeof(heap) as the buffer size.
https://github.com/ruby/ruby/blob/8f77cfb308061ff49de0a47e82...
Note the `as` union. The `heap` version has three VALUE-sized entries, so RSTRING_EMBED_LEN_MAX is calculated accordingly, with the -1 to account for the null terminator.
Edit: I missed that part of that was another union, removed what I said about it being off on 32 bit.
I still don't understand why they go so roundabout by dividing by one and casting to int...
When programmers don't know in advance how long name/email/input/whatever field is going to be - they just use the magic "power of two" length :)
So 32 (or 33) in this case would be more reasonable.
#define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1))
23 wasn't chosen, it was calculated to be the size that would be required for a struct describing a heap string, and will actually be a different number for different architectures. Choosing to make it bigger would add unnecessary overhead to the RString struct.They did know. And there are many more cutoffs than powers of two, depending on storage backend.
If this is intended to sit on the stack, which I find highly unlikely (especially given the timings that seem to be the delta between one malloc and two, and would be much more significant if it were a stack allocation versus a heap allocation. This is not comparable to small string optimizations for the stack in C++), maybe. But otherwise it seems like a poorly considered hack.
The string type could as easily have been dynamically allocated based upon the length of the string, where the ptr by default points inside that same allocated block. If the string is expanded it can then be realloced and the string alloced somewhere else. No waste, a single allocation, etc.
allocation of the RString (which becomes larger and thus more difficult to malloc), and the 23 string bytes that will sit unused for longer strings.
I got the distinct impression (backed up by an actual code snippet defining the max embedded string size) that the 23 byte limit was calculated to exactly match the size of the data that would otherwise have to be stored for a heap string anyway. Thus, it doesn't actually take any extra space in the struct, and those 23 bytes do not go unused in other strings.So this holds (on a 64-bit machine) 8-bytes for the pointer, 8-bytes for the length (string not null terminated), and 8-bytes for the capacity. Alternately, via a union, it stores 24 bytes of string (null terminated). It knows whether it is a or b via a separate flag that it holds separately in RBasic.
I retract my jab about memory loss, but it still sounds rather terrible. Every bit of code dealing with strings needs to validate flags on every use to determine what it is dealing with, alternate between length specified or null terminated, etc. Ugh.
The key point is that for short strings, you're not adding it to the cost of the long string, but instead overwriting the data that you would track the long string.
In other words, the memory layout you get is this:
short:
[Rbasic]['s', 't', 'r', 'i', 'n', 'g', '\0']
long:
[Rbasic][ length ][ dataptr ][capa or value]
Which leads to another interesting question: How does the optimization interact with null in the middle of short strings, since only long strings have a length stored? Does Ruby check for embedded null when creating a string, and disable this optimization?