The move is so we can avoid allocating a string each time we declare and use it, since it will be frozen by default. It is a big optimization, mainly for GC. Before, we had to do this optimization by hand if we did not intend to modify the string:
# before
def my_method
  do_stuff_with("My String") # 1 allocation at each call
end
# before, optim
MY_STRING = "My String".freeze # this does 2 allocations, 1 of them at init and GCed quite early
def my_method
  do_stuff_with(MY_STRING)
end
# after
def my_method
  do_stuff_with("My String") # 1 allocation, the first time only
end
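A rough way to see the difference without counting allocations is to compare object identity across calls. This is only a sketch: the method names are made up, `String.new` stands in for "a fresh mutable string per call", and the reuse of `"...".freeze` literals is CRuby behavior as far as I know, not a language guarantee.

```ruby
# Hypothetical helper: always builds a new, mutable String.
def fresh_each_call
  String.new("My String")
end

# Hypothetical helper: a literal followed by .freeze, which CRuby
# resolves to one shared frozen object instead of a new one per call.
def frozen_literal
  "My String".freeze
end

a = fresh_each_call
b = fresh_each_call
c = frozen_literal
d = frozen_literal

puts a.equal?(b) # two distinct objects
puts c.equal?(d) # the same frozen object both times
puts c.frozen?
```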
But this move also complicates string manipulation, in the sense that it will nudge users toward immutable operations, which tend to allocate a lot of strings:
foo.upcase.reverse
# VS
bar = foo.dup
bar.upcase!
bar.reverse!
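To make the comparison concrete, here is a small runnable sketch of both styles (the sample value "hello" is mine). One thing to keep in mind with the in-place style is that some bang methods, `upcase!` among them, return nil when nothing changed, so they should not be chained.

```ruby
foo = "hello"

# Non-destructive chain: upcase and reverse each return a new String.
chained = foo.upcase.reverse

# In-place variant: one copy up front, then mutate that copy.
bar = foo.dup
bar.upcase!
bar.reverse!

puts chained        # OLLEH
puts bar == chained # true
puts foo            # hello (the original is untouched either way)
```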
So now we have to be deliberate about it:
my_string = +"My String" # it is not frozen
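For anyone who has not used unary plus on strings: `String#+@` returns the receiver if it is already mutable, and an unfrozen copy otherwise. A minimal sketch, using an explicit `.freeze` to stand in for a frozen literal:

```ruby
frozen = "My String".freeze

mutable = +frozen   # receiver is frozen, so this is an unfrozen copy
mutable << "!"      # safe to mutate

same = +mutable     # receiver is already mutable, so it is returned as-is

puts mutable              # My String!
puts frozen               # My String
puts same.equal?(mutable) # true
```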
We have had frozen string literals for quite a while now, enabled file by file with the "frozen_string_literal: true" comment. I've seen it recommended by the community, and it is the de facto standard in most codebases I've seen. It is generally enforced by code quality tools like Rubocop. So the mutable vs immutable distinction is well known, and as it is part of the language, well, people should know the ins and outs.
I'm just a bit surprised that they devised this long path toward real frozen string literals, because it has already been ongoing for years with the "frozen_string_literal: true" comment. Maybe it is to add proper warnings etc. in a way that does not "touch" code? I prefer the explicit file-by-file comment. And for deps, well, the Ruby version bump that makes frozen string literals the default is quite a filter already.
Well, Ruby is alive and well, and that is what matters :)
The original plan was to make the breaking change in 3.0, but that plan was canceled because it broke too much code all at once.
Hence why I proposed this multi-step plan to ease the transition.
See the discussion on the tracker if you are curious: https://bugs.ruby-lang.org/issues/20205
I say sorta late to the party, as I think it is more than fair to say there was not much of a party that folks were interested in in the lisp world. :D
Oh, I think I see some nameless person I know over there. Well-met Lisper, but goodbye!
Would Ruby be as successful if it had had all those complicated features right from the start?
Or do all languages start from a nice, simple, clean slate to get developers hooked, until the language is famous enough to be well developed and starts to look like all the other big programming languages?
Mutable strings are totally possible (and not even especially hard) in compiled, statically typed, and lower-level languages. They're just not especially performant, and are sometimes a footgun.
> all those complicated features right from the start
Arguably, mutable strings are the more complicated feature. Removing them by default simplifies the language, or at least forces you to go out of your way to find the complexity.
What? Mutable strings are more performant generally. Sometimes immutability allows you to use high level algorithms that provide better performance, but most code doesn't take advantage of that.
<< is the in-place append operator for strings/arrays, while + makes a copy. So += will create a new string and rebind the variable.
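A quick sketch of that difference, using a second variable to show whether the original object was touched (variable names are just for illustration):

```ruby
a = +"foo"
alias_of_a = a
a << "bar"                   # mutates the object both names point to
puts alias_of_a              # foobar
puts a.equal?(alias_of_a)    # true

b = +"foo"
alias_of_b = b
b += "bar"                   # builds a new string and rebinds b only
puts alias_of_b              # foo
puts b.equal?(alias_of_b)    # false
```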
Good reminder that anyone can go on the internet, just say stuff, and be wrong.
I recall it was a bit bumpy, but not all that rough in the end. I suppose static type checking helps here to find all the ways it could be used. There was a switch to allow running old code (to make strings and buffers interchangeable).
Ruby is not doing that, it's transitioning from mutable strings that can be frozen with no special treatment of literals (unless you opt-in to literals being frozen on per file basis) to mutable strings with all string literals frozen.
With immutable string literals, string literals can be reused.
You make an arrow function that takes an object as input, and calls another with a string and a field from the object, for instance to populate a lookup table. You probably don’t want someone changing map keys out from under you, because you’ll break resize. So copies are being made to ensure this?
fooLit = "foo"
fooVar = "f".concat("o").concat("o")
This would have fooLit be frozen at parse time. In this situation there would be "foo", "f", and "o" as frozen strings; and fooLit and fooVar would be two different strings since fooVar was created at runtime. Creating a string that happens to be present in the frozen strings wouldn't create a new one.
1. Strings have a flag (FL_FREEZE) that is set when the string is frozen. This is checked whenever a string would be mutated, to prevent it.
2. There is an interned string table for frozen strings.
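Both points can be observed from plain Ruby, at least on CRuby. The sketch below is an illustration under my own assumptions: the variable names are invented, and `String#-@` is used as the way to ask for the interned copy of a runtime-built string.

```ruby
foo_lit = "foo".freeze   # frozen literal
foo_var = "f" + "o" + "o" # built at runtime, a separate mutable object

puts foo_lit == foo_var      # true: same contents
puts foo_lit.equal?(foo_var) # false: different objects

# Point 1: the frozen flag is checked on mutation.
begin
  foo_lit << "!"
rescue FrozenError => e
  puts "mutation rejected: #{e.class}"
end

# Point 2: interning a runtime string looks it up in the table and
# hands back the existing frozen object instead of creating another.
interned = -foo_var
puts interned.equal?(foo_lit)
```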
> Does it keep a reference count to each unique string that requires a set lookup to update on each string instance’s deallocation?
This I am less sure about, I poked around in the implementation for a bit, but I am not sure of this answer. It appears to me that it just deletes it, but that cannot be right, I suspect I'm missing something, I only dig around in Ruby internals once or twice a year :)
The interned string table uses weak references. Any string added to the interned string table has the `FL_FSTR` flag set on it, and when a string is freed, if it has that flag the GC knows to remove it from the interned string table.
The keyword to know to search for this in the VM is `fstring`, that's what interned strings are called internally:
- https://github.com/ruby/ruby/blob/b146eae3b5e9154d3fb692e8fe...
- https://github.com/ruby/ruby/blob/b146eae3b5e9154d3fb692e8fe...
Though since Ruby already has symbols which act as immutable interned strings, frozen literals might just piggyback on that, with frozen strings being symbols under the hood.
Variables don't "contain" a string, they just point to objects on the heap.
So:
my_string = same_string = "Hello World"
Here both variables are essentially pointers to a pre-existing object on the heap, and that object is immutable.
SUB_ME = ':sub_me'.freeze
def my_method(method_argument)
  foo = 'foo_:sub_me'
  foo.sub!(SUB_ME, method_argument)
  foo
end
which, without `# frozen_string_literal: true`, I believe allocates a string when the application loads (it sounds like it might be 2) and another string at runtime, and then mutates that. That seems like it's better than doing
# frozen_string_literal: true
FOO = 'foo_:sub_me'
SUB_ME = ':sub_me'
def my_method(method_argument)
  FOO.sub(SUB_ME, method_argument)
end
because that will allocate the frozen string to `FOO` when the application loads, then make a copy of it at runtime with the substitution applied. That means two strings that never leave memory (FOO, SUB_ME) and one that has to be GCed (return value) instead of just one that never leaves memory (SUB_ME) and one that has to be GCed (foo/return value).
This is true in particular when FOO is only used in `my_method`. If it's also used in `my_other_method` and it logically makes sense for both methods to use the same base string, then it's beneficial to use the wider-scope constant.
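For what it's worth, here is a sketch of the two patterns side by side. The method names are invented, and explicit `.freeze` calls stand in for the magic comment so the snippet behaves the same regardless of how the file is loaded. Note that the mutating variant needs a mutable template (via `+` or `dup`) once literals are frozen, since `sub!` on a frozen string raises FrozenError.

```ruby
SUB_ME = ':sub_me'.freeze
FOO = 'foo_:sub_me'.freeze

# Pattern 1: build a mutable string per call and mutate it in place.
def with_mutation(method_argument)
  foo = +'foo_:sub_me'   # unary plus guarantees a mutable string
  foo.sub!(SUB_ME, method_argument)
  foo
end

# Pattern 2: keep a frozen template and let sub return a new string.
def with_constant(method_argument)
  FOO.sub(SUB_ME, method_argument)
end

puts with_mutation('bar') # foo_bar
puts with_constant('bar') # foo_bar
puts FOO                  # foo_:sub_me (template is never modified)
```

As far as I can tell both variants end up with one new string per call; the practical difference is only whether the template stays resident as a constant.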
(The reason this seems reasonable in an application is that the method defines the string, mutates it, and sends it along, which primarily works because I work on a small team. Ostensibly it should send a frozen string, though I rarely do that in practice because my rule is don't mutate a string outside the context in which it was defined, and that seems sensible enough.)
Am I mistaken and/or is there another, perhaps more common pattern that I'm not thinking about that makes this desirable? Presumably I can just add # frozen_string_literal: false to my files if I want so this isn't a complaint. I'm just curious to know the reasoning since it is not obvious to me.
So I sometimes wonder why JIT isn't used as a motivation to move / remove features. Basically if you want JIT to work, your code has to be x ready or without feature x. So if you still want those performance improvements you will have to move forward.
But not actually stated it's the plan. I'd bet whatever LLM wrote the article took it as a stronger statement than it is.
I had to explain the same reasoning on Reddit the other day. Perhaps it’s time to take this as feedback and update the blog.
Btw I just asked gpt to write an article on the same topic, with a reference to the Ruby issues page. And it DID NOT add the future proposal part. So LLMs are definitely smarter than me.
An obviously good change: massive performance improvements, and not hard to implement, but it's still gonna be such a headache and dependency hell.
https://www.ruby-lang.org/en/news/2015/12/25/ruby-2-3-0-rele...
Most linting setups I've seen since then have required this line. I don’t expect many libraries to run afoul of this, and this warning setting will make finding them easy and safe. This will be nothing like the headache Python users faced transitioning to 3.
I agree, the migration path and timeframe for it have been well and loudly advertised.
The rest of the changes were a bit annoying but mostly boring; some things could have been done better here too, but the string encoding thing was the main issue that caused people to hold on to Python 2 for a long time.
The frozen string literal changes are nothing like it. It's been "good practise" to do this for years, fixing things when errors come up is trivial, there is a long migration path, and AFAIK there are no plans to remove "frozen_string_literal: false". It's just a change in the default from false to true, not a change in features.
"Learning lessons" doesn't mean "never do anything like this ever again". You're the one who failed to learn from Python 3, by simply saying "language change bad" without deeper understanding of what went wrong with Python 3, and how to do things better. Other languages like Go also make incompatible changes to the language, but do so in a way that learned the lessons from Python 3 (which is why you're not seeing people complain about it).
And since that flag really doesn't require lots of work in the VM, it's likely to be kept around pretty much forever.