Symbols and frozen strings

Brian Candler

I just had a thought.

One of the problems with using strings as hash keys is that every time you
refer to them, you create a throw-away garbage string:

params["id"]
       ^
       +-- temporary string, needs to be garbage collected

In Rails you have HashWithIndifferentAccess, but this actually isn't any
better. Although you write params[:id], when executed the symbol is
converted to a string anyway.

In a Rails-like scenario, using symbols as the real keys within the hash
doesn't work: the keys come from externally parsed data, which means (a)
they were strings originally, and (b) if you converted them to symbols you'd
risk a symbol exhaustion attack.

So I thought, wouldn't it be nice to have a half-way house: being able to
convert a symbol to a string, in such a way that you always got the same
(frozen) string object?

This turned out to be extremely easy:

class Symbol
  def fring
    @fring ||= to_s.freeze
  end
end

irb(main):006:0> :foo.fring
=> "foo"
irb(main):007:0> :foo.fring.object_id
=> -605512686
irb(main):008:0> :foo.fring.object_id
=> -605512686
irb(main):009:0> :bar.fring
=> "bar"
irb(main):010:0> :bar.fring.object_id
=> -605543036
irb(main):011:0> :bar.fring.object_id
=> -605543036
irb(main):012:0> :bar.fring << "x"
TypeError: can't modify frozen string
from (irb):12:in `<<'
from (irb):12
from :0
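
On current Ruby versions symbols are themselves frozen, so assigning @fring on a Symbol would most likely raise an error instead of caching. A minimal sketch of the same idea under that assumption, keeping the strings in a side table (the constant name FRING_CACHE is hypothetical and not part of the original post):

```ruby
# Sketch only: one frozen String per Symbol, stored in a side table
# because frozen symbols cannot carry instance variables.
class Symbol
  FRING_CACHE = {} # hypothetical name; not thread-safe as written

  def fring
    FRING_CACHE[self] ||= to_s.freeze
  end
end

puts :foo.fring.equal?(:foo.fring) # same object on every call
puts :foo.fring.frozen?
```

The table grows by one entry per distinct symbol that calls fring, so it carries the same memory trade-off as the symbols themselves.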

Is this a well-known approach, and/or does it exist in any extension
library?

I suppose that an instance variable lookup isn't necessarily faster than
always creating a temporary string with to_s and then garbage collecting it
at some point later in time, but it feels like it ought to be :)

However, since I've seen discussion about string modifiers like "..."u,
perhaps there's scope for adding in-language support, e.g.

"..."f - frozen string, same object ID each time it's executed

In that case, it might be more convenient the other way round:

"..." - frozen string literal, same object
"..."m - mutable (unfrozen) string literal, new objects
String.new("...") - another way of making a mutable string
"...".dup - and another

That would break a lot of existing code, but it could be pragma-enabled.
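
For readers on a later Ruby: as far as I know, releases from 2.1 onward deduplicate a string literal that is immediately followed by .freeze, and 2.3 added a `# frozen_string_literal: true` magic comment that acts much like the pragma suggested here. A small sketch assuming such a Ruby (the method names are made up for illustration):

```ruby
# Assumes Ruby 2.1 or later, where "literal".freeze is resolved at
# compile time to a single shared frozen String.
def frozen_key
  "id".freeze
end

# String.new always builds a fresh, mutable String.
def mutable_key
  String.new("id")
end

puts frozen_key.equal?(frozen_key)   # same object on each call
puts mutable_key.equal?(mutable_key) # distinct objects
```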

Sorry if this ground has been covered before - it's hard to keep up with
ruby-talk :)

Regards,

Brian.
 
Nobuyoshi Nakada

Hi,

At Thu, 6 Sep 2007 16:50:28 +0900,
Brian Candler wrote in [ruby-talk:267857]:
So I thought, wouldn't it be nice to have a half-way house: being able to
convert a symbol to a string, in such a way that you always got the same
(frozen) string object?

Rather, Symbol#to_s should return frozen String?
I suppose that an instance variable lookup isn't necessarily faster than
always creating a temporary string with to_s and then garbage collecting it
at some point later in time, but it feels like it ought to be :)

However, since I've seen discussion about string modifiers like "..."u,
perhaps there's scope for adding in-language support, e.g.

"..."f - frozen string, same object ID each time it's executed

What about "..."o like Regexp?
 
Brian Candler

Rather, Symbol#to_s should return frozen String?

Yes, as long as it returns the same frozen string each time.

Hmm, this sounds like a good solution - it's technically not
backwards-compatible, but I doubt that much code does a Symbol#to_s and
later mutates it.
What about "..."o like Regexp?

Sure, I don't mind about the actual syntax.

Of course, you don't even need to add 'o' to a Regexp in the case where it
doesn't contain any #{...} interpolation:

irb(main):001:0> RUBY_VERSION
=> "1.8.4"
irb(main):002:0> 3.times { puts /foo/.object_id }
-605554606
-605554606
-605554606

Regards,

Brian.
 
Trans

Yes, as long as it returns the same frozen string each time.

Hmm, this sounds like a good solution - it's technically not
backwards-compatible, but I doubt that much code does a Symbol#to_s and
later mutates it.

I've tried that. There are some places where it blows up Ruby. So
those would have to be rooted out first.

T.
 
Robert Klemme

2007/9/6 said:
I've tried that. There are some places where it blows up Ruby. So
those would have to be rooted out first.

I always prefer less intrusive solutions. Why not do this:

SYMS = Hash.new {|h,sy| h[sy]=sy.to_s}

Then, wherever you need this, just do "SYMS[a_sym]" instead of
"a_sym.to_s". An added advantage: you can throw away or clear SYMS when
you know you do not need it any more, thus freeing up memory.
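
A short sketch of how that cache behaves. Freezing the cached string is an addition here (it is not in the one-liner above), made so that no caller can mutate the shared copy:

```ruby
# The default block runs only on a cache miss and stores the result.
SYMS = Hash.new { |h, sy| h[sy] = sy.to_s.freeze }

puts SYMS[:id].equal?(SYMS[:id]) # second lookup returns the cached object
puts SYMS.size                   # one entry per symbol seen so far

SYMS.clear                       # drop all cached strings when done
puts SYMS.size
```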

Kind regards

robert
 
Joel VanderWerf

Brian said:
I just had a thought.

One of the problems with using strings as hash keys is that every time you
refer to them, you create a throw-away garbage string:

params["id"]
       ^
       +-- temporary string, needs to be garbage collected

Setting aside the question of freezing, why can't ruby share string data
for all strings generated from the same symbol? And in that case you
could do the following to avoid garbage:

params[:id.to_s]

(Or ruby could even look up the literal "id" in the symbol table and do
this for you.)

This code shows some of the cases in which ruby does and does not share
string contents:


def show_vmsize
  GC.start
  puts `ps -o vsz #$$`[/\d+/]
end

s = "a"*1000
sym = s.to_sym

show_vmsize # 8712

# ruby apparently does not share storage for strings derived
# from the same symbol:

strs1 = (0..10_000).map do
  sym.to_s
end

show_vmsize # 18488

# ruby does share storage for string ops:

strs2 = (0..10_000).map do
  s[0..-1]
end

show_vmsize # 18616

strs3 = (0..10_000).map do
  s.dup
end

show_vmsize # 18616
 
Joel VanderWerf

Joel said:
Brian said:
I just had a thought.

One of the problems with using strings as hash keys is that every time you
refer to them, you create a throw-away garbage string:

params["id"]
       ^
       +-- temporary string, needs to be garbage collected

Setting aside the question of freezing, why can't ruby share string data
for all strings generated from the same symbol? And in that case you
could do the following to avoid garbage:

params[:id.to_s]

Sorry... _reduce_ garbage, not avoid it altogether, since there is still
the T_STRING, even though the data is reused. It would help more for
long strings than for short strings, because the T_DATA is smaller in
proportion.

The idea of a literal for a unique frozen string would reduce garbage
further, sharing the T_STRING as well as the data.
 
khaines

I just had a thought.

One of the problems with using strings as hash keys is that every time you
refer to them, you create a throw-away garbage string:

params["id"]
       ^
       +-- temporary string, needs to be garbage collected

Absolutely.

Whether you need to care about this, though, depends on how often your
code is building these throwaway strings, and on just how much you really
need to neurotically performance tweak your code.

What I do to deal with this in code where I consider it important is to
use constants that contain frozen strings.

Id = 'id'.freeze

params[Id]

Constant lookup isn't the fastest thing in Ruby, but it's faster than the
combined load of creating the throwaway string object, and then garbage
collecting it.
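
A minimal sketch of that pattern. The constant name ID_KEY and the params hash are placeholders for illustration only:

```ruby
# One frozen String, created once when the constant is defined.
ID_KEY = 'id'.freeze

# Stand-in for a request parameter hash with String keys.
def lookup_id(params)
  params[ID_KEY] # reuses the constant instead of building a new "id"
end

puts lookup_id({ 'id' => 42 })
puts ID_KEY.frozen?
```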


Kirk Haines
 
Joel VanderWerf

Joel said:
Sorry... _reduce_ garbage, not avoid it altogether, since there is still
the T_STRING, even though the data is reused. It would help more for
long strings than for short strings, because the T_DATA is smaller in
proportion.

Sorry again... I don't know where T_DATA came from. Should be T_STRING,
the constant-size overhead for a string object. Will stop posting until
caffeine hits.
 
Brian Candler

Setting aside the question of freezing, why can't ruby share string data
for all strings generated from the same symbol?

Because it could generate unexpected aliasing. The normal, expected
behaviour is no aliasing:

irb(main):001:0> a = :foo.to_s
=> "foo"
irb(main):002:0> b = :foo.to_s
=> "foo"
irb(main):003:0> b << "bar"
=> "foobar"
irb(main):004:0> a
=> "foo"

That's why the string has to be frozen.
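
To make the aliasing concern concrete, here is a sketch of what callers would see if one unfrozen object were handed out to everyone, and how freezing turns the silent surprise into an error. The method names are invented for the example, and the exact exception class depends on the Ruby version:

```ruby
# Two names for one mutable object: a change through b shows up in a.
def aliased_append
  shared = String.new("foo")
  a = shared
  b = shared
  b << "bar"
  a
end

# With a frozen string the mutation is rejected instead.
def frozen_append
  s = String.new("foo").freeze
  s << "bar"
  :mutated
rescue StandardError => e
  e.class
end

puts aliased_append # "foobar", not "foo"
puts frozen_append  # an error class, never :mutated
```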

Regards,

Brian.
 
Joel VanderWerf

Brian said:
Because it could generate unexpected aliasing. The normal, expected
behaviour is no aliasing:

irb(main):001:0> a = :foo.to_s
=> "foo"
irb(main):002:0> b = :foo.to_s
=> "foo"
irb(main):003:0> b << "bar"
=> "foobar"
irb(main):004:0> a
=> "foo"

This was what I was thinking of:

irb(main):001:0> a = :foo.to_s
=> "foo"
irb(main):002:0> b = a.dup
=> "foo"
irb(main):003:0> b << "bar"
=> "foobar"
irb(main):004:0> a
=> "foo"

Internally, a and b use the same storage, but copy-on-write prevents
aliasing.
 
evanwebb

I just had a thought.

However, since I've seen discussion about string modifiers like "..."u,
perhaps there's scope for adding in-language support, e.g.

"..."f - frozen string, same object ID each time it's executed

In that case, it might be more convenient the other way round:

"..." - frozen string literal, same object
"..."m - mutable (unfrozen) string literal, new objects
String.new("...") - another way of making a mutable string
"...".dup - and another

Rubinius has a compiler extension that detects code in the form of

"name".static

Inside the quotes can be any string, and the static method call is removed,
but every time the code is run, the same String object is returned. This is
highly useful when using strings as hash keys, and avoids having to put them
in constants that must be looked up later.

That would break a lot of existing code, but it could be pragma-enabled.

Sorry if this ground has been covered before - it's hard to keep up with
ruby-talk :)

Regards,

Brian.

- Evan Phoenix
 
Daniel DeLorme

Joel said:
show_vmsize # 8712

# ruby apparently does not share storage for strings derived
# from the same symbol:

strs1 = (0..10_000).map do
  sym.to_s
end

show_vmsize # 18488

# ruby does share storage for string ops:

strs2 = (0..10_000).map do
  s[0..-1]
end

Hmm, we could use that property of strings...

class Symbol
  alias _to_s to_s
  def to_s
    (@str || @str = _to_s)[0..-1]
  end
end

Daniel
 
