Yukihiro Matsumoto said:
In message "Memory behavior of String#dup"
|does String#dup also copy the byte sequence of the string or does it only
|copy a reference and does a copy on write?
When memory is already shared between strings, it does copy-on-write,
otherwise it copies. From my observation, many duped strings are
modified right after the dup, so I felt it was wise to avoid
making new internal copy-on-write entries for duping.
s1 = "foo"
s2 = s1.dup
So, if I understand you correctly, s1 and s2 don't share the same byte
sequence, since s1 is the only string referring to the sequence "foo" when
the dup occurs (i.e. the sequence is not shared). Is that correct?
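Whichever way the bytes are stored internally, the visible behaviour should be the same: changing the dup must not change the original. Here is a minimal sketch of that check. The helper name dup_and_append is made up for this example, and String.new is used only to be sure the strings are mutable. It says nothing about whether the interpreter shares the byte sequence underneath.

```ruby
# Hypothetical helper for illustration: dup a string, then modify the copy.
def dup_and_append(original, suffix)
  copy = original.dup   # new String object; internal sharing is up to the interpreter
  copy << suffix        # modifies the copy in place, never the original
  copy
end

s1 = String.new("foo")
s2 = dup_and_append(s1, "bar")

p s1              # prints "foo"
p s2              # prints "foobar"
p s1.equal?(s2)   # prints false, they are distinct objects
```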
The reason I'm asking is that for hashes where an entry shares the
key (either directly, because it is the same string, as in h[s1]=s1, or
indirectly, because the value is an instance that refers to the key)
there would be enormous memory consumption if all those
dup'ed hash key strings also contained a copy of the byte sequence. The
problem I have with this duping is that I can't prevent it. So there's at
least the overhead of a newly created String instance, because apparently (v
1.7.3) the Hash doesn't honor the frozen state of the string.
If that change has not been incorporated I suggest doing the dup only if a
string is not frozen. Otherwise the user has no chance to avoid the dup
for strings.
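To make the suggestion concrete, here is a rough sketch of the rule in plain Ruby. key_for is a hypothetical helper written only for this mail; it is not a method of Hash and does not claim to show how Hash is implemented. It just states the policy: reuse a frozen string as the key, dup (and freeze) anything else.

```ruby
# Hypothetical helper: choose the object a hash would store as its key.
def key_for(str)
  if str.frozen?
    str             # frozen: safe to reuse, no new String instance
  else
    str.dup.freeze  # mutable: take a private frozen copy
  end
end

s1 = String.new("key 1")          # mutable string
s2 = String.new("key 2").freeze   # frozen string

p key_for(s1).equal?(s1)   # prints false, a copy was made
p key_for(s2).equal?(s2)   # prints true, the original is reused
```

With such a rule the user can avoid the extra instance simply by freezing the string before using it as a key.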
Regards
robert
h = Hash.new
s1 = "key 1"
s2 = "key 2"
s2.freeze
h[s1]=s1
h[s2]=s2
h.each do |k,v|
  puts "#{k}=>#{v}"
  puts "#{k.id}=>#{v.id}"
  case k
  when s1
    p [k.equal?( s1 ), v.equal?( s1 )]
  when s2
    p [k.equal?( s2 ), v.equal?( s2 )]
  end
end
yields
key 1=>key 1
22381332=>22394808
[false, true]
key 2=>key 2
22376868=>22390356
[false, true]