Memory behavior of String#dup

Robert Klemme

Hi all,

does String#dup also copy the byte sequence of the string, or does it only
copy a reference and do copy-on-write?

robert
 
Yukihiro Matsumoto

Hi,

In message "Memory behavior of String#dup"

|does String#dup also copy the byte sequence of the string, or does it only
|copy a reference and do copy-on-write?

When memory is already shared between strings, it does copy-on-write;
otherwise it copies. From my observation, many duped strings are
modified right after the dup, so I felt it wise to avoid
making new internal copy-on-write entries for duping.

matz.
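Whether dup shares or copies the underlying bytes is an internal detail of the interpreter and cannot be observed directly from Ruby code. What any copy-on-write scheme has to preserve is the visible contract, which this small sketch exercises (it says nothing about memory use):

```ruby
original = String.new("foo")   # a plain, unfrozen String
copy = original.dup

# dup yields a distinct object with equal contents...
same_contents = (copy == original)
distinct_object = !copy.equal?(original)

# ...and writing to the copy never leaks into the original,
# whether the bytes were copied eagerly or shared copy-on-write.
copy << "bar"

puts same_contents     # => true
puts distinct_object   # => true
puts original          # => foo
puts copy              # => foobar
```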
 
Robert Klemme

Yukihiro Matsumoto said:
> In message "Memory behavior of String#dup"
>
> |does String#dup also copy the byte sequence of the string, or does it only
> |copy a reference and do copy-on-write?
>
> When memory is already shared between strings, it does copy-on-write;
> otherwise it copies. From my observation, many duped strings are
> modified right after the dup, so I felt it wise to avoid
> making new internal copy-on-write entries for duping.

s1 = "foo"
s2 = s1.dup

So, if I understand you correctly, s1 and s2 don't share the same byte
sequence, since s1 is the only string referring to the sequence "foo" when
the dup occurs (i.e. the sequence is not shared). Is that correct?

The reason I'm asking is that for hashes where an entry shares the
key (either directly, because it is the same string as in h[s1]=s1, or
indirectly, because the value is an instance that refers to the key)
there would be enormous memory consumption if all those dup'ed hash key
strings also contained a copy of the byte sequence. The problem I have
with this duping is that I can't prevent it. So there's at least the
overhead of a newly created String instance, because apparently (v
1.7.3) Hash doesn't honor the frozen state of the string.

If that change has not been incorporated yet, I suggest doing the dup only
if a string is not frozen. Otherwise the user has no way to avoid the dup
for strings.

Regards

robert


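h = Hash.new

s1 = "key 1"
s2 = "key 2"
s2.freeze          # only s2 is frozen before insertion

h[s1] = s1
h[s2] = s2

h.each do |k, v|
  puts "#{k}=>#{v}"
  puts "#{k.object_id}=>#{v.object_id}"   # compare identities of key and value
  case k
  when s1
    p [k.equal?(s1), v.equal?(s1)]   # is each the very same object as s1?
  when s2
    p [k.equal?(s2), v.equal?(s2)]   # is each the very same object as s2?
  end
end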

yields (Ruby 1.7.3):

key 1=>key 1
22381332=>22394808
[false, true]
key 2=>key 2
22376868=>22390356
[false, true]
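The output above is from 1.7.3, where the frozen s2 was still copied. For comparison, here is a sketch of the same check written for a current MRI; the expected results in the comments are an assumption about that interpreter. An unfrozen String key is copied and frozen on insertion, while an already-frozen key is stored as-is:

```ruby
h = {}

s1 = String.new("key 1")          # unfrozen key
s2 = String.new("key 2").freeze   # frozen key

h[s1] = s1
h[s2] = s2

k1, k2 = h.keys   # Hash preserves insertion order (Ruby 1.9+)

p [k1.equal?(s1), k1.frozen?, k1 == s1]   # => [false, true, true] (copied and frozen)
p [k2.equal?(s2), h[s2].equal?(s2)]       # => [true, true] (stored without a copy)
```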
 
Yukihiro Matsumoto

Hi,

In message "Re: Memory behavior of String#dup"

|Though I still worry about the overhead of one more Ruby instance (there
|must be some bookkeeping done etc.). Is this negligible?

I guess so. It's only 20 bytes per object on a 32-bit CPU.

matz.
 
Yukihiro Matsumoto

Hi,

In message "Re: Memory behavior of String#dup"

|Hm, that amounts to 2 million bytes for 100000 instances, which is not too
|much IMHO. Plus, I guess there will be some overhead for object lookups.
|
|I'd like to propose the change to not dup frozen strings used as Hash keys.
|Should I enter an RCR, or should we discuss this here?

Early optimization is the source of all evil. ;-)

Putting the joke aside, frozen key strings are very useful for finding
bugs. So I think the optimization should be done differently.

matz.
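One way to read the point about finding bugs: because the key stored by Hash#[]= is frozen, code that later tries to modify a key taken from the hash fails loudly instead of silently changing the key behind the table's back. A short sketch (the rescue catches RuntimeError, which also covers FrozenError, its subclass since Ruby 2.5):

```ruby
h = {}
h[String.new("name")] = 1   # Hash#[]= stores a frozen copy of the unfrozen key

key = h.keys.first
begin
  key << "!"                # buggy code trying to change a stored key in place
  mutated = true
rescue RuntimeError         # FrozenError (Ruby 2.5+) is a RuntimeError
  mutated = false
end

puts key.frozen?   # => true
puts mutated       # => false
puts h["name"]     # => 1, lookup still works
```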
 
Robert Klemme

Yukihiro Matsumoto said:
> Hi,
>
> In message "Re: Memory behavior of String#dup"
>
> |Putting the joke aside, frozen key strings are very useful for finding
> |bugs. So I think the optimization should be done differently.

You lost me here. Maybe I wasn't clear enough and we have a
misunderstanding. I meant, quite informally:

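class Hash
  def []=(key, val)
    # parentheses needed: without them Ruby parses kind_of?(String && ...)
    if key.kind_of?(String) && !key.frozen?
      key = key.dup
      key.freeze
    end

    # now insert key and value (store is still the built-in implementation)
    store(key, val)
  end
end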

> Your suggestion inspired me to make a new dup-freeze optimization. It'll be
> available in CVS soon. Thank you.

You're welcome! Do you mean a specialized dup method that returns self if
frozen, like this:

class Object
  # Return self when already frozen; otherwise return an ordinary dup.
  def dupFreeze
    frozen? ? self : dup
  end
end

Kind regards

robert
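The dupFreeze above returns an unfrozen copy when the receiver is not frozen; for the Hash-key case sketched earlier, that copy would also need freezing. A sketch of that variant as a plain helper, so Object does not have to be patched (KeyUtil and dup_freeze are hypothetical names, not part of Ruby's core library):

```ruby
# Hypothetical helper, not a core Ruby method.
module KeyUtil
  module_function

  # Return str itself when it is already frozen (no new object at all);
  # otherwise return a frozen copy, so later changes to str cannot
  # affect a key built from it.
  def dup_freeze(str)
    str.frozen? ? str : str.dup.freeze
  end
end

frozen_key = String.new("key 2").freeze
plain_key  = String.new("key 1")

p KeyUtil.dup_freeze(frozen_key).equal?(frozen_key)          # => true
copy = KeyUtil.dup_freeze(plain_key)
p [copy.equal?(plain_key), copy.frozen?, plain_key.frozen?]  # => [false, true, false]
```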
 
Yukihiro Matsumoto

Hi,

In message "Re: Memory behavior of String#dup"

|> Your suggestion inspired me to make a new dup-freeze optimization. It'll be
|> available in CVS soon. Thank you.
|
|You're welcome! Do you mean a specialized dup method that returns self if
|frozen, like this:

<snip>

Yes. This specialized dup also returns the hidden shared string, without
making a copy, if one is available.

matz.
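The three cases described here (reuse a frozen receiver, reuse a hidden shared string, otherwise copy) can be pictured with a toy model. Everything below (ToyStr, shared_root, sharing, dup_freeze) is hypothetical and far simpler than the interpreter's internals; in particular it assumes a sharing string always covers its root's entire buffer:

```ruby
# Toy model of the dup-freeze decision; not interpreter source code.
class ToyStr
  attr_reader :bytes, :shared_root

  def initialize(bytes, frozen: false, shared_root: nil)
    @bytes = bytes               # the character data (a Ruby String here)
    @frozen = frozen
    @shared_root = shared_root   # hidden frozen ToyStr whose bytes we share
  end

  # Build a string that shares its root's buffer instead of owning a copy.
  def self.sharing(root)
    new(root.bytes, shared_root: root)
  end

  def frozen?
    @frozen
  end

  # Return a frozen string with the same contents, copying bytes only
  # when nothing reusable exists.
  def dup_freeze
    return self if frozen?              # already frozen: reuse self
    return shared_root if shared_root   # reuse the hidden shared root
    ToyStr.new(bytes.dup, frozen: true) # otherwise make a real copy
  end
end

root   = ToyStr.new(String.new("key"), frozen: true)
shared = ToyStr.sharing(root)
owned  = ToyStr.new(String.new("key"))

p shared.dup_freeze.equal?(root)               # => true, no copy
p root.dup_freeze.equal?(root)                 # => true, no copy
p owned.dup_freeze.bytes.equal?(owned.bytes)   # => false, bytes copied
```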
 
Robert Klemme

Yukihiro Matsumoto said:
> Hi,
>
> In message "Re: Memory behavior of String#dup"
>
> |> Your suggestion inspired me to make a new dup-freeze optimization. It'll be
> |> available in CVS soon. Thank you.
> |
> |You're welcome! Do you mean a specialized dup method that returns self if
> |frozen, like this:
>
> <snip>
>
> Yes. This specialized dup also returns the hidden shared string, without
> making a copy, if one is available.

Sounds great! Thanks a lot!

robert
 
