String copy-on-write question

Lars Christensen · May 5, 2008

Hello group,

Ruby implements copy-on-write for strings, so you can do stuff like
this very cheaply:

str = 0.chr * (2**24) # 16MiB allocated
str[100..-1] # this costs only a small amount of memory

How come this optimization does not apply in this case?

str[100..-2] # this costs around 16MiB of memory
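The difference can be observed without watching the process size. This is a sketch that assumes CRuby with its bundled objspace extension, and a CRuby that still only shares substrings running to the end of the original. ObjectSpace.memsize_of reports roughly how much memory an object itself owns, so a substring that shares its parent's buffer reports only a small, object-sized number. The exact figures vary between Ruby versions; only the order of magnitude matters.

```ruby
require 'objspace'

str = 0.chr * (2**24)   # 16MiB buffer of NUL bytes

tail   = str[100..-1]   # runs to the end of str: may share str's buffer
middle = str[100..-2]   # stops one byte short: gets its own copy

# memsize_of counts only memory an object owns, so a shared
# substring reports roughly the size of its object slot.
tail_size   = ObjectSpace.memsize_of(tail)
middle_size = ObjectSpace.memsize_of(middle)

puts "tail:   #{tail_size} bytes"
puts "middle: #{middle_size} bytes"
```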

As a side effect, if using regexps on a large string, the pre-match
and post-match variables behave differently:

s = 0.chr * (2**23) + "Hello" + 0.chr * (2**23) # About 16MiB allocated (after GC)
s.scan(/Hello/) { |m| p m } # This is free
p $'.size # This is free
p $`.size # This costs another 8MiB.
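The pre-match/post-match difference can be checked with MatchData#pre_match and #post_match, which return the same strings as $` and $'. As before, this is a sketch assuming CRuby and the objspace extension, and the byte counts are approximate.

```ruby
require 'objspace'

half = 0.chr * (2**23)        # 8MiB of NUL bytes
s = half + "Hello" + half     # about 16MiB

m = s.match(/Hello/)

pre  = m.pre_match            # same string as $` : everything before "Hello"
post = m.post_match           # same string as $' : everything after "Hello"

pre_size  = ObjectSpace.memsize_of(pre)
post_size = ObjectSpace.memsize_of(post)

puts "pre_match:  #{pre.size} chars, owns #{pre_size} bytes"
puts "post_match: #{post.size} chars, owns #{post_size} bytes"
```

Only the post-match runs to the end of s, which is why only it can reuse the existing buffer.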

Any insights?

Lars

ts · May 5, 2008

Lars Christensen wrote:

Well, it's best to look at rb_str_substr() in string.c.

str[100..-1] # this costs only a small amount of memory

Ruby just needs to adjust the pointer and the length in the new
object.

str[100..-2] # this costs around 16MiB of memory

Here the new string is one character shorter than in the previous
case. To do the same thing as before, it would have to
* adjust the pointer
* adjust the length
* add \0 at the end

Adding that \0 would inevitably modify the original string, which is
why it makes a copy instead.
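The three steps above can be modelled in a few lines. This is a toy sketch: ToyString and its methods are hypothetical names invented for illustration, not the real layout in string.c. Each object points into a byte buffer, records its own offset and length, and keeps the invariant that its bytes are immediately followed by a \0. A substring can share the buffer only when it ends where that \0 already is.

```ruby
# Toy model only: ToyString is hypothetical and does not mirror CRuby's
# real struct. Invariant: an object's bytes are always followed by "\0".
class ToyString
  attr_reader :buf, :offset, :length

  def initialize(buf, offset, length)
    @buf = buf
    @offset = offset
    @length = length
  end

  # Build an object with its own buffer: the bytes plus a terminator.
  def self.from(bytes)
    new(bytes + "\0", 0, bytes.bytesize)
  end

  # The bytes visible through this object (terminator excluded).
  def to_s
    buf.byteslice(offset, length)
  end

  def shares_buffer_with?(other)
    buf.equal?(other.buf)
  end

  # Substring of len bytes starting at beg (relative to this object).
  def substr(beg, len)
    if beg + len == length
      # The slice ends where this object ends, so the existing "\0"
      # already follows it: just adjust the offset and the length.
      ToyString.new(buf, offset + beg, len)
    else
      # Sharing would mean writing "\0" right after the slice, which
      # overwrites a byte the original still uses. Copy instead.
      ToyString.from(buf.byteslice(offset + beg, len))
    end
  end
end

str    = ToyString.from('0' * 200)
tail   = str.substr(100, 100)   # like str[100..-1]
middle = str.substr(100, 99)    # like str[100..-2]

puts tail.shares_buffer_with?(str)    # prints true
puts middle.shares_buffer_with?(str)  # prints false
```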

p $'.size # This is free
p $`.size # This costs another 8MiB.

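Same reason here: $` ends where the match starts, before the end of
the string, so it would need its own \0; $' runs to the end of the
string and can share the buffer.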

Guy Decoux

Robert Klemme · May 5, 2008

Interesting. Do you also happen to know why an additional field that
stores the length is not used? Is the reason perhaps the use of C
library string functions that work on zero-terminated strings?

Cheers

robert
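Robert's guess about C library functions can be illustrated with a small simulation. Any C code that receives only the raw pointer and scans for the terminating \0 ignores the length field. In the sketch below, c_strlen is a hypothetical Ruby stand-in for C's strlen(), and the hashes stand in for the (pointer, length) pair a shared substring would carry; it shows why a shared middle slice would look too long to such code, not how Ruby actually decides.

```ruby
# Hypothetical stand-in for C's strlen(): count bytes from offset until
# the first NUL byte (or the end of buf), ignoring any length field.
def c_strlen(buf, offset)
  n = 0
  while (byte = buf.getbyte(offset + n)) && byte != 0
    n += 1
  end
  n
end

buf = ('a' * 10) + "\0"   # 10 content bytes followed by the terminator

# Two slices that would share buf, described only by (offset, length).
tail   = { offset: 4, length: 6 }   # bytes 4..9, ends right before the \0
middle = { offset: 4, length: 5 }   # bytes 4..8, one byte short of it

puts c_strlen(buf, tail[:offset])    # prints 6, matches tail[:length]
puts c_strlen(buf, middle[:offset])  # prints 6, but middle[:length] is 5
```

Ruby-level code always goes through the length, so only code that treats the pointer as a C string would be fooled; under this guess, copying the middle slice up front keeps every string safe to pass to such code.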

ts · May 5, 2008

Robert said:
Interesting. Do you also happen to know why an additional field that
stores the length is not used?

I don't understand: it does have a field that gives the length of
the string. For example, with

str = '0' * 200
str[100 .. -1]

the first object (in str) will have 200 as its length, and
the length field in the new object will have the value 100.
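ts's example can be checked from plain Ruby. This sketch only uses the public String API, so it cannot show whether the buffer is actually shared; it shows that each object reports its own length and that writing to the substring never changes the original, which is what copy-on-write has to guarantee.

```ruby
str = '0' * 200
sub = str[100..-1]

puts str.length   # prints 200
puts sub.length   # prints 100, the substring's own length

# Writing to sub must not affect str. In CRuby, a string that shares
# its buffer is given a private copy before the write happens.
sub[0] = 'x'
puts sub[0]       # prints x
puts str[100]     # prints 0
```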

Is the reason perhaps the use of C library
string functions that work on zero-terminated strings?

Only matz knows this.

Guy Decoux

Robert Klemme · May 5, 2008

ts wrote:

I don't understand: it does have a field that gives the length of
the string. For example, with

Ah, ok. This happens when one is too lazy to look into the source.

Somehow I had assumed that the length was not stored because you made
the point that the \0 could not be inserted without altering the
original. I concluded that there was no length field.

str = '0' * 200
str[100 .. -1]

the first object (in str) will have 200 as its length, and
the length field in the new object will have the value 100.

Is the reason perhaps the use of C library
string functions that work on zero-terminated strings?

Only matz knows this.

Well, maybe he'll stop by and enlighten us.

Kind regards

robert

Similar Threads

Lexical Analysis on C++	1	Oct 31, 2023
Python code problem	2	Apr 23, 2023
Failed to write unicode string to excel	1	Sep 3, 2010
Ruby 1.9 bug, no copy-on-write for MatchData#post_match	0	Feb 6, 2008
Need help with this script	4	Mar 12, 2023
Working on mobile css menu with plenty of frustration!	2	Dec 29, 2022
Snowing Effect	2	Apr 24, 2023
Help with code	0	Jun 12, 2022
