String copy-on-write question

  • Thread starter Lars Christensen

Lars Christensen

Hello group,

Ruby implements copy-on-write for strings, so you can do stuff like
this very cheaply:

str = 0.chr * (2**24) # 16MiB allocated
str[100..-1] # this costs only a small amount of memory

How come this optimization does not apply in this case?

str[100..-2] # this costs around 16MiB of memory
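On a present-day CRuby, one rough way to look at the difference described above is `ObjectSpace.memsize_of` from the `objspace` extension, which reports the memory attributed to a single object. This is a sketch only: the numbers depend on the Ruby version and build, and may not match what the interpreter discussed in this thread did.

```ruby
require 'objspace'

str   = 0.chr * (2**24)   # 16 MiB of NUL bytes
tail  = str[100..-1]      # slice that runs to the end of str
inner = str[100..-2]      # slice that stops one byte short of the end

# A slice that shares str's buffer usually reports little more than its
# object slot; a copied slice reports roughly its own buffer size.
# Exact figures vary between Ruby versions (an assumption, not a guarantee).
puts "tail:  #{ObjectSpace.memsize_of(tail)} bytes"
puts "inner: #{ObjectSpace.memsize_of(inner)} bytes"
```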

As a side effect, if using regexps on a large string, the pre-match
and post-match variables behave differently:

s = 0.chr * (2**23) + "Hello" + 0.chr * (2**23) # About 16MiB allocated (after GC)
s.scan(/Hello/) { |m| p m } # This is free
p $'.size # This is free
p $`.size # This costs another 8MiB.

Any insights?

Lars
 

ts

Lars Christensen wrote:

Well, it's best if you look at rb_str_substr() in string.c
str[100..-1] # this costs only a small amount of memory

Ruby just needs to adjust the pointer and the length in the new
object.
str[100..-2] # this costs around 16MiB of memory

One character is missing compared with the previous string. If Ruby did
the same thing as before, it would have to
* adjust the pointer
* adjust the length
* add \0 at the end

Writing that \0 would modify the original string's buffer, which is why
it makes a copy instead.
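The rule described above can be modelled in a few lines of Ruby. This is a hypothetical toy, not CRuby's actual `rb_str_substr()`: `ToyString` and its methods are invented for illustration, the buffer always carries a trailing `"\0"` like a C string, and there is no bounds checking. A slice that ends where its parent ends can reuse the existing terminator and share the buffer; any other slice gets a fresh copy.

```ruby
# Hypothetical toy model of substring sharing; not CRuby's real code.
class ToyString
  attr_reader :buffer, :offset, :length

  # Build a string whose buffer holds the bytes plus a terminating "\0".
  def self.from(bytes)
    new(bytes.b + "\0", 0, bytes.bytesize)
  end

  def initialize(buffer, offset, length)
    @buffer = buffer   # possibly shared backing store
    @offset = offset   # "pointer": where this string starts in buffer
    @length = length   # explicit length field
  end

  # Substring of len bytes starting at beg (no bounds checking).
  def substr(beg, len)
    if beg + len == @length
      # Ends where this string ends, so the "\0" is already in place:
      # share the buffer and only adjust pointer and length.
      ToyString.new(@buffer, @offset + beg, len)
    else
      # Ends early: writing a "\0" into the shared buffer would corrupt
      # this string, so copy the bytes into a new terminated buffer.
      ToyString.new(@buffer.byteslice(@offset + beg, len) + "\0", 0, len)
    end
  end

  def shares_buffer_with?(other)
    @buffer.equal?(other.buffer)
  end

  # True if the byte just past this string's content is "\0".
  def terminated?
    @buffer.getbyte(@offset + @length) == 0
  end

  def to_s
    @buffer.byteslice(@offset, @length)
  end
end

s     = ToyString.from("0" * 200)
tail  = s.substr(100, 100)   # like str[100..-1]
inner = s.substr(100, 99)    # like str[100..-2]
p tail.shares_buffer_with?(s)    # => true
p inner.shares_buffer_with?(s)   # => false
```

Every string produced this way keeps a `"\0"` right after its content, which is the invariant that forces the copy in the second case.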
p $'.size # This is free
p $`.size # This costs another 8MiB.

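Same reason here: $' runs to the end of the original string, so it can
share the buffer, while $` stops where the match begins and would need
its own \0.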
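The same reasoning can be checked with `MatchData`: `$`` is the prefix `s[0, m.begin(0)]` and `$'` is the suffix `s[m.end(0)..-1]`, so only the post-match ends where `s` ends. This small sketch uses a short stand-in for the 16MiB string and does not measure memory; it only shows which slice of `s` each variable is.

```ruby
s = "\0" * 16 + "Hello" + "\0" * 16   # small stand-in for the large string
m = s.match(/Hello/)                   # also sets $~, $` and $'

pre  = m.pre_match    # same content as $`
post = m.post_match   # same content as $'

# Index just past the last byte of each slice.
pre_end  = m.begin(0)               # $` stops where the match starts
post_end = m.end(0) + post.length   # $' runs to the end of s

puts "pre-match:  #{pre.length} chars, reaches end of s: #{pre_end == s.length}"
puts "post-match: #{post.length} chars, reaches end of s: #{post_end == s.length}"
```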


Guy Decoux
 

Robert Klemme


Interesting. Do you also happen to know why an additional field that
stores the length is not used? Is the reason perhaps the use of C library
string functions that work on zero-terminated strings?

Cheers

robert
 

ts

Robert said:
Interesting. Do you also happen to know why an additional field that
stores the length is not used?

I don't understand: it does have a field that gives the length of
the string. For example, with

str = '0' * 200
str[100 .. -1]

the first object (in str) will have 200 as its length, and
the length field in the new object will have the value 100.
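That the length is stored explicitly can be seen from Ruby itself: embedded NUL bytes count toward `length`, which would not work if the size were found by scanning for a terminator the way C's `strlen()` does. This shows only the observable behavior; it does not explain why the buffer is also NUL-terminated.

```ruby
str = '0' * 200
sub = str[100..-1]
puts str.length   # 200
puts sub.length   # 100

# NUL bytes are ordinary content: the length comes from the stored
# length field, not from searching for a "\0" terminator.
nul = "\0" * 200
puts nul[100..-1].length   # 100
puts "ab\0cd".length       # 5
```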
Is the reason perhaps the use of C library
string functions that work on zero-terminated strings?

Only matz knows this :)


Guy Decoux
 

Robert Klemme

I don't understand: it does have a field that gives the length of
the string. For example, with

Ah, ok. This happens when one is too lazy to look into the source. :)
Somehow I had assumed that the length was not stored because you made
the point that the \0 could not be inserted without altering the
original. I concluded that there was no length. :)

Only matz knows this :)

Well, maybe he'll stop by and enlighten us.

Kind regards

robert
 
