[QUIZ] Twisting a Rope (#137)

E

Eric Mahurin

Eric, may I ask you to test one more thing: your #slice does not have any
input checks. To make a fair comparison, could you please comment first 3
lines of my #slice,

Sure. I also did the same to Gustav Munkby's code. Here are the new results:

CPU(user+sys,sec) mem(peak,MB)
------------------- ------------
build sort total build sort class
----- ---- ----- ----- ---- -----
0.10 1.70 1.80 287 1327 String
0.01 0.27 0.28 22 153 Porth::Rope
0.02 0.83 0.85 22 34 Choudhury::Rope (failed)
0.02 0.06 0.08 22 29 Mahurin::StringRope (immutable)
0.00 0.08 0.08 22 30 Mahurin::DenormalStringRope (immutable)
0.02 0.14 0.16 22 29 Mahurin::MutableStringRope
0.07 0.10 0.17 151 151 Munkby::ArrayRope
0.00 0.09 0.09 22 22 Munkby::ArrayRope (no dup)
0.00 0.07 0.07 22 22 Munkby::ArrayRope (no dup, simple slice)
0.01 0.12 0.13 22 22 Kalenkovich::Rope
0.01 0.07 0.08 22 22 Kalenkovich::Rope (simple slice)


Munkby::ArrayRope is marginally the fastest. Upon closer inspection
of the code, there is a pretty significant problem - concatenation is
O(n) where n is the number of segments in the right rope of the
concatenation. A normal rope has O(1) concatenations and a
self-balancing one (Mahurin::StringRope) has O(log(n)). The reason
this isn't a problem in this benchmark is that qsort iterates over the
segments (n>1) for the right rope of a concat right before the concat.
So the big-O performance is the same for the qsort - O(n*log(n)).
Munkby::ArrayRope may not look so good on other benchmarks.

I'm surprised that the dup version of Munkby::ArrayRope takes so much
memory. I don't see anything that the benchmark uses that would
modify the strings, so it looks like the COW in string.c doesn't work.
I added dup on the input strings to Mahurin::StringRope, and it made
no difference in run-time or memory.
 
C

Carl Porth

Ok, here's my last submission (I hope). I modified slice to return a
Rope for a speed increase and cleaned up a bit.

I never explained my design: I just have the Rope instances represent
string concatenations such that only leaf nodes are strings. This
way, I can just recurse left and right without doing any type
checking, resulting in fairly clean code.

Oh, and I cheated to get full String functionality (but not completely
tested).

http://pastie.caboo.se/94118

Eric, can I get a redo on those benchmarks? (I've always depended on
the kindness of strangers.) Thanks!

Carl
 
E

Eric Mahurin

http://pastie.caboo.se/94118

Eric, can I get a redo on those benchmarks? (I've always depended on
the kindness of strangers.) Thanks!

Here is the latest:

CPU(user+sys,sec) mem(peak,MB)
------------------- ------------
build sort total build sort class
----- ---- ----- ----- ---- -----
0.10 1.70 1.80 287 1327 String
0.00 0.27 0.27 22 153 Porth::Rope (no dup)
0.00 0.19 0.19 22 22 Porth::Rope (no dup)
0.02 0.83 0.85 22 34 Choudhury::Rope (failed)
0.02 0.06 0.08 22 29 Mahurin::StringRope (immutable)
0.00 0.08 0.08 22 30 Mahurin::DenormalStringRope (immutable)
0.02 0.14 0.16 22 29 Mahurin::MutableStringRope (uses immutables)
0.07 0.10 0.17 151 151 Munkby::ArrayRope
0.00 0.09 0.09 22 22 Munkby::ArrayRope (no dup)
0.00 0.07 0.07 22 22 Munkby::ArrayRope (no dup, simple slice)
0.01 0.12 0.13 22 22 Kalenkovich::Rope (no dup)
0.01 0.07 0.08 22 22 Kalenkovich::Rope (no dup, simple slice)

With the exception of Munkby::ArrayRope (which has an O(n)
concatenation problem), I don't trust any of the mutable "no dup"
implementations. This will be a problem:

rope1 = ...
rope2 = ...
rope3 = ...
rope4 = RopeClass.new(rope1, rope2)
rope1 << rope2
rope2 << rope3

Without dups of the inputs (in initialize and #<<), rope4 and rope2
will be screwed up.

I only deal with immutable ropes, so dupping the ropes isn't
necessary. The only concern you might have would be with the leaf
strings. I added dups at that level and it didn't affect run-time or
memory. You could also just make a requirement - incoming strings
should not be modified (externally dup if necessary).

I added dups to Kalenkovich's code and saw the same 150+MB usage as
Munkby's dup code. I still don't understand this.
 
E

Eugene Kalenkovich

Eric Mahurin said:
I added dups to Kalenkovich's code and saw the same 150+MB usage as
Munkby's dup code. I still don't understand this.

I'm not sure about Gustav's code, but for my one memory in duping option is
eaten by normalization, that do not really need any duping at all.

--EK
 
E

Eugene Kalenkovich

Eugene Kalenkovich said:
I'm not sure about Gustav's code, but for my one memory in duping option
is eaten by normalization, that do not really need any duping at all.

--EK

I've modified my code to avoid duping in normalization - at a cost of
ugliness :) (too ugly for quiz submission, and anyway, too late :)
You can see that memory is fine now.


class NilClass
def length; 0; end
end

class String
def shift
return nil if empty?
res=self[0]
self[0]=""
res
end
end

class Rope
attr_reader :length, :left, :right

def initialize(left=nil,right=nil,dup=true)
@left = left ? left.kind_of?(Rope)&&dup ? left.dup : left : nil
@right = right ? right.kind_of?(Rope)&&dup ? right.dup : right : nil
@length=left.length+right.length
end

def append(what, dup=true)
len=what.length
if (len>0)
@left=self.dup if @right.length>0
@right=what.kind_of?(Rope)&&dup ? what.dup : what
@length+=len
end
self
end

alias << append

def prepend(what, dup=true)
len=what.length
if (len>0)
@right=self.dup if @left.length>0
@left=what.kind_of?(Rope)&&dup ? what.dup : what
@length+=len
end
self
end

def to_s
@left.to_s + @right.to_s
end

def [](i)
return i.match(self.to_s)[0] if i.kind_of? Regexp
if i.kind_of? Range
pos,last=i.first,i.last
pos = @length+pos if pos<0
last = @length+last if last<0
return nil if pos<0 || last<0
return slice(pos,last-pos+1)
end
i = @length+i if i<0
return nil if i<0 || i>@length-1
llen = @left.length
i<llen ? @left : @right[i-llen]
end

def []=(i,val)
#fixnum only
i = @length+i if i<0
""=0 if i<0 || i> @length-1
@length+=val.length-1
llen = @left.length
i<llen ? @left=val : @right[i-llen]=val
end

def slice(pos,len)
return pos.match(self.to_s)[len] if pos.kind_of? Regexp
pos = @length+pos if pos<0
return nil if pos<0 || len<0 || pos>@length-1
llen = @left.length
return @left.slice(pos,len) if pos+len<=llen || ! @right
return @right.slice(pos-llen, len) if pos>=llen
Rope.new(@left.slice(pos,llen-pos),@right.slice(0,len+pos-llen))
end

def shift
return nil if @length==0
@length-=1
res = @left.length>0 ? @left.shift : @right.shift
@left=nil if @left.length==0
@right=nil if @right.length==0
res
end

def normalize
r=Rebalancer.new(@length)
self.traverse { |str| r.append(str) }
@left, @right=r.get_ropes
self
end

def traverse(&blck)
@left.kind_of?(String) ? yield( @left) : @left.traverse(&blck) if @left
@right.kind_of?(String) ? yield( @right) : @right.traverse(&blck) if
@right
end

end

class Rebalancer
def initialize len
@limits=[1,2]
@slots=[]
n=2
@limits<< n = @limits[-2] + @limits[-1] while n<len
end

def append str
@slots[0] = @slots[0] ? Rope.new( @slots[0],str, false) : str
i=0
while @slots.length>@limits
@slots[i+1] = @slots[i+1] ? Rope.new( @slots[i+1],@slots,false) :
@slots
@slots = nil
i+=1
end
end

def get_ropes
@slots.compact!
(@slots.length-1).times { |i|
@slots[i+1]=@slots[i+1] ? Rope.new(@slots[i+1],@slots,false) :
@slots
@slots=nil
i+=1
}
[@slots[-1].left,@slots[-1].right]
end
end
 
G

Gustav Munkby

Munkby::ArrayRope is marginally the fastest. Upon closer inspection
of the code, there is a pretty significant problem - concatenation is
O(n) where n is the number of segments in the right rope of the
concatenation. A normal rope has O(1) concatenations and a
self-balancing one (Mahurin::StringRope) has O(log(n)). The reason
this isn't a problem in this benchmark is that qsort iterates over the
segments (n>1) for the right rope of a concat right before the concat.
So the big-O performance is the same for the qsort - O(n*log(n)).
Munkby::ArrayRope may not look so good on other benchmarks.

Whether to use a binary tree or an array is a classical decision, and one
of the reasons that I implemented my rope using an array was because
I suspected that it would not matter much for the benchmark.
I'm surprised that the dup version of Munkby::ArrayRope takes so much
memory. I don't see anything that the benchmark uses that would
modify the strings, so it looks like the COW in string.c doesn't work.

The perhaps surprising, but oh so obvious solution here is that dup doesn't
(always) do sharing. This was somewhat explained here:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/74742

I've found out, that one can replace 's.dup' with 's[0..-1]' to enforce sharing.
I've tried this and memory usage went down, without sacrificing the 'safety'.
So apparently, the rule is; use dup if you want to modify the duplicate, or
s[0..-1] if you want to protect the duplicate from changes to the original. =)
I added dup on the input strings to Mahurin::StringRope, and it made
no difference in run-time or memory.

This is easily explained by the fact that Mahurin::StringRope treats strings
and ropes differently, and thereby only dups on first insertion, whereas my
solution dups all the time.

!g
 
A

Ari Brown

Awesome quiz!

Sorry, I have school, but am veeeeeery close to getting this done
with! Need to debug my slice method.

If you all want to help me fix it **hint hint**, make it faster, send
hate mail, etc. the code is here:
http://pastie.caboo.se/94560

BTW, some of my code (just little lines and ideas) were.....
borrowed..... from Gustav Munkby. Sorry!

I swear it's coming in today!
Ari
--------------------------------------------|
If you're not living on the edge,
then you're just wasting space.
 
A

Ari Brown

--Apple-Mail-1-357442536
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=US-ASCII;
delsp=yes;
format=flowed

Nyah!!!!

Here it is. I tried to keep this sort of normalish, and each leaf(?)
has a base limit of 15 characters. This makes searching for a
specific character at a lengthy location pretty quick (i hope),
however, my slice method uses a very unorthodox method, and I am
BEGGING for help in making it better. I want to start using my Rope
implementation regularly (for kicks), but am also hoping it can be
fast. I envy Mahurin's speed.


--Apple-Mail-1-357442536
Content-Transfer-Encoding: 7bit
Content-Type: text/x-ruby-script;
x-unix-mode=0644;
name=rope.rb
Content-Disposition: attachment;
filename=rope.rb

#!/usr/local/bin/ruby
#
# Created by me on 2007-09-05.
# Copyright (c) 2007. All pwnage reserved.

class String
alias_method:)old_plus, :+)
def +(other)
self + other.to_s if other.is_a? Rope
self.old_plus(other)
end

def to_rope
Rope.new(self)
end
alias_method:)to_r, :to_rope)
end

class Rope
attr_reader :strands
attr_reader :length
attr_reader :depth
attr_reader :segments

def initialize(strand, chunk_size=15)
CHUNK_SIZE = chunk_size
raise "What are you doing?!" if strands.nil?
strand = strand.to_s if strands.is_a? Integer
@strands = check_size(strand)
@depth = 1
@segments = 0
calc_info
end

def update(&b)
instance_eval(b)
calc_info
conform
end

def calc_info
@strands.each { |s| @length += s.length }
@strands.each { |s| @depth += s.depth if s.is_a? Rope}
@strands.each { |s| @segments += 1}
end

def check_size(strand)
return if strand.nil?
if strand.size > CHUNK_SIZE
start, finish = 0, (-CHUNK_SIZE + 1)
strands = []
until strand[start] == nil
strands << strand.slice(start, CHUNK_SIZE)
start = finish
finish += CHUNK_SIZE
end
puts strands
return strands
end
return [strand]
end

def conform
@strands = check_size(to_s)
end

def append(*strings)
update do
raise "What are you doing?!" if strings.nil?
strings.each { |s| @strands << s}
end
end

alias_method:)<<, :append)

def prepend(*strings)
update do
old_strands = @strands
strings.each { |s|
s = s.to_s if s.is_a? Integer
@strands = s
}
@strands << old_strands
end
end

def flatten
old_strands, @strands = @strands, []
update do
@strands.each do |s|
if s.is_a? Rope
s.strands.map { |s| @strands << s }
elsif not s.nil?
@strands << s
end
end
end
self
end

def +(other)
update do
if other.is_a? Rope
@strands += other.strands
elsif not other.nil?
@strands += other.to_s
end
end
end

alias_method:)add, :+)
alias_method:)push, :+)

def to_s
case @strands
when 0: ""
else
return (@strands * "")
end
end

def slice(here, length=nil)
begin
return nil if here < 0 || here > self.length
return to_s.slice(start, length||0) if here.is_a? Regexp
location = here%CHUNK_SIZE
car = here/CHUNK_SIZE
left = start
ret = ""
if here < 0 || here > self.length
nil
elsif !length
@segments.detect { |s| (start -= s.length) < 0 }.slice(start)
elsif true
while @strands[car]
thing = @strands[car][0..-1]
thing.each_byte { |l|
ret << l.chr
left -= 1
return ret if left == 0
}
car += 1 if left != 0
end
end
rescue
to_s.slice(here, length)
end
end

def [](here, length)
if !length && start.is_a? Range
length = here.last - here.first - (exclude_end? ? 1 : 0)
here = here.first
end
self.slice(here, length)
self.slice(here, length)
end

def at(here)
location = here%CHUNK_SIZE
car = here/CHUNK_SIZE
train_car = @strands[car]
return train_car[location].chr
end

def method_missing(m, *args, &block)
if "".responds_to?(m) && m =~ /^string_(.+)/
ret = to_s.__send__($1, *args, &block)
if ret.is_a? String
ret = Rope.new(ret)
end
end
ret
end
end
--Apple-Mail-1-357442536
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=US-ASCII;
delsp=yes;
format=flowed


---------------------------------------------------------------|
~Ari
"I don't suffer from insanity. I enjoy every minute of it" --1337est
man alive




--Apple-Mail-1-357442536--
 
E

Eric Mahurin

I've modified my code to avoid duping in normalization - at a cost of
ugliness :) (too ugly for quiz submission, and anyway, too late :)

Here is another issue that is akin to the self-assignment problem in C++:

r1 = Rope.new("A")
r2 = Rope.new("B")
r1<<r2
puts r1
r1<<r1
puts r1

When you are modifying self and an operand for that modification can
be self, you have to be careful about the order in which you do
things. I imagine some of the other implementations have this
problem.

BTW, I sped up my code by another 20% by simply using
attr_reader:)left,:right,:length,:depth) instead of defining the
methods explicitly. I always thought these attr* things were just a
convenient way to define getter and setter methods. Now, I'll be
using them more since there is a significant run-time benefit.
 
E

Eugene Kalenkovich

Eric Mahurin said:
r1 = Rope.new("A")
r2 = Rope.new("B")
r1<<r2
puts r1
r1<<r1
puts r1

When you are modifying self and an operand for that modification can
be self, you have to be careful about the order in which you do
things. I imagine some of the other implementations have this
problem.

Good catch, will need to fix it.
As a side note, I do not think that having ropes immutable is an acceptable
requirement.
Ultimate goal is to have a Rope behaving exactly as a String...

--EK
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,902
Latest member
Elena68X5

Latest Threads

Top