[SUMMARY] Twisting a Rope (#137)

Discussion in 'Ruby' started by Ruby Quiz, Sep 6, 2007.

  1. Ruby Quiz

    Ruby Quiz Guest

    The discussion for this quiz was very interesting. It's probably worth your
    time to go back and read those messages, if you haven't already. Some points I
    got out of the discussion were:

    * It's tricky to get ropes right. Clever implementations may loose key
    rope qualities, like the O(1) concatenation time.
    * The functional approach, building immutable ropes, is probably
    superior considering how ropes are intended to be used.
    * Autorebalancing didn't hurt much, at least in the quiz test case.
    * Copy-On-Write is helpful, when it works.

    Many of these points came out due to Eric Mahurin's work in benchmarking the
    submitted solutions. Eric also submitted several variations of rope classes, a
    few of which I want to take a look at below. Let's begin with the class Eric
    uses to build the rope trees:

    module Mahurin
    class Rope
    include Enumerable
    # form a binary tree from two ropes (possibly sub-trees)
    def initialize(left,right)
    @left = left
    @right = right
    @llength = @left.length
    @length = @
    @depth = [left.depth, right.depth].max+1
    end
    # number of elements in this rope
    def length
    @length
    end
    # depth of the tree (to help keep the tree balanced)
    def depth
    @depth
    end
    # left rope (not needed when depth==0)
    def left
    @left
    end
    # right rope (not needed when depth==0)
    def right
    @right
    end

    # ...

    Here we see the setup and relevant attributes of Rope objects. First we have
    the fact that they are binary trees, with left and right subtrees. Next we see
    that Eric is going to track two lengths for Rope objects, both the total length
    and the length of just the left subtree. The reasoning for that will become
    apparent when we examine indexing. Finally, Eric tracks a depth, for use in
    rebalancing.

    There are really two major operations that are key to a Rope implementation:
    concatenation and indexing. Here's the concatenation side of the puzzle:

    # ...

    # appended rope (non-modifying)
    def +(other)
    # balance as an AVL tree
    balance = other.depth-@depth
    if balance>+1
    left = other.left
    right = other.right
    if left.depth>right.depth
    # rotate other to right before rotating self+other to left
    (self + left.left) + (left.right + right)
    else
    # rotate self+other to left
    (self + left) + right
    end
    elsif balance<-1
    if @right.depth>@left.depth
    # rotate self to left before rotating self+other to right
    (@left + @right.left) + (@right.right + other)
    else
    # rotate self+other to right
    @left + (@right + other)
    end
    else
    self.class.new(self, other)
    end
    end
    alias_method:)<<, :+)

    # ...

    This method is only this long because it automatically rebalances the tree as
    needed. In fact, if you glance down to the final else clause, you will see the
    trivial implementation, which is just to construct a new Rope from the current
    Rope and the concatenated element.

    The rebalancing done here is, as the comment suggests, a textbook AVL
    implementation. With an AVL tree, you subtract the left depth from the right
    depth to get a tree's balance factor. Anything in the range of -1 to 1 is a
    balanced tree. If the factor is outside of that range, one or two rotations are
    required to rebalance the tree.

    I'm not going to go into the specific rotations. If you would like to read up
    on them, I recommend:

    http://fortheloot.com/public/AVLTreeTutorial.rtf

    Let's move on to the indexing methods used to pull information back out of a
    Rope:

    # ...

    # slice of the rope
    def slice(start, len)
    return self if start.zero? and len==@length
    rstart = start-@llength
    return @right.slice(rstart, len) if rstart>=0
    llen = @llength-start
    rlen = len-llen
    if rlen>0
    @left.slice(start, llen) + @right.slice(0, rlen)
    else
    @left.slice(start, len)
    end
    end
    # element at a certain position in the rope
    def at(index)
    rindex = index-@llength
    if rindex<0
    @left.at(index)
    else
    @right.at(rindex)
    end
    end

    # ...

    Have a look at the at() method first, because it's easier to digest and it shows
    the concept of indexing this tree structure. Essentially, to index in the tree
    you check a position against the left length. If it's less than, index the left
    side. If it's greater than, index the right. This search tactic is the
    hallmark attribute of binary trees.

    The slice() method works the same way. It's just more complicated because it
    has to work with two indices instead of one. Finding the start index is the
    same strategy we saw in at(). If that index is in the right subtree, the end
    will be as well and the code makes the recursive hand-off without further
    checks. When it's in the left, the end must be located. If that end point
    turns out to also be in the left, the hand-off is made to the left side. When
    it is in the right, a partial slice() is made from both halves and combined.
    This covers all the cases.

    Eric added a couple more methods to the Rope class that cover iteration and
    converting to a String:

    # ...

    # iterate through the elements in the rope
    def each(&block)
    @left.each(&block)
    @right.each(&block)
    end
    # flatten the rope into a string (optionally starting with a prefix)
    def to_s(s="")
    @right.to_s(@left.to_s(s))
    end
    end

    # ...

    That covers the tree implementation. What we haven't seen yet though, are the
    leaf nodes. We need two types for the implementation I want to examine,
    EmptyRope and StringRope. Here's the first of those:

    # ...

    EmptyRope = Object.new
    class << EmptyRope
    include Enumerable
    def length
    0
    end
    def depth
    0
    end
    def +(other)
    other
    end
    alias_method:)<<, :+)
    def slice(start, len)
    self
    end
    def each
    end
    def to_s
    ""
    end
    end

    # ...

    This implementation is kind of a poor man's singleton instance (the design
    pattern, not the Ruby concept, though we see both here). There shouldn't be any
    surprises in this code. The attributes are zeroed, concatenation results in
    whatever the concatenated element is, and slice()ing just returns self which
    doubles as an empty String.

    That leaves just one last leaf, StringRope:

    # ...

    class StringRope
    include Enumerable
    def self.new(*args)
    if args.empty?
    EmptyRope
    else
    super
    end
    end
    def initialize(data)
    @data = data
    end
    def length
    @data.length
    end
    def depth
    0
    end

    # ...

    This class just wraps a String to support the Rope API we've been examining.
    About the only interesting trick here is the is that the default new() for this
    class is overridden to allow returning an EmptyRope as needed. Anytime the
    argument is provided though, this method does hand-off to the default new()
    implementation.

    Here's the concatenation method:

    # ...

    def +(other)
    balance = other.depth
    if balance>1
    left = other.left
    right = other.right
    if left.depth>right.depth
    # rotate other to right before rotating self+other to left
    (self + left.left) + (left.right + right)
    else
    # rotate self+other to left
    (self + left) + right
    end
    else
    Rope.new(self, other)
    end
    end
    alias_method:)<<, :+)

    # ...

    We see some more balance work here, but the algorithm is simplified since only
    the right side can be a subtree. Beyond that, we've seen this code before.

    Here are the final few methods:

    # ..

    def slice(start, len)
    return self if start.zero? and len==@data.length
    # depend on ruby's COW mechanism to just reference the slice data
    self.class.new(@data.slice(start, len))
    end
    def at(index)
    @data[index]
    end
    def each(&block)
    @data.each_char(&block)
    end
    def to_s(s="")
    s.concat(@data.to_s)
    end
    end
    end

    Note here that slice() was overridden to return a StringRope instead of a
    String. As the comment says, Ruby internally uses some Copy On Write semantics
    to reference sliced Strings. This should keep it from wildly duplicating the
    data, but it was found that a couple of solutions had problems with this for
    unknown reasons.

    That covers a basic Rope implementation. We won't bother to go into the
    destructive methods as you are probably better off working without them.

    My thanks to all who explored this newly popular data structure with us. It
    looks like there will be a talk on ropes at this year's Rubyconf, so hopefully
    we gave the speaker some extra material to work with.

    This week's Ruby Quiz will start a day early, to adjust for my Lone Star
    Rubyconf schedule this weekend. The no-spoiler period is still 48 hours and I
    will still summarize it at the same time, you just get an extra day to work on
    it. Stay tuned for that problem in just a few minutes...
    Ruby Quiz, Sep 6, 2007
    #1
    1. Advertising

  2. Ruby Quiz

    Eric Mahurin Guest

    On 9/6/07, Ruby Quiz <> wrote:
    > The discussion for this quiz was very interesting. It's probably worth your
    > time to go back and read those messages, if you haven't already. Some points I
    > got out of the discussion were:
    >
    > * It's tricky to get ropes right. Clever implementations may loose key
    > rope qualities, like the O(1) concatenation time.
    > * The functional approach, building immutable ropes, is probably
    > superior considering how ropes are intended to be used.
    > * Autorebalancing didn't hurt much, at least in the quiz test case.
    > * Copy-On-Write is helpful, when it works.
    >
    > Many of these points came out due to Eric Mahurin's work in benchmarking the
    > submitted solutions. Eric also submitted several variations of rope classes, a
    > few of which I want to take a look at below.


    I'm glad to see that you didn't let requests of the quiz get in the
    way. There were a bunch of things that I didn't do that the quiz
    asked for (in the classes you presented):

    * be able to append/prepend/shift mutable ropes (made immutable ropes instead)
    * allow ropes to operate directly with strings (required a separate
    StringRope proxy class instead)
    * provide a normalize method (used auto-balancing concatenation instead)
    * O(1) concatenation - the max and average number of rotations for my
    auto-balancing concatenation is O(log(n)). Slicing is also O(log(n)),
    so it isn't a big problem.

    Eric
    Eric Mahurin, Sep 6, 2007
    #2
    1. Advertising

  3. Ruby Quiz

    John Miller Guest

    Re: Twisting a Rope (#137)

    Hi All,

    I'm very sorry that this quiz ended up coming out over the labor day
    weekend and the start of my semester, I hardly had a chance to look at
    it all week, but I'm excited about what I've read. I wanted to add a
    couple of comments based on trying to use these thing in practice. The
    example was a good test case because it showed a large difference
    between strings and ropes. Real situations are not quite as neat and
    clean. In particular concatenating single characters to a string as it
    is build and removing them from the front as it is used are very common
    operations that the test case didn't measure. No one really addressed
    these two operations because there was not a test case to show them.

    James Gray wrote:
    > The rebalancing done here is, as the comment suggests, a textbook AVL
    > implementation. With an AVL tree, you subtract the left depth from the
    > right
    > depth to get a tree's balance factor. Anything in the range of -1 to 1
    > is a
    > balanced tree. If the factor is outside of that range, one or two
    > rotations are
    > required to rebalance the tree.


    This was the other interesting part of the quiz. Every answer I saw
    balanced the tree giving each leaf the same weight. The leaves on the
    other hand varied in length between 8 characters and 512k. The
    suggested way to balance a Rope was based on the length of the leaves in
    such a way that longer leaves were nearer the root because presumably
    they will be accessed more often.

    This is to take nothing away from the solutions that came out. They
    were very well written and I am most humbled by them. More this is an
    address to the comment about "Where is the Gem?" If and when Ropes do
    become a general use library, these are things that I believe need to be
    addressed. I also am looking forward to hearing about the presentation
    at RubyConf.

    Thanks to everybody who worked on this. I wish I had more time to join
    in.

    John Miller
    --
    Posted via http://www.ruby-forum.com/.
    John Miller, Sep 6, 2007
    #3
  4. Re: Twisting a Rope (#137)

    "John Miller" <> wrote in message
    news:...
    > Hi All,
    >
    > This was the other interesting part of the quiz. Every answer I saw
    > balanced the tree giving each leaf the same weight. The leaves on the
    > other hand varied in length between 8 characters and 512k. The
    > suggested way to balance a Rope was based on the length of the leaves in
    > such a way that longer leaves were nearer the root because presumably
    > they will be accessed more often.
    >


    At least two implementations did this (James Koppel's and mine). OTOH this
    added a lot of mess to the code, and a simple change on e.g. Eric's
    implementation (to use length instead of depth) will achieve practically the
    same.

    --EK
    Eugene Kalenkovich, Sep 7, 2007
    #4
  5. Ruby Quiz

    Eric Mahurin Guest

    Re: Twisting a Rope (#137)

    On 9/7/07, Eugene Kalenkovich <> wrote:
    > "John Miller" <> wrote in message
    > news:...
    > > Hi All,
    > >
    > > This was the other interesting part of the quiz. Every answer I saw
    > > balanced the tree giving each leaf the same weight. The leaves on the
    > > other hand varied in length between 8 characters and 512k. The
    > > suggested way to balance a Rope was based on the length of the leaves in
    > > such a way that longer leaves were nearer the root because presumably
    > > they will be accessed more often.
    > >

    >
    > At least two implementations did this (James Koppel's and mine). OTOH this
    > added a lot of mess to the code, and a simple change on e.g. Eric's
    > implementation (to use length instead of depth) will achieve practically the
    > same.
    >
    > --EK


    I'd think you'd treat depth as log2(length) and do something similar.
    Instead of this for concatenation:

    balance = other.depth-@depth
    if balance>+1
    ...
    elsif balance<-1
    ...

    You could do something like this (using depth=log2(length) and a little math):

    if other.length>2*@length
    ...
    elsif @length>2*other.length
    ...

    Maybe you'd also need some info about the smallest, largest, leftmost,
    and/or rightmost leaf lengths so that you aren't trying to balance
    something that can't be. I haven't thought this all the way through.

    I think this is a good idea. It increases the max run-time for
    slicing (but still O(log(n)), but hopefully reduces the average.

    On the other hand, I don't know that it is a good assumption that
    you'll be accessing longer leaves more often than shorter leaves.
    This assumes random access of the elements in the rope. For example,
    if you are using a rope for a text editor buffer, there will be a
    you'll only be accessing a small part of the rope you'll be accessing
    while typing stuff in. And at that point, you'll probably be dealing
    with short leaves since things are changing around where you are
    editing. You may want the most recently used closer to the root of
    the tree.
    Eric Mahurin, Sep 7, 2007
    #5
  6. Re: Twisting a Rope (#137)

    "Eric Mahurin" <> wrote in message
    news:...
    > On 9/7/07, Eugene Kalenkovich <> wrote:
    >> "John Miller" <> wrote in message
    >> news:...
    >> > Hi All,
    >> >
    >> > This was the other interesting part of the quiz. Every answer I saw
    >> > balanced the tree giving each leaf the same weight. The leaves on the
    >> > other hand varied in length between 8 characters and 512k. The
    >> > suggested way to balance a Rope was based on the length of the leaves
    >> > in
    >> > such a way that longer leaves were nearer the root because presumably
    >> > they will be accessed more often.
    >> >

    >>
    >> At least two implementations did this (James Koppel's and mine). OTOH
    >> this
    >> added a lot of mess to the code, and a simple change on e.g. Eric's
    >> implementation (to use length instead of depth) will achieve practically
    >> the
    >> same.
    >>
    >> --EK

    >
    > I'd think you'd treat depth as log2(length) and do something similar.
    > Instead of this for concatenation:
    >
    > balance = other.depth-@depth
    > if balance>+1
    > ...
    > elsif balance<-1
    > ...
    >
    > You could do something like this (using depth=log2(length) and a little
    > math):
    >
    > if other.length>2*@length
    > ...
    > elsif @length>2*other.length
    > ...



    In reality all in-time short-cut rebalancings will perform worse than the
    full one. For now I do not see any algorithm better than in original
    article, though it is also not ideal (but good on memory usage)

    >
    > On the other hand, I don't know that it is a good assumption that
    > you'll be accessing longer leaves more often than shorter leaves.
    > This assumes random access of the elements in the rope. For example,
    > if you are using a rope for a text editor buffer, there will be a
    > you'll only be accessing a small part of the rope you'll be accessing
    > while typing stuff in. And at that point, you'll probably be dealing
    > with short leaves since things are changing around where you are
    > editing. You may want the most recently used closer to the root of
    > the tree.
    >


    Everything is about probability of access to particular symbol. You can
    optimize for particular scenario (in this one - you do not want any
    rebalancing at all), but these optimizations will behave miserably in any
    other cases

    --EK
    Eugene Kalenkovich, Sep 7, 2007
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Manish Hatwalne

    Understanding error - Java returned: 137

    Manish Hatwalne, Sep 3, 2004, in forum: Java
    Replies:
    1
    Views:
    7,257
    Thomas Fritsch
    Sep 3, 2004
  2. Rajesh.Rapaka

    8+8 = 137 ??

    Rajesh.Rapaka, Aug 10, 2005, in forum: Java
    Replies:
    14
    Views:
    942
    Chris Uppal
    Aug 15, 2005
  3. =?ISO-8859-1?Q?S=F6ren?=

    sgi rope experiences?

    =?ISO-8859-1?Q?S=F6ren?=, Jul 29, 2003, in forum: C++
    Replies:
    0
    Views:
    572
    =?ISO-8859-1?Q?S=F6ren?=
    Jul 29, 2003
  4. Ruby Quiz

    [QUIZ] Twisting a Rope (#137)

    Ruby Quiz, Aug 31, 2007, in forum: Ruby
    Replies:
    29
    Views:
    424
    Eugene Kalenkovich
    Sep 7, 2007
  5. Braulio Lima

    Getting machine name by port 137

    Braulio Lima, Jan 18, 2011, in forum: Ruby
    Replies:
    0
    Views:
    81
    Braulio Lima
    Jan 18, 2011
Loading...

Share This Page