[SUMMARY] Editing Text (#145)

R

Ruby Quiz

Implementing an efficient data structure for this problem can be tricky. A
couple of the solutions turned out to have a higher complexity than it first
appeared. That doesn't mean they aren't valuable to examine though.

Let's have a look at Holger's code below. It's an implementation of the
quiz-recommended gap buffer data structure. It's a very clean implementation of
the idea that helped me understand what a gap buffer really is. Here's the
constructor:

class GapBuffer
def initialize(data="", i=0)
@data = data
@gap_start = i
@gap_len = 0
@GAP = " "*64
end

# ...

The actual content of our text editing session will be kept in @data. However,
that data may contain some extra material called a "gap," which will begin at
the index @gap_start and be @gap_len characters long. @GAP is just a constant
used to insert new gaps when that last one is exhausted.

How this interacts in usage is well shown by the insert methods:

# ...

def insert_before(ch)
if @gap_len.zero?
@data[@gap_start, 0] = @GAP
@gap_len = @GAP.length
end
@data[@gap_start] = ch
@gap_start += 1
@gap_len -= 1
end

def insert_after(ch)
if @gap_len.zero?
@data[@gap_start, 0] = @GAP
@gap_len = @GAP.length
end
@data[@gap_start+@gap_len-1] = ch
@gap_len -= 1
end

# ...

The idea for both of these methods is simple. To add content, we place it in
either end of the gap, shrinking the space available there. To insert a
character before the caret we must add the character to the beginning of the
gap, bump the gap start index beyond the new character, and decrease our count
of available space in the gap. Adding after the caret just means placing the
character at the end and decreasing the length counter.

The if statement at the beginning of both of these methods watches for the gap
to be exhausted. When we run out of space, a new gap is inserted and our length
counter is reset.

Deleting characters is just the opposite operation:

# ...

def delete_before
return if @gap_start.zero?
@gap_start -= 1
@gap_len += 1
@data[@gap_start]
end

def delete_after
return if @gap_start+@gap_len >= @data.length
@gap_len += 1
@data[@gap_start+@gap_len-1]
end

# ...

To remove something all we really have to do is to manipulate our index and/or
length to put the discarded character back into the gap. Once there, later
operations can overwrite it normally.

Note that the last lines of these methods return the "deleted" character and the
first lines skip the operations when there are no characters beyond the gap.

Now the harder operations of a gap buffer are moving the caret. We'll start
with left() and right(), which are the easier moves:

# ...

def left
return if @gap_start.zero?
@data[@gap_start+@gap_len-1] = @data[@gap_start-1]
@gap_start -= 1
@data[@gap_start]
end

def right
return if @gap_start+@gap_len>[email protected]
@data[@gap_start] = @data[@gap_start + @gap_len]
@gap_start += 1
@data[@gap_start - 1]
end

# ...

Moving the caret to either side means transferring a character from one end of
the buffer to the other. Then we can just expand the gap to include the moved
character's previous location, so future editing operations will swallow in up.
For example, given the following data:

The gap buffer[ ], or our caret, is shown here with brackets.

A move to the left() is gives us:

The gap buffe[r ]r, or our caret, is shown here with brackets.

The moved character is on the right of the buffer while the original is inside.
Characters inside the buffer don't really exist in the content so we don't have
any duplication here.

Now for the hard moves:

# ...

def up
col = column
cursor = @gap_start-col
return if cursor.zero?
cursor_line = @data.rindex(?\n, cursor-2)
cursor_line = 0 if cursor_line.nil?
cursor_line += col+1
if cursor_line > cursor-1
cursor_line = cursor-1
end
left while @gap_start > cursor_line
true
end

def down
col = column
cursor = @data.index(?\n, @gap_start + @gap_len)
return if cursor.nil?
cursor_line = cursor+1+col
cursor = @data.index(?\n, cursor+1)
cursor = @data.length if cursor.nil?
cursor_line = cursor if cursor_line > cursor
right while @gap_start + @gap_len < cursor_line
true
end

# ...

While this looks like a lot of code, the procedure is simple.

The column() helper method, which we will see shortly, figures out where we are
in the current line. For the down() method, that's really all we need to mark
our current location. When moving up() though, we want to reach the line before
ours, or somewhere between one and two newlines back. To make that easier, up()
first backs up by the column count to start the search from the beginning of the
line.

Then we have a terrific use index()/rindex() with the optional minimum index
parameter to find the next newline past the end of the gap buffer. When that
search fails, the caret is moved to the beginning or end of the data which must
be the last line we needed to cross. Having found the desired line, the current
column is added on, and trimmed to match the data if it exceeds the line.

Together, these operations give us a where we are and where we need to get to.
The actual move is performed with repeated calls to left()/right(), one for each
character we need to travel.

Here are a couple of location methods, one of which we just saw in use:

# ...

def position
@gap_start
end

def column
lbreak = @data.rindex(?\n, @gap_start-1)
lbreak.nil? ? @gap_start : (@gap_start-(lbreak+1))
end

# ...

As you can see, position() is just an alias for @gap_start. The column() method
though is just more clever use of the rindex() method. It really works just
like we saw in up(), save that it doesn't need to skip over a newline.

The last operation supported by this solution is to stringify the data:

# ...

def to_s
return @data[0, @gap_start] +
@data[@gap_start+@gap_len, @data.length-@gap_start-@gap_len]
end
end

The actual data is just the concatenation of what's on both sides of the gap.
The code above assembles that with simple indexing.

Holger had hoped the solution was O(n), but unfortunately it turns out to be
O(n**2). Eric explains why this is and provides tips for how to get it to O(n)
in this email:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/277308

While the code does need the mentioned changes to reduce its complexity, I still
felt it was a very clean implementation of the gap buffer and quite easy to
follow. This should make fixing it up a snap.

My thanks to those who did manage to find the time for the quiz. It was a
tricky problem to do well and all of the solutions showed off neat approaches.

Tomorrow we will do some vehicle counting, programmer style...
 
E

Eric Mahurin

Note: parts of this message were removed by the gateway to make it a legal Usenet post.

Implementing an efficient data structure for this problem can be
tricky. A
couple of the solutions turned out to have a higher complexity than it
first
appeared. That doesn't mean they aren't valuable to examine though.


I thought we had a good mix of data structures in the solutions. Anybody
with an interest in data structures should take a look. Here are the data
structures (or features) I saw in the solutions :

* conventional gap buffer
* "double-ended" gap buffer (one using Array and another String)
* deferred gap movement until editing operations
* linked list of lines (strings)
* circular buffer
* linked list of buffers (gap and file)

I thought all of them were very good solutions. Other possibilities were
also mentioned.

Let's have a look at Holger's code below.


...

def insert_before(ch)
if @gap_len.zero?
@data[@gap_start, 0] = @GAP
@gap_len = @GAP.length
end
@data[@gap_start] = ch
@gap_start += 1
@gap_len -= 1
end

def insert_after(ch)
if @gap_len.zero?
@data[@gap_start, 0] = @GAP
@gap_len = @GAP.length
end
@data[@gap_start+@gap_len-1] = ch
@gap_len -= 1
end


...

Holger had hoped the solution was O(n), but unfortunately it turns out to be
O(n**2). Eric explains why this is and provides tips for how to get it to
O(n)
in this email:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/277308

While the code does need the mentioned changes to reduce its complexity, I
still
felt it was a very clean implementation of the gap buffer and quite easy
to
follow. This should make fixing it up a snap.


BTW, the simple way to make Holger's code O(n) is to replace this line (in
both inserts):

@data[@gap_start, 0] = @GAP

with this:

@data[@gap_start , 0] = @data

This makes the gap (garbage data) the same size as the real data. This
amortizes the O(n) gap increase across O(n) operations, so that on average
each insert is still O(1). The gap just needs to be increased to a size
relative to the size of the real data. This is the same way the
reallocating arrays/strings (like ruby's C-level Array/String
implementations) are done so that operations at the end (where the "gap" is)
are amortized O(1).

Eric
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top