[QUIZ] Editing Text (#145)

Discussion in 'Ruby' started by Ruby Quiz, Oct 26, 2007.

  1. Ruby Quiz

    Ruby Quiz Guest

    The three rules of Ruby Quiz:

    1. Please do not post any solutions or spoiler discussion for this quiz until
    48 hours have passed from the time on this message.

    2. Support Ruby Quiz by submitting ideas as often as you can:


    3. Enjoy!

    Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
    on Ruby Talk follow the discussion. Please reply to the original quiz message,
    if you can.


    by Eric Mahurin

    Have you ever wondered how a text buffer might be represented in a text editor
    or word processor? A simple string to represent text buffer isn't efficient
    enough because inserting (i.e. typing) and deleting (backspace) in the middle
    would result in moving all of the text to the end for each operation. A data
    structure that can efficiently insert and delete in the middle is needed.

    The task is to implement data structures for efficiently editing text. One
    common data structure is a gap buffer:


    Other options to consider include: ropes (see quiz #137), linked lists, simple
    strings, or a multi-level combination of data structures (i.e. for lines vs.
    characters in a line). There are many data structures that may work efficiently
    with simple editing operations, but not all of those data structures will work
    well for more complex functionality.

    All of the basic operations occur around a text cursor. The minimal operations
    on/at the cursor should be:

    * Insert a character before or after the cursor.
    * Delete a character before or after the cursor and return the
    deleted character.
    * Move the cursor left or right (crossing line boundaries if
    necessary) and return the character or nil at the beginning
    or end of the buffer.
    * Move the cursor up or down a line and return nil/false only if a
    line boundary could not be crossed. The cursor may be placed in
    the most natural column for the data structure.

    Additional useful operations that you might find in a typical text editor can
    also be added:

    * Get current line and column numbers
    * Copy some amount of text before or after the cursor and return
    this buffer.
    * Display some context around the cursor.
    * Cut some amount of text before or after the cursor and return
    this buffer.
    * Paste a copy/cut buffer before or after the cursor.
    * Insert a file.
    * Write to a file.
    * Goto a specific line or column.
    * Goto the begin/end of the line/buffer.
    * Copy or cut to a specific line/column.
    * Filter some text through a ruby block.
    * Search (and possibly replace) using a regular expression.
    * Undo/redo.

    Major bonus points for the following where gap buffers probably won't work:

    * Only store changes to a file.
    * Handle multiple efficient cursors in a text buffer.

    Although the focus is on data structures and making the ruby DSL equivalent to
    unix "ed" or DOS "edlin", a GUI could be added to make a full-screen/window text

    Here is a benchmark for testing that needs the minimal implementation
    (#insert_before, #insert_after, #delete_before, #delete_after, #left, #right,
    #up, #down):

    # edit_test.rb
    # usage: ruby -r klass.rb edit_test.rb <iter> \
    # [<constructor> [<lines> <columns>] ...]

    require 'benchmark'
    require 'test/unit/assertions'
    include Test::Unit::Assertions

    # char = byte pre 1.9, each_char already defined in 1.9
    unless "".respond_to?:)each_char)
    class String;alias_method:)each_char, :each_byte);end

    iterations = ARGV.shift.to_i

    while cursor = ARGV.shift
    nlines = (ARGV.shift || 10000).to_i
    ncolumns = (ARGV.shift || 100).to_i
    n = nlines*ncolumns
    chars = (?a..?z).to_a
    line = (0...ncolumns).inject("") { |line, i|
    line << chars[i%chars.length]
    line[-1] = ?\n

    iterations.times { eval(cursor).instance_eval {
    "#{cursor}: #{nlines}x#{ncolumns}\n",16,nil,"total"
    ) { |b|

    total = b.report("insert_before") {
    nlines.times { line.each_char { |ch| insert_before(ch) } }
    i = 0
    total += b.report("left") { i += 1 while left }
    assert_equal(n, i)
    i = 0
    total += b.report("right") { i += 1 while right }
    assert_equal(n, i)
    i = 0
    total += b.report("up") { i += 1 while up }
    assert_equal(nlines, i)
    i = 0
    total += b.report("down") { i += 1 while down }
    assert_equal(nlines, i)
    total += b.report("insert_after") {
    nlines.times { line.each_char { |ch| insert_after(ch) } }
    i = 0
    total += b.report("delete_before") {
    i += 1 while delete_before
    assert_equal(n, i)
    i = 0
    total += b.report("delete_after") {
    i += 1 while delete_after
    assert_equal(n, i)


    } }
    Ruby Quiz, Oct 26, 2007
    1. Advertisements

  2. Ruby Quiz

    Todd Burch Guest

    James, thanks for putting all these challenges together, and thanks to
    the contributors too.

    Just a terminology thing - the proper term is "caret", not a "text

    Todd Burch, Oct 26, 2007
    1. Advertisements

  3. James Edward Gray II, Oct 26, 2007
  4. Ruby Quiz

    Todd Burch Guest

    Not out of line in the least. Without resorting to cheesey emoticons
    (wink wink, smiley, grin, et al), mine was a passing comment. I am sure
    no one got confused from your description.

    Again, thanks.
    Todd Burch, Oct 26, 2007
  5. Just to be clear, I didn't write that quiz. Eric Mahurin did.

    James Edward Gray II
    James Edward Gray II, Oct 26, 2007
  6. Actually, I'd say that caret is a UI/View related term (its the name
    of the ^ character, traditionally used in blue-pencil editing to mark
    a place to insert test.

    Cursor is kind of both a view and a model term, and I think we're
    talking about a model in this quiz.

    That said, in general, when I've looked at similar code, the analogous
    concept usually used is a selection, which represents a contiguous run
    of zero or more characters within the buffer. The quiz seems to be
    using a degenerate version of this where the length is constrained to
    be zero.
    Rick DeNatale, Oct 26, 2007
  7. When the Macintosh first came out, the little blinking vertical bar in
    text editors was called the insertion point or caret, but, because of
    its shape, it was also called the "stick".

    It doesn't matter whether you see it as a caret or a stick, as long as
    it provides sufficient motivation for the Quiz. ;) ;) ;)
    Joel VanderWerf, Oct 26, 2007
  8. Ruby Quiz

    Eric Mahurin Guest

    Correct. I've also seen the term cursor being used with databases
    (pointer to what row is being operated on). It has also been used to
    mean the same as what rubyists call an "external iterator" (ruby
    mainly uses "internal iterators" or what others might call visitors).

    Before mice came on the scene the symbol representing the text
    insertion point was also called a "cursor". I didn't realize that
    "cursor" is now mainly used for the mouse now and the text cursor has
    been demoted to "caret". I usually just say "mouse pointer" and "text
    cursor". According to how the term "cursor" is used in data
    structures, I think the text cursor/caret deserves to use this term
    more than a mouse pointer/cursor does.
    Don't let the test in this quiz stop you. I just provided a benchmark
    for the simple stuff. Feel free to implement other text editor
    operations in addition to the simple operations that the benchmark
    uses. And don't let the benchmark restrict your API. If the model
    that the benchmark assumes doesn't match your model (i.e. a selection
    in your case?), change the benchmark test to what you need. Also,
    achieving the best absolute performance on this benchmark isn't
    necessarily the most important, but you should get reasonable big-O
    performance. I'd like to see what data structures some of you come up
    with and how far you can take them in terms of functionality. I think
    there are lots of solutions with a variety of data structures.

    Have fun!

    Eric Mahurin, Oct 27, 2007
  9. Ruby Quiz

    Eric Mahurin Guest

    Here is an inefficient string-based solution:

    class StringCaret
    # cursor/caret is between char i-1 and i
    def initialize(data="", i=0)
    @data = data
    @i = i
    def insert_before(ch)
    @data[@i,0] = ("" << ch)
    @i += 1
    def insert_after(ch)
    @data[@i,0] = ("" << ch)
    def delete_before
    @i.nonzero? and @data.slice!(@i-=1)
    def delete_after
    def left
    @i.nonzero? and @i-=1
    def right
    @i<@data.length and @i+=1
    def up
    while @i.nonzero?
    @i -= 1
    break(true) if @data[@i]==?\n
    def down
    while @i<@data.length
    break(@i+=1) if @data[@i]==?\n
    @i += 1

    The benchmark results look like this on my machine:
    StringCaret.new: 1000x100
    total 7.800000 0.030000 7.830000 ( 7.851117)
    StringCaret.new: 2000x100
    total 26.810000 0.100000 26.910000 ( 27.015445)

    The run-time almost quadrupled when the data-size doubled, so this is O(n**2).

    An O(n) solution with this minimal functionality is doable in about
    the same number of lines as above. But, you have to use another data

    Eric Mahurin, Oct 29, 2007
  10. Ruby Quiz

    Philip Gatt Guest

    Now that you've submitted a reference implementation, it looks much
    more interesting to try to optimize. I bet you'll get a better
    response now.

    Philip Gatt, Oct 29, 2007
  11. Ruby Quiz

    Eric Mahurin Guest

    Yes, I probably should have put something like this in the quiz to begin with.
    Eric Mahurin, Oct 29, 2007
  12. Ruby Quiz

    Eric Mahurin Guest

    Since I haven't seen any solutions yet, I'd thought I'd elaborate on
    some possible solutions I was thinking of:

    * conventional gap buffer implemented with a string. The string would
    have 3 parts (a couple indices might be used for the markers): before
    the cursor, the "gap" (containing garbage), and after the cursor.
    When you have no more space in the gap, copy the data to a new string
    using a larger gap. Think about how the gap should be sized. I
    believe most text editors use a gap buffer.

    * gap buffer implemented with two strings. Represent the text before
    the cursor with one string and the text after the cursor with another
    string that is reversed. All operations around the cursor correspond
    to operations at the ends of these strings - O(1). Unlike the
    conventional gap buffer, you don't have to deal with reallocation (gap
    filing up). Instead let ruby do its string reallocation when a
    string's capacity fills up. The memory efficiency might not be as
    good as the conventional gap buffer, though.

    * gap buffer with deferred gap movement. The "gap" is really only
    useful when making edits (insertions/deletions). For simple cursor
    movements, you don't have to "move" the gap immediately. The cursor
    position could be changed independently of the gap. But, you'll have
    to synchronize the gap to the cursor position once an edit is needed.
    Random access cursor movement can also be done in O(1) time, but
    you'll still have to pay for it if you immediately make an edit at
    that new position.

    * use a rope implementation (quiz #137). You could start with the
    inefficient string solution I gave earlier and operate on a rope
    instead (making new ones if using an immutable implementation). It
    may be difficult to make the solution as fast as the gap buffers for
    simple stuff (likely slower by O(log(n))), but ropes will probably be
    more powerful for complex functionality. I believe ropes were
    invented for text editing (something called Coral I believe).

    * linked lists. The text could be represented as a linked list of
    characters. The cursor position might correspond to one of the linked
    list nodes or links. All operations around the cursor should be O(1).
    An advantage over gap buffers is that you can efficiently implement
    multiple cursors (possibly from different editing panes or even
    different users). The downside is the memory usage is higher and
    random access might be higher.

    * use a two level data structure. One data structure could be used
    for lines and another for the characters within each line. All of the
    above data structures apply to both. For managing lines, you'd need
    to use arrays instead of strings in the data structure since the
    element is a line instead of a character. Using a two-level data
    structure will give better performance for line operations (i.e.


    Eric Mahurin, Oct 30, 2007
  13. Ruby Quiz

    Bill Kelly Guest

    I was hoping to use this quiz to play around with mmap, but I don't
    think I'll have time this week. :(

    For fun, I was going to read the file-to-be-edited into a memory
    mapped file, breaking the data up into VM-page sized blocks (4K, on
    my system I think.)

    The blocks would be linked together, and a given block would not
    be required to be full. Thus, similar to the gap buffer technique,
    if an insert occurred in a full block, it could be split into two
    or more partially full blocks. (Adjacent partially full blocks
    could be coalesced in such a way as to prevent the accumulation of
    lots of partially full blocks, wasting space. (Haven't fully
    thought this through, but it doesn't seem like it would be
    problematic, unless there's a snag of some sort I'm overlooking.))

    Then, I wanted to play with a B-tree to support fast random access
    into the block list. I didn't quite figure this out, but it seems
    there should be a way to construct the B-tree such that one could
    index into it using the desired file offset, and wind up at the
    appropriate block. Seems like I would need to store the byte
    count of how much data is held by a block in the B-tree leaf nodes,
    and as inserts & deletes occurred these sizes would change, and
    would propagate up to the parent, to the root of the B-tree. In
    other words, each B-tree node would know the byte count of the
    amount of data held by its children.

    With a 4K page size, each B-tree node could have lots of children,
    and thus the tree would tend to be very shallow. However because
    I think I'd want to store the child node sizes in a relative way,
    it would preclude doing a binary search at any given node level to
    find a desired offset; one would have to sum the sizes at a given
    node level to locate the child node one was interested in. . . .
    Still the trade-offs seem probably reasonable because most
    relative cursor movement could occur without searching the tree
    (the blocks are linked together). So the B-tree would be used
    just for random-access jumps through the file.

    Alternatively, it might be neat to instead construct the B-tree to
    organize the file by line offsets. Random access by line number
    would probably make more sense in a text editor than random access
    by byte offset anyway. :) :)

    Oh well - anyway, thanks for the quiz. Even though I'll not
    likely find time to try implementing the above, it's been fun to
    think about!


    Bill Kelly, Oct 30, 2007
  14. Ruby Quiz

    Robert Dober Guest

    Exactly the same here, maybe this is one of the occasions to
    prolongate for a week?
    I wanted to finish what I began with ropes, and I have a long WE ahead ;).

    Robert Dober, Oct 30, 2007
  15. I'm willing to extend this quiz a week if that's what people want.
    In my experience that doesn't generally fetch more solutions, but I
    too am strapped for time and would like to try it. All in favor?

    James Edward Gray II
    James Edward Gray II, Oct 30, 2007
  16. Ruby Quiz

    Eric Mahurin Guest

    I'm in favor. Another quiz during rubyconf may not be a good idea
    anyways. I was hoping to at least see some simple solutions right
    now. A minimal solution can be done in about 35 lines of code (most
    of those lines are class/def/end).

    Eric Mahurin, Oct 30, 2007
  17. Three votes is good enough for me. We'll add a week for this quiz.

    Robert, you are now honor bound to solve it. ;)

    James Edward Gray II
    James Edward Gray II, Oct 30, 2007
  18. Ruby Quiz

    Robert Dober Guest

    Not much to lose;) but I here the message ;)
    Robert Dober, Oct 30, 2007
  19. Ruby Quiz

    Robert Dober Guest

    Robert Dober, Oct 30, 2007
  20. Ruby Quiz

    Eric Mahurin Guest

    Here is a strange solution of mine.

    I start by partially implementing a string class that works equally
    well on the front and the back (double ended queue string). My
    array.c patch from a couple years ago did something like this, but the
    data structure below is a bit more memory efficient since it can share
    a single "gap" and recirculates unused space better. Basically, the
    implementation below is a circular buffer that reallocates when it is
    full. It would be great if the standard String and Array were
    symmetrical like this. It would increase the ways to use simple
    strings and arrays.

    Next I use this double ended queue string to implement something like
    a gap buffer. You can think of the gap as being at the ends of of
    this double-ended string. Text before the cursor is at the end of the
    string and text before the cursor is at the beginning.

    The performance is O(n), but not nearly as fast as some a simple
    solution (5X slower). If String was symmetical, I would use that and
    it would be a different story.

    If I have time, I may try this DQString in a linked-list data-structure...


    # characters can be in either of these ranges:
    # gapped : @begin...length, [email protected]
    # contiguous : @[email protected]
    # inheriting String only for memory efficiency
    # inherited methods won't necessarily make sense

    class DQString < String
    def initialize(data="")
    @begin = 0
    @end = data.length
    def inspect
    "#<DQString: begin=#{@begin}, end=#{@end}, #{super}>"
    def length
    if (len = @[email protected])>=0
    def << (ch)
    if @end==size
    if @begin>1
    # use the front, making a gap in the middle
    self[0] = ch
    @end = 1
    @end += 1
    # let ruby realloc when needed
    self[@end] = ch
    @end += 1
    if @[email protected]
    # double the size when the gap becomes empty
    @begin += size
    def >> (ch)
    if @begin.zero?
    @begin = size-1
    if @end<@begin
    # use the back, making a gap in the middle
    # double the size to make room at the front
    @end += size
    @begin -= 1
    if @[email protected]
    # double the size when the gap becomes empty
    @begin += size
    self[@begin] = ch
    def pop
    if @[email protected]
    @end -= 1
    ch = self[@end]
    if (len = @[email protected])>=0
    # remove excess trailing space if too much
    self.slice!(-len,len) if [email protected]>(len<<1)
    elsif @end.zero?
    len = (@end=size)[email protected]
    if @begin>(len<<1)
    # remove excess leading space
    self.slice!(0, len)
    @begin -= len
    @end -= len
    def shift
    if @[email protected]
    ch = self[@begin]
    @begin += 1
    if (len = @[email protected])>=0
    if @begin>(len<<1)
    # remove excess leading space
    self.slice!(0, len)
    @begin -= len
    @end -= len
    elsif @begin==size
    # remove excess trailing space if too much
    self.slice!([email protected],@end) if (@end<<1)+len<0
    @begin = 0


    require 'dqstring'

    class DQCursor
    def initialize
    @data = DQString.new
    @nafter = 0
    def insert_before(ch)
    @data << ch
    def insert_after(ch)
    @nafter += 1
    @data >> ch
    def delete_before
    @data.length>@nafter and @data.pop
    def delete_after
    @nafter.nonzero? and (@nafter-=1; @data.shift)
    def left
    @data.length>@nafter and (@nafter+=1; @data >> @data.pop)
    def right
    @nafter.nonzero? and (@nafter-=1; @data << @data.shift)
    def up
    nbefore = @[email protected]
    while nbefore.nonzero?
    nbefore -= 1
    @data >> ([email protected])
    return(true) if ch==?\n
    @[email protected]
    def down
    while @nafter.nonzero?
    @nafter -= 1
    @data << ([email protected])
    return(true) if ch==?\n
    Eric Mahurin, Oct 31, 2007
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.