[QUIZ] Editing Text (#145)

J

Jesús Gabriel y Galán

I'm willing to extend this quiz a week if that's what people want.
In my experience that doesn't generally fetch more solutions, but I
too am strapped for time and would like to try it. All in favor?

I'm sorry, I won't have time this week anyway (my second baby was born
on Monday, so not much time to use the computer :). This was an
interesting quiz, I would have liked to have time to give it a try.

Jesus.
 
J

james

I'm sorry, I won't have time this week anyway (my second baby was born
on Monday, so not much time to use the computer :).

Congratulations on the new child!

James Edward Gray II
 
E

eric.mahurin

First attempt
- plain gap buffer implementation without much optimization
- only basic operations
- O(n)

Regards
Holger




# Solution to ruby quiz #145
# http://www.rubyquiz.com/quiz145.html
# by Holger
#
# GapBuffer works similar to StringCaret
#
# Some remarks:
# - if gap size is zero before insert it will be set to 64
# - data buffer will never shrink -> gap might be very large
# - lazy up and down methods, but cursor remains in same column (if fits in
line)


Hi Holger,

Thanks for the solution. One comment though. This solution is not
O(n). Keep in mind that every time you increase the gap (insert in
the middle of a string), it is an O(n) operation (written in C
though). For each 64 characters that you insert, you'll hit this
case. For insert, you have one component that is O(n) and another
that is O(n*n/64)~O(n**2). For large n, the n**2 component will
dominate, so it is O(n**2). For the sizes you are looking at, the
n**2 component isn't dominating because its coefficient is small (1/64
* insert written in C).

With a small change, you can make this order n. Hint: don't increase
the gap by a constant amount.

Eric
 
H

hmack.gm

Note: parts of this message were removed by the gateway to make it a legal Usenet post.

Hi Eric,

you're right, I missed this point with my first solution. One simple way to
improve this -- as you mentioned in your hint -- use a fraction A of the
complete gap length. This way we will have O(n).

Nevertheless, I wanted to send in a second solution, which will take care of
all the other cases (better multi-lines, ...) this time with two strings as
data buffer. This solution is still quite handy, but unfortunately not yet
finished. I think, with two buffers (first before the gap, second after) the
O(n) is always valid.

Unfortunately, I think I don't have enough time to complete the full
solution, maybe the first part with two strings.

Holger
 
M

Matthew Moss

My solution is a deque... The text before the cursor is kept
left-to-right, while the part after the cursor is reversed. This makes
for some very simple code for the basic operations, using mostly
push/pop. It might be worthwhile to not reverse the data, and make the
code slightly more assymetric by combining the push/pop with
unshift/shift. (Didn't try that, though...)

Initially, the "left" method was @post.push(@prev.pop), with the
"right" method similar. This proved to be terribly slow, so I kept a
"temporary" cursor that would collapse multiple left/right operations
into a single "sift" (i.e. moving n chars from one to the other). The
"sync" calls ensure that we do that sift before other operations.

The most complex methods (and slowest) here are "up" and "down".
Breaking the internals down into multiple lines rather than just the
@prev/@post pair, or somehow keeping track of the newlines would help
with the speed of up/down, but I got lazy. =)


class DequeBuffer
def initialize(data = "", i = 0)
@prev, @post = data[0, i].unpack("c*"), data[i..-1].unpack("c*").reverse
@curs = 0
end

def insert_before(ch)
sync
@prev.push(ch)
end

def insert_after(ch)
sync
@post.push(ch)
end

def delete_before
sync
@prev.pop
end

def delete_after
sync
@post.pop
end

def left
@curs -= 1 if @curs > (e-mail address removed)
end

def right
@curs += 1 if @curs < @post.length
end

def up
sync
c = @prev.rindex(?\n)
if c
c -= @prev.length
sift(c) # move to end of prev line
d = (@prev.rindex(?\n) || -1) - @prev.length - c
sift(d) if d < 0 # move to column
true
end
end

def down
sync
c = @post.rindex(?\n)
if c
c -= @post.length
sift(-c) # move to start of next line
d = (@post.rindex(?\n) || -1) - @post.length - c
sift(-d) if d < 0 # move to column
true
end
end

def to_s
(@prev + @post.reverse).pack("c*")
end

private

def sift(n)
if n < 0
@post.concat(@prev.slice!( n, -n).reverse)
elsif n > 0
@prev.concat(@post.slice!(-n, n).reverse)
end
end

def sync
sift(@curs)
@curs = 0
end
end


Running the test using simple string solution as comparison:

ruby -r deque.rb edit_test.rb 1 Simple.new 2000 100 DequeBuffer.new 2000 100
Simple.new: 2000x100
insert_before 0.890000 0.000000 0.890000 ( 0.889501)
left 0.170000 0.000000 0.170000 ( 0.173849)
right 0.190000 0.000000 0.190000 ( 0.184943)
up 0.200000 0.000000 0.200000 ( 0.204166)
down 0.210000 0.000000 0.210000 ( 0.205926)
insert_after 4.790000 0.010000 4.800000 ( 4.802009)
delete_before 3.480000 0.000000 3.480000 ( 3.484238)
delete_after 1.760000 0.010000 1.770000 ( 1.764394)
total 11.690000 0.020000 11.710000 ( 11.709026)
DequeBuffer.new: 2000x100
insert_before 0.280000 0.000000 0.280000 ( 0.285044)
left 0.180000 0.000000 0.180000 ( 0.182072)
right 0.160000 0.000000 0.160000 ( 0.160996)
up 1.530000 1.850000 3.380000 ( 3.373222)
down 0.780000 0.920000 1.700000 ( 1.714187)
insert_after 0.290000 0.000000 0.290000 ( 0.285273)
delete_before 0.250000 0.000000 0.250000 ( 0.251946)
delete_after 0.250000 0.000000 0.250000 ( 0.257460)
total 3.720000 2.770000 6.490000 ( 6.510201)
 
E

Eric Mahurin

The task is to implement data structures for efficiently editing text. One
common data structure is a gap buffer:

http://en.wikipedia.org/wiki/Gap_buffer

Other options to consider include: ropes (see quiz #137), linked lists, simple
strings, or a multi-level combination of data structures (i.e. for lines vs.
characters in a line). There are many data structures that may work efficiently
with simple editing operations, but not all of those data structures will work
well for more complex functionality.

Here is another solution :

http://rubyquiz.com/hosted_solutions/145/eric/linked.rb

http://rubyquiz.com/hosted_solutions/145/eric/edit_test2.rb

In addition to the minimal functionality, I implemented a method that
inserts a file in the text (#read). But, it doesn't read the entire
file into memory (it actually only needs to hold one character of the
file at a time). Only edits to the file are held in memory.

I implemented this by using a linked list of text buffers. Each
buffer is a gap buffer or a slice of an IO (begin/end positions).
Although the gap buffer I implemented (using two strings) is simple
and fast in ruby, it isn't as memory efficient as a normal gap buffer
since each string effectively has a gap (where a normal gap buffer has
one shared gap between before/after text).

I changed the API for this implementation. All of the methods also
take a lambda/proc that is called when switching from one buffer to
another (updating the current cursor object). Another possibility
which keeps the original API is to add a wrapper class to do this, but
it won't be nearly as efficient because of the extra layer (the
performance cost of the linked list is almost zero using the new API).
I also added some testing of file "read"s and
moving/inserting/deleting through files. SplitGapCursor can actually
be used with the old or new benchmarks (just don't link it to another
buffer). Here are some results (total measures the same as before and
ftotal measures operations on file text) :

SplitGapCursor: 4000x100
insert_before 0.360000 0.000000 0.360000 ( 0.359155)
left 0.630000 0.000000 0.630000 ( 0.632066)
right 0.620000 0.000000 0.620000 ( 0.623372)
up 0.480000 0.000000 0.480000 ( 0.470801)
down 0.480000 0.000000 0.480000 ( 0.478180)
insert_after 0.340000 0.000000 0.340000 ( 0.343666)
delete_before 0.500000 0.000000 0.500000 ( 0.496911)
delete_after 0.500000 0.000000 0.500000 ( 0.501599)
read 0.000000 0.000000 0.000000 ( 0.000031)
up 0.480000 0.000000 0.480000 ( 0.475890)
read 0.000000 0.000000 0.000000 ( 0.000025)
delete_before 0.970000 0.000000 0.970000 ( 0.971401)
right 0.510000 0.000000 0.510000 ( 0.502776)
left 0.560000 0.000000 0.560000 ( 0.559028)
down 0.550000 0.000000 0.550000 ( 0.552482)
up 0.520000 0.000000 0.520000 ( 0.516446)
delete_after 0.920000 0.000000 0.920000 ( 0.920615)
total 3.910000 0.000000 3.910000 ( 3.905750)
ftotal 4.030000 0.000000 4.030000 ( 4.022773)

Notice the read time (0). But, I used StringIO instead of a real file
for testing.
 
E

Eric Mahurin

My solution is a deque... The text before the cursor is kept
left-to-right, while the part after the cursor is reversed. This makes
for some very simple code for the basic operations, using mostly
push/pop.

I'm glad to see someone try an Arrays instead of a Strings. You'd
think the run-time would be similar between the two since the number
of method calls should be the same (#<< and #slice(-1) instead of
#push and #pop). But, the memory should be about 4X-8X though since
each array slot takes 32/64-bit object pointer/handle instead of an
8-bit character.
It might be worthwhile to not reverse the data, and make the
code slightly more assymetric by combining the push/pop with
unshift/shift. (Didn't try that, though...)

Don't expect very much performance since unshift is usually O(n).
Initially, the "left" method was @post.push(@prev.pop), with the
"right" method similar. This proved to be terribly slow, so I kept a
"temporary" cursor that would collapse multiple left/right operations
into a single "sift" (i.e. moving n chars from one to the other). The
"sync" calls ensure that we do that sift before other operations.

Odd. I tried both ways with String and it was faster to do @post <<
@prev.slice(-1). Since the operations I used were O(1), I think it
boiled down to minimizing the number of method calls. Using deferred
"sifting" resulted in more complexity and method calls. I wouldn't be
surprised if there is some performance bug with @post.push(@prev.pop
or return).
The most complex methods (and slowest) here are "up" and "down".
Breaking the internals down into multiple lines rather than just the
@prev/@post pair, or somehow keeping track of the newlines would help
with the speed of up/down, but I got lazy. =)

These are way too slow. It looks O(n**2). You should look to see
what happens when you double the size. I think you might have hit an
Array COW (copy-on-write) performance/memory bug(/leak?). I've seen
plenty of them.

Why not use the deferred "sifting" like you do for left/right instead?
This would avoid the problem.

Eric
 
M

Matthew Moss

The most complex methods (and slowest) here are "up" and "down".
These are way too slow. It looks O(n**2). You should look to see
what happens when you double the size. I think you might have hit an
Array COW (copy-on-write) performance/memory bug(/leak?). I've seen
plenty of them.

Why not use the deferred "sifting" like you do for left/right instead?
This would avoid the problem.

Ten minutes after I posted my solution, I asked myself the same thing. =)

I'll try a rewrite if I get a chance, and look into your other comments as well.
 
S

Sean Surname

Sort of late, but here's an approach I call a "patch buffer".
Basically, the idea is to accumulate edits as cheaply as possible using
a sort of "patch," then perform them all when the cursor moves.
One-character movement could be implemented by editing the patch, but
that's more work... There are almost certainly some off-by-one bugs
lurking in it, but luckily the benchmark test doesn't mix operations
very much.

Attachments:
http://www.ruby-forum.com/attachment/934/pb.rb
 
R

Randy Kramer

Sort of late, but here's an approach I call a "patch buffer".
Basically, the idea is to accumulate edits as cheaply as possible using
a sort of "patch," then perform them all when the cursor moves.

Nice idea!

(I didn't look at the code yet, but I can imagine what you might have done.)

Randy Kramer
 
E

Eric Mahurin

Note: parts of this message were removed by the gateway to make it a legal Usenet post.

Sort of late, but here's an approach I call a "patch buffer".
Basically, the idea is to accumulate edits as cheaply as possible using
a sort of "patch," then perform them all when the cursor moves.
One-character movement could be implemented by editing the patch, but
that's more work... There are almost certainly some off-by-one bugs
lurking in it, but luckily the benchmark test doesn't mix operations
very much.

If you start mixing edits and movements, I don't think the performance will
be very good because a fixup is O(n) after some edits. Consider a sequence
of n (insert_before;left). Each left will do a full fixup making this
sequence O(n**2).

Eric
 
S

Sean Surname

Eric said:
If you start mixing edits and movements, I don't think the performance
will
be very good because a fixup is O(n) after some edits. Consider a
sequence
of n (insert_before;left). Each left will do a full fixup making this
sequence O(n**2).

It's about 5x slower than a gap buffer in this case for the 1000x100
size, but breaks even by the time you do 20 insert_before()s between
left()s. On the other hand, I think any array-based approach has an
O(n^2) worst case -- there's always some locality assumption. For
example, a gap buffer will be quadratic for a sequence of n
(insert_before; up; insert_after; down) on a string with longish lines,
since it has to shuffle half the buffer back and forth (try attached).
The patch approach seems reasonable given the limitations of working in
a high-level interpreted language with fast strings, though it wouldn't
make sense in C.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top