shift vs. slice!(0) and others


E

Eric Mahurin

I just did some benchmarking of various ways to insert/delete
elements off of either end of arrays and strings (100,000
operations starting or ending with a 100,000 element
array/string):

user system total real
------
push 0.063000 0.000000 0.063000 ( 0.062000)
append 0.078000 0.000000 0.078000 ( 0.078000)
assign1 0.093000 0.000000 0.093000 ( 0.094000)
assign 1.063000 0.000000 1.063000 ( 1.060000)
insert 0.953000 0.000000 0.953000 ( 1.013000)
string append 0.078000 0.000000 0.078000 ( 0.078000)
string assign0 0.469000 0.000000 0.469000 ( 0.468000)
string insert 0.437000 0.000000 0.437000 ( 0.436000)
------
unshift 8.438000 0.000000 8.438000 ( 8.587000)
assign 9.312000 0.000000 9.312000 ( 9.475000)
insert 9.282000 0.000000 9.282000 ( 9.507000)
string assign 2.562000 0.000000 2.562000 ( 2.649000)
string insert 2.468000 0.016000 2.484000 ( 2.571000)
------
pop 0.047000 0.000000 0.047000 ( 0.047000)
delete_at 0.062000 0.000000 0.062000 ( 0.063000)
slice! 0.094000 0.000000 0.094000 ( 0.093000)
assign 0.297000 0.000000 0.297000 ( 0.311000)
string slice! 0.234000 0.000000 0.234000 ( 0.296000)
string assign 0.219000 0.000000 0.219000 ( 0.218000)
------
shift 0.062000 0.000000 0.062000 ( 0.062000)
delete_at 44.313000 0.000000 44.313000 ( 45.260000)
slice! 40.468000 0.000000 40.468000 ( 41.285000)
assign 8.688000 0.000000 8.688000 ( 8.915000)
string slice! 2.469000 0.000000 2.469000 ( 2.478000)
string assign 2.391000 0.000000 2.391000 ( 2.479000)


The one that really sticks out is shift vs. delete_at(0) and
slice!(0). I see a 650X difference in performance. I'm
guessing shift is O(1) and the others are O(n). I'm assuming
that shift simply increments the start of the array pointer and
frees memory at certain boundaries. If that is the case, it
would be nice if the other forms did the same and unshift did
the opposite (simply decremented the start of the array/string
pointer). This sure would be nice for easy and high
performance implementations of circular and gap buffers.


FYI, I did this on v1.8.2 on a windows machine. Here is the
benchmark code:

require 'benchmark'
n =3D 100000
v =3D ?X
a0 =3D [v]*n
s0 =3D ("" << v)*n

Benchmark.bmbm { |b|
b.report("push" ) { a =3D []; n.times { |i|
a.push(v) } }
b.report("append" ) { a =3D []; n.times { |i| a << v }
}
b.report("assign1" ) { a =3D []; n.times { |i|
a[a.size]=3Dv } }
b.report("assign" ) { a =3D []; n.times { |i|
a[a.size,0]=3D[v] } }
b.report("insert" ) { a =3D []; n.times { |i|
a.insert(-1,v) } }
b.report("string append" ) { s =3D ""; n.times { |i| s << v }
}
b.report("string assign0") { s =3D ""; n.times { |i|
s[s.size,0]=3D("" << v) } }
b.report("string insert" ) { s =3D ""; n.times { |i|
s.insert(-1,"" << v) } }

b.report("unshift" ) { a =3D []; n.times { |i|
a.unshift(v) } }
b.report("assign" ) { a =3D []; n.times { |i|
a[0,0]=3D[v] } }
b.report("insert" ) { a =3D []; n.times { |i|
a.insert(0,v) } }
b.report("string assign") { s =3D ""; n.times { |i|
s[0,0]=3D("" << v) } }
b.report("string insert") { s =3D ""; n.times { |i|
s.insert(0,"" << v) } }

b.report("pop" ) { a =3D a0.dup; n.times { |i| a.pop
} }
b.report("delete_at" ) { a =3D a0.dup; n.times { |i|
a.delete_at(-1) } }
b.report("slice!" ) { a =3D a0.dup; n.times { |i|
a.slice!(-1) } }
b.report("assign" ) { a =3D a0.dup; n.times { |i| begin
a[-1] ensure a[-1,1]=3D[] end } }
b.report("string slice!") { s =3D s0.dup; n.times { |i|
s.slice!(-1) } }
b.report("string assign") { s =3D s0.dup; n.times { |i| begin
s[-1] ensure s[-1,1]=3D"" end } }

b.report("shift" ) { a =3D a0.dup; n.times { |i|
a.shift } }
b.report("delete_at" ) { a =3D a0.dup; n.times { |i|
a.delete_at(0) } }
b.report("slice!" ) { a =3D a0.dup; n.times { |i|
a.slice!(0) } }
b.report("assign" ) { a =3D a0.dup; n.times { |i| begin
a[0] ensure a[0,1]=3D[] end } }
b.report("string slice!") { s =3D s0.dup; n.times { |i|
s.slice!(0) } }
b.report("string assign") { s =3D s0.dup; n.times { |i| begin
s[0] ensure s[0,1]=3D"" end } }
}




=09
__________________________________=20
Yahoo! Mail Mobile=20
Take Yahoo! Mail with you! Check email on your mobile phone.=20
http://mobile.yahoo.com/learn/mail=20
 
Ad

Advertisements

N

Nikolai Weibull

Eric Mahurin wrote:

[...]
This sure would be nice for easy and high performance implementations
of circular and gap buffers.

Please do explain,
nikolai
 
J

jason r tibbetts

Nikolai said:
Eric Mahurin wrote:

[...]

This sure would be nice for easy and high performance implementations
of circular and gap buffers.


Please do explain,
nikolai

"circular buffers" == deques, which are "circular" arrays--incrementing
past the end takes you to the beginning, etc.

But back to the original suggestion that delete_at() be implemented the
same way as shift(). The former makes it clear that the index in
question is being removed altogether, which will result in a
down-shifting of the array elements. shift() neither implies nor
requires that same constraint.
 
E

Eric Mahurin

--- Nikolai Weibull
Eric Mahurin wrote:
=20
[...]
=20
This sure would be nice for easy and high performance
implementations of circular and gap buffers.
=20
Please do explain,
nikolai

OK. I've been thinking about this stuff quite a bit while
working on my cursor package. Let's start with a gap buffer.=20
The traditional approach is to a have an array with the data
before the cursor at the beginning of the array and data after
the cursor at the end of the array. In the middle is the "gap"
and could be gigabytes of virtual address space if you have
enough control over virtual memory (you don't in Ruby).=20
Something like this:

A B C D E -------------- F G H I
0 1 2 3 4 ---the gap--- -4 -3 -2 -1
^ ^ ^ ^
begin before after end

All single element operations (move cursor, read, write, and
especially insert/delete) are O(1) operations. Compare this to
an array/string where insert/delete are O(n).

Another way you could organize the data above would be like
this:
=20
F G H I A B C D E
0 1 2 3 4 5 6 7 8
^ ^ ^
after=3D0 begin_end before

In this case, the "gap" is what is outside the array. If
single all single element operations on the ends of this array
(push, pop, shift, unshift) were O(1), then all our single
element operations at the cursor in this structure would also
be O(1). The problem is shift/unshift usually aren't. But
they could be by simply moving the start array pointer around
(shift looks to be O(1)).

Another approach I'm taking now is implementing this gap buffer
by using 2 arrays/strings, where one represents what's before
the cursor and one represents what's after the cursor:

A B C D E
0 1 2 3 4
^ ^
begin before

I H G F
0 1 2 3
^ ^
end after

I store the "after" array/string in reverse so that all
operations at the cursor occur at the ends of one or both of
these arrays/strings.

I haven't seen either of these approaches before. Has anybody
else done them?

For implementing a circular buffer, the first two
implementations above can be easily adapted by removing the
begin, end, or begin_end indices and allowing to wrap around or
pass through them. Implementing a circular buffer using ideas
from the last implementation above is a little trickier, but I
think I have a solution.

This is just a sampling of implementations that my next cursor
release will provide. There are many more possibilities
(linked-lists, hybrids, trees, etc) that I probably won't get
to yet.



=09
__________________________________=20
Yahoo! Mail=20
Stay connected, organized, and protected. Take the tour:=20
http://tour.mail.yahoo.com/mailtour.html=20
 
E

Eric Mahurin

--- jason r tibbetts said:
But back to the original suggestion that delete_at() be
implemented the=20
same way as shift(). The former makes it clear that the index
in=20
question is being removed altogether, which will result in a=20
down-shifting of the array elements. shift() neither implies
nor=20
requires that same constraint.

As far as I can tell, all of these should be equivalent:

array.shift
array.delete_at(0)
array.slice!(0)

Are you saying these don't have the same functionality? What
do you mean when you say the index is "removed altogether"?

OT... does anybody else hate it that the String and Array API's
have suttle differences and lack methods of the other?



=09
____________________________________________________=20
Yahoo! Sports=20
Rekindle the Rivalries. Sign up for Fantasy Football=20
http://football.fantasysports.yahoo.com
 
N

Nikolai Weibull

Eric said:
Nikolai Weibull wrote:
Eric Mahurin wrote:

[...]
This sure would be nice for easy and high performance
implementations of circular and gap buffers.
Please do explain,
nikolai
OK. I've been thinking about this stuff quite a bit while working on
my cursor package. Let's start with a gap buffer. The traditional
approach is to a have an array with the data before the cursor at the
beginning of the array and data after the cursor at the end of the
array. In the middle is the "gap" and could be gigabytes of virtual
address space if you have enough control over virtual memory (you
don't in Ruby). Something like this:

A B C D E -------------- F G H I
0 1 2 3 4 ---the gap--- -4 -3 -2 -1
^ ^ ^ ^
begin before after end

All single element operations (move cursor, read, write, and
especially insert/delete) are O(1) operations. Compare this to an
array/string where insert/delete are O(n).

I meant, please explain how this applies to circular (?) and gap
buffers, whatever you mean by "circular and gap buffers". I know how a
gap buffer works (but perhaps other people on this list don't, so your
explanation has hopefully not fallen on muted ears).

The operations are not all O(1). Insert and delete are still O(n). It
can, however, be reasoned that for normal use these operations will
perform as if O(1). Doing random insertions/deletions will require O(n)
time, though. The move-to operation (as I refer to it) can be made to
always operate in O(1) time, if the gap is only moved when an
insert/delete is actually performed.
Another way you could organize the data above would be like this:

F G H I A B C D E
0 1 2 3 4 5 6 7 8
^ ^ ^
after=0 begin_end before

In this case, the "gap" is what is outside the array. If single all
single element operations on the ends of this array (push, pop, shift,
unshift) were O(1), then all our single element operations at the
cursor in this structure would also be O(1). The problem is
shift/unshift usually aren't. But
they could be by simply moving the start array pointer around
(shift looks to be O(1)).

No, it would be very strange to have both O(1) shift and push on an
array. A deque is a lot easier to implement using a linked list, but
then you loose O(1) lookup.

A possible solution is to use a "circular gap", i.e., a gap that can
span both ends of the buffer as necessary, e.g.,

+---------+--------------------------+---------+
| Gap | Buffer | Gap | .
+---------+--------------------------+---------+

Allowing for such gaps in a buffer can be very beneficial for certain
modification patterns for sure. Climacs has support for this kind of
gap buffering strategy through the Flexichain package.
Another approach I'm taking now is implementing this gap buffer by
using 2 arrays/strings, where one represents what's before the cursor
and one represents what's after the cursor:

A B C D E
0 1 2 3 4
^ ^
begin before

I H G F
0 1 2 3
^ ^
end after

I store the "after" array/string in reverse so that all operations at
the cursor occur at the ends of one or both of these arrays/strings.
I haven't seen either of these approaches before. Has anybody else
done them?

This is more or less typical of modifications of lists in, for example,
Haskell, albeit I haven't seen it being used in a text editor. The
problem with this solution is that you'll have to modify two
strings/arrays whenever you move the cursor, but pushing/popping is
efficient, so it's quite a good solution (for small buffers).
For implementing a circular buffer, the first two implementations
above can be easily adapted by removing the begin, end, or begin_end
indices and allowing to wrap around or pass through them.
Implementing a circular buffer using ideas from the last
implementation above is a little trickier, but I think I have a
solution.

Well, why not just use a circular list? People seem to be forgetting
about linked lists lately. They do have their applications you know
:).
This is just a sampling of implementations that my next cursor
release will provide. There are many more possibilities
(linked-lists, hybrids, trees, etc) that I probably won't get
to yet.

It's nice to see that someone is taking an interest in this kind of
stuff. String and Array are great for many applications, but perform
terribly for certain kinds of behavioral patterns and are in no way a
universal solution. I believe Ruby would benefit by having more data
structures that would perform gracefully for concatenations, e.g., ropes
from STL or something similar,
nikolai
 
Ad

Advertisements

N

Nikolai Weibull

jason said:
Nikolai Weibull wrote:
Eric Mahurin wrote:
[...]
This sure would be nice for easy and high performance implementations
of circular and gap buffers.
Please do explain,
nikolai
"circular buffers" == deques, which are "circular" arrays--incrementing
past the end takes you to the beginning, etc.

The "circular and gap buffers" construction was flawed. It should have
been "circular and gapped buffers" or, even better, "circular buffers
and gapped buffers".

Deques are not necessarily implemented using arrays. A doubly linked
list will do just as well (and doesn't have sizing issues).
But back to the original suggestion that delete_at() be implemented the
same way as shift(). The former makes it clear that the index in
question is being removed altogether, which will result in a
down-shifting of the array elements. shift() neither implies nor
requires that same constraint.

Why would #delete_at(0) behaving as #shift be weird?,
nikolai
 
E

Eric Hodel

As far as I can tell, all of these should be equivalent:

array.shift
array.delete_at(0)
array.slice!(0)

$ ri Array#shift
------------------------------------------------------------ Array#shift
array.shift -> obj or nil
...
$ ri Array#delete_at
-------------------------------------------------------- Array#delete_at
array.delete_at(index) -> obj or nil
...
$ ri Array#slice!
----------------------------------------------------------- Array#slice!
array.slice!(index) -> obj or nil
array.slice!(start, length) -> sub_array or nil
array.slice!(range) -> sub_array or nil
...

Are you saying these don't have the same functionality?

They take different arguments, so must parse their arguments
differently and behave differently based on that. I don't see any
reason to complicate matters to special case for rarely-used
behaviors. There's no reason to write arr.delete_at 0 when you could
write arr.shift.

(slice! index calls delete_at index on the inside.)
What do you mean when you say the index is "removed altogether"?

delete_at n removes from anywhere in the Array, so things must be
shifted around. This is not true for shift, you just move up the
pointer. (Ruby Arrays are copy-on-write.)
OT... does anybody else hate it that the String and Array API's
have suttle differences and lack methods of the other?

No, I don't notice it at all. Strings aren't Arrays and Arrays
aren't Strings. They only look the same if you squint really hard.
 
E

Eric Mahurin

--- Eric Hodel said:
=20
They take different arguments, so must parse their arguments=20
=20
differently and behave differently based on that. I don't
see any =20
reason to complicate matters to special case for rarely-used=20
=20
behaviors. There's no reason to write arr.delete_at 0 when
you could =20
write arr.shift.

If you wanted to delete 5 elements from the beginning of an
array (a big one), you would think that your best bet would be
arr.slice!(0,5), right? For performance, 5 arr.shift is much
faster - 1 O(N) ops vs. 5 O(N).

There is no reason why slice! couldn't make the shift
optimization when dealing with elements at the beginning of the
array.

I also noticed that if you do bunch of shifts (which seem to be
O(1)), the same number of unshifts are also fast (O(1)). But,
when you get back to where you started, the unshifts become
O(N). It is unfortunate that the implementation doesn't try to
stretch the array to the left just like it does to the right to
make all unshifts O(1). And then of course all the equivalents
should make the same optimization.
=20
No, I don't notice it at all. Strings aren't Arrays and
Arrays =20
aren't Strings. They only look the same if you squint really
hard.

I'm writing some general classes that equally apply to Strings
and Arrays. Unfortunately, I can't take advantage of
Array#shift because String doesn't have it. I use the least
common denominators: []/slice, []=3D, slice!, <<, and
size/length. If they had more in common I'd like to use those.



=09
__________________________________=20
Yahoo! Mail=20
Stay connected, organized, and protected. Take the tour:=20
http://tour.mail.yahoo.com/mailtour.html=20
 
E

Eric Mahurin

--- Nikolai Weibull
Eric Mahurin wrote:
=20
Nikolai Weibull wrote: =20
Eric Mahurin wrote:
=20
[...]
=20
This sure would be nice for easy and high performance
implementations of circular and gap buffers. =20
Please do explain,
nikolai
=20
OK. I've been thinking about this stuff quite a bit while working on
my cursor package. Let's start with a gap buffer. The traditional
approach is to a have an array with the data before the cursor at the
beginning of the array and data after the cursor at the end of the
array. In the middle is the "gap" and could be gigabytes of virtual
address space if you have enough control over virtual memory (you
don't in Ruby). Something like this:
=20
A B C D E -------------- F G H I
0 1 2 3 4 ---the gap--- -4 -3 -2 -1
^ ^ ^ ^
begin before after end
=20
All single element operations (move cursor, read, write, and
especially insert/delete) are O(1) operations. Compare this to an
array/string where insert/delete are O(n).
=20
I meant, please explain how this applies to circular (?) and
gap
buffers, whatever you mean by "circular and gap buffers". I
know how a
gap buffer works (but perhaps other people on this list
don't, so your
explanation has hopefully not fallen on muted ears).
=20
The operations are not all O(1). Insert and delete are still
O(n). It
can, however, be reasoned that for normal use these
operations will
perform as if O(1). Doing random insertions/deletions will
require O(n)
time, though. The move-to operation (as I refer to it) can
be made to
always operate in O(1) time, if the gap is only moved when an
insert/delete is actually performed.

Sorry, I was referring to all single element operations done at
the cursor location - those are all O(1). To move the cursor
to a random location is O(n).
=20
No, it would be very strange to have both O(1) shift and push
on an array.

Well, in the current Array implementation, push/pop are
obviously O(1), but also shift is too (judging from the
benchmarks). I'd imagine a shift is done by just moving the
start of array pointer and not any data. unshifts are also
O(1) as long as you don't unshift more than you shifted (then
they become O(n)). If unshifts tried to stretch the memory
allocation to the left or realloc the array in a new space like
pushes did, you could make them always O(1) (discounting
ocassional allocation time). Then the above scheme would work
fine for both a gap buffer and easily a circular buffer.
A deque is a lot easier to implement using a linked
list, but
then you loose O(1) lookup.
=20
A possible solution is to use a "circular gap", i.e., a gap
that can
span both ends of the buffer as necessary, e.g.,
=20
+---------+--------------------------+---------+
| Gap | Buffer | Gap | .
+---------+--------------------------+---------+
=20
Allowing for such gaps in a buffer can be very beneficial for
certain
modification patterns for sure. Climacs has support for this
kind of
gap buffering strategy through the Flexichain package.

Thats what I was trying to describe below with the circular
buffers. It is like the conventional gap buffer, but the gap
can wraparound (can be in the middle or the ends).
=20
This is more or less typical of modifications of lists in,
for example,
Haskell, albeit I haven't seen it being used in a text
editor. The
problem with this solution is that you'll have to modify two
strings/arrays whenever you move the cursor, but
pushing/popping is
efficient, so it's quite a good solution (for small buffers).
=20
=20
Well, why not just use a circular list?

That will be one of the implementations in Cursor::Circular
eventually. Along with others.
People seem to be
forgetting
about linked lists lately. They do have their applications
you know
:).

My application started with a general parser. I wanted a good
cursor API for the character stream and the token stream. I've
gone overboard on this cursor stuff.
=20
It's nice to see that someone is taking an interest in this
kind of
stuff. String and Array are great for many applications, but
perform
terribly for certain kinds of behavioral patterns and are in
no way a
universal solution. I believe Ruby would benefit by having
more data
structures that would perform gracefully for concatenations,
e.g., ropes
from STL or something similar,

Thanks. This thing I'm doing is STL on a lot of steriods. I
think it more generally gives a full API into any sequential
data structure you can think of - array, string, IO, buffered
IO, linked lists (double and single linked), gap buffer, linked
lists of fixed size blocks, various circular buffers, a
reversal of an of these, a concatenation of any of these, etc.=20
A nice feature that is missing from many of these external
iterator API's that mine has is ability to save/restore the
position of a cursor (the thing holding the position is in fact
another cursor).



=09
____________________________________________________=20
Yahoo! Sports=20
Rekindle the Rivalries. Sign up for Fantasy Football=20
http://football.fantasysports.yahoo.com
 
D

Daniel Brockman

No, I don't notice it at all. Strings aren't Arrays and Arrays
aren't Strings. They only look the same if you squint really hard.

Which is exactly what Jason was complaining about.
You *shouldn't* have to squint to make them look alike.
 
Ad

Advertisements

E

Eric Hodel

Which is exactly what Jason was complaining about.
You *shouldn't* have to squint to make them look alike.

I have never needed them to look alike much beyond << and empty?.
The differences between the two are what I appreciate about the
differences. I want String to be good at being a String, and Array
to be good at being a collection. Interface unity takes back seat.
(If I have to work with individual characters, split '' has been more
than adequate.)

How have you needed them to look alike?

In any event, the gap is hardly insurmountable:

class String
def shift
return nil if empty?
chr = self[0].chr
replace self[1..-1]
return chr
end
end
 

Top