shift vs. slice!(0) and others

Discussion in 'Ruby' started by Eric Mahurin, Jun 29, 2005.

  1. Eric Mahurin

    Eric Mahurin Guest

    I just did some benchmarking of various ways to insert/delete
    elements off of either end of arrays and strings (100,000
    operations starting or ending with a 100,000 element
    array/string):

    user system total real
    ------
    push 0.063000 0.000000 0.063000 ( 0.062000)
    append 0.078000 0.000000 0.078000 ( 0.078000)
    assign1 0.093000 0.000000 0.093000 ( 0.094000)
    assign 1.063000 0.000000 1.063000 ( 1.060000)
    insert 0.953000 0.000000 0.953000 ( 1.013000)
    string append 0.078000 0.000000 0.078000 ( 0.078000)
    string assign0 0.469000 0.000000 0.469000 ( 0.468000)
    string insert 0.437000 0.000000 0.437000 ( 0.436000)
    ------
    unshift 8.438000 0.000000 8.438000 ( 8.587000)
    assign 9.312000 0.000000 9.312000 ( 9.475000)
    insert 9.282000 0.000000 9.282000 ( 9.507000)
    string assign 2.562000 0.000000 2.562000 ( 2.649000)
    string insert 2.468000 0.016000 2.484000 ( 2.571000)
    ------
    pop 0.047000 0.000000 0.047000 ( 0.047000)
    delete_at 0.062000 0.000000 0.062000 ( 0.063000)
    slice! 0.094000 0.000000 0.094000 ( 0.093000)
    assign 0.297000 0.000000 0.297000 ( 0.311000)
    string slice! 0.234000 0.000000 0.234000 ( 0.296000)
    string assign 0.219000 0.000000 0.219000 ( 0.218000)
    ------
    shift 0.062000 0.000000 0.062000 ( 0.062000)
    delete_at 44.313000 0.000000 44.313000 ( 45.260000)
    slice! 40.468000 0.000000 40.468000 ( 41.285000)
    assign 8.688000 0.000000 8.688000 ( 8.915000)
    string slice! 2.469000 0.000000 2.469000 ( 2.478000)
    string assign 2.391000 0.000000 2.391000 ( 2.479000)


    The one that really sticks out is shift vs. delete_at(0) and
    slice!(0). I see a 650X difference in performance. I'm
    guessing shift is O(1) and the others are O(n). I'm assuming
    that shift simply increments the start of the array pointer and
    frees memory at certain boundaries. If that is the case, it
    would be nice if the other forms did the same and unshift did
    the opposite (simply decremented the start of the array/string
    pointer). This sure would be nice for easy and high
    performance implementations of circular and gap buffers.


    FYI, I did this on v1.8.2 on a windows machine. Here is the
    benchmark code:

    require 'benchmark'
    n =3D 100000
    v =3D ?X
    a0 =3D [v]*n
    s0 =3D ("" << v)*n

    Benchmark.bmbm { |b|
    b.report("push" ) { a =3D []; n.times { |i|
    a.push(v) } }
    b.report("append" ) { a =3D []; n.times { |i| a << v }
    }
    b.report("assign1" ) { a =3D []; n.times { |i|
    a[a.size]=3Dv } }
    b.report("assign" ) { a =3D []; n.times { |i|
    a[a.size,0]=3D[v] } }
    b.report("insert" ) { a =3D []; n.times { |i|
    a.insert(-1,v) } }
    b.report("string append" ) { s =3D ""; n.times { |i| s << v }
    }
    b.report("string assign0") { s =3D ""; n.times { |i|
    s[s.size,0]=3D("" << v) } }
    b.report("string insert" ) { s =3D ""; n.times { |i|
    s.insert(-1,"" << v) } }

    b.report("unshift" ) { a =3D []; n.times { |i|
    a.unshift(v) } }
    b.report("assign" ) { a =3D []; n.times { |i|
    a[0,0]=3D[v] } }
    b.report("insert" ) { a =3D []; n.times { |i|
    a.insert(0,v) } }
    b.report("string assign") { s =3D ""; n.times { |i|
    s[0,0]=3D("" << v) } }
    b.report("string insert") { s =3D ""; n.times { |i|
    s.insert(0,"" << v) } }

    b.report("pop" ) { a =3D a0.dup; n.times { |i| a.pop
    } }
    b.report("delete_at" ) { a =3D a0.dup; n.times { |i|
    a.delete_at(-1) } }
    b.report("slice!" ) { a =3D a0.dup; n.times { |i|
    a.slice!(-1) } }
    b.report("assign" ) { a =3D a0.dup; n.times { |i| begin
    a[-1] ensure a[-1,1]=3D[] end } }
    b.report("string slice!") { s =3D s0.dup; n.times { |i|
    s.slice!(-1) } }
    b.report("string assign") { s =3D s0.dup; n.times { |i| begin
    s[-1] ensure s[-1,1]=3D"" end } }

    b.report("shift" ) { a =3D a0.dup; n.times { |i|
    a.shift } }
    b.report("delete_at" ) { a =3D a0.dup; n.times { |i|
    a.delete_at(0) } }
    b.report("slice!" ) { a =3D a0.dup; n.times { |i|
    a.slice!(0) } }
    b.report("assign" ) { a =3D a0.dup; n.times { |i| begin
    a[0] ensure a[0,1]=3D[] end } }
    b.report("string slice!") { s =3D s0.dup; n.times { |i|
    s.slice!(0) } }
    b.report("string assign") { s =3D s0.dup; n.times { |i| begin
    s[0] ensure s[0,1]=3D"" end } }
    }




    =09
    __________________________________=20
    Yahoo! Mail Mobile=20
    Take Yahoo! Mail with you! Check email on your mobile phone.=20
    http://mobile.yahoo.com/learn/mail=20
     
    Eric Mahurin, Jun 29, 2005
    #1
    1. Advertisements

  2. Eric Mahurin wrote:

    [...]
    Please do explain,
    nikolai
     
    Nikolai Weibull, Jun 29, 2005
    #2
    1. Advertisements

  3. "circular buffers" == deques, which are "circular" arrays--incrementing
    past the end takes you to the beginning, etc.

    But back to the original suggestion that delete_at() be implemented the
    same way as shift(). The former makes it clear that the index in
    question is being removed altogether, which will result in a
    down-shifting of the array elements. shift() neither implies nor
    requires that same constraint.
     
    jason r tibbetts, Jun 29, 2005
    #3
  4. Eric Mahurin

    Eric Mahurin Guest

    --- Nikolai Weibull
    OK. I've been thinking about this stuff quite a bit while
    working on my cursor package. Let's start with a gap buffer.=20
    The traditional approach is to a have an array with the data
    before the cursor at the beginning of the array and data after
    the cursor at the end of the array. In the middle is the "gap"
    and could be gigabytes of virtual address space if you have
    enough control over virtual memory (you don't in Ruby).=20
    Something like this:

    A B C D E -------------- F G H I
    0 1 2 3 4 ---the gap--- -4 -3 -2 -1
    ^ ^ ^ ^
    begin before after end

    All single element operations (move cursor, read, write, and
    especially insert/delete) are O(1) operations. Compare this to
    an array/string where insert/delete are O(n).

    Another way you could organize the data above would be like
    this:
    =20
    F G H I A B C D E
    0 1 2 3 4 5 6 7 8
    ^ ^ ^
    after=3D0 begin_end before

    In this case, the "gap" is what is outside the array. If
    single all single element operations on the ends of this array
    (push, pop, shift, unshift) were O(1), then all our single
    element operations at the cursor in this structure would also
    be O(1). The problem is shift/unshift usually aren't. But
    they could be by simply moving the start array pointer around
    (shift looks to be O(1)).

    Another approach I'm taking now is implementing this gap buffer
    by using 2 arrays/strings, where one represents what's before
    the cursor and one represents what's after the cursor:

    A B C D E
    0 1 2 3 4
    ^ ^
    begin before

    I H G F
    0 1 2 3
    ^ ^
    end after

    I store the "after" array/string in reverse so that all
    operations at the cursor occur at the ends of one or both of
    these arrays/strings.

    I haven't seen either of these approaches before. Has anybody
    else done them?

    For implementing a circular buffer, the first two
    implementations above can be easily adapted by removing the
    begin, end, or begin_end indices and allowing to wrap around or
    pass through them. Implementing a circular buffer using ideas
    from the last implementation above is a little trickier, but I
    think I have a solution.

    This is just a sampling of implementations that my next cursor
    release will provide. There are many more possibilities
    (linked-lists, hybrids, trees, etc) that I probably won't get
    to yet.



    =09
    __________________________________=20
    Yahoo! Mail=20
    Stay connected, organized, and protected. Take the tour:=20
    http://tour.mail.yahoo.com/mailtour.html=20
     
    Eric Mahurin, Jun 29, 2005
    #4
  5. Eric Mahurin

    Eric Mahurin Guest

    As far as I can tell, all of these should be equivalent:

    array.shift
    array.delete_at(0)
    array.slice!(0)

    Are you saying these don't have the same functionality? What
    do you mean when you say the index is "removed altogether"?

    OT... does anybody else hate it that the String and Array API's
    have suttle differences and lack methods of the other?



    =09
    ____________________________________________________=20
    Yahoo! Sports=20
    Rekindle the Rivalries. Sign up for Fantasy Football=20
    http://football.fantasysports.yahoo.com
     
    Eric Mahurin, Jun 29, 2005
    #5
  6. I meant, please explain how this applies to circular (?) and gap
    buffers, whatever you mean by "circular and gap buffers". I know how a
    gap buffer works (but perhaps other people on this list don't, so your
    explanation has hopefully not fallen on muted ears).

    The operations are not all O(1). Insert and delete are still O(n). It
    can, however, be reasoned that for normal use these operations will
    perform as if O(1). Doing random insertions/deletions will require O(n)
    time, though. The move-to operation (as I refer to it) can be made to
    always operate in O(1) time, if the gap is only moved when an
    insert/delete is actually performed.
    No, it would be very strange to have both O(1) shift and push on an
    array. A deque is a lot easier to implement using a linked list, but
    then you loose O(1) lookup.

    A possible solution is to use a "circular gap", i.e., a gap that can
    span both ends of the buffer as necessary, e.g.,

    +---------+--------------------------+---------+
    | Gap | Buffer | Gap | .
    +---------+--------------------------+---------+

    Allowing for such gaps in a buffer can be very beneficial for certain
    modification patterns for sure. Climacs has support for this kind of
    gap buffering strategy through the Flexichain package.
    This is more or less typical of modifications of lists in, for example,
    Haskell, albeit I haven't seen it being used in a text editor. The
    problem with this solution is that you'll have to modify two
    strings/arrays whenever you move the cursor, but pushing/popping is
    efficient, so it's quite a good solution (for small buffers).
    Well, why not just use a circular list? People seem to be forgetting
    about linked lists lately. They do have their applications you know
    :).
    It's nice to see that someone is taking an interest in this kind of
    stuff. String and Array are great for many applications, but perform
    terribly for certain kinds of behavioral patterns and are in no way a
    universal solution. I believe Ruby would benefit by having more data
    structures that would perform gracefully for concatenations, e.g., ropes
    from STL or something similar,
    nikolai
     
    Nikolai Weibull, Jun 30, 2005
    #6
  7. The "circular and gap buffers" construction was flawed. It should have
    been "circular and gapped buffers" or, even better, "circular buffers
    and gapped buffers".

    Deques are not necessarily implemented using arrays. A doubly linked
    list will do just as well (and doesn't have sizing issues).
    Why would #delete_at(0) behaving as #shift be weird?,
    nikolai
     
    Nikolai Weibull, Jun 30, 2005
    #7
  8. Eric Mahurin

    Eric Hodel Guest

    $ ri Array#shift
    ------------------------------------------------------------ Array#shift
    array.shift -> obj or nil
    ...
    $ ri Array#delete_at
    -------------------------------------------------------- Array#delete_at
    array.delete_at(index) -> obj or nil
    ...
    $ ri Array#slice!
    ----------------------------------------------------------- Array#slice!
    array.slice!(index) -> obj or nil
    array.slice!(start, length) -> sub_array or nil
    array.slice!(range) -> sub_array or nil
    ...

    They take different arguments, so must parse their arguments
    differently and behave differently based on that. I don't see any
    reason to complicate matters to special case for rarely-used
    behaviors. There's no reason to write arr.delete_at 0 when you could
    write arr.shift.

    (slice! index calls delete_at index on the inside.)
    delete_at n removes from anywhere in the Array, so things must be
    shifted around. This is not true for shift, you just move up the
    pointer. (Ruby Arrays are copy-on-write.)
    No, I don't notice it at all. Strings aren't Arrays and Arrays
    aren't Strings. They only look the same if you squint really hard.
     
    Eric Hodel, Jun 30, 2005
    #8
  9. Eric Mahurin

    Eric Mahurin Guest

    If you wanted to delete 5 elements from the beginning of an
    array (a big one), you would think that your best bet would be
    arr.slice!(0,5), right? For performance, 5 arr.shift is much
    faster - 1 O(N) ops vs. 5 O(N).

    There is no reason why slice! couldn't make the shift
    optimization when dealing with elements at the beginning of the
    array.

    I also noticed that if you do bunch of shifts (which seem to be
    O(1)), the same number of unshifts are also fast (O(1)). But,
    when you get back to where you started, the unshifts become
    O(N). It is unfortunate that the implementation doesn't try to
    stretch the array to the left just like it does to the right to
    make all unshifts O(1). And then of course all the equivalents
    should make the same optimization.
    I'm writing some general classes that equally apply to Strings
    and Arrays. Unfortunately, I can't take advantage of
    Array#shift because String doesn't have it. I use the least
    common denominators: []/slice, []=3D, slice!, <<, and
    size/length. If they had more in common I'd like to use those.



    =09
    __________________________________=20
    Yahoo! Mail=20
    Stay connected, organized, and protected. Take the tour:=20
    http://tour.mail.yahoo.com/mailtour.html=20
     
    Eric Mahurin, Jun 30, 2005
    #9
  10. Eric Mahurin

    Eric Mahurin Guest

    --- Nikolai Weibull
    Sorry, I was referring to all single element operations done at
    the cursor location - those are all O(1). To move the cursor
    to a random location is O(n).
    Well, in the current Array implementation, push/pop are
    obviously O(1), but also shift is too (judging from the
    benchmarks). I'd imagine a shift is done by just moving the
    start of array pointer and not any data. unshifts are also
    O(1) as long as you don't unshift more than you shifted (then
    they become O(n)). If unshifts tried to stretch the memory
    allocation to the left or realloc the array in a new space like
    pushes did, you could make them always O(1) (discounting
    ocassional allocation time). Then the above scheme would work
    fine for both a gap buffer and easily a circular buffer.
    Thats what I was trying to describe below with the circular
    buffers. It is like the conventional gap buffer, but the gap
    can wraparound (can be in the middle or the ends).
    That will be one of the implementations in Cursor::Circular
    eventually. Along with others.
    My application started with a general parser. I wanted a good
    cursor API for the character stream and the token stream. I've
    gone overboard on this cursor stuff.
    Thanks. This thing I'm doing is STL on a lot of steriods. I
    think it more generally gives a full API into any sequential
    data structure you can think of - array, string, IO, buffered
    IO, linked lists (double and single linked), gap buffer, linked
    lists of fixed size blocks, various circular buffers, a
    reversal of an of these, a concatenation of any of these, etc.=20
    A nice feature that is missing from many of these external
    iterator API's that mine has is ability to save/restore the
    position of a cursor (the thing holding the position is in fact
    another cursor).



    =09
    ____________________________________________________=20
    Yahoo! Sports=20
    Rekindle the Rivalries. Sign up for Fantasy Football=20
    http://football.fantasysports.yahoo.com
     
    Eric Mahurin, Jun 30, 2005
    #10
  11. Which is exactly what Jason was complaining about.
    You *shouldn't* have to squint to make them look alike.
     
    Daniel Brockman, Jul 1, 2005
    #11
  12. Eric Mahurin

    Eric Hodel Guest

    I have never needed them to look alike much beyond << and empty?.
    The differences between the two are what I appreciate about the
    differences. I want String to be good at being a String, and Array
    to be good at being a collection. Interface unity takes back seat.
    (If I have to work with individual characters, split '' has been more
    than adequate.)

    How have you needed them to look alike?

    In any event, the gap is hardly insurmountable:

    class String
    def shift
    return nil if empty?
    chr = self[0].chr
    replace self[1..-1]
    return chr
    end
    end
     
    Eric Hodel, Jul 1, 2005
    #12
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.