Benchmark segfault [Was: Array#inject to create a hash versus Hash[*array.collect{}.flatten] ]

Discussion in 'Ruby' started by Michal Suchanek, Jun 12, 2007.

  1. On 11/06/07, Robert Klemme <> wrote:
    > On 11.06.2007 07:41, Anthony Martinez wrote:
    > > So, I was tinkering with ways to build a hash out of transforming an
    > > array, knowing the standard/idiomatic
    > >
    > > id_list = [:test1, :test2, :test3]
    > > id_list.inject({}) { |a,e| a[e]=e.object_id ; a }
    > >
    > > I also decided to try something like this:
    > >
    > > Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten]
    > >
    > > and further (attempt to) optimize it via
    > > Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!]
    > > and
    > > Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!]
    > >
    > > Running this via Benchmark#bmbm gives pretty interesting, and to me,
    > > unexpected, results (on a 3.2 GHz P4, 1GB of RAM, FC5 with ruby 1.8.4)
    > >
    > > require 'benchmark'
    > > id_list = (1..1_000_000).to_a
    > > Benchmark::bmbm do |x|
    > > x.report("inject") { id_list.inject({}) { |a,e| a[e] = e.object_id ; a} }
    > > x.report("non-bang") { Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten] }
    > > x.report("bang") { Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!] }
    > > x.report("two-bang") { Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!] }
    > > end
    > >
    > > Rehearsal --------------------------------------------
    > > inject 16.083333 0.033333 16.116667 ( 9.670747)
    > > non-bang 1657.050000 1.800000 1658.850000 (995.425642)
    > > bang 1593.716667 0.016667 1593.733333 (956.334565)
    > > two-bang 1604.816667 1.350000 1606.166667 (963.803356)
    > > -------------------------------- total: 4874.866667sec
    > >
    > > user system total real
    > > inject 5.183333 0.000000 5.183333 ( 3.102379)
    > > non-bang zsh: segmentation fault ruby
    > >
    > > Ow?
    > >
    > > Also, I just thought of a similar way to accomplish the same thing:
    > >
    > > x.report("zip") { Hash[ *id_list.zip(id_list.collect {|e| e.object_id})] }
    > >
    > > Array#collect! won't work right with this, of course, but it seems to
    > > have equally-bad performance. Is Array#inject just optimized for this,
    > > or something?

    >
    > The reason why you are seeing this (performance as well as timing) is
    > most likely caused by the different approach. When you use #inject you
    > just create one copy of the Array (the Hash). When you use #collect you
    > create at least one additional copy of the large array plus a ton of two
    > element arrays. That's way less efficient considering memory usage and
    > GC. You'll probably see much different results if your input array is
    > much shorter (try with 10 or 100 elements).
    >


    I also get a segfault (with 1.8.5 on Gentoo):

    ruby -w test.rb
    Rehearsal --------------------------------------------
    inject 7.210000 0.870000 8.080000 ( 13.011414)
    non-bang 4620.790000 179.730000 4800.520000 (6029.976497)
    bang 4586.140000 200.530000 4786.670000 (5970.267190)
    two-bang 4599.560000 268.080000 4867.640000 (6035.687979)
    ------------------------------- total: 14462.910000sec

    user system total real
    inject 10.740000 1.990000 12.730000 ( 15.500874)
    non-bang Segmentation fault
    hramrach@hp-tc2110:11(0) 06120053 16]~ $ ruby -v
    ruby 1.8.5 (2006-12-04 patchlevel 2) [i686-linux]

    Of course, it would be better to test with 1.8.6 but it takes quite
    some time to retest ;-)

    Thanks

    Michal
    Michal Suchanek, Jun 12, 2007
    #1
    1. Advertising

  2. Re: Benchmark segfault [Was: Array#inject to create a hash versusHash[*array.collect{}.flatten] ]

    On 12.06.2007 11:52, Michal Suchanek wrote:
    > On 11/06/07, Robert Klemme <> wrote:
    >> On 11.06.2007 07:41, Anthony Martinez wrote:
    >> > So, I was tinkering with ways to build a hash out of transforming an
    >> > array, knowing the standard/idiomatic
    >> >
    >> > id_list = [:test1, :test2, :test3]
    >> > id_list.inject({}) { |a,e| a[e]=e.object_id ; a }
    >> >
    >> > I also decided to try something like this:
    >> >
    >> > Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten]
    >> >
    >> > and further (attempt to) optimize it via
    >> > Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!]
    >> > and
    >> > Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!]
    >> >
    >> > Running this via Benchmark#bmbm gives pretty interesting, and to me,
    >> > unexpected, results (on a 3.2 GHz P4, 1GB of RAM, FC5 with ruby 1.8.4)
    >> >
    >> > require 'benchmark'
    >> > id_list = (1..1_000_000).to_a
    >> > Benchmark::bmbm do |x|
    >> > x.report("inject") { id_list.inject({}) { |a,e| a[e] =

    >> e.object_id ; a} }
    >> > x.report("non-bang") { Hash[ *id_list.collect { |e|

    >> [e,e.object_id]}.flatten] }
    >> > x.report("bang") { Hash[ *id_list.collect { |e|

    >> [e,e.object_id]}.flatten!] }
    >> > x.report("two-bang") { Hash[ *id_list.collect! { |e|

    >> [e,e.object_id]}.flatten!] }
    >> > end
    >> >
    >> > Rehearsal --------------------------------------------
    >> > inject 16.083333 0.033333 16.116667 ( 9.670747)
    >> > non-bang 1657.050000 1.800000 1658.850000 (995.425642)
    >> > bang 1593.716667 0.016667 1593.733333 (956.334565)
    >> > two-bang 1604.816667 1.350000 1606.166667 (963.803356)
    >> > -------------------------------- total: 4874.866667sec
    >> >
    >> > user system total real
    >> > inject 5.183333 0.000000 5.183333 ( 3.102379)
    >> > non-bang zsh: segmentation fault ruby
    >> >
    >> > Ow?
    >> >
    >> > Also, I just thought of a similar way to accomplish the same thing:
    >> >
    >> > x.report("zip") { Hash[ *id_list.zip(id_list.collect {|e|

    >> e.object_id})] }
    >> >
    >> > Array#collect! won't work right with this, of course, but it seems to
    >> > have equally-bad performance. Is Array#inject just optimized for this,
    >> > or something?

    >>
    >> The reason why you are seeing this (performance as well as timing) is
    >> most likely caused by the different approach. When you use #inject you
    >> just create one copy of the Array (the Hash). When you use #collect you
    >> create at least one additional copy of the large array plus a ton of two
    >> element arrays. That's way less efficient considering memory usage and
    >> GC. You'll probably see much different results if your input array is
    >> much shorter (try with 10 or 100 elements).
    >>

    >
    > I also get a segfault (with 1.8.5 on Gentoo):
    >
    > ruby -w test.rb
    > Rehearsal --------------------------------------------
    > inject 7.210000 0.870000 8.080000 ( 13.011414)
    > non-bang 4620.790000 179.730000 4800.520000 (6029.976497)
    > bang 4586.140000 200.530000 4786.670000 (5970.267190)
    > two-bang 4599.560000 268.080000 4867.640000 (6035.687979)
    > ------------------------------- total: 14462.910000sec
    >
    > user system total real
    > inject 10.740000 1.990000 12.730000 ( 15.500874)
    > non-bang Segmentation fault
    > hramrach@hp-tc2110:11(0) 06120053 16]~ $ ruby -v
    > ruby 1.8.5 (2006-12-04 patchlevel 2) [i686-linux]
    >
    > Of course, it would be better to test with 1.8.6 but it takes quite
    > some time to retest ;-)
    >
    > Thanks
    >
    > Michal


    I believe it's the star operator:

    13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
    13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
    -e:1: [BUG] Segmentation fault
    ruby 1.8.6 (2007-03-13) [i386-cygwin]

    Aborted (core dumped)
    13:10:39 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
    1000000
    13:11:09 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
    10000000
    -e:1: [BUG] Segmentation fault
    ruby 1.8.6 (2007-03-13) [i386-cygwin]

    Aborted (core dumped)
    13:11:14 [Temp]: uname -a
    CYGWIN_NT-5.2-WOW64 PDBXPWSRK38 1.5.24(0.156/4/2) 2007-01-31 10:57 i686
    Cygwin
    13:11:36 [Temp]: ruby --version
    ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]

    Kind regards

    robert
    Robert Klemme, Jun 12, 2007
    #2
    1. Advertising

  3. Re: Benchmark segfault [Was: Array#inject to create a hash versusHash[*array.collect{}.flatten] ]

    Robert Klemme wrote:
    > I believe it's the star operator:
    >
    > 13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
    > 13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
    > -e:1: [BUG] Segmentation fault
    > ruby 1.8.6 (2007-03-13) [i386-cygwin]
    >
    > Aborted (core dumped)
    > 13:10:39 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
    > 1000000
    > 13:11:09 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
    > 10000000
    > -e:1: [BUG] Segmentation fault
    > ruby 1.8.6 (2007-03-13) [i386-cygwin]
    >
    > Aborted (core dumped)
    > 13:11:14 [Temp]: uname -a
    > CYGWIN_NT-5.2-WOW64 PDBXPWSRK38 1.5.24(0.156/4/2) 2007-01-31 10:57 i686
    > Cygwin
    > 13:11:36 [Temp]: ruby --version
    > ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]


    The reason for this problem is, that the stack is used to pass A LOT of
    arguments to the Hash.[] method. If the stack size of the executed process isn't
    big enough (check your resource limits), this can lead to crashes.

    --
    Florian Frank
    Florian Frank, Jun 12, 2007
    #3
  4. Re: Benchmark segfault [Was: Array#inject to create a hash versusHash[*array.collect{}.flatten] ]

    On 12.06.2007 16:16, Florian Frank wrote:
    > Robert Klemme wrote:
    >> I believe it's the star operator:
    >>
    >> 13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
    >> 13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
    >> -e:1: [BUG] Segmentation fault
    >> ruby 1.8.6 (2007-03-13) [i386-cygwin]
    >>
    >> Aborted (core dumped)
    >> 13:10:39 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
    >> 1000000
    >> 13:11:09 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
    >> 10000000
    >> -e:1: [BUG] Segmentation fault
    >> ruby 1.8.6 (2007-03-13) [i386-cygwin]
    >>
    >> Aborted (core dumped)
    >> 13:11:14 [Temp]: uname -a
    >> CYGWIN_NT-5.2-WOW64 PDBXPWSRK38 1.5.24(0.156/4/2) 2007-01-31 10:57 i686
    >> Cygwin
    >> 13:11:36 [Temp]: ruby --version
    >> ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]

    >
    > The reason for this problem is, that the stack is used to pass A LOT of
    > arguments to the Hash.[] method. If the stack size of the executed process isn't
    > big enough (check your resource limits), this can lead to crashes.


    That was what I was guessing. However, I haven't enough insight to
    verify. There is at least one point that causes me doubt: the array can
    be handled and stored like any other array, so it should be on the heap
    (or it is implicitly moved there).

    Kind regards

    robert
    Robert Klemme, Jun 12, 2007
    #4
  5. On Tue, Jun 12, 2007 at 11:25:04PM +0900, Robert Klemme wrote:
    > On 12.06.2007 16:16, Florian Frank wrote:
    > >Robert Klemme wrote:
    > >>I believe it's the star operator:
    > >>
    > >>13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
    > >>13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
    > >>-e:1: [BUG] Segmentation fault
    > >>ruby 1.8.6 (2007-03-13) [i386-cygwin]

    [...]
    > >The reason for this problem is, that the stack is used to pass A LOT of
    > >arguments to the Hash.[] method. If the stack size of the executed process
    > >isn't
    > >big enough (check your resource limits), this can lead to crashes.

    >
    > That was what I was guessing. However, I haven't enough insight to
    > verify. There is at least one point that causes me doubt: the array can
    > be handled and stored like any other array, so it should be on the heap
    > (or it is implicitly moved there).


    $ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 2091000
    -e:1:in `[]': stack level too deep (SystemStackError)
    from -e:1
    $ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 2092000
    Segmentation fault
    $ ulimit -s
    8192
    $ ulimit -s 16384
    $ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 2092000
    $ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 4184000
    -e:1:in `[]': stack level too deep (SystemStackError)
    from -e:1
    $ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 4190000
    Segmentation fault

    --
    Mauricio Fernandez - http://eigenclass.org - singular Ruby
    Mauricio Fernandez, Jun 12, 2007
    #5
  6. Re: Benchmark segfault [Was: Array#inject to create a hash versusHash[*array.collect{}.flatten] ]

    Robert Klemme wrote:
    > That was what I was guessing. However, I haven't enough insight to
    > verify. There is at least one point that causes me doubt: the array can
    > be handled and stored like any other array, so it should be on the heap
    > (or it is implicitly moved there).


    Yes, the implementation of this is in eval.c's rb_call0, after that the
    arguments are passed along via the argc/argv function arguments. Ruby also tries
    to throw a SystemStackError in rb_call0 like Mauricio has shown, but this isn't
    guaranteed to work in all cases.

    --
    Florian Frank
    Florian Frank, Jun 12, 2007
    #6
  7. Re: Benchmark segfault [Was: Array#inject to create a hash versus Hash[*array.collect{}.flatten] ]

    Hi,

    At Wed, 13 Jun 2007 00:50:30 +0900,
    Florian Frank wrote in [ruby-talk:255348]:
    > Yes, the implementation of this is in eval.c's rb_call0, after that the
    > arguments are passed along via the argc/argv function arguments. Ruby also tries
    > to throw a SystemStackError in rb_call0 like Mauricio has shown, but this isn't
    > guaranteed to work in all cases.


    Actually, the definition of TMP_ALLOC. Try replacing #ifdef
    C_ALLOCA with #if 1.

    --
    Nobu Nakada
    Nobuyoshi Nakada, Jun 13, 2007
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Francis Avila
    Replies:
    0
    Views:
    452
    Francis Avila
    Nov 2, 2003
  2. Replies:
    2
    Views:
    377
    Steve Pope
    Sep 27, 2006
  3. Peña, Botp

    inject does not inject last value

    Peña, Botp, Aug 7, 2006, in forum: Ruby
    Replies:
    4
    Views:
    169
    Peña, Botp
    Aug 7, 2006
  4. Anthony Martinez
    Replies:
    4
    Views:
    258
    Robert Klemme
    Jun 11, 2007
  5. Paul Butcher
    Replies:
    12
    Views:
    687
    Gary Wright
    Nov 28, 2007
Loading...

Share This Page