String += vs <<

Discussion in 'Ruby' started by Joshua Ball, Jun 17, 2009.

  1. Joshua Ball

    Joshua Ball Guest

    A friend recently sent me this article:
    http://blog.metasploit.com/2009/03/blog-post.html

    In particular, note the perf difference of += vs <<:

    > framework3 $ time ruby -e 'a = "A"; 100000.times { a << "A" }'
    >
    > real 0m0.338s
    > user 0m0.312s
    > sys 0m0.024s
    >
    > framework3 $ time ruby -e 'a = "A"; 100000.times { a += "A" }'
    >
    > real 0m15.462s
    > user 0m15.321s
    > sys 0m0.068s



    Also note:

    > *Before you run off and change every instance of += to << in your ruby code*,
    > it's important to note that the two don't perform the same operation.
    > Because ruby does assignment by reference, the latter overwrites any
    > variables that point to the one you're operating on while the former leaves
    > any references untouched.
    >
    > framework3 $ irb
    > >> a = "A"
    > => "A"
    > >> b = a
    > => "A"
    > >> a << "B"
    > => "AB"
    > >> b
    > => "AB"
    > >> c = "C"
    > => "C"
    > >> d = c
    > => "C"
    > >> c += "D"
    > => "CD"
    > >> d
    > => "C"
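
    The irb session above can also be checked as a script. This is only a
    minimal sketch: the method names shovel_demo and plus_equals_demo are
    made up for illustration, and String.new is used (an assumption, not
    part of the article) so the example also runs where string literals
    are frozen. equal? compares object identity, not contents.

```ruby
# << appends to the receiver in place, so every variable that references
# the same String object sees the change.
def shovel_demo
  a = String.new("A")
  b = a                  # b and a reference the same object
  a << "B"               # in-place append
  [a, b, a.equal?(b)]    # equal? is true: still one object
end

# a += x is shorthand for a = a + x: a new String is built and only
# the variable on the left is rebound to it.
def plus_equals_demo
  c = String.new("C")
  d = c                  # d and c reference the same object
  c += "D"               # c now points at a new String
  [c, d, c.equal?(d)]    # equal? is false: d still holds the old object
end

p shovel_demo
p plus_equals_demo
```

    So replacing += with << is only safe where no other reference is
    expected to keep the old value.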




    Thought I would pass it along...
     
    Joshua Ball, Jun 17, 2009
    #1

  2. Joshua Ball

    pat eyler Guest

    that's a nice article about some real-world benchmarking. I wish
    more people did things like this.

    If you'd like a short tutorial, you can look here:
    http://on-ruby.blogspot.com/2008/12/benchmarking-makes-it-better.html

    --
    thanks,
    -pate
    -------------------------
    Don't judge those who choose to sin differently than you do

    http://on-ruby.blogspot.com
    http://eldersjournal.blogspot.com
     
    pat eyler, Jun 17, 2009
    #2

  3. Joshua Ball

    Ftf 3k3 Guest

    Ftf 3k3, Jun 17, 2009
    #3
  4. Joshua Ball

    Robert Dober Guest

    On Wed, Jun 17, 2009 at 7:29 PM, pat eyler<> wrote:
    > that's a nice article about some real-world benchmarking. I wish
    > more people did things like this.

    If you search the archives you might find a certain Robert preaching
    never to use a += b where sequences are concerned. Do I feel clever
    now? No, rather stupid.

    Apologies for the lengthy code snippets.

    Although I fully acknowledge the value of the post and that it might
    be a life saver, I would like to add that I pretty much have the
    feeling that immutable is preferable to mutable.
    And it seems that modern VMs (jruby, 1.9, ???) are kind of written
    for that programming style. I am also aware that they make micro
    benchmarks like the following even less meaningful, but please
    consider it just as a Whack On The Head (nonviolently of course).

    ---------------------------------------------------------
    512/19 > cat strings.rb

    N = 10_000
    b = "Wassitmean"
    require 'benchmark'
    Benchmark.bmbm do | bench |
      a = "Ruby Rules Re Rowld"
      bench.report "+=" do
        N.times do
          a += b
        end
      end
      a = "Ruby Rules Re Rowld"
      bench.report "<<" do
        N.times do
          a += b
        end
      end
    end

    513/20 > jruby -v strings.rb
    jruby 1.3.0 (ruby 1.8.6p287) (2009-06-06 6586) (OpenJDK Client VM
    1.6.0_0) [i386-java]
    Rehearsal --------------------------------------
    +=   1.256000   0.000000   1.256000 (  1.191000)
    <<   9.384000   0.000000   9.384000 (  9.384000)
    ---------------------------- total: 10.640000sec

             user     system      total        real
    +=  23.397000   0.000000  23.397000 ( 23.397000)
    <<  52.953000   0.000000  52.953000 ( 52.953000)

    ruby 1.9.1p129 (2009-05-12 revision 23412) [i686-linux]
    Rehearsal --------------------------------------
    +=   0.360000   0.020000   0.380000 (  0.406038)
    <<   1.040000   0.130000   1.170000 (  1.209839)
    ----------------------------- total: 1.550000sec

             user     system      total        real
    +=   1.770000   0.230000   2.000000 (  2.056577)
    <<   2.410000   0.240000   2.650000 (  3.456429)


    I believe that I hit the GC in JRuby with the default settings, and the
    above might be an indication of how well short-lived object allocation
    performs nowadays. Ruby1.9 has enough memory on my machine to be that
    fast but still += is faster than <<.

    Cheers
    Robert


    --
    Toutes les grandes personnes ont d’abord été des enfants, mais peu
    d’entre elles s’en souviennent.

    All adults have been children first, but not many remember.

    [Antoine de Saint-Exupéry]
     
    Robert Dober, Jun 18, 2009
    #4
  5. Joshua Ball

    Robert Dober Guest

    On Thu, Jun 18, 2009 at 10:07 AM, Robert Dober<> wrote:

    Very interesting benchmarks indeed ARRRGH
    Interesting how you can make happen what you want to happen, here are
    the correct results
    516/23 > ruby -v strings.rb
    ruby 1.9.1p129 (2009-05-12 revision 23412) [i686-linux]
    Rehearsal --------------------------------------
    +=   0.370000   0.010000   0.380000 (  0.459725)
    <<   0.000000   0.000000   0.000000 (  0.002819)
    ----------------------------- total: 0.380000sec

             user     system      total        real
    +=   1.800000   0.230000   2.030000 (  2.145655)
    <<   0.010000   0.000000   0.010000 (  0.003220)

    518/25 > jruby -v strings.rb
    jruby 1.3.0 (ruby 1.8.6p287) (2009-06-06 6586) (OpenJDK Client VM
    1.6.0_0) [i386-java]
    Rehearsal --------------------------------------
    +=   1.350000   0.000000   1.350000 (  1.283000)
    <<   0.023000   0.000000   0.023000 (  0.023000)
    ----------------------------- total: 1.373000sec

             user     system      total        real
    +=  25.738000   0.000000  25.738000 ( 25.739000)
    <<   0.004000   0.000000   0.004000 (  0.004000)

    No happy surprises here, and BTW if you are bored, stop reading my posts :(

    Apologies
    Robert

     
    Robert Dober, Jun 18, 2009
    #5
  6. Joshua Ball

    Marc Heiler Guest

    > Ruby1.9 has enough memory on my machine to be that fast
    > but still += is faster than <<.


    How should that be possible when += creates a new object
    whereas << does not?
    --
    Posted via http://www.ruby-forum.com/.
     
    Marc Heiler, Jun 18, 2009
    #6
  7. Joshua Ball

    Robert Dober Guest

    On Thu, Jun 18, 2009 at 10:52 AM, Marc Heiler<> wrote:
    >> Ruby1.9 has enough memory on my machine to be that fast
    >> but still += is faster than <<.
    >
    > How should that be possible when += creates a new object
    > whereas << does not?

    Sorry please see my post above, I completely got lost.
    But be careful, it could be possible indeed, object allocation in the
    short living object pool could be way cheaper than copying into the
    long living object pool. But my benchmark was ridiculous I should have
    spotted the mistake.

    Robert




     
    Robert Dober, Jun 18, 2009
    #7
  8. Joshua Ball

    Marc Heiler Guest

    > But be careful, it could be possible indeed, object allocation in the
    > short living object pool could be way cheaper than copying into the
    > long living object pool.
    > But my benchmark was ridiculous I should have spotted the mistake.


    But this statement is in conflict with what the pickaxe said years
    ago about this - as far as I do remember it was said that << is
    faster than +=

    Now you say that this is perhaps not the case.

    All I would like to know is whether this is the case here:

    Is using += faster than << for string objects?

    And if it is not, how is your statement about "short living
    objects" vs "long living objects" meant to be understood?

    Perhaps I am a bit slow, but reading the posts above I got
    the impression that you claimed that += is faster than <<
     
    Marc Heiler, Jun 18, 2009
    #8
  9. Joshua Ball

    Robert Dober Guest

    On Thu, Jun 18, 2009 at 7:06 PM, Marc Heiler<> wrote:
    Sorry for the confusion

    << is much faster than += in JRuby and YARV

    All the rest was speculation which was not worth the bandwidth I have
    wasted, really bad, sorry.

    I will stop speculating about what wonders generational GC might come
    up with some day, because I am kind of confusing lots of folks, myself
    being my first victim :(.

    Cheers
    Robert
     
    Robert Dober, Jun 18, 2009
    #9
  10. Joshua Ball

    Todd Benson Guest


    One should only apologize if one did something _truly_ wrong :)

    I have a problem with current benchmarks, because I think the data
    on all the contributing factors is limited. For example, take 500
    otherwise identical CPUs with one tiny difference: different
    firmware, different network or graphics cards, etc. Even if you
    build with the same options, results can differ. Point being, the
    article pat showed us is enlightening, but not all-encompassing.

    I tend to think many people forget that little last bit.

    Todd
     
    Todd Benson, Jun 18, 2009
    #10
  11. Classical Copy&Paste bug ;-)
    Lines 8 and 14 both read "a += b" in strings.rb.
    However line 14 should read "a << b".


    --
    Alexandre



     
    Alexandre Hausen, Jun 19, 2009
    #11
  12. On Thu, Jun 18, 2009 at 3:08 AM, Robert Dober<> wrote:
    > ---------------------------------------------------------
    > 512/19 > cat strings.rb
    >
    > N = 10_000
    > b = "Wassitmean"
    > require 'benchmark'
    > Benchmark.bmbm do | bench |
    >   a = "Ruby Rules Re Rowld"
    >   bench.report "+=" do
    >     N.times do
    >       a += b
    >     end
    >   end
    >   a = "Ruby Rules Re Rowld"
    >   bench.report "<<" do
    >     N.times do
    >       a += b
    >     end
    >   end
    > end


    Someone else noted the += in the << section, but there's another
    issue: the "a" string is initialized only *once* for both rehearsal
    and actual runs, since the body of the bmbm block is only executed
    once to prepare the reports. If you modify it to put the a
    initialization into the report blocks, it behaves more like you'd
    expect. Here's a run with JRuby, with the bmbm above, "a" init fix,
    "<<" fix, and 5 iterations (only last iteration shown):

    Rehearsal --------------------------------------
    +=   0.343000   0.000000   0.343000 (  0.343000)
    <<   0.001000   0.000000   0.001000 (  0.001000)
    ----------------------------- total: 0.344000sec

             user     system      total        real
    +=   0.343000   0.000000   0.343000 (  0.343000)
    <<   0.001000   0.000000   0.001000 (  0.001000)

    Here's JRuby all interpreted (no JIT compilation to bytecode):

    Rehearsal --------------------------------------
    +=   0.345000   0.000000   0.345000 (  0.345000)
    <<   0.002000   0.000000   0.002000 (  0.002000)
    ----------------------------- total: 0.347000sec

             user     system      total        real
    +=   0.356000   0.000000   0.356000 (  0.356000)
    <<   0.002000   0.000000   0.002000 (  0.002000)

    The numbers are basically the same because this bench is almost
    completely limited by object allocation/GC and to a lesser extent
    String performance for the two operations. But obviously << is faster
    because it's growing the backing buffer for a single String rather
    than creating a new one each time and copying the contents of the
    previous string.
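
    To put a rough number on that difference, here is a back-of-the-envelope
    sketch. It is a simplified cost model, not a measurement: it assumes +=
    copies the whole existing string plus the suffix on every iteration,
    that << copies only the suffix, and it ignores buffer regrowth and GC.
    The method names are invented for this illustration; the lengths 19 and
    10 are those of "Ruby Rules Re Rowld" and "Wassitmean" from strings.rb.

```ruby
# Bytes copied by repeated `a += suffix` under the simplified model:
# each iteration builds a fresh string of the new total length.
def bytes_copied_plus_equals(initial_len, suffix_len, iterations)
  total = 0
  len = initial_len
  iterations.times do
    len += suffix_len
    total += len        # old contents plus suffix go into a new string
  end
  total
end

# Bytes copied by repeated `a << suffix` under the same model:
# only the suffix is copied each time (buffer regrowth is ignored).
def bytes_copied_shovel(suffix_len, iterations)
  suffix_len * iterations
end

n = 10_000
puts "+= copies about #{bytes_copied_plus_equals(19, 10, n)} bytes"
puts "<< copies about #{bytes_copied_shovel(10, n)} bytes"
```

    The += total grows with the square of the iteration count while the <<
    total grows linearly, which is consistent with the gap in the timings.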

    Here's the same in Ruby 1.9:

    Rehearsal --------------------------------------
    +=   0.260000   0.510000   0.770000 (  0.766618)
    <<   0.000000   0.000000   0.000000 (  0.002294)
    ----------------------------- total: 0.770000sec

             user     system      total        real
    +=   0.250000   0.510000   0.760000 (  0.771757)
    <<   0.000000   0.000000   0.000000 (  0.002235)

    This was JRuby 1.4.0dev on current Apple Java 6.

    > 513/20 > jruby -v strings.rb
    > jruby 1.3.0 (ruby 1.8.6p287) (2009-06-06 6586) (OpenJDK Client VM
    > 1.6.0_0) [i386-java]
    > Rehearsal --------------------------------------
    > +=   1.256000   0.000000   1.256000 (  1.191000)
    > <<   9.384000   0.000000   9.384000 (  9.384000)
    > ---------------------------- total: 10.640000sec
    >
    >          user     system      total        real
    > +=  23.397000   0.000000  23.397000 ( 23.397000)
    > <<  52.953000   0.000000  52.953000 ( 52.953000)


    Server would perform a lot better here, but I suspect the fact that
    the "a" string was never re-initialized and just kept getting bigger
    was the main reason for this peculiar result.

    > ruby 1.9.1p129 (2009-05-12 revision 23412) [i686-linux]
    > Rehearsal --------------------------------------
    > +=   0.360000   0.020000   0.380000 (  0.406038)
    > <<   1.040000   0.130000   1.170000 (  1.209839)
    > ----------------------------- total: 1.550000sec
    >
    >          user     system      total        real
    > +=   1.770000   0.230000   2.000000 (  2.056577)
    > <<   2.410000   0.240000   2.650000 (  3.456429)


    I'm not sure why Ruby 1.9 did better here, but it could be that we
    grow strings at different rates and so our strings get larger faster.
    At any rate, in the fixed benchmark things look a lot better.
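
    For reference, a sketch of what a benchmark with both fixes applied
    might look like. This is a reconstruction under assumptions, not the
    exact script behind the numbers above: the helper names
    build_with_plus_equals and build_with_shovel and the constant SUFFIX
    are invented here, and String.new is used so the code also runs where
    string literals are frozen. The point is that each report block starts
    from a fresh string and the "<<" report really uses <<.

```ruby
require 'benchmark'

N = 10_000
SUFFIX = "Wassitmean"

# Starts from a fresh string on every call, then appends with +=,
# which builds a new String on each iteration.
def build_with_plus_equals(n, suffix)
  a = String.new("Ruby Rules Re Rowld")
  n.times { a += suffix }
  a
end

# Starts from a fresh string on every call, then appends in place with <<.
def build_with_shovel(n, suffix)
  a = String.new("Ruby Rules Re Rowld")
  n.times { a << suffix }
  a
end

# bmbm runs each report twice (rehearsal, then the measured run); because
# the string is created inside the helpers, both runs do the same work.
Benchmark.bmbm do |bench|
  bench.report("+=") { build_with_plus_equals(N, SUFFIX) }
  bench.report("<<") { build_with_shovel(N, SUFFIX) }
end
```

    Both helpers produce the same final string, so the timings compare
    equal amounts of useful work.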

    - Charlie
     
    Charles Oliver Nutter, Jul 3, 2009
    #12