String#chop slow? REALLY slow?

Discussion in 'Ruby' started by Mat Schaffer, Jul 27, 2006.

  1. Mat Schaffer

    Mat Schaffer Guest

    I just did a quick benchmark to prove something to myself. But I'd
    like to get a sanity check from the people on the list.

    Basically I want to drop what will be a trailing "\n" from input.
    But it appears that using String#[] and if statements is nearly 200
    times more efficient than chop. Which just seems really weird, so
    here's the benchmark. Maybe I'm doing something wrong.

    Does this seem right? Anyone care to comment?

    ---- index_vs_chop.rb

    require 'benchmark'

    n = 100_000
    bigstring = "I am a big string " * 5_000

    Benchmark.bmbm do |bench|
      bench.report("Indexing") {
        n.times do
          bigstring[0..-1]
        end
      }

      bench.report("Chop") {
        n.times do
          bigstring.chop
        end
      }
    end

    ---- end index_vs_chop.rb

    output:

    Rehearsal --------------------------------------------
    Indexing 0.100000 0.000000 0.100000 ( 0.102362)
    Chop 7.190000 13.890000 21.080000 ( 22.477807)
    ---------------------------------- total: 21.180000sec

    user system total real
    Indexing 0.100000 0.000000 0.100000 ( 0.108777)
    Chop 7.290000 14.050000 21.340000 ( 22.755782)
    Mat Schaffer, Jul 27, 2006
    #1

  2. Mat Schaffer

    Guest

    On Fri, 28 Jul 2006, Mat Schaffer wrote:

    > I just did a quick benchmark to prove something to myself. But I'd like to
    > get a sanity check from the people on the list.
    >
    > Basically I want to drop what will be a trailing "\n" from input. But it
    > appears that using String#[] and if statements is nearly 200 times more
    > efficient than chop. Which just seems really weird, so here's the
    > benchmark. Maybe I'm doing something wrong.
    >
    > Does this seem right? Anyone care to comment?
    >
    > ---- index_vs_chop.rb
    >
    > require 'benchmark'
    >
    > n = 100_000
    > bigstring = "I am a big string " * 5_000
    >
    > Benchmark.bmbm do |bench|
    > bench.report("Indexing") {
    > n.times do
    > bigstring[0..-1]
    > end
    > }
    >
    > bench.report("Chop") {
    > n.times do
    > bigstring.chop
    > end
    > }
    > end
    >
    > ---- end index_vs_chop.rb
    >
    > output:
    >
    > Rehearsal --------------------------------------------
    > Indexing 0.100000 0.000000 0.100000 ( 0.102362)
    > Chop 7.190000 13.890000 21.080000 ( 22.477807)
    > ---------------------------------- total: 21.180000sec
    >
    > user system total real
    > Indexing 0.100000 0.000000 0.100000 ( 0.108777)
    > Chop 7.290000 14.050000 21.340000 ( 22.755782)


    on my node:

    harp:~ > ruby a.rb
    Rehearsal --------------------------------------------
    Indexing 0.150000 0.000000 0.150000 ( 0.145923)
    Chop 4.210000 16.200000 20.410000 ( 20.910127)
    Chop2 4.210000 0.220000 4.430000 ( 4.536517)
    ---------------------------------- total: 24.990000sec

    user system total real
    Indexing 0.140000 0.000000 0.140000 ( 0.142257)
    Chop 0.110000 0.000000 0.110000 ( 0.104612)
    Chop2 0.150000 0.000000 0.150000 ( 0.152083)


    harp:~ > cat a.rb
    require 'benchmark'

    n = 100_000
    bigstring = "I am a big string " * 5_000

    Benchmark.bmbm do |bench|
      bench.report("Indexing") {
        n.times do
          bigstring[0..-1]
        end
      }

      bench.report("Chop") {
        n.times do
          bigstring.chop
        end
      }

      bench.report("Chop2") {
        n.times do
          bigstring = bigstring[0..-2]
        end
      }
    end




    -a
    --
    we can never obtain peace in the outer world until we make peace with
    ourselves.
    - h.h. the 14th dalai lama
    , Jul 27, 2006
    #2

  3. Mat Schaffer

    ChrisH Guest

    Re: String#chop slow? REALLY slow?

    Mat Schaffer wrote:
    ....
    > output:
    >
    > Rehearsal --------------------------------------------
    > Indexing 0.100000 0.000000 0.100000 ( 0.102362)
    > Chop 7.190000 13.890000 21.080000 ( 22.477807)
    > ---------------------------------- total: 21.180000sec
    >
    > user system total real
    > Indexing 0.100000 0.000000 0.100000 ( 0.108777)
    > Chop 7.290000 14.050000 21.340000 ( 22.755782)


    You might want to use chop!:

    Rehearsal --------------------------------------------
    Indexing 0.843000 0.000000 0.843000 ( 0.844000)
    Chop! 0.235000 0.000000 0.235000 ( 0.234000)
    ----------------------------------- total: 1.078000sec

    user system total real
    Indexing 1.437000 0.015000 1.452000 ( 1.453000)
    Chop! 0.203000 0.000000 0.203000 ( 0.203000)

    cheers
    Chris
    ChrisH, Jul 27, 2006
    #3
  4. Mat Schaffer

    Mat Schaffer Guest

    On Jul 27, 2006, at 12:20 PM, wrote:
    > on my node:
    >
    > harp:~ > ruby a.rb
    > Rehearsal --------------------------------------------
    > Indexing 0.150000 0.000000 0.150000 ( 0.145923)
    > Chop 4.210000 16.200000 20.410000 ( 20.910127)
    > Chop2 4.210000 0.220000 4.430000 ( 4.536517)
    > ---------------------------------- total: 24.990000sec
    >
    > user system total real
    > Indexing 0.140000 0.000000 0.140000 ( 0.142257)
    > Chop 0.110000 0.000000 0.110000 ( 0.104612)
    > Chop2 0.150000 0.000000 0.150000 ( 0.152083)


    Now that's interesting. I wonder why the rehearsal and the real run
    are so different....
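
    One possible explanation, offered as a sketch rather than a confirmed diagnosis: the Chop2 block in the quoted script reassigns bigstring on every iteration, and all three blocks share that local variable. The string is 18 * 5_000 = 90_000 characters long and n is 100_000, so it would already be empty by the end of the rehearsal, and the real run would then be timing operations on an empty string. The scaled-down sizes and the lambda below are hypothetical and only illustrate the effect; no timing is involved.

```ruby
# Hypothetical, scaled-down version of the quoted script.
n = 50
bigstring = "I am a big string " * 2   # 36 characters, fewer than n

# Same body as the Chop2 report block: it rebinds the shared outer local.
shrink = lambda do
  n.times { bigstring = bigstring[0..-2] }
end

length_before = bigstring.length
shrink.call                        # stands in for the rehearsal pass
length_after = bigstring.length    # what a second ("real") pass would start from

puts "before: #{length_before}, after: #{length_after}"
```

    Under this reading, giving each report block its own copy of the string (or not reassigning it) would keep the rehearsal and the real run comparable.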
    Mat Schaffer, Jul 27, 2006
    #4
  5. Mat Schaffer

    Caio Chassot Guest

    > Basically I want to drop what will be a trailing "\n" from input.
    > But it appears that using String#[] and if statements is nearly 200
    > times more efficient than chop. Which just seems really weird, so
    > here's the benchmark. Maybe I'm doing something wrong.


    Well, if you implement chop fully, you get very similar results:

    RubyMate r4106 running Ruby v1.8.4 (/usr/local/bin/ruby)
    >>> untitled


    Rehearsal -------------------------------------------------
    Indexing 1.790000 3.950000 5.740000 ( 7.099300)
    Chop 1.680000 3.930000 5.610000 ( 7.135508)
    Indexing crlf 1.780000 3.970000 5.750000 ( 6.895291)
    Chop crlf 1.670000 3.930000 5.600000 ( 6.573193)
    --------------------------------------- total: 22.700000sec

    user system total real
    Indexing 1.780000 3.980000 5.760000 ( 7.033924)
    Chop 1.670000 3.970000 5.640000 ( 7.297766)
    Indexing crlf 1.790000 4.020000 5.810000 ( 8.969243)
    Chop crlf 1.680000 4.000000 5.680000 ( 7.480123)

    ---

    require 'benchmark'

    n = 10_000
    bigstring = "I am a big string " * 5_000

    Benchmark.bmbm do |bench|
      bench.report("Indexing") {
        n.times do
          bigstring[0..-2] == "\r\n" ? bigstring[0..-2] : bigstring[0..-1]
        end
      }

      bench.report("Chop") {
        n.times do
          bigstring.chop
        end
      }

      bigstring << "\r\n"

      bench.report("Indexing crlf") {
        n.times do
          bigstring[0..-2] == "\r\n" ? bigstring[0..-2] : bigstring[0..-1]
        end
      }

      bench.report("Chop crlf") {
        n.times do
          bigstring.chop
        end
      }
    end
    Caio Chassot, Jul 27, 2006
    #5
  6. Mat Schaffer wrote:
    > I just did a quick benchmark to prove something to myself. But I'd like
    > to get a sanity check from the people on the list.
    >
    > Basically I want to drop what will be a trailing "\n" from input. But
    > it appears that using String#[] and if statements is nearly 200 times
    > more efficient than chop. Which just seems really weird, so here's the
    > benchmark. Maybe I'm doing something wrong.
    >
    > Does this seem right? Anyone care to comment?


    <snip>

    As someone else pointed out, you'll probably want to use String#chop! for
    faster performance, since it uses the current object instead of creating
    a new one.

    Also note that str[0..-2] is not quite the same as str.chop when "\r\n" is
    involved:

    irb(main):001:0> str = "hello world\r\n"
    => "hello world\r\n"
    irb(main):002:0> str[0..-2]
    => "hello world\r"
    irb(main):003:0> str.chop
    => "hello world"

    I wouldn't think the extra work of checking for "\r\n" would add that much
    overhead, though.
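
    As a minimal sketch of the two points above (the sample strings and variable names are made up for illustration), the following compares chop with [0..-2] on three kinds of input and shows chop! changing the receiver in place:

```ruby
crlf  = "hello world\r\n"
lf    = "hello world\n"
plain = "hello world"

# chop removes the last character, or both characters of a trailing "\r\n".
chop_results = [crlf.chop, lf.chop, plain.chop]

# [0..-2] always drops exactly one character, so "\r" survives on CRLF input.
slice_results = [crlf[0..-2], lf[0..-2], plain[0..-2]]

# chop! modifies the receiver instead of returning a trimmed copy.
buffer = crlf.dup
buffer.chop!

p chop_results
p slice_results
p buffer
```

    Note that both forms also eat a real character when the input has no line terminator at all, which is worth keeping in mind for input that may or may not end in a newline.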

    Regards,

    Dan


    Daniel Berger, Jul 27, 2006
    #6
  7. Mat Schaffer

    Caio Chassot Guest

    On 2006-07-27, at 13:36 , Caio Chassot wrote:

    >> Basically I want to drop what will be a trailing "\n" from input.
    >> But it appears that using String#[] and if statements is nearly
    >> 200 times more efficient than chop. Which just seems really
    >> weird, so here's the benchmark. Maybe I'm doing something wrong.

    >
    > Well, if you implement chop fully, you get very similar results:


    Ah, but rangeless indexing yields much much better results:

    RubyMate r4106 running Ruby v1.8.4 (/usr/local/bin/ruby)
    >>> untitled


    Rehearsal -------------------------------------------------
    Indexing 0.110000 0.000000 0.110000 ( 0.151018)
    Chop 3.430000 7.920000 11.350000 ( 15.030196)
    Indexing crlf 0.110000 0.000000 0.110000 ( 0.128584)
    Chop crlf 3.430000 7.920000 11.350000 ( 14.815128)
    --------------------------------------- total: 22.920000sec

    user system total real
    Indexing 0.110000 0.000000 0.110000 ( 0.134087)
    Chop 3.430000 7.980000 11.410000 ( 14.305555)
    Indexing crlf 0.110000 0.000000 0.110000 ( 0.125122)
    Chop crlf 3.420000 7.990000 11.410000 ( 13.869411)

    ---

    require 'benchmark'

    n = 20_000
    bigstring = "I am a big string " * 5_000

    Benchmark.bmbm do |bench|
      bench.report("Indexing") {
        n.times do
          bigstring[-2,2] == "\r\n" ? bigstring[-2,2] : bigstring[-1,1]
        end
      }

      bench.report("Chop") {
        n.times do
          bigstring.chop
        end
      }

      bigstring << "\r\n"

      bench.report("Indexing crlf") {
        n.times do
          bigstring[-2,2] == "\r\n" ? bigstring[-2,2] : bigstring[-1,1]
        end
      }

      bench.report("Chop crlf") {
        n.times do
          bigstring.chop
        end
      }
    end
    Caio Chassot, Jul 27, 2006
    #7
  8. On Jul 27, 2006, at 11:12 AM, Mat Schaffer wrote:

    > Basically I want to drop what will be a trailing "\n" from input.


    String#chomp would probably be a better idea for this, but that's OT
    I suppose. Regardless, its performance is the same as chop, it seems.
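
    A small sketch of why chomp may fit the "drop a trailing newline" case better (the sample lines are invented for illustration): chomp only removes a line terminator, while chop removes the last character whatever it is.

```ruby
lines = ["with newline\n", "with crlf\r\n", "no terminator"]

# chomp strips a trailing "\n", "\r\n" or "\r" and leaves other text alone.
chomped = lines.map { |line| line.chomp }

# chop strips the final character (or a trailing "\r\n") unconditionally.
chopped = lines.map { |line| line.chop }

p chomped
p chopped
```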

    Here are my modifications:


    require 'benchmark'

    class String
      def my_chop
        self[0..-2]
      end
    end

    n = 100_000
    bigstring = "I am a big string " * 5_000

    Benchmark.bmbm do |bench|
      bench.report("Indexing") {
        n.times do
          bigstring[0..-1]
        end
      }

      bench.report("Chop") {
        n.times do
          bigstring.chop
        end
      }

      bench.report("My Chop") {
        n.times do
          bigstring.my_chop
        end
      }
    end


    And here are my results:


    Rehearsal --------------------------------------------
    Indexing 0.310000 0.000000 0.310000 ( 0.347943)
    Chop 11.940000 30.330000 42.270000 ( 44.501066)
    My Chop 12.620000 30.720000 43.340000 ( 46.339651)
    ---------------------------------- total: 85.920000sec

    user system total real
    Indexing 0.230000 0.000000 0.230000 ( 0.258177)
    Chop 11.980000 30.680000 42.660000 ( 44.966923)
    My Chop 12.610000 30.860000 43.470000 ( 45.859064)


    Let's see how String#chop is implemented...


    static VALUE
    rb_str_chop(str)
        VALUE str;
    {
        str = rb_str_dup(str);
        rb_str_chop_bang(str);
        return str;
    }


    So it's in C... interesting....

    - Jake McArthur
    Jake McArthur, Jul 27, 2006
    #8
  9. Re: String#chop slow? REALLY slow?

    On 7/27/06, Mat Schaffer <> wrote:
    > I just did a quick benchmark to prove something to myself. But I'd
    > like to get a sanity check from the people on the list.
    >

    Using Ara's code:

    Rehearsal --------------------------------------------
    Indexing 0.109000 0.000000 0.109000 ( 0.109000)
    Chop 6.766000 8.250000 15.016000 ( 15.110000)
    Chop2 2.656000 3.781000 6.437000 ( 6.468000)
    ---------------------------------- total: 21.562000sec

    user system total real
    Indexing 0.156000 0.000000 0.156000 ( 0.156000)
    Chop 0.094000 0.000000 0.094000 ( 0.094000)
    Chop2 0.187000 0.000000 0.187000 ( 0.187000)

    > ruby -v

    ruby 1.8.4 (2005-12-24) [i386-mswin32]

    I think the difference in performance is because internally chop does
    a dup on the string then calls chop! whereas the index operation
    creates a new string which shares the old string but with a different
    length. I guess this is also why the rehearsal and final results
    differ - cutting out the cost of GC doesn't reflect the true cost of
    using chop (especially with big strings).
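
    The dup-then-chop! behaviour described here matches the C source quoted earlier in the thread, and can be sketched in plain Ruby as below. The variable names are illustrative; whether the index operation really shares the original buffer is an interpreter detail that this snippet does not demonstrate.

```ruby
original = "I am a big string " * 5

# Ruby-level equivalent of rb_str_chop: copy the string, then trim the copy.
copy = original.dup
copy.chop!

same_result = (copy == original.chop)         # both give the trimmed string
untouched   = (original.length == 18 * 5)     # the original keeps its length

puts "same result: #{same_result}, original untouched: #{untouched}"
```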

    Regards,
    Sean
    Sean O'Halpin, Jul 27, 2006
    #9
  10. Re: String#chop slow? REALLY slow?

    On 7/27/06, Mat Schaffer <> wrote:
    > I just did a quick benchmark to prove something to myself. But I'd
    > like to get a sanity check from the people on the list.

    [snip]
    > Benchmark.bmbm do |bench|
    > bench.report("Indexing") {
    > n.times do
    > bigstring[0..-1]
    > end
    > }

    [snip]

    No-one seems to have noticed the typo....? I think that 4th line should be:

    bigstring[0..-2]

    Which is slower. That should account for part of the performance gap.
    Caleb Clausen, Jul 27, 2006
    #10
  11. Mat Schaffer

    Mat Schaffer Guest

    Re: String#chop slow? REALLY slow?

    On Jul 27, 2006, at 1:18 PM, Caleb Clausen wrote:

    > On 7/27/06, Mat Schaffer <> wrote:
    >> I just did a quick benchmark to prove something to myself. But I'd
    >> like to get a sanity check from the people on the list.

    > [snip]
    >> Benchmark.bmbm do |bench|
    >> bench.report("Indexing") {
    >> n.times do
    >> bigstring[0..-1]
    >> end
    >> }

    > [snip]
    >
    > No-one seems to have noticed the typo....? I think that 4th line
    > should be:
    >
    > bigstring[0..-2]
    >
    > Which is slower. That should account for part of the performance gap.
    >


    You're totally right! [0..-1] is the same string. Thanks for the
    catch. I'm surprised it took that long.
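
    For reference, a minimal sketch of the corrected comparison, assuming the only change needed is [0..-2] in place of [0..-1]. The iteration count is reduced from the thread's 100_000 so that it finishes quickly, and the report labels are made up.

```ruby
require 'benchmark'

n = 1_000                                   # smaller than the thread's n
bigstring = "I am a big string " * 5_000

whole   = bigstring[0..-1]   # full-range slice: same content as the original
trimmed = bigstring[0..-2]   # drops the final character

Benchmark.bmbm do |bench|
  bench.report("Index [0..-2]") { n.times { bigstring[0..-2] } }
  bench.report("Chop")          { n.times { bigstring.chop } }
end
```

    Neither block reassigns bigstring, so the rehearsal and the real run work on the same data.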

    Thanks for all the advice, everyone. Sorry to be a little brain-dead.
    -Mat
    Mat Schaffer, Jul 27, 2006
    #11
  12. Mat Schaffer

    Caio Chassot Guest

    On 2006-07-27, at 13:40 , Caio Chassot wrote:

    >
    > On 2006-07-27, at 13:36 , Caio Chassot wrote:
    >
    >>> Basically I want to drop what will be a trailing "\n" from
    >>> input. But it appears that using String#[] and if statements is
    >>> nearly 200 times more efficient than chop. Which just seems
    >>> really weird, so here's the benchmark. Maybe I'm doing something
    >>> wrong.

    >>
    >> Well, if you implement chop fully, you get very similar results:

    >
    > Ah, but rangeless indexing yields much much better results:


    Speaking of catching typos, I apparently got carried away with my
    de-ranging and implemented the wrong thing. Here are the actual results.
    Pretty much the same as with ranges:

    RubyMate r4106 running Ruby v1.8.4 (/usr/local/bin/ruby)
    >>> untitled


    Rehearsal -------------------------------------------------
    Indexing 3.690000 7.910000 11.600000 ( 13.937017)
    Chop 3.480000 7.890000 11.370000 ( 13.911387)
    Indexing crlf 3.690000 7.980000 11.670000 ( 15.256540)
    Chop crlf 3.530000 8.040000 11.570000 ( 16.200714)
    --------------------------------------- total: 46.210000sec

    user system total real
    Indexing 3.700000 8.050000 11.750000 ( 14.579216)
    Chop 3.520000 8.100000 11.620000 ( 15.165561)
    Indexing crlf 3.730000 8.090000 11.820000 ( 15.573669)
    Chop crlf 3.520000 8.100000 11.620000 ( 15.706817)


    ---

    require 'benchmark'

    n = 20_000
    s = "I am a big string " * 5_000

    Benchmark.bmbm do |bench|
      bench.report("Indexing") {
        n.times do
          s[-2,2] == "\r\n" ? s[0, s.length - 2] : s[0, s.length - 1]
        end
      }

      bench.report("Chop") {
        n.times do
          s.chop
        end
      }

      s << "\r\n"

      bench.report("Indexing crlf") {
        n.times do
          s[-2,2] == "\r\n" ? s[0, s.length - 2] : s[0, s.length - 1]
        end
      }

      bench.report("Chop crlf") {
        n.times do
          s.chop
        end
      }
    end
    Caio Chassot, Jul 27, 2006
    #12