Curious regexp behavior

Discussion in 'Ruby' started by Derek Lewis, Feb 16, 2005.

  1. Derek Lewis

    Derek Lewis Guest

    On a whim, I just decided to try an experiment with regexps, to see how
    they perform in two slightly different cases. I wanted to see how using
    a single regexp object for many many evaluations performed compared to
    using the regexp within the loop.

    The scripts I wrote searched through a words file that is 234937 lines
    long.

    Here's the scripts I wrote, to clarify:
    First one:

    total = 0
    File.open( 'words', 'r' ) { |file|
    file.each_line { |line|
    word = line.chomp
    total +=1 if word =~ /[a-df-h][aeiou]{2}/
    }
    }
    puts total

    Second one:

    rexp = /[a-df-h][aeiou]{2}/
    total = 0
    File.open( 'words', 'r' ) { |file|
    file.each_line { |line|
    word = line.chomp
    total +=1 if word =~ rexp
    }
    }
    puts total


    I expected the second one to be slightly faster, but was surprised to
    see that it was actually slightly slower. I ran each one about 10-15
    times, and eyeballed an average. The results from each run after the
    first were pretty consistant.

    It's just a curiosity, but does anyone know what might cause them to be
    'backwards' like that? :)

    --
    Derek Lewis

    ===================================================================
    Java Web-Application Developer

    Email :
    Cellular : 778.898.5825
    Website : http://www.lewisd.com

    "If you've got a 5000-line JSP page that has "all in one" support
    for three input forms and four follow-up screens, all controlled
    by "if" statements in scriptlets, well ... please don't show it
    to me :). Its almost dinner time, and I don't want to lose my
    appetite :)."
    - Craig R. McClanahan
    Derek Lewis, Feb 16, 2005
    #1
    1. Advertising

  2. Derek Lewis wrote:
    > On a whim, I just decided to try an experiment with regexps, to see

    how
    > they perform in two slightly different cases. I wanted to see how

    using
    > a single regexp object for many many evaluations performed compared

    to
    > using the regexp within the loop.
    >
    > The scripts I wrote searched through a words file that is 234937

    lines
    > long.
    >
    > Here's the scripts I wrote, to clarify:
    > First one:
    >
    > total = 0
    > File.open( 'words', 'r' ) { |file|
    > file.each_line { |line|
    > word = line.chomp
    > total +=1 if word =~ /[a-df-h][aeiou]{2}/
    > }
    > }
    > puts total
    >
    > Second one:
    >
    > rexp = /[a-df-h][aeiou]{2}/
    > total = 0
    > File.open( 'words', 'r' ) { |file|
    > file.each_line { |line|
    > word = line.chomp
    > total +=1 if word =~ rexp
    > }
    > }
    > puts total
    >
    >
    > I expected the second one to be slightly faster, but was surprised to
    > see that it was actually slightly slower. I ran each one about 10-15
    > times, and eyeballed an average. The results from each run after the
    > first were pretty consistant.
    >
    > It's just a curiosity, but does anyone know what might cause them to

    be
    > 'backwards' like that? :)
    >

    I'll wager a guess. In the first version Ruby knows that
    '/[a-df-h][aeiou]{2}/' is a regexp. In the second one Ruby doesn't
    know if 'rexp' is a variable or method, so it has to do 1 maybe 2 look
    ups on every interation before it dispatches String#=~.
    Also regexp's are immutable so Ruby doesn't allocate a new regexp on
    every interation and storing the regexp has no effect in that regard.

    -Charlie
    Charles Mills, Feb 16, 2005
    #2
    1. Advertising

  3. Derek Lewis

    Eric Hodel Guest

    --Apple-Mail-15--771973824
    Content-Transfer-Encoding: 7bit
    Content-Type: text/plain; charset=US-ASCII; format=flowed

    On 15 Feb 2005, at 17:16, Derek Lewis wrote:

    > First one:
    >
    > total = 0
    > File.open( 'words', 'r' ) { |file|
    > file.each_line { |line|
    > word = line.chomp
    > total +=1 if word =~ /[a-df-h][aeiou]{2}/

    ^^^^ inline regexp (part of the AST)
    > }
    > }
    > puts total
    >
    > Second one:
    >
    > rexp = /[a-df-h][aeiou]{2}/
    > total = 0
    > File.open( 'words', 'r' ) { |file|
    > file.each_line { |line|
    > word = line.chomp
    > total +=1 if word =~ rexp

    ^^^^ variable lookup
    > }
    > }
    > puts total
    >
    >
    > I expected the second one to be slightly faster, but was surprised to
    > see that it was actually slightly slower. I ran each one about 10-15
    > times, and eyeballed an average. The results from each run after the
    > first were pretty consistant.
    >
    > It's just a curiosity, but does anyone know what might cause them to be
    > 'backwards' like that? :)


    Inline regexps are much faster than a variable lookup then using the
    methods on the Regexp object.

    --
    Eric Hodel - - http://segment7.net
    FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

    --Apple-Mail-15--771973824
    content-type: application/pgp-signature; x-mac-type=70674453;
    name=PGP.sig
    content-description: This is a digitally signed message part
    content-disposition: inline; filename=PGP.sig
    content-transfer-encoding: 7bit

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.2.4 (Darwin)

    iD8DBQFCEtoXMypVHHlsnwQRAh7aAJ46tcIOb0m2MDduBhOXkGpMgX5jTgCfYeqS
    Go0t7KlyH3HliLg7xAtWbQE=
    =aKt5
    -----END PGP SIGNATURE-----

    --Apple-Mail-15--771973824--
    Eric Hodel, Feb 16, 2005
    #3
  4. Derek Lewis

    Ryan Davis Guest

    On Feb 15, 2005, at 5:16 PM, Derek Lewis wrote:

    > I expected the second one to be slightly faster, but was surprised to
    > see that it was actually slightly slower. I ran each one about 10-15
    > times, and eyeballed an average. The results from each run after the
    > first were pretty consistant.
    >
    > It's just a curiosity, but does anyone know what might cause them to be
    > 'backwards' like that? :)


    Use ParseTree and you can see why!!!

    <576> echo "a=/blah/; 's' =~ a" | parse_tree_show -f
    (cut for readability)
    [:lasgn, :a, [:lit, /blah/]],
    [:call, [:str, "s"], :=~, [:array, [:lvar, :a]]]]]]]]
    <577> echo "'s' =~ /blah/" | parse_tree_show -f
    (cut for readability)
    [:match3, [:lit, /blah/], [:str, "s"]]]]]]]

    Basically, the inline regex avoids the lvar lookup and the call and
    shoots straight into a match3 node. The lvar is probably not _that_
    expensive, but method dispatch is not terribly cheap.

    --
    - http://blog.zenspider.com/
    http://rubyforge.org/projects/ruby2c/
    http://rubyforge.org/projects/parsetree/
    http://www.zenspider.com/seattle.rb
    Ryan Davis, Feb 16, 2005
    #4
  5. "Derek Lewis" <> schrieb im Newsbeitrag
    news:...
    > On a whim, I just decided to try an experiment with regexps, to see how
    > they perform in two slightly different cases. I wanted to see how using
    > a single regexp object for many many evaluations performed compared to
    > using the regexp within the loop.
    >
    > The scripts I wrote searched through a words file that is 234937 lines
    > long.
    >
    > Here's the scripts I wrote, to clarify:
    > First one:
    >
    > total = 0
    > File.open( 'words', 'r' ) { |file|
    > file.each_line { |line|
    > word = line.chomp
    > total +=1 if word =~ /[a-df-h][aeiou]{2}/
    > }
    > }
    > puts total
    >
    > Second one:
    >
    > rexp = /[a-df-h][aeiou]{2}/
    > total = 0
    > File.open( 'words', 'r' ) { |file|
    > file.each_line { |line|
    > word = line.chomp
    > total +=1 if word =~ rexp
    > }
    > }
    > puts total
    >
    >
    > I expected the second one to be slightly faster, but was surprised to
    > see that it was actually slightly slower. I ran each one about 10-15
    > times, and eyeballed an average. The results from each run after the
    > first were pretty consistant.
    >
    > It's just a curiosity, but does anyone know what might cause them to be
    > 'backwards' like that? :)


    Did you try the same with the matching reversed, i.e., "rexp =~ word"
    instead of "word =~ rexp"? Did it make a difference?

    Kind regards

    robert
    Robert Klemme, Feb 16, 2005
    #5
  6. Excerpts from Ryan Davis's mail of 16 Feb 2005 (EST):
    > Use ParseTree and you can see why!!!
    >
    > <576> echo "a=/blah/; 's' =~ a" | parse_tree_show -f
    > (cut for readability)
    > [:lasgn, :a, [:lit, /blah/]],
    > [:call, [:str, "s"], :=~, [:array, [:lvar, :a]]]]]]]]
    > <577> echo "'s' =~ /blah/" | parse_tree_show -f
    > (cut for readability)
    > [:match3, [:lit, /blah/], [:str, "s"]]]]]]]


    Very nice answer.

    Like the original poster, I found the behavior counterintuitive. Perhaps
    this is because our assumptions come from the C model of the universe,
    where more local variables is typically faster, and method dispatch is
    not a problem.

    I wonder what the merits of collecting equivalences like these to form
    some kind of post-hoc parse-tree optimization would be. Probably not
    great, but it might be fun.

    --
    William <>
    William Morgan, Feb 16, 2005
    #6
  7. Derek Lewis

    Derek Lewis Guest

    On Wed, Feb 16, 2005 at 06:14:52PM +0900, Robert Klemme wrote:
    >
    >
    > Did you try the same with the matching reversed, i.e., "rexp =~ word"
    > instead of "word =~ rexp"? Did it make a difference?
    >
    > Kind regards
    >
    > robert
    >


    I did, actually, and it was very slightly faster. Still slower than an
    inline regexp, however.

    Thanks for the insightful answers, everyone. It quite interesting to
    find out how your favorite programming language works inside.

    --
    Derek Lewis

    ===================================================================
    Java Web-Application Developer

    Email :
    Cellular : 778.898.5825
    Website : http://www.lewisd.com

    "If you've got a 5000-line JSP page that has "all in one" support
    for three input forms and four follow-up screens, all controlled
    by "if" statements in scriptlets, well ... please don't show it
    to me :). Its almost dinner time, and I don't want to lose my
    appetite :)."
    - Craig R. McClanahan
    Derek Lewis, Feb 16, 2005
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    0
    Views:
    442
  2. mark

    Curious string behavior

    mark, Jan 28, 2004, in forum: Python
    Replies:
    2
    Views:
    306
    OKB (not okblacke)
    Jan 28, 2004
  3. BartlebyScrivener

    curious paramstyle qmark behavior

    BartlebyScrivener, Oct 20, 2006, in forum: Python
    Replies:
    7
    Views:
    355
    Jon Clements
    Oct 21, 2006
  4. Jim B. Wilson

    The curious behavior of integer objects

    Jim B. Wilson, Jan 15, 2007, in forum: Python
    Replies:
    15
    Views:
    394
    Carl Banks
    Jan 16, 2007
  5. Joao Silva
    Replies:
    16
    Views:
    359
    7stud --
    Aug 21, 2009
Loading...

Share This Page