Regular expression : non capturing groups are faster ?

Discussion in 'Python' started by candide, Jan 3, 2012.

  1. candide

    candide Guest

    Excerpt from the Regular Expression HOWTO
    (http://docs.python.org/howto/regex.html#non-capturing-and-named-groups) :


    -----------------------------------------------
    It should be mentioned that there’s no performance difference in
    searching between capturing and non-capturing groups; neither form is
    any faster than the other.
    -----------------------------------------------


    Now from the Perl Regular Expression tutorial
    (http://perldoc.perl.org/perlretut.html#Non-capturing-groupings) :


    -----------------------------------------------
    Because there is no extraction, non-capturing groupings are faster than
    capturing groupings.
    -----------------------------------------------


    The second assertion sounds more likely. It seems very odd that Python
    and Perl implementations are divergent on this point. Any opinion ?
     
    candide, Jan 3, 2012
    #1
    1. Advertising

  2. > The second assertion sounds more likely. It seems very odd that Python and
    > Perl implementations are divergent on this point. Any opinion ?


    The Python documentation oversimplifies. What it means to say is that
    while one might expect capturing matches to do extra work proportional
    to the capture, they do not. They don't do anything other than mark
    down where to extract submatches, and the extra work done is pretty
    much negligible. (That is, the work done for submatch extraction is a
    polynomial (looks like quadratic) in the number of capturing groups
    (which is very small almost always), with a small coefficient).

    The Perl documentation is technically correct, but if the HOWTO said
    it, it would give the wrong impression. You shouldn't care about
    capturing vs noncapturing except with regards to how it interferes
    with your group numbering scheme.

    -- Devin

    On Tue, Jan 3, 2012 at 6:14 AM, candide <> wrote:
    > Excerpt from the Regular Expression HOWTO
    > (http://docs.python.org/howto/regex.html#non-capturing-and-named-groups) :
    >
    >
    > -----------------------------------------------
    > It should be mentioned that there’s no performance difference in searching
    > between capturing and non-capturing groups; neither form is any faster than
    > the other.
    > -----------------------------------------------
    >
    >
    > Now from the Perl Regular Expression tutorial
    > (http://perldoc.perl.org/perlretut.html#Non-capturing-groupings) :
    >
    >
    > -----------------------------------------------
    > Because there is no extraction, non-capturing groupings are faster than
    > capturing groupings.
    > -----------------------------------------------
    >
    >
    > The second assertion sounds more likely. It seems very odd that Python and
    > Perl implementations are divergent on this point. Any opinion ?
    >
    >
    > --
    > http://mail.python.org/mailman/listinfo/python-list
     
    Devin Jeanpierre, Jan 3, 2012
    #2
    1. Advertising

  3. From: "candide" <>
    Subject: Regular expression : non capturing groups are faster ?


    Excerpt from the Regular Expression HOWTO
    (http://docs.python.org/howto/regex.html#non-capturing-and-named-groups) :


    -----------------------------------------------
    It should be mentioned that there’s no performance difference in
    searching between capturing and non-capturing groups; neither form is
    any faster than the other.
    -----------------------------------------------


    Now from the Perl Regular Expression tutorial
    (http://perldoc.perl.org/perlretut.html#Non-capturing-groupings) :


    -----------------------------------------------
    Because there is no extraction, non-capturing groupings are faster than
    capturing groupings.
    -----------------------------------------------


    The second assertion sounds more likely. It seems very odd that Python
    and Perl implementations are divergent on this point. Any opinion ?


    --
    **
    At least in Perl's case, it is true. I tested and using (?:...) is much faster than ().

    Of course, it takes a few seconds for dozen million matches...

    Octavian
     
    Octavian Rasnita, Jan 3, 2012
    #3
  4. candide

    candide Guest

    Le 03/01/2012 12:56, Devin Jeanpierre a écrit :
    >> The second assertion sounds more likely. It seems very odd that Python and
    >> Perl implementations are divergent on this point. Any opinion ?

    >
    > The Python documentation oversimplifies.


    You meant Perl Documentation, didn't you ?


    It's a commun opinion that non-capturing groups have a price (minor),
    for instance Jan Goyvaerts, a well known regular expression guru,
    refering to Python code, tells :


    non-capturing groups (...) offer (slightly) better performance as the
    regex engine doesn't have to keep track of the text matched by
    non-capturing groups.


    [link is there :
    http://stackoverflow.com/questions/...pressions-non-capturing-group-is-not-working]



    It seems Javascript performs better respect to non-capturing groups :
    http://jsperf.com/regex-capture-vs-non-capture

    The same for java : http://eyalsch.wordpress.com/2009/05/21/regex/
    (no benchmarks).

    For my part, Python tests didn't show any kind of significative penality.
     
    candide, Jan 3, 2012
    #4
  5. > You meant Perl Documentation, didn't you ?

    I guess that works too. I did mean Python, though -- its intent is to
    say "you shouldn't worry about this", but in the process it says "this
    does not exist" (a lie).

    "slightly better performance" would be accurate, as said by Goyvaerts/

    -- Devin

    On Tue, Jan 3, 2012 at 9:50 AM, candide <> wrote:
    > Le 03/01/2012 12:56, Devin Jeanpierre a écrit :
    >>>
    >>> The second assertion sounds more likely. It seems very odd that Python
    >>> and
    >>> Perl implementations are divergent on this point. Any opinion ?

    >>
    >>
    >> The Python documentation oversimplifies.

    >
    >
    > You meant Perl Documentation, didn't you ?
    >
    >
    > It's a commun opinion that non-capturing groups have a price (minor), for
    > instance Jan Goyvaerts, a well known regular expression guru, refering to
    > Python code, tells :
    >
    >
    > non-capturing groups (...)  offer (slightly) better performance as the regex
    > engine doesn't have to keep track of the text matched by non-capturing
    > groups.
    >
    >
    > [link is there :
    > http://stackoverflow.com/questions/...pressions-non-capturing-group-is-not-working]
    >
    >
    >
    > It seems Javascript performs better respect to non-capturing groups :
    > http://jsperf.com/regex-capture-vs-non-capture
    >
    > The same for java : http://eyalsch.wordpress.com/2009/05/21/regex/
    > (no benchmarks).
    >
    > For my part, Python tests didn't show any kind of significative penality.
    > --
    > http://mail.python.org/mailman/listinfo/python-list
     
    Devin Jeanpierre, Jan 3, 2012
    #5
  6. From: "Devin Jeanpierre" <>
    Subject: Re: Regular expression : non capturing groups are faster ?


    >> You meant Perl Documentation, didn't you ?

    >
    > I guess that works too. I did mean Python, though -- its intent is to
    > say "you shouldn't worry about this", but in the process it says "this
    > does not exist" (a lie).



    **
    However, the Perl documentation doesn't lie.

    I tested 10 million matches on my computer using capturing groups and it took ~ 6 seconds, but only ~ 2 seconds with non-capturing params.

    So yes, it is very fast anyway, but ~ 3 times faster with non-capturing params, so there is a difference.

    Octavian
     
    Octavian Rasnita, Jan 3, 2012
    #6
  7. > I tested 10 million matches on my computer using capturing groups and it took ~ 6 seconds, but only ~ 2 seconds with non-capturing params.

    Are you talking about Python or Perl? I can't reproduce this in
    Python. Best I can do is a 3:4 ratio between running times. ('(a)*
    versus '(?:a)*)

    Also, wouldn't say "very fast". Compare those two groups with 'a*'.
    I'm not sure what's going on there.

    -- Devin

    On Tue, Jan 3, 2012 at 3:07 PM, Octavian Rasnita <> wrote:
    > From: "Devin Jeanpierre" <>
    > Subject: Re: Regular expression : non capturing groups are faster ?
    >
    >
    >>> You meant Perl Documentation, didn't you ?

    >>
    >> I guess that works too. I did mean Python, though -- its intent is to
    >> say "you shouldn't worry about this", but in the process it says "this
    >> does not exist" (a lie).

    >
    >
    > **
    > However, the Perl documentation doesn't lie.
    >
    > I tested 10 million matches on my computer using capturing groups and it took ~ 6 seconds, but only ~ 2 seconds with non-capturing params.
    >
    > So yes, it is very fast anyway, but ~ 3 times faster with non-capturing params, so there is a difference.
    >
    > Octavian
    >
    > --
    > http://mail.python.org/mailman/listinfo/python-list
     
    Devin Jeanpierre, Jan 3, 2012
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. VSK
    Replies:
    2
    Views:
    2,354
  2. Axel Kowald

    Regular expression problem with groups

    Axel Kowald, Aug 16, 2004, in forum: Python
    Replies:
    2
    Views:
    254
    Gerardo Herzig -Departamento de Proyectos Especial
    Aug 17, 2004
  3. Hugo Ferreira

    Typed named groups in regular expression

    Hugo Ferreira, May 16, 2007, in forum: Python
    Replies:
    5
    Views:
    346
    Paddy
    May 20, 2007
  4. Virtual Buddha
    Replies:
    3
    Views:
    357
    Virtual Buddha
    Jun 27, 2009
  5. pinkisntwell

    OT: GNU regex library and non-capturing groups

    pinkisntwell, Nov 13, 2009, in forum: C Programming
    Replies:
    1
    Views:
    395
    Seebs
    Nov 13, 2009
Loading...

Share This Page