Re: + in regular expression

Discussion in 'Python' started by Cameron Simpson, Oct 5, 2012.

  1. On 03Oct2012 21:17, Ian Kelly wrote:
    | On Wed, Oct 3, 2012 at 9:01 PM, contro opinion wrote:
    | > Why is "\s{6}+" not a valid regular expression pattern?
    |
    | Use a group: "(?:\s{6})+"

    Yeah, it is probably a precedence issue in the grammar.
    "(\s{6})+" is also accepted.
    --
    Cameron Simpson

    Disclaimer: ERIM wanted to share my opinions, but I wouldn't let them.
    - David Wiseman
     
    Cameron Simpson, Oct 5, 2012
    #1
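
To make the suggestion above concrete, here is a minimal sketch using Python's standard `re` module. The helper name `compiles` and the sample text are made up for illustration. One assumption worth flagging: on the Python versions current when this thread was written, the stacked form is rejected with a "multiple repeat" error, whereas Python 3.11 and later parse a trailing "+" after a quantifier as a possessive quantifier, so the first result depends on the interpreter version.

```python
import re

def compiles(pattern):
    """Return True if `pattern` is accepted by re.compile, else False."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Stacked form from the question. Older Pythons reject it ("multiple repeat");
# Python 3.11+ accepts it, reading the trailing "+" as possessive.
print(compiles(r"\s{6}+"))

# Grouped forms: one or more runs of exactly six whitespace characters.
non_capturing = re.compile(r"(?:\s{6})+")
capturing = re.compile(r"(\s{6})+")

text = " " * 13 + "x"
print(non_capturing.match(text).end())       # 12: two full runs of six
print(capturing.match(text).end())           # 12
print(len(capturing.match(text).group(1)))   # 6: the group keeps only the last run
```

The non-capturing group is usually the better choice when the matched run itself is not needed, since a repeated capturing group only retains its final repetition.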

  2. On 10/05/2012 04:23 AM, Duncan Booth wrote:
    > A regular expression element may be followed by a quantifier.
    > Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and lazy quantifiers
    > '*?', '+?', '{n,m}?'). There's nothing in the regex language which says
    > you can follow an element with two quantifiers.

    In fact, *you* did -- the first sentence of that paragraph! :)

    \s is a regex, so you can follow it with a quantifier and get \s{6}.
    That's also a regex, so you should be able to follow it with a quantifier.

    I can understand that you can create a grammar that excludes it. I'm
    actually really interested to know if anyone knows whether this was a
    deliberate decision and, if so, what the reason is. (And if not --
    should it be considered a (low priority) bug?)

    Was it because such patterns often reveal a mistake? Because "\s{6}+"
    has other meanings in different regex syntaxes and the designers didn't
    want confusion? Because it was simpler to parse that way? Because the
    "hey you recognize regular expressions by converting it to a finite
    automaton" story is a lie in most real-world regex implementations (in
    part because they're not actually regular expressions) and repeated
    quantifiers cause problems with the parsing techniques that actually get
    used?

    Evan
     
    Evan Driscoll, Oct 5, 2012
    #2

  3. On 10/05/2012 10:27 AM, Evan Driscoll wrote:
    > On 10/05/2012 04:23 AM, Duncan Booth wrote:
    >> A regular expression element may be followed by a quantifier.
    >> Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and lazy quantifiers
    >> '*?', '+?', '{n,m}?'). There's nothing in the regex language which says
    >> you can follow an element with two quantifiers.

    > In fact, *you* did -- the first sentence of that paragraph! :)
    >
    > \s is a regex, so you can follow it with a quantifier and get \s{6}.
    > That's also a regex, so you should be able to follow it with a
    > quantifier.

    OK, I guess this isn't true... you said a "regular expression *element*"
    can be followed by a quantifier. I just took what I usually see as part
    of a regular expression and read into your post something it didn't
    quite say. Still, the rest of mine applies.

    Evan
     
    Evan Driscoll, Oct 5, 2012
    #3
  4. On 2012-10-05 16:27, Evan Driscoll wrote:
    > On 10/05/2012 04:23 AM, Duncan Booth wrote:
    >> A regular expression element may be followed by a quantifier.
    >> Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and lazy quantifiers
    >> '*?', '+?', '{n,m}?'). There's nothing in the regex language which says
    >> you can follow an element with two quantifiers.

    > In fact, *you* did -- the first sentence of that paragraph! :)
    >
    > \s is a regex, so you can follow it with a quantifier and get \s{6}.
    > That's also a regex, so you should be able to follow it with a quantifier.
    >
    > I can understand that you can create a grammar that excludes it. I'm
    > actually really interested to know if anyone knows whether this was a
    > deliberate decision and, if so, what the reason is. (And if not --
    > should it be considered a (low priority) bug?)
    >
    > Was it because such patterns often reveal a mistake? Because "\s{6}+"
    > has other meanings in different regex syntaxes and the designers didn't
    > want confusion? Because it was simpler to parse that way? Because the
    > "hey you recognize regular expressions by converting it to a finite
    > automaton" story is a lie in most real-world regex implementations (in
    > part because they're not actually regular expressions) and repeated
    > quantifiers cause problems with the parsing techniques that actually get
    > used?
    >

    You rarely want to repeat a repeated element. It can also result in
    catastrophic backtracking unless you're _very_ careful.
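
As a rough illustration of the backtracking concern, the sketch below times a failed match for a nested-quantifier pattern with Python's standard `re` module. The pattern `(?:a+)+b`, the function name `time_failed_match` and the input sizes are all chosen for illustration only. Timings vary by machine and interpreter, so no specific numbers are claimed; the expectation is only that the time grows quickly as the input gets longer.

```python
import re
import time

# Nested quantifiers: the inner a+ and the outer + can split a run of "a"s
# in many different ways, and a backtracking engine may try them all.
nested = re.compile(r"(?:a+)+b")

def time_failed_match(n):
    """Time a match attempt against n "a"s with no trailing "b"."""
    subject = "a" * n
    start = time.perf_counter()
    result = nested.match(subject)
    return result, time.perf_counter() - start

# Small sizes keep the demo quick; each result is None because "b" never appears.
for n in (10, 14, 18):
    result, seconds = time_failed_match(n)
    print(n, result, f"{seconds:.4f}s")
```

The same pattern matches immediately when the subject does end in "b"; the cost shows up only when the overall match has to fail.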

    In many other regex implementations (including mine), "*+", "++" and
    "?+" are possessive quantifiers, much as "*?", "+?" and "??" are lazy
    quantifiers.

    You could, of course, ask why adding "?" after a quantifier doesn't
    make it optional, e.g. why r"\s{6}?" doesn't mean the same as
    r"(?:\s{6})?", or why r"\s{0,6}?" doesn't mean the same as
    r"(?:\s{0,6})?".
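
The difference between the lazy and the optional readings can be checked directly. This is a small sketch with Python's standard `re` module; the variable names are illustrative.

```python
import re

six = " " * 6

# "?" after a quantifier makes it lazy; it does not make the element optional.
lazy_range = re.match(r"\s{0,6}?", six)
optional_range = re.match(r"(?:\s{0,6})?", six)
print(lazy_range.end())      # 0: the lazy form takes as few characters as possible
print(optional_range.end())  # 6: the greedy group is present and takes all six

# With an exact count, the lazy form still requires all six characters,
# while the grouped form also accepts the empty string.
print(re.fullmatch(r"\s{6}?", ""))                  # None
print(re.fullmatch(r"(?:\s{6})?", "") is not None)  # True
```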
     
    MRAB, Oct 5, 2012
    #4
