Nothing to repeat

Discussion in 'Python' started by Tom Anderson, Jan 9, 2011.

  1. Tom Anderson

    Tom Anderson Guest

    Hello everyone, long time no see,

    This is probably not a Python problem, but rather a regular expressions
    problem.

    I want, for the sake of argument, to match strings comprising any number
    of occurrences of 'spa', each interspersed by any number of occurrences of
    'm'. 'any number' includes zero, so the whole pattern should match the
    empty string.

    Here's the conversation Python and i had about it:

    Python 2.6.4 (r264:75706, Jun 4 2010, 18:20:16)
    [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import re
    >>> re.compile("(spa|m*)*")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.6/re.py", line 190, in compile
        return _compile(pattern, flags)
      File "/usr/lib/python2.6/re.py", line 245, in _compile
        raise error, v # invalid expression
    sre_constants.error: nothing to repeat

    What's going on here? Why is there nothing to repeat? Is the problem
    having one *'d term inside another?

    Now, i could actually rewrite this particular pattern as '(spa|m)*'. But
    what i neglected to mention above is that i'm actually generating patterns
    from structures of objects (representations of XML DTDs, as it happens),
    and as it stands, patterns like this are a possibility.

    Any thoughts on what i should do? Do i have to bite the bullet and apply
    some cleverness in my pattern generation to avoid situations like this?

    Thanks,
    tom

    --
    If it ain't broke, open it up and see what makes it so bloody special.
     
    Tom Anderson, Jan 9, 2011
    #1

  2. Ian

    Ian Guest

    On 09/01/2011 16:49, Tom Anderson wrote:
    > Hello everyone, long time no see,
    >
    > This is probably not a Python problem, but rather a regular
    > expressions problem.
    >
    > I want, for the sake of argument, to match strings comprising any
    > number of occurrences of 'spa', each interspersed by any number of
    > occurrences of 'm'. 'any number' includes zero, so the whole
    > pattern should match the empty string.
    >
    > Here's the conversation Python and i had about it:
    >
    > Python 2.6.4 (r264:75706, Jun 4 2010, 18:20:16)
    > [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] on linux2
    > Type "help", "copyright", "credits" or "license" for more information.
    > >>> import re
    > >>> re.compile("(spa|m*)*")
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    >   File "/usr/lib/python2.6/re.py", line 190, in compile
    >     return _compile(pattern, flags)
    >   File "/usr/lib/python2.6/re.py", line 245, in _compile
    >     raise error, v # invalid expression
    > sre_constants.error: nothing to repeat
    >
    > What's going on here? Why is there nothing to repeat? Is the problem
    > having one *'d term inside another?
    >
    > Now, i could actually rewrite this particular pattern as '(spa|m)*'.
    > But what i neglected to mention above is that i'm actually generating
    > patterns from structures of objects (representations of XML DTDs, as
    > it happens), and as it stands, patterns like this are a possibility.
    >
    > Any thoughts on what i should do? Do i have to bite the bullet and
    > apply some cleverness in my pattern generation to avoid situations
    > like this?
    >
    > Thanks,
    > tom
    >

    I think you want to anchor your list, or anything will match. Perhaps

    re.compile('/^(spa(m)+)*$/')

    is what you need.

    Regards

    Ian
     
    Ian, Jan 9, 2011
    #2

  3. Martin Gregorie

    On Sun, 09 Jan 2011 16:49:35 +0000, Tom Anderson wrote:

    >
    > Any thoughts on what i should do? Do i have to bite the bullet and apply
    > some cleverness in my pattern generation to avoid situations like this?
    >

    This sort of works:

    import re

    f = open("test.txt")
    p = re.compile("(spam*)*")
    for line in f:
        print "input line: %s" % (line.strip())
        # with one group in the pattern, findall returns that group's text
        for m in p.findall(line):
            if m != "":
                print "==> %s" % (m)

    when I feed it
    =======================test.txt===========================
    a line with no match
    spa should match
    spam should match
    so should all of spaspamspammspammm
    and so should all of spa spam spamm spammm
    no match again.
    =======================test.txt===========================

    it produces:

    input line: a line with no match
    input line: spa should match
    ==> spa
    input line: spam should match
    ==> spam
    input line: so should all of spaspamspammspammm
    ==> spammm
    input line: and so should all of spa spam spamm spammm
    ==> spa
    ==> spam
    ==> spamm
    ==> spammm
    input line: no match again.

    so obviously there's a problem with greedy matching where there are no
    separators between adjacent matching strings. I tried non-greedy
    matching, e.g. r'(spam*?)*', but this was worse, so I'll be interested to
    see how the real regex mavens do it.
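
    The output above may have less to do with greediness than with what
    findall returns: when the pattern contains a single capturing group,
    findall returns that group's text, and a repeated group keeps only its
    last repetition. Below is a minimal sketch using a non-capturing group
    so that findall reports each whole run instead. It assumes Python 3
    (print as a function), and the switch from '*' to '+' is an added
    assumption to avoid reporting empty matches.

```python
import re

# Non-capturing group: findall now returns each whole match.
# '+' rather than '*' so that empty matches are not reported.
runs = re.compile(r"(?:spam*)+")

lines = [
    "a line with no match",
    "spa should match",
    "so should all of spaspamspammspammm",
    "and so should all of spa spam spamm spammm",
]

for line in lines:
    print("input line: %s" % line)
    for found in runs.findall(line):
        print("==> %s" % found)
```

    With this pattern the unseparated run on the third line comes back as
    one match rather than only its final 'spammm'.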


    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
     
    Martin Gregorie, Jan 9, 2011
    #3
  4. Ian

    Ian Guest

    On 09/01/2011 17:49, Ian wrote:
    > I think you want to anchor your list, or anything will match. Perhaps

    My bad - this is better

    re.compile('^((spa)*(m)*)+$')

    search finds match in 'spa', 'spaspaspa', 'spammmspa', '' and 'mmm'

    search fails on 'spats', 'mats' and others.
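
    Those claims can be checked directly. A small sketch, assuming
    Python 3; the test strings are the ones listed above and the variable
    names are made up for illustration.

```python
import re

# Ian's anchored pattern, unchanged.
anchored = re.compile(r"^((spa)*(m)*)+$")

should_match = ["spa", "spaspaspa", "spammmspa", "", "mmm"]
should_fail = ["spats", "mats"]

for text in should_match + should_fail:
    result = "match" if anchored.search(text) else "no match"
    print("%-12r %s" % (text, result))
```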
     
    Ian, Jan 9, 2011
    #4
  5. Terry Reedy

    Terry Reedy Guest

    On 1/9/2011 11:49 AM, Tom Anderson wrote:
    > Hello everyone, long time no see,
    >
    > This is probably not a Python problem, but rather a regular expressions
    > problem.
    >
    > I want, for the sake of argument, to match strings comprising any
    > number of occurrences of 'spa', each interspersed by any number of
    > occurrences of 'm'. 'any number' includes zero, so the whole pattern
    > should match the empty string.


    Are you sure? A pattern that matches the empty string matches every string.

    > Here's the conversation Python and i had about it:
    >
    > Python 2.6.4 (r264:75706, Jun 4 2010, 18:20:16)
    > [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] on linux2
    > Type "help", "copyright", "credits" or "license" for more information.
    > >>> import re
    > >>> re.compile("(spa|m*)*")



    I believe the precedence rule that * binds tighter than | (not in the
    doc) makes this re the same as "(spa|(m)*)*", which gives the same error
    traceback. I believe that for this, re first compiles (spa)* and then
    ((m)*)*, and the latter gives the same traceback. Either would seem to
    match strings of 'm's without any 'spa', which is not your spec.

    "((spa|m)*)*" does compile, so it is not the nesting itself.

    The doc does not give the formal grammar for Python re's, so it is hard
    to pinpoint which informal rule is violated, or if indeed the error is a
    bug. Someone else may do better.
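
    One way to see which of these variants a given interpreter accepts is
    to compile each one and record any error. A minimal sketch, assuming
    Python 3 (where the exception class is re.error); the helper name
    try_compile is made up for illustration. Results can differ between
    versions: the traceback in this thread comes from 2.6, and newer
    releases may accept patterns that 2.6 rejects.

```python
import re

def try_compile(pattern):
    """Return None if the pattern compiles, else the error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# Variants discussed in this thread.
variants = ["(spa|m*)*", "(spa|(m)*)*", "((m)*)*", "((spa|m)*)*", "(spa|m)*"]

for pattern in variants:
    message = try_compile(pattern)
    status = "ok" if message is None else "error: " + message
    print("%-14s %s" % (pattern, status))
```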

    > Now, i could actually rewrite this particular pattern as '(spa|m)*'.


    That also does not match your spec.

    > Any thoughts on what i should do? Do i have to bite the bullet and apply
    > some cleverness in my pattern generation to avoid situations like this?


    Well, it has to generate legal re's according to the engine you are
    using (with whatever bugs and limitations it has).
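
    One possible form of that cleverness is to simplify while the pattern
    is being built: (x*)* accepts the same strings as x*, and (a|b*)*
    accepts the same strings as (a|b)*. The sketch below assumes Python 3.
    The tuple-based nodes and the helper names lit, alt, star and render
    are hypothetical, not taken from Tom's generator, and only these two
    rewrites are handled.

```python
import re

def lit(text):
    return ("lit", text)

def alt(*options):
    return ("alt", tuple(options))

def star(node):
    # (x*)* accepts the same strings as x*.
    if node[0] == "star":
        return node
    # (a|b*)* accepts the same strings as (a|b)*: unwrap starred options.
    if node[0] == "alt":
        node = ("alt", tuple(o[1] if o[0] == "star" else o for o in node[1]))
    return ("star", node)

def render(node):
    kind, payload = node
    if kind == "lit":
        return re.escape(payload)
    if kind == "alt":
        return "(?:" + "|".join(render(o) for o in payload) + ")"
    if kind == "star":
        inner = render(payload)
        # An alternation is already grouped; anything else needs a group.
        return inner + "*" if payload[0] == "alt" else "(?:" + inner + ")*"
    raise ValueError("unknown node kind: %r" % (kind,))

# The structure behind "(spa|m*)*", built through the simplifying star().
pattern = render(star(alt(lit("spa"), star(lit("m")))))
print(pattern)
```

    Doing the rewrite in star() keeps the rendering step trivial, at the
    cost of the generated pattern no longer mirroring the source structure
    one to one.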

    --
    Terry Jan Reedy
     
    Terry Reedy, Jan 9, 2011
    #5
