regex problem

Discussion in 'Python' started by Odd-R., Jul 26, 2005.

  1. Odd-R.

    Odd-R. Guest

    Input is a string of four-digit sequences, possibly
    separated by a -, for instance like this:

    "1234,2222-8888,4567,"

    My regular expression is like this:

    rx1=re.compile(r"""\A(\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,)*\Z""")

    When running rx1.findall("1234,2222-8888,4567,")

    I only get the last match as the result. Isn't
    findall supposed to return all the matches?

    Thanks in advance.


    --
    If you have a fridge, if you have a TV,
    then you have everything you need to live

    -Jokke & Valentinerne
     
    Odd-R., Jul 26, 2005
    #1

  2. Thomas Guettler

    On Tue, 26 Jul 2005 09:57:23 +0000, Odd-R. wrote:

    > Input is a string of four digit sequences, possibly
    > separated by a -, for instance like this
    >
    > "1234,2222-8888,4567,"
    >
    > My regular expression is like this:
    >
    > rx1=re.compile(r"""\A(\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,)*\Z""")


    Hi,

    Try it without \A and \Z:

    import re
    rx1=re.compile(r"""(\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,)""")
    print(rx1.findall("1234,2222-8888,4567,"))
    # --> ['1234,', '2222-8888,', '4567,']

    Thomas

    --
    Thomas Güttler, http://www.thomas-guettler.de/
     
    Thomas Guettler, Jul 26, 2005
    #2

  3. John Machin

    John Machin Guest

    Odd-R. wrote:
    > Input is a string of four digit sequences, possibly
    > separated by a -, for instance like this
    >
    > "1234,2222-8888,4567,"
    >
    > My regular expression is like this:
    >
    > rx1=re.compile(r"""\A(\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,)*\Z""")
    >
    > When running rx1.findall("1234,2222-8888,4567,")
    >
    > I only get the last match as the result. Isn't
    > findall supposed to return all the matches?


    For a start, an expression that starts with \A and ends with \Z will
    match the whole string (or not match at all). You have only one match.

    Secondly, as you have a group in your expression, findall returns what
    the group matches. Your expression matches zero or more of what your
    group matches, provided there is nothing else at the start/end of the
    string. The "zero or more" makes the re engine waltz about a bit; when
    the music stopped, the group was matching "4567,".
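
The first two points can be checked directly. This is a minimal sketch using the pattern and input from the original post; the names `anchored`, `text` and `m` are only for illustration.

```python
import re

# Pattern from the original post: anchored at both ends, with a repeated capturing group.
anchored = re.compile(r"\A(\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,)*\Z")
text = "1234,2222-8888,4567,"

m = anchored.search(text)
print(m.group(0))              # 1234,2222-8888,4567,  (one match covering the whole string)
print(m.group(1))              # 4567,  (a repeated group keeps only its last iteration)
print(anchored.findall(text))  # ['4567,']  (findall reports the group, not the whole match)
```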

    Thirdly, findall should be thought of as merely a wrapper around a loop
    using the search method -- it finds all non-overlapping matches of a
    pattern. So the clue to get from this is that you need a really simple
    pattern, like the following. You *don't* have to write an expression
    that does the looping.

    So here's the mean lean no-flab version -- you don't even need the
    parentheses (sorry, Thomas).

    >>> rx1=re.compile(r"""\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,""")
    >>> rx1.findall("1234,2222-8888,4567,")
    ['1234,', '2222-8888,', '4567,']
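
The "wrapper around a loop using the search method" description can be sketched roughly as below. This is an illustration only, not how the `re` module implements `findall`: the helper `findall_via_search` is a hypothetical name, and it always returns the whole match, ignoring the group handling described in the second point.

```python
import re

def findall_via_search(pattern, text):
    """Collect non-overlapping matches by calling search() repeatedly."""
    results = []
    pos = 0
    while pos <= len(text):
        m = pattern.search(text, pos)
        if m is None:
            break
        results.append(m.group(0))
        # Continue after this match; step one character on an empty match
        # so the loop always makes progress.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return results

rx = re.compile(r"\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,")
print(findall_via_search(rx, "1234,2222-8888,4567,"))
# ['1234,', '2222-8888,', '4567,']
```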

    HTH,
    John
     
    John Machin, Jul 26, 2005
    #3
  4. Duncan Booth

    Duncan Booth Guest

    John Machin wrote:

    > So here's the mean lean no-flab version -- you don't even need the
    > parentheses (sorry, Thomas).
    >
    > >>> rx1=re.compile(r"""\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,""")
    > >>> rx1.findall("1234,2222-8888,4567,")
    > ['1234,', '2222-8888,', '4567,']


    No flab? What about all that repetition of \d? A less flabby version:

    >>> rx1=re.compile(r"""\b\d{4}(?:-\d{4})?,""")
    >>> rx1.findall("1234,2222-8888,4567,")
    ['1234,', '2222-8888,', '4567,']
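
The `(?:...)` in this version matters for the reason given earlier in the thread: findall returns group contents when the pattern has a capturing group. A small comparison, where `capturing` and `non_capturing` are names chosen just for this sketch:

```python
import re

text = "1234,2222-8888,4567,"

# Same pattern twice; only the kind of group around the optional part differs.
capturing = re.compile(r"\b\d{4}(-\d{4})?,")
non_capturing = re.compile(r"\b\d{4}(?:-\d{4})?,")

print(capturing.findall(text))      # ['', '-8888', '']  (group contents; '' where the group did not take part)
print(non_capturing.findall(text))  # ['1234,', '2222-8888,', '4567,']
```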
     
    Duncan Booth, Jul 26, 2005
    #4
  5. John Machin

    John Machin Guest

    Duncan Booth wrote:
    > John Machin wrote:
    >
    >> So here's the mean lean no-flab version -- you don't even need the
    >> parentheses (sorry, Thomas).
    >>
    >> >>> rx1=re.compile(r"""\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,""")
    >> >>> rx1.findall("1234,2222-8888,4567,")
    >> ['1234,', '2222-8888,', '4567,']
    >
    > No flab? What about all that repetition of \d? A less flabby version:
    >
    > >>> rx1=re.compile(r"""\b\d{4}(?:-\d{4})?,""")
    > >>> rx1.findall("1234,2222-8888,4567,")
    > ['1234,', '2222-8888,', '4567,']



    OK, good idea to factor out the prefix and follow it by optional -1234.
    However, optimising re engines do common prefix factoring, *and* they
    rewrite stuff like x{4} as xxxx.
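
Whatever the engine does internally, the two spellings should at least return the same results. The check below is a sketch over a few hand-picked inputs (the `samples` list is made up for illustration); it says nothing about speed or about what any particular engine optimises.

```python
import re

verbose = re.compile(r"\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,")
compact = re.compile(r"\b\d{4}(?:-\d{4})?,")

# Illustrative inputs only, not an exhaustive test.
samples = [
    "1234,2222-8888,4567,",
    "12345,",
    "1234-5678",
    "0000-1111,2222,",
    "",
]

for s in samples:
    # Both patterns describe the same set of matches, so findall should agree.
    assert verbose.findall(s) == compact.findall(s)
print("both patterns agree on all samples")
```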

    Cheers,
    John
     
    John Machin, Jul 26, 2005
    #5
  6. Odd-R.

    Odd-R. Guest

    On 2005-07-26, Duncan Booth wrote:
    > >>> rx1=re.compile(r"""\b\d{4}(?:-\d{4})?,""")
    > >>> rx1.findall("1234,2222-8888,4567,")
    > ['1234,', '2222-8888,', '4567,']


    Thanks, all, for the good advice. However, this last expression
    also matches the first four digits when the input has more
    than four digits. To resolve this problem, I first match
    against this:

    regex=re.compile(r"""\A(\b\d{4},|\d{4}-\d{4},)*(\b\d{4}|\d{4}-\d{4})\Z""")

    If this turns out ok, I do a findall with your expression, and then I get
    the desired result.
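
A validate-then-extract step along these lines might look like the sketch below. It is not the exact code from this post: it assumes every item, including the last, ends with a comma as in the original sample, so the validation pattern is built from Duncan Booth's item expression rather than the regex quoted above. The names `ITEM`, `validate`, `extract` and `parse` are hypothetical, and `fullmatch` needs Python 3.4 or later.

```python
import re

# One item: four digits, optionally followed by "-" and four more digits, then a comma.
ITEM = r"\b\d{4}(?:-\d{4})?,"

validate = re.compile(r"(?:" + ITEM + r")*")  # the whole input must be a run of items
extract = re.compile(ITEM)                    # pulls out the individual items

def parse(text):
    """Return the list of items, or None if the input is malformed."""
    if validate.fullmatch(text) is None:
        return None
    return extract.findall(text)

print(parse("1234,2222-8888,4567,"))       # ['1234,', '2222-8888,', '4567,']
print(parse("1234,abcd,4567,"))            # None
print(extract.findall("1234,abcd,4567,"))  # ['1234,', '4567,']
```

The last line shows why the validation step is useful: findall on its own skips the malformed item without complaint.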


    --
    If you have a fridge, if you have a TV,
    then you have everything you need to live

    -Jokke & Valentinerne
     
    Odd-R., Jul 27, 2005
    #6