Python re repetitive matching

Discussion in 'Python' started by Rich, Dec 22, 2003.

  1. Rich


    I'm new to regexes and can't quite figure out how to get them to work. What
    I want is a tuple of all the matches from the regex. I've simplified my
    actual problem and still can't get it to work.

    So far I've got this:

    print re.findall( r'(@\d+)|(\w+)', "@5489 heel all and thumb toe" )

    This does exactly what I want, except that every result has a slot for both
    groups, so I end up with a list full of tuples, each with blank elements...
    so close.

    I also tried my original idea:

    a = re.match( r'(@\d+)\s+(\w+)', "@5489 heel all and thumb toe" )
    print a.groups()

    This matches the number and the first word, so I thought the following
    should rematch after the first word and give me what I wanted... but it
    doesn't, for some reason:

    a = re.match( r'(@\d+)\s+(?:(\w+)\s*)', "@5489 heel all and thumb toe" )
    print a.groups()

    This is my next iteration. It still gives me the number (first group) and
    only the one word (second group). So I extend it to...

    a = re.match( r'(@\d+)\s+(?:(\w+)\s*)*', "@5489 heel all and thumb toe" )
    print a.groups()

    Now this gives me the number and the last-but-one word. Why?

    My logic suggests that this should do what I want. What am I missing?
    I've spent all night trying to figure this out.

    Cheers

    Rich
     
    Rich, Dec 22, 2003
    #1

  2. Rich wrote in message ...
    >I'm new to regexes and can't quite figure out how to get them to work. What
    >I want is a tuple of all the matches from the regex. I've simplified my
    >actual problem and still can't get it to work.


    For the following answers I assume you only feed one line at a time. (If
    this is an unacceptable restriction, things get uglier.)

    First, consider whether you need REs at all. REs should be a last resort.
    In this particular case, it seems to me that

    s = "@5489 heel all and thumb toe"
    s.split(' ', 1)

    is all you need. If you need more precision (and the digit sequence is
    always 4 chars long), the basic pattern is as follows:

    re.split(r'(?<=@\d{4}) (?=.*)', s)
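
    A minimal sketch comparing the two approaches above, assuming Python 3 and
    the sample line from the question. The names head, rest and parts are made
    up for illustration. The trailing (?=.*) from the pattern above is left out
    here because it can match the empty string and so never rejects anything.

```python
import re

s = "@5489 heel all and thumb toe"

# Plain string method: split once, on the first space.
head, rest = s.split(' ', 1)

# Regex version: split on the space that follows '@' plus exactly four digits.
# Look-behind in Python's re must be fixed-width, which is why \d{4} is used
# here instead of \d+.
parts = re.split(r'(?<=@\d{4}) ', s)

print((head, rest))
print(parts)
```

    Both give the tag and the remainder of the line as two items; the regex
    version only adds the check that the first item looks like @ plus four
    digits.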

    >So far I've got this:
    >print re.findall( r'(@\d+)|(\w+)', "@5489 heel all and thumb toe" )


    You need nongrouping parens, and \w+ will split the text into separate words.

    Split to digits and words, discarding nothing:
    re.findall(r'(?:@\d{4})|(?:.+)', s)

    Split each item separately, discarding whitespace:
    re.findall(r'(?:@\d{4})|(?:\w+)', s)
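
    To illustrate why the grouping matters, here is a small sketch assuming
    Python 3 and the sample line from the question. The variable names are
    made up for illustration.

```python
import re

s = "@5489 heel all and thumb toe"

# Capturing groups: findall returns one tuple per match, with an empty
# string for whichever alternative did not take part in that match.
with_groups = re.findall(r'(@\d+)|(\w+)', s)

# Non-capturing groups: findall returns each whole match as a plain string.
without_groups = re.findall(r'(?:@\d+)|(?:\w+)', s)

print(with_groups)
print(without_groups)
```

    The first call reproduces the list of half-empty tuples from the question;
    the second gives a flat list that tuple() can convert directly.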

    >I also tried my original idea:
    >
    >a = re.match( r'(@\d+)\s+(\w+)', "@5489 heel all and thumb toe" )
    >print a.groups()


    re.match( r'(@\d+) (.+)', s ).groups()

    >This matches the number and the first word, so I thought the following
    >should rematch after the first word and give me what I wanted... but it
    >doesn't, for some reason:


    It doesn't because '\w' matches word characters only, i.e. [0-9a-zA-Z_]. It
    doesn't match spaces, so once it comes up against a space, it stops.
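
    A short sketch of the difference, assuming Python 3 and the sample line
    from the question (variable names are illustrative only). Note that for
    str patterns in Python 3, \w also accepts non-ASCII letters and digits
    unless the re.ASCII flag is given.

```python
import re

s = "@5489 heel all and thumb toe"

# \w+ stops at the first character that is not a word character (the space).
first_word = re.match(r'(@\d+)\s+(\w+)', s).groups()

# .+ matches any character except a newline, so it runs to the end of the line.
whole_rest = re.match(r'(@\d+) (.+)', s).groups()

print(first_word)
print(whole_rest)
```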

    >
    >a = re.match( r'(@\d+)\s+(?:(\w+)\s*)', "@5489 heel all and thumb toe" )
    >print a.groups()


    So you do know about nongrouping parens? Anyway, this doesn't match after
    the first word because the group is applied only once: it matches one word
    plus any trailing spaces, and nothing tells it to repeat.

    >This is my next iteration. It still gives me the number (first group) and
    >only the one word (second group). So I extend it to...
    >
    >a = re.match( r'(@\d+)\s+(?:(\w+)\s*)*', "@5489 heel all and thumb toe" )
    >print a.groups()
    >
    >Now this gives me the number and the last-but-one word. Why?


    Because * does not magically make new groups: a repeated group still has
    only one slot, and each repetition overwrites it. It seems to me it should
    match the last word, though, instead of next-to-last, but I won't think
    about it too much because this re is hideous as it is, and shouldn't be
    used.
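
    A minimal sketch of the point about repeated groups, assuming Python 3 and
    the sample line from the question. On current CPython versions the second
    group should come out as the last word ('toe'); the next-to-last result
    reported in the question is not something this sketch reproduces. The
    names head and words are made up for illustration.

```python
import re

s = "@5489 heel all and thumb toe"

# A group inside a repeated section still has only one slot; each repetition
# overwrites the previous capture, so only the last one survives.
m = re.match(r'(@\d+)\s+(?:(\w+)\s*)*', s)
print(m.groups())

# To keep every word, match the number once and collect the words separately.
head = re.match(r'@\d+', s)
words = re.findall(r'\w+', s[head.end():])
result = (head.group(),) + tuple(words)
print(result)
```

    Splitting the job into "match the tag" and "find all the words" avoids
    relying on a variable number of groups in a single match.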

    >My logic suggests that this should do what I want. What am I missing?
    >I've spent all night trying to figure this out.


    Your first error was using regular expressions:

    'Some people, when confronted with a problem, think "I know, I'll use
    regular expressions". Now they have two problems.' --Jamie Zawinski,
    comp.lang.emacs

    Use string methods, especially split().

    Also, I am no longer sure whether you want all items/words to be groups
    separately, or if you want one group of numbers, and the rest words. Either
    one is trivial for string methods:

    s.split() for each item in its own group.
    s.split(' ', 1) for only two groups.

    However, the first one is impossible with the groups of a single RE match
    (I think) if the number of groups is variable, and ugly if the number of
    groups is fixed. The second one I've done ad nauseam here.
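
    The two string-method variants can be sketched as follows, assuming
    Python 3 and the sample line from the question. The is_tag check is an
    optional extra that is not part of the original suggestion, and all the
    names are illustrative.

```python
s = "@5489 heel all and thumb toe"

# One item per whitespace-separated token.
items = tuple(s.split())

# Two items only: the tag and everything after the first space.
tag, rest = s.split(' ', 1)

# Optional sanity check on the tag, done with string methods, not a regex.
is_tag = tag.startswith('@') and tag[1:].isdigit()

print(items)
print((tag, rest), is_tag)
```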

    See the RE Howto:
    http://www.amk.ca/python/howto/regex/

    Also, there's an O'Reilly book "Mastering Regular Expressions" which is said
    to be excellent. David Mertz wrote "Text Processing in Python", which is
    also said to be excellent. Mertz also has a bunch of online columns on
    Python, all of which are very good. But my guess is that you don't really
    need any of these.
    --
    Francis Avila
     
    Francis Avila, Dec 23, 2003
    #2
