Question: Optional Regular Expression Grouping

Discussion in 'Python' started by galyle, Oct 10, 2011.

  1. galyle

    galyle Guest

    Hi, I've looked through this forum, but I haven't been able to find a
    resolution to the problem I'm having (maybe I didn't look hard enough
    -- I have to believe this has come up before). The problem is this:
    I have a file which has 0, 2, or 3 groups that I'd like to record;
    however, in the case of 3 groups, the third group is correctly
    captured, but the first two groups get collapsed into just one group.
    I'm sure that I'm missing something in the way I've constructed my
    regular expression, but I can't figure out what's wrong. Does anyone
    have any suggestions?

    The demo below showcases the problem I'm having:

    import re

    valid_line = re.compile('^\[(\S+)\]\[(\S+)\](?:\s+|\[(\S+)\])=|\s+[\d\[\']+.*$')
    line1 = "[field1][field2] = blarg"
    line2 = "    'a continuation of blarg'"
    line3 = "[field1][field2][field3] = blorg"

    m = valid_line.match(line1)
    print 'Expected: ' + m.group(1) + ', ' + m.group(2)
    m = valid_line.match(line2)
    print 'Expected: ' + str(m.group(1))
    m = valid_line.match(line3)
    print 'Uh-oh: ' + m.group(1) + ', ' + m.group(2)
    galyle, Oct 10, 2011
    #1

  2. MRAB

    MRAB Guest

    On 10/10/2011 22:57, galyle wrote:
    > [...]


    Instead of "\S" I'd recommend using "[^\]]", or using a lazy repetition
    "\S+?".

    You'll also need to handle the space before the "=" in line3.

    valid_line = re.compile(r'^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$')
    MRAB, Oct 10, 2011
    #2

  3. Re: Question: Optional Regular Expression Grouping

    2011/10/10 galyle <>:
    > [...]


    Hi,
    I believe the space before = is causing problems (or rather, the pattern
    is missing it); you also need non-greedy quantifiers +? to match as
    little as possible, as opposed to the greedy default:

    valid_line = re.compile('^\[(\S+?)\]\[(\S+?)\](?:\s+|\[(\S+)\])\s*=|\s+[\d\[\']+.*$')

    or you can use character classes explicitly excluding the closing ], like:

    valid_line = re.compile('^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$')
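A quick way to check the second pattern against the three sample lines from the original post is sketched below. It is written for Python 3 (so `print` is a function), and the raw-string prefix and the `samples` / `captured` names are additions for illustration, not part of the original message.

```python
import re

# The negated-class pattern from above, written as a raw string.
# (The raw-string form is an assumption added here for illustration.)
valid_line = re.compile(
    r"^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[']+.*$"
)

# The three sample lines from the original post.
samples = [
    "[field1][field2] = blarg",
    "    'a continuation of blarg'",
    "[field1][field2][field3] = blorg",
]

# Map each sample line to the tuple of captured groups.
captured = {line: valid_line.match(line).groups() for line in samples}

for line, groups in captured.items():
    print(repr(line), "->", groups)
```

Groups that did not take part in the match come back as `None`, so the continuation line yields three `None` values and the two-field line yields `None` for the third group.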

    hth
    vbr
    Vlastimil Brom, Oct 10, 2011
    #3
  4. Ian Kelly

    Ian Kelly Guest

    Re: Question: Optional Regular Expression Grouping

    On Mon, Oct 10, 2011 at 4:49 PM, MRAB <> wrote:
    > Instead of "\S" I'd recommend using "[^\]]", or using a lazy repetition
    > "\S+?".


    Preferably the former. The core problem is that the regex matches
    ambiguously on the problem string. Lazy repetition doesn't remove
    that ambiguity; it merely attempts to make the module prefer the match
    that you prefer.
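To make the ambiguity concrete, here is a small sketch comparing the two approaches. The patterns are simplified three-field versions written only for this illustration, and the four-field input is a hypothetical malformed line, not something from the OP's file.

```python
import re

# Simplified three-field patterns, for illustration only.
lazy = re.compile(r"^\[(\S+?)\]\[(\S+?)\]\[(\S+?)\]\s*=")
strict = re.compile(r"^\[([^\]]+)\]\[([^\]]+)\]\[([^\]]+)\]\s*=")

good = "[field1][field2][field3] = blorg"
odd = "[field1][field2][field3][field4] = blorg"  # hypothetical malformed line

# On well-formed input both patterns agree.
lazy_good = lazy.match(good).groups()
strict_good = strict.match(good).groups()

# On the malformed line the lazy pattern still finds a way to match,
# by letting the last group grow across a "][" boundary.
lazy_odd = lazy.match(odd)
strict_odd = strict.match(odd)

print(lazy_good, strict_good)
print(lazy_odd.groups() if lazy_odd else None)
print(strict_odd)
```

The lazy quantifier only changes which match is tried first; when the short match fails, the engine backtracks and extends the group anyway. The negated character class cannot cross a `]`, so the malformed line is rejected outright.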

    Other notes to the OP: Always use raw strings (r'') when writing
    regex patterns, to make sure the backslashes are escape characters in
    the pattern rather than in the string literal.
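A minimal sketch of why this matters, using `\b` as the example because it is a valid escape in both a normal string literal (backspace) and a regex (word boundary). The variable names are made up for the illustration.

```python
import re

plain = "\bcat\b"   # "\b" becomes a single backspace character in a normal string
raw = r"\bcat\b"    # backslash and "b" survive, so the regex sees a word boundary

text = "a cat sat"

plain_match = re.search(plain, text)  # looks for backspace + "cat" + backspace
raw_match = re.search(raw, text)      # looks for the whole word "cat"

print(len(plain), len(raw))
print(plain_match, raw_match)
```

The plain string is two characters shorter because each `\b` collapsed into one character before the `re` module ever saw it.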

    The '^foo|bar$' construct you're using is wonky. I think you're
    writing this to mean "match if the entire string is either 'foo' or
    'bar'". But what that actually matches is "anything that either
    starts with 'foo' or ends with 'bar'". The correct way to do the
    former would be either '^foo$|^bar$' or '^(?:foo|bar)$'.
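The difference can be seen in a short sketch like the following, which uses `re.search` so that both alternatives get a chance to match anywhere in the string. The word list and variable names are invented for the illustration.

```python
import re

loose = re.compile(r"^foo|bar$")           # "starts with foo" OR "ends with bar"
grouped = re.compile(r"^(?:foo|bar)$")     # entire string is foo or bar
spelled_out = re.compile(r"^foo$|^bar$")   # same meaning, anchors repeated

words = ["foo", "bar", "foolish", "crowbar", "bazaar"]

loose_hits = [w for w in words if loose.search(w)]
grouped_hits = [w for w in words if grouped.search(w)]
spelled_out_hits = [w for w in words if spelled_out.search(w)]

print(loose_hits)
print(grouped_hits)
print(spelled_out_hits)
```

Alternation has the lowest precedence in a regex, so without the group each anchor binds to only one side of the `|`.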
    Ian Kelly, Oct 11, 2011
    #4
  5. galyle

    galyle Guest

    Re: Question: Optional Regular Expression Grouping

    On Oct 10, 4:59 pm, Vlastimil Brom <> wrote:
    > [...]


    Thanks, I had a feeling that greedy matching in my expression was
    causing the problem. Your suggestion makes sense to me, and works quite
    well.
    galyle, Oct 11, 2011
    #5