Help with regex and optional substring in search string

Discussion in 'Python' started by Timur Tabi, Oct 14, 2009.

  1. Timur Tabi

    Timur Tabi Guest

    I'm having trouble creating a regex pattern that matches a string that
    has an optional substring in it. What I'm looking for is a pattern
    that matches both of these strings:

    Subject: [PATCH 08/18] This is the patch name
    Subject: This is the patch name

    What I want is to extract the "This is the patch name". I tried this:

    m = re.search('Subject:\s*(\[[\w\s]*\])*(.*)', x)

    Unfortunately, the second group appears to be too greedy, and returns
    this:

    >>> print m.group(1)
    None
    >>> print m.group(2)
    [PATCH 08/18] Subject line
    >>>


    Can anyone help me? I'd hate to have to use two regex patterns, one
    with the [...] and one without.
     
    Timur Tabi, Oct 14, 2009
    #1

  2. Timur Tabi

    Timur Tabi Guest

    On Oct 14, 9:51 am, Timur Tabi wrote:
    > I'm having trouble creating a regex pattern that matches a string that
    > has an optional substring in it.  What I'm looking for is a pattern
    > that matches both of these strings:
    >
    > Subject: [PATCH 08/18] This is the patch name
    > Subject: This is the patch name
    >
    > What I want is to extract the "This is the patch name".  I tried this:
    >
    > m = re.search('Subject:\s*(\[[\w\s]*\])*(.*)', x)


    Never mind ... I figured it out. The middle block should have been
    [\w\s/]*
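
The effect of that one-character change can be checked with a short sketch (Python 3 syntax; the names ORIGINAL and WITH_SLASH are only labels chosen here, the patterns themselves are the ones from the thread). It suggests that once "/" is allowed inside the brackets, the first group picks up the tag and the second group is left with the patch name plus a leading space:

```python
import re

# Both patterns come from the thread; only the character class differs.
ORIGINAL = r"Subject:\s*(\[[\w\s]*\])*(.*)"
WITH_SLASH = r"Subject:\s*(\[[\w\s/]*\])*(.*)"

line = "Subject: [PATCH 08/18] This is the patch name"

before = re.search(ORIGINAL, line)
after = re.search(WITH_SLASH, line)

# Without "/" in the class, the bracketed group cannot reach the "]",
# so it matches zero times and (.*) swallows the whole tag.
print(repr(before.group(1)), repr(before.group(2)))

# With "/" allowed, the tag lands in group 1 and group 2 keeps a leading space.
print(repr(after.group(1)), repr(after.group(2)))
```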
     
    Timur Tabi, Oct 14, 2009
    #2

  3. Zero Piraeus

    Zero Piraeus Guest

    :

    2009/10/14 Timur Tabi wrote:
    > I'm having trouble creating a regex pattern that matches a string that
    > has an optional substring in it.  What I'm looking for is a pattern
    > that matches both of these strings:
    >
    > Subject: [PATCH 08/18] This is the patch name
    > Subject: This is the patch name
    >
    > What I want is to extract the "This is the patch name".  I tried this:
    >
    > m = re.search('Subject:\s*(\[[\w\s]*\])*(.*)', x)
    >
    > Unfortunately, the second group appears to be too greedy, and returns
    > this:
    >
    > >>> print m.group(1)
    > None
    > >>> print m.group(2)
    > [PATCH 08/18] Subject line


    It's not that the second group is too greedy. The first group isn't
    matching what you want it to, because neither \w nor \s matches the "/"
    inside your brackets. This works for your example input:

    >>> import re
    >>> pattern = re.compile(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)")
    >>> for s in (
    ...     "Subject: [PATCH 08/18] This is the patch name",
    ...     "Subject: This is the patch name",
    ... ):
    ...     pattern.search(s).group(1)
    ...
    'This is the patch name'
    'This is the patch name'

    Going through the changes from your original regex in order:

    '(?:etc)' instead of '(etc)' are non-grouping parentheses (since you
    apparently don't care about that bit).

    '[^\]]' instead of '[\w\s]' matches "everything except a closing bracket".

    The '\s*' before the second set of parentheses takes out the leading
    whitespace that would otherwise be returned as part of the match.
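
As a rough illustration of the first and last of those changes, here is a sketch in Python 3 (the variable names are made up for this example). The "(?:...)" form still groups the bracketed tag so that "?" applies to all of it, but it is not counted as a capturing group, which is why the patch name moves from group 2 to group 1. Dropping the trailing '\s*' leaves the space after the tag in the captured text:

```python
import re

capturing = re.compile(r"Subject:\s*(\[[^\]]*\])?\s*(.*)")
non_capturing = re.compile(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)")
# Same as non_capturing, but without the \s* before the final group.
no_trailing_ws = re.compile(r"Subject:\s*(?:\[[^\]]*\])?(.*)")

line = "Subject: [PATCH 08/18] This is the patch name"

# Pattern.groups is the number of capturing groups in the pattern.
print(capturing.groups, non_capturing.groups)

print(capturing.search(line).groups())
print(non_capturing.search(line).groups())

# The space between "]" and the patch name is kept here.
print(repr(no_trailing_ws.search(line).group(1)))
```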

    -[]z.
     
    Zero Piraeus, Oct 14, 2009
    #3
  4. Zero Piraeus

    Zero Piraeus Guest

    :

    2009/10/14 Timur Tabi wrote:
    > Never mind ... I figured it out.  The middle block should have been
    > [\w\s/]*


    This is fragile - you'll have to keep adding extra characters to match
    if the input turns out to contain them.
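
A small sketch of that fragility, in Python 3. The subject line below is hypothetical: it is not from the thread, and the comma inside the tag is an assumption chosen only because it falls outside [\w\s/]. The names NARROW and NEGATED are labels for this example:

```python
import re

NARROW = re.compile(r"Subject:\s*(\[[\w\s/]*\])*(.*)")
NEGATED = re.compile(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)")

# Hypothetical input: the comma is not covered by [\w\s/].
line = "Subject: [PATCH v2, 08/18] This is the patch name"

# The whitelist class stops at the comma, so the tag leaks into the result.
narrow_result = NARROW.search(line).group(2).strip()

# "Anything except a closing bracket" does not care what the tag contains.
negated_result = NEGATED.search(line).group(1)

print(repr(narrow_result))
print(repr(negated_result))
```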

    -[]z.
     
    Zero Piraeus, Oct 14, 2009
    #4
  5. Timur Tabi

    Timur Tabi Guest

    On Wed, Oct 14, 2009 at 10:30 AM, Zero Piraeus wrote:

    > '(?:etc)' instead of '(etc)' are non-grouping parentheses (since you
    > apparently don't care about that bit).


    Ah yes, thanks.

    > '[^\]]' instead of '[\w\s]' matches "everything except a closing bracket".


    I originally had just '[^\]', and I couldn't figure out why it
    wouldn't work. Maybe I need new glasses.

    > The '\s*' before the second set of parentheses takes out the leading
    > whitespace that would otherwise be returned as part of the match.


    And I want that. The next line of my code is:

    description = m.group(2).strip() + "\n\n"
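
One way that line might be wrapped up, as a sketch only (Python 3). The helper name extract_description and the constant SUBJECT_RE are hypothetical, not from the thread. It assumes the non-capturing pattern suggested earlier, so the patch name is in group 1 rather than group 2, and it returns None when the line has no "Subject:" header instead of failing on m.group():

```python
import re

# Hypothetical names; the pattern is the non-capturing variant from the thread.
SUBJECT_RE = re.compile(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)")


def extract_description(line):
    """Return the patch name plus a blank line, or None if nothing matched."""
    m = SUBJECT_RE.search(line)
    if m is None:
        return None
    # strip() is still useful for trailing whitespace, even though the
    # pattern already skips whitespace before the patch name.
    return m.group(1).strip() + "\n\n"


print(repr(extract_description("Subject: [PATCH 08/18] This is the patch name")))
print(repr(extract_description("Subject: This is the patch name  ")))
print(repr(extract_description("From: someone")))
```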

    --
    Timur Tabi
    Linux kernel developer at Freescale
     
    Timur Tabi, Oct 14, 2009
    #5
