Help with regex and optional substring in search string

Timur Tabi

I'm having trouble creating a regex pattern that matches a string that
has an optional substring in it. What I'm looking for is a pattern
that matches both of these strings:

Subject: [PATCH 08/18] This is the patch name
Subject: This is the patch name

What I want is to extract the "This is the patch name". I tried this:

m = re.search('Subject:\s*(\[[\w\s]*\])*(.*)', x)

Unfortunately, the second group appears to be too greedy, and returns
this:
print m.group(1)
None
print m.group(2)
[PATCH 08/18] This is the patch name

Can anyone help me? I'd hate to have to use two regex patterns, one
with the [...] and one without.
 
Timur Tabi

Never mind ... I figured it out. The middle block should have been
[\w\s/]*
 
Zero Piraeus

2009/10/14 Timur Tabi said:
Unfortunately, the second group appears to be too greedy, and returns
this:
[PATCH 08/18] This is the patch name

It's not that the second group is too greedy. The first group isn't
matching what you want it to, because neither \w nor \s match the "/"
inside your brackets. This works for your example input:

>>> import re
>>> pattern = re.compile(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)")
>>> for s in (
...     "Subject: [PATCH 08/18] This is the patch name",
...     "Subject: This is the patch name",
... ):
...     pattern.search(s).group(1)
...
'This is the patch name'
'This is the patch name'
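
To see the diagnosis above in action, here is a small sketch that runs the original pattern from the question (written as a raw string, which is my change) and inspects both groups. The variable names are only illustrative.

```python
import re

# The pattern from the question, unchanged apart from the r"" prefix.
original = re.compile(r'Subject:\s*(\[[\w\s]*\])*(.*)')

m = original.search("Subject: [PATCH 08/18] This is the patch name")

# [\w\s]* stops at the "/", so \] cannot match there and the bracketed
# group falls back to zero repetitions. (.*) then takes everything.
bracket_group = m.group(1)
rest = m.group(2)

print(bracket_group)  # None
print(rest)           # [PATCH 08/18] This is the patch name
```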

Going through the changes from your original regex in order:

'(?:etc)' instead of '(etc)' are non-grouping parentheses (since you
apparently don't care about that bit).

'[^\]]' instead of '[\w\s]' matches "everything except a closing bracket".

The '\s*' before the second set of parentheses takes out the leading
whitespace that would otherwise be returned as part of the match.
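
The effect of each change can be checked separately. The sketch below compares three variants of the pattern on the sample line; the variable names are mine and are not part of the original code.

```python
import re

line = "Subject: [PATCH 08/18] This is the patch name"

# Without the extra \s*, the space after "]" stays in the captured text.
no_trim = re.search(r"Subject:\s*(?:\[[^\]]*\])?(.*)", line)

# With it, the capture starts at the first non-space character.
trimmed = re.search(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)", line)

# With ordinary capturing parentheses the bracketed part becomes
# group 1 and the patch name moves to group 2.
capturing = re.search(r"Subject:\s*(\[[^\]]*\])?\s*(.*)", line)

print(repr(no_trim.group(1)))  # ' This is the patch name'
print(repr(trimmed.group(1)))  # 'This is the patch name'
print(capturing.groups())      # ('[PATCH 08/18]', 'This is the patch name')
```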

-[]z.
 
Zero Piraeus

2009/10/14 Timur Tabi said:
Never mind ... I figured it out.  The middle block should have been
[\w\s/]*

This is fragile - you'll have to keep adding extra characters to match
if the input turns out to contain them.
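
As an illustration of the fragility, here is a sketch using a made-up subject line (the "v2," tag is my assumption, not something from the thread) that contains a comma inside the brackets. The whitelist class misses it, while the negated class does not care what is inside.

```python
import re

# Hypothetical input: the comma is not covered by [\w\s/].
line = "Subject: [PATCH v2, 08/18] This is the patch name"

whitelist = re.compile(r"Subject:\s*(?:\[[\w\s/]*\])?\s*(.*)")
negated = re.compile(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)")

whitelist_result = whitelist.search(line).group(1)
negated_result = negated.search(line).group(1)

print(whitelist_result)  # [PATCH v2, 08/18] This is the patch name
print(negated_result)    # This is the patch name
```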

-[]z.
 
Timur Tabi

Zero Piraeus said:
'(?:etc)' instead of '(etc)' are non-grouping parentheses (since you
apparently don't care about that bit).

Ah yes, thanks.

Zero Piraeus said:
'[^\]]' instead of '[\w\s]' matches "everything except a closing bracket".

I originally had just '[^\]', and I couldn't figure out why it
wouldn't work. Maybe I need new glasses.

Zero Piraeus said:
The '\s*' before the second set of parentheses takes out the leading
whitespace that would otherwise be returned as part of the match.

And I want that. The next line of my code is:

description = m.group(2).strip() + "\n\n"
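
For what it's worth, the two approaches combine without trouble. The sketch below wraps the suggested pattern in a helper; the name extract_description and the None return for non-matching lines are my assumptions. Note that with the non-capturing version the text is in group(1) rather than group(2), and strip() still handles any trailing whitespace.

```python
import re

SUBJECT_RE = re.compile(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)")

def extract_description(line):
    """Return the patch name followed by a blank line, or None."""
    m = SUBJECT_RE.search(line)
    if m is None:
        # re.search returns None when the line has no Subject: header.
        return None
    # The \s* in the pattern removes leading whitespace; strip() also
    # removes anything trailing.
    return m.group(1).strip() + "\n\n"

print(repr(extract_description("Subject: [PATCH 08/18] This is the patch name  ")))
print(repr(extract_description("Subject: This is the patch name")))
```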
 
