Need help in Python regular expression


meryl

Hi,

I have this regular expression
blockRE = re.compile(".*RenderBlock {\w+}")

it works if my source is "RenderBlock {CENTER}".

But I want it to work with
1. RenderTable {TABLE}

So I changed the regexp to re.compile(".*Render[Block|Table] {\w+}"),
but that breaks everything.

2. RenderBlock (CENTER)

So I changed the regexp to re.compile(".*RenderBlock {|\(\w+}|\)"),
and that also breaks everything.

Can you please tell me how to change my regexp so that it supports
all 3 cases:
RenderTable {TABLE}
RenderBlock (CENTER)
RenderBlock {CENTER}

Thank you.
 

Mark Tolonen

The [abcd] syntax matches a single character from the set, so
[Block|Table] matches just one of those characters. Use non-capturing
parentheses, (?:...), instead:

-----------------------code----------------------
import re
pat = re.compile(r'Render(?:Block|Table) (?:\(\w+\)|{\w+})')

testdata = '''\
RenderTable {TABLE}
RenderBlock (CENTER)
RenderBlock {CENTER}
RenderTable {TABLE) #shouldn't match
'''

print(pat.findall(testdata))
---------------------------------------------------

Result:

['RenderTable {TABLE}', 'RenderBlock (CENTER)', 'RenderBlock {CENTER}']

-Mark
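
To make the failure of the two original attempts concrete, here is a minimal sketch (the variable names and the "RenderB" test string are made up for illustration). It shows that a character class consumes exactly one character, and that an unparenthesized | splits the entire pattern into separate alternatives:

```python
import re

# The class [Block|Table] stands for exactly ONE character out of
# B, l, o, c, k, |, T, a, b, e, so "RenderBlock" cannot match as a whole.
broken_class = re.compile(r"Render[Block|Table] \{\w+\}")
print(broken_class.search("RenderBlock {CENTER}"))  # None
print(broken_class.search("RenderB {CENTER}"))      # a match: one class character

# An unparenthesized | splits the WHOLE pattern into three alternatives:
#   ".*RenderBlock {"   or   "\(\w+}"   or   "\)"
broken_alt = re.compile(r".*RenderBlock {|\(\w+}|\)")
first = broken_alt.match("RenderBlock {CENTER}").group()
print(repr(first))  # 'RenderBlock {'
```

Wrapping each set of alternatives in (?:...) limits the | to that part of the pattern, which is what the pattern above does.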
 

John S

Short answer:

import re

r = re.compile(r"Render(?:Block|Table)\s+[({](?:TABLE|CENTER)[})]")

s = """
blah blah blah
blah blah blah RenderBlock {CENTER} blah blah RenderBlock {CENTER}
blah blah blah RenderTable {TABLE} blah blah RenderBlock (CENTER)
blah blah blah
"""

print(r.findall(s))



output:
['RenderBlock {CENTER}', 'RenderBlock {CENTER}', 'RenderTable {TABLE}', 'RenderBlock (CENTER)']



Note that [] only encloses characters, not strings; [foo|bar] matches
'f', 'o', '|', 'b', 'a', or 'r', not "foo" or "bar".
Use (foo|bar) to match "foo" or "bar"; (?:xxx) matches xxx without
creating a capturing group (i.e., without capturing text for a
backreference).

HTH

-- John Strickler
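
The difference between (foo|bar) and (?:foo|bar) matters in practice with findall(), which returns the captured group rather than the whole match when the pattern contains one. The sketch below illustrates that, and also shows that using character classes for the brackets does not pair them up. All names and test strings here are illustrative, not taken from the posts above:

```python
import re

text = "RenderBlock {CENTER} and RenderTable {TABLE}"

# Capturing group: findall() returns only what the group captured.
capturing = re.compile(r"Render(Block|Table) \{\w+\}")
print(capturing.findall(text))      # ['Block', 'Table']

# Non-capturing group (?:...): findall() returns each whole match.
non_capturing = re.compile(r"Render(?:Block|Table) \{\w+\}")
print(non_capturing.findall(text))  # ['RenderBlock {CENTER}', 'RenderTable {TABLE}']

# [({] and [})] each match one bracket independently, so a mismatched
# "{...)" is accepted; spelling out both bracket forms with | rejects it.
loose = re.compile(r"Render(?:Block|Table)\s+[({]\w+[})]")
strict = re.compile(r"Render(?:Block|Table)\s+(?:\(\w+\)|\{\w+\})")
mismatched = "RenderTable {TABLE)"
print(bool(loose.search(mismatched)))   # True
print(bool(strict.search(mismatched)))  # False
```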
 

meryl

Thanks to both of you for your help. How can I modify the regexp so that
both
RenderTable {TABLE}
and
RenderTable {TABLE} [text with a-zA-Z=SPACE0-9]
will match?

I tried adding ".*" at the end, but it ends up matching only the second
one.

Thanks again.
 

Vlastimil Brom

2009/6/12 meryl said:
I try adding ".*" at the end , but it ends up just matching the second
one.

If there can be more than one match in a line, the non-greedy
quantifier ".*?" and a lookahead assertion may help.
You can try something like:
(?m)Render(?:Block|Table) (?:\(\w+\)|{\w+})(.+?(?=$|RenderBlock))?

(?m) multiline flag: $ also matches at the end of each line
.+? any character, one or more times (non-greedy, i.e. as few as possible)
(?=$|RenderBlock) the lookahead assertion: a condition on the text that
follows, which is not part of the match; here the end of the line/string
or "RenderBlock"

I guess, if you need to add more possibilities or conditions depending
on your source data, it might get too complex for a single regular
expression to match effectively.

hth
vbr
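
A runnable sketch of this idea follows. It differs from the pattern above in two ways, both of which are assumptions rather than part of the original suggestion: the trailing group is non-capturing so that findall() returns whole matches, and the lookahead stops at either "RenderBlock" or "RenderTable". The sample text is invented for illustration.

```python
import re

pat = re.compile(
    r"Render(?:Block|Table) (?:\(\w+\)|\{\w+\})"   # the block itself
    r"(?:.+?(?=$|Render(?:Block|Table)))?",         # optional trailing text
    re.MULTILINE,                                   # let $ match at each line end
)

sample = (
    "RenderTable {TABLE}\n"
    "RenderTable {TABLE} [width=100 height=20]\n"
    "RenderBlock {CENTER} [a=1] RenderBlock (CENTER)\n"
)

matches = pat.findall(sample)
for m in matches:
    print(repr(m))
# 'RenderTable {TABLE}'
# 'RenderTable {TABLE} [width=100 height=20]'
# 'RenderBlock {CENTER} [a=1] '
# 'RenderBlock (CENTER)'
```

Note that the third match keeps the space before the next "RenderBlock", because the lazy .+? runs right up to the point where the lookahead succeeds.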
 

Jean-Michel Pichavant

To the OP,

If you don't have Kodos yet, I suggest getting it here:
http://kodos.sourceforge.net/
It's a Python regexp debugger, a lifesaver.

Jean-Michel

 

Rhodri James

meryl said:
Thanks to both of you for your help. How can I modify the regexp so that
both
RenderTable {TABLE}
and
RenderTable {TABLE} [text with a-zA-Z=SPACE0-9]
will match?

I tried adding ".*" at the end, but it ends up matching only the second
one.

Curious, it should work (and match rather more than you want, but
that's another matter). Try adding this instead:

'(?: \[[a-zA-Z= 0-9]*\])?'

Personally I'd replace all those spaces with \s* or \s+, but I'm
paranoid when it comes to whitespace.
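
Putting the optional suffix and the \s+ advice together might look like the sketch below. The base pattern is the one posted earlier in the thread; the sample lines and variable names are invented for illustration, and using \s+ before the bracketed text is one possible reading of the whitespace advice.

```python
import re

pat = re.compile(
    r"Render(?:Block|Table)\s+(?:\(\w+\)|\{\w+\})"  # base pattern, \s+ for the space
    r"(?:\s+\[[a-zA-Z= 0-9]*\])?"                     # optional " [text]" suffix
)

lines = [
    "RenderTable {TABLE}",
    "RenderTable {TABLE} [width=100 border=0]",
    "RenderTable   {TABLE}   [x=1]",
]
results = [pat.findall(line) for line in lines]
for found in results:
    print(found)

# For comparison, a greedy ".*" swallows the rest of the line, so two
# blocks on one line come back as a single oversized match.
greedy = re.compile(r"Render(?:Block|Table) (?:\(\w+\)|\{\w+\}).*")
print(greedy.findall("RenderBlock {CENTER} blah RenderBlock (CENTER)"))
```

Each of the three sample lines is matched in full, with or without the bracketed suffix, while the greedy variant returns the whole final line as one match.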
 
