A bug in Python's regular expression engine?

  • Thread starter Just Another Victim of the Ambient Morality

Just Another Victim of the Ambient Morality

This won't compile for me:


regex = re.compile('(.*\\).*')


I get the error:


sre_constants.error: unbalanced parenthesis


I'm running Python 2.5 on WinXP. I've tried this expression with
another RE engine in another language and it works just fine which leads me
to believe the problem is Python. Can anyone confirm or deny this bug?
Thank you...
 

Diez B. Roggisch

Just said:
This won't compile for me:


regex = re.compile('(.*\\).*')


I get the error:


sre_constants.error: unbalanced parenthesis


I'm running Python 2.5 on WinXP. I've tried this expression with
another RE engine in another language and it works just fine which leads me
to believe the problem is Python. Can anyone confirm or deny this bug?

It pretty much says what the problem is: in a normal string literal, \\ is a
single backslash, so the regex engine sees \) and treats the closing
parenthesis as escaped, resulting in an invalid regex.

Either use raw-strings or put the proper amount of backslashes in your
string:

regex = re.compile(r'(.*\\).*') # raw string literal

regex = re.compile('(.*\\\\).*') # \\\\ in the source is \\ in the string: one escaped backslash
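
A minimal sketch of why the two spellings above are interchangeable, assuming Python 3 (the thread itself used 2.5) and a made-up sample path:

```python
import re

raw = r'(.*\\).*'        # raw literal: backslashes reach the regex engine as typed
doubled = '(.*\\\\).*'   # normal literal: each \\ collapses to one backslash

assert raw == doubled    # both literals produce the same 8-character pattern
pattern = re.compile(raw)

sample = 'C:\\temp\\notes.txt'   # hypothetical path: C:\temp\notes.txt
match = pattern.match(sample)
print(match.group(1))            # the text up to and including the last backslash
```

Either form compiles; the raw string is usually easier to read because the pattern appears exactly as the regex engine will see it.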

Diez
 

Neil Cerutti

This won't compile for me:


regex = re.compile('(.*\\).*')

I get the error:
sre_constants.error: unbalanced parenthesis

Hint 1: Always assume that errors are in your own code. Blaming
library code and language implementations will get you nowhere
most of the time.

Hint 2: regular expressions and Python strings use the same
escape character.

Hint 3: Consult the Python documentation about raw strings, and
what they are meant for.
 

Paul Hankin

This won't compile for me:

regex = re.compile('(.*\\).*')

I get the error:

sre_constants.error: unbalanced parenthesis

I'm running Python 2.5 on WinXP. I've tried this expression with
another RE engine in another language and it works just fine which leads me
to believe the problem is Python. Can anyone confirm or deny this bug?

Your code is equivalent to:
regex = re.compile(r'(.*\).*')

Written like this, it's easier to see that you've started a regular
expression group with '(', but it's never closed since your closing
parenthesis is escaped (which causes it to match a literal ')' when
used). Hence the reported error (which isn't a bug).
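
The equivalence and the resulting error can be checked directly. This is a sketch assuming Python 3, where the exception is caught as re.error (the same class the 2.5 traceback reports as sre_constants.error); the exact message text varies between versions, so it is only printed, not compared:

```python
import re

original = '(.*\\).*'            # the literal from the question
assert original == r'(.*\).*'    # same 7 characters: ( . * \ ) . *
print(original)                  # (.*\).*

error_message = None
try:
    re.compile(original)
except re.error as exc:
    # The engine reads \) as a literal ")", so the "(" group is never closed.
    error_message = str(exc)

print('compile failed:', error_message)
```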

Perhaps you meant this?
regex = re.compile(r'(.*\\).*')

This matches any number of characters followed by a backslash (group
1), and then any number of characters. If you're using this to split
file paths under Windows, you should look at os.path.split
instead of writing your own.
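
To compare the two approaches, here is a rough sketch. It uses ntpath, the Windows implementation behind os.path, so that it behaves the same on any platform; the path is invented for illustration:

```python
import ntpath  # the Windows flavour of os.path, importable on any platform
import re

path = r'C:\projects\demo\readme.txt'   # hypothetical example path

# Regex approach from the thread: group 1 keeps the trailing backslash.
directory_re = re.match(r'(.*\\).*', path).group(1)

# Library approach: returns (head, tail) and drops the trailing separator.
head, tail = ntpath.split(path)

print(directory_re)
print(head, tail)
```

On Windows itself, os.path.split gives the same result as ntpath.split here, and it also copes with forward slashes, which the regex above does not.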

HTH
 

Just Another Victim of the Ambient Morality

Paul Hankin said:
Your code is equivalent to:
regex = re.compile(r'(.*\).*')

Written like this, it's easier to see that you've started a regular
expression group with '(', but it's never closed since your closing
parenthesis is escaped (which causes it to match a literal ')' when
used). Hence the reported error (which isn't a bug).

Perhaps you meant this?
regex = re.compile(r'(.*\\).*')

This matches any number of characters followed by a backslash (group
1), and then any number of characters. If you're using this to split
file paths under Windows, you should look at os.path.split
instead of writing your own.

Indeed, I did end up using os.path functions, instead.
I think I see what's going on. Backslash has special meaning in both
regular expressions and Python string literals. So, my version
should have been something like this:


regex = re.compile('(.*\\\\).*')


That is funny. Thank you for your help...
Just for clarification, what does the "r" in your code do?
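
For reference, the "r" prefix marks a raw string literal, in which Python leaves backslashes alone instead of treating them as the start of an escape sequence. A small sketch, assuming Python 3:

```python
plain = '\n'     # escape processed: a single newline character
raw = r'\n'      # raw literal: a backslash followed by the letter n

print(len(plain), len(raw))   # 1 2
assert raw == '\\n'           # a raw literal is just another spelling of the same string

# The same rule applied to the pattern from this thread:
assert r'(.*\\).*' == '(.*\\\\).*'
```

The prefix only changes how the source text is read; the resulting object is an ordinary string, which is why the raw and doubled-backslash patterns behave identically.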
 
