Problem with RE matching backslash

  • Thread starter Ladvánszky Károly

Ladvánszky Károly

What is the correct way to match/search a backslash with regular
expressions?

print re.match('\\m', '\\m').group(0) raises an error while
print re.search('\\m', '\\m').group(0) yields 'm'

print re.search('\\m', '\m').group(0) yields 'm'
print re.search('\\m', 'm').group(0) yields 'm'

Any helpful comment on this would be appreciated,

Károly
 
Ladvánszky Károly

Thanks for your quick and very helpful reply, Peter.
What is still not clear to me is why the search examples work and yield
'm'.

Károly
 
Peter Otten

Ladvánszky Károly said:
What is the correct way to match/search a backslash with regular
expressions?

print re.match('\\m', '\\m').group(0) raises an error while
print re.search('\\m', '\\m').group(0) yields 'm'

print re.search('\\m', '\m').group(0) yields 'm'
print re.search('\\m', 'm').group(0) yields 'm'

Any helpful comment on this would be appreciated,

The backslash must be escaped twice: once for the Python string and once for
the regular expression:
\m

You can use raw strings to somewhat reduce the number of backslashes (but
beware of backslashes at the end of the string literals):
\m
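
A minimal sketch of the two spellings described above, written for Python 3 (the thread itself uses Python 2 print statements). The subject string is a backslash followed by 'm'. The re.escape call is an extra that the reply does not mention; it is included only as one assumed convenient way to build the escaped pattern.

```python
import re

subject = '\\m'            # two characters: a backslash and 'm'

# Plain literal: the backslash is escaped once for Python and once for the regex.
plain = re.match('\\\\m', subject)

# Raw literal: only the regex-level escaping remains.
raw = re.match(r'\\m', subject)

# re.escape derives the escaped pattern from the literal text.
escaped = re.match(re.escape(subject), subject)

print(plain.group(0), raw.group(0), escaped.group(0))
```

A raw literal cannot end in an odd number of backslashes, so a pattern for a lone backslash is written r'\\' rather than r'\'.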


Peter
 
Jeff Epler

Ladvánszky Károly said:
Thanks for your quick and very helpful reply, Peter.
What is still not clear to me is why the search examples work and yield
'm'.

Because the first pattern, which you wrote as "\\m", will match a lone
'm'. So 'match' fails, because m is not in the first position. But
'search' succeeds at index 1.
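
The difference between the two calls can be sketched as follows. This assumes Python 3, where an unknown letter escape such as \m in a pattern is rejected, so the sketch uses the pattern 'm' directly, which is what the "\\m" pattern amounted to on the Python 2.3 used in this thread.

```python
import re

subject = '\\m'                 # backslash at index 0, 'm' at index 1

# match() only tries the pattern at the start of the string.
at_start = re.match('m', subject)

# search() scans forward and reports the first position that matches.
anywhere = re.search('m', subject)

print(at_start)                               # None
print(anywhere.start(), anywhere.group(0))    # 1 m
```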

Jeff
 
Peter Otten

Jeff said:
Because the first pattern, which you wrote as "\\m", will match a lone
'm'. So 'match' fails, because m is not in the first position. But
'search' succeeds at index 1.

Jeff

This was my first guess, but it took me a while to find it documented.
I will reproduce the tiny relevant paragraph from

http://www.python.org/doc/current/lib/re-syntax.html

"The special sequences consist of "\" and a character from the list below.
If the ordinary character is not on the list, then the resulting RE will
match the second character. For example, \$ matches the character "$". "
[description of characters with special meaning omitted]

I don't know if this list of characters is likely to grow, but I would
definitely prefer a "bogus escape" exception:
<_sre.SRE_Pattern object at 0x40289660>

versus
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/local/lib/python2.3/sre.py", line 179, in compile
return _compile(pattern, flags)
File "/usr/local/lib/python2.3/sre.py", line 229, in _compile
raise error, v # invalid expression
sre_constants.error: bogus escape: '\\1'
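
For comparison, a sketch of how this plays out on Python 3, where (to my knowledge, since 3.6) an unknown escape made of a backslash and an ASCII letter in a pattern raises re.error instead of silently matching the letter. The helper name compiles is made up for this illustration.

```python
import re

def compiles(pattern):
    """Return True if the pattern compiles, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r'\$'))   # escaped punctuation is still accepted
print(compiles(r'\m'))   # unknown letter escape
print(compiles(r'\1'))   # back-reference to a group that does not exist
```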


Peter
 
Inyeol Lee

Ladvánszky Károly said:
Thanks for your quick and very helpful reply, Peter.
What is still not clear to me is why the search examples work and yield
'm'.

Károly

Python and SRE have different conventions for handling unrecognized
escape sequences, such as "\\m".
From Section 2.4.1, String literals, in the Python 2.3.3 Reference Manual:
"""
Unlike Standard C, all unrecognized escape sequences are left in the
string unchanged, i.e., the backslash is left in the string. (This
behavior is useful when debugging: if an escape sequence is
mistyped, the resulting output is more easily recognized as broken.)
"""
From Section 4.2.1, Regular Expression Syntax, in the Python 2.3.3 Library Reference:
"""
The special sequences consist of "\" and a character from the list
below. If the ordinary character is not on the list, then the resulting
RE will match the second character. For example, \$
matches the character "$".
"""

So your third example

print re.search('\\m', '\m').group(0)

is actually the same as

print re.search('m', '\\m').group(0)

and thus returns 'm'.
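
A small check of the string-literal half of this explanation, assuming Python 3. The raw spelling is used because recent Python versions warn about an unrecognized escape like \m in a non-raw literal, although they still leave the backslash in place.

```python
import re

a = r'\m'      # raw literal: backslash kept as typed
b = '\\m'      # escaped backslash in a normal literal

# Both spellings produce the same two-character string.
print(a == b, len(a), list(a))

# Searching that string for a plain 'm' finds it after the backslash.
found = re.search('m', a)
print(found.group(0), found.start())
```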

-Inyeol Lee
 
