escapes in regular expressions

James Thiele

I was helping a guy at work with regular expressions and found
something I didn't expect:

It's not clear to me why these are the same. Could someone please
explain?
 
Dennis Benzinger

James said:
I was helping a guy at work with regular expressions and found
something I didn't expect:



'7'

'\d' is not recognized as an escape sequence by Python and therefore it is left in the string unchanged.
'7'
[...]

This is the correct version. The first backslash escapes the second one
and this version will work even if a future version of Python recognizes
\d as an escape sequence.
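
A minimal sketch of the doubled-backslash spelling in use. The sample text "room 7" is an assumption chosen for illustration and is not taken from the original post.

```python
import re

# "\\d" in source code is a two-character string: a backslash followed by "d".
pattern = "\\d"
print(len(pattern))  # 2

# The regex engine receives \d and interprets it as "any digit".
match = re.search(pattern, "room 7")
print(match.group())  # 7
```

Because the backslash is escaped explicitly, the result does not depend on how Python treats unrecognized escape sequences.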


Dennis
 
Paul McGuire

James Thiele said:
I was helping a guy at work with regular expressions and found
something I didn't expect:


It's not clear to me why these are the same. Could someone please
explain?

This is not a feature of regexps at all, but of Python strings. If a
backslash precedes a character that does not form a recognized escape
sequence, the backslash is kept in the string as a literal backslash. Look
at this sample from the Python command line:
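
As an illustration of that point (a sketch, not the session from the original post), the following shows what such a string actually contains. The backslash is doubled here so the example does not itself rely on the behavior being discussed.

```python
s = "\\d"        # one backslash followed by "d"

print(len(s))    # 2
print(s)         # \d
print(repr(s))   # '\\d'
print(list(s))   # ['\\', 'd']
```

Note that repr(), which the interactive prompt uses to echo values, always shows a backslash doubled, while print shows the single backslash that the string really holds.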

This is one reason why Python programmers who use regexps use the "raw"
notation to create strings (the result is often called a "raw string", which
is a misnomer: the resulting string is an ordinary string in every respect.
What is "raw" about it is that escape processing is disabled for any
backslash that is not the last character in the string). It is painful
enough to litter your regexp with backslashes, just because you have the
misfortune of having to match a '.', '+', '?', '*', or brackets or
parentheses in your expression, without having to double up the backslashes
for escaping purposes. Consider these sample statements:
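
A short sketch of the difference, assuming a pattern that matches a decimal number. Both the pattern and the sample text are hypothetical, picked only to show how the backslashes pile up.

```python
import re

plain = "\\d+\\.\\d+"   # ordinary literal: every backslash doubled
raw = r"\d+\.\d+"       # raw literal: each backslash written once

# Both literals produce the same ordinary string.
print(plain == raw)     # True

# Hypothetical sample text, chosen only for illustration.
print(re.findall(raw, "v1.5 and v10.25"))  # ['1.5', '10.25']
```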

So your question is really a string question - you just happened to trip
over it while defining a regexp.

-- Paul
 
Dennis Lee Bieber

I was helping a guy at work with regular expressions and found
something I didn't expect:


It's not clear to me why these are the same. Could someone please
explain?

Hypothesis:

\d is first matched against the list of recognized escapes (\t, \r,
\n, etc.) and is not found; therefore, in kindness, it defaults to being
parsed as \ and d.

\\d is also first matched against the list of recognized escapes. \\ is
the escape for a literal \, which is what you need when the following
character is one of t, r, n, etc., so this one is parsed as \\ and d,
giving \ and d.
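
The hypothesis can be sketched as a toy decoder. This is not how CPython is implemented; decode_escapes and the RECOGNIZED table are hypothetical names, and the table holds only a handful of the real escape sequences.

```python
# Simplified table: only a few of Python's recognized escapes.
RECOGNIZED = {"t": "\t", "r": "\r", "n": "\n", "\\": "\\"}

def decode_escapes(source):
    """Decode the body of a string literal using the simplified rule."""
    out = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\\" and i + 1 < len(source):
            nxt = source[i + 1]
            if nxt in RECOGNIZED:
                out.append(RECOGNIZED[nxt])  # recognized: substitute
            else:
                out.append("\\" + nxt)       # unrecognized: keep both characters
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# Raw literals are used so the decoder sees the text exactly as typed.
print(decode_escapes(r"\d") == decode_escapes(r"\\d"))  # True
```

Both inputs decode to the same two characters, a backslash and a "d", which is the result described in the question.
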
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
Heiko Wundram

On Sunday, 21 May 2006, at 19:49, James Thiele wrote:
\d

'\d' evaluates to \d, because \d is not a valid escape sequence. '\n' evaluates
to a newline, because \n is a valid escape sequence. '\\' evaluates to \,
because \\ is a valid escape sequence.
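
These three cases can be checked by evaluating the literal source text directly. This is a sketch: evaluate is a hypothetical helper, and it suppresses the warning that recent Python versions emit for unrecognized escapes such as \d. A later version may reject such literals outright, in which case only the doubled form would still work.

```python
import ast
import warnings

def evaluate(literal_source):
    """Return the str produced by a piece of string-literal source text."""
    with warnings.catch_warnings():
        # Recent Python versions warn about unrecognized escapes such as \d.
        warnings.simplefilter("ignore")
        return ast.literal_eval(literal_source)

# Each item is the source text of a literal, quotes included.
for source in (r"'\d'", r"'\n'", r"'\\'"):
    value = evaluate(source)
    print(source, "->", len(value), "character(s)")
```

Only the first literal yields two characters; the other two each collapse to a single character.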

--- Heiko.
 
