A n00b regex question

nuffnough

I am doing a string.replace in a simple table generation app I wrote,
and I can't figure out how to match whitespace with \s, so I thought
I would see if someone here would be kind enough to tell me what I am
getting wrong.


This works:

string = string.replace('<tr>\n <th class="table">Field One</th>\n <td>%FieldOneValue%</td>\n </tr>', '')


You can see I had to actually put in space characters and linefeeds
exactly as they are in the string.

I tried this:

string = string.replace('<tr>\s*<th class="table">Field One</th>\s*<td>%FieldOneValue%</td>\s*</tr>', '')


But this doesn't work. The doco for Python's regex suggests that \s
should match any whitespace including newlines which is what I
wanted, but just in case, I also tried this:

string = string.replace('<tr>\n\s*<th class="table">Field One</th>\n\s*<td>%FieldOneValue%</td>\n\s*</tr>', '')


Any help explaining why these are not working would be greatly
appreciated.

TIA

nuffi
 
Bruno Desthuilliers

(e-mail address removed) wrote:
I am doing a string.replace in a simple table generation app I wrote,
and I can't figure out how to match whitespace with \s,

Ahem... Where did you get the idea that str.replace would work with
regexps?

"""
replace(...)
S.replace (old, new[, count]) -> string

Return a copy of string S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
"""

See any mention of regexps here ?-)
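
To make the docstring above concrete, here is a small sketch of what str.replace does with a regex-looking argument. The sample row and its exact whitespace are made up for illustration; only the tag text comes from the original post.

```python
# Hypothetical sample row; the whitespace layout is assumed for illustration.
row = '<tr>\n  <th class="table">Field One</th>\n  <td>%FieldOneValue%</td>\n</tr>'

# str.replace searches for the literal characters backslash, "s" and "*".
# They do not occur in the row, so nothing is replaced.
unchanged = row.replace(r'<tr>\s*<th', '')

# A literal substring that really is present does get replaced.
changed = row.replace('<tr>\n  <th', '')

print(unchanged == row)  # True: the regex-style argument matched nothing
print(changed == row)    # False: the literal argument matched
```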


so I thought
I would see if someone here would be kind enough to tell me what I am
getting wrong.


This works:

string = string.replace('<tr>\n <th class="table">Field One</th>\n <td>%FieldOneValue%</td>\n </tr>', '')


You can see I had to actually put in space characters and linefeeds
exactly as they are in the string.

Indeed. That's consistent with the doc.

I tried this:

string = string.replace('<tr>\s*<th class="table">Field One</th>\s*<td>%FieldOneValue%</td>\s*</tr>', '')


But this doesn't work.

It works if you have *literally* the first arg in your string !-)

The doco for Python's regex suggests that \s
should match any whitespace including newlines which is what I
wanted,

So why do you insist on using str.replace instead of re.sub ?-)

but just in case, I also tried this:

string = string.replace('<tr>\n\s*<th class="table">Field One</th>\n\s*<td>%FieldOneValue%</td>\n\s*</tr>', '')


Any help explaining why these are not working would be greatly
appreciated.

import re
pat = r'<tr>\s*<th class="table">Field One</th>\s*<td>' \
+ r'%FieldOneValue%</td>\s*</tr>'
string = re.sub(pat, '', string)
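
As a quick check of the re.sub approach, the sketch below runs the same pattern against two made-up fragments, one with no whitespace between the tags and one with newlines, spaces and a tab. The fragment contents and the names compact and spread are assumptions for illustration, not taken from the original app.

```python
import re

# Same pattern as above, written as adjacent raw-string literals.
pat = (r'<tr>\s*<th class="table">Field One</th>\s*'
       r'<td>%FieldOneValue%</td>\s*</tr>')

# Hypothetical table fragments with different whitespace layouts.
compact = ('<table><tr><th class="table">Field One</th>'
           '<td>%FieldOneValue%</td></tr></table>')
spread = ('<table><tr>\n  <th class="table">Field One</th>\n\t'
          '<td>%FieldOneValue%</td>\n</tr></table>')

# \s* matches zero or more whitespace characters, newlines included,
# so both layouts are removed by the same pattern.
result_compact = re.sub(pat, '', compact)
result_spread = re.sub(pat, '', spread)

print(result_compact)
print(result_spread)
```

Because \s* also matches the empty string, the pattern does not depend on how the generator happened to indent the row.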

HTH
 
Tim Chase

I tried this:

string = string.replace('<tr>\s*<th class="table">Field One</th>\s*<td>%FieldOneValue%</td>\s*</tr>', '')


But this doesn't work. The doco for Python's regex suggests that \s
should match any whitespace including newlines which is what I
wanted,

from http://docs.python.org/lib/module-re.html

"""
Regular expressions use the backslash character ("\") to indicate
special forms or to allow special characters to be used without
invoking their special meaning. This collides with Python's usage
of the same character for the same purpose in string literals;
for example, to match a literal backslash, one might have to
write '\\\\' as the pattern string, because the regular
expression must be "\\", and each backslash must be expressed as
"\\" inside a regular Python string literal.

The solution is to use Python's raw string notation for regular
expression patterns; backslashes are not handled in any special
way in a string literal prefixed with "r". So r"\n" is a
two-character string containing "\" and "n", while "\n" is a
one-character string containing a newline. Usually patterns will
be expressed in Python code using this raw string notation.
"""
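
A minimal sketch of the raw-string point quoted above; the variable names are only for illustration.

```python
import re

raw = r"\n"     # two characters: a backslash and "n"
cooked = "\n"   # one character: a newline

print(len(raw), len(cooked))

# Both work as patterns for a newline: the regex engine interprets the
# two-character \n itself, and a literal newline matches itself.
text = "a\nb"
raw_matches = re.search(raw, text) is not None
cooked_matches = re.search(cooked, text) is not None

# To match one literal backslash, the pattern must be two backslashes,
# which takes four in a regular literal but only two in a raw one.
same_pattern = ('\\\\' == r'\\')
backslashes = re.findall(r'\\', 'a\\b')
```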

and from http://docs.python.org/lib/re-syntax.html

"""
If you're not using a raw string to express the pattern, remember
that Python also uses the backslash as an escape sequence in
string literals; if the escape sequence isn't recognized by
Python's parser, the backslash and subsequent character are
included in the resulting string. However, if Python would
recognize the resulting sequence, the backslash should be
repeated twice. This is complicated and hard to understand, so
it's highly recommended that you use raw strings for all but the
simplest expressions.
"""
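
The case the quote warns about can be sketched with \b, which Python's parser does recognize (as a backspace) and which the re module uses for a word boundary. The sample text and variable names here are made up for illustration.

```python
import re

text = "a cat sat"

plain = '\bcat\b'       # Python turns each \b into a backspace character
raw = r'\bcat\b'        # backslashes reach the regex engine untouched
doubled = '\\bcat\\b'   # same pattern as raw, written without the r prefix

# The plain version looks for backspace characters, so it finds nothing.
plain_match = re.search(plain, text)

# The raw version looks for "cat" between word boundaries.
raw_match = re.search(raw, text)

print(plain_match)
print(raw_match.group())
print(doubled == raw)
```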

And if you don't know about raw strings, you can read about them
here:

http://docs.python.org/ref/strings.html

-tkc
 
