Python Regular Expressions: re.sub(regex, replacement, subject)


Vibha Tripathi

Hi Folks,

I posted a regular expression question on this list a
couple of days ago. I would like to rephrase my question
as follows:

In Python's re.sub(regex, replacement, subject)
function, I need the second argument, 'replacement',
to be another regular expression (not a string). That
way, when I find a 'certain kind of string' in the
subject, I can replace it with 'another kind of string'
(not a predefined string). Note that the replacement
may depend on exactly which string was matched by the
first argument, 'regex'.

Please let me know if the question is not clear.

Peace.
Vibha

=======
"Things are only impossible until they are not."


Steven Bethard

Vibha said:
Please let me know if the question is not clear.

It's still not very clear, but my guess is you want to supply a
replacement function instead of a replacement string, e.g.:

py> import re
py> help(re.sub)
Help on function sub in module sre:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.

py> def repl(match):
...     print(match.group())
...     return '46'
...
py> re.sub(r'x.*?x', repl, 'yxyyyxxyyxyy')
xyyyx
xyyx
'y4646yy'

STeVe
 

Benjamin Niemann

Do you mean 'backreferences'?
'that12this foo13bar'
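The code that produced the result above did not survive in the archive. A minimal sketch of a call that yields the same string, assuming the hypothetical subject "this12that foo13bar" and pattern r"this(\d+)that" (neither comes from the original post):

```python
import re

# Hypothetical subject and pattern: only the result survived in the archive.
subject = "this12that foo13bar"

# (\d+) captures the digits; \1 in the replacement re-inserts them.
result = re.sub(r"this(\d+)that", r"that\1this", subject)
print(result)  # that12this foo13bar
```

Only "this12that" matches the pattern, so "foo13bar" passes through unchanged.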

Note that the replacement string r"that\1this" is not a regular expression;
it has completely different semantics, as described in the docs. (Just
guessing: are you coming from Perl? r"xxx" is not a regular expression in
Python the way /xxx/ is in Perl. It is just an ordinary string in which
backslashes are not interpreted by the parser, e.g. r"\x" == "\\x". Using
r"" with the re module is not required, but it is quite useful, because re
has its own rules for backslash handling.)
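A small sketch of the raw-string point in Python 3 (the sample text "a1b22c333" is made up for illustration). The r prefix changes only how the parser reads the literal; re receives the same characters either way:

```python
import re

# Both literals are the same two characters: a backslash followed by 'x'.
assert r"\x" == "\\x"
assert len(r"\x") == 2

# re gets an identical pattern in both calls and applies its own backslash
# rules, turning \d into "any digit".
raw_matches = re.findall(r"\d+", "a1b22c333")
escaped_matches = re.findall("\\d+", "a1b22c333")
print(raw_matches)      # ['1', '22', '333']
print(escaped_matches)  # ['1', '22', '333']
```

The raw form just saves you from doubling every backslash in the pattern.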

For more details see the docs for re.sub():
http://docs.python.org/lib/node114.html
 

George Sakkis


In re.sub, 'replacement' can be either a string or a callable that
takes a single match object argument and returns the replacement string.
So although the replacement cannot be a regular expression, it can be
something even more powerful: a function. Here's a toy example of
something you can do that wouldn't be possible with regular expressions
alone:
'I was born 26 years ago and graduated 9 years ago'
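The original example's code is missing from the archive. A minimal sketch of a replacement function that produces output of the same shape, assuming a hypothetical subject string and a fixed reference year of 2005 (chosen only so the arithmetic gives 26 and 9; REFERENCE_YEAR and years_ago are illustrative names, not from the post):

```python
import re

# Assumption: a fixed reference year so the result is reproducible.
# Real code would use the current year instead.
REFERENCE_YEAR = 2005

def years_ago(match):
    # match.group(1) is the four-digit year captured by (\d{4}).
    year = int(match.group(1))
    return "%d years ago" % (REFERENCE_YEAR - year)

text = "I was born in 1979 and graduated in 1996"
# re.sub calls years_ago once per match and splices in its return value.
result = re.sub(r"in (\d{4})", years_ago, text)
print(result)  # I was born 26 years ago and graduated 9 years ago
```

The int() conversion and the subtraction are exactly the kind of transformation a plain replacement string cannot express.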

In cases where you don't have to transform the matched string (such as
calling int() and evaluating an expression, as in the example above) but
only need to append or prepend another string, there is a simpler solution
that doesn't require writing a replacement function: backreferences.
The replacement can be a string where \1 denotes the first group of the
match, \2 the second, and so on. Continuing the example, you could hide
the dates like this:
'I was hired in **** in a company of 2001 employees.'
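A minimal sketch of the backreference approach, assuming a hypothetical subject containing the year 1999 and the pattern r"(in )\d{4}" (the archived code for this example is lost). Group 1 captures "in " so that \1 can put it back in front of the mask:

```python
import re

# Hypothetical subject; only the masked result survived in the archive.
text = "I was hired in 1999 in a company of 2001 employees."

# (in ) is group 1; \1 re-inserts it, and only the four digits are masked.
result = re.sub(r"(in )\d{4}", r"\1****", text)
print(result)  # I was hired in **** in a company of 2001 employees.
```

Because "2001" is not preceded by "in ", the pattern leaves it alone.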

By the way, run the last example without the 'r' in front of the
replacement string and you'll see what it is there for.
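A sketch of what goes wrong without the r prefix, using a hypothetical subject and the pattern r"(in )\d{4}" (the archived code is lost). In an ordinary string literal, "\1" is an octal escape for the control character chr(1), so re never sees a backreference:

```python
import re

text = "I was hired in 1999 in a company of 2001 employees."

# With r, re receives backslash + '1' and substitutes group 1.
with_r = re.sub(r"(in )\d{4}", r"\1****", text)

# Without r, Python converts "\1" to chr(1) before re sees it, so the
# whole match, including "in ", is replaced by a control character.
without_r = re.sub(r"(in )\d{4}", "\1****", text)

print(repr(with_r))     # 'I was hired in **** in a company of 2001 employees.'
print(repr(without_r))  # 'I was hired \x01**** in a company of 2001 employees.'
```

Using repr() makes the invisible chr(1) show up as \x01 in the output.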

HTH,

George
 
