search/replace in Python

Vamsee Krishna Gomatam · May 28, 2005

Hello,
I'm having some problems understanding Regexps in Python. I want
to replace "<google>PHRASE</google>" with
"<a href=http://www.google.com/search?q=PHRASE>PHRASE</a>" in a block of
text. How can I achieve this in Python? Sorry for the naive question but
the documentation is really bad :-(

Regards,
GVK

Oliver Andrich · May 28, 2005

Hi,

2005/5/28 said:
Hello,
I'm having some problems understanding Regexps in Python. I want
to replace "<google>PHRASE</google>" with
"<a href=http://www.google.com/search?q=PHRASE>PHRASE</a>" in a block of
text. How can I achieve this in Python? Sorry for the naive question but
the documentation is really bad :-(

it is pretty easy and straightforward. After you imported the re
module, you can do the job with re.sub or if you have to do it often,
then you can compile the regex first with re.compile
and afterwards use the sub method of the compiled regex.

The interesting part is

re.sub(r"<google>(.*)</google>",r"<a
href=http://www.google.com/search?q=\1>\1</a>", text)

cause here the job is done. The first raw string is the regex pattern.
The second one is the target where \1 is replaced by everything
enclosed in () in the first regex.

Here is the output of my ipython session.

In [15]: text="This is a <google>Python</google>. And some random
nonsense test....."

In [16]: re.sub(r"<google>(.*)</google>",r"<a
href=http://www.google.com/search?q=\1>\1</a>", text)

Out[16]: 'This is a <a
href=http://www.google.com/search?q=Python>Python</a>. And some random
nonsense test.....'

Best regards,
Oliver

John Machin · May 28, 2005

Vamsee said:
Hello,
I'm having some problems understanding Regexps in Python. I want
to replace "<google>PHRASE</google>" with
"<a href=http://www.google.com/search?q=PHRASE>PHRASE</a>" in a block of
text. How can I achieve this in Python? Sorry for the naive question but
the documentation is really bad :-(

VKG,

Sorry you had such difficulty; if you can explain in a bit more detail,
we could perhaps fix that "really bad" [compared with what?] documentation.

Whether you are doing a substitution programatically in Python, or in
any other language, or manually using a text editor, you need to specify
some input text, a pattern, and a replacement.

The syntax for pattern and replacement for what you want to do differs
in only minor details (if at all) among major languages and common text
editors, and it's been that way for years. For example, see the
retro-computing museum exhibit at the end of this posting.

So, did you have any problem determining that PHRASE is represented by
(.*) -- good enough if there is only one occurrence of your target -- in
the pattern, and by \1 in the replacement?

Did you have a problem with this part of the docs (which is closely
followed by an example with \1 in the replacement), and if so, what was
the problem?

sub( pattern, repl, string[, count])

Return the string obtained by replacing the leftmost non-overlapping

occurrences of pattern in string by the replacement repl.

Have you used regular expressions before? If not, then you shouldn't
expect to learn how to use them from the documentation, which does
adequately _document_ the provided functionality. This is just like how
the manual supplied with a new car documents the car's functionality,
but doesn't attempt to teach how to drive it. If you haven't done so
already, you may like to do what the documentation suggests:

consult the Regular Expression HOWTO, accessible from
http://www.python.org/doc/howto/.

And out with the magic lantern ... here's the promised blast from the
past, using Oliver's sample input:

C:\junk>dir \bin\ed.com
[snip]
30/11/1985 04:43p 18,936 ed.com
[snip]
C:\junk>ed
Memory available : 59K bytes

>a

This is a said:
>p

This is a said:
>s/<google>\(.*\)<\/google>/<a

href=http:\/\/www.google.com\/search?q=\1>\1 said:
>p

This is a said:
>

Cheers,
John

Leif K-Brooks · May 28, 2005

Oliver said:
re.sub(r"<google>(.*)</google>",r"<a
href=http://www.google.com/search?q=\1>\1</a>", text)

For real-world use you'll want to URL encode and entityify the text:

import cgi
import urllib

def google_link(text):
text = text.group(1)
return '<a href="%s">%s</a>' % (cgi.escape(urllib.quote(text)),
cgi.escape(text))

re.sub(r"<google>(.*)</google>", google_link, "<google>foo bar</google>)

Vamsee Krishna Gomatam · May 28, 2005

Leif said:
Oliver Andrich wrote:

For real-world use you'll want to URL encode and entityify the text:

import cgi
import urllib

def google_link(text):
text = text.group(1)
return '<a href="%s">%s</a>' % (cgi.escape(urllib.quote(text)),
cgi.escape(text))

re.sub(r"<google>(.*)</google>", google_link, "<google>foo bar</google>)

Thanks a lot for your reply. I was able to solve it this way:
text = re.sub( "<google>([^<]*)</google>", r'<a
href="http://www.google.com/search?q=\1">\1</a>', text )

GVK

Leif K-Brooks · May 28, 2005

Vamsee said:
text = re.sub( "<google>([^<]*)</google>", r'<a
href="http://www.google.com/search?q=\1">\1</a>', text )

But see what happens when text contains spaces, or quotes, or
ampersands, or...

Search and replace text in XML file?	5	Jul 28, 2012
Search & Replace	6	Oct 26, 2006
RegEx conditional search and replace	6	Jul 5, 2006
mmap regex search replace	0	Apr 3, 2009
search and replace in Perl	4	Jan 18, 2010
Multiline Search-Replace With Perl One-liner	2	Jan 12, 2012
Change Location in the google search page	0	Sep 15, 2011
Help with Python Flask on PI as server SSE to website	0	Apr 23, 2022

search/replace in Python

Vamsee Krishna Gomatam

Oliver Andrich

John Machin

Leif K-Brooks

Vamsee Krishna Gomatam

Leif K-Brooks

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads