search/replace in Python

Discussion in 'Python' started by Vamsee Krishna Gomatam, May 28, 2005.

  1. Hello,
    I'm having some problems understanding Regexps in Python. I want
    to replace "<google>PHRASE</google>" with
    "<a href=http://www.google.com/search?q=PHRASE>PHRASE</a>" in a block of
    text. How can I achieve this in Python? Sorry for the naive question but
    the documentation is really bad :-(

    Regards,
    GVK
     
    Vamsee Krishna Gomatam, May 28, 2005
    #1
    1. Advertising

  2. Hi,

    2005/5/28, Vamsee Krishna Gomatam <>:
    > Hello,
    > I'm having some problems understanding Regexps in Python. I want
    > to replace "<google>PHRASE</google>" with
    > "<a href=http://www.google.com/search?q=PHRASE>PHRASE</a>" in a block of
    > text. How can I achieve this in Python? Sorry for the naive question but
    > the documentation is really bad :-(


    it is pretty easy and straightforward. After you imported the re
    module, you can do the job with re.sub or if you have to do it often,
    then you can compile the regex first with re.compile
    and afterwards use the sub method of the compiled regex.

    The interesting part is

    re.sub(r"<google>(.*)</google>",r"<a
    href=http://www.google.com/search?q=\1>\1</a>", text)

    cause here the job is done. The first raw string is the regex pattern.
    The second one is the target where \1 is replaced by everything
    enclosed in () in the first regex.

    Here is the output of my ipython session.

    In [15]: text="This is a <google>Python</google>. And some random
    nonsense test....."

    In [16]: re.sub(r"<google>(.*)</google>",r"<a
    href=http://www.google.com/search?q=\1>\1</a>", text)

    Out[16]: 'This is a <a
    href=http://www.google.com/search?q=Python>Python</a>. And some random
    nonsense test.....'

    Best regards,
    Oliver
     
    Oliver Andrich, May 28, 2005
    #2
    1. Advertising

  3. Vamsee Krishna Gomatam

    John Machin Guest

    Vamsee Krishna Gomatam wrote:
    > Hello,
    > I'm having some problems understanding Regexps in Python. I want
    > to replace "<google>PHRASE</google>" with
    > "<a href=http://www.google.com/search?q=PHRASE>PHRASE</a>" in a block of
    > text. How can I achieve this in Python? Sorry for the naive question but
    > the documentation is really bad :-(


    VKG,

    Sorry you had such difficulty; if you can explain in a bit more detail,
    we could perhaps fix that "really bad" [compared with what?] documentation.

    Whether you are doing a substitution programatically in Python, or in
    any other language, or manually using a text editor, you need to specify
    some input text, a pattern, and a replacement.

    The syntax for pattern and replacement for what you want to do differs
    in only minor details (if at all) among major languages and common text
    editors, and it's been that way for years. For example, see the
    retro-computing museum exhibit at the end of this posting.

    So, did you have any problem determining that PHRASE is represented by
    (.*) -- good enough if there is only one occurrence of your target -- in
    the pattern, and by \1 in the replacement?

    Did you have a problem with this part of the docs (which is closely
    followed by an example with \1 in the replacement), and if so, what was
    the problem?

    sub( pattern, repl, string[, count])

    Return the string obtained by replacing the leftmost non-overlapping

    occurrences of pattern in string by the replacement repl.

    Have you used regular expressions before? If not, then you shouldn't
    expect to learn how to use them from the documentation, which does
    adequately _document_ the provided functionality. This is just like how
    the manual supplied with a new car documents the car's functionality,
    but doesn't attempt to teach how to drive it. If you haven't done so
    already, you may like to do what the documentation suggests:

    consult the Regular Expression HOWTO, accessible from
    http://www.python.org/doc/howto/.

    And out with the magic lantern ... here's the promised blast from the
    past, using Oliver's sample input:

    C:\junk>dir \bin\ed.com
    [snip]
    30/11/1985 04:43p 18,936 ed.com
    [snip]
    C:\junk>ed
    Memory available : 59K bytes
    >a

    This is a <google>Python</google>. And some randomnonsense test.....
    ..
    >p

    This is a <google>Python</google>. And some randomnonsense test.....
    >s/<google>\(.*\)<\/google>/<a

    href=http:\/\/www.google.com\/search?q=\1>\1<\/a>/
    >p

    This is a <a href=http://www.google.com/search?q=Python>Python</a>. And
    some randomnonsense test.....
    >


    Cheers,
    John
     
    John Machin, May 28, 2005
    #3
  4. Oliver Andrich wrote:
    > re.sub(r"<google>(.*)</google>",r"<a
    > href=http://www.google.com/search?q=\1>\1</a>", text)


    For real-world use you'll want to URL encode and entityify the text:

    import cgi
    import urllib

    def google_link(text):
    text = text.group(1)
    return '<a href="%s">%s</a>' % (cgi.escape(urllib.quote(text)),
    cgi.escape(text))

    re.sub(r"<google>(.*)</google>", google_link, "<google>foo bar</google>)
     
    Leif K-Brooks, May 28, 2005
    #4
  5. Re: search/replace in Python (solved)

    Leif K-Brooks wrote:
    > Oliver Andrich wrote:
    >
    >
    > For real-world use you'll want to URL encode and entityify the text:
    >
    > import cgi
    > import urllib
    >
    > def google_link(text):
    > text = text.group(1)
    > return '<a href="%s">%s</a>' % (cgi.escape(urllib.quote(text)),
    > cgi.escape(text))
    >
    > re.sub(r"<google>(.*)</google>", google_link, "<google>foo bar</google>)



    Thanks a lot for your reply. I was able to solve it this way:
    text = re.sub( "<google>([^<]*)</google>", r'<a
    href="http://www.google.com/search?q=\1">\1</a>', text )


    GVK
     
    Vamsee Krishna Gomatam, May 28, 2005
    #5
  6. Re: search/replace in Python (solved)

    Vamsee Krishna Gomatam wrote:
    > text = re.sub( "<google>([^<]*)</google>", r'<a
    > href="http://www.google.com/search?q=\1">\1</a>', text )


    But see what happens when text contains spaces, or quotes, or
    ampersands, or...
     
    Leif K-Brooks, May 28, 2005
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Brian Blais
    Replies:
    1
    Views:
    383
    Bruno Desthuilliers
    Jun 27, 2006
  2. Greg Ewing
    Replies:
    2
    Views:
    347
    Dieter Maurer
    Jun 29, 2006
  3. Alun
    Replies:
    3
    Views:
    4,524
    Masudur
    Feb 18, 2008
  4. Schif Schaf
    Replies:
    12
    Views:
    1,287
    Anthra Norell
    Feb 8, 2010
  5. Abby Lee
    Replies:
    5
    Views:
    427
    Abby Lee
    Aug 2, 2004
Loading...

Share This Page