Re: re or html parser module, for wildcard search within html document?

Discussion in 'Python' started by Bengt Richter, Aug 3, 2003.

  1. On 1 Aug 2003 19:06:53 -0700, (Douglas) wrote:

    >I want to search and replace some expressions within an html document.
    >Specifically, I want to replace any tag containing the word "font"
    >with a new tag. As I want to use some form of wild card for the
    >search, eg. <*font*>, should I use a regular expression module (re) or
    >one of the specific html parsers? If this should be done with an html
    >parser module then which one and where is some easy going introductory
    >documentation, please?
    >

    Do you want to change to another font, or get rid of the font markup altogether? If you
    want to eliminate it, you will have to remove the </font> end tags as well.

    That is unlikely to go wrong with a regex, unless someone has deleted something so that
    the tags no longer match up, and then commented the leftovers out. But then they deserve more trash ;-)
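For the "eliminate it altogether" case, a minimal sketch might look like the following. The names `FONT_TAG` and `strip_font_tags` are made up for illustration, and the pattern assumes the font tags are not hidden inside comments or scripts:

```python
import re

# Matches an opening <font ...> tag or a closing </font> tag, in any letter case.
# The \b stops it from matching tag names that merely begin with "font".
FONT_TAG = re.compile(r'</?font\b[^>]*>', re.IGNORECASE)

def strip_font_tags(html):
    """Return html with all opening and closing font tags removed."""
    return FONT_TAG.sub('', html)

sample = '<p><FONT color="#ffffff">Hello</Font> world</p>'
print(strip_font_tags(sample))  # <p>Hello world</p>
```

Only the tags are removed; the text between them is left in place.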

    Assuming you just want to change the opening font tag to another font tag, a regex like
    the one in the session below should do it:

    Read the starting data (I saved the python.org page to disk)

    >>> html = open('www_python_org.html').read()


    Make regex
    >>> import re
    >>> rxo = re.compile(r'<[Ff][Oo][Nn][Tt] [^>]*>')


    Check original
    >>> rxo.findall(html)

    ['<font color="#ffffff">', '<font color="#ffffff">', '<font color="#ffffff">',
     '<font color="#ffffff">', '<font color="#ffffff">', '<font color="#ffffff">',
     '<font color="#ffffff">']

    Make a new version by substitution
    >>> html2 = rxo.sub('<FONT color="#FF0000">', html)


    Write it out
    >>> open('www_python_red.html','w').write(html2)


    Check what we did to the data (open the two files in a browser and compare; the effect shows on the left of the page)

    >>> rxo.findall(html2)

    ['<FONT color="#FF0000">', '<FONT color="#FF0000">', '<FONT color="#FF0000">',
     '<FONT color="#FF0000">', '<FONT color="#FF0000">', '<FONT color="#FF0000">',
     '<FONT color="#FF0000">']
    >>>
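The session above can be condensed into one small function. This is only a sketch: `OPEN_FONT` and `recolor_font_tags` are hypothetical names, and the pattern is a slight variation on the one in the session. It uses `re.IGNORECASE` instead of spelling out `[Ff][Oo][Nn][Tt]`, and `\b` instead of a literal space so that a bare `<font>` with no attributes is matched too.

```python
import re

# Opening font tag, with or without attributes, in any letter case.
OPEN_FONT = re.compile(r'<font\b[^>]*>', re.IGNORECASE)

def recolor_font_tags(html, new_tag='<FONT color="#FF0000">'):
    """Replace every opening font tag; return (new_html, number_replaced)."""
    return OPEN_FONT.subn(new_tag, html)

sample = '<font color="#ffffff">a</font> <FONT>b</FONT>'
new_html, count = recolor_font_tags(sample)
print(count)     # 2
print(new_html)  # <FONT color="#FF0000">a</font> <FONT color="#FF0000">b</FONT>
```

`subn` works like `sub` but also reports how many replacements were made, which saves the separate `findall` check.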


    HTH
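On the parser half of the question: if the goal is to find any tag whose name contains "font" (the <*font*> wildcard), a parser-based sketch could look like this. It assumes Python 3's standard `html.parser` module (not what the session above was run on); `FontTagFinder` is a made-up class name, and it only reports the tags rather than rewriting the document.

```python
from html.parser import HTMLParser

class FontTagFinder(HTMLParser):
    """Collect every start tag whose name contains 'font'."""

    def __init__(self):
        super().__init__()
        self.font_tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes the tag name already lowercased.
        if 'font' in tag:
            self.font_tags.append((tag, dict(attrs)))

finder = FontTagFinder()
finder.feed('<p><FONT color="#ffffff">Hello</FONT> <basefont size="3"></p>')
finder.close()
print(finder.font_tags)
# [('font', {'color': '#ffffff'}), ('basefont', {'size': '3'})]
```

For a plain search and replace on the opening tags, the regex route shown earlier is probably the shorter path.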



    Regards,
    Bengt Richter
     