Parsing HTML - modify URLs

Discussion in 'Python' started by Fuzzyman, Jul 7, 2004.

  1. Fuzzyman

    Fuzzyman Guest

    I am trying to parse an HTML page and modify only the URLs within tags -
    e.g. inside IMG, A, SCRIPT, FRAME tags etc.

    I have built one using HTMLParser.HTMLParser and it works fine... on
    good HTML. Having done a google, it looks like HTMLParser choking on
    dodgy HTML is a common theme.

    I would have difficulties using regular expressions as I want to
    modify local (relative) reference URLs as well as absolute ones.
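
    One way to sidestep the relative-versus-absolute problem is to resolve
    every reference against the page's own URL before modifying it, so a
    single code path handles both kinds. A minimal sketch, assuming Python 3's
    urllib.parse.urljoin (urlparse.urljoin in the Python 2 of this thread);
    BASE and absolutize are hypothetical names used only for illustration:

```python
from urllib.parse import urljoin

# Hypothetical URL of the page being processed (an assumption, not from the post).
BASE = "http://www.example.com/docs/index.html"

def absolutize(url, base=BASE):
    """Resolve a relative reference against base; absolute URLs pass through."""
    return urljoin(base, url)

# Relative, parent-relative, root-relative and absolute references.
for ref in ("images/logo.png", "../about.html", "/top.html",
            "http://other.example.org/x.js"):
    print(ref, "->", absolutize(ref))
```

    Once every reference is absolute, whatever modification is needed can be
    applied in one place without special-casing local links.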

    It would be nice to just override the error handling of HTMLParser -
    but short of digging in the source code it's not a documented
    technique :)

    Anyone got any suggestions? This is to go on a server as a CGI, and
    I don't have shell access or anything like that, so I'd like to avoid
    installing mxTidy. Does anyone know an HTML parsing library that will
    allow me to write out most of the page unmodified and just modify the
    contents of some of the tags?
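
    The pass-through approach described above can be sketched with the
    standard library alone. This assumes Python 3's html.parser (the
    successor to the HTMLParser module named in the post); URL_ATTRS,
    URLRewriter and rewrite_urls are hypothetical names, and the tag list is
    only an illustrative subset:

```python
from html import escape
from html.parser import HTMLParser

# Assumed (tag, attribute) pairs that hold URLs; extend as needed.
URL_ATTRS = {("a", "href"), ("img", "src"), ("script", "src"), ("frame", "src")}

class URLRewriter(HTMLParser):
    """Echo the input back out, changing only URL-bearing attributes."""

    def __init__(self, rewrite):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.rewrite = rewrite  # callable: old URL -> new URL
        self.out = []

    def _emit_tag(self, tag, attrs, closer):
        if not any((tag, name) in URL_ATTRS for name, _ in attrs):
            # Nothing to change: copy the tag exactly as it appeared.
            self.out.append(self.get_starttag_text())
            return
        parts = [tag]
        for name, value in attrs:
            if value is None:  # bare attribute, e.g. <input disabled>
                parts.append(name)
                continue
            if (tag, name) in URL_ATTRS:
                value = self.rewrite(value)
            # The parser unescapes attribute values, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + closer + ">")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, "")

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)

def rewrite_urls(html_text, rewrite):
    """Return html_text with rewrite() applied to each URL attribute."""
    parser = URLRewriter(rewrite)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)

# Example: prefix every URL with a made-up marker.
print(rewrite_urls('<a href="page.html">x</a>', lambda u: "X:" + u))
```

    Tags without URL attributes are copied verbatim; rewritten tags come out
    with lowercased names and double-quoted attributes, so the output is not
    guaranteed to be byte-identical to the input. Recent Python 3 releases of
    html.parser are meant to tolerate malformed markup rather than raise,
    which was not the case for the 2004-era parser the post describes.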

    Regards,

    Fuzzy

    http://www.voidspace.org.uk/atlantibots/pythonutils.html
    Fuzzyman, Jul 7, 2004
    #1