Question regarding HTMLParser module.

Discussion in 'Python' started by Adonis, Jul 28, 2003.

  1. Adonis

    Adonis Guest

    When parsing my HTML files, I use handle_pi to capture some embedded Python
    code, but I have noticed that if the embedded code itself contains HTML,
    HTMLParser parses that as well. The code is cut short, so exec raises an
    EOL error. My current workaround is to write (tag) instead of <tag> and
    convert it back to <tag> in another function. Is there a way to tell
    HTMLParser to ignore the embedded tags, or is there another alternative?

    Any help would be greatly appreciated.
    And another note, I am well aware of Zope, Webware, CherryPy, etc... for
    py/html embedding options, but I want this to be a learning experience.

    HTML processing instruction:
    <?
    import time
    print time.strftime('%b-%d-%Y')
    print '<tt>testing!()</tt>'
    >


    error:
    Traceback (most recent call last):
      File "C:\home\Adonis\python\t.py", line 40, in -toplevel-
        x.feed(z)
      File "C:\Python23\lib\HTMLParser.py", line 108, in feed
        self.goahead(0)
      File "C:\Python23\lib\HTMLParser.py", line 154, in goahead
        k = self.parse_pi(i)
      File "C:\Python23\lib\HTMLParser.py", line 232, in parse_pi
        self.handle_pi(rawdata[i+2: j])
      File "C:\home\Adonis\python\t.py", line 33, in handle_pi
        exec(data)
      File "<string>", line 4
        print '<tt
        ^
    SyntaxError: EOL while scanning single-quoted string
     
    Adonis, Jul 28, 2003
    #1

  2. Carl Banks

    Carl Banks Guest

    Adonis wrote:
    > When parsing my html files, I use handle_pi to capture some embedded python
    > code, but I have noticed that in the embedded python code if it contains
    > html, HTMLParser will parse it as well, and thus causes an error when I exec
    > the code, raises an EOL error. I have a work around for this as I use
    > different set of characters rather than <tag> use something like (tag) then
    > revert it back to <tag> via another function, I was wondering if there is a
    > way to tell HTMLParser to ignore the embedded tags or another alternative?
    >
    > Any help would be greatly appreciated.
    > And another note, I am well aware of Zope, Webware, CherryPy, etc... for
    > py/html embedding options, but I want this to be a learning experience.



    Unfortunately, HTMLParser (and the similar sgmllib) miserably fail to
    process inline text. I know this very well; I have an HTML-generating
    package that uses a lot of scripting and verbatim text.

    What's happening in your case is that HTMLParser, when processing a <?
    tag, naively reads text up to the next ">". HTMLParser thinks the > in
    <tt> closes your <? tag. (It should at least have a flag indicating
    whether it should read up to "?>" or just ">".)
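The truncation described above can be reproduced with a short sketch. It assumes Python 3, where the module is html.parser rather than the HTMLParser module of Python 2.3 used in this thread. The class names DefaultPI and StrictPI and the helper collect are made up for illustration. StrictPI overrides parse_pi, which is an internal method and not documented API, so treat it as one possible way to get the "?>" behaviour rather than a supported switch.

```python
from html.parser import HTMLParser


class DefaultPI(HTMLParser):
    """Records processing instructions exactly as the stock parser reports them."""

    def __init__(self):
        super().__init__()
        self.pis = []

    def handle_pi(self, data):
        self.pis.append(data)


class StrictPI(DefaultPI):
    """Hypothetical variant: a processing instruction ends at "?>", not ">"."""

    def parse_pi(self, i):
        # parse_pi is an internal hook, so this override is an assumption
        # about CPython's implementation rather than documented API.
        rawdata = self.rawdata
        end = rawdata.find("?>", i + 2)
        if end < 0:
            return -1  # incomplete instruction; wait for more input
        self.handle_pi(rawdata[i + 2:end])
        return end + 2


def collect(parser_cls, source):
    """Feed source to a fresh parser and return the instructions it saw."""
    parser = parser_cls()
    parser.feed(source)
    parser.close()
    return parser.pis


SOURCE = "<? print('<tt>testing</tt>') ?>"
truncated = collect(DefaultPI, SOURCE)  # cut off at the ">" of <tt>
complete = collect(StrictPI, SOURCE)    # runs to the closing "?>"
```

With the stock parser, the captured code stops right after '<tt, which is the unterminated string the traceback complains about. The variant only helps if the embedded code is closed with "?>" and never contains that sequence itself.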

    A workaround is to do something like this:

    <? print '<tt\x29monospaced</tt\x29' >

    where obviously, \x29 is the hex code for >. That's not quite as bad
    as replacing characters, although it's still not perfect.
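A minimal sketch of the escape workaround, again assuming Python 3 (html.parser, print as a function) rather than the Python 2.3 setup in this thread. The class name ExecPI is made up for illustration. Since the embedded code no longer contains a literal ">", the parser hands the whole instruction to handle_pi, and Python turns the hex escape for ">" back into the character when the string literal is evaluated.

```python
import contextlib
import io
from html.parser import HTMLParser


class ExecPI(HTMLParser):
    """Runs each processing instruction as Python and keeps what it prints."""

    def __init__(self):
        super().__init__()
        self.output = []

    def handle_pi(self, data):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            # strip() removes the blank after "<?" that exec would
            # otherwise reject as unexpected indentation.
            exec(data.strip())
        self.output.append(buffer.getvalue())


# Raw string: the backslash escapes must reach the parser unchanged,
# so that only the final ">" closes the instruction.
SOURCE = r"<? print('<tt\x3emonospaced</tt\x3e') >"

parser = ExecPI()
parser.feed(SOURCE)
parser.close()
result = parser.output
```

As with any use of exec, this is only reasonable for templates you wrote yourself.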

    Another possibility is to use sgmllib, but that's probably way more
    trouble than it's worth, and still far from perfect. Basically,
    sgmllib parsers have a method called verbatim, that turns off HTML tag
    processing, although entities and closing tags are still processed.
    (Entities and closing tags you can kind of reconstruct into the
    original text, although the whitespace is lost.) This is what I do in
    my own HTML-generating package.

    I'll probably contribute some badly-needed remedies to HTMLParser
    sometime, as the limitations of it and sgmllib are starting to get on
    my nerves.


    --
    CARL BANKS
     
    Carl Banks, Jul 28, 2003
    #2