parse_html_string reports error

Discussion in 'Perl Misc' started by AR John, Sep 22, 2004.

  1. AR John

    AR John Guest

    Hi,

    Can anybody help me to solve a problem about parsing using LibXML?

    On one machine, parse_html_string() is used to parse an HTML file and
    it works fine, with no errors. When I deploy the program on another
    machine, it shows errors like:

    Entity: line 111: error: htmlParseEntityRef: expecting ';'

    Can somebody tell me the fix?

    Thanks in advance to whoever provides a solution.

    AR
     
    AR John, Sep 22, 2004
    #1

  2. AR John wrote:

    > Entity: line 111: error: htmlParseEntityRef: expecting ';'
    >
    > Can somebody tell me the fix?


    From reading the error message, it sounds to me like there's an error
    on line 111 of the HTML. Specifically, there's an entity that's missing
    the ';' - i.e. "&amp" instead of "&amp;".

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Sep 22, 2004
    #2

  3. AR John

    AR John Guest

    Sherm Pendley wrote:
    > on line 111 of the HTML. Specifically, there's an entity that's missing
    > the ';' - i.e. "&amp" instead of "&amp;".
    >
    > sherm--


    Sherm Pendley,

    You are right in a sense. I looked into the HTML and found there is an
    "&" NOT amp. like in, <a href=" ....?test=1&v=xyz"> ...</a>

    I am not sure if I am using an old version of LibXML.

    AR
     
    AR John, Sep 23, 2004
    #3
  4. Ben Morrow

    Ben Morrow Guest

    Quoth (AR John):
    > Sherm Pendley <> wrote in message
    > > on line 111 of the HTML. Specifically, there's an entity that's missing
    > > the ';' - i.e. "&amp" instead of "&amp;".
    > >
    > > sherm--

    >
    > Sherm Pendley,
    >
    > You are right in a sense. I looked into the HTML and found there is an
    > "&" NOT amp. like in, <a href=" ....?test=1&v=xyz"> ...</a>
    >
    > I am not sure if I am using an old version of LibXML.


    That's not the problem. The HTML file is broken: &s need encoding, even
    in double quotes.

    You may be able to use ; instead of & (if the CGI program the URL
    refers to is standards-compliant), or you can simply fix the HTML.
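
    As a rough illustration of "fixing the HTML", the sketch below escapes
    ampersands that do not already begin an entity reference. The helper name
    escape_bare_ampersands and the sample link (example.cgi) are hypothetical,
    and a regex pass like this is only an approximation: it treats anything of
    the form "&name;" as an existing entity, so it is not a substitute for
    generating correct HTML in the first place.

```perl
use strict;
use warnings;

# Hypothetical helper: turn bare '&' into '&amp;' while leaving anything that
# already looks like an entity reference (&name;, &#38;, &#x26;) untouched.
sub escape_bare_ampersands {
    my ($html) = @_;
    $html =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/&amp;/g;
    return $html;
}

# Sample input is an assumption, modelled on the kind of link quoted above.
my $broken = '<a href="example.cgi?test=1&v=xyz">link</a>';
print escape_bare_ampersands($broken), "\n";
# prints: <a href="example.cgi?test=1&amp;v=xyz">link</a>
```

    Running the cleaned string through the parser afterwards should no longer
    trigger the "expecting ';'" message for those links.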

    BTW, are you aware that ordinary HTML (as opposed to strict XHTML) is
    *not* valid XML? You may be better off with an HTML parser.

    Ben

    --
    "The Earth is degenerating these days. Bribery and corruption abound.
    Children no longer mind their parents, every man wants to write a book,
    and it is evident that the end of the world is fast approaching."
    -Assyrian stone tablet, c.2800 BC
     
    Ben Morrow, Sep 23, 2004
    #4
  5. AR John wrote:

    > You are right in a sense. I looked into the HTML and found there is an
    > "&" NOT amp. like in, <a href=" ....?test=1&v=xyz"> ...</a>


    Ampersands need to be encoded - that's why newer guidelines prefer using
    semicolons to separate name/value pairs in URLs. You need either one of
    these:

    <a href="...?test=1&amp;v=xyz">

    or

    <a href="...?test=1;v=xyz">

    If you're using CGI.pm on the receiving end of this link, either one
    will work. (It *should* work with other languages & CGI libraries too,
    but I have no experience with them.)
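
    To show why either separator can work on the receiving end, here is a
    minimal sketch of a query-string parser that splits on both '&' and ';'.
    The function name parse_query is hypothetical and this is not CGI.pm's
    actual code; it only illustrates the idea and ignores repeated parameter
    names.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "name=value" pairs separated by '&' or ';'.
# Returns a hash reference; a later duplicate name overwrites an earlier one.
sub parse_query {
    my ($query) = @_;
    my %params;
    for my $pair (split /[&;]/, $query) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                              # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # decode %XX escapes
        }
        $params{$name} = $value;
    }
    return \%params;
}

# Both separator styles yield the same parameters.
for my $query ('test=1&v=xyz', 'test=1;v=xyz') {
    my $p = parse_query($query);
    print "$query => test=$p->{test}, v=$p->{v}\n";
}
```

    A script that accepts both separators lets the HTML author write
    "?test=1;v=xyz" and avoid entity-encoding the link altogether.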

    > I am not sure if I am using an old version of LibXML.


    Unencoded '&'s have *never* been well-formed XML. It's possible that an
    older version of LibXML didn't catch and report the XML error, but if so
    that behavior was a bug.

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Sep 23, 2004
    #5
  6. Eric Bohlman

    Eric Bohlman Guest

    Ben Morrow wrote:

    > BTW, are you aware that ordinary HTML (as opposed to strict XHTML) is
    > *not* valid XML? You may be better off with an HTML parser.


    libxml, upon which XML::LibXML is based, includes a dedicated HTML parser,
    and that's what the OP was using.
     
    Eric Bohlman, Sep 23, 2004
    #6