parse_html_string reports error

AR John · Sep 22, 2004

Hi,

Can anybody help me solve a parsing problem with LibXML?

On one machine, parse_html_string() is used to parse an HTML file. It
works fine, with no errors. When I deploy the program on another
machine, it shows errors like:

Entity: line 111: error: htmlParseEntityRef: expecting ';'

Can somebody tell me the fix?

Thanks in advance to anyone who can provide a solution.

AR

Sherm Pendley · Sep 22, 2004

AR said:
Entity: line 111: error: htmlParseEntityRef: expecting ';'

Can somebody tell me the fix?

From reading the error message, it sounds to me like there's an error
on line 111 of the HTML. Specifically, there's an entity that's missing
the ';' - i.e. "&amp" instead of "&amp;".
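
To locate such entities before handing the file to the parser, something along these lines may help. It is only a minimal sketch in Python using the standard library; the function name `find_bare_ampersands`, the regular expression, and the sample markup (including the `page` URL) are illustrative assumptions, not part of LibXML or of the original program.

```python
import re

# Matches an '&' that does NOT start a complete reference such as
# &amp; (named), &#38; (decimal) or &#x26; (hexadecimal).
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def find_bare_ampersands(html_text):
    """Return 1-based (line, column) pairs for each suspicious '&'."""
    hits = []
    for line_no, line in enumerate(html_text.splitlines(), start=1):
        for match in BARE_AMP.finditer(line):
            hits.append((line_no, match.start() + 1))
    return hits


# Hypothetical sample: line 1 is fine, line 2 has an unencoded '&'.
sample = '<p>ok &amp; fine</p>\n<a href="page?test=1&v=xyz">link</a>\n'

for line_no, col in find_bare_ampersands(sample):
    print(f"line {line_no}, column {col}: unencoded '&'")
```

The pattern is deliberately simple: it checks only the shape of a reference, not whether the entity name is actually defined in HTML.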

sherm--

AR John · Sep 23, 2004

Sherm Pendley said:
on line 111 of the HTML. Specifically, there's an entity that's missing
the ';' - i.e. "&amp" instead of "&amp;".

sherm--

Sherm Pendley,

You are right in a sense. I looked into the HTML and found there is a
bare "&", not "&amp;", as in <a href=" ....?test=1&v=xyz"> ...</a>

I am not sure if I am using an old version of LibXML.

AR

Ben Morrow · Sep 23, 2004

Quoth (e-mail address removed) (AR John):

Sherm Pendley,

You are right in a sense. I looked into the HTML and found there is a
bare "&", not "&amp;", as in <a href=" ....?test=1&v=xyz"> ...</a>

I am not sure if I am using an old version of LibXML.

That's not the problem. The HTML file is broken: &s need encoding, even
in double quotes.

You may be able to use ; instead of & (if the cgi the url refers to is
standards-compliant), or you can simply fix the HTML.

BTW, are you aware that ordinary HTML (as opposed to strict XHTML) is
*not* valid XML? You may be better off with an HTML parser.

Ben

Sherm Pendley · Sep 23, 2004

AR said:
You are right in a sense. I looked into the HTML and found there is a
bare "&", not "&amp;", as in <a href=" ....?test=1&v=xyz"> ...</a>

Ampersands need to be encoded - that's why newer guidelines prefer using
semicolons to separate name/value pairs in URLs. You need either one of
these:

<a href="...?test=1&amp;v=xyz">

or

<a href="...?test=1;v=xyz">

If you're using CGI.pm on the receiving end of this link, either one
will work. (It *should* work with other languages & CGI libraries too,
but I have no experience with them.)
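
If the links are generated by a program, both forms can be produced mechanically. The following is a rough sketch in Python using only the standard library; the base URL `page` and the variable names are made up for illustration, since the real URL in the thread is elided.

```python
from html import escape, unescape
from urllib.parse import quote_plus, urlencode

# Hypothetical parameters, mirroring the test=1 / v=xyz example above.
params = {"test": "1", "v": "xyz"}

# Option 1: keep '&' as the separator, then HTML-escape the whole URL
# before placing it inside the href attribute.
url_amp = "page?" + urlencode(params)
href_amp = escape(url_amp, quote=True)

# Option 2: use ';' as the separator, so nothing needs escaping here.
url_semi = "page?" + ";".join(
    f"{quote_plus(key)}={quote_plus(value)}" for key, value in params.items()
)

print(f'<a href="{href_amp}">link</a>')
print(f'<a href="{url_semi}">link</a>')
```

With the first option the browser decodes "&amp;" back to "&" before requesting the URL, so the receiving script sees the same query string either way. The second option depends on the receiving end accepting ';' as a separator.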

I am not sure if I am using an old version of LibXML.

Unencoded '&'s have *never* been well-formed XML. It's possible that an
older version of LibXML didn't catch and report the XML error, but if so
that behavior was a bug.

sherm--

Eric Bohlman · Sep 23, 2004

Ben Morrow said:
BTW, are you aware that ordinary HTML (as opposed to strict XHTML) is
*not* valid XML? You may be better off with an HTML parser.

libxml, upon which XML::LibXML is based, includes a dedicated HTML parser,
and that's what the OP was using.
