parse_html_string reports error

AR John

Hi,

Can anybody help me solve a parsing problem with LibXML?

On one machine, parse_html_string() parses an HTML file without any
errors. When I deploy the program on another machine, it reports
errors like:

Entity: line 111: error: htmlParseEntityRef: expecting ';'

Can somebody tell me the fix?

Thanks in advance to anyone who can provide a solution.

AR
 
Sherm Pendley

AR said:
Entity: line 111: error: htmlParseEntityRef: expecting ';'

Can somebody tell me the fix?

From reading the error message, it sounds to me like there's an error
on line 111 of the HTML. Specifically, there's an entity that's missing
the ';' - i.e. "&amp" instead of "&amp;".
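
One way to track down the offending spot is to scan the file for ampersands that do not begin a properly terminated reference. The following is only a rough sketch, written in Python purely for illustration (the thread itself concerns Perl's XML::LibXML). The regular expression is a heuristic of my own, not libxml2's actual logic, and the sample URL "page.cgi" is made up.

```python
import re

# Heuristic (an assumption, not libxml2's own rule): an '&' that is NOT
# followed by a named or numeric reference ending in ';'.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);)"
)


def find_bare_ampersands(html_text):
    """Return (line_number, column, snippet) for each suspicious '&'."""
    hits = []
    for line_no, line in enumerate(html_text.splitlines(), start=1):
        for match in BARE_AMPERSAND.finditer(line):
            start = match.start()
            # Columns are 1-based; the snippet shows up to 20 characters.
            hits.append((line_no, start + 1, line[start:start + 20]))
    return hits


if __name__ == "__main__":
    # Hypothetical input: line 1 is fine, line 2 has an unencoded '&'.
    sample = (
        "<p>ok &amp; fine</p>\n"
        '<a href="page.cgi?test=1&v=xyz">link</a>\n'
    )
    for line_no, col, snippet in find_bare_ampersands(sample):
        print(f"line {line_no}, column {col}: {snippet!r}")
```

The line numbers it reports may not match the parser's exactly if the parser counts lines differently, so treat the output as a hint about where to look.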

sherm--
 
AR John

Sherm Pendley said:
on line 111 of the HTML. Specifically, there's an entity that's missing
the ';' - i.e. "&amp" instead of "&amp;".

sherm--

Sherm Pendley,

You are right in a sense. I looked into the HTML and found there is a
bare "&", not "&amp;", as in <a href=" ....?test=1&v=xyz"> ...</a>

I am not sure if I am using an old version of LibXML.

AR
 
Ben Morrow

Quoth AR John:
Sherm Pendley,

You are right in a sense. I looked into the HTML and found there is a
bare "&", not "&amp;", as in <a href=" ....?test=1&v=xyz"> ...</a>

I am not sure if I am using an old version of LibXML.

That's not the problem. The HTML file is broken: &s need encoding, even
in double quotes.

You may be able to use ; instead of & (if the CGI script the URL refers
to is standards-compliant), or you can simply fix the HTML.
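
If the HTML cannot be fixed at its source, a pre-processing pass could encode the stray ampersands before the text reaches the parser. This is a minimal sketch under stated assumptions, in Python for illustration only. The function name is hypothetical and the pattern is a heuristic that treats any '&' not starting a ';'-terminated reference as bare. It does not understand script or CDATA sections, so it could alter ampersands there as well.

```python
import re

# Heuristic pattern (an assumption): '&' not followed by a named or
# numeric character reference that ends in ';'.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);)"
)


def escape_bare_ampersands(html_text):
    """Replace each bare '&' with '&amp;', leaving existing references alone."""
    return BARE_AMPERSAND.sub("&amp;", html_text)


if __name__ == "__main__":
    # Hypothetical link modeled on the one quoted in the thread.
    broken = '<a href="page.cgi?test=1&v=xyz">link</a>'
    print(escape_bare_ampersands(broken))
```

Because already-encoded references are skipped, running the function twice gives the same result as running it once.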

BTW, are you aware that ordinary HTML (as opposed to strict XHTML) is
*not* valid XML? You may be better off with an HTML parser.

Ben
 
Sherm Pendley

AR said:
You are right in a sense. I looked into the HTML and found there is a
bare "&", not "&amp;", as in <a href=" ....?test=1&v=xyz"> ...</a>

Ampersands need to be encoded - that's why newer guidelines prefer using
semicolons to separate name/value pairs in URLs. You need either one of
these:

<a href="...?test=1&amp;v=xyz">

or

<a href="...?test=1;v=xyz">

If you're using CGI.pm on the receiving end of this link, either one
will work. (It *should* work with other languages & CGI libraries too,
but I have no experience with them.)
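
For anyone generating such links from code, the two forms above can be produced roughly as follows. This is a sketch in Python for illustration only. The function names and the "page.cgi" base URL are hypothetical, and whether the receiving script accepts ';' as a separator still depends on that script.

```python
from html import escape
from urllib.parse import urlencode


def href_with_encoded_ampersands(base, params):
    """Build 'base?k=v&amp;k2=v2', safe to place inside an HTML attribute."""
    query = urlencode(params)  # joins pairs with a raw '&'
    # escape() turns '&' into '&amp;' and also encodes quotes and '<', '>'.
    return escape(f"{base}?{query}", quote=True)


def href_with_semicolons(base, params):
    """Build 'base?k=v;k2=v2', which needs no ampersand encoding at all."""
    # Encode each pair on its own, then join with ';' instead of '&'.
    query = ";".join(urlencode([pair]) for pair in params.items())
    return escape(f"{base}?{query}", quote=True)


if __name__ == "__main__":
    params = {"test": 1, "v": "xyz"}
    print(f'<a href="{href_with_encoded_ampersands("page.cgi", params)}">')
    print(f'<a href="{href_with_semicolons("page.cgi", params)}">')
```

Escaping the whole attribute value at output time, rather than hand-editing ampersands, keeps the URL-building and HTML-encoding steps separate.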

AR said:
I am not sure if I am using an old version of LibXML.

Unencoded '&'s have *never* been well-formed XML. It's possible that an
older version of LibXML didn't catch and report the XML error, but if so
that behavior was a bug.

sherm--
 
Eric Bohlman

Ben Morrow said:
BTW, are you aware that ordinary HTML (as opposed to strict XHTML) is
*not* valid XML? You may be better off with an HTML parser.

libxml, upon which XML::LibXML is based, includes a dedicated HTML parser,
and that's what the OP was using.
 
