Encoding detection in the html parser from libxml2

icoba

Hi,

I am parsing HTML documents with the HTML parser from libxml2. When
the encoding is declared in the document it works perfectly, but when
it is not, it does not seem to work well (probably because I am doing
something wrong).

According to http://xmlsoft.org/encoding.html the parser should
detect the encoding. So I tested it by putting a UTF-8 word in a file,
and it does not detect it (it generates a wrong string). Example:
reducción --> reducciÃ³n.
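
A result such as reducciÃ³n is what one would expect if the UTF-8 bytes were read as ISO-8859-1 and then converted to UTF-8 a second time. Whether libxml2 really falls back to ISO-8859-1 here is an assumption, not something confirmed in this thread. The helper name latin1_to_utf8 is made up for this illustration and is not a libxml2 function. A minimal sketch of that double conversion in plain C:

```c
#include <stddef.h>

/* Treat every input byte as one ISO-8859-1 character and write it out
 * as UTF-8. `out` must have room for 2 * n + 1 bytes.
 * Returns the number of bytes written, not counting the terminator.
 * (Hypothetical helper, only meant to reproduce the symptom.) */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII is identical in both encodings. */
            out[j++] = c;
        } else {
            /* U+0080..U+00FF become two UTF-8 bytes. */
            out[j++] = (unsigned char)(0xC0 | (c >> 6));
            out[j++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[j] = '\0';
    return j;
}
```

Feeding it the 10 UTF-8 bytes of "reducción" (the ó is 0xC3 0xB3) gives 12 bytes: 0xC3 becomes 0xC3 0x83 (Ã) and 0xB3 becomes 0xC2 0xB3 (³), which displays as reducciÃ³n.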

I only use the parser as a SAX parser because I do not need a tree, so
to parse the file I create the context with htmlCreatePushParserCtxt()
and feed it with the htmlParseChunk() function.

Is it possible that the encoding detection does not work with
htmlParseChunk()? If so, what method should I use?

Thanks, Cesar
 
