Encoding problems / Perl 5.8.0 / XML::LibXML / XML::LibXSLT

Iain

Folks,

I'm having a problem with charset encodings that I desperately need some
help with. I don't even pretend to know the basics about charsets, so
please forgive my ignorance.

I am transforming XML source into XHTML using an encoding of iso-8859-1
and when I browse (using Mozilla 1.x) I see strange, accented 'A'
characters preceding some characters generated from an entity
reference. If I use utf-8, things get a lot worse: even my &nbsp;
characters get prefixed with the accented junk.

My resultant XHTML source has the usual XML preamble at the top,
complete with encoding specification; however, it doesn't use <meta/> to
specify the charset -- could this be the cause of my problem?

Basically, because I don't understand this, and because I'd like to, can
someone recommend the practices I should be following when doing these
transforms, especially when using Perl and the XML::LibXML/XML::LibXSLT
modules to manage them?

Ideally, I'd like to use utf-8 (I'm guessing that's the best approach)
but it's been a bit of a non-starter for me.

Hoping someone in c.t.xml or c.l.perl.misc can point me in the best
direction.

Many thanks,
Iain.
 
Martin Honnen

Iain said:
I'm having a problem with charset encodings that I desperately need some
help with. I don't even pretend to know the basics about charsets, so
please forgive my ignorance.

I am transforming XML source into XHTML using an encoding of iso-8859-1
and when I browse (using Mozilla 1.x) I see strange, accented 'A'
characters preceding some characters generated from an entity
reference. If I use utf-8, things get a lot worse: even my &nbsp;
characters get prefixed with the accented junk.

My resultant XHTML source has the usual XML preamble at the top,
complete with encoding specification; however, it doesn't use <meta/> to
specify the charset -- could this be the cause of my problem?

What content-type do you send to the browser? If you have server-side
scripting then you don't need a meta element, but you should send an HTTP
header
Content-Type: text/html; charset=ISO-8859-1
to indicate the encoding if you send text/html, as the HTML parser of a
browser will hardly look at the XML declaration.
If you send the XHTML with an XML content type like
Content-Type: text/xml
then the browser will use the XML parser and that should indeed process any
<?xml version="1.0" encoding="ISO-8859-1"?>
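
A minimal sketch of the advice above, written in Python only for illustration (the thread itself concerns Perl). The function name build_response and the sample markup are hypothetical and not from the original posts. It assumes a CGI-style script that writes raw bytes, and it shows the one point that matters: the charset named in the Content-Type header is the same one used to encode the body.

```python
def build_response(markup: str, charset: str = "ISO-8859-1") -> bytes:
    """Return a CGI-style response whose header and body agree on the charset.

    Hypothetical helper for illustration; not part of any library.
    """
    # Header lines are plain ASCII, followed by a blank line.
    header = f"Content-Type: text/html; charset={charset}\r\n\r\n"
    # The body is encoded with exactly the charset the header declares.
    return header.encode("ascii") + markup.encode(charset)


# Sample markup containing an e-acute and a no-break space.
page = "<p>caf\u00e9\u00a0au lait</p>"
response = build_response(page)
```

Passing "UTF-8" as the second argument changes both the header and the body encoding together, which is the property that keeps the browser from guessing.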
 
Iain

Martin said:
-->8--

What content-type do you send to the browser? If you have server-side
scripting then you don't need a meta element, but you should send an HTTP
header
Content-Type: text/html; charset=ISO-8859-1
to indicate the encoding if you send text/html, as the HTML parser of a
browser will hardly look at the XML declaration.
If you send the XHTML with an XML content type like
Content-Type: text/xml
then the browser will use the XML parser and that should indeed process any
<?xml version="1.0" encoding="ISO-8859-1"?>

Thanks Martin. The HTTP header did the trick.
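
One likely explanation for the accented 'A' characters, sketched in Python for illustration only: the bytes were UTF-8 but the browser decoded them as ISO-8859-1. This is an assumption about the cause, since the thread never shows the actual bytes. The variable names are hypothetical.

```python
# A no-break space (what &nbsp; resolves to) is the code point U+00A0.
NBSP = "\u00a0"

# Encoded as UTF-8 it becomes two bytes: 0xC2 0xA0.
utf8_bytes = NBSP.encode("utf-8")

# A browser that assumes ISO-8859-1 reads each byte as one character,
# so 0xC2 shows up as an accented capital A in front of the space.
misread = utf8_bytes.decode("iso-8859-1")

# Decoding with the charset that was actually used restores the original.
correct = utf8_bytes.decode("utf-8")
```

Declaring the charset in the HTTP header tells the browser which of the two decodings to use, which is consistent with the header resolving the problem.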

Iain.
 
