Problem with accent-grave-like characters

  • Thread starter Tjerk Wolterink

Tjerk Wolterink

Hello, I have an XML file like this:

<bestand>
<id>8</id>
<beschrijving type="string"><![CDATA[film voorcafe]]></beschrijving>
<file type="file"
mime-type-image="/xcm/mime_types/movie.gif"><![CDATA[voorcafédrietevol.avi]]></file>
</bestand>


But the XML processor says:
"XML parser error 4: not well-formed (invalid token)"

That is because of the é in voorcafédrietevol.avi.
How do I solve this problem?

The value voorcafédrietevol.avi is read from a database. I want
to support characters like this, but how?

Is this the solution: <?xml version="1.0" encoding="ISO-8859-1"?>

????

I think so, but why?
 

Marrow

Hi,

It sounds like your XML document is not properly encoded. When an XML parser
opens a document it uses several things to determine the encoding of
that document: it looks at the first few bytes (2 or 4) to see if they
imply the encoding, including whether the document starts with a BOM (byte
order mark), and then it looks at the encoding specified in the XML
declaration.
(see: http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing)

Having determined the document's encoding, the XML parser will expect
every character within the document to be correctly encoded according to
that encoding. If you haven't specified an encoding (in the XML
declaration), the document has no BOM, and the first two bytes of the
document are 0x3C 0x3F ("<?"), then it will likely assume that the document
is UTF-8 encoded, so it will expect all characters to be correctly encoded
in UTF-8. If the é is stored as the single byte 0xE9 (as it would be in
ISO-8859-1), that is not valid UTF-8, which would explain the "invalid
token" error.
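
The detection steps described above can be sketched roughly in Python. This is only an illustration of the idea, not what any particular parser does internally: `guess_xml_encoding` is a made-up helper name, and the sketch ignores the rarer cases from the spec (UTF-16/UTF-32 without a BOM, EBCDIC).

```python
import re

# BOM signatures, longest first so UTF-32LE is not mistaken for UTF-16LE.
_BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

# Picks the encoding name out of an XML declaration at the start of the data.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def guess_xml_encoding(data: bytes) -> str:
    """Rough guess at the encoding a parser would assume for `data`."""
    # 1. A byte order mark settles the question.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. No BOM, but the bytes 0x3C 0x3F ("<?") mean an ASCII-compatible
    #    encoding, so the XML declaration can be read to find the name.
    if data.startswith(b"<?"):
        match = _DECL.match(data)
        if match:
            return match.group(1).decode("ascii")
    # 3. Nothing declared: the default is UTF-8.
    return "UTF-8"
```

Note that this only tells you what the parser will assume. It says nothing about whether the bytes that follow really are in that encoding.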

Tjerk Wolterink said:
Is this the solution: <?xml version="1.0" encoding="ISO-8859-1"?>

????

I think so, but why?

Not by itself. The solution is to specify, in the encoding part of the XML
declaration, the encoding the document is actually encoded in. Declaring
ISO-8859-1 only helps if the bytes in the file really are ISO-8859-1.

BTW, even characters within CDATA sections have to be correctly encoded.
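
To see the effect for yourself, here is a small Python sketch using the standard library's expat-based parser. It assumes, purely for illustration, that the é arrives from the database as the ISO-8859-1 byte 0xE9; the element is cut down from the document in the question.

```python
import xml.etree.ElementTree as ET

# \u00e9 is é; the CDATA section does not exempt it from encoding rules.
body = "<file><![CDATA[voorcaf\u00e9drietevol.avi]]></file>"

# Case 1: é saved as one ISO-8859-1 byte (0xE9), no XML declaration.
# The parser assumes UTF-8 and rejects the byte sequence.
undeclared = body.encode("iso-8859-1")
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError as err:
    undeclared_ok = False
    print("rejected:", err)

# Case 2: the same bytes, but the declaration names the real encoding.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + undeclared
text_declared = ET.fromstring(declared).text

# Case 3: no declaration, but the document is saved as UTF-8 (the default).
as_utf8 = body.encode("utf-8")
text_utf8 = ET.fromstring(as_utf8).text
```

Case 1 fails with a "not well-formed (invalid token)" error, while cases 2 and 3 both yield the file name with the é intact. Either fix works; what matters is that the declaration and the bytes agree.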


Cheers
Marrow
http://www.marrowsoft.com - home of Xselerator (XSLT IDE and debugger)
http://www.topxml.com/Xselerator



 

Tjerk Wolterink


How do I know which encoding a file is in?

I've edited some XML files, and I think I can set the encoding in
my editor,

but PHP also creates some XML files, with: echo "<xml> bla bla";
How do I know which encoding PHP will echo in?
 

Marrow

Hi,
Tjerk Wolterink said:
How do I know which encoding a file is in?

You don't normally need to know: the XML parser will detect it and verify
it. If you want to know what encoding a compliant XML parser will assume,
follow the same steps as the spec. Of course, that doesn't
tell you whether the rest of the document is actually encoded correctly.
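
If you do want to check whether a file's bytes are valid in the encoding you expect, a quick test is to try decoding them. A minimal Python sketch (`decodes_as` is just an illustrative name):

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if every byte sequence in `data` is valid in `encoding`."""
    try:
        data.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Invalid byte sequence, or an encoding name Python does not know.
        return False
    return True
```

A failed UTF-8 decode is good evidence the file is not UTF-8. The reverse is weaker: ISO-8859-1 assigns a character to every byte value, so any data "decodes" as ISO-8859-1 whether or not that was the intended encoding.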

Tjerk Wolterink said:
I've edited some XML files, and I think I can set the encoding in
my editor,

An editor that is 'encoding aware' will probably give you a choice of
encoding when saving.

Tjerk Wolterink said:
but PHP also creates some XML files, with: echo "<xml> bla bla";
How do I know which encoding PHP will echo in?

No idea; that is probably a question best aimed at a PHP newsgroup. I'd imagine
it's some localised 8-bit charset, ASCII or ANSI.
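
Whatever the language, the safe approach when generating XML is to choose the output encoding explicitly and make the declaration match it, rather than relying on a default. I can't speak for PHP, so here is the idea sketched in Python instead (the element names are taken from the document in the question; `xml_declaration` needs Python 3.8 or later):

```python
import xml.etree.ElementTree as ET

# Build a cut-down version of the document from the question.
root = ET.Element("bestand")
ET.SubElement(root, "id").text = "8"
ET.SubElement(root, "file", {"type": "file"}).text = "voorcaf\u00e9drietevol.avi"

# Serialise to bytes in an explicitly chosen encoding. The serialiser writes
# an XML declaration naming that same encoding, so the two cannot disagree.
xml_bytes = ET.tostring(root, encoding="ISO-8859-1", xml_declaration=True)

# Round trip: a parser reading these bytes gets the é back unchanged.
round_trip = ET.fromstring(xml_bytes).find("file").text
```

Building the document with an XML library instead of string concatenation also takes care of escaping characters such as & and <.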

Cheers
Marrow
 
