Saving XML as UTF-8?

Philipp Lenssen

How do I load and save a UTF-8 document in XML in ASP/VBS?


Well, the loading* is not the problem actually -- the file is in UTF-8,
and understood correctly -- but once saved, the UTF-8 is replaced by
what seems to be iso-8859-1 (which Flash doesn't understand, but that's
another problem). Any help greatly appreciated.


* Something like this...
set xDoc = server.createObject("Msxml2.DOMDocument")
xDoc.async = false
xDoc.load sPath
 
Martin Honnen

Philipp said:
How do I load and save a UTF-8 document in XML in ASP/VBS?

Well, the loading* is not the problem actually -- the file is in UTF-8,
and understood correctly -- but once saved, the UTF-8 is replaced by
what seems to be iso-8859-1
* Something like this...
set xDoc = server.createObject("Msxml2.DOMDocument")
xDoc.async = false
xDoc.load sPath

I am pretty sure if you then use
xDoc.save Server.MapPath(filename)
later then the encoding is preserved.
Are you by chance saving by writing xDoc.xml with the FileSystemObject?

The MSXML 4 docs say about the save method:

"Character encoding is based on the encoding attribute in the XML
declaration, such as <?xml version="1.0" encoding="windows-1252"?>. When
no encoding attribute is specified, the default setting is UTF-8."

which supports my view that the encoding the document has when being
loaded is preserved when saving.
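
To make that rule concrete, here is a minimal sketch (not taken from the MSXML docs) that puts an explicit UTF-8 declaration into the document before calling save, so that save has an encoding attribute to go by. The helper name SaveAsUtf8 and the file name texts.xml are made up for illustration, and the sketch assumes the script has write access to the target file.

```vbscript
' Hypothetical helper: make the XML declaration say UTF-8, then save.
Const NODE_PROCESSING_INSTRUCTION = 7

Sub SaveAsUtf8(xDoc, sFile)
    Dim decl, first
    Set decl = xDoc.createProcessingInstruction("xml", "version=""1.0"" encoding=""UTF-8""")
    Set first = xDoc.firstChild
    If first Is Nothing Then
        Exit Sub ' empty document, nothing to save
    End If
    If first.nodeType = NODE_PROCESSING_INSTRUCTION And first.nodeName = "xml" Then
        ' An XML declaration already exists: swap it for the UTF-8 one.
        xDoc.replaceChild decl, first
    Else
        ' No declaration yet: put one in front of everything else.
        xDoc.insertBefore decl, first
    End If
    xDoc.save sFile
End Sub

Dim xDoc
Set xDoc = Server.CreateObject("Msxml2.DOMDocument")
xDoc.async = False
If xDoc.load(Server.MapPath("texts.xml")) Then
    SaveAsUtf8 xDoc, Server.MapPath("texts.xml")
Else
    ' load returned False: report why the parser rejected the file.
    Response.Write Server.HTMLEncode(xDoc.parseError.reason)
End If
```

If the loaded file already declares encoding="UTF-8", the helper should change nothing in practice. It only removes any doubt about which declaration save sees.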
 
Philipp Lenssen

I am pretty sure if you then use
xDoc.save Server.MapPath(filename)
later then the encoding is preserved.
Are you by chance saving by writing xDoc.xml with the
FileSystemObject?

Thanks so far Martin, this is my save method:

xDoc.save server.mapPath(sPath)

So no, I'm not using the FSO...
Any idea what's happening?
 
Martin Honnen

Philipp Lenssen wrote:

this is my save method:

xDoc.save server.mapPath(sPath)

You say the file is saved as iso-8859-1, does MSXML really save it with
that encoding and put a
<?xml version="1.0" encoding="iso-8859-1"?>
in there, or why do you think that MSXML saves as iso-8859-1?
 
Philipp Lenssen

Martin said:
You say the file is saved as iso-8859-1, does MSXML really save it
with that encoding and put a <?xml version="1.0"
encoding="iso-8859-1"?> in there, or why do you think that MSXML
saves as iso-8859-1?

Let me put it this way. I use my own Netpadd editor, which doesn't
support UTF-8. I know because whenever I open UTF-8, I see this "ï»¿"
as first character. So when I want to open UTF-8, I use Notepad.
The files however that *were* UTF-8 when I put them in this tool which
I'm programming (a simple text translation tool), they are coming out
"fine" for my non-UTF-8 Netpadd once they are saved. So they lost their
"UTF-8ness" without me saying so in ASP!

Thanks so far, and hope you have more hints!
 
Mark Schupp

UTF-8 does not by itself add special characters to the start of a file. If
the files are plain XML the first non-whitespace character should be "<".
Unicode (UTF-16) files do start with a 2-byte byte order mark.

What operating system are you running on when you open files in Notepad? The
version of Notepad included with NT, Win2000, and WinXP Pro is capable of
saving files in ANSI, Unicode, or UTF-8.

How are you opening the files from the ASP script? If possible show the
simplest *working* code (just read and then write the file) that duplicates
the problem along with a sample XML file.
 
Martin Honnen

Philipp said:
Let me put it this way. I use my own Netpadd editor, which doesn't
support UTF-8. I know because whenever I open UTF-8, I see this "ï»¿"
as first character. So when I want to open UTF-8, I use Notepad.
The files however that *were* UTF-8 when I put them in this tool which
I'm programming (a simple text translation tool), they are coming out
"fine" for my non-UTF-8 Netpadd once they are saved. So they lost their
"UTF-8ness" without me saying so in ASP!

Frankly, using a tool that doesn't understand UTF-8 to check whether a
file is UTF-8 encoded doesn't sound like a reliable way. What you see
might simply be a byte order mark at the beginning of the file, and that
mark is optional in UTF-8.

I don't really know how to help with that. I would use an XML parser to
check whether the file is properly encoded; simply loading the file in
IE/Win should be enough to check that.

If you have the application online then post a URL (or better two, one
to the original, one to the saved XML) so that someone here can check
whether what you get there is really UTF-8 or ISO-8859-1.
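
For a quick check on the server itself, a rough sketch along these lines reads the first three bytes of a file and compares them with the UTF-8 byte order mark (EF BB BF). The function name HasUtf8Bom and the file name are hypothetical, and since the mark is optional, a file without it can still be valid UTF-8. This only tells you whether the mark is there.

```vbscript
' Hypothetical helper: True if the file starts with the UTF-8 BOM (EF BB BF).
Const adTypeBinary = 1

Function HasUtf8Bom(sFile)
    Dim stm, bytes
    HasUtf8Bom = False
    Set stm = Server.CreateObject("ADODB.Stream")
    stm.Type = adTypeBinary
    stm.Open
    stm.LoadFromFile sFile
    If stm.Size >= 3 Then
        bytes = stm.Read(3)
        ' Compare the three leading bytes with the BOM byte values.
        HasUtf8Bom = (AscB(MidB(bytes, 1, 1)) = &HEF And _
                      AscB(MidB(bytes, 2, 1)) = &HBB And _
                      AscB(MidB(bytes, 3, 1)) = &HBF)
    End If
    stm.Close
    Set stm = Nothing
End Function

If HasUtf8Bom(Server.MapPath("texts.xml")) Then
    Response.Write "File starts with a UTF-8 byte order mark."
Else
    Response.Write "No UTF-8 byte order mark (the file may still be UTF-8)."
End If
```

Running it on the original and on the saved file would show whether the only difference between the two is the byte order mark.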
 
Philipp Lenssen

Martin said:
If you have the application online then post a URL (or better two,
one to the original, one to the saved XML) then someone here could
check whether it is really UTF-8 or ISO-8859-1 what you get there.

It's already solved, IIRC I posted this here already.
 
