Saving XML as UTF-8?

Discussion in 'ASP General' started by Philipp Lenssen, Jan 18, 2005.

  1. How do I load and save a UTF-8 document in XML in ASP/VBS?


    Well, the loading* is not the problem actually -- the file is in UTF-8,
    and understood correctly -- but once saved, the UTF-8 is replaced by
    what seems to be iso-8859-1 (which Flash doesn't understand, but that's
    another problem). Any help greatly appreciated.


    * Something like this...
    set xDoc = server.createObject("Msxml2.DOMDocument")
    xDoc.async = false
    xDoc.load sPath
     
    Philipp Lenssen, Jan 18, 2005
    #1

  2. Philipp Lenssen wrote:

    > How do I load and save a UTF-8 document in XML in ASP/VBS?
    >
    > Well, the loading* is not the problem actually -- the file is in UTF-8,
    > and understood correctly -- but once saved, the UTF-8 is replaced by
    > what seems to be iso-8859-1


    > * Something like this...
    > set xDoc = server.createObject("Msxml2.DOMDocument")
    > xDoc.async = false
    > xDoc.load sPath


    I am pretty sure if you then use
    xDoc.save Server.MapPath(filename)
    later then the encoding is preserved.
    Are you by chance saving by writing xDoc.xml with the FileSystemObject?

    The MSXML 4 docs say about the save method:

    "Character encoding is based on the encoding attribute in the XML
    declaration, such as <?xml version="1.0" encoding="windows-1252"?>. When
    no encoding attribute is specified, the default setting is UTF-8."

    which supports my view that the encoding the document has when being
    loaded is preserved when saving.
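
    To see which encoding the save method would pick under the rule quoted
    above, it can help to look at the XML declaration of the file itself.
    The following is only a minimal sketch in Python (not ASP), meant to be
    run against a copy of the file. The helper name declared_encoding is
    hypothetical, and the sketch assumes an ASCII-compatible file (so not
    UTF-16) so that the declaration can be matched as raw bytes.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Matches the encoding pseudo-attribute inside an XML declaration
# that sits at the very start of the data.
_DECL = re.compile(
    rb"^<\?xml\s[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def declared_encoding(data: bytes) -> str:
    """Return the encoding named in the XML declaration.

    Falls back to 'UTF-8' when there is no declaration or no encoding
    attribute, mirroring the default described in the quoted docs.
    """
    # Skip an optional UTF-8 byte order mark before the declaration.
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    return "UTF-8"
```

    Reading the first few hundred bytes of the file in binary mode and
    passing them to declared_encoding is enough, since the declaration has
    to come first in the document.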




    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Jan 18, 2005
    #2

  3. Martin Honnen wrote:

    > Philipp Lenssen wrote:
    >
    > > How do I load and save a UTF-8 document in XML in ASP/VBS?
    > >


    >
    > I am pretty sure if you then use
    > xDoc.save Server.MapPath(filename)
    > later then the encoding is preserved.
    > Are you by chance saving by writing xDoc.xml with the
    > FileSystemObject?


    Thanks so far Martin, this is my save method:

    xDoc.save server.mapPath(sPath)

    So no, I'm not using the FSO...
    Any idea what's happening?

    --
    Google Blogoscoped
    http://blog.outer-court.com
     
    Philipp Lenssen, Jan 18, 2005
    #3
  4. Philipp Lenssen wrote:


    >>Philipp Lenssen wrote:
    >>
    >>
    >>>How do I load and save a UTF-8 document in XML in ASP/VBS?
    >>>


    > this is my save method:
    >
    > xDoc.save server.mapPath(sPath)
    >


    You say the file is saved as iso-8859-1, does MSXML really save it with
    that encoding and put a
    <?xml version="1.0" encoding="iso-8859-1"?>
    in there, or why do you think that MSXML saves as iso-8859-1?

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Jan 18, 2005
    #4
  5. Martin Honnen wrote:

    > Philipp Lenssen wrote:
    >


    > > > Philipp Lenssen wrote:
    > > >
    > > >
    > > > > How do I load and save a UTF-8 document in XML in ASP/VBS?
    > > > >

    >
    > > this is my save method:
    > >
    > > xDoc.save server.mapPath(sPath)
    > >

    >
    > You say the file is saved as iso-8859-1, does MSXML really save it
    > with that encoding and put a <?xml version="1.0"
    > encoding="iso-8859-1"?> in there, or why do you think that MSXML
    > saves as iso-8859-1?


    Let me put it this way. I use my own Netpadd editor, which doesn't
    support UTF-8. I know because whenever I open UTF-8, I see this "ï»¿"
    as first character. So when I want to open UTF-8, I use Notepad.
    The files however that *were* UTF-8 when I put them in this tool which
    I'm programming (a simple text translation tool), they are coming out
    "fine" for my non-UTF-8 Netpadd once they are saved. So they lost their
    "UTF-8ness" without me saying so in ASP!

    Thanks so far, and hope you have more hints!
    --
    Google Blogoscoped
    http://blog.outer-court.com
     
    Philipp Lenssen, Jan 19, 2005
    #5
  6. Mark Schupp (Guest) wrote:

    UTF-8 does not by itself add special characters to the start of a file. If
    the files are plain XML, the first non-whitespace character should be "<".
    Files saved as "Unicode" (UTF-16) do start with a 2-byte byte order mark.
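
    The leading bytes of a file can be inspected directly to see whether a
    byte order mark is present. Below is a minimal Python sketch (not ASP);
    the function names detect_bom and as_latin1_editor are hypothetical.
    The UTF-8 mark is the three bytes EF BB BF, which an editor that reads
    the file as ISO-8859-1 displays as "ï»¿".

```python
import codecs

# Known byte order marks, longest first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data: bytes):
    """Return the name of the byte order mark at the start of data, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def as_latin1_editor(data: bytes) -> str:
    """Show the bytes the way an ISO-8859-1-only editor would render them."""
    return data.decode("latin-1")
```

    A file without any mark returns None here, which is still perfectly
    valid UTF-8 as far as an XML parser is concerned.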

    What operating system are you running on when you open files in Notepad? The
    version of Notepad included with NT, Win2000, and WinXP Pro is capable of
    saving files in ANSI, Unicode, or UTF-8.

    How are you opening the files from the ASP script? If possible show the
    simplest *working* code (just read and then write the file) that duplicates
    the problem along with a sample XML file.
    --
    --Mark Schupp
    Head of Development
    Integrity eLearning
    www.ielearning.com

     
    Mark Schupp, Jan 19, 2005
    #6
  7. Philipp Lenssen wrote:

    > Martin Honnen wrote:


    >>You say the file is saved as iso-8859-1, does MSXML really save it
    >>with that encoding and put a <?xml version="1.0"
    >>encoding="iso-8859-1"?> in there, or why do you think that MSXML
    >>saves as iso-8859-1?


    >
    > Let me put it this way. I use my own Netpadd editor, which doesn't
    > support UTF-8. I know because whenever I open UTF-8, I see this "ï»¿"
    > as first character. So when I want to open UTF-8, I use Notepad.
    > The files however that *were* UTF-8 when I put them in this tool which
    > I'm programming (a simple text translation tool), they are coming out
    > "fine" for my non-UTF-8 Netpadd once they are saved. So they lost their
    > "UTF-8ness" without me saying so in ASP!


    Frankly, using a tool that doesn't understand UTF-8 to check whether a
    file is UTF-8 encoded doesn't sound like a reliable method. What you see
    might simply be a byte order mark at the beginning of the file, and that
    mark is optional in UTF-8.

    I don't really know how to help with that. I would use an XML parser to
    check whether the file is properly encoded; simply loading the file in
    IE/Win should be enough to check that.

    If you have the application online, then post a URL (or better two, one
    to the original, one to the saved XML) so that someone here can check
    whether what you get there is really UTF-8 or ISO-8859-1.
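
    Short of posting the files, the raw bytes can also be checked locally.
    The sketch below is Python rather than ASP, and the function name
    classify is hypothetical. It relies on two facts: a file containing
    only ASCII characters is byte-for-byte identical in UTF-8 and
    ISO-8859-1, and ISO-8859-1 text with accented characters will usually
    fail a strict UTF-8 decode. Treat the result as a hint, not as proof.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def classify(data: bytes) -> str:
    """Roughly classify raw file bytes as 'ascii', 'utf-8' or 'not utf-8'.

    An optional UTF-8 byte order mark is ignored before the check.
    """
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # Pure ASCII content reads the same in UTF-8 and ISO-8859-1.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    try:
        # Decoding is strict by default and raises on invalid sequences.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"
```

    Running this on the original and on the saved file (both opened in
    binary mode) shows whether the save changed the encoding or merely
    dropped the byte order mark.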

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Jan 19, 2005
    #7
  8. Martin Honnen wrote:

    > Philipp Lenssen wrote:
    >



    > If you have the application online then post a URL (or better two,
    > one to the original, one to the saved XML) then someone here could
    > check whether it is really UTF-8 or ISO-8859-1 what you get there.


    It's already solved, IIRC I posted this here already.

    --
    Google Blogoscoped
    http://blog.outer-court.com
     
    Philipp Lenssen, Jan 21, 2005
    #8