copy XML file -- three extra bytes???

martin

Hi,

I am copying an xml file like so.

' Requires: Imports System.Xml
Dim xmlDoc As New XmlDocument
xmlDoc.Load("C:\Program Files\Templates\message.msg")
Console.WriteLine("Template loaded")
xmlDoc.Save("C:\Program Files\Templates\copy.xml")
Console.WriteLine("message saved")

The XML file copies fine and can be read in IE. However, the XML file that
is produced cannot itself be copied using the method above.

The reason is that the produced XML file has three additional bytes at the
start (i.e. before the "<?xml" part).

My question is: does anybody know why this happens, and how do I get rid of
the three additional bytes at the start of the file?

many thanks in advance.

martin.
 
mikeb

The file is being saved in a Unicode encoding. The 3 additional bytes are
a Unicode BOM (Byte Order Mark).

You can probably solve the problem either by specifying the Unicode
encoding in the <?xml ...?> declaration, or by saving the file through a
StreamWriter that uses Encoding.Default (the system's default ANSI code
page rather than plain ASCII), which writes no BOM:

' Requires: Imports System.IO
Dim stream As StreamWriter = Nothing
Try
    ' Encoding.Default has no preamble, so no BOM is written.
    stream = New StreamWriter("C:\Program Files\Templates\copy.xml", _
                              False, System.Text.Encoding.Default)

    xmlDoc.Save(stream)
    Console.WriteLine("message saved")
Catch ex As Exception
    Console.WriteLine("Error saving file: " & ex.Message)
Finally
    If Not stream Is Nothing Then
        stream.Close()
    End If
End Try
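
To confirm that the three extra bytes really are a UTF-8 BOM (EF BB BF), the start of the saved file can be compared against the preamble that Encoding.UTF8 reports. The following is only a rough sketch: the module and function names (BomCheck, StartsWithUtf8Bom) are made up for illustration, the path is the one used earlier in the thread, and the Using statement assumes VB 2005 or later.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BomCheck
    ' Hypothetical helper: True when the file begins with the UTF-8 BOM (EF BB BF).
    Function StartsWithUtf8Bom(ByVal fileName As String) As Boolean
        Dim bom As Byte() = Encoding.UTF8.GetPreamble()
        Dim head(bom.Length - 1) As Byte
        Dim bytesRead As Integer

        ' Read only as many bytes as the BOM is long.
        Using fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
            bytesRead = fs.Read(head, 0, head.Length)
        End Using

        If bytesRead < bom.Length Then Return False
        For i As Integer = 0 To bom.Length - 1
            If head(i) <> bom(i) Then Return False
        Next
        Return True
    End Function

    Sub Main()
        ' Path taken from the thread; adjust as needed.
        Console.WriteLine(StartsWithUtf8Bom("C:\Program Files\Templates\copy.xml"))
    End Sub
End Module
```

If this reports True for the copied file, the extra bytes are the BOM and not corruption.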
 
mikeb

Clarification: the Unicode encoding that you're seeing is probably UTF-8.

In any case, I played around a little bit more with your sample code,
and I had to manually change the encoding specified in the input file to
be incorrect to get xmlDoc.Load() to throw an exception. In other words,
xmlDoc.Load() does not seem to mind the BOM header, unless the encoding
attribute in the <?xml ...?> tag is lying.

Can you post a very, very small XML file that causes the problem you're
seeing?
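
For anyone who wants to repeat that check without touching the disk, here is a small sketch that builds "BOM + document" in memory and passes it to XmlDocument.Load. The module name and the tiny <Message/> document are invented for the example, and it only covers the case where the encoding attribute matches the actual bytes. Using assumes VB 2005 or later.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module LoadWithBom
    Sub Main()
        ' Illustrative document; the declaration matches the real encoding (UTF-8).
        Dim xml As String = "<?xml version=""1.0"" encoding=""UTF-8""?><Message/>"
        Dim bom As Byte() = Encoding.UTF8.GetPreamble()
        Dim body As Byte() = Encoding.UTF8.GetBytes(xml)

        ' Lay the bytes out the same way as the saved file: BOM first, then the XML.
        Using ms As New MemoryStream()
            ms.Write(bom, 0, bom.Length)
            ms.Write(body, 0, body.Length)
            ms.Position = 0

            Dim doc As New XmlDocument()
            doc.Load(ms)

            ' Reaching this line means the BOM did not upset the parser.
            Console.WriteLine(doc.DocumentElement.Name)
        End Using
    End Sub
End Module
```

If Load completes and the root element name is printed, the BOM by itself is not what breaks loading.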
 
martin

You are correct.
The problem now becomes how to create an XML file with the line

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

(i.e. with the encoding set to UTF-8)

using XmlDocument.Load,

or do I just have to revert to your ASCII method?

Many thanks for the help. I have included samples below that demonstrate my
problem.
The XML files are generated in code rather than attached to this message.


================
' Requires: Imports System.Xml
Sub Main()
    Try
        Dim doc As New XmlDocument

        ' Declaration names UTF-8 explicitly: the saved file starts with three extra bytes.
        doc.LoadXml("<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>" & _
            "<Message version=""1.1"" id="""">" & _
            "<Attributes>" & _
            "<Priority></Priority>" & _
            "<DeleteAttaches></DeleteAttaches>" & _
            "</Attributes>" & _
            "</Message>")
        doc.Save("C:\Program Files\Templates\ThreeByteError.xml")
        Console.WriteLine("Saved the dodgy xml file")

        ' No encoding attribute: the saved file has no extra bytes.
        doc.LoadXml("<?xml version=""1.0"" standalone=""yes""?>" & _
            "<Message version=""1.1"" id="""">" & _
            "<Attributes>" & _
            "<Priority></Priority>" & _
            "<DeleteAttaches></DeleteAttaches>" & _
            "</Attributes>" & _
            "</Message>")
        doc.Save("C:\Program Files\Templates\NoThreeByteError.xml")
        Console.WriteLine("Saved the fine xml file")
    Catch ex As Exception
        Console.WriteLine("***ERROR***")
        Console.WriteLine(ex.Message)
    End Try

    Console.WriteLine("Press a key to close")
    Console.ReadLine()
End Sub

================

Now run the following at the command line to see the problem:

type "C:\Program Files\Templates\ThreeByteError.xml"

type "C:\Program Files\Templates\NoThreeByteError.xml"

fc "C:\Program Files\Templates\NoThreeByteError.xml" "C:\Program Files\Templates\ThreeByteError.xml"


cheers

martin.
 
mikeb

martin said:
You are correct.
The problem now becomes how to create an XML file with the line

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

(i.e. with the encoding set to UTF-8)

Well, as far as I can tell from the StreamWriter documentation, a BOM is
written whenever the chosen encoding supplies one (Encoding.UTF8 does);
Encoding.Default does not supply one.

However, at least for XmlDocument.Load(), the BOM poses no problem on my
machine - it loads just fine.

If there's some other software that you need to load the XML document
into that does not handle the BOM, I suppose you have a few options:

- write the file using Encoding.Default.
- post-process the output file to remove the BOM
- upgrade the software that doesn't like the BOM to handle it properly

I'm sure there are others, too.
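
Two of these options can be sketched in code. The first routine is an additional variant that the thread does not mention: it keeps UTF-8 but passes New UTF8Encoding(False), an encoding object with an empty preamble, to the StreamWriter. As I understand the XmlDocument.Save(TextWriter) documentation, the encoding attribute in the declaration is then taken from the writer, so it should still read as UTF-8. The second routine is the post-processing route. The module name, routine names, and the relative output path are all hypothetical, and Using and File.ReadAllBytes assume .NET 2.0 / VB 2005 or later.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module BomFreeSave
    ' Hypothetical helper: saves doc as UTF-8 without the leading EF BB BF bytes.
    Sub SaveUtf8NoBom(ByVal doc As XmlDocument, ByVal fileName As String)
        ' UTF8Encoding(False) is UTF-8 with an empty preamble, so no BOM is emitted.
        Using writer As New StreamWriter(fileName, False, New UTF8Encoding(False))
            doc.Save(writer)
        End Using
    End Sub

    ' Hypothetical helper: rewrites an existing file without its UTF-8 BOM.
    ' Files that do not start with the BOM are left untouched.
    Sub StripUtf8Bom(ByVal fileName As String)
        Dim bytes As Byte() = File.ReadAllBytes(fileName)
        Dim bom As Byte() = Encoding.UTF8.GetPreamble()

        If bytes.Length < bom.Length Then Return
        For i As Integer = 0 To bom.Length - 1
            If bytes(i) <> bom(i) Then Return
        Next

        ' Copy everything after the BOM into a new array and write it back.
        Dim trimmed(bytes.Length - bom.Length - 1) As Byte
        Array.Copy(bytes, bom.Length, trimmed, 0, trimmed.Length)
        File.WriteAllBytes(fileName, trimmed)
    End Sub

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>" & _
                    "<Message version=""1.1"" id=""""/>")

        ' Illustrative output path, relative to the working directory.
        SaveUtf8NoBom(doc, "copy.xml")
    End Sub
End Module
```

Saving through the BOM-free encoding is usually the simpler choice when you control the code that writes the file. StripUtf8Bom is more of a fallback for files produced by something you cannot change.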
 
