How to grab data as UTF-8 rather than UTF-16 from DB?

darrel

I'm having a really odd problem that I can't find many other examples of.

I'm using XmlTextWriter and grabbing data from a database and then spitting
out an RSS-compliant XML file.

The problem I have is that, by default, IE pukes on it.

If I create the XmlTextWriter as:

XmlTextWriter(Response.OutputStream, Encoding.UTF8)

and let the page headers default, I end up with a UTF-8 page encoding, but
the XmlTextWriter spits out the XML as such:

<?xml version="1.0" encoding="utf-16"?><rss version="2.0">...

IE doesn't like this as the XML encoding doesn't match the page encoding.

So, if I force the page to also be UTF-16, then both encodings match, but
apparently IE can't handle UTF-16, period. So I still get parsing errors
when loading the XML in the browser (there's a space between each and every
character in the XML).

It appears that my solution is going to have to be to just make sure the
content from the DB is being retrieved as UTF-8, forcing everything to be
UTF-8. But I'm not sure how to do that. Can it be done?

-Darrel
 
Göran Andersson

darrel said:
It appears that my solution is going to have to be to just make sure the
content from the DB is being retrieved as UTF-8, forcing everything to be
UTF-8. But I'm not sure how to do that. Can it be done?

Even if it were possible, that would not help you at all. Strings in .NET
are stored as 16-bit Unicode (UTF-16), so however you receive the data from
the database, it ends up as 16-bit string data.

The 16-bit internal representation doesn't determine the output encoding,
though. Whatever encoding you choose, the string data is encoded before
output; the internal representation of a string is never output directly.
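
To illustrate the point, here is a minimal console sketch (not part of the original reply; the module name and the sample string are made up for illustration). It shows that the same in-memory string yields different bytes depending on which Encoding is applied at output time:

```vbnet
Imports System
Imports System.Text

Module StringEncodingSketch
    Sub Main()
        ' Hypothetical sample value standing in for data read from the DB.
        Dim title As String = "RSS"

        ' In memory, each Char is a 16-bit UTF-16 code unit.
        Console.WriteLine(title.Length)       ' 3 characters

        ' Bytes only exist once an Encoding is applied to the string.
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(title)
        Dim utf16Bytes As Byte() = Encoding.Unicode.GetBytes(title)

        Console.WriteLine(utf8Bytes.Length)   ' 3: one byte per ASCII character
        Console.WriteLine(utf16Bytes.Length)  ' 6: two bytes per character
    End Sub
End Module
```

So how the database hands over the text matters much less than which encoding the writer and the response use.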

Set the Response.ContentEncoding property to UTF-8 also, and see if that
helps.

Another thing that might work is to pass null for the encoding when creating
the XmlTextWriter. This will output UTF-8 data, but omit the encoding
attribute from the XML declaration.
 
darrel

Göran Andersson said:
Set the Response.ContentEncoding property to UTF-8 also, and see if that
helps.

It doesn't. That's the problem.

The issue is that even though I set the XmlTextWriter to output UTF-8, it
doesn't, and still appends a UTF-16 XML header.

Göran Andersson said:
Another thing that might work is to use null for encoding when creating
the XmlTextWriter. This will output UTF-8 data, but omit the encoding from
the processing instruction in the XML.

What is the syntax for that? This doesn't work:

Dim objX As New XmlTextWriter(Response.OutputStream, Encoding.null)

-Darrel
 
Göran Andersson

darrel said:
It doesn't. That's the problem.

The issue is that even though I set the XmlTextWriter to output UTF-8, it
doesn't, and still appends a UTF-16 XML header.

What is the syntax for that? This doesn't work:

Dim objX As New XmlTextWriter(Response.OutputStream, Encoding.null)

Dim objX As New XmlTextWriter(Response.OutputStream, null)
 
darrel

Göran Andersson said:
Dim objX As New XmlTextWriter(Response.OutputStream, null)

VS.NET doesn't like that either. It's telling me 'Null is not declared'.

I guess a different question is: why does it output UTF-16 even when I tell
it to use Encoding.UTF8?

-Darrel
 
Göran Andersson

darrel said:
VS.NET doesn't like that either. It's telling me 'Null is not declared'.

Sorry. In VB you use Nothing instead of null.
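
For reference, a minimal sketch of the Nothing variant (assumptions: a MemoryStream stands in for Response.OutputStream so it can run as a console program, and the module name is made up). With no encoding supplied, the writer should emit UTF-8 and leave the encoding attribute out of the XML declaration, as described earlier in the thread:

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module NothingEncodingSketch
    Sub Main()
        ' Stand-in for Response.OutputStream (assumption for this sketch).
        Dim ms As New MemoryStream()

        ' Nothing is VB's equivalent of C#'s null.
        Dim objX As New XmlTextWriter(ms, Nothing)

        objX.WriteStartDocument()
        objX.WriteStartElement("rss")
        objX.WriteAttributeString("version", "2.0")
        objX.WriteEndElement()
        objX.WriteEndDocument()
        objX.Flush()

        ' Decode the bytes only to look at what was written.
        Console.WriteLine(Encoding.UTF8.GetString(ms.ToArray()))
    End Sub
End Module
```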

darrel said:
I guess a different question is why does it output utf-16 even when I tell
it to use encoding.utf8?

I think that the HttpResponseStream object might have an encoding
internally. The same problem occurs if you create an XmlTextWriter that
uses a StringWriter that uses a StringBuilder, as observed in this thread:

http://www.devnewsgroups.net/group/microsoft.public.dotnet.xml/topic60955.aspx

To prevent this, you can create an XmlTextWriter that writes to a
MemoryStream; that way there is no encoding in the stream that can
affect the XmlTextWriter. Use Response.BinaryWrite to output the
contents of the MemoryStream.
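
A rough sketch of that approach, runnable as a console program (the module name, the BuildFeedBytes helper and the placeholder feed content are all hypothetical, not from the thread):

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module MemoryStreamFeedSketch
    ' Writes a tiny placeholder feed into a MemoryStream and returns the bytes.
    Function BuildFeedBytes() As Byte()
        Dim ms As New MemoryStream()

        ' UTF8Encoding(False) leaves out the byte order mark.
        Dim writer As New XmlTextWriter(ms, New UTF8Encoding(False))

        writer.WriteStartDocument()
        writer.WriteStartElement("rss")
        writer.WriteAttributeString("version", "2.0")
        writer.WriteStartElement("channel")
        writer.WriteElementString("title", "Example feed")
        writer.WriteEndElement() ' channel
        writer.WriteEndElement() ' rss
        writer.WriteEndDocument()
        writer.Flush()

        ' Copy the bytes out before closing the writer and its stream.
        Dim feedBytes As Byte() = ms.ToArray()
        writer.Close()
        Return feedBytes
    End Function

    Sub Main()
        ' Decode only to inspect the result on the console.
        Console.WriteLine(Encoding.UTF8.GetString(BuildFeedBytes()))
    End Sub
End Module
```

In the page itself, the idea would be to set Response.ContentType to "text/xml" and Response.ContentEncoding to Encoding.UTF8, then call Response.BinaryWrite(BuildFeedBytes()), so the declared encoding and the bytes actually sent agree.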
 
