XMLHTTP character issue - converting byte array to string

yllar2005

Hi!

I'm using Msxml2.ServerXMLHTTP.3.0 to fetch an HTML page from a remote
server. The fetched page is then parsed, and the information of interest
is extracted and sent to the client browser.

However, the remote server does not specify any character encoding in its
headers. If I use the ResponseText property of ServerXMLHTTP, some
international characters are not decoded correctly. This is because
ResponseText assumes UTF-8 encoding if no character set is specified.

My solution is to use the ResponseBody property which returns the web
page as an array of unsigned bytes. I then convert the data to a string
using the ADODB.Stream method as described here:
http://www.motobit.com/tips/detpg_binarytostring/

The string is then parsed and the required information is pulled out.

This solution works just fine, but I wonder if there is a more
efficient way to solve the problem (one that avoids the byte-to-string
conversion).

BR,
Yllar
 
Martin Honnen

This solution works just fine but I wonder if there is some more
efficient (without the need for a byte to string conversion) way to
solve the problem.

Well, if you use the third method described at that URL, the
ADODB.Stream object does all the work for you. The work itself (decoding
the bytes into a text string) can't be avoided.
 
Anthony Jones

Have I understood this correctly: the remote server is sending
international characters that are not UTF-8 encoded, and there is no
charset in the headers?

Do you know which charset the server is implicitly using?

If so then simply

Response.CharSet = "whatever it is"
Response.BinaryWrite oXMLHTTP.ResponseBody


If not then you'll be asking the server to convert an unknown charset to a
known one which isn't possible.

Anthony.
 
yllar2005

Yes, you got it right. If I request the server's content type by

Response.Write objxmlhttp.GetResponseHeader("Content-Type")

it will only return "text/html", hence no charset is sent.

The web page I try to download contains Swedish characters (åäö) and
I think that iso-8859-1 would work.

I don't think that the solution you suggest will work in my case since
I want to store the downloaded data in a variable and do some
manipulation prior to displaying it. If ResponseBody is used, the data
will be binary coded and text manipulation will not be possible.
 
Anthony Jones

I don't think that the solution you suggest will work in my case since
I want to store the downloaded data in a variable and do some
manipulation prior to displaying it. If ResponseBody is used, the data
will be binary coded and text manipulation will not be possible.

OK. Since you know the charset to be ISO-8859-1, use Martin's
suggestion.
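
For reference, the ADODB.Stream approach could look roughly like the sketch below (classic ASP, VBScript). It assumes the page really is ISO-8859-1, as guessed above. The helper name BytesToString and the example URL are made up for illustration, and error handling (HTTP status, empty body) is left out.

```vbscript
' Sketch only: BytesToString is a hypothetical helper name and the URL
' below is a placeholder, not the real remote server.
Const adTypeBinary = 1
Const adTypeText = 2

Function BytesToString(bytes, charset)
    Dim stream
    Set stream = Server.CreateObject("ADODB.Stream")
    ' Write the raw bytes into the stream in binary mode.
    stream.Type = adTypeBinary
    stream.Open
    stream.Write bytes
    ' Rewind, switch to text mode, and let the stream decode the bytes.
    stream.Position = 0
    stream.Type = adTypeText
    stream.Charset = charset
    BytesToString = stream.ReadText
    stream.Close
    Set stream = Nothing
End Function

Dim oXMLHTTP, html
Set oXMLHTTP = Server.CreateObject("Msxml2.ServerXMLHTTP.3.0")
oXMLHTTP.open "GET", "http://example.com/page.html", False
oXMLHTTP.send

' Assumption: the remote page is encoded as ISO-8859-1.
html = BytesToString(oXMLHTTP.responseBody, "iso-8859-1")
' html is now an ordinary VBScript string that can be parsed as usual.
```

The decoding still happens, but it is done by the stream object in one call rather than by a byte-by-byte loop in script.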

Anthony.
 
