UTF8>UNICODE

Meelis Lilbok

Hi

My ASP pages use UTF-8 encoding.

How do I convert UTF-8 text from Request.Form("text") to Unicode for searching
an MSSQL database?

Best regards,
Meelis
 
Anthony Jones

Meelis Lilbok said:
Hi

My ASP pages use UTF-8 encoding.

How do I convert UTF-8 text from Request.Form("text") to Unicode for searching
an MSSQL database?

Best regards,
Meelis

x = Request.Form("text")

x now contains a Unicode string.

When passing it to an ADO Command object parameter, make sure the parameter
type is adVarWChar.

Anthony.
 
Meelis Lilbok

x = Request.Form("text")

Nope, x is in UTF-8 format! That's the problem.

I use an ActiveX DLL and API calls to convert UTF-8 to Unicode, but where the
use of ActiveX is disabled this will not work.

Meelis
 
Anthony Jones

Meelis Lilbok said:
Nope, x is in UTF-8 format! That's the problem.

I use an ActiveX DLL and API calls to convert UTF-8 to Unicode, but where the
use of ActiveX is disabled this will not work.

Meelis

VBScript supports only one string format and that is Unicode.

I suspect that the form submission is using UTF-8 but the server side script
doesn't know that and is treating it as ISO-8859-1 or the like. Hence you
are getting a Unicode string that contains a series of UTF-8 encodings.
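The effect described above can be reproduced outside ASP. The following is a minimal sketch in Python (not ASP code), assuming the server decodes the submitted bytes as ISO-8859-1; the variable names are purely illustrative.

```python
# Illustration only (Python, not ASP): what happens when UTF-8 bytes
# are decoded with a single-byte code page such as ISO-8859-1.
word = "väike"

# The browser sends the text as UTF-8 bytes; "ä" becomes two bytes.
utf8_bytes = word.encode("utf-8")

# A server that assumes ISO-8859-1 turns every byte into one character,
# so the two bytes of "ä" show up as two separate characters.
misread = utf8_bytes.decode("iso-8859-1")

print(len(word), len(utf8_bytes), len(misread))
print(misread)
```

The result is still a valid Unicode string, but it holds one character per UTF-8 byte rather than the characters the user typed.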

What is the character encoding of the page that contains the text control?

Does the page actually inform the client of the character encoding used for
the page?

What method is used to submit the form, GET or POST?

What is the Enctype of the form?

Is AcceptCharset specified for the form?

What browser are you using?

Anthony.
 
Meelis Lilbok

What is the character encoding of the page that contains the text control?
UTF-8

Does the page actually inform the client of the character encoding used for
the page?
Yes

What method is used to submit the form, GET or POST?
POST

What is the Enctype of the form?
None, because the page encoding is UTF-8

Is AcceptCharset specified for the form?
No

What browser are you using?
IE6

Meelis
 
Meelis Lilbok

For example:

If I enter the Estonian word "väike" into the text box,
submit the form to another page, search.asp,
and read Request.Form("text"),
I get väike (UTF-8)



Meelis
 
Anthony Jones

Meelis Lilbok said:
For example:

If I enter the Estonian word "väike" into the text box,
submit the form to another page, search.asp,
and read Request.Form("text"),
I get väike (UTF-8)

Having looked into it a bit more it would seem that the forms approach just
isn't compatible with UTF-8 or unicode. There doesn't seem to be a way to
inform the server of the actual charset used to encode the form values.

I'm actually quite amazed at this.

What do you actually need to do?

Do you need to support input characters beyond ISO-8859-1? If not I would
suggest you ditch UTF-8 and use ISO-8859-1 everywhere instead.

Otherwise it is possible to do the decoding in VBScript yourself, but it's
really messy. A small VB6 component would make this a lot easier.

Ditching forms and posting XML instead may be another option. (This is what I
do; I don't use forms.)

Anthony.
 
Meelis Lilbok

Hi

I can't use ISO-8859-1, because I need to support Cyrillic characters too.
It's easier to use my ActiveX DLL with its convert functions :))


Best regards,
Meelis
 
Egbert Nierop (MVP for IIS)

Meelis Lilbok said:
Hi

My ASP pages use UTF-8 encoding.

How do I convert UTF-8 text from Request.Form("text") to Unicode for
searching an MSSQL database?

Use this as the first line of your ASP page:
<% codepage=65001%>
 
Anthony Jones

Egbert Nierop (MVP for IIS) said:
Use this as the first line of your ASP page:
<% codepage=65001%>

Did you mean:

<%@ codepage=65001 %>

I don't think that helps. The value of Session.CodePage doesn't seem to
impact the assumptions made by the server about the encoding of the request
data.


 
Egbert Nierop (MVP for IIS)

Anthony Jones said:
Did you mean:

<%@ codepage=65001 %>

I don't think that helps. The value of Session.CodePage doesn't seem to
impact the assumptions made by the server about the encoding of the request
data.

However, you are wrong :)

This really does say that all input (Request.*) and output (Response.Write)
is processed as UTF-8.
 
Meelis Lilbok

Hi Egbert


The problem is not displaying UTF-8; all pages are using UTF-8.
The problem is that when I want to query the MSSQL server, I must convert
UTF-8 to Unicode.

And <% codepage=65001%> does not work on IIS4 :)

And this is only possible when I use an ActiveX DLL with the
MultiByteToWideChar and WideCharToMultiByte APIs.

Meelis
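
The conversion described here amounts to taking the wrongly decoded characters back to their original bytes and decoding those bytes as UTF-8. Below is a minimal Python sketch of that idea (not the DLL's actual code). The function name repair_misdecoded_utf8 is hypothetical, and the sketch assumes the server decoded the bytes as ISO-8859-1; a Windows server whose default code page is 1252 would need that code page instead.

```python
# Illustration only (Python, not ASP or the Win32 API).
# Hypothetical helper: undo a wrong single-byte decode of UTF-8 data.
def repair_misdecoded_utf8(misread: str, assumed_codepage: str = "iso-8859-1") -> str:
    # Step 1: map each character back to the byte it came from.
    raw_bytes = misread.encode(assumed_codepage)
    # Step 2: decode those bytes with the encoding the browser really used.
    return raw_bytes.decode("utf-8")


# Simulate what the server saw for an Estonian and a Cyrillic word.
estonian_misread = "väike".encode("utf-8").decode("iso-8859-1")
cyrillic_misread = "привет".encode("utf-8").decode("iso-8859-1")

print(repair_misdecoded_utf8(estonian_misread))
print(repair_misdecoded_utf8(cyrillic_misread))
```

The round trip only works if the assumed code page is the one the server actually applied, which is why it is left as a parameter.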
 
Egbert Nierop (MVP for IIS)

Meelis Lilbok said:
Hi Egbert


The problem is not displaying UTF-8; all pages are using UTF-8.
The problem is that when I want to query the MSSQL server, I must
convert UTF-8 to Unicode.

And <% codepage=65001%> does not work on IIS4 :)

Why didn't you say so?
IIS4 indeed does not support that. Or, better said, OLE Automation does not
support it, so ADO and others do not support it either.
I'd really work on asking your boss for an upgrade! Because if you need to
convert it manually, it will be a hard job; you'll end up converting all SQL
data, user-input data, etc.!
 
Meelis Lilbok

Yeah, I know.

Some of our clients still (!!) use IIS4, and there I again use my ActiveX DLL
to convert all strings to UTF-8. Works fine ;)


Meelis
 
Anthony Jones

Egbert Nierop (MVP for IIS) said:
However, you are wrong :)

I am. Don't know how I managed it in my first round of tests. Did them again
and it works as you say.

The receiving page needs to be using a codepage that matches the character
set that the client browser thinks the source page is using.

In IIS 5.1/IIS 6, setting Response.CodePage has the same effect, which is a
bit counterintuitive.
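
Why the receiving side's code page has to match can be seen with any form parser. The following Python sketch (not ASP) uses the standard urllib.parse module to decode the same POST body twice; the field name text comes from the thread, everything else is illustrative.

```python
# Illustration only (Python, not ASP): the same form body decoded with
# a matching and a non-matching character set.
from urllib.parse import parse_qs, urlencode

# What a browser sends for a UTF-8 page: "ä" is percent-encoded as %C3%A4.
body = urlencode({"text": "väike"}, encoding="utf-8")

# Receiver assumes UTF-8 (comparable to running under code page 65001).
as_utf8 = parse_qs(body, encoding="utf-8")["text"][0]

# Receiver assumes ISO-8859-1 (a mismatched single-byte code page).
as_latin1 = parse_qs(body, encoding="iso-8859-1")["text"][0]

print(body)
print(as_utf8)
print(as_latin1)
```

The request bytes are identical in both cases; only the receiver's assumption about their encoding changes the string the script ends up with.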

 
Egbert Nierop (MVP for IIS)

Anthony Jones said:
I am. Don't know how I managed it in my first round of tests. Did them again
and it works as you say.

The receiving page needs to be using a codepage that matches the character
set that the client browser thinks the source page is using.

Right, and that is set by using

Response.CharSet = "utf-8"
 
