UTF8>UNICODE

Discussion in 'ASP General' started by Meelis Lilbok, Apr 25, 2006.

  1. Hi

    My ASP pages use UTF-8 encoding.

    How do I convert UTF-8 text from Request.Form("text") to Unicode for
    searching an MSSQL database?



    Best regards,
    Meelis
     
    Meelis Lilbok, Apr 25, 2006
    #1

  2. "Meelis Lilbok" <> wrote in message
    news:%...
    > Hi
    >
    > My ASP pages uses UTF-8 encoding.
    >
    > How to convert UTF-8 text from Request.Form("text") to UNICODE for
    > searching frm MSSQL Database?


    x = Request.Form("text")

    x now contains a Unicode string.

    When passing it to an ADO Command object parameter, make sure the parameter
    type is adVarWChar.

    Anthony.
     
    Anthony Jones, Apr 25, 2006
    #2

  3. >
    > x = Request.Form("text").


    Nope, x is in UTF-8 format! That's the problem.

    I use an ActiveX DLL and API calls to convert UTF-8 to Unicode, but where
    the use of ActiveX is disabled this will not work.

    Meelis
     
    Meelis Lilbok, Apr 25, 2006
    #3
  4. "Meelis Lilbok" <> wrote in message
    news:%23G%...
    > >
    > > x = Request.Form("text").
    >
    > Nope, x is in UTF-8 format! Thats the problem
    >
    > I use activex dll and API calls to convert UTF-8 to UNICODE, but where use
    > of activex is disabled this will not work


    VBScript supports only one string format and that is Unicode.

    I suspect that the form submission is using UTF-8, but the server-side script
    doesn't know that and is treating it as ISO-8859-1 or similar. Hence you
    are getting a Unicode string in which each character holds one byte of the
    UTF-8 encoding.
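    A language-neutral sketch of the suspected failure, written in Python rather than VBScript purely for illustration: the browser sends the UTF-8 bytes of "väike", the server decodes them as ISO-8859-1 (Latin-1), and each byte becomes its own character. Re-encoding as Latin-1 and decoding as UTF-8 recovers the text. The variable names are hypothetical.

```python
# Simulate a UTF-8 form value being decoded by a server that assumes ISO-8859-1.
original = "väike"

sent_bytes = original.encode("utf-8")   # bytes the browser puts on the wire
misread = sent_bytes.decode("latin-1")  # server assumes one byte = one character

print(misread)       # vÃ¤ike
print(len(misread))  # 6: "ä" became two characters, one per UTF-8 byte

# Latin-1 maps bytes 0-255 one-to-one, so this particular damage is reversible.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

    If the server actually used the Windows-1252 ANSI code page rather than strict ISO-8859-1, bytes 0x80 to 0x9F (common in Cyrillic UTF-8 sequences) map differently, so this round trip is only a sketch of the idea, not a guaranteed repair.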

    What is the character encoding of the page that contains the text control?

    Does the page actually inform the client of the character encoding used for
    the page?

    What method is used to submit the form: GET or POST?

    What is the Enctype of the form?

    Is AcceptCharset specified for the Form?

    What Browser are you using?

    Anthony.
     
    Anthony Jones, Apr 25, 2006
    #4
  5. > What is the character encoding of page that contains the text control?
    UTF-8


    >
    > Does the page actually inform the client of the character encoding used
    > for the page?

    Yes


    > What method is used to submit the form GET or POST?

    POST

    > What is the Enctype of the form?


    None, because the page encoding is UTF-8.

    > Is AcceptCharset specified for the Form?

    No

    > What Browser are you using?

    IE6

    Meelis
     
    Meelis Lilbok, Apr 25, 2006
    #5
  6. For example

    If I enter the Estonian word "väike" into the text box, submit the form
    to another page, search.asp, and read Request.Form("text"),
    I get väike (UTF-8).



    Meelis
     
    Meelis Lilbok, Apr 25, 2006
    #6
  7. "Meelis Lilbok" <> wrote in message
    news:...
    > For example
    >
    > If i enter into text box estonian word "väike"
    > and submit form to antoher pages search.asp
    > and read Request.Form("text")
    > i get väike (UTF-8)
    >
    >


    Having looked into it a bit more, it would seem that the forms approach just
    isn't compatible with UTF-8 or Unicode. There doesn't seem to be a way to
    inform the server of the actual charset used to encode the form values.

    I'm actually quite amazed at this.

    What do you actually need to do?

    Do you need to support input characters beyond ISO-8859-1? If not, I would
    suggest you ditch UTF-8 and use ISO-8859-1 everywhere instead.

    Otherwise, it is possible to do the decoding in VBScript yourself, but it's
    really messy. A small VB6 component would make this a lot easier.
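    The "messy" hand decoding comes down to bit arithmetic on each byte. The sketch below shows that arithmetic in Python, not VBScript, assuming the form value arrives as a string in which every character holds a single UTF-8 byte (the situation described above). The function name decode_utf8_chars is hypothetical; a VBScript port would use AscW/ChrW and multiplication/integer division in place of ord/chr and shifts.

```python
def decode_utf8_chars(s):
    """Decode a string whose characters each hold one UTF-8 byte (0-255).

    Hypothetical helper mirroring what a hand-written VBScript routine would do.
    Handles 1- to 3-byte sequences (code points up to U+FFFF); it does not
    validate continuation bytes, and 4-byte sequences are rejected.
    """
    out = []
    i = 0
    while i < len(s):
        b = ord(s[i])
        if b < 0x80:                          # 0xxxxxxx: plain ASCII
            out.append(chr(b))
            i += 1
        elif (b & 0xE0) == 0xC0:              # 110xxxxx 10xxxxxx
            cp = ((b & 0x1F) << 6) | (ord(s[i + 1]) & 0x3F)
            out.append(chr(cp))
            i += 2
        elif (b & 0xF0) == 0xE0:              # 1110xxxx 10xxxxxx 10xxxxxx
            cp = (((b & 0x0F) << 12)
                  | ((ord(s[i + 1]) & 0x3F) << 6)
                  | (ord(s[i + 2]) & 0x3F))
            out.append(chr(cp))
            i += 3
        else:
            raise ValueError("unsupported or malformed UTF-8 at index %d" % i)
    return "".join(out)


# Usage: simulate the byte-per-character value, then decode it.
mojibake = "väike".encode("utf-8").decode("latin-1")
print(decode_utf8_chars(mojibake))  # väike
```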

    Ditching forms and posting XML instead may be another option. (This is what
    I do; I don't use forms.)

    Anthony.
     
    Anthony Jones, Apr 25, 2006
    #7
  8. Hi

    I can't use ISO-8859-1, because I need to support Cyrillic characters too.
    It's easier to use my ActiveX DLL with conversion functions :))


    Best regards,
    Meelis
     
    Meelis Lilbok, Apr 26, 2006
    #8
  9. "Meelis Lilbok" <> wrote in message
    news:%...
    > Hi
    >
    > My ASP pages uses UTF-8 encoding.
    >
    > How to convert UTF-8 text from Request.Form("text") to UNICODE for
    > searching frm MSSQL Database?


    Use this as the first line of your ASP page:
    <% codepage=65001%>

    --
    compatible web farm Session replacement for Asp and Asp.Net
    http://www.nieropwebconsult.nl/asp_session_manager.htm
     
    Egbert Nierop \(MVP for IIS\), Apr 26, 2006
    #9
  10. "Egbert Nierop (MVP for IIS)" <> wrote in
    message news:...
    >
    > "Meelis Lilbok" <> wrote in message
    > news:%...
    > > Hi
    > >
    > > My ASP pages uses UTF-8 encoding.
    > >
    > > How to convert UTF-8 text from Request.Form("text") to UNICODE for
    > > searching frm MSSQL Database?

    >
    > use at the first line of your ASP page
    > <% codepage=65001%>
    >


    Did you mean:

    <%@ codepage=65001 %>

    I don't think that helps. The value of Session.CodePage doesn't seem to
    affect the assumptions made by the server about the encoding of the request
    data.



     
    Anthony Jones, Apr 26, 2006
    #10
  11. "Anthony Jones" <> wrote in message
    news:...
    >
    > "Egbert Nierop (MVP for IIS)" <> wrote in
    > message news:...
    >>
    >> "Meelis Lilbok" <> wrote in message
    >> news:%...
    >> > Hi
    >> >
    >> > My ASP pages uses UTF-8 encoding.
    >> >
    >> > How to convert UTF-8 text from Request.Form("text") to UNICODE for
    >> > searching frm MSSQL Database?

    >>
    >> use at the first line of your ASP page
    >> <% codepage=65001%>
    >>

    >
    > did you mean:-
    >
    > <%@ codepage=65001 %>
    >
    > I don't think that helps. The value of session.codepage doesn't seem to
    > impact the assumptions made by server about the encoding of the request
    > data.


    However, you are wrong :)

    This directive really says that all input (Request.*) and output
    (Response.Write) is processed as UTF-8.

     
    Egbert Nierop \(MVP for IIS\), Apr 26, 2006
    #11
  12. Hi Egbert


    The problem is not displaying UTF-8; all pages are using UTF-8.
    The problem is when I want to query the MSSQL server: then I must convert
    UTF-8 to Unicode.

    And <% codepage=65001%> does not work on IIS4 :)

    And this is only possible when I use an ActiveX DLL with the
    MultiByteToWideChar and WideCharToMultiByte APIs.
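    For readers unfamiliar with the two Win32 functions: MultiByteToWideChar converts bytes in a given code page (CP_UTF8, value 65001, for UTF-8) into UTF-16 text, and WideCharToMultiByte goes the other way. The Python sketch below only mirrors what those calls compute, using the standard codecs; it does not call the Windows API and is not the DLL described here.

```python
text = "väike"

# Like WideCharToMultiByte with CP_UTF8 (65001): UTF-16 text -> UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# VBScript/COM strings (BSTRs) store UTF-16 code units, 2 bytes each here.
utf16_bytes = text.encode("utf-16-le")

# Like MultiByteToWideChar with CP_UTF8: UTF-8 bytes -> UTF-16 text.
round_trip = utf8_bytes.decode("utf-8")

print(len(utf8_bytes), len(utf16_bytes))  # 6 10
print(round_trip == text)                 # True
```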

    Meelis
     
    Meelis Lilbok, Apr 27, 2006
    #12
  13. "Meelis Lilbok" <> wrote in message
    news:...
    > Hi Egbert
    >
    >
    > Problem is not displayng UTF-8, all pages are using UTF-8
    > Problem is when i wanna make a query from MSSQL server, then i must
    > convert UTF-8 to UNICODE.
    >
    > And <% codepage=65001%> does not work on IIS4 :)


    Why didn't you say so?
    IIS4 indeed does not support that. Or rather, OLE Automation does not
    support it, so ADO and others do not support it either.
    I'd really work on asking your boss to upgrade! If you need to convert
    manually, it will be a hard job: you'll end up converting all SQL data,
    user-input data, etc.!
     
    Egbert Nierop \(MVP for IIS\), Apr 27, 2006
    #13
  14. Yeah, I know.

    Some of our clients still!! use IIS4, and for them I again use my ActiveX
    DLL to convert all strings to UTF-8. Works fine ;)


    Meelis
     
    Meelis Lilbok, Apr 27, 2006
    #14
  15. "Egbert Nierop (MVP for IIS)" <> wrote in
    message news:...
    >
    > "Anthony Jones" <> wrote in message
    > news:...
    > >
    > > did you mean:-
    > >
    > > <%@ codepage=65001 %>
    > >
    > > I don't think that helps. The value of session.codepage doesn't seem to
    > > impact the assumptions made by server about the encoding of the request
    > > data.

    >
    > however you are wrong :)
    >


    I am. I don't know how I managed it in my first round of tests. I did them
    again and it works as you say.

    The receiving page needs to be using a codepage that matches the character
    set that the client browser thinks the source page is using.

    In IIS 5.1/IIS 6, setting Response.CodePage has the same effect, which is a
    bit counterintuitive.
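    The conclusion that the receiving page's code page must match the charset the browser used can be illustrated outside ASP. The Python sketch below URL-encodes a form value the way a browser showing a UTF-8 page would, then parses the body twice: once as UTF-8 (analogous to CodePage=65001) and once as ISO-8859-1 (analogous to a Western single-byte code page). It uses only urllib.parse and is an analogy, not how IIS is implemented.

```python
from urllib.parse import parse_qs, urlencode

# A browser showing a UTF-8 page percent-encodes form fields as UTF-8 bytes.
body = urlencode({"text": "väike"})
print(body)  # text=v%C3%A4ike

# Receiving side decodes with the matching charset (like CodePage=65001).
good = parse_qs(body, encoding="utf-8")["text"][0]

# Receiving side assumes a single-byte Western charset instead.
bad = parse_qs(body, encoding="latin-1")["text"][0]

print(good)  # väike
print(bad)   # vÃ¤ike
```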


     
    Anthony Jones, Apr 27, 2006
    #15
  16. "Anthony Jones" <> wrote in message
    news:...
    >
    > I am. Don't how I managed it in my first round of tests. Did them again
    > and it works as you say.
    >
    > The receiving page needs to be using a codepage that matches the character
    > set that the client browser thinks the source page is using.


    Right, and that is set by using:

    Response.CharSet = "utf-8"

     
    Egbert Nierop \(MVP for IIS\), Apr 28, 2006
    #16