Posting Unicode Form Values


Arnold Shore

A really weird thing (to me, anyway) I've encountered is in a UTF-8 test
script. Here, the input - a single two-byte Cyrillic character (as reported
by JavaScript in the originating form) - is posted to the receiving script,
where IIS or IE has expanded it to a 4-byte field, while the display of
that character is correct.

Can someone pls explain that? What encoding is the latter?

AS
 

Paul Gorodyansky

Arnold Shore said:
> A really weird thing (to me, anyway) I've encountered is in a UTF-8 test
> script. Here, the input - a single two-byte Cyrillic character (as reported
> by JavaScript in the originating form) - is posted to the receiving script,
> where IIS or IE has expanded it to a 4-byte field, while the display of
> that character is correct.
>
> Can someone pls explain that? What encoding is the latter?

Are you sure it's 4 bytes? It's usually 6. For example, the
Russian small 'd' (U+0434) in UTF-8 is the two-byte sequence 0xD0 0xB4.
What the browser sends from a form is the URL-encoding
( http://www.blooberry.com/indexdot/html/topics/urlencoding.htm )
of those bytes:
%D0%B4 - each byte as 3 ASCII symbols, 6 altogether.
You can see it yourself on my test pages:
a) URL-encoded data on single-byte Cyrillic windows-1251 page:
http://ourworld.compuserve.com/homepages/PaulGor/inp1251.htm
b) URL-encoded data (same Russian letters for example) on UTF-8 page:
http://ourworld.compuserve.com/homepages/PaulGor/utf8euro.htm
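
To make the byte counts above concrete, here is a minimal Python sketch (not ASP/VBScript, and not taken from the test pages) of what a browser would put on the wire for the Russian small 'd'. The variable names are only illustrative; the point is that the same letter becomes six ASCII characters on a UTF-8 page and three on a windows-1251 page.

```python
from urllib.parse import quote

letter = "\u0434"  # Cyrillic small letter DE, the Russian small 'd'

# Raw bytes of the letter in UTF-8: two bytes.
utf8_bytes = letter.encode("utf-8")

# URL-encoding turns every non-ASCII byte into '%' plus two hex digits.
utf8_form_value = quote(letter, encoding="utf-8")
cp1251_form_value = quote(letter, encoding="windows-1251")

print(utf8_bytes.hex().upper())                    # D0B4
print(utf8_form_value, len(utf8_form_value))       # %D0%B4 6
print(cp1251_form_value, len(cp1251_form_value))   # %E4 3
```
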


--
Regards,
Paul Gorodyansky
"Cyrillic (Russian): instructions for Windows and Internet":
http://ourworld.compuserve.com/homepages/PaulGor/
 

Arnold Shore

Paul, thanks heaps; that's a big help. To answer, LenB(string) reports the
value as length 4. BUT:

I've looked at both the Cyrillic small YU and the small YA, and within the 4
bytes that I see the hex values are identical - while they should differ by
1. So I expect you're right re the length.
1. So how do I get at the length? (in ASP/VBScript)
2. And how do I decode the bytes into a value I can use further?
3. I don't see a "Russian small 'd' ", and 0xB4 is 180 - which appears
where?

I've been to your pages, but I 404 when I submit a Russian character. Some
temporary problem, I hope?

Thanks again, Paul. It's really appreciated.

AS
 

Arnold Shore

Pls disregard my prior posting; it's plain wrong. I'll get a night's sleep
and post something that's coherent. Sorry, all.

AS
 

Paul Gorodyansky

Hello!

Arnold Shore said:
> Paul, thanks heaps; that's a big help. To answer, LenB(string) reports the
> value as length 4.

Strange... Maybe ASP does automatic URL-decoding. But then
why is it 4?
You should probably look at the well-known "ASP Internationalization"
article by M. Kaplan, which describes how to handle data in non-Western
encodings there:
http://msdn.microsoft.com/msdnmag/issues/0700/localize/

> BUT:
>
> I've looked at both the Cyrillic small YU and the small YA, and within the 4
> bytes that I see the hex values are identical - while they should differ by
> 1. So I expect you're right re the length.

No, now I don't think I was right - you don't have 6.
Also, if you see the same hex values then it may be corruption -
each Russian letter got replaced by some substitute symbol when something
went wrong. I should've given you the link to M. Kaplan's article
the first time...
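
As a cross-check on the numbers being discussed, the Python sketch below shows that the UTF-8 bytes of small YU and small YA do differ by exactly one in the last byte, so identical hex values do point to some kind of substitution. It also illustrates one possible reading of the length 4, which is an assumption and not something confirmed in this thread: VBScript strings are stored as UTF-16, so if the two UTF-8 bytes were each taken as a separate single-byte character, the result would be two characters occupying four bytes. Latin-1 stands in here for whatever single-byte code page the server might have applied, and `utf16_byte_length` is a hypothetical helper, not an ASP function.

```python
yu = "\u044e"  # Cyrillic small letter YU
ya = "\u044f"  # Cyrillic small letter YA

yu_utf8 = yu.encode("utf-8")
ya_utf8 = ya.encode("utf-8")
print(yu_utf8.hex(), ya_utf8.hex())  # d18e d18f


def utf16_byte_length(text: str) -> int:
    """Bytes needed to store text as UTF-16 without a byte order mark."""
    return len(text.encode("utf-16-le"))


# Correct decoding: one character, two bytes in UTF-16.
correct = yu_utf8.decode("utf-8")

# Assumed mis-decoding: each UTF-8 byte treated as its own character.
misread = yu_utf8.decode("latin-1")

print(len(correct), utf16_byte_length(correct))  # 1 2
print(len(misread), utf16_byte_length(misread))  # 2 4
```
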
> 1. So how do I get at the length? (in ASP/VBScript)

I don't know - I have never worked with ASP/VBScript. But I do know
how a browser performs form submission, and that's what I described
the first time.

> 2. And how do I decode the bytes into a value I can use further?
> 3. I don't see a "Russian small 'd' ", and 0xB4 is 180 - which appears
> where?
>
> I've been to your pages, but I 404 when I submit a Russian character. Some
> temporary problem, I hope?

No, it's 'by design' - I cannot have any server-side code with my
ISP, so, as I wrote there, I just let the data submitted from a form
be visible in the _address line_ as URL-encoded strings, such as
%D0%B4 on the UTF-8 page for the small Russian 'd'. Don't pay attention to
the 404 (which just means that there is no receiving software on my
server - which is true!); just look at the Address Bar - the results are
there instead of being sent to server-side code.
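
For anyone who wants to turn what appears in the Address Bar back into a letter, a receiving script would do roughly the following. This is a Python sketch under stated assumptions, not the ASP way of doing it: the URL and the field name `text` are made up for illustration, and the page is assumed to be the UTF-8 one.

```python
from urllib.parse import parse_qs, unquote_to_bytes, urlsplit

# Hypothetical address-bar content after submitting the small 'd'.
address_bar = "http://example.invalid/no-such-script?text=%D0%B4"

query = urlsplit(address_bar).query      # "text=%D0%B4"
raw_value = query.split("=", 1)[1]       # still URL-encoded: 6 ASCII chars

# Step 1: undo the URL-encoding to get the raw bytes back.
raw_bytes = unquote_to_bytes(raw_value)

# Step 2: interpret those bytes in the page's encoding (assumed UTF-8).
fields = parse_qs(query, encoding="utf-8")
decoded = fields["text"][0]

print(len(raw_value), raw_bytes.hex().upper())   # 6 D0B4
print(len(decoded), hex(ord(decoded)))           # 1 0x434
```

If the page had been the windows-1251 one, the same steps would apply with `encoding="windows-1251"` and a one-byte value such as `%E4`.
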
 
