UNICODE input for CGI using C

puneet.p.shah

Dear All,
I'm trying to accept a multilingual (Unicode) string in a
form and parse it. What I am getting is %XX (which is a
single byte, not 2 bytes). So, is the data getting lost? If it is
not getting lost, what format is it in?

Thanks in advance,
Punit.
 
Richard Tobin

> I'm trying to accept a multilingual (Unicode) string in a
> form and parse it. What I am getting is %XX (which is a
> single byte, not 2 bytes). So, is the data getting lost? If it is
> not getting lost, what format is it in?

You should be getting two or more successive %XXs for each non-ASCII
character. HTML form data sent using GET is part of the URL. Non-ASCII
characters are represented in UTF-8, and each byte of the UTF-8
sequence is then encoded in hex as %XX.
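To make the %XX step concrete, here is a minimal sketch of turning a form value back into raw bytes. The names hex_value and url_decode are made up for illustration, and the handling of '+' as a space assumes the usual application/x-www-form-urlencoded convention. Malformed escapes are copied through unchanged, which may or may not be what you want.

```c
#include <stddef.h>

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper: decode a form-urlencoded value from src into dst.
 * dst must have room for strlen(src) + 1 bytes (the output never grows).
 * Returns the number of bytes written, not counting the final '\0'.
 */
size_t url_decode(const char *src, unsigned char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        int hi, lo;

        if (src[0] == '%'
            && (hi = hex_value((unsigned char)src[1])) >= 0
            && (lo = hex_value((unsigned char)src[2])) >= 0) {
            /* %XX: two hex digits give one byte */
            dst[n++] = (unsigned char)(hi * 16 + lo);
            src += 3;
        } else if (*src == '+') {
            /* assumed form convention: '+' stands for a space */
            dst[n++] = ' ';
            src++;
        } else {
            /* ordinary character, or a malformed escape copied as-is */
            dst[n++] = (unsigned char)*src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

For example, the input "%E0%A4%A8" decodes to the three bytes E0 A4 A8, which together are one UTF-8 character, so nothing has been lost. Note that a "%00" in the input produces an embedded zero byte, so use the returned length rather than strlen on the result.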

See

http://www.ietf.org/rfc/rfc3986.txt
http://www.ietf.org/rfc/rfc2279.txt
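If what you actually want is the Unicode code point rather than the UTF-8 bytes, then once the %XX escapes have been turned back into bytes, something like the following sketch would do it. The function name utf8_decode is hypothetical. It assumes the four-byte limit of the later UTF-8 specification (RFC 3629) rather than the longer sequences RFC 2279 allowed, and it rejects overlong forms and surrogates.

```c
#include <stddef.h>

/*
 * Hypothetical helper: decode one UTF-8 sequence starting at s, where
 * len bytes are available. On success stores the code point in *cp and
 * returns the number of bytes consumed (1 to 4). Returns 0 if the
 * sequence is truncated or malformed.
 */
size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    size_t need, i;        /* need = number of continuation bytes */
    unsigned long c, min;  /* min = smallest value valid for this length */

    if (len == 0)
        return 0;

    if (s[0] < 0x80) {
        *cp = s[0];        /* plain ASCII */
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        c = s[0] & 0x1F; need = 1; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        c = s[0] & 0x0F; need = 2; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        c = s[0] & 0x07; need = 3; min = 0x10000;
    } else {
        return 0;          /* stray continuation byte or invalid lead byte */
    }

    if (len < need + 1)
        return 0;          /* sequence is cut short */

    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;      /* not a 10xxxxxx continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* reject overlong encodings, surrogates, and out-of-range values */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return need + 1;
}
```

Called on the bytes E0 A4 A8 it consumes three bytes and yields U+0928. Calling it in a loop, advancing by the return value each time, walks a whole decoded string one character at a time.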

For POST data, I can't find up-to-date documentation. The very old
http://www.w3.org/TR/html4/interact/forms.html describes the
application/x-www-form-urlencoded MIME type, but it does not mention
non-ASCII characters. I think you'll find that it uses the same
method as GET, but it might instead use the encoding specified by
the HTTP charset declaration rather than UTF-8. You'll need to ask
about that somewhere other than comp.lang.c.

-- Richard
 
