escaped high ascii chars in URIs and getParameter

  • Thread starter Java script Dude

Java script Dude

Hello,

When a URI parameter posted to a servlet/JSP contains high (128-255)
characters such as European accented letters, it should be escaped
with encodeURIComponent in JavaScript before being submitted to the
server. Each such character is then replaced with a %xx%xx... escape
sequence representing its UTF-8 bytes.

When my servlet parses the parameter, it is not processing the %xx%xx
utf-8 char as a single char but is processing each %xx as an individual
char.

Eg.
URI parameter value = "_201_É_.test"
JS 1.5 method encodeURIComponent gives = "_201_%C3%89_.test"
HttpServletRequest.getParameter gives = "_201_Ã‰_.test"
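
The mismatch above can be reproduced outside a servlet engine. This is only a sketch, assuming the engine decodes each %xx byte as one ISO-8859-1 character (a common default, but not confirmed for the engine in this thread). The class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;

public class MojibakeDemo {
    /** Decodes bytes one char per byte, as an engine assuming ISO-8859-1 would (assumption). */
    static String decodeAsLatin1(byte[] raw) {
        try {
            return new String(raw, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support ISO-8859-1.
            throw new IllegalStateException("ISO-8859-1 not supported");
        }
    }

    public static void main(String[] args) {
        // %C3%89 is the UTF-8 byte pair for U+00C9 (E with acute accent).
        byte[] raw = { (byte) 0xC3, (byte) 0x89 };
        String wrong = decodeAsLatin1(raw);
        System.out.println(wrong.length());                       // 2
        System.out.println(Integer.toHexString(wrong.charAt(0))); // c3
        System.out.println(Integer.toHexString(wrong.charAt(1))); // 89
    }
}
```

One accented letter arrives as two chars, which matches the symptom described above.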

[question]
Assuming that encodeURIComponent is encoding correctly, is there any
way to tell the servlet engine to interpret the URI as a UTF-8 encoded
sequence? How can I decode this parameter properly?

Notes:
. The deprecated JS method escape() encodes each character as the hex
representation of its extended ASCII value, which the servlet engine
parses properly, but since this method is deprecated I am trying to
avoid it. Since my application is going global, I need to look beyond
ASCII.
. I am trying to avoid using HTTP POST because of a bug in IE 5.5 SP1,
which negligent sys admins in my company are still deploying with
re-images :[

Thanks,

JsD
 
Java script Dude

Solved...

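Since I am locked down to Java 1.3.1 by the n-tier app vendor, I cannot
use URLDecoder.decode(String, String), which was added in 1.4.

But the String constructor has a nice way of converting a byte[].

// request is the HttpServletRequest passed to the servlet
String s = request.getParameter("s");
if (s != null) {
    // Assuming the engine decoded each %xx as one ISO-8859-1 char,
    // recover the raw bytes and re-read them as UTF-8. The no-arg
    // getBytes() would use the platform default charset instead.
    s = new String(s.getBytes("ISO-8859-1"), "UTF-8");
}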

That's it :]
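
For anyone who wants to try the round trip without a servlet container, here is a small standalone sketch. It assumes the engine handed back one ISO-8859-1 char per %xx byte, and it names the charset explicitly because the no-argument getBytes() depends on the platform default. ParamRecode and latin1ToUtf8 are hypothetical names, not part of any API.

```java
import java.io.UnsupportedEncodingException;

public class ParamRecode {
    /** Re-reads a string that was wrongly decoded as ISO-8859-1 as UTF-8 (assumed scenario). */
    static String latin1ToUtf8(String s) {
        if (s == null) {
            return null;
        }
        try {
            // ISO-8859-1 maps chars 0-255 straight back to the original bytes.
            return new String(s.getBytes("ISO-8859-1"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Both charsets are mandatory on every Java platform.
            throw new IllegalStateException("required charset missing");
        }
    }

    public static void main(String[] args) {
        // Simulated getParameter result: U+00C3 U+0089 in place of U+00C9.
        String fromEngine = "_201_\u00C3\u0089_.test";
        String fixed = latin1ToUtf8(fromEngine);
        System.out.println(fixed.equals("_201_\u00C9_.test")); // true
    }
}
```

Note that this only works if the engine really used ISO-8859-1. Applying it to a value that was already decoded as UTF-8 would damage it.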

Now only if I could get a similar method for TCL 8.0 .....?
 
Roedy Green

> Since I am locked down to Java 1.3.1 by n-tier app vendor, I cannot use
> URLDecoder.

Look at the code for URLDecoder. It is pretty simple. You could write
your own class.

Another possibility is to use Base64 or even better, Base64u. See
http://mindprod.com/jgloss/armouring.html

Base64u does not balloon up the way URLEncoder output does when you
have a lot of unusual characters. However, even when the chars are
nearly all normal, it still expands by about a third.

Source code for both is available.
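
To put rough numbers on the size trade-off, here is a sketch comparing the two encodings. It uses java.util.Base64 (Java 8 and later) as a stand-in for the Base64u class mentioned above, so it is not the mindprod code and would not run on 1.3.1. The class and method names are illustrative only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Base64;

public class ArmourSizeDemo {
    private static byte[] utf8(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not supported");
        }
    }

    /** Length of the percent-encoded form (UTF-8). */
    static int percentLength(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").length();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not supported");
        }
    }

    /** Length of the URL-safe Base64 form of the UTF-8 bytes, without padding. */
    static int base64UrlLength(String s) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(utf8(s)).length();
    }

    public static void main(String[] args) {
        String accented = "\u00C9\u00C9\u00C9\u00C9\u00C9\u00C9"; // six accented letters
        String plain = "abcdef";
        System.out.println(percentLength(accented));   // 36
        System.out.println(base64UrlLength(accented)); // 16
        System.out.println(percentLength(plain));      // 6
        System.out.println(base64UrlLength(plain));    // 8
    }
}
```

Percent-encoding triples every non-ASCII byte, while Base64 grows everything by a third regardless of content.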

Here is what the key method in URLDecoder looks like:

public static String decode(String s, String enc)
        throws UnsupportedEncodingException {

    boolean needToChange = false;
    int numChars = s.length();
    StringBuffer sb = new StringBuffer(numChars > 500 ? numChars / 2 : numChars);
    int i = 0;

    if (enc.length() == 0) {
        throw new UnsupportedEncodingException(
                "URLDecoder: empty string enc parameter");
    }

    char c;
    byte[] bytes = null;
    while (i < numChars) {
        c = s.charAt(i);
        switch (c) {
        case '+':
            sb.append(' ');
            i++;
            needToChange = true;
            break;
        case '%':
            /*
             * Starting with this instance of %, process all
             * consecutive substrings of the form %xy. Each
             * substring %xy will yield a byte. Convert all
             * consecutive bytes obtained this way to whatever
             * character(s) they represent in the provided
             * encoding.
             */

            try {

                // (numChars-i)/3 is an upper bound for the number
                // of remaining bytes
                if (bytes == null)
                    bytes = new byte[(numChars-i)/3];
                int pos = 0;

                while ( ((i+2) < numChars) &&
                        (c=='%')) {
                    bytes[pos++] =
                        (byte)Integer.parseInt(s.substring(i+1,i+3),16);
                    i+= 3;
                    if (i < numChars)
                        c = s.charAt(i);
                }

                // A trailing, incomplete byte encoding such as
                // "%x" will cause an exception to be thrown

                if ((i < numChars) && (c=='%'))
                    throw new IllegalArgumentException(
                        "URLDecoder: Incomplete trailing escape (%) pattern");

                sb.append(new String(bytes, 0, pos, enc));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                    "URLDecoder: Illegal hex characters in escape (%) pattern - "
                    + e.getMessage());
            }
            needToChange = true;
            break;
        default:
            sb.append(c);
            i++;
            break;
        }
    }

    return (needToChange? sb.toString() : s);
}
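
As a quick check of what that method does with the value from the original question, here is a small sketch that calls the standard java.net.URLDecoder (two-argument form, Java 1.4 and later). The wrapper class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeDemo {
    /** Decodes a percent-encoded string, treating consecutive %xx runs as UTF-8 bytes. */
    static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not supported");
        }
    }

    public static void main(String[] args) {
        // The two escapes %C3%89 are collected into one byte run and decoded together.
        String decoded = decodeUtf8("_201_%C3%89_.test");
        System.out.println(decoded.equals("_201_\u00C9_.test")); // true

        // '+' is turned into a space.
        System.out.println(decodeUtf8("a+b")); // a b

        // A truncated escape is rejected rather than passed through.
        try {
            decodeUtf8("%C3%8");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected incomplete escape");
        }
    }
}
```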

And here is the corresponding encode method, which uses a statically
constructed bit map (dontNeedEncoding) to determine which characters
are "awkward" and need encoding.

public static String encode(String s, String enc)
        throws UnsupportedEncodingException {

    boolean needToChange = false;
    boolean wroteUnencodedChar = false;
    int maxBytesPerChar = 10; // rather arbitrary limit, but safe for now
    StringBuffer out = new StringBuffer(s.length());
    ByteArrayOutputStream buf = new ByteArrayOutputStream(maxBytesPerChar);

    OutputStreamWriter writer = new OutputStreamWriter(buf, enc);

    for (int i = 0; i < s.length(); i++) {
        int c = (int) s.charAt(i);
        //System.out.println("Examining character: " + c);
        if (dontNeedEncoding.get(c)) {
            if (c == ' ') {
                c = '+';
                needToChange = true;
            }
            //System.out.println("Storing: " + c);
            out.append((char)c);
            wroteUnencodedChar = true;
        } else {
            // convert to external encoding before hex conversion
            try {
                if (wroteUnencodedChar) { // Fix for 4407610
                    writer = new OutputStreamWriter(buf, enc);
                    wroteUnencodedChar = false;
                }
                writer.write(c);
                /*
                 * If this character represents the start of a Unicode
                 * surrogate pair, then pass in two characters. It's not
                 * clear what should be done if a bytes reserved in the
                 * surrogate pairs range occurs outside of a legal
                 * surrogate pair. For now, just treat it as if it were
                 * any other character.
                 */
                if (c >= 0xD800 && c <= 0xDBFF) {
                    /*
                    System.out.println(Integer.toHexString(c)
                        + " is high surrogate");
                    */
                    if ( (i+1) < s.length()) {
                        int d = (int) s.charAt(i+1);
                        /*
                        System.out.println("\tExamining "
                            + Integer.toHexString(d));
                        */
                        if (d >= 0xDC00 && d <= 0xDFFF) {
                            /*
                            System.out.println("\t"
                                + Integer.toHexString(d)
                                + " is low surrogate");
                            */
                            writer.write(d);
                            i++;
                        }
                    }
                }
                writer.flush();
            } catch(IOException e) {
                buf.reset();
                continue;
            }
            byte[] ba = buf.toByteArray();
            for (int j = 0; j < ba.length; j++) {
                out.append('%');
                char ch = Character.forDigit((ba[j] >> 4) & 0xF, 16);
                // converting to use uppercase letter as part of
                // the hex value if ch is a letter.
                if (Character.isLetter(ch)) {
                    ch -= caseDiff;
                }
                out.append(ch);
                ch = Character.forDigit(ba[j] & 0xF, 16);
                if (Character.isLetter(ch)) {
                    ch -= caseDiff;
                }
                out.append(ch);
            }
            buf.reset();
            needToChange = true;
        }
    }

    return (needToChange? out.toString() : s);
}
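
To see the encode side in action, here is a short sketch that calls the standard java.net.URLEncoder (two-argument form, Java 1.4 and later) on the value from the original question. It also covers a space and a surrogate pair, the two special cases handled in the listing. The wrapper class and method names are illustrative only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeDemo {
    /** Percent-encodes a string using its UTF-8 bytes. */
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not supported");
        }
    }

    public static void main(String[] args) {
        // '_', '.', digits and letters pass through; U+00C9 becomes two escaped bytes.
        System.out.println(encodeUtf8("_201_\u00C9_.test")); // _201_%C3%89_.test

        // A space is written as '+', not %20.
        System.out.println(encodeUtf8("a b")); // a+b

        // A surrogate pair (U+1F600) is encoded as one four-byte UTF-8 sequence.
        System.out.println(encodeUtf8("\uD83D\uDE00")); // %F0%9F%98%80
    }
}
```

The first output matches what encodeURIComponent produced in the original question. The space handling is one place where the two differ, since encodeURIComponent emits %20.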
 
