I had a revelation while putting together my SSCCE for the question
I was going to ask, and it has changed my question. How do I do, at
run time, the conversion of Unicode escape sequences that the
compiler does for string literals?
String s = "\u0066\u0065\u0064";
becomes "fed", but if you create a String containing
\u0066\u0065\u0064 without using a literal, it stays
\u0066\u0065\u0064. Is there a built-in mechanism in Java for doing
that translation to a String?
I don't think there is anything built in.
But it is trivial to code.
This was posted just a few months back:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Unescape {
    private static final Pattern p = Pattern.compile("\\\\u([0-9A-F]{4})");

    public static String U2U(String s) {
        //String res = s;
        //Matcher m = p.matcher(res);
        //while (m.find()) {
        //    res = res.replaceAll("\\" + m.group(0),
        //            Character.toString((char) Integer.parseInt(m.group(1), 16)));
        //}
        //return res;
        Matcher m = p.matcher(s);
        StringBuffer res = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(res,
                    Character.toString((char) Integer.parseInt(m.group(1), 16)));
        }
        m.appendTail(res);
        return res.toString();
    }

    public static void main(String[] args) {
        System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));
    }
}
Arne
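One caveat on the code above, offered as a sketch rather than a correction of the original post: Matcher.appendReplacement() treats '$' and backslash in the replacement string specially, so as far as I can tell an escape that decodes to either of those characters (0024 or 005C) makes the loop fail with an exception. The pattern also accepts only upper-case hex digits. The variant below wraps the decoded character in Matcher.quoteReplacement() and accepts both cases. The class and method names (SafeUnescape, unescape) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names; a hardened variant of the U2U method in the post above.
final class SafeUnescape {
    // Accept upper- and lower-case hex digits after the backslash and 'u'.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer res = new StringBuffer();
        while (m.find()) {
            String decoded =
                    String.valueOf((char) Integer.parseInt(m.group(1), 16));
            // quoteReplacement stops appendReplacement from reading a
            // decoded '$' or backslash as a group reference or an escape.
            m.appendReplacement(res, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(res);
        return res.toString();
    }

    public static void main(String[] args) {
        // Prints: cost: $5 \ fed
        System.out.println(
                unescape("cost: \\u00245 \\u005c \\u0066\\u0065\\u0064"));
    }
}
```

Text that contains no escape sequences passes through unchanged, because appendTail() copies whatever follows the last match.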
Well, brilliant minds think alike. Where were you when I asked the
first time? I don't remember a thread on this going by, but that's
getting harder to do all the time. I originally had String.valueOf()
instead of Character.toString(). I think the latter is better, but
I'm not sure it makes any difference. Could be a non-trivial Unicode
gotcha, eh Daniel?
Thanks everybody.
import java.util.regex.*;

public class test6 {
    public static void main(String[] args) {
        String clear = "byte me!";
        System.out.println(clear);

        String escpd = unicodeEscape(clear);
        System.out.println(escpd);

        Pattern p = Pattern.compile("\\\\u([0-9a-fA-F]{4})");
        Matcher m = p.matcher(escpd);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            String repl =
                Character.toString((char)Integer.parseInt(m.group(1),16));
            m.appendReplacement(buf,repl);
        }
        m.appendTail(buf);
        System.out.println(buf);
    }

    public static String unicodeEscape(char c) {
        return String.format("\\u%04x",(int)c);
    }

    public static String unicodeEscape(Character c) {
        if (c == null)
            return null;
        return unicodeEscape(c.charValue());
    }

    public static String unicodeEscape(String str) {
        StringBuilder buf = new StringBuilder();
        for (int i=0; i<str.length(); i++)
            buf.append(unicodeEscape(str.charAt(i)));
        return buf.toString();
    }
}
C:\Documents and Settings\Knute Johnson>java test6
byte me!
\u0062\u0079\u0074\u0065\u0020\u006d\u0065\u0021
byte me!
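
On the String.valueOf() versus Character.toString() question raised earlier in the thread: as far as I know the two return equal strings for a char argument, so the choice should not change the output. The possible Unicode gotcha is characters outside the Basic Multilingual Plane, which take two escapes (a surrogate pair). Decoding one char at a time still appears to work, because the two halves end up adjacent in the result. The small check below is a sketch; the class and method names are invented for illustration.

```java
// Hypothetical helper class, not part of the code posted in the thread.
final class CharToStringCheck {
    // Decode one four-digit hex group both ways and compare the results.
    static boolean sameResult(String hex) {
        char c = (char) Integer.parseInt(hex, 16);
        return String.valueOf(c).equals(Character.toString(c));
    }

    // Decode two hex groups one char at a time and concatenate them,
    // the same way the appendReplacement loop builds its output.
    static String decodePair(String hiHex, String loHex) {
        return Character.toString((char) Integer.parseInt(hiHex, 16))
             + Character.toString((char) Integer.parseInt(loHex, 16));
    }

    public static void main(String[] args) {
        System.out.println(sameResult("0066"));              // true

        // D83D and DE00 are the surrogate pair for code point U+1F600.
        String s = decodePair("d83d", "de00");
        System.out.println(s.length());                      // 2 chars
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f600
    }
}
```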