Convert a given string into how Java would interpret it if used in code?

steve.chambers

Hi,

Is there any way to do this in Java? I'll try to explain what I mean a
bit better.

Given the following string in a text file:

This is a return:\r\nThis is a tab:\tand this is a backslash:\\

I want Java to interpret this as it would if this string were in code,
by changing the escape sequences into the single characters they
represent. I know I could do this with a number of String.replaceAll()
calls, but was wondering if there's a method that will parse the string
and produce the result in a nicer way?

Thanks for any help with this...

Cheers,
Steve
 
steve.chambers

In fact I've just realised it's a bit more complicated than I thought
if using replaceAll(). If there was a double backslash followed by an
escape character, e.g. \\t, whichever way round I do the replacing it's
going to run into problems. If I replace the \\ with \ first then \t
will still be replaced with the literal tab character afterwards. And
replacing the \t first wouldn't work either - what we want to end up
with is a literal backslash followed by a "t". I'm stuck! But clinging
onto the hope that there might be an API call somewhere that will take
the work away...
 
Robert Klemme

Please don't top post.

I don't see the point. If you have the sequence "\\t" in your string
then what you want afterwards is a single literal backslash and a t.
Otherwise you would have to have either one or three backslashes.

The way I typically code this is a single loop that replaces the
sequence "backslash anything" with the appropriate value.
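Robert's "backslash anything" loop can also be written as a single regular-expression pass, which stays close to the replaceAll() style Steve asked about. This is a minimal sketch, assuming Java 9 or later (for Matcher.replaceAll(Function<MatchResult, String>)); the class name RegexUnescape and the ESCAPES table are hypothetical and cover only the escapes mentioned in this thread. Because the matcher walks the original input left to right and never rescans replacement text, \\t comes out as a backslash followed by t.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexUnescape {
    // Hypothetical table: only the escapes discussed in this thread.
    private static final Map<Character, String> ESCAPES = Map.of(
            'f', "\f", 'n', "\n", 'r', "\r", 't', "\t",
            '\'', "'", '"', "\"", '\\', "\\");

    // A backslash followed by any one character. DOTALL lets "." also match
    // a line terminator, so a backslash before a line break is matched too
    // (and, being unknown, kept as-is).
    private static final Pattern ESCAPE = Pattern.compile("\\\\(.)", Pattern.DOTALL);

    public static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        // Each match is replaced exactly once; the output is never rescanned.
        return m.replaceAll(r -> {
            char c = r.group(1).charAt(0);
            // Unknown escape: keep the whole "\x" match unchanged.
            String lit = ESCAPES.getOrDefault(c, r.group());
            // quoteReplacement stops '$' and '\' being read as group references.
            return Matcher.quoteReplacement(lit);
        });
    }

    public static void main(String[] args) {
        System.out.println(unescape("tab:\\t backslash-t:\\\\t quote:\\\""));
    }
}
```

A lone trailing backslash never matches the pattern, so it is copied through untouched.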

HTH

Regards

robert
 
Simon

Take the string "\\t" and consider Steve's algorithm using replaceAll(). It can
be implemented in two ways:

Option 1:
1) First replace all "\\" by "\"
Result: "\t"
2) Then replace all "\t" by "[TAB]"
Result: "[TAB]"

Option 2:
1) First replace all "\t" by "[TAB]"
Result: "\[TAB]"
2) Then replace all "\\" by "\"
Result: "\[TAB]"

Neither is what you want ("\t", i.e. a backslash followed by a t). Both fail
because the output of one replacement can be matched again by the next one.
Iterating once over all characters of the original string and appending each
character, or its replacement, to a new string should do the job. Such an
implementation should also be quick and easy to check for correctness.
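The two orderings above can be reproduced in a few lines. This sketch uses String.replace, which matches literal text, rather than replaceAll, whose first argument is a regular expression and would need extra escaping; the class and method names are made up for illustration, and the ordering problem is the same either way.

```java
public class ChainedReplaceDemo {
    // Option 1: collapse "\\" to "\" first, then turn "\t" into a real tab.
    public static String option1(String s) {
        return s.replace("\\\\", "\\").replace("\\t", "\t");
    }

    // Option 2: turn "\t" into a real tab first, then collapse "\\" to "\".
    public static String option2(String s) {
        return s.replace("\\t", "\t").replace("\\\\", "\\");
    }

    public static void main(String[] args) {
        String input = "\\\\t"; // the three characters \ \ t
        String wanted = "\\t";  // the two characters \ t
        System.out.println(option1(input).equals(wanted)); // false: result is a lone TAB
        System.out.println(option2(input).equals(wanted)); // false: result is \ followed by TAB
    }
}
```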

Still, the question remains open whether this is implemented somewhere already.
Actually I think I did that a dozen times already...

Cheers,
Simon
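For readers on a newer JDK, the standard library now answers the "is this implemented somewhere already" question: since Java 15, String.translateEscapes() translates escape sequences as the compiler would in a string literal (including octal escapes), but not Unicode escapes such as \u2022, and it throws IllegalArgumentException for an escape it does not recognise instead of passing it through. On older JDKs, Apache Commons Text offers StringEscapeUtils.unescapeJava as a third-party option. A minimal sketch, assuming Java 15 or later; the class name is hypothetical:

```java
public class TranslateEscapesDemo {
    public static void main(String[] args) {
        // The text as it would appear in the file: backslash sequences,
        // not control characters.
        String fromFile = "This is a return:\\r\\nThis is a tab:\\tand this is a backslash:\\\\";
        String converted = fromFile.translateEscapes(); // Java 15+
        System.out.println(converted);

        try {
            "bad \\w escape".translateEscapes();
        } catch (IllegalArgumentException e) {
            // Unknown escapes are rejected rather than copied through.
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```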
 
steve.chambers

Well said, Simon, that was what I meant. One way I could botch this is
to do a replaceAll on the double backslashes first, replacing them with
a string that would never be expected and which doesn't include a
backslash character (e.g. "`¬backslash~@"), then replace the other
escape sequences, and finally replace all these placeholders back to
single backslashes.
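The placeholder workaround just described does give correct results, as long as the placeholder can never occur in the input. A minimal sketch, assuming U+E000 (a Unicode private-use character) as the placeholder in place of the string Steve suggests; the class name and the up-front containment check are illustrative additions, and String.replace is used because it matches literal text.

```java
public class PlaceholderUnescape {
    // Assumption: U+E000 (private-use code point) never occurs in real input.
    private static final String PLACEHOLDER = "\uE000";

    public static String unescape(String s) {
        if (s.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("input already contains the placeholder");
        }
        return s.replace("\\\\", PLACEHOLDER) // first hide every escaped backslash
                .replace("\\f", "\f")          // then translate the other escapes
                .replace("\\n", "\n")
                .replace("\\r", "\r")
                .replace("\\t", "\t")
                .replace("\\'", "'")
                .replace("\\\"", "\"")
                .replace(PLACEHOLDER, "\\");   // finally restore single backslashes
    }

    public static void main(String[] args) {
        System.out.println(unescape("tab:\\t backslash-t:\\\\t"));
    }
}
```

None of the intermediate replacements produces a backslash, so nothing introduced in one step can be picked up by a later one; the cost is the reserved placeholder, which is why the single-pass loop is the cleaner fix.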

However, as this is a bit of a botch, I think I'll go with the suggested
looping method instead. Here's my first attempt, which converts the
particular escape sequences that I need to be able to use in my text
file but doesn't bother with backspaces, Unicode escapes etc.:


/**
 * Replaces the following two-character escape sequences in a string
 * with the single characters they represent:
 *   \f  -> form feed
 *   \n  -> new line
 *   \r  -> carriage return
 *   \t  -> tab
 *   \'  -> single quote
 *   \"  -> double quote
 *   \\  -> backslash
 * Any other backslash is copied through unchanged.
 *
 * @param inputString The string in which the replacements will be made
 * @return The string with all escape sequences replaced by their
 *         equivalent literals
 */
public static String replaceEscapesWithLiterals(String inputString) {
    int inputStringLength = inputString.length();
    // StringBuilder avoids creating a new String on every append
    StringBuilder returnString = new StringBuilder(inputStringLength);

    int charNum = 0;
    while (charNum < inputStringLength) {
        char currentChar = inputString.charAt(charNum);
        char literal = '\0'; // '\0' = no recognised escape at this position
        if (currentChar == '\\' && charNum + 1 < inputStringLength) {
            char nextChar = inputString.charAt(charNum + 1);
            switch (nextChar) {
                case 'f':  literal = '\f'; break;
                case 'n':  literal = '\n'; break;
                case 'r':  literal = '\r'; break;
                case 't':  literal = '\t'; break;
                case '\'': literal = '\''; break;
                case '"':  literal = '"';  break;
                case '\\': literal = '\\'; break;
                default:   break; // unknown escape: leave literal as '\0'
            }
        }
        if (literal == '\0') {
            // ordinary character, or a backslash that does not start a known escape
            returnString.append(currentChar);
            charNum++;
        } else {
            // consume both the backslash and the escape letter
            returnString.append(literal);
            charNum += 2;
        }
    }

    return returnString.toString();
}
 
Roland de Ruiter

Here's one I did earlier ;-)

// begin of ConvertEscapedChars.java
public class ConvertEscapedChars {
    public static void main(String[] args) {
        final String orig = "This is a return:\\r\\nThis is "
                + "a tab:\\tand this is a backslash:\\\\.\r\n"
                + "Others are the backspace \\b\r\nthe formfeed \\f\r\n"
                + "single quote \\'\r\ndouble quote \\\"\r\n"
                + "one character octal \\7\r\ntwo character octal \\76\r\n"
                + "three character octal \\176\r\n"
                + "two character octal escape followed by octal digit \\767\r\n"
                + "\t(note that this is not a 3-char octal escape because "
                + "first digit is bigger than 3)\r\n"
                + "two character octal escape followed by non-octal digit "
                + "\\768\r\n"
                + "one character octal escape followed by non-octal digit "
                + "\\78\r\n"
                + "an invalid escape sequence \\w\r\n"
                + "an unterminated escape sequence \\";

        System.out.println("------- Original -------");
        System.out.println(orig);
        System.out.println("------------------------");
        System.out.println();

        final String conv = convertEscapedChars(orig);
        System.out.println("------- Converted ------");
        System.out.println(conv);
        System.out.println("------------------------");
    }

    /**
     * Replaces escape sequences in the given String <code>s</code> by
     * the actual characters they represent. The escape sequences that
     * are recognized are those that are defined in section 3.10.6 of
     * the JLS, 3rd ed. Invalid escape sequences are left untouched.
     *
     * @param s
     *            the String to convert
     * @return a String where escape sequences have been replaced by
     *         their actual character
     * @see <a href="http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#101089">JLS, 3rd ed., section 3.10.6</a>
     */
    public static String convertEscapedChars(String s) {
        int n = s == null ? 0 : s.length();
        if (n == 0) {
            return s; // null or empty string
        }
        StringBuilder result = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                result.append(c);
            } else {
                if (++i < n) {
                    c = s.charAt(i);
                    switch (c) {
                    case 'b':
                        result.append('\b');
                        break;
                    case 't':
                        result.append('\t');
                        break;
                    case 'n':
                        result.append('\n');
                        break;
                    case 'f':
                        result.append('\f');
                        break;
                    case 'r':
                        result.append('\r');
                        break;
                    case '\"':
                        result.append('\"');
                        break;
                    case '\'':
                        result.append('\'');
                        break;
                    case '\\':
                        result.append('\\');
                        break;
                    case '0':
                    case '1':
                    case '2':
                    case '3':
                    case '4':
                    case '5':
                    case '6':
                    case '7':
                        // OctalEscape: read up to three octal digits
                        StringBuilder octal = new StringBuilder(3).append(c);
                        if (i + 1 < n && (c = s.charAt(i + 1)) >= '0' && c <= '7') {
                            i++;
                            octal.append(c);
                            if (i + 1 < n && (c = s.charAt(i + 1)) >= '0' && c <= '7') {
                                i++;
                                octal.append(c);
                            }
                        }
                        // Three digits are only allowed when the first is 0-3
                        // (maximum \377); otherwise give the third digit back.
                        if (octal.length() == 3 && octal.charAt(0) > '3') {
                            i--;
                            octal.setLength(2);
                        }
                        result.append((char) Integer.parseInt(octal.toString(), 8));
                        break;
                    default:
                        System.err.println("Invalid escape sequence: \\" + c);
                        result.append('\\').append(c);
                        break;
                    }
                } else {
                    System.err.println("Unterminated escape sequence");
                    result.append('\\');
                }
            }
        }
        return result.toString();
    }
}
// end of ConvertEscapedChars.java
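The octal branch above is the subtle part: JLS 3.10.6 allows three octal digits only when the first is 0 to 3 (so the maximum is \377), and at most two otherwise. Roland's code reads three digits and then backs off one (the i-- and setLength(2)) if the first was above 3. The sketch below isolates that rule by checking the first digit up front instead; decodeOctal and its {value, digitsUsed} return convention are hypothetical, and it assumes the caller has already consumed the backslash and checked that the first character is an octal digit.

```java
public class OctalEscapeRule {
    /**
     * Decodes one octal escape whose digits start at index start of s.
     * Assumes s.charAt(start) is already known to be an octal digit.
     * Returns {decoded value, number of digits consumed}.
     */
    public static int[] decodeOctal(String s, int start) {
        // Up to three digits only if the first is 0-3, otherwise at most two.
        int maxDigits = s.charAt(start) <= '3' ? 3 : 2;
        int value = 0;
        int used = 0;
        while (used < maxDigits && start + used < s.length()) {
            char c = s.charAt(start + used);
            if (c < '0' || c > '7') {
                break; // a non-octal digit ends the escape early
            }
            value = value * 8 + (c - '0');
            used++;
        }
        return new int[] {value, used};
    }

    public static void main(String[] args) {
        // The digit strings from Roland's demo (backslash already removed).
        String[] samples = {"7", "76", "176", "767", "768", "78"};
        for (String digits : samples) {
            int[] r = decodeOctal(digits, 0);
            System.out.printf("\\%s -> char %d, then \"%s\"%n",
                    digits, r[0], digits.substring(r[1]));
        }
    }
}
```

For example, \767 decodes as \76 (the character '>') followed by an ordinary '7', matching the note in Roland's test string.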
 
steve.chambers

Here's one I did earlier ;-)
// begin of ConvertEscapedChars.java
public class ConvertEscapedChars {
public static void main(String[] args) {
final String orig = "This is a return:\\r\\nThis is "
+ "a tab:\\tand this is a backslash:\\\\.\r\n"
+ "Others are the backspace \\b\r\nthe formfeed \\f\r\n"
+ "single quote \\'\r\ndouble quote \\\"\r\n"
+ "one character octal \\7\r\ntwo character octal \\76\r\n"
+ "three character octal \\176\r\n"
+ "two character octal escape followed by octal digit \\767\r\n"
+ "\t(note that this is not a 3-char octal escape because "
+ "first digit is bigger than 3)\r\n"
+ "two character octal escape followed by non-octal digit "
+ "\\768\r\n"
+ "one character octal escape followed by non-octal digit "
+ "\\78\r\n"
+ "an invalid escape sequence \\w\r\n"
+ "an unterminated escape sequence \\";

System.out.println("------- Original -------");
System.out.println(orig);
System.out.println("------------------------");
System.out.println();

final String conv = convertEscapedChars(orig);
System.out.println("------- Converted ------");
System.out.println(conv);
System.out.println("------------------------");
}

/**
* Replaces escape sequences in the given String <code>s</code> by
* the actual characters they represent. The escape sequences that
* are recognized are those that are defined in section 3.10.6 of
* the JLS, 3rd ed. Invalid escape sequences are left untouched.
*
* @param s
* the String to convert
* @return a String where escape sequences have been replaced by
* their actual character
* @see
http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#101089
*/
public static String convertEscapedChars(String s) {
int n = s == null ? 0 : s.length();
if (n == 0) {
return s; // null or empty string
}
StringBuffer result = new StringBuffer(n);
for (int i = 0; i < n; i++) {
char c = s.charAt(i);
if (c != '\\') {
result.append(c);
} else {
if (++i < n) {
c = s.charAt(i);
switch (c) {
case 'b':
result.append('\b');
break;
case 't':
result.append('\t');
break;
case 'n':
result.append('\n');
break;
case 'f':
result.append('\f');
break;
case 'r':
result.append('\r');
break;
case '\"':
result.append('\"');
break;
case '\'':
result.append('\'');
break;
case '\\':
result.append('\\');
break;
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
// OctalEscape
StringBuffer octal =
new StringBuffer(3).append(c);
if (i + 1 < n && (c = s.charAt(i + 1)) >= '0'
&& c <= '7') {
i++;
octal.append(c);
if (i + 1 < n && (c = s.charAt(i + 1))>= '0'
&& c <= '7') {
i++;
octal.append(c);
}
}
if (octal.length()==3 && octal.charAt(0)>'3') {
i--;
octal.setLength(2);
}
result.append(
(char) Integer.parseInt(octal.toString(),
8));
break;
default:
System.err.println(
"Invalid escape sequence: \\" + c);
result.append('\\').append(c);
break;
}
} else {
System.err.println("Unterminated escape sequence");
result.append('\\');
}
}
}
return result.toString();
}
}
// end of ConvertEscapedChars.java

Thanks Roland, looks like I've reinvented the wheel with a slightly
inferior version!
 
