replaceAll with new lines


Benjamin H. Brinckerhoff

All,

Maybe someone can clear some confusion I am having concerning
replaceAll. Here's my code:

String str = "a\nb";
String str2 = str.replaceAll("\n","\\\\n");
String str3 = str.replaceAll("\n","\\n");
System.out.println(str);
System.out.println(str2);
System.out.println(str3);

This prints out the following when run

a
b
a\nb
anb

The first two printlns print out exactly what I would expect. But the
third one, using str3, confuses me. Here's what I thought was going
on:

str, seen as an array of chars looks like (I'm using [] to show
different elements in array)

[a][\n][b]

replaceAll should find all characters that are \n , and replace them
with \\n, i.e. the "\" character and the "n" character, so I would
think str3 would be

[a][\][n][b]

Why does this end up printing out as "anb"? I mean, "\" isn't even an
ASCII character. Yeah, I'm confusing myself, if anyone out there has a
clear explanation, I'd love to hear it. Thanks.

Ben
 
VisionSet

I think \\n is being read as:

\n is a newline character then the leading \ escapes that to n.

Exactly the same is happening in str2 but the 1st \\ gives a \

Of course, I may be completely wrong.
 
John C. Bollinger

Benjamin said:
Maybe someone can clear some confusion I am having concerning
replaceAll. Here's my code:

String str = "a\nb";
String str2 = str.replaceAll("\n","\\\\n");
String str3 = str.replaceAll("\n","\\n");
System.out.println(str);
System.out.println(str2);
System.out.println(str3);

This prints out the following when run

a
b
a\nb
anb

The first two printlns print out exactly what I would expect. But the
third one, using str3, confuses me.

Well, it looks just right to me. I'll see whether I can explain.

Here's what I thought was going
on:

str, seen as an array of chars looks like (I'm using [] to show
different elements in array)

[a][\n][b]

Okay.

replaceAll should find all characters that are \n , and replace them
with \\n, i.e. the "\" character and the "n" character, so I would
think str3 would be

[a][\][n][b]

Why does this end up printing out as "anb"?


From the API docs for String.replaceAll(String, String):
----
An invocation of this method of the form str.replaceAll(regex, repl)
yields exactly the same result as the expression

Pattern.compile(regex).matcher(str).replaceAll(repl)
----

From the API docs for Matcher.replaceAll(String):
----
Note that backslashes (\) and dollar signs ($) in the replacement string
may cause the results to be different than if it were being treated as a
literal replacement string. Dollar signs may be treated as references to
captured subsequences as described above, and backslashes are used to
escape literal characters in the replacement string.
----
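
To make that note concrete, here is a minimal sketch of how '$' and '\' behave in a replacement string. The class and method names are made up for illustration. It also assumes Java 5 or later, where Matcher.quoteReplacement(String) is available to escape a replacement so that it is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the thread.
public class ReplacementMetachars {

    // "$1" and "$2" in the replacement refer to capture groups 1 and 2.
    static String swapPair(String s) {
        return s.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    // The Java literal "\\$" is the two characters \ and $.
    // The Matcher reads that pair as one literal dollar sign.
    static String priceTag(String s) {
        return s.replaceAll("USD", "\\$");
    }

    // quoteReplacement escapes every \ and $ in the given string,
    // so the Matcher inserts it exactly as written.
    static String literalReplace(String s, String regex, String literal) {
        return Pattern.compile(regex)
                      .matcher(s)
                      .replaceAll(Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(swapPair("key=value"));
        System.out.println(priceTag("USD5"));
        // The replacement here is backslash + n (two chars), inserted literally.
        System.out.println(literalReplace("a\nb", "\n", "\\n"));
    }
}
```

With quoteReplacement, the two-character replacement backslash-n comes through unchanged, so the last call gives the four characters a, \, n, b.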

What is happening is that the replacement string is going through _two_
levels of escape processing. The first level, performed by the Java
compiler, processes Java character escapes, such as "\n" to newline and
"\\" to a single backslash. The second level, performed by a Matcher,
is a much simpler escape processor that just interprets a backslash to
indicate that the following character is literal rather than a
metacharacter (such as the start of a reference to a captured group).
To get a literal '\' character into the resulting string it has to be
doubled in the input to the second level of escape processing, just as
you accomplish with str2 in your example.
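
The two levels can be traced with the strings from the original post. This is only a sketch; the class and helper names are invented for the illustration. The lengths show what is left after the compiler's pass, and the results show what is left after the Matcher's pass.

```java
// Hypothetical demo class tracing both levels of escape processing.
public class TwoLevelEscapes {

    // Applies the given replacement to the string from the original post.
    static String replaceNewline(String repl) {
        return "a\nb".replaceAll("\n", repl);
    }

    public static void main(String[] args) {
        // Level 1 (compiler): the literal "\\n" becomes 2 chars: \ n
        String repl3 = "\\n";
        // Level 1 (compiler): the literal "\\\\n" becomes 3 chars: \ \ n
        String repl2 = "\\\\n";

        System.out.println(repl3.length()); // 2
        System.out.println(repl2.length()); // 3

        // Level 2 (Matcher): \ n   -> literal n, so the result is anb
        System.out.println(replaceNewline(repl3));
        // Level 2 (Matcher): \ \ n -> literal \ then n, so the result is a\nb
        System.out.println(replaceNewline(repl2));
    }
}
```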
I mean, "\" isn't even an
ASCII character.

ASCII has absolutely nothing to do with it. Java source is interpreted
by the compiler as a stream of Unicode characters, and string constants
(including class names, field names, literal constants, etc.) are stored
in class files as UTF-8 encoded Unicode strings. Java chars contain
Unicode characters. Java Strings are sequences of Unicode characters.

In any event, '\' IS an ASCII character; it's character number 92
(decimal). And even though Java source is Unicode, the only places that
characters not representable by ASCII are allowed are identifiers,
character and string literals, and comments (and they always represent
themselves). You can write completely general Java source (including
representing any Unicode character sequence desired) with only ASCII
characters (including '\').
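
A small sketch of both points, with invented class and method names: the backslash has a character code in the ASCII range, and a Unicode escape lets ASCII-only source spell a non-ASCII character.

```java
// Hypothetical demo class; source file contains only ASCII characters.
public class BackslashIsAscii {

    // Widening a char to int gives its Unicode code point value,
    // which equals the ASCII code for characters below 128.
    static int codeOf(char c) {
        return c;
    }

    // The word "cafe" with an acute accent on the final e,
    // written with a Unicode escape instead of the accented letter.
    static String cafe() {
        return "caf\u00e9";
    }

    public static void main(String[] args) {
        System.out.println(codeOf('\\'));           // 92
        System.out.println(cafe().length());        // 4
        System.out.println(codeOf(cafe().charAt(3))); // 233
    }
}
```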
Yeah, I'm confusing myself, if anyone out there has a
clear explanation, I'd love to hear it. Thanks.

I hope that was clear.


John Bollinger
(e-mail address removed)
 
