Regex to find a certain character

Wendy S

I'm trying to split a large text file whenever there is a certain character
(ASCII 12)

The suggested code is not compiling:
[javac] G:\irm\sharedsource\ColleagueWeb\coldev\src\java\edu\asu\vpia\webapp\PDFServlet.java:190: illegal escape character
[javac] String[] reportPage = report.split( "\x0c" );
[javac] 1 error

The javadoc for String.split() sends you to java.util.regex.Pattern which
says:
\xhh The character with hexadecimal value 0xhh

So it seems like \x0c should work, but evidently it does not. I changed it to:
String[] reportPage = report.split( "\u000c" );
which compiles, at least.

Can someone comment on what's wrong with the first syntax, and whether my
replacement is, in fact, equivalent?
 
Sudsy

Wendy said:
I'm trying to split a large text file whenever there is a certain character
(ASCII 12)

I always use octal in these situations. Since 12 decimal is 14 octal,
try split( "\014" )

P.S. Tested and proven using the newline character.
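
The decimal-to-octal step above can be checked with a few lines of Java. This is only a minimal sketch; the class and method names are made up for illustration and do not come from the thread.

```java
class OctalEscapeDemo {
    // Returns the numeric code of the single character produced by the octal escape \014.
    static int octalFormFeedCode() {
        return "\014".charAt(0);
    }

    public static void main(String[] args) {
        // 12 decimal written in octal notation
        System.out.println(Integer.toOctalString(12));
        // The octal escape and \f denote the same character
        System.out.println(octalFormFeedCode() == '\f');
    }
}
```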
 
Neil Campbell

Wendy said:
I'm trying to split a large text file whenever there is a certain
character (ASCII 12)

The suggested code is not compiling:
[javac]
G:\irm\sharedsource\ColleagueWeb\coldev\src\java\edu\asu\vpia\webapp
\PDFServlet.java:190: illegal escape character
[javac] String[] reportPage = report.split( "\x0c" );
[javac] 1 error

The javadoc for String.split() sends you to java.util.regex.Pattern which
says:
\xhh The character with hexadecimal value 0xhh

So it seems like \x0c should work, but evidently it does not. I changed it to:
String[] reportPage = report.split( "\u000c" );
which compiles, at least.

I'm not sure what the problem is, but I noticed that the docs also say:

\f The form-feed character ('\u000C')

So can't you just use \f instead?

In fact, do you really need a regexp if you're just looking for a single
character? Would it be possible to simply search char by char comparing
with 12?
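
For comparison, the no-regex alternative suggested above might look roughly like the following. It is a sketch under assumptions: the class and method names are hypothetical, and it uses indexOf rather than a hand-written character loop. Unlike String.split(), it keeps a trailing empty page when the text ends with a form feed.

```java
import java.util.ArrayList;
import java.util.List;

class ManualPageSplit {
    // Splits the report at every form-feed character (ASCII 12) without using a regex.
    static List<String> splitOnFormFeed(String report) {
        List<String> pages = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = report.indexOf('\f', start)) >= 0) {
            pages.add(report.substring(start, idx));
            start = idx + 1; // continue just past the form feed
        }
        pages.add(report.substring(start)); // text after the last form feed
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(splitOnFormFeed("page one\fpage two").size());
    }
}
```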
 
Dave Glasser

Wendy S said:
I'm trying to split a large text file whenever there is a certain character
(ASCII 12)

The suggested code is not compiling:
[javac]
G:\irm\sharedsource\ColleagueWeb\coldev\src\java\edu\asu\vpia\webapp
\PDFServlet.java:190: illegal escape character
[javac] String[] reportPage = report.split( "\x0c" );
[javac] 1 error

The javadoc for String.split() sends you to java.util.regex.Pattern which
says:
\xhh The character with hexadecimal value 0xhh

So it seems like \x0c should work, but evidently it does not. I changed it to:
String[] reportPage = report.split( "\u000c" );
which compiles, at least.

Can someone comment on what's wrong with the first syntax, and whether my
replacement is, in fact, equivalent?


You have to escape the backslash character itself:

String[] reportPage = report.split( "\\x0c" );

Keep in mind that whenever a backslash appears inside a Java string
literal that will be used as a regular expression, you almost always
have to escape it. You're lucky in this case that the compiler caught
the mistake. ("\x0c" is not a valid Java escape sequence, so it causes
a compiler error wherever it appears. That has nothing to do with
regular expressions per se.) A worse case is when the code compiles
but behaves incorrectly.
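
As a rough check that the spellings discussed in this thread behave the same way, the sketch below splits one sample string with each of them. The class name, the helper method, and the sample text are assumptions for illustration only.

```java
import java.util.Arrays;

class FormFeedSplitDemo {
    // Splits a report into pages using the given regex for the separator.
    static String[] pages(String report, String regex) {
        return report.split(regex);
    }

    public static void main(String[] args) {
        String report = "page one\fpage two\fpage three";

        // Regex-level escape: the doubled backslash makes javac emit the four
        // characters \ x 0 c, which Pattern reads as "character 0x0c".
        String[] viaHex = pages(report, "\\x0c");
        // Java-level escapes: javac turns each of these into the single
        // form-feed character before the regex engine sees the string.
        String[] viaUnicode = pages(report, "\u000c");
        String[] viaFormFeed = pages(report, "\f");
        String[] viaOctal = pages(report, "\014");

        System.out.println(viaHex.length);
        System.out.println(Arrays.equals(viaHex, viaUnicode)
                && Arrays.equals(viaHex, viaFormFeed)
                && Arrays.equals(viaHex, viaOctal));
    }
}
```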
 
hiwa

Wendy S said:
The javadoc for String.split() sends you to java.util.regex.Pattern which
says:
\xhh The character with hexadecimal value 0xhh

A single backslash in a string literal is interpreted by the Java compiler
as the start of an escape sequence, so it never reaches the regex engine.
You have to use:

String[] reportPage = report.split( "\\x0c" );

Then the regex \x0c is passed to the engine.

Your "\u000c" actually passes a single literal character to the Java
regex engine. It is no different from a simple "a", "b", "c", etc.
 
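To make the two layers of escaping visible, the sketch below compares what the regex engine actually receives in each case. The class and method names are hypothetical, and this is only an illustration of the point made above, not code from the original posts.

```java
import java.util.regex.Pattern;

class EscapeLayersDemo {
    // True if the given regex matches a string holding exactly one form feed.
    static boolean matchesFormFeed(String regex) {
        return Pattern.matches(regex, "\f");
    }

    public static void main(String[] args) {
        String regexEscape = "\\x0c"; // four characters reach the regex engine
        String javaEscape = "\u000c"; // one character, already the form feed

        System.out.println(regexEscape.length());
        System.out.println(javaEscape.length());
        // Both forms match the same single character.
        System.out.println(matchesFormFeed(regexEscape));
        System.out.println(matchesFormFeed(javaEscape));
    }
}
```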
Wendy S

Neil Campbell said:
I'm not sure what the problem is, but I noticed that the docs also say:
\f The form-feed character ('\u000C')
So can't you just use \f instead?

Most probably, thanks!

Neil Campbell said:
In fact, do you really need a regexp if you're just looking for a single
character? Would it be possible to simply search char by char comparing
with 12?

I suppose, but what would I gain when I can do it in one line with split()?
 
Neil Campbell

Wendy said:
I suppose, but what would I gain when I can do it in one line with
split()?

Not a lot, admittedly. I just suggested it as an alternative if you
couldn't get the other approaches working.
 
