Matching parentheses with Regular Expressions

James

I'm trying to use regex to match/replace a word in parentheses.
The regular expression

private static final Pattern java_proc =
Pattern.compile("(java)");

does not work, because parentheses are treated as groupings.

Using "\" to designate the parentheses as literal characters does
not work --- not sure why:

private static final Pattern java_proc = Pattern.compile("\(java\)");

I searched for and read a related post here, but it did not
help. I seem to be having a different problem from theirs, or I just
don't understand the post.

What am I doing wrong? Thanks, Alan
 
James

OK, I finally found the words about using double slashes in front of
parentheses. So, now, why won't the following regular expression
pattern compile?

private static final Pattern java_proc = Pattern.compile("\\\\.+\\Process\\(java\\)\\");

The error says:

java.lang.ExceptionInInitializerError
Caused by: java.util.regex.PatternSyntaxException: Unknown character property name {r} near index 6
\\.+\Process\(java\)\
      ^

This does not make sense to me.

I'm trying to match text of the form (example):

\\GOLLY\Process(java)\% Processor Time

Thanks, Alan
 
Stefan Ram

James said:
private static final Pattern java_proc = Pattern.compile("\(java\)");

private static final Pattern java_proc = Pattern.compile("\\(java\\)");
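
A minimal sketch of why the doubled backslash works, assuming the goal is simply to replace the literal text "(java)". The class and helper names here are illustrative, not from the original post. The Java compiler turns "\\(" into the two characters \( and the regex engine then reads that as a literal opening parenthesis.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralParens {
    // Java source "\\(java\\)" is the regex \(java\), i.e. literal parentheses.
    static final Pattern JAVA_PROC = Pattern.compile("\\(java\\)");

    // Hypothetical helper: replace every literal "(java)" with the given text.
    static String replaceJavaProc(String input, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement from being interpreted.
        return JAVA_PROC.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceJavaProc("Process(java)", "(jvm)"));
    }
}
```

With the unescaped pattern "(java)", the parentheses would form a capturing group and only the four letters would be matched.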
 
Joshua Cranmer

James said:
OK, I finally found the words about using double slashes in front of
parentheses. So, now, why won't the following regular expression
pattern compile?

private static final Pattern java_proc = Pattern.compile("\\\\.+\\Process\\(java\\)\\");

The error says:

java.lang.ExceptionInInitializerError
Caused by: java.util.regex.PatternSyntaxException: Unknown character property name {r} near index 6
\\.+\Process\(java\)\
      ^

This is what the regex engine is seeing. Don't forget that '\' is also a
metacharacter in regexes. So matching a '\' requires you to write '\\\\'
in the Java string literal, which the regex engine sees as '\\', which in
turn matches a single '\'. So the regex you're probably trying to compile is:
"\\\\{2}.+\\\\Process\\(java\\)\\\\" (the {2} is so that you don't have
to type in 8 slashes)
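
The two layers of escaping can be checked with a short sketch like the one below. The class name and the found helper are made up for illustration. The sample text is the counter path from earlier in the thread. As a side note, \P in java.util.regex introduces a negated character-property class, which is why the original error complained about a property name when it reached \Process.

```java
import java.util.regex.Pattern;

final class BackslashLayers {
    // Java source "\\\\" -> regex text \\ -> matches ONE literal backslash.
    // So "\\\\{2}" matches the two leading backslashes of the counter path.
    static final String REGEX = "\\\\{2}.+\\\\Process\\(java\\)\\\\";
    static final Pattern JAVA_PROC = Pattern.compile(REGEX);

    // Hypothetical helper: true if the pattern occurs anywhere in the text.
    static boolean found(String text) {
        return JAVA_PROC.matcher(text).find();
    }

    public static void main(String[] args) {
        // The sample path from the thread, written as a Java string literal.
        String sample = "\\\\GOLLY\\Process(java)\\% Processor Time";
        System.out.println("regex as the engine sees it: " + REGEX);
        System.out.println("sample text:                 " + sample);
        System.out.println("found: " + found(sample));
    }
}
```

Printing the pattern string before compiling it is a cheap way to see exactly what the regex engine receives.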
 
James

Thank you.

I have one last remaining problem. The full data I'm working with,
in CSV format, looks like this:

"(PDH-CSV 4.0) (Eastern Daylight Time)(240)","\\GOLLY\Memory\% Committed Bytes In Use","\\GOLLY\Process(java)\% Processor Time"

I want to match on

\\GOLLY\Process(java)\

so I can replace it.

The regular expression

\\\\{2}.+\\\\Process\\(java\\).

matches, but it matches too much of it:


\\GOLLY\Memory\% Committed Bytes In Use","\\GOLLY\Process(java)\

How can I get it to only match the part I want?

Thanks again, Alan
 
Joshua Cranmer

James said:
The regular expression

\\\\{2}.+\\\\Process\\(java\\).

matches, but it matches too much of it:

In that case, you probably want this regex:
\\\\{2}[^\\\\]+\\\\Process\\(java\\)
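
To see the difference between the greedy .+ and the negated character class, here is a small sketch run against the CSV line from the question. The class, constant, and helper names are assumptions made for the example. The negated class [^\\]+ cannot cross a backslash, so the match has to start at the second \\GOLLY.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyVsNegated {
    // The CSV header line from the thread, as a Java string literal.
    static final String CSV_LINE =
        "\"(PDH-CSV 4.0) (Eastern Daylight Time)(240)\","
        + "\"\\\\GOLLY\\Memory\\% Committed Bytes In Use\","
        + "\"\\\\GOLLY\\Process(java)\\% Processor Time\"";

    // .+ may run across backslashes, so the match starts at the FIRST \\GOLLY.
    static final String GREEDY = "\\\\{2}.+\\\\Process\\(java\\).";
    // [^\\]+ stops at the next backslash, so only the machine name can sit in between.
    static final String NEGATED = "\\\\{2}[^\\\\]+\\\\Process\\(java\\)";

    // Hypothetical helper: text of the first match, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println("greedy:  " + firstMatch(GREEDY, CSV_LINE));
        System.out.println("negated: " + firstMatch(NEGATED, CSV_LINE));
    }
}
```

Note that the negated-class pattern as posted stops after the closing parenthesis. Appending \\\\ to the Java literal would also consume the trailing backslash.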
 
Arved Sandstrom

James said:
I'm trying to use regex to match/replace a word in parentheses.
The regular expression

private static final Pattern java_proc =
Pattern.compile("(java)");

does not work, because parentheses are treated as groupings.

Using "\" to designate the parentheses as literal characters does
not work --- not sure why:

private static final Pattern java_proc = Pattern.compile("\(java\)");

I searched for and read a related post here, but it did not
help. I seem to be having a different problem from theirs, or I just
don't understand the post.

What am I doing wrong? Thanks, Alan

Double the backslashes in your pattern: \\(java\\)

AHS
 
shakah

James said:
The regular expression

\\\\{2}.+\\\\Process\\(java\\).

matches, but it matches too much of it:

In that case, you probably want this regex:
\\\\{2}[^\\\\]+\\\\Process\\(java\\)

FWIW, you could avoid a little of the backslash escape mess
by using single-char character classes, e.g.:
Pattern.compile("[\\]{2}[^\\]+[\\]Process[(]java[)]") ;
// ...outside of a Java string that'd be [\]{2}[^\]+[\]Process[(]java[)]
 
Mark Space

shakah said:
James said:
The regular expression
\\\\{2}.+\\\\Process\\(java\\).

matches, but it matches too much of it:

In that case, you probably want this regex:
\\\\{2}[^\\\\]+\\\\Process\\(java\\)

FWIW, you could avoid a little of the backslash escape mess
by using single-char character classes, e.g.:
Pattern.compile("[\\]{2}[^\\]+[\\]Process[(]java[)]") ;
// ...outside of a Java string that'd be [\]{2}[^\]+[\]Process[(]java[)]

You also might get rid of some of those backslashes by substituting
another character, then using replace() on the string before compiling it.

final static String PATTERN = "``{2}.+``Process`(java`)";

String myRegex = PATTERN.replace("`", "\\" );
System.out.println( myRegex );

Result:

\\{2}.+\\Process\(java\)


It just makes things more readable. Using `, %, or # in the string,
then replacing that character with backslashes before compiling it as a
regex, can save your eyes.
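
For completeness, a runnable sketch of the placeholder idea, compiled and tried against the sample path from the thread. It assumes the backtick never needs to appear in the pattern itself. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

final class PlaceholderRegex {
    // Each backtick stands in for one regex backslash.
    static final String TEMPLATE = "``{2}.+``Process`(java`)";

    // String.replace does literal (non-regex) replacement,
    // so the Java literal "\\" here is a single backslash character.
    static String toRegex(String template) {
        return template.replace("`", "\\");
    }

    // Hypothetical helper: true if the expanded pattern occurs in the text.
    static boolean found(String text) {
        return Pattern.compile(toRegex(TEMPLATE)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(toRegex(TEMPLATE));
        System.out.println(found("\\\\GOLLY\\Process(java)\\% Processor Time"));
    }
}
```

The cost of this approach is that the placeholder becomes unavailable as an ordinary character in the pattern, so it is worth picking one the data will never contain.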

Incidentally, I wonder if Sun could be convinced to add this themselves.
Maybe add a new operator/keyword altogether. Say # introduces new
keywords or operators, and is followed by the keyword or operator. This
would allow Sun to add new keywords or operators without breaking any
existing code. So #s might give us new string constants. Let's say '
then means something like a Unix shell string, where escaping is ignored.

String regex = #s'\\{2}.+\\Process\(java\)';

Would give that literal string, without the need to escape the
backslashes. Easier for regex at least. Other types of flags besides '
could be introduced too. `,$,@,%,= might do the same thing, just use a
different character as a string terminator, in case you want a ' to be
part of the string. """ might introduce a "here-is" operator. Etc.

Just thinking out loud....
 
Roedy Green

You also might get rid of some of those backslashes by substituting
another character, then using replace() on the string before compiling it.

Other ideas:

1. Use Quoter to insert \ quoting, both for regex and Java strings.
see http://mindprod.com/applet/quoter.html

2. implement one or more of my regex student projects
http://mindprod.com/project/regexutility.html
http://mindprod.com/project/regexcomposer.html
http://mindprod.com/project/regexdebugger.html
http://mindprod.com/project/regexproofreader.html

3. use \Q ... \E
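
The third idea can be sketched as follows. Inside \Q ... \E the regex engine treats every character literally, and the JDK's Pattern.quote wraps a string that way for you. The class and helper names are illustrative only, and combining a quoted literal with the negated-class prefix from earlier in the thread is an assumption about how one might use it here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedLiteral {
    // \Q(java)\E : the parentheses are literal, no per-character escaping needed.
    static final Pattern WITH_QE = Pattern.compile("\\Q(java)\\E");

    // Regex prefix for "\\" + machine name, followed by a quoted literal tail.
    // Pattern.quote("\\Process(java)") yields \Q\Process(java)\E.
    static final Pattern JAVA_PROC =
        Pattern.compile("\\\\{2}[^\\\\]+" + Pattern.quote("\\Process(java)"));

    // Hypothetical helpers for the demonstration.
    static boolean literalParensFound(String text) {
        return WITH_QE.matcher(text).find();
    }

    static String counterPrefix(String text) {
        Matcher m = JAVA_PROC.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(literalParensFound("Process(java)"));
        System.out.println(counterPrefix("\\\\GOLLY\\Process(java)\\% Processor Time"));
    }
}
```

The Java string literal still needs its own backslash doubling. Quoting only removes the regex layer of escaping.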
 
James

shakah,

The statement

Pattern JAVA_PROC = Pattern.compile("[\\]{2}[^\\]+[\\]Process[(]java[)]");

compiles but raises an exception there:

run:
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 30
[\]{2}[^\]+[\]Process[(]java[)]
                              ^

All: Thank you for your suggestions.
 
Joshua Cranmer

James said:
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 30
[\]{2}[^\]+[\]Process[(]java[)]

You still have to escape the backslashes here, since as written each
backslash escapes the closing bracket of its character class.
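
A sketch of the character-class version with the backslashes doubled once more, next to the pattern as posted. The class and helper names are made up for the example. Inside a class, the regex still needs \\ for one literal backslash, which is "\\\\" in Java source.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class CharClassEscapes {
    // As posted: the engine receives [\]{2}[^\]+[\]Process[(]java[)].
    // Each \] is an escaped ']', so the outer class is never closed.
    static final String AS_POSTED = "[\\]{2}[^\\]+[\\]Process[(]java[)]";

    // Escaped again: the engine receives [\\]{2}[^\\]+[\\]Process[(]java[)].
    static final String FIXED = "[\\\\]{2}[^\\\\]+[\\\\]Process[(]java[)]";

    // Hypothetical helper: does the regex compile at all?
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Hypothetical helper: true if the regex occurs anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println("as posted compiles: " + compiles(AS_POSTED));
        System.out.println("fixed compiles:     " + compiles(FIXED));
        System.out.println("fixed matches:      "
            + found(FIXED, "\\\\GOLLY\\Process(java)\\% Processor Time"));
    }
}
```

So the character classes help with the parentheses, where [(] and [)] need no backslash at all, but they do not reduce the number of backslashes needed to match a backslash.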
 