Matching parentheses with Regular Expressions

Discussion in 'Java' started by James, Jul 4, 2008.

  1. James

    James Guest

    I'm trying to use regex to match/replace a word in parentheses.
    The regular expression

    private static final Pattern java_proc =
    Pattern.compile("(java)");

    does not work, because parentheses are treated as groupings.

    Using "\" to designate the parentheses as literal characters does
    not work --- not sure why:

    private static final Pattern java_proc = Pattern.compile("\(java\)");

    I searched for and read a related post here, but it did not
    help. I seem to be having a different problem than they. Or I just
    don't understand the post.

    What am I doing wrong? Thanks, Alan
    James, Jul 4, 2008
    #1

  2. James

    James Guest

    OK, I finally found the words about using double slashes in front of
    parentheses. So, now, why won't the following regular expression
    pattern compile?

    private static final Pattern java_proc = Pattern.compile("\\\\.+\
    \Process\\(java\\)\\");

    The error says:

    java.lang.ExceptionInInitializerError
    Caused by: java.util.regex.PatternSyntaxException: Unknown character
    property name {r} near index 6
    \\.+\Process\(java\)\
          ^

    This does not make sense to me.

    I'm trying to match text of the form (example):

    \\GOLLY\Process(java)\% Processor Time

    Thanks, Alan
    James, Jul 4, 2008
    #2

  3. James

    Stefan Ram Guest

    James <> writes:
    >private static final Pattern java_proc = Pattern.compile("\(java\)");


    private static final Pattern java_proc = Pattern.compile("\\(java\\)");
    Stefan Ram, Jul 4, 2008
    #3
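To make the difference concrete, here is a minimal sketch contrasting the unescaped pattern from the first post with Stefan's version. It uses only the standard java.util.regex API; the class name ParenEscapeDemo and the helper firstMatch are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenEscapeDemo {
    // Returns the first substring of text matched by regex, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The Java literal below holds the text \\GOLLY\Process(java)\% Processor Time
        String text = "\\\\GOLLY\\Process(java)\\% Processor Time";

        // Unescaped, ( and ) form a capturing group, so only "java" is matched.
        System.out.println(firstMatch("(java)", text));     // java

        // The literal "\\(java\\)" reaches the regex engine as \(java\),
        // so the parentheses are matched as ordinary characters.
        System.out.println(firstMatch("\\(java\\)", text)); // (java)
    }
}
```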
  4. James wrote:
    > OK, I finally found the words about using double slashes in front of
    > parentheses. So, now, why won't the following regular expression
    > pattern compile?
    >
    > private static final Pattern java_proc = Pattern.compile("\\\\.+\\Process\\(java\\)\\");
    >
    > The error says:
    >
    > java.lang.ExceptionInInitializerError
    > Caused by: java.util.regex.PatternSyntaxException: Unknown character
    > property name {r} near index 6
    > \\.+\Process\(java\)\
    >       ^


    This is what the regex is seeing. Don't forget that '\' is also a
    metacharacter in regexes. So to match a '\' in a regex you have to type
    '\\\\' in the Java string, which causes the regex to see '\\', which is
    what it uses to match a single '\'. So the regex you're probably trying
    to compile is:
    "\\\\{2}.+\\\\Process\\(java\\)\\\\" (the {2} is so that you don't have
    to type in 8 backslashes)


    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
    Joshua Cranmer, Jul 4, 2008
    #4
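A small sketch, assuming the sample text from post #2, that checks the suggested pattern compiles and finds a match. BackslashDemo is a hypothetical name; pattern() is printed only to show the text the regex engine actually receives after Java has processed the string literal.

```java
import java.util.regex.Pattern;

public class BackslashDemo {
    // The suggested regex as a Java literal. The engine sees
    // \\{2}.+\\Process\(java\)\\ : two backslashes, one or more characters,
    // a backslash, "Process(java)", and a final backslash.
    static final Pattern JAVA_PROC =
            Pattern.compile("\\\\{2}.+\\\\Process\\(java\\)\\\\");

    public static void main(String[] args) {
        // Holds the text \\GOLLY\Process(java)\% Processor Time
        String sample = "\\\\GOLLY\\Process(java)\\% Processor Time";
        System.out.println(JAVA_PROC.pattern());              // \\{2}.+\\Process\(java\)\\
        System.out.println(JAVA_PROC.matcher(sample).find()); // true
    }
}
```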
  5. James

    James Guest

    Thank you.

    I have one remaining problem. The full data I'm working with,
    in CSV format, looks like this:

    "(PDH-CSV 4.0) (Eastern Daylight Time)(240)","\\GOLLY\Memory\% Committed Bytes In Use","\\GOLLY\Process(java)\% Processor Time"

    I want to match on

    \\GOLLY\Process(java)\

    so I can replace it.

    The regular expression

    \\\\{2}.+\\\\Process\\(java\\).

    matches, but it matches too much of it:


    \\GOLLY\Memory\% Committed Bytes In Use","\\GOLLY\Process(java)\

    How can I get it to only match the part I want?

    Thanks again, Alan
    James, Jul 4, 2008
    #5
  6. James wrote:
    > The regular expression
    >
    > \\\\{2}.+\\\\Process\\(java\\).
    >
    > matches, but it matches too much of it:


    In that case, you probably want this regex:
    \\\\{2}[^\\\\]+\\\\Process\\(java\\)
    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
    Joshua Cranmer, Jul 4, 2008
    #6
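The difference between the two patterns can be checked with a sketch like the one below, which runs the .+ pattern from post #5 and the [^\\\\]+ pattern against the sample CSV line. Because .+ can run across the \ and "," characters between fields, the search succeeds from the first \\ in the line; [^\\]+ cannot cross a backslash, so only the server name may sit between \\ and \Process(java). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // The .+ pattern from post #5, trailing . included.
    static final Pattern GREEDY =
            Pattern.compile("\\\\{2}.+\\\\Process\\(java\\).");
    // The suggested fix: [^\\]+ matches a run of non-backslash characters.
    static final Pattern NO_BACKSLASH =
            Pattern.compile("\\\\{2}[^\\\\]+\\\\Process\\(java\\)");

    // Returns the first match of p in text, or null if there is none.
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Sample CSV line from the thread.
        String csv = "\"(PDH-CSV 4.0) (Eastern Daylight Time)(240)\","
                + "\"\\\\GOLLY\\Memory\\% Committed Bytes In Use\","
                + "\"\\\\GOLLY\\Process(java)\\% Processor Time\"";

        System.out.println(firstMatch(GREEDY, csv));
        // \\GOLLY\Memory\% Committed Bytes In Use","\\GOLLY\Process(java)\
        System.out.println(firstMatch(NO_BACKSLASH, csv));
        // \\GOLLY\Process(java)
    }
}
```

Appending \\\\ to the second pattern would also take the trailing backslash, if the replacement needs to cover it.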
  7. "James" <> wrote in message
    news:...
    > I'm trying to use regex to match/replace a word in parentheses.
    > The regular expression
    >
    > private static final Pattern java_proc =
    > Pattern.compile("(java)");
    >
    > does not work, because parentheses are treated as groupings.
    >
    > Using "\" to designate the parentheses as literal characters does
    > not work --- not sure why:
    >
    > private static final Pattern java_proc = Pattern.compile("\(java\)");
    >
    > I searched for and read a related post here, but it did not
    > help. I seem to be having a different problem than they. Or I just
    > don't understand the post.
    >
    > What am I doing wrong? Thanks, Alan


    Double backslash your pattern: \\(java\\)

    AHS
    Arved Sandstrom, Jul 4, 2008
    #7
  8. James

    Roedy Green Guest

    On Thu, 3 Jul 2008 18:12:55 -0700 (PDT), James
    <> wrote, quoted or indirectly quoted someone
    who said :

    > private static final Pattern java_proc = Pattern.compile("\(java\)");


    It gets complicated because you have both Java and regex escape
    quoting.

    See http://mindprod.com/jgloss/regex.html#QUOTING

    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
    Roedy Green, Jul 4, 2008
    #8
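The two layers of quoting can be illustrated with a hypothetical helper (toJavaLiteral is not a JDK method) that takes the regex text you want the engine to see and produces the Java string literal you would have to type. It is a sketch that escapes only \ and ", which is enough for the patterns in this thread.

```java
public class JavaLiteralDemo {
    // Hypothetical helper: turns regex text into Java string-literal source.
    // Only \ and " are escaped; every other character is copied unchanged.
    static String toJavaLiteral(String regex) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : regex.toCharArray()) {
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // Regex layer: \\ matches one backslash, \( matches "(".
        // Java layer: every backslash from the regex layer is doubled again.
        String regex = "\\\\Process\\(java\\)";   // engine sees \\Process\(java\)
        System.out.println(regex);                 // \\Process\(java\)
        System.out.println(toJavaLiteral(regex));  // "\\\\Process\\(java\\)"
    }
}
```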
  9. James

    shakah Guest

    On Jul 3, 9:52 pm, Joshua Cranmer <> wrote:
    > James wrote:
    > > The regular expression
    > >
    > > \\\\{2}.+\\\\Process\\(java\\).
    > >
    > > matches, but it matches too much of it:
    >
    > In that case, you probably want this regex:
    > \\\\{2}[^\\\\]+\\\\Process\\(java\\)
    > --
    > Beware of bugs in the above code; I have only proved it correct, not
    > tried it. -- Donald E. Knuth


    FWIW, you could avoid a little of the backslash escape mess
    by using single-char character classes, e.g.:
    Pattern.compile("[\\]{2}[^\\]+[\\]Process[(]java[)]") ;
    // ...outside of a Java string that'd be [\]{2}[^\]+[\]Process[(]java[)]
    shakah, Jul 4, 2008
    #9
  10. James

    Mark Space Guest

    shakah wrote:
    > On Jul 3, 9:52 pm, Joshua Cranmer <> wrote:
    >> James wrote:
    >>> The regular expression
    >>> \\\\{2}.+\\\\Process\\(java\\).
    >>>
    >>> matches, but it matches too much of it:
    >>
    >> In that case, you probably want this regex:
    >> \\\\{2}[^\\\\]+\\\\Process\\(java\\)
    >> --
    >> Beware of bugs in the above code; I have only proved it correct, not
    >> tried it. -- Donald E. Knuth
    >
    > FWIW, you could avoid a little of the backslash escape mess
    > by using single-char character classes, e.g.:
    > Pattern.compile("[\\]{2}[^\\]+[\\]Process[(]java[)]") ;
    > // ...outside of a Java string that'd be [\]{2}[^\]+[\]Process[(]java[)]


    You also might get rid of some of those backslashes by substituting
    another character, then using replace() on the string before compiling it.

    final static String PATTERN = "``{2}.+``Process`(java`)";

    String myRegex = PATTERN.replace("`", "\\" );
    System.out.println( myRegex );

    Result:

    \\{2}.+\\Process\(java\)


    It just makes things more readable. Using `, %, or # in the string and
    then replacing that character with \ before compiling it as a regex can
    save your eyes.

    Incidentally, I wonder if Sun could be convinced to add this themselves.
    Maybe add a new operator/keyword mechanism altogether: # would introduce
    a new keyword or operator, followed by the keyword or operator itself.
    This just allows Sun to make new keywords or operators without breaking
    any existing code. So #s might give us new string constants. Let's say '
    then means a string like a Unix shell single-quoted string, where
    escaping is ignored.

    String regex = #s'\\{2}.+\\Process\(java\)';

    Would give that literal string, without the need to escape the
    backslashes. Easier for regex at least. Other types of flags besides '
    could be introduced too. `,$,@,%,= might do the same thing, just use a
    different character as a string terminator, in case you want a ' to be
    part of the string. """ might introduce a "here-is" operator. Etc.

    Just thinking out loud....
    Mark Space, Jul 4, 2008
    #10
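A runnable version of the backtick trick might look like this. compileWithBackticks is a hypothetical helper name, and the approach assumes the regex itself never needs a literal backtick.

```java
import java.util.regex.Pattern;

public class BacktickDemo {
    // Write ` wherever the regex needs a backslash.
    static final String PATTERN = "``{2}.+``Process`(java`)";

    static Pattern compileWithBackticks(String template) {
        // String.replace(CharSequence, CharSequence) replaces every occurrence
        // literally; "\\" is a one-character string holding a backslash.
        return Pattern.compile(template.replace("`", "\\"));
    }

    public static void main(String[] args) {
        Pattern p = compileWithBackticks(PATTERN);
        System.out.println(p.pattern()); // \\{2}.+\\Process\(java\)
        System.out.println(p.matcher("\\\\GOLLY\\Process(java)\\% Processor Time").find()); // true
    }
}
```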
  11. James

    Roedy Green Guest

    On Fri, 04 Jul 2008 11:36:12 -0700, Mark Space
    <> wrote, quoted or indirectly quoted someone
    who said :

    >You also might get rid of some of those backslashes by substituting
    >another character, then using replace() on the string before compiling it.


    Other ideas:

    1. Use Quoter to insert \ quoting, both for regex and Java strings.
    see http://mindprod.com/applet/quoter.html

    2. implement one or more of my regex student projects
    http://mindprod.com/project/regexutility.html
    http://mindprod.com/project/regexcomposer.html
    http://mindprod.com/project/regexdebugger.html
    http://mindprod.com/project/regexproofreader.html

    3. use \Q ... \E
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
    Roedy Green, Jul 4, 2008
    #11
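For the \Q ... \E suggestion, the standard library also offers Pattern.quote, which wraps a string in \Q...\E (handling any embedded \E), and Matcher.quoteReplacement, which escapes \ and $ for the replacement side of replaceAll. The sketch below applies both to the original match/replace goal; replaceLiteral is a hypothetical helper and the replacement value \\HOST\Process(app)\ is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDemo {
    // Replaces every occurrence of target with replacement, treating neither
    // string as regex or replacement syntax.
    static String replaceLiteral(String input, String target, String replacement) {
        Pattern p = Pattern.compile(Pattern.quote(target));
        return p.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String csv = "\"\\\\GOLLY\\Process(java)\\% Processor Time\"";

        // Written by hand, \Q...\E still needs Java-level doubling of backslashes.
        Pattern handQuoted = Pattern.compile("\\Q\\GOLLY\\Process(java)\\E");
        System.out.println(handQuoted.matcher(csv).find()); // true

        System.out.println(replaceLiteral(csv, "\\\\GOLLY\\Process(java)\\", "\\\\HOST\\Process(app)\\"));
        // "\\HOST\Process(app)\% Processor Time"
    }
}
```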
  12. James

    Mark Space Guest

    Roedy Green wrote:

    > 3. use \Q ... \E


    OK, that's cool. It only works with regex, but it's darn handy for
    them. Thanks!
    Mark Space, Jul 4, 2008
    #12
  13. James

    James Guest

    shakah,

    The statement

    Pattern JAVA_PROC = Pattern.compile("[\\]{2}[^\\]+[\\]Process[(]java[)]");

    compiles, but throws an exception at run time:

    run:
    Exception in thread "main" java.util.regex.PatternSyntaxException:
    Unclosed character class near index 30
    [\]{2}[^\]+[\]Process[(]java[)]
                                  ^

    All: Thank you for your suggestions.
    James, Jul 5, 2008
    #13
  14. James

    Roedy Green Guest

    On Sat, 5 Jul 2008 12:48:44 -0700 (PDT), James
    <> wrote, quoted or indirectly quoted someone
    who said :

    >[\]{2}[^\]+[\]Process[(]java[)]
    >                              ^


    The ( ) are fine inside a character class; it is the \ that needs
    escaping. If that is a Java literal, you need to escape \ both for Java
    and for regex.

    see http://mindprod.com/jgloss/regex.html#QUOTING
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
    Roedy Green, Jul 5, 2008
    #14
  15. James wrote:
    > Exception in thread "main" java.util.regex.PatternSyntaxException:
    > Unclosed character class near index 30
    > [\]{2}[^\]+[\]Process[(]java[)]


    You still have to escape the backslashes here, since each lone backslash
    is currently escaping the ] that was supposed to close its character class.


    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
    Joshua Cranmer, Jul 5, 2008
    #15
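With each backslash escaped for both Java and the regex engine, the character-class approach does work. A minimal sketch (the class name is hypothetical), reusing the sample data from earlier in the thread:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // The Java literal [\\\\] reaches the engine as [\\], a class containing
    // one backslash. ( and ) need no escaping inside [ ].
    static final Pattern JAVA_PROC =
            Pattern.compile("[\\\\]{2}[^\\\\]+[\\\\]Process[(]java[)]");

    public static void main(String[] args) {
        System.out.println(JAVA_PROC.pattern()); // [\\]{2}[^\\]+[\\]Process[(]java[)]
        String csv = "\"\\\\GOLLY\\Memory\\% Committed Bytes In Use\","
                + "\"\\\\GOLLY\\Process(java)\\% Processor Time\"";
        Matcher m = JAVA_PROC.matcher(csv);
        if (m.find()) {
            System.out.println(m.group()); // \\GOLLY\Process(java)
        }
    }
}
```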
  16. James

    Roedy Green Guest

    On Thu, 3 Jul 2008 18:12:55 -0700 (PDT), James
    <> wrote, quoted or indirectly quoted someone
    who said :

    > I'm trying to use regex to match/replace a word in parentheses.
    >The regular expression


    As an aside, you can't use a regex to tell whether ( ) are nested and
    balanced correctly to arbitrary depth.

    For that you need a parser.

    See http://mindprod.com/jgloss/parser.html
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
    Roedy Green, Jul 6, 2008
    #16
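For a single kind of bracket, the "parser" can be as small as a depth counter, which roughly supplies the unbounded memory a regular expression lacks. This sketch (isBalanced is a hypothetical name) handles only ( and ); mixing bracket types would need a stack, and richer syntax a real parser.

```java
public class BalanceCheck {
    // Returns true if every ')' closes an earlier '(' and none are left open.
    static boolean isBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (--depth < 0) {
                    return false; // a ')' with no matching '('
                }
            }
        }
        return depth == 0; // any leftover '(' is unmatched
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(a(b)c)")); // true
        System.out.println(isBalanced("(a(b)c"));  // false
        System.out.println(isBalanced(")("));      // false
    }
}
```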