Help with regular expression?

Discussion in 'Java' started by Linda, Dec 21, 2004.

  1. Linda

    Linda Guest


    Hi,
    I'm trying to match all strings in my code that aren't in println()
    statements.

    What I have tried most recently:
    [^\Qsystem.out.println\E]\"([^\"])+?\"

    Linda, Dec 21, 2004

  2. Tilman Bohn

    Tilman Bohn Guest

    In message <Xns95C5A3731FA0Aa@>,
    Linda wrote on Tue, 21 Dec 2004 00:03:22 GMT:

    > Hi,
    > I'm trying to match all strings in my code that aren't in println()
    > statements.

    I take it you mean all String literals, not all strings.

    > What I have tried most recently:
    > [^\Qsystem.out.println\E]\"([^\"])+?\"

    The first bracketed expression doesn't do what you think it does.
    Character classes don't work for character sequences like that, and the
    \Q...\E escaping doesn't change that. (Your bracket really means `any
    character but s, y, t, e, m, o, u, p, r, i, n, l, or a dot'.) Also, if it
    did, the match would include not only your literal, but anything leading
    up to it. Plus, the class System is spelled with a capital S. You really
    want to be using a negative look-behind assertion, which in your case
    would look as follows:


    (A positive look-behind assertion would be (?<=foo), just for
    comparison's sake.)
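
    A minimal sketch of such a negative look-behind in Java, not
    necessarily the exact expression meant above. The class name
    LookBehindSketch and the helper find() are made up for illustration,
    and the literal part still ignores escaped quotes. The second,
    non-capturing branch is an addition of this sketch: it swallows the
    literals that the look-behind rejects, so that their closing quote is
    not mistaken for the start of the next literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookBehindSketch {
    // Branch 1: a literal whose opening quote is NOT directly preceded by
    //           "System.out.println(" (captured in group 1).
    // Branch 2: any other literal, i.e. one inside println(); matched but not captured.
    static final Pattern LITERAL = Pattern.compile(
            "(?<!System\\.out\\.println\\()(\"[^\"]*\")|\"[^\"]*\"");

    // Hypothetical helper: returns the literals found outside println().
    static List<String> find(String source) {
        List<String> hits = new ArrayList<>();
        Matcher m = LITERAL.matcher(source);
        while (m.find()) {
            if (m.group(1) != null) {   // null means branch 2 matched
                hits.add(m.group(1));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String code = "System.out.println(\"skip\"); String s = \"keep\";";
        System.out.println(find(code)); // prints ["keep"]
    }
}
```

    Without the second branch, the matcher would resume at the closing
    quote of "skip" and pair it with the opening quote of the next
    literal, which is one source of false positives.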

    Note that this will break if someone has aliased System.out and then
    calls println() on the alias. Also be careful to allow arbitrary amounts
    of white-space. This will get pretty involved once you want to correctly
    exclude println() calls spanning several lines, for which you probably
    have only two alternatives: actually parse a good deal of Java syntax, or
    read a whole file at a time and match it as one multi-line expression.

    Both of these will get fairly complicated, but if you can live with
    the occasional false positive, the simple look-behind and some additional
    white-space should get you a good deal closer to the solution.
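
    The white-space point can be sketched as follows, under some
    assumptions: java.util.regex rejects unbounded quantifiers such as \s*
    inside a look-behind, so a bounded \s{0,8} is used here (the limit 8 is
    arbitrary), and the whole source is matched as one string so that \s
    also covers line breaks. The class and method names are hypothetical,
    and aliased streams, comments, and escaped quotes are still not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceLookBehindSketch {
    // Look-behind with bounded white-space before and after the opening parenthesis.
    // Group 1 holds literals outside println(); the second branch consumes the rest.
    static final Pattern LITERAL = Pattern.compile(
            "(?<!System\\.out\\.println\\s{0,8}\\(\\s{0,8})(\"[^\"]*\")|\"[^\"]*\"");

    // Hypothetical helper: 'source' is the content of a whole file read into one string.
    static List<String> find(String source) {
        List<String> hits = new ArrayList<>();
        Matcher m = LITERAL.matcher(source);
        while (m.find()) {
            if (m.group(1) != null) {
                hits.add(m.group(1));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String code = "System.out.println (\n    \"skip\");\nString s = \"keep\";";
        System.out.println(find(code)); // prints ["keep"]
    }
}
```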

    The second part of your proposed solution won't catch double quotes
    within the literal. But don't just exclude matches to \", because the
    sequence \\" could again terminate the literal.
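
    One common way to handle both cases, offered as a sketch rather than
    as the definitive answer: inside the quotes, accept either a character
    that is neither a quote nor a backslash, or a backslash together with
    whatever character follows it. That way \" stays inside the literal,
    while \\" ends it, because the two backslashes are consumed as a pair.
    The class name EscapedLiteralSketch and the helper find() are
    hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedLiteralSketch {
    // Regex: "(?:[^"\\]|\\.)*"
    //   [^"\\]  any character except a quote or a backslash
    //   \\.     a backslash plus the character it escapes
    static final Pattern LITERAL = Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"");

    // Hypothetical helper: returns every String literal found in 'source'.
    static List<String> find(String source) {
        List<String> hits = new ArrayList<>();
        Matcher m = LITERAL.matcher(source);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Source text being scanned:  a = "say \"hi\""; b = "back\\";
        String code = "a = \"say \\\"hi\\\"\"; b = \"back\\\\\";";
        for (String literal : find(code)) {
            System.out.println(literal);
        }
    }
}
```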

    This problem is very similar to the `Regexp and Pattern.class' thread
    currently going on here. Deciding how to correctly match variously escaped
    characters, but not escaped escape sequences... ;-) Of course this type of
    problem is one of the original reasons for regular expressions, because
    such classes of sequences are `typical' languages produced by regular
    (type 3) grammars. (And of course look-behind assertions can technically
    never be a part of _regular_ expressions, but that's another story...)

    Cheers, Tilman

    `Boy, life takes a long time to live...' -- Steven Wright
    Tilman Bohn, Dec 21, 2004

  3. Linda

    Linda Guest

    Thanks for the info, Tilman.
    I'll watch the Regexp thread for more.

    Linda, Dec 21, 2004