Ignoring quoted strings in regular expressions

Discussion in 'Perl Misc' started by nick, Dec 6, 2003.

  1. nick

    nick Guest

    I couldn't find a Usenet group on regular expressions, so I'm posting
    here. Sorry if this is the wrong place to post.

    Anyway, I'm wondering whether a regular expression can be written so
    that it ignores quoted strings. For example:

    1. blah whatever blah
    2. blah "whatever" blah
    3. blah " whatever " blah

    I want to match 1, but not 2 or 3. Essentially, I want to match if
    whatever is anywhere in the string and unquoted.
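
One common way to get this behaviour is to let the pattern consume quoted strings first and capture the word only when it turns up outside them. The following is a minimal sketch in Python, not something from the thread: the names `UNQUOTED_WORD` and `has_unquoted` are made up for illustration, and it assumes only double quotes matter and that they are balanced.

```python
import re

# The quoted-string branch is tried first, so a "whatever" inside quotes is
# swallowed by that branch. Group 1 is set only for an occurrence outside quotes.
# Assumption: quotes are double quotes and always balanced.
UNQUOTED_WORD = re.compile(r'"[^"]*"|(whatever)')

def has_unquoted(line):
    """Return True if 'whatever' occurs outside double quotes in line."""
    return any(m.group(1) for m in UNQUOTED_WORD.finditer(line))

for line in ('blah whatever blah',
             'blah "whatever" blah',
             'blah " whatever " blah'):
    print(has_unquoted(line), line)
```

With the three sample lines, only the first reports a match. The same alternation idea should carry over to other regex flavours, though the surrounding code would differ.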

    At http://www.cs.sfu.ca/~cameron/REX.html I found the following:


    "This expression scans through arbitrary content searching for the
    closing ">" delimiter. Quoted strings are skipped and the scan may
    terminate with failure if an erroneous "]" or "<" delimiter is

    I don't see where in the above RE the actual quoted string skipping is
    being done.

    Any help would be appreciated.


    nick, Dec 6, 2003

  2. Ben Morrow

    Ben Morrow Guest

    Which language/regex package are you using? You would be best off in a
    group specific to that. Unless, of course, this is a homework problem.
    Bleech. Talk about line-noise :).

    Let's break it up. Stuff from # to end-of-line is my commentary.

    (             # start group
      [^]"'><]+   # match as much stuff as possible that doesn't include
                  # ] " ' > or <
    |             # or
      " [^"]* "   # a double-quoted string (" followed by as much not-" as
                  # you can, followed by another ")
    |             # or
      ' [^']* '   # a single-quoted string (as above)
    )*            # keep doing all of the above as much as you can
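
To see the skipping in action, the breakdown above can be run more or less as written. This is a rough sketch in Python, assuming its `re.VERBOSE` mode, which ignores whitespace in the pattern and allows `#` comments. The names `SKIP_QUOTED` and `scan` are hypothetical, and the sample input is invented for the demonstration.

```python
import re

# The commented breakdown, compiled in verbose mode. Whitespace in the pattern
# is ignored there, so the spaces inside the quoted-string branches are harmless.
SKIP_QUOTED = re.compile(r"""
    (                # start group
        [^]"'><]+    # a run of characters other than ] " ' > <
      | " [^"]* "    # or a double-quoted string
      | ' [^']* '    # or a single-quoted string
    )*               # repeat as often as possible
""", re.VERBOSE)

def scan(text):
    """Return the prefix of text that the pattern consumes from the start."""
    # The group is starred, so the pattern can match the empty string and
    # match() never returns None here.
    return SKIP_QUOTED.match(text).group(0)

print(scan('a="x>y" b>rest'))   # prints: a="x>y" b
```

The `>` inside the quotes is consumed by the double-quoted branch, so the scan stops only at the first `>` that sits outside a quoted string.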
    Ben Morrow, Dec 6, 2003