Ignoring quoted strings in regular expressions

    I couldn't find a Usenet group on regular expressions, so I'm posting
    here. Sorry if this is the wrong place to post.

    Anyway, I'm wondering whether a regular expression can be written so
    that it ignores quoted strings. For example:

    1. blah whatever blah
    2. blah "whatever" blah
    3. blah " whatever " blah

    I want to match 1, but not 2 or 3. Essentially, I want a match only if
    "whatever" appears somewhere in the string outside of quotes.

    At http://www.cs.sfu.ca/~cameron/REX.html I found the following:


    "This expression scans through arbitrary content searching for the
    closing ">" delimiter. Quoted strings are skipped and the scan may
    terminate with failure if an erroneous "]" or "<" delimiter is

    I don't see where in the above RE the actual quoted string skipping is
    being done.

    Any help would be appreciated.


    nick, Dec 6, 2003

    Which language/regex package are you using? You would be best off in a
    group specific to that. Unless, of course, this is a homework problem.

    Bleech. Talk about line-noise :).

    Let's break it up. Stuff from # to end-of-line is my commentary.

    (                # start group
      [^]"'><]+      # match as much stuff as possible that doesn't include
                     # ] " ' > or <.
    |                # or
      " [^"]* "      # a double-quoted string (" followed by as much not-" as
                     # you can followed by another ")
    |                # or
      ' [^']* '      # a single-quoted string (as above)
    )*               # keep doing all of the above as much as you can
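
For anyone who wants to try the breakdown above, here is a minimal sketch in Python. The thread never says which language is in use, so Python is an assumption. The names SKIP_QUOTED and find_closing_angle are made up for illustration, and the check for a closing ">" follows the description quoted in the question rather than anything in the breakdown itself. The re.VERBOSE flag lets the pattern keep the same layout, with whitespace and # comments ignored.

```python
import re

# Same alternatives as in the breakdown, laid out with re.VERBOSE.
# The group is non-capturing here only because nothing needs capturing.
SKIP_QUOTED = re.compile(r"""
    (?:
        [^]"'><]+      # a run of characters that are not delimiters or quotes
      | " [^"]* "      # or a complete double-quoted string
      | ' [^']* '      # or a complete single-quoted string
    )*                 # repeated as many times as possible
""", re.VERBOSE)


def find_closing_angle(text):
    """Return the index of the first '>' outside quotes, or None.

    Hypothetical helper: it scans from the start of text, skipping quoted
    strings, and reports failure if the scan stops on anything but '>'.
    """
    end = SKIP_QUOTED.match(text).end()  # the pattern can match empty, so never None
    return end if text.startswith(">", end) else None


print(find_closing_angle('a href="x>y">rest'))  # 12
print(find_closing_angle('a < b'))              # None
```

The ">" inside the quoted string is swallowed by the second alternative, which is where the "skipping" happens: a quote character can never be consumed by the first alternative, so the only way past it is as part of a whole quoted string.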
    Ben Morrow, Dec 6, 2003
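
The same skip-the-quoted-strings idea can be turned around to address the original question. What follows is a rough sketch in Python (an assumption, since the thread does not name a language), with a hypothetical function name has_unquoted. It assumes quotes are balanced and contain no escaped quote characters. The quoted-string alternatives come first so they consume quoted text, and only an occurrence of the word outside quotes lands in the capture group.

```python
import re


def has_unquoted(word, text):
    """Return True if word occurs in text outside single or double quotes.

    Assumes word is non-empty, quotes are balanced, and there are no
    backslash-escaped quotes inside quoted strings.
    """
    pattern = re.compile(
        r'"[^"]*"'                       # skip a double-quoted string
        r"|'[^']*'"                      # or skip a single-quoted string
        r"|(" + re.escape(word) + r")"   # or capture the word itself
    )
    # Group 1 is None whenever the match was a quoted string.
    return any(m.group(1) is not None for m in pattern.finditer(text))


print(has_unquoted("whatever", 'blah whatever blah'))      # True
print(has_unquoted("whatever", 'blah "whatever" blah'))    # False
print(has_unquoted("whatever", 'blah " whatever " blah'))  # False
```

With an unterminated quote the quoted-string alternatives cannot match, so a word after a stray quote character would be reported as unquoted. Whether that is the right behaviour depends on the input being handled.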
