Ignoring quoted strings in regular expressions

Discussion in 'Perl Misc' started by nick, Dec 6, 2003.

  1. nick

    nick Guest

    I couldn't find a usenet group on regular expressions, so I'm posting
    here. Sorry if this is the wrong place to post.

    Anyway, I'm wondering if there's a way to make a regular
    expression ignore quoted strings. For example:

    1. blah whatever blah
    2. blah "whatever" blah
    3. blah " whatever " blah

    I want to match 1, but not 2 or 3. Essentially, I want to match if
    whatever is anywhere in the string and unquoted.

    At http://www.cs.sfu.ca/~cameron/REX.html I found the following:

    ([^]"'><]+|"[^"]*"|'[^']*')*>

    "This expression scans through arbitrary content searching for the
    closing ">" delimiter. Quoted strings are skipped and the scan may
    terminate with failure if an erroneous "]" or "<" delimiter is
    encountered."

    I don't see where in the above RE the actual quoted string skipping is
    being done.

    Any help would be appreciated.

    Thanks,

    Nick
    nick, Dec 6, 2003
    #1

  2. Ben Morrow

    Ben Morrow Guest

    (nick) wrote:
    > I couldn't find a usenet group on regular expressions, so I'm posting
    > here. Sorry if this is the wrong place to post.


    Which language/regex package are you using? You would be best off in a
    group specific to that. Unless, of course, this is a homework problem.

    > Anyway, I'm wondering if there's some way to specify in a regular
    > expression a way to ignore quoted strings. For example:
    >

    <snip>
    > At http://www.cs.sfu.ca/~cameron/REX.html I found the following:
    >
    > ([^]"'><]+|"[^"]*"|'[^']*')*>


    Bleech. Talk about line-noise :).

    Let's break it up. Stuff from # to end-of-line is my commentary.

    (               # start group
      [^]"'><]+     # match as much stuff as possible that doesn't include
                    # ] " ' > or <.
    |               # or
      " [^"]* "     # a double-quoted string (" followed by as much not-" as
                    # you can followed by another ")
    |               # or
      ' [^']* '     # a single-quoted string (as above)
    )*              # keep doing all of the above as much as you can
    >               # end by matching a >.
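The breakdown can be tried out directly. The following is a small sketch in Python, chosen here only because the original poster did not say which regex package is in use; `re.VERBOSE` allows the same whitespace-and-comment layout. The names `SCAN_TO_CLOSE` and `scan_to_close` are made up for the example, and the group is written as non-capturing, which does not change what matches.

```python
import re

# The pattern from the thread, laid out with re.VERBOSE so that whitespace
# and "#" comments outside character classes are ignored.
SCAN_TO_CLOSE = re.compile(r'''
    (?:                 # start group (non-capturing in this sketch)
        [^]"'><]+       # a run of characters other than ] " ' > <
      | " [^"]* "       # or a complete double-quoted string
      | ' [^']* '       # or a complete single-quoted string
    )*                  # repeat as often as possible
    >                   # then the closing >
''', re.VERBOSE)

def scan_to_close(text):
    """Return the text up to and including the closing '>', or None."""
    m = SCAN_TO_CLOSE.match(text)
    return m.group(0) if m else None

# A '>' inside quotes is consumed by a quoted-string alternative,
# so the scan stops only at the first unquoted '>'.
print(scan_to_close('a href="x>y" title=\'p>q\'> trailing'))
# a href="x>y" title='p>q'>

# A stray '<' matches none of the alternatives, so the scan fails.
print(scan_to_close('a < b>'))
# None
```

The quoted-string skipping happens in the second and third alternatives: each one swallows an entire quoted string in a single step, so any `>` inside it is never examined by the final `>` in the pattern.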


    Ben

    --
    The cosmos, at best, is like a rubbish heap scattered at random.
    - Heraclitus
    Ben Morrow, Dec 6, 2003
    #2