Help simplify complex regexp needing positive lookahead and reluctant quantifiers

Discussion in 'Perl Misc' started by david.karr@wamu.net, Mar 20, 2005.

  1. Guest

    My code is in Java, but my problem is a complicated regexp.
    Ironically, I think I'm more likely to get a better response in here
    than elsewhere. It's too bad there's no "regular expressions"
    newsgroup (that I can find).

    My sample data is the following (abstracted from real data):
    --------------
    *XXXlkjsflkw34lkjsfd
    2XXXlkjsdfojsfjoimf344
    3XXXabcdef9999999
    4XXX9f9f9f9f9f9f9f9f
    5XXXg8g8g8g8g8g8g8g
    6XXXe6e6e6e6e6e6e6e6e
    YYY=D/23333333
    -xxxxxxxxxxxx
    -yyyyyyyyyyyy
    ZZZ=gggggggggggg
    AAA=hhhhhhhhhh
    -jjjjjjjjjjj
    -kkkkkkkkkkk
    /XXX 2
    --------------

    The important elements are "XXX", "YYY", "ZZZ", and "AAA". Each of
    "YYY", "ZZZ", and "AAA" could be in any order, and some could be
    missing, or others like it could be added. What I'd like to build is a
    regexp that can group each of "YYY", "ZZZ", and "AAA" along with their
    "associated data", up to either the next "[A-Z]{3}=", or the ending
    "/XXX". If I can get the "associated data" into group values, I can
    use other regexps for the detail in those group values.

    The regexp that I've built so far comes close to solving this, but not
    quite. This is what I have so far (translated from Java string syntax
    to Perl):

    --------------
    "(?sm)\\*.{3}.*\n" .
    "2.{3}.*\n" .
    "3.{3}.*\n" .
    "4.{3}.*\n" .
    "5.{3}.*\n" .
    "6.{3}.*\n" .
    " ([A-Z]{3}=)(.*?)(?= [A-Z]{3}=|/[A-Z]{3})" .
    " ([A-Z]{3}=)(.*?)(?= [A-Z]{3}=|/[A-Z]{3})" .
    " ([A-Z]{3}=)(.*?)(?= [A-Z]{3}=|/[A-Z]{3})" .
    "/[A-Z]{3}.*"
    --------------

    You can ignore for now the fact that I'm not verifying that all the
    places that require "XXX" are all "XXX". The problem area is the
    "[A-Z]{3}=" groups. This regexp works for my sample data, but I wasn't
    able to simplify those three repeated lines into a single expression,
    which would handle any number of those. I tried the following, to
    replace those three lines:

    "( ([A-Z]{3}=)(.*?)(?= [A-Z]{3}=|/[A-Z]{3}))*"

    but that didn't seem to work, and I'm not sure why.
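
One possible explanation, offered as a sketch rather than a diagnosis of the exact failure: in java.util.regex (as in Perl), a capturing group inside a quantifier keeps only the text from its last iteration. So even when the starred version matches, the groups report just the final section. The demo below uses a shortened, made-up input. It writes the start of each key line as "^", which is an assumption standing in for the leading space shown in the pattern above. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Shortened, hypothetical input: three "KEY=" sections and a closing "/XXX" line.
    static final String INPUT = "YYY=1\n-a\nZZZ=2\nAAA=3\n-b\n/XXX 2\n";

    // One section wrapped in (...)*, as in the attempted simplification.
    // Assumption: "^" under (?m) marks the start of each key line.
    static final Pattern REPEATED = Pattern.compile(
        "(?sm)(?:^([A-Z]{3}=)(.*?)(?=^[A-Z]{3}=|^/[A-Z]{3}))*^/[A-Z]{3}.*");

    // Returns {group(1), group(2)} after a whole-input match, or null if no match.
    static String[] lastIteration(String input) {
        Matcher m = REPEATED.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // The starred group ran three times, but only the last run is retained.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] last = lastIteration(INPUT);
        System.out.println("group[" + last[0] + "]"); // only the final key, AAA=
        System.out.println("group[" + last[1] + "]"); // only the final section's data
    }
}
```

The pattern matches all three sections, yet the "YYY=" and "ZZZ=" captures are overwritten along the way, which may be what "didn't seem to work" looked like from the calling code.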

    The following is the output from my Java program, using the working
    regexp, where it iterated through the found groups. I provide this
    just as another view of what I'm trying to capture:

    --------------
    group[YYY=]
    group[D/23333333
    -xxxxxxxxxxxx
    -yyyyyyyyyyyy
    ]
    group[ZZZ=]
    group[gggggggggggg
    ]
    group[AAA=]
    group[hhhhhhhhhh
    -jjjjjjjjjjj
    -kkkkkkkkkkk
    ]
    --------------
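
A minimal sketch of one way to get output like the above for any number of sections: instead of repeating the group inside one large pattern, apply a single-section pattern repeatedly with Matcher.find(). This is an alternative technique, not the pattern from the post. It assumes each key starts a line (written as "^" under (?m)), it does not validate the numbered header lines, and the names SectionScanner, SAMPLE and sections are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionScanner {
    // The sample record from the post, one element per line.
    static final String SAMPLE = String.join("\n",
        "*XXXlkjsflkw34lkjsfd",
        "2XXXlkjsdfojsfjoimf344",
        "3XXXabcdef9999999",
        "4XXX9f9f9f9f9f9f9f9f",
        "5XXXg8g8g8g8g8g8g8g",
        "6XXXe6e6e6e6e6e6e6e6e",
        "YYY=D/23333333",
        "-xxxxxxxxxxxx",
        "-yyyyyyyyyyyy",
        "ZZZ=gggggggggggg",
        "AAA=hhhhhhhhhh",
        "-jjjjjjjjjjj",
        "-kkkkkkkkkkk",
        "/XXX 2") + "\n";

    // One section: a key at line start, then reluctant data up to (not including)
    // the next key line or the closing "/XXX" line.
    static final Pattern SECTION = Pattern.compile(
        "(?sm)^([A-Z]{3}=)(.*?)(?=^[A-Z]{3}=|^/[A-Z]{3})");

    // Returns one {key, data} pair per section, in order of appearance.
    static List<String[]> sections(String record) {
        List<String[]> out = new ArrayList<>();
        Matcher m = SECTION.matcher(record);
        while (m.find()) {
            out.add(new String[] { m.group(1), m.group(2) });
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] s : sections(SAMPLE)) {
            System.out.println("group[" + s[0] + "]");
            System.out.println("group[" + s[1] + "]");
        }
    }
}
```

Because the lookahead consumes nothing, each find() call resumes exactly at the next key, so the number of sections does not need to be known in advance.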
     
    david.karr@wamu.net, Mar 20, 2005

  2. david.karr@wamu.net wrote:

    > My code is in Java, but my problem is a complicated regexp.
    > Ironically, I think I'm more likely to get a better response in here
    > than elsewhere. It's too bad there's no "regular expressions"
    > newsgroup (that I can find).


    No, but there is definitely a Java group.

    I'm not just being snide - implementations of regular expressions vary. An
    answer you get here may not apply to Java, and answers you get here or in a
    Java group may not apply to sed, and so forth. You'd be far better off
    asking your question in a group that's focused on the particular
    implementation that you're using.

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Mar 20, 2005
