Unexpected RegEx results

Discussion in 'Perl Misc' started by QoS, Feb 26, 2007.

  1. QoS

    QoS Guest

    Hello, I'm having some trouble solving this regular expression puzzle.
    It is possible to work around the issue with a few if statements, but I'm
    curious why this is occurring.

    The data involved looks similar to the following:

    ALWAYSPRESENT:0008:0:OPTIONAL
    OPTIONAL
    OPTIONAL

    The data always starts with a name.
    This is followed by a colon, some digits, a colon, some digits and a colon,
    all of which will be discarded.
    After that there may or may not be some additional data.

    Next there might be a newline followed by some optional data.
    Finally there might be a newline followed by some optional data.
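A minimal sketch of the record format just described, parsed with `split` instead of a single regex. This is a swapped-in technique, not the poster's method; `parse_record` is a hypothetical helper name, and the sketch assumes each record has already been isolated in its own string:

```perl
use strict;
use warnings;

# Hypothetical helper: split one record of the form
#   NAME:DIGITS:DIGITS:[TEXT]\n[LINE2]\n[LINE3]
# into (name, text, line2, line3); missing lines come back as undef.
sub parse_record {
    my ($record) = @_;
    my ($first, @rest) = split /\n/, $record;
    # A LIMIT of 4 keeps everything after the third colon in one field.
    my ($name, undef, undef, $text) = split /:/, $first, 4;
    # An array slice always returns two elements, undef for missing lines.
    return ($name, $text, @rest[0, 1]);
}

my @fields = parse_record("ALWAYSPRESENT:0008:0:PRESENT\nPRESENT\n");
print join(' ', map { defined($_) ? $_ : 'undef' } @fields), "\n";
# ALWAYSPRESENT PRESENT PRESENT undef
```

Because the lines are split first, a missing final line simply becomes undef in its own slot, so it can never shift into another field.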

    OK, here is my issue: when the last optional line is missing, the regex I'm
    using puts the data I expected in $3 into $4 instead. So $4 contains data
    but $3 does not, when I expected $3 to contain data and $4 to be empty.

    Example troublesome data (the third line is not present):

    ALWAYSPRESENT:0008:0:PRESENT
    PRESENT

    This is the offending RegEx.

    $msg =~ /(.*?):.*:(.*)\n*(^.*)\n*(^.*)/m;

    Thanks for any assistance.
     
    QoS, Feb 26, 2007
    #1

  2. QoS

    QoS Guest

     
    QoS, Feb 26, 2007
    #2

  3. John W. Krahn

    $ perl -le'
    my @x = ( <<ONE, <<TWO );
    ALWAYSPRESENT:0008:0:OPTIONAL
    OPTIONAL
    OPTIONAL
    ONE
    ALWAYSPRESENT:0008:0:PRESENT
    PRESENT
    TWO

    for ( @x ) {
    print "1=$1 2=$2 3=$3 4=$4" if /(.*?):.*:(.*)\n*(^.*)\n*(^.*)/m;
    }
    '
    1=ALWAYSPRESENT 2=OPTIONAL 3=OPTIONAL 4=OPTIONAL
    1=ALWAYSPRESENT 2=PRESENT 3= 4=PRESENT



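    You are using the /m option and the ^ anchor, which tell perl that both $3
    and $4 *must* begin at the start of a line. With only two lines, the only
    way to satisfy that is for $3 to match an empty string at the start of the
    second line and for $4 to take that whole line. (Under /m, ^ does not match
    after a newline that ends the string, so there is no third line for $4.)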
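A quick way to see the anchor behaviour described above is to count the positions where `^` can match under /m. This is a standalone sketch; `$two_lines` simply reuses the two-line example from the question:

```perl
use strict;
use warnings;

my $two_lines = "ALWAYSPRESENT:0008:0:PRESENT\nPRESENT\n";

# Each zero-width match of /^/m marks a position where a line starts.
# The newline that ends the string does not start another line,
# so only two positions qualify here.
my $starts = () = $two_lines =~ /^/mg;
print "line starts: $starts\n";    # line starts: 2
```

Since the original pattern needs a line start for $3 and another for $4, and only one is available after the first line, both groups end up anchored at the second line.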

    $ perl -le'
    my @x = ( <<ONE, <<TWO );
    ALWAYSPRESENT:0008:0:OPTIONAL
    OPTIONAL
    OPTIONAL
    ONE
    ALWAYSPRESENT:0008:0:PRESENT
    PRESENT
    TWO

    for ( @x ) {
    print "1=$1 2=$2 3=$3 4=$4" if /(.*?):.*:(.*)\n*(.*)\n*(.*)/;
    }
    '
    1=ALWAYSPRESENT 2=OPTIONAL 3=OPTIONAL 4=OPTIONAL
    1=ALWAYSPRESENT 2=PRESENT 3=PRESENT 4=
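Note that without /m the missing third line shows up as an empty $4 rather than an undefined one. If you need to tell "absent" from "empty", one option (a sketch, not from the thread; it assumes the two numeric fields are plain digits) is to wrap each optional line in its own `(?:\n(...))?` group:

```perl
use strict;
use warnings;

# Each optional line sits inside (?: \n (...) )?, so a line that is
# not there leaves its capture group undef instead of matching ''.
my $re = qr/^([^:\n]+):\d+:\d+:([^\n]*)(?:\n([^\n]+))?(?:\n([^\n]+))?/;

my @results;
for my $rec ("ALWAYSPRESENT:0008:0:OPTIONAL\nOPTIONAL\nOPTIONAL\n",
             "ALWAYSPRESENT:0008:0:PRESENT\nPRESENT\n") {
    my @f = $rec =~ $re;
    next unless @f;
    push @results, join ' ', map { defined($_) ? $_ : 'undef' } @f;
}
print "$_\n" for @results;
# ALWAYSPRESENT OPTIONAL OPTIONAL OPTIONAL
# ALWAYSPRESENT PRESENT PRESENT undef
```

Because the group for the third line requires at least one character after its newline, the trailing newline alone cannot satisfy it, so $4 stays undef.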




    John
     
    John W. Krahn, Feb 26, 2007
    #3
  4. Mirco Wahab

    Mirco Wahab Guest

    That's better. Real data ;-)

    My first shot:


    ....
    my $data='
    Some Text:0000:0:More Text
    Text text
    Text text text
    ';

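    # /m lets ^ match at every line start; /x allows spaces and comments.
    # [^:\n] (not [^:]) keeps the newline that opens $data out of $1.
    my $rg = qr/
    ^([^:\n]+) : \d+ : \d+ : ([^\n]+)?\n   # name, then optional trailing text
    (?: ^([^:\n]+?) \n)?                   # optional second line
    (?: ^([^:\n]+?) (?:\n|$) )?/mx;        # optional third line

    if( $data =~ /$rg/ ) {
        print join "\n", map defined $_ ? $_ : 'undef', ($1, $2, $3, $4);
    }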


    Regards

    M.
     
    Mirco Wahab, Feb 26, 2007
    #4
  5. QoS

    QoS Guest

    [Snip]

    Thank you everybody for helping solve this little mystery.

    Your solutions and workarounds are quite clever!
    I was unaware of that 'm' option side-effect.
     
    QoS, Feb 26, 2007
    #5
  6. Mirco Wahab

    Mirco Wahab Guest

    I was under the impression that your data
    would consist not of /one/ record
    but of a whole sequence of them, so
    the regex would need to walk through
    (find) the records and return the
    correct matches for each one:

    # Example: four record thing with "offending" structure ==>

    my $morestuff='
    ALWAYSPRESENT:0008:0:PRESENT
    PRESENT
    MAYBEPRESENT
    Some Text 1:0000:0:More Text 1
    Some Text 2:0000:0:More Text 2
    Text22 text22 text22
    Some Text 3:0000:0:More Text 3
    Text3 text3
    Text33 text33 text33
    ';
    # and so on ...

    # Now, the regex should identify them
    # and step along ==>

    my $rg = qr/ \s*
    ^([^:]+) : \d+ : \d+ : ([^\n]+)?\n
    (?: ^([^:\n]+?) \n)?
    (?: ^([^:\n]+?) (?:\n|$) )?/mx;

    # This was the "shortest" thing I could find so
    # far (within your constraints), the record-
    # walking would be within a while ==>

    while( $morestuff =~ /$rg/g ) {

    printf "%s %s\n\t%s\n\t%s\n",
    $1||'undef', $2||'undef',
    $3||'undef', $4||'undef';

    }

    # ... which would give the correct matches.


    Maybe I misunderstood your problem somehow,
    but I found the task quite nice and interesting
    (maybe somebody will write down a really simple
    regular expression for it; not me, it's
    sleeping time now in this country ;-).
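One simpler route, offered as a sketch rather than the single regex Mirco asked for, is to let `split` cut the buffer into records first and then take each record apart line by line. It assumes that a header line is any line starting with NAME:DIGITS:DIGITS: and that no continuation line looks like one:

```perl
use strict;
use warnings;

my $morestuff = "ALWAYSPRESENT:0008:0:PRESENT\nPRESENT\nMAYBEPRESENT\n"
              . "Some Text 1:0000:0:More Text 1\n"
              . "Some Text 2:0000:0:More Text 2\nText22 text22 text22\n";

# Split just before every line that looks like a record header.
# The lookahead is zero-width, so each chunk keeps its header line.
my @records = split /^(?=[^:\n]+:\d+:\d+:)/m, $morestuff;

for my $rec (@records) {
    my ($head, @extra) = split /\n/, $rec;
    my ($name, undef, undef, $text) = split /:/, $head, 4;
    print join(' | ', $name, $text, @extra), "\n";
}
# ALWAYSPRESENT | PRESENT | PRESENT | MAYBEPRESENT
# Some Text 1 | More Text 1
# Some Text 2 | More Text 2 | Text22 text22 text22
```

The trade-off is that record boundaries are decided by one small pattern, while the per-record fields need no regex at all.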

    Regards

    Mirco
     
    Mirco Wahab, Feb 26, 2007
    #6
  7. Broke

    Broke Guest

    Very good job, Mr. Wahab!
    I didn't know the
    secret of qr// before reading your
    code, and I just learned it.
    It's extremely useful.

    Many thanks!
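A small illustration of what qr// does: it compiles a pattern into a Regexp object that can be matched directly or interpolated into a bigger pattern, keeping its own modifiers. The header pattern here is only an example modelled on the thread's data:

```perl
use strict;
use warnings;

# qr// compiles a pattern once and returns a Regexp object.
my $header = qr/([^:\n]+):\d+:\d+:/;

# The object can be matched directly...
my ($name) = "Some Text 1:0000:0:More Text 1" =~ $header;

# ...or interpolated into a larger pattern.
my $with_rest = qr/^$header(.*)$/;
my ($n2, $rest) = "Some Text 2:0000:0:More Text 2" =~ $with_rest;

print ref($header), "\n";            # Regexp
print "$name / $n2 / $rest\n";       # Some Text 1 / Some Text 2 / More Text 2
```

Building larger patterns from named pieces this way is what makes Mirco's `$rg` reusable inside the `while` loop.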
     
    Broke, Mar 7, 2007
    #7
