Unexpected RegEx results

Discussion in 'Perl Misc' started by QoS@domain.invalid.com, Feb 26, 2007.

  1. Guest

    Hello, I'm having some trouble solving this regular expression puzzle.
    It is possible to solve the issue using some if statements, but I'm
    curious why this is occurring.

    The data involved looks similar to the following:

    ALWAYSPRESENT:0008:0:OPTIONAL
    OPTIONAL
    OPTIONAL

    The data always starts with a name.
    This is followed by a colon, some numbers, a colon, some numbers and a colon,
    all of which will be discarded.
    Then there may or may not be some additional data after that.

    Next there might be a newline followed by some optional data.
    Finally there might be another newline followed by some optional data.

    OK, here is my issue: when there is nothing to fill $4, the regex I'm using
    places the data I expected in the 3rd capture variable into $4 instead.
    So $4 will contain data but $3 will not, when I expected
    that $3 would contain data and $4 would not.

    Example troublesome data:

    ALWAYSPRESENT:0008:0:pRESENT
    PRESENT
    NOTPRESENT

    This is the offending RegEx.

    $msg =~ /(.*?):.*:(.*)\n*(^.*)\n*(^.*)/m;

    Thanks for any assistance.
     
    , Feb 26, 2007
    #1

  2. Guest

    Jim Gibson wrote in message-id:
    <260220071249048703%>
    >
    >In article <wyGEh.14260$sv6.3728@trndny08>, <>
    >wrote:
    >
    >> [Snip]

    >
    >I can't follow your logic entirely, but I suspect that you simply have
    >too many unqualified '*' characters in your regex (I count 6) and it is
    >causing confusion. For example, '\n*' need not match any characters at
    >all. Perhaps you want '\n?' or '\n+' there instead.
    >
    >In any case, please post a complete, runnable program and somebody,
    >perhaps even me, will be able to help you.
    >
    >--
    >Jim Gibson


    Ok, here is an example that demonstrates the quirks.
    Notice in the second printout that what was in $3 in the first
    printout is now in $4 and $3 contains ''.

    And thanks very much for giving this a go!

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $data;
    $data = 'Some Text:0000:0:More Text'."\n".
    'Text text'."\n".
    'Text text text.'."\n";
    &reformat($data);
    $data = 'Some Text:0000:0:More Text'."\n".
    'Text text'."\n";
    &reformat($data);

    exit;

    sub reformat
    {
        my $msg = $_[0] || die "Invalid option in reformat\n";
        my $out;
        $msg =~ /(.*?):.*:(.*)\n*(^.*)\n*(^.*)/m;
        $out = "$1,".
               "1,".
               "00000000000,".
               "0000000,".
               "000,".
               "$2,".
               "$3,".
               "$4\n";
        print $out;
        print '=======================================================',"\n";
        return(1);
    }
     
    , Feb 26, 2007
    #2

  3. wrote:
    > [Snip]


    $ perl -le'
    my @x = ( <<ONE, <<TWO );
    ALWAYSPRESENT:0008:0:OPTIONAL
    OPTIONAL
    OPTIONAL
    ONE
    ALWAYSPRESENT:0008:0:pRESENT
    PRESENT
    TWO

    for ( @x ) {
    print "1=$1 2=$2 3=$3 4=$4" if /(.*?):.*:(.*)\n*(^.*)\n*(^.*)/m;
    }
    '
    1=ALWAYSPRESENT 2=OPTIONAL 3=OPTIONAL 4=OPTIONAL
    1=ALWAYSPRESENT 2=PRESENT 3= 4=PRESENT



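    You are using the /m option and the ^ anchor, so each (^.*) has to start
    at the beginning of a line. With only two lines there is no third line
    start for the last ^ to match, so the engine backtracks until $3 is empty
    and both anchors match at the start of the second line.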
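    To see where each group ends up, here is a minimal sketch (not from the
    thread; the variable names are my own) that prints the start offset of
    every capture using the @- array. It assumes the two-line record from the
    example above. Both (^.*) groups should report the same start offset,
    which is why $3 comes back empty and $4 holds the line.

```perl
use strict;
use warnings;

# Two-line record, as in the second test case above.
my $msg = "ALWAYSPRESENT:0008:0:pRESENT\nPRESENT\n";

my @caps;      # captured substrings $1..$4
my @starts;    # start offset of each capture group, taken from @-

if ( $msg =~ /(.*?):.*:(.*)\n*(^.*)\n*(^.*)/m ) {
    @caps   = ( $1, $2, $3, $4 );
    @starts = map { $-[$_] } 1 .. 4;
    for my $i ( 0 .. 3 ) {
        printf "\$%d starts at offset %d and holds '%s'\n",
            $i + 1, $starts[$i], $caps[$i];
    }
}
```

    Under /m, ^ does not match at the very end of the string, so the only
    line start left for the last group is the one the third group already
    used.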

    $ perl -le'
    my @x = ( <<ONE, <<TWO );
    ALWAYSPRESENT:0008:0:OPTIONAL
    OPTIONAL
    OPTIONAL
    ONE
    ALWAYSPRESENT:0008:0:pRESENT
    PRESENT
    TWO

    for ( @x ) {
    print "1=$1 2=$2 3=$3 4=$4" if /(.*?):.*:(.*)\n*(.*)\n*(.*)/;
    }
    '
    1=ALWAYSPRESENT 2=OPTIONAL 3=OPTIONAL 4=OPTIONAL
    1=ALWAYSPRESENT 2=PRESENT 3=PRESENT 4=




    John
    --
    Perl isn't a toolbox, but a small machine shop where you can special-order
    certain sorts of tools at low cost and in short order. -- Larry Wall
     
    John W. Krahn, Feb 26, 2007
    #3
  4. Mirco Wahab Guest

    wrote:
    > my $data;
    > $data = 'Some Text:0000:0:More Text'."\n".
    > 'Text text'."\n".
    > 'Text text text.'."\n";


    That's better. Real data ;-)

    My first shot:


    ....
    my $data='
    Some Text:0000:0:More Text
    Text text
    Text text text
    ';

    my $rg = qr/
    ^([^:]+) : \d+ : \d+ : ([^\n]+)?\n
    (?: ^([^:\n]+?) \n)?
    (?: ^([^:\n]+?) (?:\n|$) )?/mx;

    if( $data =~ /$rg/ ) {
    print join "\n", map defined $_?$_:'undef', ($1, $2, $3, $4);
    }
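
    As a quick check of the pattern above against shorter records, here is a
    sketch that runs it over three strings. The first two come from the test
    program earlier in the thread; the one-line record is my own assumption
    about what the data may look like. Missing optional parts should come
    back as undef rather than sliding into the wrong variable.

```perl
use strict;
use warnings;

# The pattern from the post above, unchanged.
my $rg = qr/
    ^([^:]+) : \d+ : \d+ : ([^\n]+)?\n
    (?: ^([^:\n]+?) \n)?
    (?: ^([^:\n]+?) (?:\n|$) )?/mx;

# Three, two and one line records (the last one is hypothetical).
my @records = (
    "Some Text:0000:0:More Text\nText text\nText text text.\n",
    "Some Text:0000:0:More Text\nText text\n",
    "Some Text:0000:0:More Text\n",
);

my @results;
for my $rec (@records) {
    if ( $rec =~ $rg ) {
        # Show unset groups as the literal word 'undef'.
        my @fields = map { defined $_ ? $_ : 'undef' } ( $1, $2, $3, $4 );
        push @results, join ',', @fields;
        print "$results[-1]\n";
    }
}
```

    For the two-line record this prints "Some Text,More Text,Text text,undef",
    so $3 keeps its data and $4 stays unset.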


    Regards

    M.
     
    Mirco Wahab, Feb 26, 2007
    #4
  5. Guest

    wrote in message-id:
    <wyGEh.14260$sv6.3728@trndny08>
    >

    [Snip]

    Thank you everybody for helping solve this little mystery.

    Your solutions and workarounds are quite clever!
    I was unaware of that 'm' option side-effect.
     
    , Feb 26, 2007
    #5
  6. Mirco Wahab Guest

    wrote:
    > wrote in message-id:
    > <wyGEh.14260$sv6.3728@trndny08>
    > [Snip]
    >
    > Thank you everybody for helping solve this little mystery.
    >
    > Your solutions and workarounds are quite clever!
    > I was unaware of that 'm' option side-effect.


    I was under the impression that your data
    would consist not of /one/ record
    but rather of a good sequence of them, so
    the regex would need to walk through
    (find) the records and spit out the
    correct matches:

    # Example: four record thing with "offending" structure ==>

    my $morestuff='
    ALWAYSPRESENT:0008:0:pRESENT
    PRESENT
    MAYBEPRESENT
    Some Text 1:0000:0:More Text 1
    Some Text 2:0000:0:More Text 2
    Text22 text22 text22
    Some Text 3:0000:0:More Text 3
    Text3 text3
    Text33 text33 text33
    ';
    # and so on ...

    # Now, the regex should identify them
    # and step along ==>

    my $rg = qr/ \s*
    ^([^:]+) : \d+ : \d+ : ([^\n]+)?\n
    (?: ^([^:\n]+?) \n)?
    (?: ^([^:\n]+?) (?:\n|$) )?/mx;

    # This was the "shortest" thing I could find so
    # far (within your constraints), the record-
    # walking would be within a while ==>

    while( $morestuff =~ /$rg/g ) {

    # test with defined() so that a field containing "0" is not shown as undef
    printf "%s %s\n\t%s\n\t%s\n",
    map { defined $_ ? $_ : 'undef' } ( $1, $2, $3, $4 );

    }

    # ... which would give the correct matches.


    Maybe I misunderstood your problem somehow,
    but I found the task quite nice and interesting
    (maybe somebody will write down a really simple
    regular expression for that; not me, it is sleeping
    time now in this country ;-).
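
    One simpler route, for the single-record case only, is to skip the big
    regex: split the record on newlines and match just the first line. This
    is a sketch of a different technique, not anything posted in the thread;
    parse_record is a hypothetical helper name, and it assumes the two
    discarded fields are always digits.

```perl
use strict;
use warnings;

# Hypothetical helper: break one record into its four fields.
# Optional fields that are absent come back as undef.
sub parse_record {
    my ($msg) = @_;

    # First line is the header; any further lines are the optional data.
    my ( $head, @rest ) = split /\n/, $msg;
    return unless defined $head;

    # Name, two numeric fields that are discarded, then optional text.
    my ( $name, $extra ) = $head =~ /^([^:]+):\d+:\d+:(.*)$/
        or return;
    $extra = undef if $extra eq '';

    return ( $name, $extra, $rest[0], $rest[1] );
}

my @fields = parse_record("ALWAYSPRESENT:0008:0:pRESENT\nPRESENT\n");
print join( ',', map { defined $_ ? $_ : 'undef' } @fields ), "\n";
```

    For the troublesome two-line record this prints
    "ALWAYSPRESENT,pRESENT,PRESENT,undef". It does not walk a sequence of
    records the way the while loop above does.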

    Regards

    Mirco
     
    Mirco Wahab, Feb 26, 2007
    #6
  7. Broke Guest

    Mirco Wahab wrote:

    Very good job, Mr. Wahab!
    I did not yet know the
    secret of the qr in your
    code and have just learned it.
    It's extremely useful.

    Many thanks !
    --
    B.

    > My first shot:
    >
    >
    > ...
    > my $data='
    > Some Text:0000:0:More Text
    > Text text
    > Text text text
    > ';
    >
    > my $rg = qr/
    > ^([^:]+) : \d+ : \d+ : ([^\n]+)?\n
    > (?: ^([^:\n]+?) \n)?
    > (?: ^([^:\n]+?) (?:\n|$) )?/mx;
    >
    > if( $data =~ /$rg/ ) {
    > print join "\n", map defined $_?$_:'undef', ($1, $2, $3, $4);
    > }
    >
    >
    > Regards
    >
    > M.
     
    Broke, Mar 7, 2007
    #7