Regexp that seems not to work since Perl 5.10

Discussion in 'Perl Misc' started by Sébastien Cottalorda, Nov 18, 2009.

  1. Hi all,
    I use a regexp to split the frames of a network protocol, like this:

    #-------------------------------------------------------------------
    #!/usr/bin/perl -w
    use strict;
    use constant ETX => chr( hex('03'));
    use constant ACK => chr( hex('06'));
    use constant NACK => chr( hex('15'));
    my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',
    my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so bad.'.ETX.'X';
    while ($line =~ s/([^$endcar]*$endcar)//){
        my $buf = $1;
        print $buf."\n";
    }
    print "$line\n";
    exit;
    #--------------------------------------------------------------------

    With Perl 5.8.x, I used to get:
    hello World
    How are you today ?
    Well, not so bad.

    Now I get:
    X
    X
    X
    hello WorldHow are you today ?Well, not so bad.

    Could someone help me solve this problem?

    Thanks in advance for any help.
    Cheers.
    Sebastien
    Sébastien Cottalorda, Nov 18, 2009
    #1

  2. Sorry, a correction:

    With Perl 5.8.x, I used to get:
    hello World{ETX}X
    How are you today ?{ETX}X
    Well, not so bad.{ETX}X
    Sébastien Cottalorda, Nov 18, 2009
    #2

  3. Sébastien Cottalorda

    C.DeRykus Guest

    On Nov 18, 3:05 am, Sébastien Cottalorda <>
    wrote:
    > Hi all,
    > I use a regexp to split a network frame protocol like this.
    >
    > #-------------------------------------------------------------------
    > #!/usr/bin/perl -w
    > use strict;
    > use constant ETX  => chr( hex('03'));
    > use constant ACK  => chr( hex('06'));
    > use constant NACK => chr( hex('15'));
    > my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',

    ^ Typo: trailing ',' instead of ';'.

    > my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so
    > bad.'.ETX.'X';
    > while ($line =~ s/([^$endcar]*$endcar)//){

    ^
    Did you know that alternation and quantifiers
    aren't special inside a character class? The
    | and {1} in $endcar aren't doing what you
    might think at first glance. See perlrequick
    or perlretut.
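A minimal sketch of the point above, using the control-character values from the original post: inside `[^...]` every character of `$endcar` is literal (including `|`, `.`, `{`, `1` and `}`), while outside the class the top-level `|` splits the capture group into three alternatives. On the first frame only ETX and the character after it get captured, which is consistent with the lone `X` lines the OP sees under 5.10 (ETX itself is invisible when printed).

```perl
use strict;
use warnings;

my ($ETX, $ACK, $NACK) = ("\x03", "\x06", "\x15");
my $endcar = $ACK . '|' . $NACK . '|' . $ETX . '.{1}';

# Inside a bracketed class every character of $endcar is literal:
# ACK, '|', NACK, ETX, '.', '{', '1' and '}' are all excluded.
my $class = qr/[^$endcar]/;
for my $c ('|', '.', '{', '1', '}') {
    print "'$c' is excluded from the class\n" unless $c =~ $class;
}

# Outside the class the top-level '|' splits the group into three
# alternatives: [^...]*ACK, NACK, or ETX followed by any one character.
my $frame = "hello World${ETX}X";
if ($frame =~ /([^$endcar]*$endcar)/) {
    (my $shown = $1) =~ s/\x03/{ETX}/g;
    print "captured: $shown\n";    # captured: {ETX}X
}
```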

    >         my $buf = $1;
    >         print $buf."\n";}
    >
    > print "$line\n";
    > exit;
    > ...


    --
    Charles DeRykus
    C.DeRykus, Nov 18, 2009
    #3
  4. On 18 nov, 15:41, "C.DeRykus" <> wrote:
    > On Nov 18, 3:05 am, Sébastien Cottalorda <>
    > wrote:
    > ...


    I've tried these modifications:

    my $endcar = ACK.'|'.NACK.'|'.ETX;
    my $line = 'hello World'.ETX.'How are you today ?'.ETX.'Well, not so bad.'.ETX;
    while ($line =~ s/([^($endcar)]*($endcar))//){

    It works pretty well, but I cannot manage to make it work with ACK,
    NACK and ETX.'.' (ETX followed by one more character).


    I even tried this:
    my $endcar = ACK.'|'.NACK.'|'.ETX.'.';
    my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so bad.'.ETX.'X';
    while ($line =~ s/([[:^cntrl:]]*($endcar))//){
    and it works perfectly, but it's a special case: it assumes that the
    delimiter characters are control characters.

    But this regexp doesn't work with:
    my $endcar = ACK.'|'.NACK.'|'.ETX.'.';
    my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you today ?'.ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;
    Unfortunately, I need that last sample to work.
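Running this last attempt shows where it breaks. A sketch, assuming STX is chr(hex('02')) as in the final post of the thread (this message uses STX without defining it): `[[:^cntrl:]]*` cannot step over STX, which is itself a control character, so the unanchored match skips each STX, the captured buffers lose their leading STX, and the STX bytes pile up in `$line`.

```perl
use strict;
use warnings;

# Assumption: STX = 0x02, as defined in the final post of the thread.
use constant STX  => chr( hex('02') );
use constant ETX  => chr( hex('03') );
use constant ACK  => chr( hex('06') );
use constant NACK => chr( hex('15') );

my $endcar = ACK.'|'.NACK.'|'.ETX.'.';
my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you today ?'
         .ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;

my @bufs;
# The unanchored match skips over each STX instead of consuming it,
# so every STX stays behind in $line.
while ($line =~ s/([[:^cntrl:]]*($endcar))//) {
    push @bufs, $1;
}
printf "%d buffers, left over (hex): %s\n", scalar @bufs, unpack('H*', $line);
# prints: 6 buffers, left over (hex): 020202
```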

    Does anyone have a clue?
    Thanks in advance.
    Sebastien
    Sébastien Cottalorda, Nov 18, 2009
    #4
  5. Sébastien Cottalorda

    Dr.Ruud Guest

    Sébastien Cottalorda wrote:

    > I use a regexp to split a network frame protocol like this.
    >
    > #-------------------------------------------------------------------
    > #!/usr/bin/perl -w
    > use strict;
    > use constant ETX => chr( hex('03'));


    Alternative:

    use constant ETX => "\x{03}";


    > use constant ACK => chr( hex('06'));
    > use constant NACK => chr( hex('15'));
    > my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',


    Why does that line end in a comma?

    my ($ETX, $ACK, $NACK) = ("\x{03}", "\x{06}", "\x{15}");

    my $endcar = "(?:$ETX|$ACK|$NACK)"; # alternation


    Alternative:

    my $endcar= "[$ETX$ACK$NACK]"; # charset
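A quick check of the two spellings above, assuming the same three control characters: for single-character terminators the alternation and the character class accept exactly the same inputs, and `"\x{03}"` is the same one-character string as `chr(hex('03'))`.

```perl
use strict;
use warnings;

my ($ETX, $ACK, $NACK) = ("\x{03}", "\x{06}", "\x{15}");
my $alt   = "(?:$ETX|$ACK|$NACK)";   # alternation
my $class = "[$ETX$ACK$NACK]";       # character class

# chr(hex('03')) and "\x{03}" produce the same one-character string.
print "same ETX\n" if chr(hex('03')) eq $ETX;

# For single-character terminators both forms accept the same inputs.
for my $c ($ETX, $ACK, $NACK, "\x02", 'X') {
    my $in_alt   = $c =~ /^$alt\z/   ? 1 : 0;
    my $in_class = $c =~ /^$class\z/ ? 1 : 0;
    printf "0x%02X: alternation=%d class=%d\n", ord $c, $in_alt, $in_class;
}
```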


    > while ($line =~ s/([^$endcar]*$endcar)//){


    You are messing up character class and alternation there.


    With your $endcar, this would work:

    while ($line =~ s/(.*?(?:$endcar))//s){
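Ruud's non-greedy loop can be run on the original sample. A minimal sketch using his `(?:...)` alternation for `$endcar`, which has no trailing `.`: because of that, the `X` after each ETX becomes the first character of the next buffer, and the final `X` is left over.

```perl
use strict;
use warnings;

my ($ETX, $ACK, $NACK) = ("\x{03}", "\x{06}", "\x{15}");
my $endcar = "(?:$ETX|$ACK|$NACK)";
my $line = "hello World${ETX}XHow are you today ?${ETX}XWell, not so bad.${ETX}X";

my @bufs;
# .*? stops at the first terminator; /s lets '.' also match newlines.
while ($line =~ s/(.*?(?:$endcar))//s) {
    push @bufs, $1;
}
for my $buf (@bufs) {
    (my $shown = $buf) =~ s/\x03/{ETX}/g;
    print "$shown\n";
}
print "left over: $line\n";
```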



    --
    Ruud
    Dr.Ruud, Nov 18, 2009
    #5
  6. Sébastien Cottalorda

    Dr.Ruud Guest

    Ben Morrow wrote:

    > I suspect what the OP wants here is
    >
    > my $endcar = "\x3\x6\x15";
    >
    > while ($line =~ s/([^$endcar]*[$endcar].//) {


    That is more or less (count the half captures :) what I assumed,
    and I also assumed that he would find out the rest himself.

    --
    Ruud
    Dr.Ruud, Nov 18, 2009
    #6
  7. Sébastien Cottalorda

    Guest

    On Wed, 18 Nov 2009 18:30:18 +0000, Ben Morrow <> wrote:

    >
    >Quoth "Dr.Ruud" <>:
    >> Sébastien Cottalorda wrote:
    >>
    >> > I use a regexp to split a network frame protocol like this.
    >> >
    >> > #-------------------------------------------------------------------
    >> > #!/usr/bin/perl -w
    >> > use strict;
    >> > use constant ETX => chr( hex('03'));

    >>
    >> Alternative:
    >>
    >> use constant ETX => "\x{03}";
    >>
    >>
    >> > use constant ACK => chr( hex('06'));
    >> > use constant NACK => chr( hex('15'));
    >> > my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',

    >>
    >> Why does that line end in a comma?
    >>
    >> my ($ETX, $ACK, $NACK) = ("\x{03}", "\x{06}", "\x{15}");
    >>
    >> my $endcar = "(?:$ETX|$ACK|$NACK)"; # alternation

    >
    >You've omitted the trailing '.{1}' (which is equivalent to just '.').
    >
    > my $endcar = "(?:$ETX|$ACK|$NACK).";

    ^
    It seems reasonable that the OP meant the alternation to be of single
    chars, given his: my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',
    Otherwise, without a group, it is concatenated like:
    ACK|NACK|ETX.{1} or
    (?:$ACK|$NACK|$ETX.)
    where one alternative is ETX plus any character,
    which is probably a mistake.
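The precedence difference discussed here can be checked directly. A minimal sketch with the same control characters: in the grouped form the trailing `.` is required after every terminator, while in `(?:$ACK|$NACK|$ETX.)` only the ETX branch takes one. (The OP's final solution at the end of the thread uses the ETX-only shape.)

```perl
use strict;
use warnings;

my ($ETX, $ACK, $NACK) = ("\x03", "\x06", "\x15");

# '.' applies to the whole group: every terminator needs a trailing char.
my $grouped   = qr/^(?:$ETX|$ACK|$NACK).\z/s;
# '.' applies only to the last branch: just ETX takes a trailing char.
my $last_only = qr/^(?:$ACK|$NACK|$ETX.)\z/s;

for my $s ($ACK, "${ACK}X", "${ETX}X") {
    printf "%-6s grouped=%d last_only=%d\n", unpack('H*', $s),
        ($s =~ $grouped ? 1 : 0), ($s =~ $last_only ? 1 : 0);
}
```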

    >
    >> Alternative:
    >>
    >> my $endcar= "[$ETX$ACK$NACK]"; # charset

    >
    >As above.
    >
    >> > while ($line =~ s/([^$endcar]*$endcar)//){

    >>
    >> You are messing up character class and alternation there.
    >>
    >>
    >> With your $endcar, this would work:
    >>
    >> while ($line =~ s/(.*?(?:$endcar))//s){

    >
    >That depends. /.*?/ is not always equivalent to a negated end condition,
    >for instance /.*?>x/ will match all of ">>x" whereas /[^>]*>x/ will only
    >match the last two characters. I suspect what the OP wants here is


    But in this case it makes no sense to add characters after the end char,
    since you want everything from the beginning up to that character, not a
    match starting in the middle of the string. It's a whole sub-expression,
    '.*?>', that is part of an alternation.

    In that case, given ">>x":
    /^.*?>x/
    works, whereas
    /^[^>]*>x/
    doesn't.
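Both claims in this exchange can be checked on the string `>>x`. A minimal sketch: unanchored, `.*?>x` matches all three characters while `[^>]*>x` matches only the last two; anchored with `^`, the non-greedy form still matches and the negated class fails entirely.

```perl
use strict;
use warnings;

my $s = '>>x';
for my $re (qr/.*?>x/, qr/[^>]*>x/, qr/^.*?>x/, qr/^[^>]*>x/) {
    if ($s =~ $re) {
        print "$re matched '$&'\n";
    } else {
        print "$re did not match\n";
    }
}
```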

    >
    > my $endcar = "\x3\x6\x15";
    >
    > while ($line =~ s/([^$endcar]*[$endcar].//) {

    while ($line =~ s/([^$endcar]*[$endcar].)//) {
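The corrected form of Ben Morrow's pattern, run on the original sample. A sketch that also adds the /s modifier Ben suggests, so the trailing byte may be a newline. Each buffer keeps its terminator and the following character, which matches the output the OP reported from 5.8.x.

```perl
use strict;
use warnings;

my $endcar = "\x3\x6\x15";    # ETX, ACK, NACK as one set of characters
my $line = "hello World\x03XHow are you today ?\x03XWell, not so bad.\x03X";

my @frames;
# Non-terminators, one terminator, then the single trailing character.
while ($line =~ s/([^$endcar]*[$endcar].)//s) {
    push @frames, $1;
}
for my $f (@frames) {
    (my $shown = $f) =~ s/\x03/{ETX}/g;
    print "$shown\n";
}
```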

    >
    >possibly with a /s modifier, since this is a binary protocol so random
    >newlines seem likely.


    Not if you take out the '.'

    -sln
    , Nov 18, 2009
    #7
  8. Sébastien Cottalorda

    C.DeRykus Guest

    On Nov 18, 8:37 am, Sébastien Cottalorda <>
    wrote:
    > On 18 nov, 15:41, "C.DeRykus" <> wrote:
    > ...
    >
    > I even tried this:
    > my $endcar = ACK.'|'.NACK.'|'.ETX.'.';
    > my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so
    > bad.'.ETX.'X';
    > while ($line =~ s/([[:^cntrl:]]*($endcar))//){
    > and it works perfectly but it's a particular case : I suppose that
    > split caracters are controls.
    >
    > but this regexp didn't work with :
    > my $endcar = ACK.'|'.NACK.'|'.ETX.'.';
    > my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you
    > today ?'.ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;
    > Unfortunately I need to make that last sample to work.


    Here's a closer cut, I think, since you were negating
    the character class:

    my $endcar = STX . '|' . ACK . '|' . NACK . '|' . ETX ;
    while ($line =~ s/([[:cntrl:]]*($endcar))//){
    ...
    }
    print $line;
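A runnable version of Case 2, assuming STX is chr(hex('02')) as in the final post: each substitution removes a run of control characters ending in one of the four delimiters, so the loop strips the framing bytes and joins the printable text rather than splitting it into frames.

```perl
use strict;
use warnings;

# Assumption: STX = 0x02, as defined in the final post of the thread.
use constant STX  => chr( hex('02') );
use constant ETX  => chr( hex('03') );
use constant ACK  => chr( hex('06') );
use constant NACK => chr( hex('15') );

my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you today ?'
         .ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;
my $endcar = STX . '|' . ACK . '|' . NACK . '|' . ETX;

# Each pass deletes a run of control characters ending in a delimiter.
while ($line =~ s/([[:cntrl:]]*($endcar))//) {
    # nothing to do with $1 here; only the stripped $line matters
}
print "$line\n";   # hello WorldXHow are you today ?XWell, not so bad.
```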


    Case 1:
    my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so bad.'.ETX.'X';
    output: hello WorldXHow are you today ?XWell, not so bad.X

    Case 2:
    my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you today ?'.ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;

    output: hello WorldXHow are you today ?XWell, not so bad.

    --
    Charles DeRykus
    C.DeRykus, Nov 19, 2009
    #8
  9. Found a solution with the help of Olivier Makinen.

    use constant STX => chr( hex('02'));
    use constant ETX => chr( hex('03'));
    use constant ACK => chr( hex('06'));
    use constant NACK => chr( hex('15'));
    my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you today ?'.ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;

    my $noendcar = '[^' . ACK . ETX . NACK . ']';
    my $endstring = '(' . ACK . '|' . ETX . '.|' . NACK . ')';
    while ($line =~ s/$noendcar*$endstring//) {
    print "buf=$&\n";
    }
    print "lastbuffer = $line\n";

    I get:
    buf={STX}hello World{ETX}X
    buf={ACK}
    buf={NACK}
    buf={STX}How are you today ?{ETX}X
    buf={ACK}
    buf={STX}Well, not so bad.{ETX}X
    buf={NACK}
    lastbuffer = .... (empty)

    It works perfectly.
    Thanks all for your help.
    Sebastien
    Sébastien Cottalorda, Nov 19, 2009
    #9
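The final solution above can be packaged as a reusable routine. A sketch, not the OP's code: `split_frames` and `show` are hypothetical helper names, and the whole match is captured with parentheses instead of being read from `$&`. The pattern itself is the one from the post.

```perl
use strict;
use warnings;

use constant STX  => chr( hex('02') );
use constant ETX  => chr( hex('03') );
use constant ACK  => chr( hex('06') );
use constant NACK => chr( hex('15') );

# Hypothetical helper: splits a buffer with the pattern from the post
# and returns the frames plus any unterminated remainder.
sub split_frames {
    my ($line) = @_;
    my $noendcar  = '[^' . ACK . ETX . NACK . ']';
    my $endstring = '(' . ACK . '|' . ETX . '.|' . NACK . ')';
    my @frames;
    while ($line =~ s/($noendcar*$endstring)//) {
        push @frames, $1;
    }
    return (\@frames, $line);
}

# Hypothetical helper: shows control characters as {02}, {03}, ...
sub show {
    my ($s) = @_;
    $s =~ s/([\x00-\x1f])/sprintf('{%02X}', ord $1)/ge;
    return $s;
}

my ($frames, $rest) = split_frames(
    STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you today ?'
    .ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK
);
print 'buf=', show($_), "\n" for @$frames;
print 'lastbuffer = ', show($rest), "\n";
```

With `$line` exactly as written in the post, the last ETX is followed directly by NACK, so the `.` after ETX consumes the NACK and the last frame comes out as `{02}Well, not so bad.{03}{15}`.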
Similar Threads
  1. Expiration date seems not to work
     Frederic Gignac, Jul 4, 2003, in forum: ASP .Net
     Replies: 2, Views: 416 (Frederic Gignac, Jul 8, 2003)
  2. Wootaek Choi
     Replies: 1, Views: 295 (Marshal Antony, Feb 10, 2004)
  3. synchronize seems not work
     lonelyplanet999, Nov 13, 2003, in forum: Java
     Replies: 6, Views: 453 (lonelyplanet999, Nov 16, 2003)
  4. Casey Hawthorne
     Replies: 4, Views: 355 (Casey Hawthorne, Oct 20, 2005)
  5. Joao Silva
     Replies: 16, Views: 355 (7stud --, Aug 21, 2009)