Checking how many items have been captured in a pattern match

Discussion in 'Perl Misc' started by niall.macpherson@ntlworld.com, Feb 8, 2006.

  1. Guest

    I know this should be a fairly simple question but I have been
    searching for a while and can't find an obvious answer.

    When testing a pattern, I have tended to use the method shown in METHOD 1
    in the code below, i.e. use the temporary variables $1, $2, $3 etc.

    METHOD 2 seems better from my point of view as it avoids the
    code being littered with lots of $1, $2, $3 ... variables. However,
    the multiple defined() calls make it look a bit unwieldy.

    Am I missing an obviously more elegant way of checking that all three
    values have been captured? I do not want to put the results in an
    array as I need the variable names to be meaningful to other people
    (although obviously a hash may be possible).

    I am only interested in a full match, i.e. all 3 (or however many)
    values captured.

    use strict;
    use warnings;

    my $teststr = '123 456 abcd';
    my ($var1, $var2, $var3);

    ### METHOD 1
    if ($teststr =~ m/(\d+)\s+(\d+)\s+(\w*)/)
    {
        ($var1, $var2, $var3) = ($1, $2, $3);
        print STDERR "\nMETHOD 1", ' var1 = ', $var1, ' var2 = ', $var2,
            ' var3 = ', $var3, "\n";
    }
    else
    {
        print STDERR "\nMETHOD 1 No Match\n";
    }

    ### METHOD 2
    ($var1, $var2, $var3) = $teststr =~ m/(\d+)\s+(\d+)\s+(\w*)/;
    if (defined($var1) && defined($var2) && defined($var3))
    {
        print STDERR "\nMETHOD 2", ' var1 = ', $var1, ' var2 = ', $var2,
            ' var3 = ', $var3, "\n";
    }
    else
    {
        print STDERR "\nMETHOD 2 No Match\n";
    }
     
    , Feb 8, 2006
    #1

  2. wrote:
    > I know this should be a fairly simple question but I have been
    > searching for a while and can't find an obvious answer.
    >
    > When testing a pattern , I tended to use the method shown in METHOD 1
    > in the code below., i.e use the temporary variables $1, $2, $3 etc.
    >
    > METHOD 2 seems to be better from my point of view as it avoids the
    > code being littered with lots of $1, $2, $3 ... variables . However
    > the multiple defined() calls make it look a bit unwieldy.
    >
    > Am I missing an obviously more elegant way of checking that all three
    > values have been captured ? I do not want to put the results in an
    > array as I need the variable names to be meaningful to other people
    > (although obviusly a hash may be possible).
    >
    > I am only interested in a full match , i.e all 3 (or however many)
    > values captured .
    >
    > use strict;
    > use warnings;
    >
    > my $teststr = '123 456 abcd';
    > my ($var1, $var2, $var3);
    >
    > ### METHOD 1
    > if($teststr =~ m/(\d+)\s+(\d+)\s+(\w*)/)
    > {
    > ($var1, $var2, $var3) = ($1, $2, $3);
    > print STDERR "\nMETHOD 1" , ' var1 = ' , $var1 , ' var2 = ' , $var2 ,
    > ' var3 = ' , $var3, "\n";
    > }
    > else
    > {
    > print STDERR "\nMETHOD 1 No Match\n"
    > }
    >
    > ### METHOD 2
    > ($var1, $var2, $var3) = $teststr =~ m/(\d+)\s+(\d+)\s+(\w*)/;
    > if(defined($var1) && defined($var2) && defined($var3))
    > {
    > print STDERR "\nMETHOD 2" , ' var1 = ' , $var1 , ' var2 = ' , $var2 ,
    > ' var3 = ' , $var3, "\n";
    > }
    > else
    > {
    > print STDERR "\nMETHOD 2 No Match\n"
    > }


    You could do it like this:

    if ( 3 == ( ( $var1, $var2, $var3 ) = $teststr =~ /(\d+)\s+(\d+)\s+(\w*)/ ) )
    {
    print STDERR "\nMETHOD 3 var1 = $var1 var2 = $var2 var3 = $var3\n";
    }



    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Feb 8, 2006
    #2

  3. Anno Siegel Guest

    <> wrote in comp.lang.perl.misc:
    > I know this should be a fairly simple question but I have been
    > searching for a while and can't find an obvious answer.
    >
    > When testing a pattern , I tended to use the method shown in METHOD 1
    > in the code below., i.e use the temporary variables $1, $2, $3 etc.
    >
    > METHOD 2 seems to be better from my point of view as it avoids the
    > code being littered with lots of $1, $2, $3 ... variables .


    Right. Unless you need the behavior of matches in scalar context
    (with /g, for instance), catching captures in list context is much
    preferable.
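
    A minimal sketch of the difference, for illustration only (the sample
    string and the sub names first_pair and all_pairs are made up, not from
    the original post). In list context the match hands back its captures
    directly; in scalar context with /g each call resumes after the previous
    match and the captures have to be read from $1 and $2:

```perl
use strict;
use warnings;

# List context: the match returns its captures, which go straight
# into named variables.
sub first_pair {
    my ($str) = @_;
    my ($key, $value) = $str =~ /(\w+)=(\w+)/;
    return ($key, $value);
}

# Scalar context with /g: each iteration resumes where the previous
# match ended, and the captures are read from $1 and $2.
sub all_pairs {
    my ($str) = @_;
    my @pairs;
    while ( $str =~ /(\w+)=(\w+)/g ) {
        push @pairs, "$1=$2";
    }
    return @pairs;
}

my $sample = 'a=1 b=2 c=3';
my ($k, $v) = first_pair($sample);
print "first pair: $k=$v\n";
print "all pairs: @{[ all_pairs($sample) ]}\n";
```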

    > However
    > the multiple defined() calls make it look a bit unwieldy.
    >
    > Am I missing an obviously more elegant way of checking that all three
    > values have been captured ? I do not want to put the results in an
    > array as I need the variable names to be meaningful to other people
    > (although obviusly a hash may be possible).
    >
    > I am only interested in a full match , i.e all 3 (or however many)
    > values captured .
    >
    > use strict;
    > use warnings;
    >
    > my $teststr = '123 456 abcd';
    > my ($var1, $var2, $var3);
    >
    > ### METHOD 1
    > if($teststr =~ m/(\d+)\s+(\d+)\s+(\w*)/)
    > {
    > ($var1, $var2, $var3) = ($1, $2, $3);
    > print STDERR "\nMETHOD 1" , ' var1 = ' , $var1 , ' var2 = ' , $var2 ,
    > ' var3 = ' , $var3, "\n";
    > }
    > else
    > {
    > print STDERR "\nMETHOD 1 No Match\n"
    > }
    >
    > ### METHOD 2
    > ($var1, $var2, $var3) = $teststr =~ m/(\d+)\s+(\d+)\s+(\w*)/;
    > if(defined($var1) && defined($var2) && defined($var3))
    > {
    > print STDERR "\nMETHOD 2" , ' var1 = ' , $var1 , ' var2 = ' , $var2 ,
    > ' var3 = ' , $var3, "\n";
    > }
    > else
    > {
    > print STDERR "\nMETHOD 2 No Match\n"
    > }


    You have the same problem with both methods. If the match in list context
    returns an undefined value, the corresponding $n variable would also
    be undefined. If an undefined match is possible, you'll have to test
    either way. A little more compact (untested):

    unless ( grep !defined, $var1, $var2, $var3 ) {
        # they're all defined
    }

    On the other hand, most captures can't be undefined after a successful
    match. The only case that comes to mind is when a pair of parens is part
    of an alternative that wasn't used in the match. There may be others.

    In your case, when the regex matches, all captures will be defined, though
    some (well, one, $var3) may be empty.

    So, as a rule, look at your regex and identify the captures that
    can possibly come back undefined. Then test only those. In the
    concrete case, you have nothing to test.
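
    To illustrate the alternation case, here is a small sketch (the pattern
    and the sub name undefined_captures are hypothetical, chosen only for the
    demonstration). Only one branch of the alternation can take part in a
    successful match, so the capture in the other branch comes back undefined:

```perl
use strict;
use warnings;

# Returns the names of the captures that came back undefined.
# Exactly one branch of the alternation matches, so the capture
# belonging to the other branch is undef.
sub undefined_captures {
    my ($str) = @_;
    my ($digits, $word) = $str =~ /^(?:(\d+)|([a-z]+))$/
        or return;    # no match at all: return the empty list
    my @missing;
    push @missing, 'digits' unless defined $digits;
    push @missing, 'word'   unless defined $word;
    return @missing;
}

for my $str ('123', 'abc') {
    my @missing = undefined_captures($str);
    print "'$str': undefined capture(s): @missing\n";
}
```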

    Anno
    --
    $_='Just another Perl hacker'; print +( join( '', map { eval $_; $@ }
    'use warnings FATAL => "all"; printf "%-1s", "\n"', 'use strict; a',
    'use warnings FATAL => "all"; "@x"', '1->m') =~
    m|${ s/(.)/($1).*/g; \ $_ }|is),',';
     
    Anno Siegel, Feb 8, 2006
    #3
  4. Anno Siegel Guest

    John W. Krahn <> wrote in comp.lang.perl.misc:
    > wrote:
    > > I know this should be a fairly simple question but I have been
    > > searching for a while and can't find an obvious answer.
    > >
    > > When testing a pattern , I tended to use the method shown in METHOD 1
    > > in the code below., i.e use the temporary variables $1, $2, $3 etc.
    > >
    > > METHOD 2 seems to be better from my point of view as it avoids the
    > > code being littered with lots of $1, $2, $3 ... variables . However
    > > the multiple defined() calls make it look a bit unwieldy.
    > >
    > > Am I missing an obviously more elegant way of checking that all three
    > > values have been captured ? I do not want to put the results in an
    > > array as I need the variable names to be meaningful to other people
    > > (although obviusly a hash may be possible).
    > >
    > > I am only interested in a full match , i.e all 3 (or however many)
    > > values captured .
    > >
    > > use strict;
    > > use warnings;
    > >
    > > my $teststr = '123 456 abcd';
    > > my ($var1, $var2, $var3);


    [...]

    > > ### METHOD 2
    > > ($var1, $var2, $var3) = $teststr =~ m/(\d+)\s+(\d+)\s+(\w*)/;
    > > if(defined($var1) && defined($var2) && defined($var3))
    > > {
    > > print STDERR "\nMETHOD 2" , ' var1 = ' , $var1 , ' var2 = ' , $var2 ,
    > > ' var3 = ' , $var3, "\n";
    > > }
    > > else
    > > {
    > > print STDERR "\nMETHOD 2 No Match\n"
    > > }

    >
    > You could do it like this:
    >
    > if ( 3 == ( ( $var1, $var2, $var3 ) = $teststr =~ /(\d+)\s+(\d+)\s+(\w*)/ ) )
    > {
    > print STDERR "\nMETHOD 3 var1 = $var1 var2 = $var2 var3 = $var3\n";
    > }


    No, John, that's wrong. A successful match with three captures will
    always return three values, defined or undefined, so the test only tells
    you whether the regex matched. It will not indicate any undefined
    values. Try "123 456" against /(\d+)\s+(\d+)(?:\s+(\w*))?/.

    unless ( grep !defined, ( $var1, ...

    should work.
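
    A quick sketch of the point, using the pattern suggested above (the sub
    name capture_counts is made up for the example). It compares the number
    of values the match returns with the number of those that are defined:

```perl
use strict;
use warnings;

# Returns (number of values returned by the match,
#          number of those values that are defined).
sub capture_counts {
    my ($str) = @_;
    my @captures = $str =~ /(\d+)\s+(\d+)(?:\s+(\w*))?/;
    my $returned = @captures;
    my $defined  = grep { defined } @captures;
    return ($returned, $defined);
}

for my $str ('123 456 abcd', '123 456', 'no digits') {
    my ($returned, $defined) = capture_counts($str);
    print "'$str': returned $returned, defined $defined\n";
}
```

    For "123 456" the match still returns three values, but only two of
    them are defined, which is why comparing the count with 3 cannot catch
    the undefined capture.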

    Anno
    --
    $_='Just another Perl hacker'; print +( join( '', map { eval $_; $@ }
    'use warnings FATAL => "all"; printf "%-1s", "\n"', 'use strict; a',
    'use warnings FATAL => "all"; "@x"', '1->m') =~
    m|${ s/(.)/($1).*/g; \ $_ }|is),',';
     
    Anno Siegel, Feb 8, 2006
    #4
  5. -berlin.de (Anno Siegel) wrote in
    news:dsckd4$lsu$-Berlin.DE:

    > John W. Krahn <> wrote in comp.lang.perl.misc:
    >> wrote:
    >> > I know this should be a fairly simple question but I have been
    >> > searching for a while and can't find an obvious answer.
    >> >
    >> > When testing a pattern , I tended to use the method shown in METHOD
    >> > 1 in the code below., i.e use the temporary variables $1, $2, $3
    >> > etc.
    >> >
    >> > METHOD 2 seems to be better from my point of view as it avoids the
    >> > code being littered with lots of $1, $2, $3 ... variables .
    >> > However the multiple defined() calls make it look a bit unwieldy.
    >> >
    >> > Am I missing an obviously more elegant way of checking that all
    >> > three values have been captured ? I do not want to put the results
    >> > in an array as I need the variable names to be meaningful to other
    >> > people (although obviusly a hash may be possible).
    >> >
    >> > I am only interested in a full match , i.e all 3 (or however many)
    >> > values captured .
    >> >
    >> > use strict;
    >> > use warnings;
    >> >
    >> > my $teststr = '123 456 abcd';
    >> > my ($var1, $var2, $var3);

    >
    > [...]
    >
    >> > ### METHOD 2
    >> > ($var1, $var2, $var3) = $teststr =~ m/(\d+)\s+(\d+)\s+(\w*)/;
    >> > if(defined($var1) && defined($var2) && defined($var3))
    >> > {
    >> > print STDERR "\nMETHOD 2" , ' var1 = ' , $var1 , ' var2 = ' ,
    >> > $var2 ,
    >> > ' var3 = ' , $var3, "\n";
    >> > }
    >> > else
    >> > {
    >> > print STDERR "\nMETHOD 2 No Match\n"
    >> > }

    >>
    >> You could do it like this:
    >>
    >> if ( 3 == ( ( $var1, $var2, $var3 ) = $teststr =~
    >> /(\d+)\s+(\d+)\s+(\w*)/ ) ) {
    >> print STDERR "\nMETHOD 3 var1 = $var1 var2 = $var2 var3 =
    >> $var3\n";
    >> }

    >
    > No, John, that's wrong. The match with three captures will always
    > return three values, defined or undefined. The test will not indicate
    > any undefined values. Try "123 456" against
    > /(\d+)\s+(\d+)(?:\s+(\w*))?/.
    >
    > unless ( grep !defined, ( $var1, ...
    >
    > should work.


    Mere mortals such as myself might be more comfortable without the
    double-negation:

    if ( 3 == grep { defined } ($var1, ...

    ;-) (desperately looking for something useful to say).


    Also, won't (\w*) set $3 to the empty string rather than an undefined
    value? So, maybe something like this:

    #!/usr/bin/perl

    use strict;
    use warnings;

    while (my $s = <DATA>) {
        my @matched = ( $s =~ m{ \A (\d{3}) \s+ (\d{3}) \s* (\w*) }x );
        if (3 == grep { defined and $_ ne q{} } @matched) {
            my ($var1, $var2, $var3) = @matched;
            ...
        }
    }
    __END__
    111 222
    333 444 this
    111 222

    But, then again, if the match succeeded, only $var3 may be empty. So,
    why test $var1 and $var2 (in the original code)?

    #!/usr/bin/perl

    use strict;
    use warnings;

    while (my $s = <DATA>) {
        if ( $s =~ m{ \A (\d{3}) \s+ (\d{3}) \s* (\w*) }x
             and $3 ne q{} ) {
            my ($var1, $var2, $var3) = ($1, $2, $3);
            # do something
        }
    }

    __END__
    111 222
    333 444 this
    111 222





    Sinan

    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Feb 8, 2006
    #5
  6. Anno Siegel Guest

    A. Sinan Unur <> wrote in comp.lang.perl.misc:
    > -berlin.de (Anno Siegel) wrote in
    > news:dsckd4$lsu$-Berlin.DE:
    >
    > > John W. Krahn <> wrote in comp.lang.perl.misc:
    > >> wrote:


    [snip]

    > >> You could do it like this:
    > >>
    > >> if ( 3 == ( ( $var1, $var2, $var3 ) = $teststr =~
    > >> /(\d+)\s+(\d+)\s+(\w*)/ ) ) {
    > >> print STDERR "\nMETHOD 3 var1 = $var1 var2 = $var2 var3 =
    > >> $var3\n";
    > >> }

    > >
    > > No, John, that's wrong. The match with three captures will always
    > > return three values, defined or undefined. The test will not indicate
    > > any undefined values. Try "123 456" against
    > > /(\d+)\s+(\d+)(?:\s+(\w*))?/.
    > >
    > > unless ( grep !defined, ( $var1, ...
    > >
    > > should work.

    >
    > Mere mortals such as myself might be more comfortable without the
    > double-negation:
    >
    > if ( 3 == grep { defined } ($var1, ...
    >
    > ;-) (desperately looking for something useful to say).
    >
    >
    > Also, won't (\w*) set $3 to the empty string rather than an undefined
    > value? So, may be something like this:


    Yes, the original example (/(\d+)\s+(\d+)\s+(\w*)/) would never
    return an undefined capture if it matched at all. Captures that
    can be undefined are not all that common, they happen when a pair
    of parentheses is in an optional part of the regex.

    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > while (my $s = <DATA>) {
    > my @matched = ( $s =~ m{ \A (\d{3}) \s+ (\d{3}) \s* (\w*) }x );
    > if (3 == grep { defined and $_ ne q{} } @matched) {
    > my ($var1, $var2, $var3) = @matched;
    > ...
    > }
    > }
    > __END__
    > 111 222
    > 333 444 this
    > 111 222


    The OPs question was about defined-ness, not about empty matches,
    and I think that was deliberate. Empty matches are much more
    common.

    > But, then again, if the match succeeded, only $var3 may be empty. So,
    > why test $var1 and $var2 (in the original code)?
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > while (my $s = <DATA>) {
    > if ( $s =~ m{ \A (\d{3}) \s+ (\d{3}) \s* (\w*) }x
    > and $3 ne q{} ) {
    > my ($var1, $var2, $var3) = ($1, $2, $3);
    > # do something
    > }
    > }
    >
    > __END__
    > 111 222
    > 333 444 this
    > 111 222


    Right. It usually pays to look at the individual captures and only
    test those that *can* return the Wrong Thing, whatever that is in the
    particular case.
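
    As a sketch of that rule (the pattern, the sub name parse_line and the
    hash keys are all hypothetical, picked for illustration): two mandatory
    numbers can never be undefined or empty after a successful match, so
    only the optional label is tested. Returning a hash reference also keeps
    the names meaningful, as the OP wanted:

```perl
use strict;
use warnings;

# Hypothetical parser: two mandatory numbers and an optional label.
# Only the label can be undefined (optional group) or empty (\w*),
# so it is the only capture that gets tested.
sub parse_line {
    my ($line) = @_;
    my ($first, $second, $label) =
        $line =~ /\A(\d+)\s+(\d+)(?:\s+(\w*))?/
        or return;    # regex did not match
    return unless defined $label && length $label;
    return { first => $first, second => $second, label => $label };
}

for my $line ('111 222', '333 444 this') {
    my $rec = parse_line($line);
    print "'$line': ", ( $rec ? "label = $rec->{label}" : 'rejected' ), "\n";
}
```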

    Anno
    --
    $_='Just another Perl hacker'; print +( join( '', map { eval $_; $@ }
    'use warnings FATAL => "all"; printf "%-1s", "\n"', 'use strict; a',
    'use warnings FATAL => "all"; "@x"', '1->m') =~
    m|${ s/(.)/($1).*/g; \ $_ }|is),',';
     
    Anno Siegel, Feb 8, 2006
    #6
  7. -berlin.de (Anno Siegel) wrote in news:dsco0f$nra
    $-Berlin.DE:

    > A. Sinan Unur <> wrote in comp.lang.perl.misc:
    >> -berlin.de (Anno Siegel) wrote in
    >> news:dsckd4$lsu$-Berlin.DE:
    >>
    >> > John W. Krahn <> wrote in comp.lang.perl.misc:
    >> >> wrote:

    >
    > [snip]
    >
    >> >> You could do it like this:
    >> >>
    >> >> if ( 3 == ( ( $var1, $var2, $var3 ) = $teststr =~
    >> >> /(\d+)\s+(\d+)\s+(\w*)/ ) ) {
    >> >> print STDERR "\nMETHOD 3 var1 = $var1 var2 = $var2 var3 =
    >> >> $var3\n";
    >> >> }
    >> >
    >> > No, John, that's wrong. The match with three captures will always
    >> > return three values, defined or undefined. The test will not
    >> > indicate
    >> > any undefined values. Try "123 456" against
    >> > /(\d+)\s+(\d+)(?:\s+(\w*))?/.
    >> >
    >> > unless ( grep !defined, ( $var1, ...
    >> >
    >> > should work.

    >>
    >> Mere mortals such as myself might be more comfortable without the
    >> double-negation:
    >>
    >> if ( 3 == grep { defined } ($var1, ...
    >>
    >> ;-) (desperately looking for something useful to say).
    >>
    >>
    >> Also, won't (\w*) set $3 to the empty string rather than an undefined
    >> value? So, may be something like this:

    >
    > Yes, the original example (/(\d+)\s+(\d+)\s+(\w*)/) would never
    > return an undefined capture if it matched at all. Captures that
    > can be undefined are not all that common, they happen when a pair
    > of parentheses is in an optional part of the regex.


    Of course ... It is just too early here I guess.

    Thanks.

    Sinan
    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Feb 8, 2006
    #7
  8. Guest

    Anno Siegel wrote:

    >
    > In your case, when the regex matches, all captures will be defined, though
    > some (well, one, $var3) may be empty.
    >
    > So, as a rule, look at your regex and identify the captures that
    > can possibly come back undefined. Then test only those. In the
    > concrete case, you have nothing to test.
    >
    > Anno


    I think what I was missing was the fundamental point that if the line
    I am processing matches the regex then all the captures will be
    defined. Obviously the example I gave was very simple. If one of the
    captured values happens to be an empty string then that is something I
    just have to check. Therefore METHOD 2 seems fine to me, since the
    undef checks are not necessary, as you pointed out.

    In the real world I am parsing a log file which contains a lot of
    varied SQL statements. I want to identify those which are inserting /
    updating into a particular table and capture and analyse the values.

    So, a more realistic example of what I am trying to do is as follows,
    which now works using METHOD 2:

    -----------------------------------------------------------------------------------------------------------------------
    use strict;
    use warnings;

    while(<DATA>)
    {
        my $str = $_; ## Since in real life value will be in variable not $_

        ## METHOD 2
        print STDERR "\n", 'METHOD 2 ', $str;
        my ($fids2, $vals2) =
            $str =~ /INSERT\s*INTO\s*bond\s*
                     \((.*?)\)  # Non greedy match for first set parens
                     .*?        # Any other stuff up to the next open paren non greedy
                     \((.*)\)/x # Greedy match for second set parens
        ;
        if(defined($fids2) && defined($vals2))
        {
            ## We got a match
            print STDERR "\nValues are\n", $fids2, "\n", $vals2;
        }
        else
        {
            print STDERR "\n", 'Values are not defined';
        }
    }
    __END__
    INSERT INTO bond (a,b,c,d,e,f,g) VALUES (1,2,3,4,5,6,7);
    INSERT INTO bond (a,b,c,d,e,f,g) VALUES rubbish;
    UPDATE issue SET (a,b,c) = (3,4,5);
    --------------------------------------------------------------------------------------------------------------------------

    One final question here: if my regexp had a large number of captures,
    would there be any overhead using this method if the match failed late
    on? Since less than 1% of the lines I am searching actually match
    the pattern, I would like to keep overhead to a minimum.

    I can see from the above example the second test

    INSERT INTO bond (a,b,c,d,e,f,g) VALUES rubbish;

    causes both $fids2 and $vals2 to be undefined so I assume there is
    minimal overhead.
     
    , Feb 8, 2006
    #8
  9. DJ Stunks Guest

    wrote:
    > <snip>
    > Am I missing an obviously more elegant way of checking that all three
    > values have been captured ? I do not want to put the results in an
    > array as I need the variable names to be meaningful to other people
    > (although obviusly a hash may be possible).
    >
    > I am only interested in a full match , i.e all 3 (or however many)
    > values captured .
    >
    > <methods snipped>


    um, none of your capturing parentheses had ?'s, so am I missing
    something? or:

    C:\tmp>cat tmp.pl
    #!/usr/bin/perl

    use strict;
    use warnings;

    while (<DATA>) {
        print "Line $.";
        if ( my ($var1,$var2) = m{ (\w+) : (\w+) }x ) {
            print " matched.\n";
        } else {
            print " didn't match.\n";
        }
    }

    __END__
    this:match
    this nomatch

    C:\tmp>tmp.pl
    Line 1 matched.
    Line 2 didn't match.

    ?

    -jp
     
    DJ Stunks, Feb 8, 2006
    #9
  10. Xicheng Guest

    wrote:
    > Anno Siegel wrote:
    >
    > >
    > > In your case, when the regex matches, all captures will be defined, though
    > > some (well, one, $var3) may be empty.
    > >
    > > So, as a rule, look at your regex and identify the captures that
    > > can possibly come back undefined. Then test only those. In the
    > > concrete case, you have nothing to test.
    > >
    > > Anno

    >
    > I think what I was missing was the fundamental point that if the line
    > I am processing matches the regex then all the captures will be
    > defined. Obviously the example I gave was very simple.If one of the
    > captured values happens to be an empty string then that is something I
    > just have to check. Therefore METHOD 2 seems fine to me since the
    > undefs are not neccessary as you pointed out
    >
    > In the real world I am parsing a log file which contains a lot of
    > varied SQL statements. I want to identify those which are inserting /
    > updating into a particular table and capture and analyse the values.
    >
    > So , a more realistic example of what I am trying to do is as follows ,
    > which now works using method2
    >
    > -----------------------------------------------------------------------------------------------------------------------
    > use strict;
    > use warnings;
    >
    > while(<DATA>)
    > {
    > my $str = $_; ## Since in real life value will be in variable not $_
    >
    > ## METHOD 2
    > print STDERR "\n", 'METHOD 2 ', $str;
    > my ($fids2, $vals2) =
    > $str =~ /INSERT\s*INTO\s*bond\s*
    > \((.*?)\) # Non greedy match for first set parens
    > .*? # Any other stuff up to the next open paren non greedy
    > \((.*)\)/x # Greedy match for second set parens
    > ;
    > if(defined($fids2) && defined($vals2))
    > {
    > ## We got a match
    > print STDERR "\nValues are\n", $fids2, "\n", $vals2;
    > }
    > else
    > {
    > print STDERR "\n", 'Values are not defined';
    > }
    > }
    > __END__
    > INSERT INTO bond (a,b,c,d,e,f,g) VALUES (1,2,3,4,5,6,7);
    > INSERT INTO bond (a,b,c,d,e,f,g) VALUES rubbish;
    > UPDATE issue SET (a,b,c) = (3,4,5);
    > --------------------------------------------------------------------------------------------------------------------------
    >
    > One final question here , if my regexp had a large number of captures
    > would there be any overhead using this method if the match failed late
    > on ? Since less than 1% of the lines I am searching for actually match
    > the pattern I would like to keep overhead to a minimum.


    I would use a temporary array like:
    ---------------------
    while(<DATA>)
    {
        my @tmp = /INSERT\s*INTO\s*bond\s*
                   \((.*?)\)  # Non greedy match for first set parens
                   .*?        # Any other stuff up to the next open paren non greedy
                   \((.*)\)/x # Greedy match for second set parens
        ;
        if (@tmp == 2) {
            my ($fids2, $vals2) = @tmp;
            # do something on $fids2 and $vals2
            print STDERR "\nValues are\n", $fids2, "\n", $vals2;
        } else {
            print STDERR "\n", 'Values are not defined';
        }
    }
    __END__
    INSERT INTO bond (a,b,c,d,e,f,g) VALUES (1,2,3,4,5,6,7);
    INSERT INTO bond (a,b,c,d,e,f,g) VALUES rubbish;
    UPDATE issue SET (a,b,c) = (3,4,5);

    Best,
    Xicheng

    > I can see from the above example the second test
    >
    > INSERT INTO bond (a,b,c,d,e,f,g) VALUES rubbish;
    >
    > causes both $fids2 and $vals2 to be undefined so I assume there is
    > minimal overhead.
     
    Xicheng, Feb 8, 2006
    #10
  11. Anno Siegel Guest

    <> wrote in comp.lang.perl.misc:
    > Anno Siegel wrote:
    >
    > >
    > > In your case, when the regex matches, all captures will be defined, though
    > > some (well, one, $var3) may be empty.
    > >
    > > So, as a rule, look at your regex and identify the captures that
    > > can possibly come back undefined. Then test only those. In the
    > > concrete case, you have nothing to test.
    > >
    > > Anno

    >
    > I think what I was missing was the fundamental point that if the line
    > I am processing matches the regex then all the captures will be
    > defined. Obviously the example I gave was very simple.If one of the
    > captured values happens to be an empty string then that is something I
    > just have to check. Therefore METHOD 2 seems fine to me since the
    > undefs are not neccessary as you pointed out
    >
    > In the real world I am parsing a log file which contains a lot of
    > varied SQL statements. I want to identify those which are inserting /
    > updating into a particular table and capture and analyse the values.
    >
    > So , a more realistic example of what I am trying to do is as follows ,
    > which now works using method2
    >
    > -----------------------------------------------------------------------------------------------------------------------
    > use strict;
    > use warnings;
    >
    > while(<DATA>)
    > {
    > my $str = $_; ## Since in real life value will be in variable not $_
    >
    > ## METHOD 2
    > print STDERR "\n", 'METHOD 2 ', $str;
    > my ($fids2, $vals2) =
    > $str =~ /INSERT\s*INTO\s*bond\s*
    > \((.*?)\) # Non greedy match for first set parens
    > .*? # Any other stuff up to the next open paren non greedy
    > \((.*)\)/x # Greedy match for second set parens
    > ;
    > if(defined($fids2) && defined($vals2))
    > {
    > ## We got a match
    > print STDERR "\nValues are\n", $fids2, "\n", $vals2;
    > }
    > else
    > {
    > print STDERR "\n", 'Values are not defined';
    > }
    > }
    > __END__
    > INSERT INTO bond (a,b,c,d,e,f,g) VALUES (1,2,3,4,5,6,7);
    > INSERT INTO bond (a,b,c,d,e,f,g) VALUES rubbish;
    > UPDATE issue SET (a,b,c) = (3,4,5);


    Like your former example, the regex doesn't contain any optional
    parts, so if it matches at all both captures will be defined.
    You could write

    if ( my ($fids2, $vals2) = $str =~ /.../ ) {
    # can assume that $fids2 and $vals2 are defined here
    }

    You can, of course, test definedness to see *if* the regex has
    matched (testing any one of $fids2 or $vals2 would do), but that's
    a little roundabout.
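
    A self-contained sketch of that idiom, using the OP's pattern without /x
    (the sub name extract_insert and the shortened sample lines are made up
    for the example). It relies on the fact that a list assignment in scalar
    context evaluates to the number of values on its right-hand side:

```perl
use strict;
use warnings;

# Hypothetical helper wrapped around the OP's pattern.
sub extract_insert {
    my ($str) = @_;
    # A list assignment in scalar (here: boolean) context yields the
    # number of values on its right-hand side: 2 on a match, 0 when the
    # match fails and returns the empty list.
    if ( my ($fields, $values) =
            $str =~ /INSERT\s*INTO\s*bond\s*\((.*?)\).*?\((.*)\)/ ) {
        return ($fields, $values);
    }
    return;
}

for my $line (
    'INSERT INTO bond (a,b,c) VALUES (1,2,3);',
    'INSERT INTO bond (a,b,c) VALUES rubbish;',
) {
    if ( my ($fields, $values) = extract_insert($line) ) {
        print "fields: $fields, values: $values\n";
    }
    else {
        print "no match\n";
    }
}
```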

    > --------------------------------------------------------------------------------------------------------------------------
    >
    > One final question here , if my regexp had a large number of captures
    > would there be any overhead using this method if the match failed late
    > on ? Since less than 1% of the lines I am searching for actually match
    > the pattern I would like to keep overhead to a minimum.


    What is "this method"? Extracting captures in list context? I
    don't think it has any impact in case of a failed match, there
    are no captures to extract.

    > I can see from the above example the second test
    >
    > INSERT INTO bond (a,b,c,d,e,f,g) VALUES rubbish;
    >
    > causes both $fids2 and $vals2 to be undefined so I assume there is
    > minimal overhead.


    It causes the regex to not match. In list context a failed match
    returns the empty list, so $fids2 and $vals2 are simply left undefined.
    They are *not* assigned undefined values extracted from the match
    operation. That can only happen with a regex that has optional parts. [1]

    Anno

    [1] Well, you could also have a capture inside a negative lookahead or
    lookbehind. That will never return anything *but* undef, so it's
    unlikely to happen in real code.
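
    A tiny sketch of footnote [1], with a deliberately artificial pattern
    (the pattern and the sub name capture_in_negative_lookahead are
    assumptions made for the demonstration). The overall match can only
    succeed when the lookahead's contents fail to match, so its capture is
    never set:

```perl
use strict;
use warnings;

# The capture sits inside a negative lookahead. The whole pattern only
# matches when the lookahead's contents do NOT match, so the capture
# can never hold a value after a successful match.
sub capture_in_negative_lookahead {
    my ($str) = @_;
    my @captures = $str =~ /\Aa(?!(x))/;
    return @captures;
}

for my $str ('ab', 'ax') {
    my @captures = capture_in_negative_lookahead($str);
    my @shown = map { defined($_) ? "'$_'" : 'undef' } @captures;
    print "'$str': ",
        ( @captures ? "matched, captures: @shown" : 'no match' ), "\n";
}
```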
    --
    $_='Just another Perl hacker'; print +( join( '', map { eval $_; $@ }
    'use warnings FATAL => "all"; printf "%-1s", "\n"', 'use strict; a',
    'use warnings FATAL => "all"; "@x"', '1->m') =~
    m|${ s/(.)/($1).*/g; \ $_ }|is),',';
     
    Anno Siegel, Feb 9, 2006
    #11