Confusion about the smart matching operator

jl_post

Hi,

I've been reading up on the smart matching operator ('~~') in
"perldoc perlsyn", and I have to say I'm a little confused on a
certain aspect of it.

Say I have an array like:

   my @a = ('cat', 'dog', 55);

If I want to discover if any of its elements contains only digits,
I can use the '~~' operator against a regular expression, like this:

   if ( @a ~~ m/^\d+$/ )  # prints "true"
   {
      print "true";
   }
   else
   {
      print "false"
   }

In this respect, the smart matching operator behaves like an "any()"
function, returning a true value if any element matches.

But if I wanted to discover if any of its elements looks like a
number with the Scalar::Util::looks_like_number() function in a code
reference, like this:

   use Scalar::Util qw(looks_like_number);
   if ( @a ~~ sub { looks_like_number($_[0]) } )  # prints "false"
   {
      print "true";
   }
   else
   {
      print "false"
   }

the smart matching operator will return a false value. In this case,
the smart matching operator behaves like an "all()" function,
returning a true value only if every element returns true (or there
are no elements that return false).

So "Array ~~ Regex" returns true if any element matches,
but "Array ~~ CodeRef" returns true only if all elements evaluate to
true.
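For readers who want to see the two behaviours side by side, here is a
rough sketch of what "@a ~~ m/^\d+$/" and "@a ~~ sub {...}" amount to on
perl 5.10.1 and later, written with plain grep so no smartmatch is
involved. It is modelled on the pseudocode column of the perlsyn
smartmatch table as I understand it, not on perl's actual implementation:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my @a = ('cat', 'dog', 55);

# "Array ~~ Regex": true if ANY element matches.
# grep in scalar context returns the number of matching elements.
my $any_digits = grep { /^\d+$/ } @a;

# "Array ~~ CodeRef": true only if ALL elements pass,
# i.e. there is no element for which the test is false.
my $all_numbers = !grep { !looks_like_number($_) } @a;

print $any_digits  ? "any: true\n" : "any: false\n";   # any: true
print $all_numbers ? "all: true\n" : "all: false\n";   # all: false
```

Written this way, the De Morgan trick mentioned below is just a matter
of where the two negations go.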

I find this a bit counter-intuitive. Is there any logic behind
this behavior? (I'm asking because I find it difficult to remember
which one behaves like any() and which one behaves like all(), since
the distinction seems arbitrary to me.)

I've discovered that if I want "Array ~~ Regex" to behave like an
all() function, I can basically rewrite it by wrapping it in a
CodeRef, like this:

   if ( @a ~~ sub { $_[0] =~ m/^\d+$/ } )  # evaluates to false

However, if I want "Array ~~ CodeRef" to behave like an any()
function, I have to resort to using De Morgan's laws and write it like
this:

   if (not( @a ~~ sub { not looks_like_number($_[0]) } ))  # evaluates to true

This works, but I find that this makes the code confusing to read (and
to maintain).

Is there a better way to get "Array ~~ CodeRef" to behave like an
any() function? (Without abandoning the '~~' operator, that is.) I'm
aware that I can do this:

   if ( grep { looks_like_number($_) } @a )  # evaluates to true

but I know from experience that many seasoned Perl coders are against
this approach. I also know of this approach documented in "perldoc
List::Util":

   sub any { looks_like_number($_) && return 1 for @_; 0 }
   if ( any(@a) )  # evaluates to true

but I'd rather not write a new subroutine every time I need the any()
behavior.
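One way around writing a new subroutine for every test is to write a
single generic one that takes the test as a block, the way grep and
List::Util::first do. The name any_of below is hypothetical (it is not
from any module); this is a minimal sketch using a (&@) prototype:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper, written once and reused with any test.
# The (&@) prototype lets it be called like grep: any_of { TEST } LIST
sub any_of (&@) {
    my $test = shift;
    for (@_) {                 # $_ is aliased to each element in turn
        return 1 if $test->(); # stop at the first element that passes
    }
    return 0;                  # no element passed
}

my @a = ('cat', 'dog', 55);
my $found = any_of { looks_like_number($_) } @a;   # 1
my $none  = any_of { /^x/ } @a;                    # 0
```

Because of the prototype, the sub has to be declared before any call
that uses the block syntax.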


So basically I'm asking two things:

1.) Why does "Array ~~ Regex" act like any() whereas "Array ~~
CodeRef" act like all()? (In other words, why the discrepancy?)

and:

2.) Is there a simple/elegant way to get "Array ~~ CodeRef" to behave
like an any() function (like "Array ~~ Regex" does)?


Thanks in advance for any advice.

-- Jean-Luc
 
C.DeRykus

Wouldn't the short-circuiting List::Util::first or
List::MoreUtils::any fit the bill then...?


if ( List::Util::first { looks_like_number($_) } @a ) {...}

if ( List::MoreUtils::any { looks_like_number($_) } @a ) {...}
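As an aside to the List::MoreUtils suggestion: newer versions of
List::Util (1.33 and later, which I believe first shipped with core
Perl 5.20) export their own any() and all(), so on those perls no CPAN
module is needed. A sketch assuming such a List::Util is installed; the
version number in the use line makes perl refuse to compile it
otherwise:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
use List::Util 1.33 qw(first any all);   # any/all need List::Util 1.33+

my @a = ('cat', 'dog', 55);

my $first_num = first { looks_like_number($_) } @a;   # 55, the element itself
my $has_num   = any   { looks_like_number($_) } @a;   # true
my $all_num   = all   { looks_like_number($_) } @a;   # false: 'cat' fails
```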

   So basically I'm asking two things:

1.)  Why does "Array ~~ Regex" act like any() whereas "Array ~~
CodeRef" act like all()?  (In other words, why the discrepancy?)

   and:

2.)  Is there a simple/elegant way to get "Array ~~ CodeRef" to behave
like an any() function (like "Array ~~ Regex" does)?

Maybe something such as this:

use feature 'state';

if ( @a ~~ sub { state $r;
                 $r //= List::Util::first { looks_like_number($_) } @a } )
....
 
jl_post

Wouldn't the short-circuiting List::Util::first or
List::MoreUtils::any fit the bill then...?

if ( List::Util::first { looks_like_number($_) } @a ) {...}

if ( List::MoreUtils::any { looks_like_number($_) } @a ) {...}


Ah, List::MoreUtils is exactly what I'm looking for. Thanks.
Unfortunately, List::MoreUtils doesn't seem to be a standard module
(at least according to http://perldoc.perl.org/index-modules-L.html).

So on platforms where I can only depend on standard modules I
suppose I can use List::Util::first() instead, as long as the set of
values I'm looking for can never be undefined. (Otherwise, I wouldn't
know if I got a proper match, or if no match at all was found.)
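A sketch of the pitfall with first(): it returns the matching element
itself, so a plain boolean test also fails when the match is a false
value such as 0, not only when it is undefined. Testing the result with
defined() avoids the 0/''/'0' problem, and grep in scalar context (a
count) sidesteps both, at the cost of not stopping at the first match.
The array @b is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
use List::Util qw(first);

my @b = ('cat', 0);   # the only numeric element is 0, which is false

# Plain boolean test: first() returns 0 here, which reads as "no match".
my $naive = first { looks_like_number($_) } @b;

# defined() copes with 0, '' and '0', but not with undef elements.
my $found = defined(first { looks_like_number($_) } @b);

# grep in scalar context counts matches, so any element values are safe
# (but every element is tested; there is no short-circuit).
my $count = grep { looks_like_number($_) } @b;
```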

(So now I know a nice work-around to getting "Array ~~ CodeRef"
any() behavior, but I still don't know the reason behind the "Array ~~
Regex" and "Array ~~ CodeRef" any/all discrepancy.)

Thanks again for the response, Charles. It was useful.

Cheers,

-- Jean-Luc
 
Alan Curry

use Scalar::Util qw(looks_like_number);
if ( @a ~~ sub { looks_like_number($_[0]) } ) # prints "false"
{
print "true";
}
else
{
print "false"
}

the smart matching operator will return a false value. In this case,
the smart matching operator behaves like an "all()" function,

No it doesn't.

Try it with @a=(1,2,3); it's still false.

The coderef is only called once, with \@a as the argument. You passed an
arrayref to looks_like_number, which is going to be false no matter what.

You're right about one thing at least: it's not easy to predict what ~~ will
do based on the documentation.
 
Ilya Zakharevich

I find this a bit counter-intuitive.

There is nothing "a bit counter-intuitive" about the smart matching
operation. It is just *absolutely useless*, since there is no
humanly-possible way to predict what it would do.

The only way to handle the situation is to ignore this operator
completely, and black-list as unmaintainable any code which uses it.

Hope this helps,
Ilya
 
jl_post

  use Scalar::Util qw(looks_like_number);
  if ( @a ~~ sub { looks_like_number($_[0]) } )  # prints "false"
  {
     print "true";
  }
  else
  {
     print "false"
  }
the smart matching operator will return a false value.  In this case,
the smart matching operator behaves like an "all()" function,


No it doesn't.

Try it with @a=(1,2,3); it's still false.

The coderef is only called once, with \@a as the argument. You passed an
arrayref to looks_like_number, which is going to be false no matter what.


Hmmm... contrary to what you're saying, I tried it with @a=(1,2,3);
and it's evaluating to true for me. Here is the sample script I used
to test it with:


#!/usr/bin/perl
use strict;
use warnings;
my @a = (1, 2, 3);
use Scalar::Util qw(looks_like_number);
if ( @a ~~ sub { looks_like_number($_[0]) } )  # prints "true"
{
   print "true";
}
else
{
   print "false"
}
__END__


In fact, if I change the sub { } to be:

sub { print "$_[0]\n" }

then I see:

1
2
3
true

(The "true" is there because print() is returning a true value.) So I
have to disagree with the statement that the coderef is only called
once, with \@a as its argument.

You're right about one thing at least: it's not easy to predict what ~~ will
do based on the documentation.

Yeah... evidently I'm not the only one who's confused about aspects
of the '~~' operator. That's too bad -- it seems like it has a lot of
potential.

Cheers,

-- Jean-Luc
 
Alan Curry

Hmmm... contrary to what you're saying, I tried it with @a=(1,2,3);
and it's evaluating to true for me. Here is the sample script I used
to test it with:

I'll run it exactly as presented...
#!/usr/bin/perl
use strict;
use warnings;
my @a = (1, 2, 3);
use Scalar::Util qw(looks_like_number);
if ( @a ~~ sub { looks_like_number($_[0]) } )  # prints "true"
{
   print "true";
}
else
{
   print "false"
}
__END__

....and the answer is "false". Meanwhile, the altered version:
In fact, if I change the sub { } to be:

sub { print "$_[0]\n" }

then I see:

1
2
3
true

gives me "ARRAY(0x10030008)" and "true".
(The "true" is there because print() is returning a true value.) So I
have to disagree with the statement that the coderef is only called
once, with \@a as its argument.

It must be version-dependent. My perl is the one currently found in the
latest stable Debian release:

This is perl, v5.10.0 built for powerpc-linux-gnu-thread-multi
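The two calling conventions reported in this thread can be emulated
explicitly, without ~~, which shows why the same script prints "false"
for one poster and "true" for the other. This is a sketch of the
behaviour described above (one call with \@a on 5.10.0, one call per
element on Jean-Luc's perl, which as far as I know matches what perlsyn
documents from 5.10.1 on), not a reproduction of perl's internals:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my @a    = (1, 2, 3);
my $code = sub { looks_like_number($_[0]) };

# Convention reported for 5.10.0: the coderef is called once, with a
# reference to the whole array. A plain array reference does not look
# like a number, so this is false whatever @a contains.
my $called_once = $code->(\@a) ? 1 : 0;

# Convention seen on the other perl: the coderef is called for each
# element, and the match succeeds only if no call returns false.
my $per_element = (grep { !$code->($_) } @a) ? 0 : 1;

print "once: $called_once, per element: $per_element\n";  # once: 0, per element: 1
```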
 
Klaus

There is nothing "a bit counter-intuitive" about the smart matching
operation.  It is just *absolutely useless*, since there is no
humanly-possible way to predict what it would do.

I completely agree.

Another feature of smart matching that is counter-intuitive/useless,
because there is no humanly possible way to predict what it will do, is:

the rule that if the lefthand side of a smartmatch is a number and the
righthand side is a string that *looks like a number*, then that
string is treated like a number.

First of all, it is impossible in Perl 5 (due to dualvars) to tell
whether or not a variable contains a number.

Secondly, the rule for whether or not a string looks like a number is
not straightforward:

**********************************************
use strict;
use warnings;
use 5.010;

say " 1.) 0 ~~ 0 ==> does ".( 0 ~~ 0 ? '' : 'not ')."look like number";
say " 2.) 0 ~~ '' ==> does ".( 0 ~~ '' ? '' : 'not ')."look like number";
say " 3.) 0 ~~ ' ' ==> does ".( 0 ~~ ' ' ? '' : 'not ')."look like number";
say " 4.) 0 ~~ '+' ==> does ".( 0 ~~ '+' ? '' : 'not ')."look like number";
say " 5.) 0 ~~ '-' ==> does ".( 0 ~~ '-' ? '' : 'not ')."look like number";
say " 6.) 0 ~~ '.' ==> does ".( 0 ~~ '.' ? '' : 'not ')."look like number";
say " 7.) 0 ~~ '0.' ==> does ".( 0 ~~ '0.' ? '' : 'not ')."look like number";
say " 8.) 0 ~~ '0.0' ==> does ".( 0 ~~ '0.0' ? '' : 'not ')."look like number";
say " 9.) 0 ~~ 0.0 ==> does ".( 0 ~~ 0.0 ? '' : 'not ')."look like number";
say " 10.) 0 ~~ '0+' ==> does ".( 0 ~~ '0+' ? '' : 'not ')."look like number";
say " 11.) 0 ~~ '+0' ==> does ".( 0 ~~ '+0' ? '' : 'not ')."look like number";
say " 12.) 0 ~~ '0-' ==> does ".( 0 ~~ '0-' ? '' : 'not ')."look like number";
say " 13.) 0 ~~ '-0' ==> does ".( 0 ~~ '-0' ? '' : 'not ')."look like number";
say " 14.) 0 ~~ '0E' ==> does ".( 0 ~~ '0E' ? '' : 'not ')."look like number";
say " 15.) 0 ~~ 'E0' ==> does ".( 0 ~~ 'E0' ? '' : 'not ')."look like number";
say " 16.) 0 ~~ '0E0' ==> does ".( 0 ~~ '0E0' ? '' : 'not ')."look like number";
say " 17.) 0 ~~ ' 0E0' ==> does ".( 0 ~~ ' 0E0' ? '' : 'not ')."look like number";
say " 18.) 0 ~~ '0E0 ' ==> does ".( 0 ~~ '0E0 ' ? '' : 'not ')."look like number";
say " 19.) 0 ~~ '0 ' ==> does ".( 0 ~~ '0 ' ? '' : 'not ')."look like number";
say " 20.) 0 ~~ ' 0' ==> does ".( 0 ~~ ' 0' ? '' : 'not ')."look like number";
say " 21.) 0 ~~ ' 0 ' ==> does ".( 0 ~~ ' 0 ' ? '' : 'not ')."look like number";
say " 22.) 1 ~~ 1 ==> does ".( 1 ~~ 1 ? '' : 'not ')."look like number";
say " 23.) 1 ~~ '' ==> does ".( 1 ~~ '' ? '' : 'not ')."look like number";
say " 24.) 1 ~~ ' ' ==> does ".( 1 ~~ ' ' ? '' : 'not ')."look like number";
say " 25.) 1 ~~ '+' ==> does ".( 1 ~~ '+' ? '' : 'not ')."look like number";
say " 26.) 1 ~~ '-' ==> does ".( 1 ~~ '-' ? '' : 'not ')."look like number";
say " 27.) 1 ~~ '.' ==> does ".( 1 ~~ '.' ? '' : 'not ')."look like number";
say " 28.) 1 ~~ '1.' ==> does ".( 1 ~~ '1.' ? '' : 'not ')."look like number";
say " 29.) 1 ~~ '1.0' ==> does ".( 1 ~~ '1.0' ? '' : 'not ')."look like number";
say " 30.) 1 ~~ 1.0 ==> does ".( 1 ~~ 1.0 ? '' : 'not ')."look like number";
say " 31.) 1 ~~ '1+' ==> does ".( 1 ~~ '1+' ? '' : 'not ')."look like number";
say " 32.) 1 ~~ '+1' ==> does ".( 1 ~~ '+1' ? '' : 'not ')."look like number";
say " 33.) 1 ~~ '1-' ==> does ".( 1 ~~ '1-' ? '' : 'not ')."look like number";
say " 34.) -1 ~~ '-1' ==> does ".(-1 ~~ '-1' ? '' : 'not ')."look like number";
say " 35.) 1 ~~ '1E' ==> does ".( 1 ~~ '1E' ? '' : 'not ')."look like number";
say " 36.) 1 ~~ 'E0' ==> does ".( 1 ~~ 'E0' ? '' : 'not ')."look like number";
say " 37.) 1 ~~ '1E0' ==> does ".( 1 ~~ '1E0' ? '' : 'not ')."look like number";
say " 38.) 1 ~~ ' 1E0' ==> does ".( 1 ~~ ' 1E0' ? '' : 'not ')."look like number";
say " 39.) 1 ~~ '1E0 ' ==> does ".( 1 ~~ '1E0 ' ? '' : 'not ')."look like number";
say " 40.) 1 ~~ '1 ' ==> does ".( 1 ~~ '1 ' ? '' : 'not ')."look like number";
say " 41.) 1 ~~ ' 1' ==> does ".( 1 ~~ ' 1' ? '' : 'not ')."look like number";
say " 42.) 1 ~~ ' 1 ' ==> does ".( 1 ~~ ' 1 ' ? '' : 'not ')."look like number";
**********************************************

the result is:
**********************************************
1.) 0 ~~ 0 ==> does look like number
2.) 0 ~~ '' ==> does not look like number
3.) 0 ~~ ' ' ==> does not look like number
4.) 0 ~~ '+' ==> does not look like number
5.) 0 ~~ '-' ==> does not look like number
6.) 0 ~~ '.' ==> does not look like number
7.) 0 ~~ '0.' ==> does look like number
8.) 0 ~~ '0.0' ==> does look like number
9.) 0 ~~ 0.0 ==> does look like number
10.) 0 ~~ '0+' ==> does not look like number
11.) 0 ~~ '+0' ==> does look like number
12.) 0 ~~ '0-' ==> does not look like number
13.) 0 ~~ '-0' ==> does look like number
14.) 0 ~~ '0E' ==> does not look like number
15.) 0 ~~ 'E0' ==> does not look like number
16.) 0 ~~ '0E0' ==> does look like number
17.) 0 ~~ ' 0E0' ==> does look like number
18.) 0 ~~ '0E0 ' ==> does look like number
19.) 0 ~~ '0 ' ==> does look like number
20.) 0 ~~ ' 0' ==> does look like number
21.) 0 ~~ ' 0 ' ==> does look like number
22.) 1 ~~ 1 ==> does look like number
23.) 1 ~~ '' ==> does not look like number
24.) 1 ~~ ' ' ==> does not look like number
25.) 1 ~~ '+' ==> does not look like number
26.) 1 ~~ '-' ==> does not look like number
27.) 1 ~~ '.' ==> does not look like number
28.) 1 ~~ '1.' ==> does look like number
29.) 1 ~~ '1.0' ==> does look like number
30.) 1 ~~ 1.0 ==> does look like number
31.) 1 ~~ '1+' ==> does not look like number
32.) 1 ~~ '+1' ==> does look like number
33.) 1 ~~ '1-' ==> does not look like number
34.) -1 ~~ '-1' ==> does look like number
35.) 1 ~~ '1E' ==> does not look like number
36.) 1 ~~ 'E0' ==> does not look like number
37.) 1 ~~ '1E0' ==> does look like number
38.) 1 ~~ ' 1E0' ==> does look like number
39.) 1 ~~ '1E0 ' ==> does look like number
40.) 1 ~~ '1 ' ==> does look like number
41.) 1 ~~ ' 1' ==> does look like number
42.) 1 ~~ ' 1 ' ==> does look like number
**********************************************
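The rule described above can be modelled without ~~ by choosing between
== and eq with Scalar::Util::looks_like_number. The helper
num_matches_string is hypothetical and covers only the "number on the
left, string on the right" case, not the rest of the smartmatch
dispatch table; it assumes looks_like_number agrees with the "looks
like a number" test smartmatch uses, which the results above suggest
but which I have not checked for every input:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: models only "Num ~~ string", nothing else.
sub num_matches_string {
    my ($num, $str) = @_;
    return looks_like_number($str)
        ? $num == $str      # numeric-looking string: compared as a number
        : $num eq $str;     # anything else: compared as a string
}

for my $str ('0', ' 0', '+0', '0.0', '0E0', '0+', 'E0') {
    my $result = num_matches_string(0, $str) ? 'match' : 'no match';
    printf "%-6s %-8s (eq would say: %s)\n",
        "'$str'", $result, (0 eq $str ? 'match' : 'no match');
}
```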
 
Peter J. Holzer

I completely agree.

I haven't tried to use it in anger yet (as mentioned in another posting,
I have still too many machines running 5.8.x), but from reading the docs
I tend to agree: It's way too complicated and I don't think I can
remember that stuff. It is of course possible that those rules exactly
match my intuition, but somehow I doubt it.

Another feature of smart matching that is counter-intuitive/useless,
because there is no humanly possible way to predict what it will do, is:

the rule that if the lefthand side of a smartmatch is a number and the
righthand side is a string that *looks like a number*, then that
string is treated like a number.

First of all, it is impossible in Perl 5 (due to dualvars) to tell
whether or not a variable contains a number.

Right.


Secondly, the rule for whether or not a string looks like a number is
not straightforward:

What's not straightforward about that? Except maybe for leading and
trailing whitespace the results are what I expected.

hp
 
Klaus

I haven't tried to use it in anger yet (as mentioned in another posting,
I have still too many machines running 5.8.x), but from reading the docs
I tend to agree: It's way too complicated and I don't think I can
remember that stuff. It is of course possible that those rules exactly
match my intuition, but somehow I doubt it.




What's not straightforward about that? Except maybe for leading and
trailing whitespace the results are what I expected.

Agreed, the rule itself for whether or not a string looks like a
number is straightforward.

However, what trips me up with smartmatching is the combination of a
number on the lefthand side with what looks like a number on the
righthand side.

Here is my (admittedly contrived) example:

******************************
use strict;
use warnings;
use 5.010;

my $val = '  3';

checkvalue($val);

my $formatted = sprintf '%6.2f', $val;

checkvalue($val);

sub checkvalue {
    if    ($_[0] ~~    '3') { say "there is no space";           }
    elsif ($_[0] ~~   ' 3') { say "there is one space";         }
    elsif ($_[0] ~~  '  3') { say "there are two spaces";       }
    elsif ($_[0] ~~ '   3') { say "there are three spaces";     }
    else                    { say "I don't know what to say..."; }
}
******************************

Here is the output:
******************************
there are two spaces
there is no space
******************************

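(Please note that exactly the same subroutine call, checkvalue($val),
produces different output depending on whether or not $val has been
passed through a sprintf call: the numeric conversion inside sprintf
caches a numeric value in $val, and from then on ~~ treats $val as a
number.)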

The thing that annoys me here is that each time I use smartmatches
with string literals on the righthand side, I always have to think
about whether or not the string looks like a number (in which case I
had better use the old "eq" instead of "~~").

Which leads me to the conclusion that I had better use the old "eq"
in all cases. That also includes cases like
$val ~~ ['3', ' 3', '  3', '   3'], which would be better written as
$val eq '3' or $val eq ' 3' or $val eq '  3' or $val eq '   3'.
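A sketch of checkvalue() rewritten to use only string comparison, as
suggested above. The hash %spaces and the name describe_value are made
up for illustration; hash keys are always compared as strings, so the
numeric value cached in $val by sprintf no longer changes the result:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table: keys are the exact strings to recognise.
my %spaces = (
    '3'    => 'there is no space',
    ' 3'   => 'there is one space',
    '  3'  => 'there are two spaces',
    '   3' => 'there are three spaces',
);

# Hypothetical replacement for checkvalue(): string comparison only.
sub describe_value {
    my ($v) = @_;
    return exists $spaces{$v} ? $spaces{$v} : "I don't know what to say...";
}

my $val = '  3';
my $before = describe_value($val);
my $formatted = sprintf '%6.2f', $val;   # numifies $val, as in the example above
my $after  = describe_value($val);
print "$before\n$after\n";               # both lines: there are two spaces
```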

Final verdict: Smartmatching works as expected in 99.9999% of all
cases. If you are concerned about the remaining 0.0001% of cases (such
as $val ~~ ['3', ' 3', '  3', '   3']), then you had better stick with
the old "eq" and let others debug their own smartmatching code.
 
Ilya Zakharevich

Smartmatch semantics changed in several small but important ways between
5.10.0 and 5.10.1. I would highly recommend avoiding smartmatch on
5.10.0 for this reason. (I believe the intention is not to change them
again.)

I strongly hope that this intention will be violated. *THERE IS* a
way to salvage smart match - but it must be made non-backward
compatible.

E.g., consider the following scenario:

a) Choose a small number N (I favor 3 ;-).

b) Write N simple rules explaining what ~~ should do (and "simple"
means AT LEAST having no "...except for...").

c) Add new `use strict' type 'smartmatch', enabled by default.

d) In scope of `use strict "smartmatch"', force ~~ to die() if it
encounters something not covered by (b), or when more than one
rule in (b) is applicable, and results conflict.

And I'm pretty sure that there must be yet smarter fixes...

Yours,
Ilya
 
Klaus

However, what trips me up with smartmatching is the combination of a
number on the lefthand side with what looks like a number on the
righthand side.

Here is my (admittedly contrived) example:

******************************
use strict;
use warnings;
use 5.010;

my $val = '  3';

checkvalue($val);

my $formatted = sprintf '%6.2f', $val;

checkvalue($val);

sub checkvalue {
    if    ($_[0] ~~    '3') { say "there is no space";           }
    elsif ($_[0] ~~   ' 3') { say "there is one space";         }
    elsif ($_[0] ~~  '  3') { say "there are two spaces";       }
    elsif ($_[0] ~~ '   3') { say "there are three spaces";     }
    else                    { say "I don't know what to say..."; }}

One possible solution is to always quote the lefthand side of a
smartmatch:

sub checkvalue {
    if    ("$_[0]" ~~    '3') { say "there is no space";           }
    elsif ("$_[0]" ~~   ' 3') { say "there is one space";         }
    elsif ("$_[0]" ~~  '  3') { say "there are two spaces";       }
    elsif ("$_[0]" ~~ '   3') { say "there are three spaces";     }
    else                      { say "I don't know what to say..."; }}

This works, but I don't think it is a good solution in the light of
perlfaq4 - What's wrong with always quoting "$vars"?
 
