Binding array to pattern

  • Thread starter Shmuel (Seymour J.) Metz

Shmuel (Seymour J.) Metz

I'd like to bind an array to a pattern. I couldn't find anything in
the Camel book about the context for the left side of the binding
operator. I ran some tests, and it appears that I get scalar context
if I write

while (@anarray =~ /pattern/g) {
block;
}

which means that I match against the size rather than the contents. I
tried

while ("@anarray" =~ /pattern/g) {
block;
}

but that went into a loop. Is there a better way to do this than

$_="@anarray";
while (/pattern/g) {
block;
}

?

Is there a description that I missed of interpolation for the left
side of the binding operator?

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to (e-mail address removed)
 

xhoster

Shmuel (Seymour J.) Metz said:
I'd like to bind an array to a pattern. I couldn't find anything in
the Camel book about the context for the left side of the binding
operator. I ran some tests, and it appears that I get scalar context
if I write

I don't know about the camel book, but from perldoc perlop:
Binding Operators
Binary "=~" binds a scalar
expression to a pattern
match.

So yes, it is a scalar.
while ("@anarray" =~ /pattern/g) {
block;
}

but that went into a loop.

Of course it did, what with the while there and all. Oh, you mean
an infinite loop. Yep, it does seem to. But then again, so does:

while ("$_" =~ /pattern/g) {

So apparently the reinterpolation is performed each time and thus the
string is not known to be the same.

Is there a better way to do this than

$_="@anarray";
while (/pattern/g) {
block;
}

I have no idea why you want to do it in the first place, but I can't think
of a better way to convert an array into a string and then repeatedly
matching on it in a while loop. (I guess localizing $_ first might be a
good idea.) I guess one possibly better alternative would be:

foreach ("@anarray" =~ /pattern/g) {

As it seems unlikely that the intermediate list would break the memory
bank.

Xho
 

Gunnar Hjalmarsson

Shmuel said:
I'd like to bind an array to a pattern.
Why?

Is there a better way to do this than

$_="@anarray";
while (/pattern/g) {
block;
}

?

This is more readable IMO:

foreach my $element ( @anarray ) {
    while ( $element =~ /PATTERN/g ) {
        ...
    }
}
 

Tad McClellan

Shmuel (Seymour J.) Metz said:
I'd like to bind an array to a pattern.


That makes no sense.

A pattern match is *defined* to operate on a string. An array
is not a string.

Why would you like to bind an array to a pattern?

(and what is your new definition of "pattern match" to go along with it?)

Do you instead want to apply a pattern match to each _element_
of an array? If so, then use foreach or grep.

I
tried

while ("@anarray" =~ /pattern/g) {
block;
}

but that went into a loop.


Errr, the "while" construct _is_ a loop. If you don't want a loop,
then don't use "while".

Is there a description that I missed of interpolation for the left
side of the binding operator?


Interpolation has nothing to do with any of Perl's other operators.

Interpolation happens with "strings" without regard to what
operator the string is an operand for.



What is it that you are ultimately trying to achieve?
 

Brian McCauley

Shmuel said:
I'd like to bind an array to a pattern. I couldn't find anything in
the Camel book about the context for the left side of the binding
operator. I ran some tests, and it appears that I get scalar context
if I write

while (@anarray =~ /pattern/g) {
block;
}

which means that I match against the size rather than the contents. I
tried

while ("@anarray" =~ /pattern/g) {
block;
}

but that went into a loop.

The problem is that you are using an expression with =~ //g

There was a similar problem here a while back.

http://groups.google.com/group/comp..._frm/thread/1cdea0dc1313b9b7/9bdc49d7f9d7bf31
Is there a description that I missed of interpolation for the left
side of the binding operator?

It's not the fact that it's interpolation, it's the fact that it's an
rvalue expression, so each time round the while() loop the =~ is binding
to a new string and the /g position pointer starts again at zero.
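
A minimal sketch of the behaviour described above, under assumed data: the array contents and the pattern are made up for illustration, and the guard counter exists only so that the second loop is certain to terminate.

```perl
use strict;
use warnings;

# Hypothetical data for illustration.
my @anarray = ('foo1', 'foo2', 'bar');

# Bound to a variable: the /g position is stored on $joined, so each
# iteration resumes where the previous match ended.
my $joined = "@anarray";
my @found;
while ($joined =~ /foo(\d)/g) {
    push @found, $1;
}
print "matched: @found\n";

# Bound to an expression: "@anarray" is rebuilt on every pass, so,
# as reported in this thread, the match keeps restarting at offset
# zero. The guard stops the demonstration after a few passes.
my $guard = 0;
while ("@anarray" =~ /foo(\d)/g) {
    last if ++$guard >= 3;
}
print "second loop left after $guard passes\n";
```

Copying the joined string into a lexical first, as in the first loop, also avoids clobbering $_.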
 

Shmuel (Seymour J.) Metz


I want to test for a match anywhere in the array.
This is more readable IMO:
foreach my $element ( @anarray ) {
while ( $element =~ /PATTERN/g ) {
...
}
}

I had simplified my code because I was primarily concerned with the
endless loop rather than style. What I actually wound up with[1] was

foreach (sort keys %{$host_info{$host}{Email}}) {
    push @Contacts, @{$host_info{$host}{Email}{$_}};
    my $scalarContacts="@{$host_info{$host}{Email}{$_}}";
    push @abuseContacts, @{$host_info{$host}{Email}{$_}}
        if (/abuse/ or $scalarContacts =~ /abuse/);
}

and I'd rather avoid replicating the push statement. Given that, is
there a better style?

Thanks.

[1] The code is quick and dirty and at some point I intend to
do some massive cleanup, but it's still a work in progress.


Shmuel (Seymour J.) Metz

on 06/08/2006 said:
That makes no sense.

To you. Others had no trouble understanding it.
Why would you like to bind an array to a pattern?

Because I want all of the matches on all of the strings in the array.
(and what is your new definition of "pattern match" to go along with
it?)

What is your definition of "new", and are you really older than
Griswold?
Do you instead want to apply a pattern match to each _element_ of an
array?

That would be obvious if you looked at the code. Do you know how
interpolation works inside quotes?
If so, then use foreach or grep.

That would complicate the logic in this case.
Errr, the "while" construct _is_ a loop. If you don't want a loop,
then don't use "while".

Sorry, I meant nonterminating loop. Another poster has explained the
problem.
What is it that you are ultimately trying to achieve?

Scan an array for a pattern as part of a larger boolean expression.
The code that I posted was part of debug scaffolding that I wrote
while trying to resolve the original problem.


Shmuel (Seymour J.) Metz

on 06/09/2006 at 05:17 AM said:
It's not the fact that it's interpolation, it's the fact that it's an
rvalue expression, so each time round the while() loop the =~ is
binding to a new string and the /g position pointer starts again
at zero.

Thanks.


Shmuel (Seymour J.) Metz

on 06/08/2006 said:
So apparently the reinterpolation is performed each time and thus the
string is not known to be the same.
Ouch!

Of course it did, what with the while there and all. Oh, you mean
an infinite loop.

Well, it ends when I do ^c ;-) Sorry, I should have been clearer.
I have no idea why you want to do it in the first place,

I need a term in a boolean expression for a match anywhere in the
array.
I guess one possibly better alternative would be:
foreach ("@anarray" =~ /pattern/g) {

That would have the wrong semantics even if it didn't go into an
endless loop. IAC, the code that I posted was test cases intended to
help me track down the original problem. The original failing code was
a match inside a boolean expression, and I have currently changed it
to the following:

foreach (sort keys %{$host_info{$host}{Email}}) {
    push @Contacts, @{$host_info{$host}{Email}{$_}};
    my $scalarContacts="@{$host_info{$host}{Email}{$_}}";
    push @abuseContacts, @{$host_info{$host}{Email}{$_}}
        if (/abuse/ or $scalarContacts =~ /abuse/);
}

I could use

foreach my $type (sort keys %{$host_info{$host}{Email}}) {

and throw in a nested

foreach (@{$host_info{$host}{Email}{$type}}) {

but I'd consider replicating the push to be uglier than coercing the
array to a string.

Thanks.


Uri Guttman

SJM> foreach (sort keys %{$host_info{$host}{Email}}) {
SJM> push @Contacts, @{$host_info{$host}{Email}{$_}};
SJM> my $scalarContacts="@{$host_info{$host}{Email}{$_}}";
SJM> push @abuseContacts, @{$host_info{$host}{Email}{$_}}
SJM> if (/abuse/ or $scalarContacts =~ /abuse/);
SJM> }

SJM> and I'd rather avoid replicating the push statement. Given that, is
SJM> there a better style?

disregarding the loop issues, that is very hard to read code. notice the
massive redundant use of $host_info{$host}{Email} in there? factor that
out into a scalar before the loop. and then it can become almost
readable (with some needed whitespace too)

my $email_info = $host_info{$host}{Email} ;

foreach (sort keys %{$email_info}) {

my $emails = $email_info->{$_} ;
push @Contacts, @{$emails};
push @abuseContacts, @{$emails};
my $scalarContacts = "@{$emails}";

if (/abuse/ or $scalarContacts =~ /abuse/);

that method of checking a joined string vs scanning a array bothers
me. and why do you push the same stuff into 2 different arrays?
if you used List::Util::first you can scan for the first abuse email in
the array and it could be faster as you don't make up the string first.
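
A short sketch of that suggestion, assuming a made-up contact list; first() comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical contact list for illustration.
my @emails = ('hostmaster@example.net', 'abuse@example.net', 'noc@example.net');

# first() stops at the first element for which the block is true and
# returns that element, or undef if nothing matches.
my $abuse = first { /abuse/ } @emails;

print defined $abuse ? "found $abuse\n" : "no abuse contact\n";
```

Because first() returns the element itself, it suits the case where only one matching address is wanted; it does not collect every match.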

uri
 

Tad McClellan

Shmuel (Seymour J.) Metz said:
I need a term in a boolean expression for a match anywhere in the
array.

if (/abuse/ or $scalarContacts =~ /abuse/);


if grep /abuse/, $_, @{$host_info{$host}{Email}{$_}};
 

Tad McClellan

[ The "that" was snipped, it was:
I'd like to bind an array to a pattern.
]
To you. Others had no trouble understanding it.


Neither you nor any of the others understood "bind an array to a pattern".

The rest of your OP below here, and the followups I've seen so far,
were all about binding a string to a pattern (a string (supposedly)
made up of strings taken from some array).

Because I want all of the matches on all of the strings in the array.


My suggestion would do that for you.

That would be obvious if you looked at the code.


It was not obvious if you _understood_ the code.

I looked at it and saw that

@array = ('a', 'b');
print "true" if "@array" =~ /a b/g;

would fail to do the Right Thing.
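
The difference can be made concrete with a small sketch. It uses the same two-element array and compares the joined string with a per-element grep; the variable names are illustrative only.

```perl
use strict;
use warnings;

my @array = ('a', 'b');

# The joined string is "a b", so a pattern can match across the
# boundary between two elements.
my $joined_hit = "@array" =~ /a b/ ? 1 : 0;

# grep tests each element on its own; in scalar context it returns
# the number of elements that matched.
my $element_hits = grep { /a b/ } @array;

print "joined: $joined_hit, per element: $element_hits\n";
```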

Do you know how
interpolation works inside quotes?


Better than you do, apparently.

That would complicate the logic in this case.


No it wouldn't.

Scan an array for a pattern as part of a larger boolean expression.


Use grep(), just like I suggested then.
 

Shmuel (Seymour J.) Metz

on 06/09/2006 said:
disregarding the loop issues, that is very hard to read code. notice
the massive redundant use of $host_info{$host}{Email} in there?
factor that out into a scalar before the loop. and then it can become
almost readable (with some needed whitespace too)
Thanks.

my $email_info = $host_info{$host}{Email} ;

I assume that $email_info will be a reference, so that stores into it
will go into $host_info?
push @Contacts, @{$emails};
push @abuseContacts, @{$emails};

No; that would change the logic. What would be needed is:

push @abuseContacts, @{$emails};
my $scalarContacts = "@{$emails}"
if (/abuse/ or $scalarContacts =~ /abuse/);
and why do you push the same stuff into 2 different arrays?

I don't; the 2nd push is conditional.
if you used List::Util::first you can scan for the first abuse
email in the array and it could be faster as you don't make up the
string first.

It's more complicated than that; there are two arrays and if the
second has any entries then I need to use it in place of the first. So
I'd need to use something like grep to extract the abuse entries if I
didn't select them out into a separate array.

if (@abuseContacts) {
    print $fhLookup "Abuse contacts: ", join(', ', @abuseContacts), "\n";
} else {
    print $fhLookup "contacts: ", join(', ', @Contacts), "\n";
}

Just for laughs I checked Hash::Util to see if there was an equivalent
to List::Util::first, but no joy.


Uri Guttman

SM> I assume that $email_info will be a reference, so that stores into it
SM> will go into $host_info?

it has to be a reference as you created a ref there earlier when you
built the structure. all scalar values in a structure that hold lower
level structures are refs.
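
A minimal sketch of that point, using a hypothetical structure shaped like the one in this thread (the host name and addresses are invented):

```perl
use strict;
use warnings;

# Hypothetical structure shaped like the one discussed in the thread.
my %host_info = (
    'host1' => { Email => { tech => ['noc@example.net'] } },
);
my $host = 'host1';

# $email_info holds a reference to the inner hash, not a copy of it.
my $email_info = $host_info{$host}{Email};
$email_info->{abuse} = ['abuse@example.net'];

# The store is visible through the original structure.
print join(', ', sort keys %{ $host_info{$host}{Email} }), "\n";
```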

SM> No; that would change the logic. What would be needed is:

SM> push @abuseContacts, @{$emails};
SM> my $scalarContacts = "@{$emails}"
SM> if (/abuse/ or $scalarContacts =~ /abuse/);

that is wrong. you didn't make $scalarContacts before you tested it.

SM> I don't; the 2nd push is conditional.

but based on wrong logic. i still don't know your goals here.

SM> It's more complicated than that; there are two arrays and if the
SM> second has any entries then I need to use it in place of the first. So
SM> I'd need to use something like grep to extract the abuse entries if I
SM> didn't select them out into a separate array.

that is very unclear to me. you need to learn how to express your
requirements better. that makes it much easier to code to them. i can't
read your mind so you have to explain it in very clear english.

SM> if (@abuseContacts) {
SM>     print $fhLookup "Abuse contacts: ", join(', ', @abuseContacts), "\n";
SM> } else {
SM>     print $fhLookup "contacts: ", join(', ', @Contacts), "\n";
SM> }

SM> Just for laughs I checked Hash::Util to see if there was an equivalent
SM> to List::Util::first, but no joy.

that makes no sense as there is no order in hashes so first can't
exist. you want any() from Quantum::Superpositions or one of the perl6
modules.

again, please write up VERY clear requirements as it will help you and
us.

uri
 

Tad McClellan

Shmuel (Seymour J.) Metz said:
on 06/09/2006 at 05:20 PM, Uri Guttman said:


No; that would change the logic. What would be needed is:

push @abuseContacts, @{$emails};
my $scalarContacts = "@{$emails}"
if (/abuse/ or $scalarContacts =~ /abuse/);


You flipped the lines from what you posted before, which was:

push @Contacts, @{$host_info{$host}{Email}{$_}};
my $scalarContacts="@{$host_info{$host}{Email}{$_}}";
push @abuseContacts, @{$host_info{$host}{Email}{$_}}
if (/abuse/ or $scalarContacts =~ /abuse/);


Applying both Uri's and my suggestions to that code yields:

push @Contacts, @{$emails};
push @abuseContacts, @{$emails}
if grep /abuse/, $_, @{$emails};

Or, taking advantage of the special case of the reference
being a simple standalone scalar allowing you to leave out
some curlies:

push @Contacts, @$emails;
push @abuseContacts, @$emails
if grep /abuse/, $_, @$emails;

I don't; the 2nd push is conditional.


Not in the code you posted this time.
 

Shmuel (Seymour J.) Metz

on 06/15/2006 said:
it has to be a reference as you created a ref there earlier when you
built the structure. all scalar values in a structure that hold lower
level structures are refs.
Thanks.

that is wrong.

Whoops! I edited the text from the article instead of doing a
cut-and-paste directly from my code. Make that

my $email_contact = $email_info->{$_} ;
push @Contacts, @{$email_contact};
my $scalarContacts="@{$email_contact}";
push @abuseContacts, @{$email_contact}
if (/abuse/ or $scalarContacts =~ /abuse/);
but based on wrong logic.

Please see above.
i still don't know your goals here.

My goal is to construct one of two messages: one containing only the
abuse matches and the other containing all of the array elements,
depending on whether there are any abuse matches. I was trying to
quote only the minimum code needed to provide context, not all 620
lines.
that is very unclear to me.

I'm extracting e-mail contacts from whois data. In some cases there
are multiple contacts for the same role. The abuse contacts might have
the word "abuse" in the addresses or might have it only in the tags.
If there are abuse contacts then I want to put them in a message;
otherwise I want to put all of the e-mail in a different message. That's
part of a larger program that deobfuscates a spam e-mail and attempts
to locate information on the sender and the drop boxes for use in a
complaint.
that makes no sense as there is no order in hashes so first can't
exist. you want any() from Quantum::Superpositions or one of the
perl6 modules.

When will Perl6 be ready for prime time?


Shmuel (Seymour J.) Metz

on 06/15/2006 said:
You flipped the lines from what you posted before,

I edited the text of the article instead of clipping from my code, and
inadvertently left some out. That should be

my $email_contact = $email_info->{$_} ;
push @Contacts, @{$email_contact};
my $scalarContacts="@{$email_contact}";
push @abuseContacts, @{$email_contact}
if (/abuse/ or $scalarContacts =~ /abuse/);
if grep /abuse/, $_, @$emails;

How is that better than scanning the derived scalar? Although it might
be better to do

my $scalarContacts="$_ @$email_contact";
push @abuseContacts, @$email_contact
if $scalarContacts =~ /abuse/;


Ben Morrow

Quoth "Shmuel (Seymour J.) Metz said:
Whoops! I edited the text from the article instead of doing a
cut-and-paste directly from my code. Make that

my $email_contact = $email_info->{$_} ;
push @Contacts, @{$email_contact};
my $scalarContacts="@{$email_contact}";
push @abuseContacts, @{$email_contact}
if (/abuse/ or $scalarContacts =~ /abuse/);

These three lines are equivalent to

push @abuseContacts, @{$email_contact}
if grep /abuse/, $_, @{$email_contact};

except that doesn't waste time (both for the machine and the human
reading the code) stringifying the array; and there aren't problems with

$email_contact = ['fooab', 'use this'];

(though I suspect this isn't an issue in your case).

You do realise this will push *all* of @{$email_contact} onto
@abuseContacts if *any* of them match? From your description below I
can't quite see how this could be what you want.
I'm extracting e-mail contacts from whois data. In some cases there
are multiple contacts for the same role. The abuse contacts might have
the word "abuse" in the addresses or might have it only in the tags.
If there are abuse contacts then I want to put them in a message;

By 'abuse contacts' you mean 'email addresses matching /abuse/', right?
[Side issue: are you sure you mean /abuse/ and not /^abuse\@/ ?]
otherwise I want to put all of the e-mail in a different message.

...so you can read the tags and find the correct addr by hand? Do you
want the whole whois reply, or just all the email addresses in the reply?

In any case, I'd do something like (untested)

my @abuse_addrs;
my @misc_addrs;
my @domains = ...;

for (@domains) {
    my $whois = get_whois_data($_);
    my @emails = extract_email_addrs($whois);

    my @ae = grep /abuse/, @emails;
    if (@ae) {
        push @abuse_addrs, @ae;
    }
    else {
        push @misc_addrs, $whois; # or @emails
    }
}

Or have I misunderstood you?
When will Perl6 be ready for prime time?

Err... not for a while :). Perl5 will be the supported and developed
version of Perl for the foreseeable future. Some features of Perl6 are
available for Perl5 in the modules in the Perl6::* namespace; any() is
in Perl6::Junction (or, as Uri said, in Quantum::Superpositions, though
that's likely much slower); also in List::MoreUtils, which is probably
what I'd use if I needed it.
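
A sketch of any() as a term in a boolean expression. It swaps in the any() that later releases of the core List::Util module also export (version 1.33 and up, an assumption about the installed Perl) rather than List::MoreUtils; the tag and addresses are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(any);   # assumes List::Util 1.33 or later

my $tag    = 'abuse';     # hypothetical hash key (the role tag)
my @emails = ('hostmaster@example.net', 'noc@example.net');

# any() returns true as soon as one element satisfies the block, so
# it can serve as a single term in a larger boolean expression.
my $is_abuse = (any { /abuse/ } $tag, @emails) ? 1 : 0;

print "abuse contact: $is_abuse\n";
```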

Ben
 

Uri Guttman

S(J)M> Whoops! I edited the text from the article instead of doing a
S(J)M> cut-and-paste directly from my code. Make that

always cut/paste real code here. otherwise you waste your and our time.

S(J)M> I'm extracting e-mail contacts from whois data. In some cases there
S(J)M> are multiple contacts for the same role. The abuse contacts might have
S(J)M> the word "abuse" in the addresses or might have it only in the tags.
S(J)M> If there are abuse contacts then I want to put them in a message;
S(J)M> otherwise I want to put all of the e-mail in a different message. That's
S(J)M> part of a larger program that deobfuscates a spam e-mail and attempts
S(J)M> to locate information on the sender and the drop boxes for use in a
S(J)M> complaint.

again, you are somewhat unclear. 'put them in a message' means what? in
the to: fields of email? in the body? 'all of the email' means what? all
addresses in the whole whois record? only those with abuse in the
address? only those emails in a section which mentions abuse? specifying
clean problem requirements is the key to any solution.

i smell an XY problem here. it is always best to explain the original
problem than to ask how to solve it in the way you picked. just going
back to the whois data may make this whole thing much easier. what is
the format of the whois records?

S(J)M> When will Perl6 be ready for prime time?

there are perl6 modules on cpan which are written in perl5. look at the
Perl6:: namespace.

but i will wait until i see the whois stuff. solving your problem from
that level looks like it will be much easier.

uri
 

Tad McClellan

Shmuel (Seymour J.) Metz said:
I edited the text of the article instead of clipping from my code,


Yes, I could tell.

Please follow the posting guidelines to avoid launching such
red herrings in the future.

my $scalarContacts="@{$email_contact}";
push @abuseContacts, @{$email_contact}
if (/abuse/ or $scalarContacts =~ /abuse/);


How is that better than scanning the derived scalar?


1) you don't have to derive any scalar (save a few cycles,
both in silicon and in grey matter)

2) it is not vulnerable to the bug that I pointed out earlier
(That would be reason enough for me to not get used to doing
it that way. Inserting bugs is bad.)

3) you don't end up having the regex engine compile the same
regex multiple times (save more than a few cycles)


When I suggested using grep() in my very first followup, you
dismissed it because it would "complicate the logic".

push @abuseContacts, @{$email_contact}
if grep /abuse/, $_, @{$email_contact};

Those 2 lines are less complicated than your 3 lines quoted above.

Using grep() in this situation *simplifies* the logic (and avoids
the potential bug).
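
Pulling the thread's pieces together, a self-contained sketch of the grep version might look like the following. The %email_info contents are invented, and the final message is built as a string rather than printed to the original $fhLookup handle.

```perl
use strict;
use warnings;

# Hypothetical whois-derived data: role tag => list of addresses.
my %email_info = (
    'abuse' => ['security@example.net'],
    'tech'  => ['noc@example.net'],
);

my (@Contacts, @abuseContacts);
foreach my $tag (sort keys %email_info) {
    my $emails = $email_info{$tag};
    push @Contacts, @$emails;
    # The tag and every address are tested individually, so no
    # joined string is needed.
    push @abuseContacts, @$emails
        if grep /abuse/, $tag, @$emails;
}

# Prefer the abuse contacts when there are any; otherwise list all.
my $message = @abuseContacts
    ? "Abuse contacts: " . join(', ', @abuseContacts)
    : "contacts: " . join(', ', @Contacts);
print "$message\n";
```

Here the address under the 'abuse' tag is selected even though the address itself does not contain the word, which is the case the original /abuse/ test on $_ was meant to cover.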
 
