Count differences between arrays

Steve

Hi all,

I'm trying to count occurrences of elements in array1 that aren't in
array2. Currently I'm doing this by converting one array to a hash and
using 'exists':-

my %foo;
my $score = 0;
@foo{@array2} = ();    # Convert array to hash (for exists): the slice makes each element a key
for (@array1) { $score++ unless exists $foo{$_} }    # count @array1 elements missing from %foo

Whilst this seems to work I'm sure there's a more efficient method.

Any suggestions?

Thanks
 
Jürgen Exner

Steve said:
> I'm trying to count occurrences of elements in array1 that aren't in
> array2.

perldoc -q difference: "How do I compute the difference of two arrays?"

jue
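For comparison, the perlfaq4 answer Jürgen points to works with a single counting hash over both arrays. The following is a minimal sketch in that spirit, not a copy of the FAQ code. It assumes no element repeats within one array, and it returns the symmetric difference (elements in either array but not both), which is broader than Steve's one-sided count. The sub name `symmetric_difference` is illustrative.

```perl
use strict;
use warnings;

# Count how many of the two arrays each element occurs in.
# Assumes no element is repeated within a single array.
sub symmetric_difference {
    my ($first, $second) = @_;
    my %count;
    $count{$_}++ for @$first, @$second;
    # Seen exactly once: present in one array but not the other.
    return grep { $count{$_} == 1 } keys %count;
}

my @array1 = qw(a b c d);
my @array2 = qw(c d e);
my @diff   = symmetric_difference(\@array1, \@array2);
@diff = sort @diff;    # hash keys come back in no particular order
print "@diff\n";       # prints "a b e"
```

With this sample data Steve's one-sided count would be 2 (`a` and `b`); `e` shows up here only because it is in `@array2` alone.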
 
nolo contendere

> Hi all,
>
> I'm trying to count occurrences of elements in array1 that aren't in
> array2. Currently I'm doing this by converting one array to a hash and
> using 'exists':-
>
> my %foo;
> my $score = 0;
> @foo{@array2} = (); # Convert array to hash (for exists)
> for (@array1) { $score++ unless exists $foo{$_} };
>
> Whilst this seems to work I'm sure there's a more efficient method.
>
> Any suggestions?

Which aspect of efficiency are you trying to improve?
 
Steve

> Which aspect of efficiency are you trying to improve?

The code is used in an update to Cleanfeed, the de facto filtering
software operated by Usenet server admins. Each NNTP message is
processed individually through the Cleanfeed filter, so speed is really
the primary driver.

The actual function of this fragment of code is to compare the content
of the Newsgroups and Followup-To headers so that messages which set
Followup-To to groups that aren't in the distribution are negatively
scored.
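A minimal sketch of that check, assuming both headers arrive as plain strings of comma-separated group names. The helper `unlisted_followups` is hypothetical and not part of Cleanfeed's API. It skips the special Followup-To value `poster`, which asks for replies by email rather than naming a group, and it uses `grep` in scalar context, which returns the number of matches, in place of the explicit counting loop.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Cleanfeed: count Followup-To groups
# that are missing from the Newsgroups header.
sub unlisted_followups {
    my ($newsgroups, $followup_to) = @_;
    return 0 unless defined $followup_to;
    $newsgroups = '' unless defined $newsgroups;

    # Group names contain no commas or whitespace, so split on runs of
    # either and drop the empty field a leading separator would produce.
    my @groups    = grep { length } split /[\s,]+/, $newsgroups;
    my @followups = grep { length($_) && $_ ne 'poster' }
                    split /[\s,]+/, $followup_to;

    my %in_newsgroups;
    @in_newsgroups{@groups} = ();    # hash slice: each group becomes a key

    # grep in scalar context returns the number of matching elements.
    return scalar grep { !exists $in_newsgroups{$_} } @followups;
}

my $score = unlisted_followups('comp.lang.perl.misc,comp.lang.perl.moderated',
                               'comp.lang.perl.misc,alt.test');
print "$score\n";    # prints 1
```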
 
nolo contendere

> The code is used in an update to Cleanfeed, the de facto filtering
> software operated by Usenet server admins. Each NNTP message is
> processed individually through the Cleanfeed filter, so speed is really
> the primary driver.
>
> The actual function of this fragment of code is to compare the content
> of the Newsgroups and Followup-To headers so that messages which set
> Followup-To to groups that aren't in the distribution are negatively
> scored.

Will the hash from array2 need to be constructed anew each time you
filter? Can you parallelize the work across all the NNTP messages, and
use a shared hash (or a reasonable facsimile) to perform the lookups?
 
Steve

> Will the hash from array2 need to be constructed anew each time you
> filter? Can you parallelize the work across all the NNTP messages, and
> use a shared hash (or a reasonable facsimile) to perform the lookups?

The hash will be constructed anew for each message processed, as are the
arrays for the Newsgroups and Followup-To contents. Neither array is
likely to have more than 10 elements.
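With ten or so elements per array, building a hash may cost about as much as scanning the short list directly, so timing both is safer than guessing. This sketch uses the core Benchmark module's `cmpthese` to compare the hash-and-`exists` approach with a nested linear scan. The group names are made up, the sub names are illustrative, and results depend on the machine and perl version, so no winner is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data of roughly the size described (<= 10 groups).
my @newsgroups  = map { "alt.test.group$_" } 1 .. 8;
my @followup_to = (@newsgroups[0 .. 3], 'alt.elsewhere', 'misc.other');

# Approach from the thread: build a hash, then test with exists.
sub count_with_hash {
    my ($followups, $groups) = @_;
    my %seen;
    @seen{@$groups} = ();
    return scalar grep { !exists $seen{$_} } @$followups;
}

# Alternative: for each follow-up group, scan the short list directly.
sub count_with_scan {
    my ($followups, $groups) = @_;
    my $score = 0;
    for my $f (@$followups) {
        my $found = 0;
        for my $g (@$groups) {
            if ($f eq $g) { $found = 1; last }    # stop at the first match
        }
        $score++ unless $found;
    }
    return $score;
}

# Both approaches must agree before timing them is meaningful.
die "mismatch\n"
    unless count_with_hash(\@followup_to, \@newsgroups)
        == count_with_scan(\@followup_to, \@newsgroups);

# Run each for at least one CPU second and print a comparison table.
cmpthese(-1, {
    hash => sub { count_with_hash(\@followup_to, \@newsgroups) },
    scan => sub { count_with_scan(\@followup_to, \@newsgroups) },
});
```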
 
nolo contendere

> The hash will be constructed anew for each message processed, as are the
> arrays for the Newsgroups and Followup-To contents. Neither array is
> likely to have more than 10 elements.

What's the purpose of reconstructing the hash each time? Just do it
once in the beginning, if it won't change. Also, if it's static, you
can use it without reservation in parallel processing.
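If the lookup list really were static, the suggestion amounts to building the hash once at load time and doing only lookups per message. A sketch, assuming a hypothetical fixed list of groups; per Steve's description the Newsgroups list changes with every message, so this pattern would only fit a list that genuinely stays the same. The sub name `count_unlisted` is illustrative.

```perl
use strict;
use warnings;

# Hypothetical static list: the hash is filled once, when the file loads.
my %static_groups;
@static_groups{qw(alt.test misc.test)} = ();

# Per message, only lookups happen; the hash is never rebuilt.
sub count_unlisted {
    my @groups = @_;
    return scalar grep { !exists $static_groups{$_} } @groups;
}

my $n = count_unlisted(qw(alt.test comp.lang.perl.misc));
print "$n\n";    # prints 1
```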
 
