Count differences between arrays

Steve · Jan 7, 2008

Hi all,

I'm trying to count occurrences of elements in array1 that aren't in
array2. Currently I'm doing this by converting one array to a hash and
using 'exists':-

my %foo;
my $score = 0;
@foo{@array2} = (); # Convert array to hash (for exists)
for (@array1) { $score++ unless exists $foo{$_} };

Whilst this seems to work I'm sure there's a more efficient method.

Any suggestions?

Thanks

Jürgen Exner · Jan 7, 2008

Steve said:
I'm trying to count occurrences of elements in array1 that aren't in
array2.

perldoc -q difference: "How do I compute the difference of two arrays?"

jue
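
For reference, the one-directional count Steve describes can also be written with grep in scalar context. This is only a minimal sketch and not the FAQ's own code: the FAQ entry, as far as it is remembered here, computes the symmetric difference and assumes each element is unique within its array. The sample group names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data; the thread does not show real values.
my @array1 = qw(comp.lang.perl.misc alt.test misc.test alt.test);
my @array2 = qw(comp.lang.perl.misc misc.test);

# Build a lookup set from @array2 via a hash slice (keys only, values undef).
my %seen;
@seen{@array2} = ();

# grep in scalar context returns the number of matching elements,
# so each duplicate in @array1 is counted separately.
my $score = grep { !exists $seen{$_} } @array1;

print "$score\n";    # prints 2
```

This does the same work as the explicit loop, so any speed difference is likely to be small; it mainly saves a line.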

nolo contendere · Jan 7, 2008

Steve said:
Whilst this seems to work I'm sure there's a more efficient method.

Which aspect of efficiency are you trying to improve?

Steve · Jan 7, 2008

nolo contendere said:
Which aspect of efficiency are you trying to improve?

The code is used in an update to Cleanfeed, the de facto filtering
software operated by Usenet server admins. Each NNTP message is
processed individually through the Cleanfeed filter, so speed is really
the primary driver.

The actual function of this fragment of code is to compare the contents
of the Newsgroups and Followup-To headers, so that messages which set
Followup-To to groups that aren't in the distribution are negatively
scored.
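
The header comparison described above might be sketched as follows. The helper name count_stray_followups and the sample header values are hypothetical, not taken from Cleanfeed; the sketch assumes both headers arrive as raw comma-separated strings.

```perl
use strict;
use warnings;

# Hypothetical helper: returns how many Followup-To groups are absent
# from the Newsgroups header. Both arguments are raw header values.
# Special Followup-To values such as "poster" are not handled here.
sub count_stray_followups {
    my ($newsgroups, $followup_to) = @_;

    # Trim outer whitespace on our private copies of the arguments.
    s/^\s+|\s+$//g for ($newsgroups, $followup_to);

    # Split on commas, tolerating whitespace around them; drop empty items.
    my @dist   = grep { length } split /\s*,\s*/, $newsgroups;
    my @follow = grep { length } split /\s*,\s*/, $followup_to;

    # Lookup set of the groups the message is actually posted to.
    my %in_dist;
    @in_dist{@dist} = ();

    return scalar grep { !exists $in_dist{$_} } @follow;
}

my $score = count_stray_followups(
    'comp.lang.perl.misc,alt.test',
    'alt.test,alt.flame',
);
print "$score\n";    # prints 1
```

Only alt.flame is missing from the Newsgroups list in this example, so the score is 1.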

nolo contendere · Jan 7, 2008

Steve said:
Each NNTP message is processed individually through the Cleanfeed
filter so speed is really the primary driver.

Will the hash from array2 need to be constructed anew each time you
filter? Can you parallelize the work across all the NNTP messages, and
use a shared hash (or a reasonable facsimile) to perform the lookups?

Steve · Jan 8, 2008

nolo contendere said:
Will the hash from array2 need to be constructed anew each time you
filter? Can you parallelize the work across all the NNTP messages, and
use a shared hash (or a reasonable facsimile) to perform the lookups?

The hash will be constructed new for each message processed, as are the
arrays for Newsgroups and Followup-To: content. For both arrays there
are unlikely to be more than 10 elements.
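
With lists this small, it is not obvious whether building a hash per message beats a plain nested scan, so measuring is the safest way to decide. The sketch below uses the core Benchmark module; the data and the sub names via_hash and via_scan are invented for illustration, and the results will depend on the Perl build and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up data sized like the thread describes (10 elements or fewer).
my @array1 = map { "group.$_" } 1 .. 10;
my @array2 = map { "group.$_" } 6 .. 15;

# Hash-slice lookup, as in Steve's original code.
sub via_hash {
    my %seen;
    @seen{@array2} = ();
    return scalar grep { !exists $seen{$_} } @array1;
}

# Nested scan with no hash: more comparisons, but nothing to build.
sub via_scan {
    my $score = 0;
    for my $x (@array1) {
        $score++ unless grep { $_ eq $x } @array2;
    }
    return $score;
}

# Both must agree before timing means anything.
die "mismatch" unless via_hash() == via_scan();

# Run each sub 100_000 times and print a comparison table.
cmpthese(100_000, { hash => \&via_hash, scan => \&via_scan });
```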

nolo contendere · Jan 8, 2008

Steve said:
The hash will be constructed new for each message processed, as are the
arrays for Newsgroups and Followup-To: content. For both arrays there
are unlikely to be more than 10 elements.

What's the purpose of reconstructing the hash each time? Just do it
once in the beginning, if it won't change. Also, if it's static, you
can use it without reservation in parallel processing.
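
The build-once idea can be sketched with a closure. Note the assumption: this only helps when the reference list is fixed, which Steve says is not the case in Cleanfeed, where both arrays are rebuilt for every message. The names make_counter and $count_missing are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical factory: builds the lookup hash once and returns a
# counter sub that reuses it on every later call.
sub make_counter {
    my @reference = @_;
    my %seen;
    @seen{@reference} = ();
    return sub { scalar grep { !exists $seen{$_} } @_ };
}

# The hash is constructed here, a single time.
my $count_missing = make_counter(qw(alt.test misc.test));

# Later calls only perform lookups.
my $first  = $count_missing->(qw(alt.test alt.flame));
my $second = $count_missing->(qw(misc.test));

print "$first\n";     # prints 1
print "$second\n";    # prints 0
```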
