Most efficient way to do set-comparison

Krishna Chaitanya · Mar 13, 2009

Hi all,

If set A defined elements a,b,c (a set of permissible program options,
let's say), and we receive from user options in set B.

I want to make sure that user options in B are exactly same as those
defined in A .... nothing more, nothing less and completely
identical....this represents some kinda set operation if I'm not
wrong...

A U B = A
A int B = A

How to do this most efficiently in Perl...?

Peter Makholm · Mar 13, 2009

Krishna Chaitanya said:
If set A defined elements a,b,c (a set of permissible program options,
let's say), and we receive from user options in set B.

Many set operations can be implemented quite easy by using hash'es
where the keys are the elements in the sets.

//Makholm

Krishna Chaitanya · Mar 13, 2009

let's say), and we receive from user options in set B.

Many set operations can be implemented quite easy by using hash'es
where the keys are the elements in the sets.

Yep...I could make it out, but what is the most efficient way....using
any trick in the book of Perl? I could come up with this:

Listing 1
==========
use warnings;
use strict;

my %legalopts = (a => 1, b => 1, c => 1);

my %opts = (a => 1, c => 1); # failure would be that some elements are
missing (i.e. 'b')
# my %opts = (a => 1, b => 2, c => 3, d => 4); # failure would be that
extra options are present (i.e. 'd')

my @del_list = delete @opts{keys %legalopts};

$num_def = grep {!/^$/} map { defined $_ } @del_list;
die "Some options are missing in opts" unless @del_list == $num_def;

if (scalar keys %opts)
{
print "Extra elements found in opts: ";
print "$_ " foreach keys %opts;
print "\n";
die;
}

Pls. tell me if any better way exists...

Tad J McClellan · Mar 13, 2009

Krishna Chaitanya said:
Hi all,

If set A defined elements a,b,c (a set of permissible program options,
let's say), and we receive from user options in set B.

I want to make sure that user options in B are exactly same as those
defined in A .... nothing more, nothing less and completely
identical....this represents some kinda set operation if I'm not
wrong...

You want the "symmetric difference" to be empty.

A U B = A
A int B = A

How to do this most efficiently in Perl...?

perldoc -q difference

How do I compute the difference of two arrays?
How do I compute the intersection of two arrays?

Gunnar Hjalmarsson · Mar 13, 2009

Krishna said:
let's say), and we receive from user options in set B.

Yep...I could make it out, but what is the most efficient way....using
any trick in the book of Perl? I could come up with this:

Listing 1
==========
use warnings;
use strict;

my %legalopts = (a => 1, b => 1, c => 1);

my %opts = (a => 1, c => 1); # failure would be that some elements are
missing (i.e. 'b')
# my %opts = (a => 1, b => 2, c => 3, d => 4); # failure would be that
extra options are present (i.e. 'd')

my @del_list = delete @opts{keys %legalopts};

$num_def = grep {!/^$/} map { defined $_ } @del_list;
die "Some options are missing in opts" unless @del_list == $num_def;

if (scalar keys %opts)
{
print "Extra elements found in opts: ";
print "$_ " foreach keys %opts;
print "\n";
die;
}

Pls. tell me if any better way exists...

$ perl -le '
%legalopts = (a=>1, b=>1, c=>1); %opts = (a=>1, c=>1);
@L = sort keys %legalopts; @O = sort keys %opts;
print "@L" eq "@O" ? "Ok" : "Not equal"
'
Not equal
$

smallpond · Mar 13, 2009

Hi all,

If set A defined elements a,b,c (a set of permissible program options,
let's say), and we receive from user options in set B.

I want to make sure that user options in B are exactly same as those
defined in A .... nothing more, nothing less and completely
identical....this represents some kinda set operation if I'm not
wrong...

A U B = A
A int B = A

How to do this most efficiently in Perl...?

Set::Scalar has an is_equal method, and overloads ==

Tad J McClellan · Mar 13, 2009

Gunnar Hjalmarsson said:
$ perl -le '
%legalopts = (a=>1, b=>1, c=>1); %opts = (a=>1, c=>1);

Fails with:

%opts = ('a b'=>1, c=>1);

Gunnar Hjalmarsson · Mar 13, 2009

Tad said:
Fails with:

%opts = ('a b'=>1, c=>1);

Ouch! This should fix that:

$ perl -le '
%legalopts = (a=>1, b=>1, c=>1); %opts = (a=>1, b=>1);
@L = sort keys %legalopts; @O = sort keys %opts;
print "@L" eq "@O" && @L == @O ? "Ok" : "Not equal"
'
Not equal
$

OTOH you might object again and ask what happens if:

%legalopts = ("a b"=>1, c=>1); %opts = (a=>1, "b c"=>1);

I give up. This simplified approach is not safe, unless you can preclude
those odd combinations based on the rest of the code.

Dr.Ruud · Mar 14, 2009

Gunnar said:
$ perl -le '
%legalopts = (a=>1, b=>1, c=>1); %opts = (a=>1, c=>1);
@L = sort keys %legalopts; @O = sort keys %opts;
print "@L" eq "@O" ? "Ok" : "Not equal"
'
Not equal

I hate suggestions like that, because the stringifications of the arrays
can be equal when the arrays aren't.

Dr.Ruud · Mar 14, 2009

Krishna said:
If set A defined elements a,b,c (a set of permissible program options,
let's say), and we receive from user options in set B.

I want to make sure that user options in B are exactly same as those
defined in A .... nothing more, nothing less and completely
identical....this represents some kinda set operation if I'm not
wrong...

A U B = A
A int B = A

How to do this most efficiently in Perl...?

See List:Util and List::MoreUtils.

Or call this:

sub cmp_set {
my %a; @a{ @{$_[0]} } = (); delete @a{ @{$_[1]} };
my %b; @b{ @{$_[1]} } = (); delete @b{ @{$_[0]} };
if ( %b ) {
return 2 if %a; # dissimilar
return -1;
}
return %a ? 1 : 0;
}

sln · Mar 14, 2009

Hi all,

If set A defined elements a,b,c (a set of permissible program options,
let's say), and we receive from user options in set B.

I want to make sure that user options in B are exactly same as those
defined in A .... nothing more, nothing less and completely
identical....this represents some kinda set operation if I'm not
wrong...

A U B = A
A int B = A

How to do this most efficiently in Perl...?

You seem (in another post under this thread) to want more than just intersection
and equality. There is no real efficient way to do this under any language, but
if you want to report over/under relationships, this may help.

This is one approach. There may be more generalization that could be done,
perhaps combining all the mappings into one array. Certainly though, there
is no one-liner hiding for this problem.

-sln

-----------------------------------------------------------------------------

## options2.pl
##

use strict;
use warnings;

my $Debug = 1;

my %legalopts = (a => 1, b => 1, c => 1, d => 1);
my %useropts = (a => 1, c => 1, e => 1, f => 1, g => 1);
#my %useropts = (a => 1, b => 1, c => 1, d => 1);

sub mapOptions {
my ($legal, $user) = @_;
my %all = ();
@all { keys %{$legal} } = ( 1 ) x ( keys %{$legal} );
@all { keys %{$user} } = map { !defined $_ ? 2 : 3 } @all { keys %{$user} };
\%all;
}
sub getOptMap {
my ($options, $mask) = @_;
return map { $options->{$_} == $mask ? $_ : () } sort keys %{$options};
}

##
my $all = mapOptions ( \%legalopts, \%useropts );
my @legalonly = getOptMap ( $all, 1 );
my @useronly = getOptMap ( $all, 2 );
my @common = getOptMap ( $all, 3 );

##
if ( $Debug ) {
print "\nDebug:\n";
print "All - ",( sort keys %{$all} ),"\n";
print "Legal missing/non-matching - ",@legalonly,", elements = ", scalar(@legalonly), "\n";
print "User too many/non-matching - ",@useronly,", elements = ", scalar(@useronly), "\n";
print "Common matching - ",@common,", elements = ", scalar(@common), "\n";
}
##
if ( @legalonly || @useronly ) {
print "\nError:\n";
print "Legal options are '",( sort keys %legalopts ),"'\n";
print "Missing options '",@legalonly,"'\n" if (@legalonly);
print "Non-legal options '",@useronly,"'\n" if (@useronly);
} else {
print "\nSucessfully enterred legal options '",( sort keys %useropts ),"' !\n"
}

__END__

output:
-------------

Debug:
All - abcdefg
Legal missing/non-matching - bd, elements = 2
User too many/non-matching - efg, elements = 3
Common matching - ac, elements = 2

Error:
Legal options are 'abcd'
Missing options 'bd'
Non-legal options 'efg'

sln · Mar 14, 2009

[snip]
sub getOptMap {
my ($options, $mask) = @_;
return map { $options->{$_} == $mask ? $_ : () } sort keys %{$options};

^^^^^^
or, no need for return statement, the map ... is just fine

-sln

sln · Mar 16, 2009

You seem (in another post under this thread) to want more than just intersection
and equality. There is no real efficient way to do this under any language, but
if you want to report over/under relationships, this may help.

This is one approach. There may be more generalization that could be done,
perhaps combining all the mappings into one array. Certainly though, there
is no one-liner hiding for this problem.

So this is possible. There is getOptions() and getOptionsNoCase().
I found it interresting for my own use and, I thought I would drop
it back into here for anybody elses use. All mappings are now in one array,
and only one function call is needed per comparison of parameter pairs.
Since all the parts are broken out, there is many possibilities available
for error reporting. If the rigid standards are removed (its just interpretation),
the array of common parameter keys can be silently processed quite easily.
That is of interrest to me.

-sln

======================================

## options3.pl
##

use strict;
use warnings;

# indexes, if fully referenced option is needed
use constant OPT_LEGAL_ONLY => 0;
use constant OPT_USER_ONLY => 1;
use constant OPT_ALL => 2;
use constant OPT_COMMON => 3;

sub getOptions {
my ($legal, $user) = @_;
my @options = ([],[],[],[]);
# -- 'user only' and common
my %legalonly = %{ $legal };
for ( sort keys %{ $user } ) {
push @{ !exists $legal->{ $_ } ? $options[1] : $options[3] }, $_;
delete $legalonly{ $_ };
}
# -- legal only
@{ $options[0] } = sort keys %legalonly;
# -- all
@{ $options[2] } = sort keys %{ { map { $_ => 1 } (keys %{ $legal }, keys %{ $user }) } };
@options;
}

sub getOptionsNoCase {
my ($l, $u) = @_;
my $legal = { map { lc $_, 1 } keys %{ $l } };
my $user = { map { lc $_, 1 } keys %{ $u } };
getOptions( $legal, $user );
}

##

my $Debug = 1;

my %legalopts = (a => 1, b => 1, c => 1, d => 1);
my %useropts = (a => 1, c => 1, e => 1, f => 1, g => 1);
#my %useropts = (a => 1, b => 1, c => 1, d => 1); # try a sucessfull match

my ($legalonly, $useronly, $all, $common) = getOptions ( \%legalopts, \%useropts );

if ( $Debug ) {
print "\nDebug:\n";
print "All - ",@$all, ", elements = ", scalar(@$all), "\n";
print "Legal missing/non-matching - ",@$legalonly,", elements = ", scalar(@$legalonly), "\n";
print "User too many/non-matching - ",@$useronly, ", elements = ", scalar(@$useronly), "\n";
print "Common matching - ",@$common, ", elements = ", scalar(@$common), "\n";
}

if ( @$legalonly || @$useronly ) {
print "\nError:\n";
print "Legal options are '",( sort keys %legalopts ),"'\n";
print "Missing legal user options '",@$legalonly,"'\n" if (@$legalonly);
print "Non-legal user options '",@$useronly,"'\n" if (@$useronly);
} else {
print "\nSucessfully enterred legal options '",( sort keys %useropts ),"' !\n"
}

__END__

output:
-------

Debug:
All - abcdefg, elements = 7
Legal missing/non-matching - bd, elements = 2
User too many/non-matching - efg, elements = 3
Common matching - ac, elements = 2

Error:
Legal options are 'abcd'
Missing legal user options 'bd'
Non-legal user options 'efg'

Krishna Chaitanya · Mar 16, 2009

Thanks, sln. I'd started to think about this problem in relation to a
larger one...namely that of how to design a constructor in the best
way in general for .pm that I write. Some of the common tasks in
constructor would involve:

1. creating hash-ref and blessing it with package name
2. populating keys and values in hash-ref with what the user supplied
(this is where this set-comparison comes of use)
3. doing other "validations" on the values of those user-supplied keys
...
...finally returning the blessed referent.

A major discussion here at my work place is - is it good to put strict
checks in new() and bail out the program if user didn't give any
options to it or gave bad options. I'm of this idea. But several
others feel the new() should be benign and when you call other
methods, it should report problem and bail out.

What would all of you say about this? Is there a general best-practice
for designing constructors?

Also, is there a cool way to create the get-set methods for the
numerous instance-attributes in my classes? Any ready-made CPAN
tricks or something??

sln · Mar 17, 2009

Thanks, sln. I'd started to think about this problem in relation to a
larger one...namely that of how to design a constructor in the best
way in general for .pm that I write. Some of the common tasks in
constructor would involve:

1. creating hash-ref and blessing it with package name
2. populating keys and values in hash-ref with what the user supplied
(this is where this set-comparison comes of use)
3. doing other "validations" on the values of those user-supplied keys
..
..finally returning the blessed referent.

A major discussion here at my work place is - is it good to put strict
checks in new() and bail out the program if user didn't give any
options to it or gave bad options. I'm of this idea. But several
others feel the new() should be benign and when you call other
methods, it should report problem and bail out.

What would all of you say about this? Is there a general best-practice
for designing constructors?

Also, is there a cool way to create the get-set methods for the
numerous instance-attributes in my classes? Any ready-made CPAN
tricks or something??

To me, it all depends on what your module does.

If its a busy module, usually the majority of instance data is module generated
to a defualt state with helper member subs that can change the state during the
course of operations after initialization. So that file-scoped global arrays and
scalars hold some kind of general static data tha is used by the module to generate
and track dynamic instance data to be stuffed into the "$self" blessed hash reference.

Users may pass in arguments (that they can add/change) to new() but ideally, it should
not be a requirement at construction time. So the module should have the default parameters
in $self if none are provided. So in constructors, the module should set up a default state
then any args passed in to the constructor should be passed to another method, along with
the instance reference, that add/change values to the hash. Then that method can be called
later by users to change state at a time past instantiation.
So as a general rule, you want to set up methods both public and private, that change state,
during runtime. It is possible that over the course of usage, the $self hash gets used quite
frequently and it state sometimes needs to be reset before other operations take place.

Beyond state that persists between method calls, most of the argument (parameter) checking
is done within user called methods. Most of the parameters are consumed as local to the method,
however with complex data, processing sometimes permutes to sub calls to private helper methods
and area's of $self are sometimes used for scratch instead of passing around the data again.

It all depends on how you design it. Its based on your needs. If its a big item, you may want
to build a reporting truss within it that can be called by methods. There may be different levels
of errors from fatal to misdemeaner. A good logging system is always recommended.

I would always recommend ways to work around just die'ing if you can avoid it, report the
error and continue. You never know, your module may be used in automation. But if you have
to croak, do it for a good reason, otherwise report an error and return from sub with some
indication of error, but without dying.

For setting/getting $self values, unless its something very distinct, usually the way its done
is to have a setVal() kind of method that accepts a hash. Processing and validation is
done via something like: while (my ($name, $val) = splice (@args, 0, 2))
But there a many many ways to do this, it depends on your needs. You can absolutely force
compliance any way you want.

Good luck.
-sln

Krishna Chaitanya · Mar 17, 2009

So as a general rule, you want to set up methods both public and private, that change state,

during runtime.

The mention of private methods is interesting...esp. in Perl. Since
there are no C++-like keywords (private, et al), it's a lot more work
to ensure privacy in Perl for OO? Like using file-level scoping,
closures, etc? Would love to know how this is done as an example, so
that I make up my own ways of designing classes in future. Any good
CPAN modules you know that have private method implementation?

One of my biggest problems is how to achieve true-blue data
encapsulation in Perl.....much like in C++/Java. Am looking for the
strong OO-like encapsulation with the beauty and terseness of Perl.
What an amazing combo that would be!

Most of the parameters are consumed as local to the method,
however with complex data, processing sometimes permutes to sub calls to private helper methods
and area's of $self are sometimes used for scratch instead of passing around the data again.

Yep, in 1 of the .pm I am writing, I've resorted to create more $self
keys "internally" to store some of the results of my calculations
using user input, but which I want the object to "remember" and not be
evaluated again and again (the user doesn't know about these keys as
they are not documented). So, the first-time call to this method would
calculate something, create a key on $self for it and store it. I know
this won't change with repeated calls to this function, so the
subsequent calls will just examine if this key is defined or not and
return the value if defined.

If its a big item, you may want
to build a reporting truss within it that can be called by methods. There may be different levels
of errors from fatal to misdemeaner. A good logging system is always recommended.

What is a reporting truss, pls? I've searched but didn't find much on
it. Also, about this truss and good logging system, again, could you
pls. site examples of good CPAN modules that already do all this? I am
prepared to read every line of them to understand these concepts. Pls.
don't mind if I ask for examples, but they really drive home all of
you guys' points...

Thanks for all of your time, help...means much.

sln · Mar 18, 2009

The mention of private methods is interesting...esp. in Perl. Since
there are no C++-like keywords (private, et al), it's a lot more work
to ensure privacy in Perl for OO? Like using file-level scoping,
closures, etc? Would love to know how this is done as an example, so
that I make up my own ways of designing classes in future. Any good
CPAN modules you know that have private method implementation?

There probably is. Perl is not really OO though. You can get private subs
via anonymous subs, but most don't need to do it. Users usually don't muck with
unadvertised methods and it won't do what you think anyway.
You can distinguish the private ones with an underscore _func {}, but it is not private.

Read up in the perlmod.pod for basic info on OO stuff "Guidelines for Module Creation":

"Generally anything not exported is still accessible from outside the module using the
ModuleName::item_name (or $blessed_ref->method) syntax. By convention you can use a leading
underscore on names to indicate informally that they are 'internal' and not for public use.
(It is actually possible to get private functions by saying: my $subref = sub { ... }; &$subref;.
But there's no way to call that directly as a method, because a method must have a name in the symbol table.)"

[snip]

What is a reporting truss, pls? I've searched but didn't find much on
it. Also, about this truss and good logging system, again, could you
pls. site examples of good CPAN modules that already do all this? I am
prepared to read every line of them to understand these concepts. Pls.
don't mind if I ask for examples, but they really drive home all of
you guys' points...

I'll see if I can find a primitave example for you.

Below is a way of using anonymous sub methods as private (if its really necessary).

-sln

## Test.pl
# ==================

use strict;
use warnings;

use Ptest;

my $t = Ptest->new();
$t->PrintWorld( 'Hello' );

# print Ptest::$p{_MyPrint},"\n"; <- won't work, %p is file-scoped lexical in Ptest!

__END__

## Ptest.pm
# ==================

use strict;
use warnings;

package Ptest;
our @ISA = qw();

# File scope vars
# -----------------
my %p = (); # hash of private method refs

# User methods
# -------------
sub new
{
my $class = shift;
my $self = {
B => 'World',
C => 'Bye',
};
return bless ($self, $class);
}

sub PrintWorld
{
my ($self, $str) = @_;
$p{_MyPrint} ( $self, 'Hello' );
}

sub PrintBye
{
my $self = shift;
print "$self->{C}!\n";
}

# Private methods
# -----------------

$p{_MyPrint} = sub
{
my ($self, $A) = @_;
print "$A $self->{B}\n";
$self->PrintBye();
};
1;

Krishna Chaitanya · Mar 23, 2009

[snip]

I'll see if I can find a primitave example for you.

Hi sln, could you get your hands on an example yet? Thanks much...

efficient way to process data	0	Jan 12, 2014
most efficient way to get number of files in a directory	9	Jan 3, 2010
What is the most efficient way to compare similar contents in two lists?	12	Jun 13, 2011
Most efficient way to "pre-grow" a list?	30	Nov 6, 2009
Efficient Integer Comparison Algorithm	10	Mar 4, 2008
Memory efficient way to store strings in hash maps using	3	Apr 17, 2012
Newbie question: most efficient way to search fields of this file	9	Apr 14, 2006
Most efficient way to evaluate the contents of a variable.	2	Jul 13, 2007

Most efficient way to do set-comparison

Krishna Chaitanya

Peter Makholm

Krishna Chaitanya

Tad J McClellan

Gunnar Hjalmarsson

smallpond

Tad J McClellan

Gunnar Hjalmarsson

Dr.Ruud

Dr.Ruud

sln

sln

sln

Krishna Chaitanya

sln

Krishna Chaitanya

sln

Krishna Chaitanya

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads