making easy things difficult: @a-@b

W

wana

Suppose I want to create an array made up of all of the elements of @a
except for those that are also contained in @b. For example:

(apple, banana, peach, pear) - (apple, peach) = (banana, pear)

I was hoping that @c = @a - @b; would work. This would really be
making an easy task easy.

Searching the faqs and docs, the solution seems to be a confusing (to
me at least) use of either a hash or grep. I looked at the map
function which seemed appropriate, but I couldn't think of an obvious
way to do it.

I finally settled for the old C way of doing it of having a loop in a
loop to compare each value of one array against each value of the
other.

Is there a better way? Would it be wrong for Perl to make @a-@b
legal?
 
R

Rhesa Rozendaal

wana said:
Suppose I want to create an array made up of all of the elements of @a
except for those that are also contained in @b. For example:

(apple, banana, peach, pear) - (apple, peach) = (banana, pear)

I was hoping that @c = @a - @b; would work. This would really be
making an easy task easy.

If it would do anything, I'd expect it to do vector subtraction:

push @c, (shift(@a) - shift(@b)) while( @a or @b );

But I am not a set theorist.
Searching the faqs and docs, the solution seems to be a confusing (to
me at least) use of either a hash or grep. I looked at the map
function which seemed appropriate, but I couldn't think of an obvious
way to do it.

I finally settled for the old C way of doing it of having a loop in a
loop to compare each value of one array against each value of the
other.

Is there a better way? Would it be wrong for Perl to make @a-@b
legal?

In general yes, because I have a different idea of arrays than you ;)

Have a look at Set::Array,
http://search.cpan.org/~djberg/Set-Array-0.11/Array.pm
It implements a - that you seek.

Rhesa
 
J

Jürgen Exner

wana said:
Suppose I want to create an array made up of all of the elements of @a
except for those that are also contained in @b. For example:

(apple, banana, peach, pear) - (apple, peach) = (banana, pear)

I was hoping that @c = @a - @b; would work. This would really be
making an easy task easy.

Well, if @a = (apple, banana, peach, pear) and @b = (apple, peach) then I
would expect @a - @b to be (0, 0, 0, 0) because the numerical because the
numerical values of every singe element is 0 and 0 - 0 is 0.
But what do I know about vector arithmetic...

If a binary minus on arrays would have such a semantic as you indicated,
then I wonder what you expect to happen if e.g. @a contains the value
"apple" twice. Would both occurences be removed? Only the first? Only the
last?

If you don't want a vector but a set then don't use an array but a hash.
Searching the faqs and docs, the solution seems to be a confusing (to
me at least) use of either a hash or grep.

You don't say _which_ FAQ you checked, but "perldoc -q intersection" seems
to be pretty clear to me.
I looked at the map
function which seemed appropriate, but I couldn't think of an obvious
way to do it.
I finally settled for the old C way of doing it of having a loop in a
loop to compare each value of one array against each value of the
other. Is there a better way?

Absolutely. See the FAQ.
Would it be wrong for Perl to make @a-@b legal?

Wrong? No, why? But it probably wouldn't do what you seem to expect it to
do.
If there would ever be a binary minus on arrays it would more likely be used
for vector substraction because there are other data structures, which are
better suited for set operations than arrays.

jue
 
G

Gunnar Hjalmarsson

wana said:
Suppose I want to create an array made up of all of the elements of
@a except for those that are also contained in @b. For example:

(apple, banana, peach, pear) - (apple, peach) = (banana, pear)

I was hoping that @c = @a - @b; would work. This would really be
making an easy task easy.

Searching the faqs and docs, the solution seems to be a confusing
(to me at least) use of either a hash or grep.

Why not both?

my %b = map { $_, 1 } @b;
my @c = grep !$b{$_}, @a;
 
T

Tassilo v. Parseval

Also sprach wana:
Suppose I want to create an array made up of all of the elements of @a
except for those that are also contained in @b. For example:

(apple, banana, peach, pear) - (apple, peach) = (banana, pear)

I was hoping that @c = @a - @b; would work. This would really be
making an easy task easy.

I'm not convinced that this is such a great idea.
Searching the faqs and docs, the solution seems to be a confusing (to
me at least) use of either a hash or grep. I looked at the map
function which seemed appropriate, but I couldn't think of an obvious
way to do it.

It may not seem obvious why a hash is often the tool of choice when
manipulating lists. However, a hash has two very useful properties. One
is that you cannot have duplicate keys. This is really just a
side-effect from the way hashes work internally. The other one is the
fact that a look-up happens in constant time.

For your problem, both features come in very handy. You'd first fill all
the values of @b into a hash as keys:

my %hash;
@hash{ @b } = ();

Now you use %hash as a filter. You walk through @a and only keep those
values that are not in %hash. This is efficient because a key look-up
takes next to no time:

my @new = grep !exists $hash{$_}, @a;

Note that the values associated with each key in the hash are irrelevant
here.

In a smilar fashion hashes can be used to filter out duplicates or
calculate the intersection of two lists (in which case you'd just have
to negate the grep-condition in the above statement).
I finally settled for the old C way of doing it of having a loop in a
loop to compare each value of one array against each value of the
other.

You can do that, sure, but it becomes impractical for large lists.

Tassilo
 
W

wana

Tassilo v. Parseval said:
Also sprach wana:


I'm not convinced that this is such a great idea.


It may not seem obvious why a hash is often the tool of choice when
manipulating lists. However, a hash has two very useful properties. One
is that you cannot have duplicate keys. This is really just a
side-effect from the way hashes work internally. The other one is the
fact that a look-up happens in constant time.

For your problem, both features come in very handy. You'd first fill all
the values of @b into a hash as keys:

my %hash;
@hash{ @b } = ();

Now you use %hash as a filter. You walk through @a and only keep those
values that are not in %hash. This is efficient because a key look-up
takes next to no time:

my @new = grep !exists $hash{$_}, @a;

Note that the values associated with each key in the hash are irrelevant
here.

In a smilar fashion hashes can be used to filter out duplicates or
calculate the intersection of two lists (in which case you'd just have
to negate the grep-condition in the above statement).


You can do that, sure, but it becomes impractical for large lists.

Tassilo


Thanks! I guess I needed to sleep on it and let it sink in. I woke
up this morning and my first thought was "the key is the key! you just
have to ask if it exists or not." I ran to my computer to look at
everyones examples to see if I was right. I found your beautiful
explanation waiting for me. Maybe what I am asking for is more in
line with the PHP philosophy, which is to provide a simple built in
function for everything so people don't have to think too much. Now I
am glad that there was no built in function so I had a chance to learn
something new.

wana
 
W

wana

// I finally settled for the old C way of doing it of having a loop in a
// loop to compare each value of one array against each value of the
// other.

That's very inefficient. If both arrays contain 1000 elements, you
would be doing 1_000_000 comparisons.
Not if you break out of the inner loop when you find a match. This
will cut back considerably on the number of comparisons.

The solutions from the faq are (nearly) linear.

Yes, after you have the hash created. How long does it take to index
a newly created hash so you can look up those keys so quickly?
 
P

Peter Corlett

wana said:
Not if you break out of the inner loop when you find a match. This
will cut back considerably on the number of comparisons.

It won't cut back any comparisons if there is no match, and even if
there is a match, it's still highly inefficient.

I think you need to read up on algorithmic complexity and big-O
notation to know why such a linear search approach is bad.
 
E

Eric Bohlman

(e-mail address removed) (wana) wrote in
Not if you break out of the inner loop when you find a match. This
will cut back considerably on the number of comparisons.

It won't change the time complexity, though. The run time will still be
roughly proportional to N*M, whereas with the hash solution it will be
roughly proportional to N+M.

Breaking out of the inner loop on a match will cut the run time in half, on
average. But N*M/2 is still greater than N+M for N and M greater than 4.
Yes, after you have the hash created. How long does it take to index
a newly created hash so you can look up those keys so quickly?

Creating a hash is pretty close to linear, so you're still talking N+M.
 
W

wana

It won't cut back any comparisons if there is no match, and even if
there is a match, it's still highly inefficient.

I think you need to read up on algorithmic complexity and big-O
notation to know why such a linear search approach is bad.

I took your advice and did some reading from 'Mastering Algorithms in
C', the O'Reilly book with the seahorse on the cover. There is a
brief chapter on the subject in the beginning.

Of course you are right. My way is O(n^2) the right way is O(n). Any
recommendations on further reading?

I have gotten into the bad habit in my past programming in OO C++ of
encapsulating innefficient code into classes and methods with
indefinite plans to improve later. Of course, if something works and
is hidden, it is easy to forget about, even if it is wrong.

Sorry for my stupidity and thanks for all the help.

wana
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,764
Messages
2,569,564
Members
45,039
Latest member
CasimiraVa

Latest Threads

Top