making easy things difficult: @a-@b

Discussion in 'Perl Misc' started by wana, Oct 1, 2004.

  1. wana

    wana Guest

    Suppose I want to create an array made up of all of the elements of @a
    except for those that are also contained in @b. For example:

    (apple, banana, peach, pear) - (apple, peach) = (banana, pear)

    I was hoping that @c = @a - @b; would work. This would really be
    making an easy task easy.

    Searching the faqs and docs, the solution seems to be a confusing (to
    me at least) use of either a hash or grep. I looked at the map
    function which seemed appropriate, but I couldn't think of an obvious
    way to do it.

    I finally settled for the old C way of doing it of having a loop in a
    loop to compare each value of one array against each value of the
    other.

    Is there a better way? Would it be wrong for Perl to make @a-@b
    legal?
     
    wana, Oct 1, 2004
    #1
    1. Advertising

  2. wana wrote:
    > Suppose I want to create an array made up of all of the elements of @a
    > except for those that are also contained in @b. For example:
    >
    > (apple, banana, peach, pear) - (apple, peach) = (banana, pear)
    >
    > I was hoping that @c = @a - @b; would work. This would really be
    > making an easy task easy.


    If it would do anything, I'd expect it to do vector subtraction:

    push @c, (shift(@a) - shift(@b)) while( @a or @b );

    But I am not a set theorist.

    > Searching the faqs and docs, the solution seems to be a confusing (to
    > me at least) use of either a hash or grep. I looked at the map
    > function which seemed appropriate, but I couldn't think of an obvious
    > way to do it.
    >
    > I finally settled for the old C way of doing it of having a loop in a
    > loop to compare each value of one array against each value of the
    > other.
    >
    > Is there a better way? Would it be wrong for Perl to make @a-@b
    > legal?


    In general yes, because I have a different idea of arrays than you ;)

    Have a look at Set::Array,
    http://search.cpan.org/~djberg/Set-Array-0.11/Array.pm
    It implements a - that you seek.

    Rhesa
     
    Rhesa Rozendaal, Oct 1, 2004
    #2
    1. Advertising

  3. wana wrote:
    > Suppose I want to create an array made up of all of the elements of @a
    > except for those that are also contained in @b. For example:
    >
    > (apple, banana, peach, pear) - (apple, peach) = (banana, pear)
    >
    > I was hoping that @c = @a - @b; would work. This would really be
    > making an easy task easy.


    Well, if @a = (apple, banana, peach, pear) and @b = (apple, peach) then I
    would expect @a - @b to be (0, 0, 0, 0) because the numerical because the
    numerical values of every singe element is 0 and 0 - 0 is 0.
    But what do I know about vector arithmetic...

    If a binary minus on arrays would have such a semantic as you indicated,
    then I wonder what you expect to happen if e.g. @a contains the value
    "apple" twice. Would both occurences be removed? Only the first? Only the
    last?

    If you don't want a vector but a set then don't use an array but a hash.

    > Searching the faqs and docs, the solution seems to be a confusing (to
    > me at least) use of either a hash or grep.


    You don't say _which_ FAQ you checked, but "perldoc -q intersection" seems
    to be pretty clear to me.

    > I looked at the map
    > function which seemed appropriate, but I couldn't think of an obvious
    > way to do it.


    > I finally settled for the old C way of doing it of having a loop in a
    > loop to compare each value of one array against each value of the
    > other. Is there a better way?


    Absolutely. See the FAQ.

    > Would it be wrong for Perl to make @a-@b legal?


    Wrong? No, why? But it probably wouldn't do what you seem to expect it to
    do.
    If there would ever be a binary minus on arrays it would more likely be used
    for vector substraction because there are other data structures, which are
    better suited for set operations than arrays.

    jue
     
    Jürgen Exner, Oct 1, 2004
    #3
  4. wana wrote:
    > Suppose I want to create an array made up of all of the elements of
    > @a except for those that are also contained in @b. For example:
    >
    > (apple, banana, peach, pear) - (apple, peach) = (banana, pear)
    >
    > I was hoping that @c = @a - @b; would work. This would really be
    > making an easy task easy.
    >
    > Searching the faqs and docs, the solution seems to be a confusing
    > (to me at least) use of either a hash or grep.


    Why not both?

    my %b = map { $_, 1 } @b;
    my @c = grep !$b{$_}, @a;

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Oct 1, 2004
    #4
  5. Also sprach wana:

    > Suppose I want to create an array made up of all of the elements of @a
    > except for those that are also contained in @b. For example:
    >
    > (apple, banana, peach, pear) - (apple, peach) = (banana, pear)
    >
    > I was hoping that @c = @a - @b; would work. This would really be
    > making an easy task easy.


    I'm not convinced that this is such a great idea.

    > Searching the faqs and docs, the solution seems to be a confusing (to
    > me at least) use of either a hash or grep. I looked at the map
    > function which seemed appropriate, but I couldn't think of an obvious
    > way to do it.


    It may not seem obvious why a hash is often the tool of choice when
    manipulating lists. However, a hash has two very useful properties. One
    is that you cannot have duplicate keys. This is really just a
    side-effect from the way hashes work internally. The other one is the
    fact that a look-up happens in constant time.

    For your problem, both features come in very handy. You'd first fill all
    the values of @b into a hash as keys:

    my %hash;
    @hash{ @b } = ();

    Now you use %hash as a filter. You walk through @a and only keep those
    values that are not in %hash. This is efficient because a key look-up
    takes next to no time:

    my @new = grep !exists $hash{$_}, @a;

    Note that the values associated with each key in the hash are irrelevant
    here.

    In a smilar fashion hashes can be used to filter out duplicates or
    calculate the intersection of two lists (in which case you'd just have
    to negate the grep-condition in the above statement).

    > I finally settled for the old C way of doing it of having a loop in a
    > loop to compare each value of one array against each value of the
    > other.


    You can do that, sure, but it becomes impractical for large lists.

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
     
    Tassilo v. Parseval, Oct 2, 2004
    #5
  6. wana

    wana Guest

    "Tassilo v. Parseval" <> wrote in message news:<>...
    > Also sprach wana:
    >
    > > Suppose I want to create an array made up of all of the elements of @a
    > > except for those that are also contained in @b. For example:
    > >
    > > (apple, banana, peach, pear) - (apple, peach) = (banana, pear)
    > >
    > > I was hoping that @c = @a - @b; would work. This would really be
    > > making an easy task easy.

    >
    > I'm not convinced that this is such a great idea.
    >
    > > Searching the faqs and docs, the solution seems to be a confusing (to
    > > me at least) use of either a hash or grep. I looked at the map
    > > function which seemed appropriate, but I couldn't think of an obvious
    > > way to do it.

    >
    > It may not seem obvious why a hash is often the tool of choice when
    > manipulating lists. However, a hash has two very useful properties. One
    > is that you cannot have duplicate keys. This is really just a
    > side-effect from the way hashes work internally. The other one is the
    > fact that a look-up happens in constant time.
    >
    > For your problem, both features come in very handy. You'd first fill all
    > the values of @b into a hash as keys:
    >
    > my %hash;
    > @hash{ @b } = ();
    >
    > Now you use %hash as a filter. You walk through @a and only keep those
    > values that are not in %hash. This is efficient because a key look-up
    > takes next to no time:
    >
    > my @new = grep !exists $hash{$_}, @a;
    >
    > Note that the values associated with each key in the hash are irrelevant
    > here.
    >
    > In a smilar fashion hashes can be used to filter out duplicates or
    > calculate the intersection of two lists (in which case you'd just have
    > to negate the grep-condition in the above statement).
    >
    > > I finally settled for the old C way of doing it of having a loop in a
    > > loop to compare each value of one array against each value of the
    > > other.

    >
    > You can do that, sure, but it becomes impractical for large lists.
    >
    > Tassilo



    Thanks! I guess I needed to sleep on it and let it sink in. I woke
    up this morning and my first thought was "the key is the key! you just
    have to ask if it exists or not." I ran to my computer to look at
    everyones examples to see if I was right. I found your beautiful
    explanation waiting for me. Maybe what I am asking for is more in
    line with the PHP philosophy, which is to provide a simple built in
    function for everything so people don't have to think too much. Now I
    am glad that there was no built in function so I had a chance to learn
    something new.

    wana
     
    wana, Oct 2, 2004
    #6
  7. wana

    wana Guest

    > // I finally settled for the old C way of doing it of having a loop in a
    > // loop to compare each value of one array against each value of the
    > // other.
    >
    > That's very inefficient. If both arrays contain 1000 elements, you
    > would be doing 1_000_000 comparisons.
    >

    Not if you break out of the inner loop when you find a match. This
    will cut back considerably on the number of comparisons.


    > The solutions from the faq are (nearly) linear.
    >


    Yes, after you have the hash created. How long does it take to index
    a newly created hash so you can look up those keys so quickly?
     
    wana, Oct 2, 2004
    #7
  8. wana <> wrote:
    [...]
    > Not if you break out of the inner loop when you find a match. This
    > will cut back considerably on the number of comparisons.


    It won't cut back any comparisons if there is no match, and even if
    there is a match, it's still highly inefficient.

    I think you need to read up on algorithmic complexity and big-O
    notation to know why such a linear search approach is bad.

    --
    There are three reasons for becoming a writer: the first is that you need the
    money; the second that you have something to say that you think the world
    should know; the third is that you can't think what to do with the long winter
    evenings. - Quentin Crisp
     
    Peter Corlett, Oct 2, 2004
    #8
  9. wana

    Eric Bohlman Guest

    (wana) wrote in
    news::

    >> // I finally settled for the old C way of doing it of having a loop
    >> in a // loop to compare each value of one array against each value
    >> of the // other.
    >>
    >> That's very inefficient. If both arrays contain 1000 elements, you
    >> would be doing 1_000_000 comparisons.
    >>

    > Not if you break out of the inner loop when you find a match. This
    > will cut back considerably on the number of comparisons.


    It won't change the time complexity, though. The run time will still be
    roughly proportional to N*M, whereas with the hash solution it will be
    roughly proportional to N+M.

    Breaking out of the inner loop on a match will cut the run time in half, on
    average. But N*M/2 is still greater than N+M for N and M greater than 4.

    >> The solutions from the faq are (nearly) linear.
    >>

    >
    > Yes, after you have the hash created. How long does it take to index
    > a newly created hash so you can look up those keys so quickly?


    Creating a hash is pretty close to linear, so you're still talking N+M.
     
    Eric Bohlman, Oct 2, 2004
    #9
  10. wana

    wana Guest

    (Peter Corlett) wrote in message news:<415ede2f$0$94921$>...
    > wana <> wrote:
    > [...]
    > > Not if you break out of the inner loop when you find a match. This
    > > will cut back considerably on the number of comparisons.

    >
    > It won't cut back any comparisons if there is no match, and even if
    > there is a match, it's still highly inefficient.
    >
    > I think you need to read up on algorithmic complexity and big-O
    > notation to know why such a linear search approach is bad.


    I took your advice and did some reading from 'Mastering Algorithms in
    C', the O'Reilly book with the seahorse on the cover. There is a
    brief chapter on the subject in the beginning.

    Of course you are right. My way is O(n^2) the right way is O(n). Any
    recommendations on further reading?

    I have gotten into the bad habit in my past programming in OO C++ of
    encapsulating innefficient code into classes and methods with
    indefinite plans to improve later. Of course, if something works and
    is hidden, it is easy to forget about, even if it is wrong.

    Sorry for my stupidity and thanks for all the help.

    wana
     
    wana, Oct 3, 2004
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. asj
    Replies:
    0
    Views:
    371
  2. shadow427
    Replies:
    19
    Views:
    7,378
    Larry Coon
    Nov 18, 2005
  3. =?Utf-8?B?V2lsbGlhbSBTdWxsaXZhbg==?=

    vs2005 publish website doing bad things, bad things

    =?Utf-8?B?V2lsbGlhbSBTdWxsaXZhbg==?=, Oct 25, 2006, in forum: ASP .Net
    Replies:
    1
    Views:
    616
    =?Utf-8?B?UGV0ZXIgQnJvbWJlcmcgW0MjIE1WUF0=?=
    Oct 25, 2006
  4. Nick Keighley
    Replies:
    5
    Views:
    303
    Andre Kostur
    Aug 4, 2005
  5. pillbug
    Replies:
    1
    Views:
    547
    Alf P. Steinbach
    May 6, 2006
Loading...

Share This Page