Effective perl function to remove one element from array?

Discussion in 'Perl Misc' started by josh, May 30, 2004.

  1. josh

    josh Guest

    Hi all:

    I am trying to find out the most efficient way to remove an element
    from an array, and have the array size shrink by one.

    pop, push, and splice won't work too well for me, I am trying to
    remove an element that could be in any position.

    Here is my current implementation (may not actually compile, just a
    example)

    <psuedo perl code>
    # this array is purposely unsorted
    @array = ['bob', 'alice', 'david', 'christy', 'edward', 'henry',
    'frank'];
    @array = removeFromArray ( @array, "edward" );

    sub removeFromArray {
    @array = $_[0];
    $name = $_[1];

    for ( $i = 0; $i < @$array; $i++ ) {
    if ( $array[$i] eq $name ) {
    unset $array[$i];
    }
    }
    }
    </psuedo perl code>

    As you can see, this is not exactly the most efficient way of removing
    an element as the size of my array grows. And especially when I run
    this in a nested loop, say, when I want to compare two arrays and
    remove the differences, it will result in a close to BigO(n^2)
    efficiency.

    Is there a faster, more efficient way to remove an element from an
    array (and preferrably not BigO(n) )?

    Will it help if I am performinig this on an already sorted array? Then
    I can perhaps use some sort of binary search function to find the item
    I am looking for?
     
    josh, May 30, 2004
    #1
    1. Advertising

  2. josh

    Matt Garrish Guest

    "josh" <> wrote in message
    news:...
    > Hi all:
    >
    > I am trying to find out the most efficient way to remove an element
    > from an array, and have the array size shrink by one.
    >
    > pop, push, and splice won't work too well for me, I am trying to
    > remove an element that could be in any position.
    >
    > Here is my current implementation (may not actually compile, just a
    > example)
    >
    > <psuedo perl code>
    > # this array is purposely unsorted
    > @array = ['bob', 'alice', 'david', 'christy', 'edward', 'henry',
    > 'frank'];
    > @array = removeFromArray ( @array, "edward" );
    >
    > sub removeFromArray {
    > @array = $_[0];
    > $name = $_[1];
    >
    > for ( $i = 0; $i < @$array; $i++ ) {
    > if ( $array[$i] eq $name ) {
    > unset $array[$i];
    > }
    > }
    > }
    > </psuedo perl code>
    >
    > As you can see, this is not exactly the most efficient way of removing
    > an element as the size of my array grows.


    Which begs the question, why use an array when a hash would work better?

    my %hash = (bob => 1, alice => 1, david => 1, christy =>1, edward => 1,
    henry => 1);

    delete $hash{'edward'};

    It would also make removing duplicates simple because you could just assign
    all the keys from your existing hashes to a new hash. And sorting would be
    faster, too.

    Matt
     
    Matt Garrish, May 30, 2004
    #2
    1. Advertising

  3. josh wrote:
    > I am trying to find out the most efficient way to remove an element
    > from an array, and have the array size shrink by one.
    >
    > pop, push, and splice won't work too well for me, I am trying to
    > remove an element that could be in any position.


    I would doubt that you can hand code anything that is faster than the
    build-in function splice().If there were anything faster, then chances are
    that algorithm would have been used to implement splice() in the first
    place.

    > Here is my current implementation (may not actually compile, just a
    > example)
    >
    > <psuedo perl code>
    > # this array is purposely unsorted
    > @array = ['bob', 'alice', 'david', 'christy', 'edward', 'henry',
    > 'frank'];
    > @array = removeFromArray ( @array, "edward" );
    >
    > sub removeFromArray {
    > @array = $_[0];
    > $name = $_[1];
    >
    > for ( $i = 0; $i < @$array; $i++ ) {
    > if ( $array[$i] eq $name ) {
    > unset $array[$i];
    > }
    > }
    > }
    > </psuedo perl code>
    >
    > As you can see, this is not exactly the most efficient way of removing
    > an element as the size of my array grows. And especially when I run
    > this in a nested loop, say, when I want to compare two arrays and
    > remove the differences, it will result in a close to BigO(n^2)
    > efficiency.
    > Is there a faster, more efficient way to remove an element from an
    > array (and preferrably not BigO(n) )?


    This code screams for a better data structure. Why aren't you using a hash?
    Then all you need to do is
    sub removeFromHash {
    delete $hash{$_[0]};
    }
    I would guess this should be somewhere near O(log n) (access to hash table
    is not quite linear).

    > Will it help if I am performinig this on an already sorted array? Then
    > I can perhaps use some sort of binary search function to find the item
    > I am looking for?


    Well, sure, but why? I thought your problem was deleting, not finding. A
    sorted array won't help you with deleting an element.
    Use a proper data structure and your problem becomes trivial.

    jue
     
    Jürgen Exner, May 30, 2004
    #3
  4. josh

    John Bokma Guest

    John Bokma, May 30, 2004
    #4
  5. John Bokma wrote:
    > Jürgen Exner wrote:
    >
    >> I would guess this should be somewhere near O(log n) (access to hash
    >> table is not quite linear).


    Ooopps, that should have been "not quite constant"

    > Access to a hash table is normally O(1). That's the whole idea behind
    > a hash :-D.


    True, but only as long as the hash is sparsely populated.
    As the hash begins to fill up you will get hash conflicts and then you are
    loosing O(1).

    jue
     
    Jürgen Exner, May 30, 2004
    #5
  6. josh

    Uri Guttman Guest

    >>>>> "JE" == Jürgen Exner <> writes:

    JE> John Bokma wrote:
    >> Jürgen Exner wrote:
    >>
    >>> I would guess this should be somewhere near O(log n) (access to hash
    >>> table is not quite linear).


    JE> Ooopps, that should have been "not quite constant"

    >> Access to a hash table is normally O(1). That's the whole idea behind
    >> a hash :-D.


    JE> True, but only as long as the hash is sparsely populated.
    JE> As the hash begins to fill up you will get hash conflicts and then you are
    JE> loosing O(1).

    and perl will grow the hash as needed to keep it efficient. one of the
    nice little behind the scenes things. true it is not exactly O(1) but
    you can assume that for common uses. the real killer is when it grows so
    large that you have to swap :). then you might as well use a real db
    which will be more efficient since they are designed to handle this
    whereas an in ram hash is not.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
     
    Uri Guttman, May 30, 2004
    #6
  7. josh

    John Bokma Guest

    Jürgen Exner wrote:

    > John Bokma wrote:
    >
    >>Jürgen Exner wrote:
    >>
    >>
    >>>I would guess this should be somewhere near O(log n) (access to hash
    >>>table is not quite linear).

    >
    > Ooopps, that should have been "not quite constant"
    >
    >>Access to a hash table is normally O(1). That's the whole idea behind
    >>a hash :-D.

    >
    > True, but only as long as the hash is sparsely populated.
    > As the hash begins to fill up you will get hash conflicts and then you are
    > loosing O(1).


    Ok, you get O(k), which is still O(1) as long as k << n, which is
    hopefully the case :-D.

    --
    John MexIT: http://johnbokma.com/mexit/
    personal page: http://johnbokma.com/
    Experienced Perl programmer available: http://castleamber.com/
     
    John Bokma, May 30, 2004
    #7
  8. josh

    John Bokma Guest

    Uri Guttman wrote:

    > and perl will grow the hash as needed to keep it efficient. one of the
    > nice little behind the scenes things. true it is not exactly O(1) but


    The fun part behind the big O notation is that O(k) for k << n, is still
    O(1). So "not exactly O(1)" does not exists.

    --
    John MexIT: http://johnbokma.com/mexit/
    personal page: http://johnbokma.com/
    Experienced Perl programmer available: http://castleamber.com/
     
    John Bokma, May 30, 2004
    #8
  9. Also sprach Jürgen Exner:

    > josh wrote:


    >> Is there a faster, more efficient way to remove an element from an
    >> array (and preferrably not BigO(n) )?

    >
    > This code screams for a better data structure. Why aren't you using a hash?
    > Then all you need to do is
    > sub removeFromHash {
    > delete $hash{$_[0]};
    > }
    > I would guess this should be somewhere near O(log n) (access to hash table
    > is not quite linear).


    Hash-access is O(n) in the absolutely worst-case (for open hashing where
    linear lists are used). However, the average case can be considered
    constant, too, which has to do with the fact that hashes are resized
    (normally their amount of buckets is doubled). So you have a complexity
    of O(1 + a/2), where 'a' is the average list-length with a = m/n. 'm' is
    the number of keys whereas 'n' is the number of buckets. In case of
    Perl, some empirical tests suggest that 'a' is between 0.35 and 0.7.
    When treating it as a random variable, I am quite sure that it will have
    an expected value of 0.5. The standard deviation, I would guess, is then
    probably around 0.1 or even less.

    It is therefore reasonable to assume that access to Perl hashes is
    constant in all but the contrived cases.

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
     
    Tassilo v. Parseval, May 30, 2004
    #9
  10. Also sprach John Bokma:

    > Jürgen Exner wrote:


    >> True, but only as long as the hash is sparsely populated.
    >> As the hash begins to fill up you will get hash conflicts and then you are
    >> loosing O(1).

    >
    > Ok, you get O(k), which is still O(1) as long as k << n, which is
    > hopefully the case :-D.


    Why would that be the case? If you have O(ln(n)) and set n to 10^16,
    then k is around 37. That should qualify as k << n but still, it is not
    constant. It can't be constant as long as it is parametrized with n and
    the function is monotonously increasing with n (log is such a function).

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
     
    Tassilo v. Parseval, May 30, 2004
    #10
  11. josh

    Anno Siegel Guest

    Jürgen Exner <> wrote in comp.lang.perl.misc:
    > josh wrote:
    > > I am trying to find out the most efficient way to remove an element
    > > from an array, and have the array size shrink by one.
    > >
    > > pop, push, and splice won't work too well for me, I am trying to
    > > remove an element that could be in any position.

    >
    > I would doubt that you can hand code anything that is faster than the
    > build-in function splice().If there were anything faster, then chances are
    > that algorithm would have been used to implement splice() in the first
    > place.
    >
    > > Here is my current implementation (may not actually compile, just a
    > > example)
    > >
    > > <psuedo perl code>
    > > # this array is purposely unsorted
    > > @array = ['bob', 'alice', 'david', 'christy', 'edward', 'henry',
    > > 'frank'];
    > > @array = removeFromArray ( @array, "edward" );
    > >
    > > sub removeFromArray {
    > > @array = $_[0];
    > > $name = $_[1];
    > >
    > > for ( $i = 0; $i < @$array; $i++ ) {
    > > if ( $array[$i] eq $name ) {
    > > unset $array[$i];
    > > }
    > > }
    > > }
    > > </psuedo perl code>
    > >
    > > As you can see, this is not exactly the most efficient way of removing
    > > an element as the size of my array grows. And especially when I run
    > > this in a nested loop, say, when I want to compare two arrays and
    > > remove the differences, it will result in a close to BigO(n^2)
    > > efficiency.
    > > Is there a faster, more efficient way to remove an element from an
    > > array (and preferrably not BigO(n) )?

    >
    > This code screams for a better data structure. Why aren't you using a hash?
    > Then all you need to do is
    > sub removeFromHash {
    > delete $hash{$_[0]};
    > }


    The OP's pseudo-code (heavy on the pseudo, light on the code) looks like
    an attempt to re-implement grep(). In real (but untested) code:

    sub remove_from_array {
    my ( $array, $name) = @_;
    @$array = grep $_ ne $name, @$array;
    $array; # return ref to changed array
    }

    Depending on what the OP means with "unset $array[ $i]", the grep
    line could also be

    @$array = map { $_ eq $name ? undef : $_ } @$array;

    It is probably true that a hash is a better data structure for the
    problem -- "remove the differences" is a well-known keyword -- but
    "grep" seems to be what OP was aiming at.

    Anno
     
    Anno Siegel, May 30, 2004
    #11
  12. josh

    Uri Guttman Guest

    >>>>> "JB" == John Bokma <> writes:

    JB> Uri Guttman wrote:
    >> and perl will grow the hash as needed to keep it efficient. one of the
    >> nice little behind the scenes things. true it is not exactly O(1) but


    JB> The fun part behind the big O notation is that O(k) for k << n, is
    JB> still O(1). So "not exactly O(1)" does not exists.

    well, it does in a way. extending hashes and handling dense ones will
    cause the behaviour to be mostly O(1) with a touch of O(N) in it. you
    have to factor in the cost of extending the hash. so it is some other
    O() function thing which is between 1 and N. i am not about to calculate
    it. it is not constant as the size of the hash does affect things. but
    the effect is not just linear with N. but saying it is close to O(1) is
    fine for practical uses.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
     
    Uri Guttman, May 30, 2004
    #12
  13. josh wrote:
    >
    > I am trying to find out the most efficient way to remove an element
    > from an array, and have the array size shrink by one.
    >
    > pop, push, and splice won't work too well for me, I am trying to
    > remove an element that could be in any position.
    >
    > Here is my current implementation (may not actually compile, just a
    > example)
    >
    > <psuedo perl code>
    > # this array is purposely unsorted
    > @array = ['bob', 'alice', 'david', 'christy', 'edward', 'henry',
    > 'frank'];
    > @array = removeFromArray ( @array, "edward" );
    >
    > sub removeFromArray {
    > @array = $_[0];
    > $name = $_[1];
    >
    > for ( $i = 0; $i < @$array; $i++ ) {
    > if ( $array[$i] eq $name ) {
    > unset $array[$i];
    > }
    > }
    > }
    > </psuedo perl code>
    >
    > As you can see, this is not exactly the most efficient way of removing
    > an element as the size of my array grows. And especially when I run
    > this in a nested loop, say, when I want to compare two arrays and
    > remove the differences, it will result in a close to BigO(n^2)
    > efficiency.
    >
    > Is there a faster, more efficient way to remove an element from an
    > array (and preferrably not BigO(n) )?
    >
    > Will it help if I am performinig this on an already sorted array? Then
    > I can perhaps use some sort of binary search function to find the item
    > I am looking for?


    This will work if there is only one of $name in @array:

    for my $i ( 0 .. $#array ) {
    if ( $array[ $i ] eq $name ) {
    splice @array, $i, 1;
    last;
    }
    }

    However, if you have multiple $name entries in @array then you need this
    instead:

    for my $i ( reverse 0 .. $#array ) {
    if ( $array[ $i ] eq $name ) {
    splice @array, $i, 1;
    }
    }


    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, May 30, 2004
    #13
  14. josh

    John Bokma Guest

    Tassilo v. Parseval wrote:

    > Also sprach John Bokma:
    >
    >>Jürgen Exner wrote:

    >
    >
    >>>True, but only as long as the hash is sparsely populated.
    >>>As the hash begins to fill up you will get hash conflicts and then you are
    >>>loosing O(1).

    >>
    >>Ok, you get O(k), which is still O(1) as long as k << n, which is
    >>hopefully the case :-D.

    >
    > Why would that be the case?


    In the above context it is. k is some constant, not related to n, so k
    is not f(n), otherwise I would have written that.

    --
    John MexIT: http://johnbokma.com/mexit/
    personal page: http://johnbokma.com/
    Experienced Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Jun 6, 2004
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    4
    Views:
    2,538
  2. Leon Pollak
    Replies:
    0
    Views:
    404
    Leon Pollak
    Mar 17, 2009
  3. Xah Lee

    whereabouts of Effective Perl

    Xah Lee, Sep 21, 2005, in forum: Perl Misc
    Replies:
    1
    Views:
    103
    Ian Wilson
    Sep 21, 2005
  4. Rita
    Replies:
    14
    Views:
    165
    Jahagirdar Vijayvithal S
    Dec 5, 2005
  5. Owen_Townsend
    Replies:
    5
    Views:
    172
    Stephan Titard
    Jul 10, 2006
Loading...

Share This Page