Delete elements in array, break, and keep changes?

Discussion in 'Ruby' started by Joe Buck, Dec 31, 2009.

  1. Joe Buck

    Joe Buck Guest

    I have an array of a lot elements that I need to cluster (they are
    latitudes and longitudes).

    As I iterate the array I want to remove elements that I have already
    clustered.

    Currently I've tried using Array::reject! and Array::delete_if. It
    works. I can further optimize by breaking from the delete_if/reject!
    block if certiain conditions are met.

    Here is the issue, if I call break in the block, all of the changes to
    the array are lost.

    Is there a way to break from the block but still delete/reject elements
    in the array?
    --
    Posted via http://www.ruby-forum.com/.
    Joe Buck, Dec 31, 2009
    #1
    1. Advertising


  2. > I have an array of a lot elements that I need to cluster (they are
    > latitudes and longitudes).
    >=20
    > As I iterate the array I want to remove elements that I have already
    > clustered.
    >=20


    In this case=2C it looks like Array#shift might work=2C though I'm not sure=
    I totally understand your requirements. Something like:

    until long_lat_array.empty?
    long_lat =3D long_lat_array.shift
    cluster(long_lat)
    end

    Hope I'm not missing your issue.

    - Ehsan
    =20
    _________________________________________________________________
    Hotmail: Powerful Free email with security by Microsoft.
    http://clk.atdmt.com/GBL/go/171222986/direct/01/=
    Ehsanul Hoque, Jan 1, 2010
    #2
    1. Advertising

  3. Joe Buck

    Joe Buck Guest

    Ehsan,

    Thanks for the help. Unfortunately I don't think that will work as it
    removes the first element in the array.

    What I need is an Array::delete_if or Array::reject! call that allows me
    to break from the block while still removing elements.

    My solution is to use a second array to keep the indices of all removed
    elements. I can then iterate through the second array after the first
    loop and remove the elements. I'm guessing reject! and delete_if do this
    as well. It could be more efficient but it works.

    Thanks!
    --
    Posted via http://www.ruby-forum.com/.
    Joe Buck, Jan 1, 2010
    #3
  4. On 01.01.2010 17:45, Joe Buck wrote:
    > Ehsan,
    >
    > Thanks for the help. Unfortunately I don't think that will work as it
    > removes the first element in the array.


    def copy_and_shift_array array
    temp_array = array # or perform a deep copy, if you are worried about
    # about the first array's integrity.
    temp_array.each do |element|
    return element if condition_met?
    end
    end

    Array#shift is useful here *because* it gives you the first Array
    element in an easily accessible format (the actual Object in the Array),
    so that you can perform your checks and other magic, before storing it
    someplace else.

    Note: Above code is untested, but should work, at least in principle.

    (I'm always a foggy when it comes to blocks and variable assignment in
    Ruby, despite many ruby-talk discussions about scope in Ruby).

    --
    Phillip Gawlowski
    Phillip Gawlowski, Jan 1, 2010
    #4
  5. Joe Buck

    pharrington Guest

    On Jan 1, 11:45 am, Joe Buck <> wrote:
    > Ehsan,
    >
    > Thanks for the help. Unfortunately I don't think that will work as it
    > removes the first element in the array.
    >
    > What I need is an Array::delete_if or Array::reject! call that allows me
    > to break from the block while still removing elements.
    >
    > My solution is to use a second array to keep the indices of all removed
    > elements. I can then iterate through the second array after the first
    > loop and remove the elements. I'm guessing reject! and delete_if do this
    > as well. It could be more efficient but it works.
    >
    > Thanks!
    > --
    > Posted viahttp://www.ruby-forum.com/.


    Without knowing the point of what you're trying to do, this sounds
    like what you have in mind:

    array = [] # or whatev
    process_array = true
    array.reject!(x) do
    if process_array
    conditional_code_for x
    else
    false
    end
    pharrington, Jan 1, 2010
    #5
  6. Joe Buck

    pharrington Guest

    On Jan 1, 2:18 pm, pharrington <> wrote:
    > On Jan 1, 11:45 am, Joe Buck <> wrote:
    >
    >
    >
    >
    >
    > > Ehsan,

    >
    > > Thanks for the help. Unfortunately I don't think that will work as it
    > > removes the first element in the array.

    >
    > > What I need is an Array::delete_if or Array::reject! call that allows me
    > > to break from the block while still removing elements.

    >
    > > My solution is to use a second array to keep the indices of all removed
    > > elements. I can then iterate through the second array after the first
    > > loop and remove the elements. I'm guessing reject! and delete_if do this
    > > as well. It could be more efficient but it works.

    >
    > > Thanks!
    > > --
    > > Posted viahttp://www.ruby-forum.com/.

    >
    > Without knowing the point of what you're trying to do, this sounds
    > like what you have in mind:
    >
    > array = [] # or whatev
    > process_array = true
    > array.reject!(x) do
    >   if process_array
    >     conditional_code_for x
    >   else
    >     false
    > end


    Also, is the reject! really the bottleneck in your app? We obviously
    don't want to intentionally write sub-optimal code, but do not
    optimize what's not slowing you down.
    pharrington, Jan 1, 2010
    #6
  7. Joe Buck

    pharrington Guest

    On Jan 1, 2:19 pm, pharrington <> wrote:
    > On Jan 1, 2:18 pm, pharrington <> wrote:
    >
    >
    >
    >
    >
    > > On Jan 1, 11:45 am, Joe Buck <> wrote:

    >
    > > > Ehsan,

    >
    > > > Thanks for the help. Unfortunately I don't think that will work as it
    > > > removes the first element in the array.

    >
    > > > What I need is an Array::delete_if or Array::reject! call that allowsme
    > > > to break from the block while still removing elements.

    >
    > > > My solution is to use a second array to keep the indices of all removed
    > > > elements. I can then iterate through the second array after the first
    > > > loop and remove the elements. I'm guessing reject! and delete_if do this
    > > > as well. It could be more efficient but it works.

    >
    > > > Thanks!
    > > > --
    > > > Posted viahttp://www.ruby-forum.com/.

    >
    > > Without knowing the point of what you're trying to do, this sounds
    > > like what you have in mind:

    >
    > > array = [] # or whatev
    > > process_array = true
    > > array.reject!(x) do
    > >   if process_array
    > >     conditional_code_for x
    > >   else
    > >     false
    > > end

    >
    > Also, is the reject! really the bottleneck in your app? We obviously
    > don't want to intentionally write sub-optimal code, but do not
    > optimize what's not slowing you down.


    wow array.reject! do |x| i don't know basic ruby syntax today...

    Or if you really want to avoid even iterating over each element:

    original_array = []
    new_array = []

    original_array.each {|x| new_array << x unless conditional_code_for
    (x)}

    But again, what precisely are you doing? Is the deletion code really
    your bottleneck? And if it is, I'm almost certain that using a more
    appropriate structure than an array would be the way to go.
    pharrington, Jan 1, 2010
    #7
  8. Joe Buck

    Joe Buck Guest

    Alright, I won't hide you from the details. I have a map (google maps
    API) with data points (100,000+).

    I'm doing server side clustering of the points to limit the amount of
    data I send to the client (I'm using the Ruby On Rails framework).

    I start out with an array of 100,000 spots. Each spot is a hash with a
    :lat and :lng. I need to iterate through the array and group the spots
    into clusters. I break the world up into 30/30 degree clusters, so I
    have 72 clusters to put the spots in.

    The spots are ordered by longitude.

    Here is the code (do you use code snippet tags here?). Note that
    'listings' is the array of spots ordered by longitude.

    #---------------------------------------------------------------------
    for i in (0..72)

    # Move to the next block.
    bNextRow = (i%nRows == 0 && i!=0)
    cLatLngBox.topRightLat = bNextRow ? 90 : cLatLngBox.topRightLat-size
    cLatLngBox.topRightLng = bNextRow ? (cLatLngBox.topRightLng+size) :
    cLatLngBox.topRightLng
    cLatLngBox.botLeftLat = bNextRow ? 90-size :
    (cLatLngBox.botLeftLat-size)
    cLatLngBox.botLeftLng = bNextRow ? (cLatLngBox.botLeftLng+size) :
    cLatLngBox.botLeftLng

    # Holds the index of each spot we added to a cluster. We will remove
    these
    # from the listing when done.
    cRemove = []

    # Our new cluster.
    cluster = {:lat=>0, :lng=>0, :size=>0}

    # Iterate through all the spots, adding them to clusters.
    listings.each_with_index do |listing, h|
    if cLatLngBox.is_in_box?(listing[:lat], listing[:lng])

    # Add to the cluster.
    cluster[:size]+=1
    cluster[:lat]+=listing[:lat].to_f
    cluster[:lng]+=listing[:lng].to_f

    # Tag the spot for removal.
    cRemove << h
    else
    # We cheat here. Because the spots are ordered by longitude
    (-180 to 180) we'll check
    # if the spot longitude exceeds our current max. If it does
    we're done with this loop.
    if listing[:lng].to_f > cLatLngBox.topRightLng
    break
    end
    end
    end

    # Remove all the items that were clustered.
    cRemove.each do |index|
    listings.delete_at(index)
    end
    #---------------------------------------------------------------------

    This is faster than not removing the items. But I'd like to remove them
    while I'm in the above loop, but still have the ability to break.
    --
    Posted via http://www.ruby-forum.com/.
    Joe Buck, Jan 1, 2010
    #8
  9. Joe Buck

    Joe Buck Guest

    Sorry for the terrible formatting. As another example, here is C++ code
    that would do what I want using iterators.


    typedef std::vector<cSpot> spots_t;
    spots_t cSpots;

    // Fill cSpots here ...

    CLatLngBox cLatLngBox;

    for(int i=0; i<72; i++)
    {
    // Move cLatLngBox to the next block ..

    // Our new cluster.
    CCluster cCluster;

    // Iterate points.
    spots_t::iterator itr = cSpots.begin ();
    for(itr; itr!=cSpots.end();)
    {
    if(cLatLngBox.is_in_box(*itr))
    {
    // Add to cluster.

    // This is where we remove the spot.
    itr = cSpots.erase(itr);
    }
    else
    {
    if((*itr).lng > cLatLngBox.lng)
    {
    // Done with the inner loop.
    break;
    }

    // Next spot.
    itr++;
    }
    }
    }
    --
    Posted via http://www.ruby-forum.com/.
    Joe Buck, Jan 1, 2010
    #9
  10. Array#delete_if appears to be completely borked when used with 'break'.

    >> a = [5,6,7,8,9,10]

    => [5, 6, 7, 8, 9, 10]
    >> a.delete_if { |x| break if x > 8; x < 7 }

    => nil
    >> a

    => [7, 8, 7, 8, 9, 10]
    >>


    I have opened a ticket:
    http://redmine.ruby-lang.org/issues/show/2545

    In the mean time you could do something like this:

    class Array
    def safe_delete_if
    i = 0
    while i < length
    if yield self
    slice!(i,1)
    else
    i += 1
    end
    end
    end
    end
    --
    Posted via http://www.ruby-forum.com/.
    Brian Candler, Jan 1, 2010
    #10
  11. On 01/01/2010 09:00 PM, Joe Buck wrote:
    > Alright, I won't hide you from the details. I have a map (google maps
    > API) with data points (100,000+).
    >
    > I'm doing server side clustering of the points to limit the amount of
    > data I send to the client (I'm using the Ruby On Rails framework).
    >
    > I start out with an array of 100,000 spots. Each spot is a hash with a
    > :lat and :lng. I need to iterate through the array and group the spots
    > into clusters. I break the world up into 30/30 degree clusters, so I
    > have 72 clusters to put the spots in.
    >
    > The spots are ordered by longitude.
    >
    > Here is the code (do you use code snippet tags here?). Note that
    > 'listings' is the array of spots ordered by longitude.
    >
    > #---------------------------------------------------------------------
    > for i in (0..72)
    >
    > # Move to the next block.
    > bNextRow = (i%nRows == 0 && i!=0)
    > cLatLngBox.topRightLat = bNextRow ? 90 : cLatLngBox.topRightLat-size
    > cLatLngBox.topRightLng = bNextRow ? (cLatLngBox.topRightLng+size) :
    > cLatLngBox.topRightLng
    > cLatLngBox.botLeftLat = bNextRow ? 90-size :
    > (cLatLngBox.botLeftLat-size)
    > cLatLngBox.botLeftLng = bNextRow ? (cLatLngBox.botLeftLng+size) :
    > cLatLngBox.botLeftLng
    >
    > # Holds the index of each spot we added to a cluster. We will remove
    > these
    > # from the listing when done.
    > cRemove = []
    >
    > # Our new cluster.
    > cluster = {:lat=>0, :lng=>0, :size=>0}
    >
    > # Iterate through all the spots, adding them to clusters.
    > listings.each_with_index do |listing, h|
    > if cLatLngBox.is_in_box?(listing[:lat], listing[:lng])
    >
    > # Add to the cluster.
    > cluster[:size]+=1
    > cluster[:lat]+=listing[:lat].to_f
    > cluster[:lng]+=listing[:lng].to_f
    >
    > # Tag the spot for removal.
    > cRemove << h
    > else
    > # We cheat here. Because the spots are ordered by longitude
    > (-180 to 180) we'll check
    > # if the spot longitude exceeds our current max. If it does
    > we're done with this loop.
    > if listing[:lng].to_f > cLatLngBox.topRightLng
    > break
    > end
    > end
    > end
    >
    > # Remove all the items that were clustered.
    > cRemove.each do |index|
    > listings.delete_at(index)
    > end
    > #---------------------------------------------------------------------
    >
    > This is faster than not removing the items. But I'd like to remove them
    > while I'm in the above loop, but still have the ability to break.


    I am not sure I fully understand your requirement (partly because the
    code seems incomplete, e.g. where does "size" come from?) but I believe
    you might have the wrong iteration approach: as far as I understand you
    have a large array and *every* element in that array falls into one of
    72 cluster spots. Since the array is large you want to traverse it only
    once. So the natural thing would be to traverse the array, put every
    element into its corresponding spot and then use the clustered
    information. You could use #group_by for this when on 1.8.7 or newer.

    robert@fussel:~$ ruby1.8 -v
    ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]
    robert@fussel:~$ ruby1.8 -e 'p (1..10).group_by{|x|x%3}'
    {0=>[3, 6, 9], 1=>[1, 4, 7, 10], 2=>[2, 5, 8]}
    robert@fussel:~$

    So I would end up doing something like this:

    Spot = Struct.new :lat, :lng do
    include Comparable

    def <=>(spot)
    to_a <=> spot.to_a
    end

    def get_box(precision)
    # place realistic calculation here
    Spot.new lat.round(-1), lng.round(-1)
    end
    end

    listings = ... # filled with Spot instances

    clustered = listings.group_by do |spot|
    spot.get_box
    end

    # a variant might be:
    ClusterSpot = Struct.new :lat, :lng, :count do
    def initialize
    self.lat = self.lng = self.count = 0
    end

    def update(spot)
    c = (count + 1).to_f
    self.lat = ((lat * count) + spot.lat) / c
    self.lng = ((lng * count) + spot.lng) / c
    self.count += 1
    end
    end

    clustered = Hash.new do |h,k|
    # add something to the Hash whenever the key shows
    # up the first time
    h[k] = ClusterSpot.new
    end

    listings.each do |spot|
    clustered[spot.get_box].update(spot)
    end


    # output clustered information
    clustered.keys.sort.each do |cl_spot|
    printf "%6d entries for %p\n", clustered[cl_spot].size, cl_spot
    end

    Of course you can pull this off without defining additional classes but
    with specialized classes the code becomes clearer and more reusable IMHO.

    Kind regards

    robert


    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Jan 2, 2010
    #11
  12. On 01/02/2010 12:28 PM, Robert Klemme wrote:
    > On 01/01/2010 09:00 PM, Joe Buck wrote:
    >> Alright, I won't hide you from the details. I have a map (google maps
    >> API) with data points (100,000+).
    >>
    >> I'm doing server side clustering of the points to limit the amount of
    >> data I send to the client (I'm using the Ruby On Rails framework).
    >>
    >> I start out with an array of 100,000 spots. Each spot is a hash with a
    >> :lat and :lng. I need to iterate through the array and group the spots
    >> into clusters. I break the world up into 30/30 degree clusters, so I
    >> have 72 clusters to put the spots in.
    >>
    >> The spots are ordered by longitude.
    >>
    >> Here is the code (do you use code snippet tags here?). Note that
    >> 'listings' is the array of spots ordered by longitude.
    >>
    >> #---------------------------------------------------------------------
    >> for i in (0..72)
    >>
    >> # Move to the next block.
    >> bNextRow = (i%nRows == 0 && i!=0)
    >> cLatLngBox.topRightLat = bNextRow ? 90 : cLatLngBox.topRightLat-size
    >> cLatLngBox.topRightLng = bNextRow ? (cLatLngBox.topRightLng+size)
    >> : cLatLngBox.topRightLng
    >> cLatLngBox.botLeftLat = bNextRow ? 90-size :
    >> (cLatLngBox.botLeftLat-size)
    >> cLatLngBox.botLeftLng = bNextRow ? (cLatLngBox.botLeftLng+size) :
    >> cLatLngBox.botLeftLng
    >>
    >> # Holds the index of each spot we added to a cluster. We will
    >> remove these
    >> # from the listing when done.
    >> cRemove = []
    >>
    >> # Our new cluster.
    >> cluster = {:lat=>0, :lng=>0, :size=>0}
    >>
    >> # Iterate through all the spots, adding them to clusters.
    >> listings.each_with_index do |listing, h|
    >> if cLatLngBox.is_in_box?(listing[:lat], listing[:lng])
    >>
    >> # Add to the cluster.
    >> cluster[:size]+=1
    >> cluster[:lat]+=listing[:lat].to_f
    >> cluster[:lng]+=listing[:lng].to_f
    >>
    >> # Tag the spot for removal.
    >> cRemove << h
    >> else
    >> # We cheat here. Because the spots are ordered by longitude
    >> (-180 to 180) we'll check
    >> # if the spot longitude exceeds our current max. If it does
    >> we're done with this loop.
    >> if listing[:lng].to_f > cLatLngBox.topRightLng
    >> break
    >> end
    >> end
    >> end
    >>
    >> # Remove all the items that were clustered.
    >> cRemove.each do |index|
    >> listings.delete_at(index)
    >> end
    >> #---------------------------------------------------------------------
    >>
    >> This is faster than not removing the items. But I'd like to remove
    >> them while I'm in the above loop, but still have the ability to break.

    >
    > I am not sure I fully understand your requirement (partly because the
    > code seems incomplete, e.g. where does "size" come from?) but I believe
    > you might have the wrong iteration approach: as far as I understand you
    > have a large array and *every* element in that array falls into one of
    > 72 cluster spots. Since the array is large you want to traverse it only
    > once. So the natural thing would be to traverse the array, put every
    > element into its corresponding spot and then use the clustered
    > information. You could use #group_by for this when on 1.8.7 or newer.
    >
    > robert@fussel:~$ ruby1.8 -v
    > ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]
    > robert@fussel:~$ ruby1.8 -e 'p (1..10).group_by{|x|x%3}'
    > {0=>[3, 6, 9], 1=>[1, 4, 7, 10], 2=>[2, 5, 8]}
    > robert@fussel:~$
    >
    > So I would end up doing something like this:
    >
    > Spot = Struct.new :lat, :lng do
    > include Comparable
    >
    > def <=>(spot)
    > to_a <=> spot.to_a
    > end
    >
    > def get_box(precision)
    > # place realistic calculation here
    > Spot.new lat.round(-1), lng.round(-1)
    > end
    > end
    >
    > listings = ... # filled with Spot instances
    >
    > clustered = listings.group_by do |spot|
    > spot.get_box
    > end
    >
    > # a variant might be:
    > ClusterSpot = Struct.new :lat, :lng, :count do
    > def initialize
    > self.lat = self.lng = self.count = 0
    > end
    >
    > def update(spot)
    > c = (count + 1).to_f
    > self.lat = ((lat * count) + spot.lat) / c
    > self.lng = ((lng * count) + spot.lng) / c
    > self.count += 1
    > end
    > end
    >
    > clustered = Hash.new do |h,k|
    > # add something to the Hash whenever the key shows
    > # up the first time
    > h[k] = ClusterSpot.new
    > end
    >
    > listings.each do |spot|
    > clustered[spot.get_box].update(spot)
    > end


    This output code belongs to the first variant. Sorry for mixing this up
    but my concentration suffers a bit from lying sick in bed. :-(

    > # output clustered information
    > clustered.keys.sort.each do |cl_spot|
    > printf "%6d entries for %p\n", clustered[cl_spot].size, cl_spot
    > end


    Output for the second alternative could look like

    clustered.sort.each do |spot, cl_spot|
    printf "lat=%8.3f lng=%8.3f\n", cl_spot
    end

    The main advantage of both these approaches is that they do not need a
    sorted input. This makes them more flexible. The single pass approach
    is useful for arbitrary large sets of data - especially when using the
    second approach where the size of the collection never grows beyond what
    will be output in the end.

    Cheers

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Jan 2, 2010
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Toni
    Replies:
    1
    Views:
    315
    =?ISO-8859-1?Q?G=F6ran_Andersson?=
    Apr 11, 2007
  2. Replies:
    12
    Views:
    935
  3. Jules
    Replies:
    6
    Views:
    139
    Jules
    Jul 15, 2003
  4. Bernd Burnt
    Replies:
    7
    Views:
    125
    Robert Dober
    Jul 13, 2007
  5. hisan
    Replies:
    1
    Views:
    1,297
    Dan Stromberg
    Jun 25, 2012
Loading...

Share This Page