Delete elements in array, break, and keep changes?

J

Joe Buck

I have an array of a lot elements that I need to cluster (they are
latitudes and longitudes).

As I iterate the array I want to remove elements that I have already
clustered.

Currently I've tried using Array::reject! and Array::delete_if. It
works. I can further optimize by breaking from the delete_if/reject!
block if certiain conditions are met.

Here is the issue, if I call break in the block, all of the changes to
the array are lost.

Is there a way to break from the block but still delete/reject elements
in the array?
 
E

Ehsanul Hoque

I have an array of a lot elements that I need to cluster (they are
latitudes and longitudes).
=20
As I iterate the array I want to remove elements that I have already
clustered.
=20

In this case=2C it looks like Array#shift might work=2C though I'm not sure=
I totally understand your requirements. Something like:

until long_lat_array.empty?
long_lat =3D long_lat_array.shift
cluster(long_lat)
end

Hope I'm not missing your issue.

- Ehsan
=20
_________________________________________________________________
Hotmail: Powerful Free email with security by Microsoft.
http://clk.atdmt.com/GBL/go/171222986/direct/01/=
 
J

Joe Buck

Ehsan,

Thanks for the help. Unfortunately I don't think that will work as it
removes the first element in the array.

What I need is an Array::delete_if or Array::reject! call that allows me
to break from the block while still removing elements.

My solution is to use a second array to keep the indices of all removed
elements. I can then iterate through the second array after the first
loop and remove the elements. I'm guessing reject! and delete_if do this
as well. It could be more efficient but it works.

Thanks!
 
P

Phillip Gawlowski

Ehsan,

Thanks for the help. Unfortunately I don't think that will work as it
removes the first element in the array.

def copy_and_shift_array array
temp_array = array # or perform a deep copy, if you are worried about
# about the first array's integrity.
temp_array.each do |element|
return element if condition_met?
end
end

Array#shift is useful here *because* it gives you the first Array
element in an easily accessible format (the actual Object in the Array),
so that you can perform your checks and other magic, before storing it
someplace else.

Note: Above code is untested, but should work, at least in principle.

(I'm always a foggy when it comes to blocks and variable assignment in
Ruby, despite many ruby-talk discussions about scope in Ruby).
 
P

pharrington

Ehsan,

Thanks for the help. Unfortunately I don't think that will work as it
removes the first element in the array.

What I need is an Array::delete_if or Array::reject! call that allows me
to break from the block while still removing elements.

My solution is to use a second array to keep the indices of all removed
elements. I can then iterate through the second array after the first
loop and remove the elements. I'm guessing reject! and delete_if do this
as well. It could be more efficient but it works.

Thanks!

Without knowing the point of what you're trying to do, this sounds
like what you have in mind:

array = [] # or whatev
process_array = true
array.reject!(x) do
if process_array
conditional_code_for x
else
false
end
 
P

pharrington

Thanks for the help. Unfortunately I don't think that will work as it
removes the first element in the array.
What I need is an Array::delete_if or Array::reject! call that allows me
to break from the block while still removing elements.
My solution is to use a second array to keep the indices of all removed
elements. I can then iterate through the second array after the first
loop and remove the elements. I'm guessing reject! and delete_if do this
as well. It could be more efficient but it works.

Without knowing the point of what you're trying to do, this sounds
like what you have in mind:

array = [] # or whatev
process_array = true
array.reject!(x) do
  if process_array
    conditional_code_for x
  else
    false
end

Also, is the reject! really the bottleneck in your app? We obviously
don't want to intentionally write sub-optimal code, but do not
optimize what's not slowing you down.
 
P

pharrington

Without knowing the point of what you're trying to do, this sounds
like what you have in mind:
array = [] # or whatev
process_array = true
array.reject!(x) do
  if process_array
    conditional_code_for x
  else
    false
end

Also, is the reject! really the bottleneck in your app? We obviously
don't want to intentionally write sub-optimal code, but do not
optimize what's not slowing you down.

wow array.reject! do |x| i don't know basic ruby syntax today...

Or if you really want to avoid even iterating over each element:

original_array = []
new_array = []

original_array.each {|x| new_array << x unless conditional_code_for
(x)}

But again, what precisely are you doing? Is the deletion code really
your bottleneck? And if it is, I'm almost certain that using a more
appropriate structure than an array would be the way to go.
 
J

Joe Buck

Alright, I won't hide you from the details. I have a map (google maps
API) with data points (100,000+).

I'm doing server side clustering of the points to limit the amount of
data I send to the client (I'm using the Ruby On Rails framework).

I start out with an array of 100,000 spots. Each spot is a hash with a
:lat and :lng. I need to iterate through the array and group the spots
into clusters. I break the world up into 30/30 degree clusters, so I
have 72 clusters to put the spots in.

The spots are ordered by longitude.

Here is the code (do you use code snippet tags here?). Note that
'listings' is the array of spots ordered by longitude.

#---------------------------------------------------------------------
for i in (0..72)

# Move to the next block.
bNextRow = (i%nRows == 0 && i!=0)
cLatLngBox.topRightLat = bNextRow ? 90 : cLatLngBox.topRightLat-size
cLatLngBox.topRightLng = bNextRow ? (cLatLngBox.topRightLng+size) :
cLatLngBox.topRightLng
cLatLngBox.botLeftLat = bNextRow ? 90-size :
(cLatLngBox.botLeftLat-size)
cLatLngBox.botLeftLng = bNextRow ? (cLatLngBox.botLeftLng+size) :
cLatLngBox.botLeftLng

# Holds the index of each spot we added to a cluster. We will remove
these
# from the listing when done.
cRemove = []

# Our new cluster.
cluster = {:lat=>0, :lng=>0, :size=>0}

# Iterate through all the spots, adding them to clusters.
listings.each_with_index do |listing, h|
if cLatLngBox.is_in_box?(listing[:lat], listing[:lng])

# Add to the cluster.
cluster[:size]+=1
cluster[:lat]+=listing[:lat].to_f
cluster[:lng]+=listing[:lng].to_f

# Tag the spot for removal.
cRemove << h
else
# We cheat here. Because the spots are ordered by longitude
(-180 to 180) we'll check
# if the spot longitude exceeds our current max. If it does
we're done with this loop.
if listing[:lng].to_f > cLatLngBox.topRightLng
break
end
end
end

# Remove all the items that were clustered.
cRemove.each do |index|
listings.delete_at(index)
end
#---------------------------------------------------------------------

This is faster than not removing the items. But I'd like to remove them
while I'm in the above loop, but still have the ability to break.
 
J

Joe Buck

Sorry for the terrible formatting. As another example, here is C++ code
that would do what I want using iterators.


typedef std::vector<cSpot> spots_t;
spots_t cSpots;

// Fill cSpots here ...

CLatLngBox cLatLngBox;

for(int i=0; i<72; i++)
{
// Move cLatLngBox to the next block ..

// Our new cluster.
CCluster cCluster;

// Iterate points.
spots_t::iterator itr = cSpots.begin ();
for(itr; itr!=cSpots.end();)
{
if(cLatLngBox.is_in_box(*itr))
{
// Add to cluster.

// This is where we remove the spot.
itr = cSpots.erase(itr);
}
else
{
if((*itr).lng > cLatLngBox.lng)
{
// Done with the inner loop.
break;
}

// Next spot.
itr++;
}
}
}
 
R

Robert Klemme

Alright, I won't hide you from the details. I have a map (google maps
API) with data points (100,000+).

I'm doing server side clustering of the points to limit the amount of
data I send to the client (I'm using the Ruby On Rails framework).

I start out with an array of 100,000 spots. Each spot is a hash with a
:lat and :lng. I need to iterate through the array and group the spots
into clusters. I break the world up into 30/30 degree clusters, so I
have 72 clusters to put the spots in.

The spots are ordered by longitude.

Here is the code (do you use code snippet tags here?). Note that
'listings' is the array of spots ordered by longitude.

#---------------------------------------------------------------------
for i in (0..72)

# Move to the next block.
bNextRow = (i%nRows == 0 && i!=0)
cLatLngBox.topRightLat = bNextRow ? 90 : cLatLngBox.topRightLat-size
cLatLngBox.topRightLng = bNextRow ? (cLatLngBox.topRightLng+size) :
cLatLngBox.topRightLng
cLatLngBox.botLeftLat = bNextRow ? 90-size :
(cLatLngBox.botLeftLat-size)
cLatLngBox.botLeftLng = bNextRow ? (cLatLngBox.botLeftLng+size) :
cLatLngBox.botLeftLng

# Holds the index of each spot we added to a cluster. We will remove
these
# from the listing when done.
cRemove = []

# Our new cluster.
cluster = {:lat=>0, :lng=>0, :size=>0}

# Iterate through all the spots, adding them to clusters.
listings.each_with_index do |listing, h|
if cLatLngBox.is_in_box?(listing[:lat], listing[:lng])

# Add to the cluster.
cluster[:size]+=1
cluster[:lat]+=listing[:lat].to_f
cluster[:lng]+=listing[:lng].to_f

# Tag the spot for removal.
cRemove << h
else
# We cheat here. Because the spots are ordered by longitude
(-180 to 180) we'll check
# if the spot longitude exceeds our current max. If it does
we're done with this loop.
if listing[:lng].to_f > cLatLngBox.topRightLng
break
end
end
end

# Remove all the items that were clustered.
cRemove.each do |index|
listings.delete_at(index)
end
#---------------------------------------------------------------------

This is faster than not removing the items. But I'd like to remove them
while I'm in the above loop, but still have the ability to break.

I am not sure I fully understand your requirement (partly because the
code seems incomplete, e.g. where does "size" come from?) but I believe
you might have the wrong iteration approach: as far as I understand you
have a large array and *every* element in that array falls into one of
72 cluster spots. Since the array is large you want to traverse it only
once. So the natural thing would be to traverse the array, put every
element into its corresponding spot and then use the clustered
information. You could use #group_by for this when on 1.8.7 or newer.

robert@fussel:~$ ruby1.8 -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]
robert@fussel:~$ ruby1.8 -e 'p (1..10).group_by{|x|x%3}'
{0=>[3, 6, 9], 1=>[1, 4, 7, 10], 2=>[2, 5, 8]}
robert@fussel:~$

So I would end up doing something like this:

Spot = Struct.new :lat, :lng do
include Comparable

def <=>(spot)
to_a <=> spot.to_a
end

def get_box(precision)
# place realistic calculation here
Spot.new lat.round(-1), lng.round(-1)
end
end

listings = ... # filled with Spot instances

clustered = listings.group_by do |spot|
spot.get_box
end

# a variant might be:
ClusterSpot = Struct.new :lat, :lng, :count do
def initialize
self.lat = self.lng = self.count = 0
end

def update(spot)
c = (count + 1).to_f
self.lat = ((lat * count) + spot.lat) / c
self.lng = ((lng * count) + spot.lng) / c
self.count += 1
end
end

clustered = Hash.new do |h,k|
# add something to the Hash whenever the key shows
# up the first time
h[k] = ClusterSpot.new
end

listings.each do |spot|
clustered[spot.get_box].update(spot)
end


# output clustered information
clustered.keys.sort.each do |cl_spot|
printf "%6d entries for %p\n", clustered[cl_spot].size, cl_spot
end

Of course you can pull this off without defining additional classes but
with specialized classes the code becomes clearer and more reusable IMHO.

Kind regards

robert
 
R

Robert Klemme

Alright, I won't hide you from the details. I have a map (google maps
API) with data points (100,000+).

I'm doing server side clustering of the points to limit the amount of
data I send to the client (I'm using the Ruby On Rails framework).

I start out with an array of 100,000 spots. Each spot is a hash with a
:lat and :lng. I need to iterate through the array and group the spots
into clusters. I break the world up into 30/30 degree clusters, so I
have 72 clusters to put the spots in.

The spots are ordered by longitude.

Here is the code (do you use code snippet tags here?). Note that
'listings' is the array of spots ordered by longitude.

#---------------------------------------------------------------------
for i in (0..72)

# Move to the next block.
bNextRow = (i%nRows == 0 && i!=0)
cLatLngBox.topRightLat = bNextRow ? 90 : cLatLngBox.topRightLat-size
cLatLngBox.topRightLng = bNextRow ? (cLatLngBox.topRightLng+size)
: cLatLngBox.topRightLng
cLatLngBox.botLeftLat = bNextRow ? 90-size :
(cLatLngBox.botLeftLat-size)
cLatLngBox.botLeftLng = bNextRow ? (cLatLngBox.botLeftLng+size) :
cLatLngBox.botLeftLng

# Holds the index of each spot we added to a cluster. We will
remove these
# from the listing when done.
cRemove = []

# Our new cluster.
cluster = {:lat=>0, :lng=>0, :size=>0}

# Iterate through all the spots, adding them to clusters.
listings.each_with_index do |listing, h|
if cLatLngBox.is_in_box?(listing[:lat], listing[:lng])

# Add to the cluster.
cluster[:size]+=1
cluster[:lat]+=listing[:lat].to_f
cluster[:lng]+=listing[:lng].to_f

# Tag the spot for removal.
cRemove << h
else
# We cheat here. Because the spots are ordered by longitude
(-180 to 180) we'll check
# if the spot longitude exceeds our current max. If it does
we're done with this loop.
if listing[:lng].to_f > cLatLngBox.topRightLng
break
end
end
end

# Remove all the items that were clustered.
cRemove.each do |index|
listings.delete_at(index)
end
#---------------------------------------------------------------------

This is faster than not removing the items. But I'd like to remove
them while I'm in the above loop, but still have the ability to break.

I am not sure I fully understand your requirement (partly because the
code seems incomplete, e.g. where does "size" come from?) but I believe
you might have the wrong iteration approach: as far as I understand you
have a large array and *every* element in that array falls into one of
72 cluster spots. Since the array is large you want to traverse it only
once. So the natural thing would be to traverse the array, put every
element into its corresponding spot and then use the clustered
information. You could use #group_by for this when on 1.8.7 or newer.

robert@fussel:~$ ruby1.8 -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]
robert@fussel:~$ ruby1.8 -e 'p (1..10).group_by{|x|x%3}'
{0=>[3, 6, 9], 1=>[1, 4, 7, 10], 2=>[2, 5, 8]}
robert@fussel:~$

So I would end up doing something like this:

Spot = Struct.new :lat, :lng do
include Comparable

def <=>(spot)
to_a <=> spot.to_a
end

def get_box(precision)
# place realistic calculation here
Spot.new lat.round(-1), lng.round(-1)
end
end

listings = ... # filled with Spot instances

clustered = listings.group_by do |spot|
spot.get_box
end

# a variant might be:
ClusterSpot = Struct.new :lat, :lng, :count do
def initialize
self.lat = self.lng = self.count = 0
end

def update(spot)
c = (count + 1).to_f
self.lat = ((lat * count) + spot.lat) / c
self.lng = ((lng * count) + spot.lng) / c
self.count += 1
end
end

clustered = Hash.new do |h,k|
# add something to the Hash whenever the key shows
# up the first time
h[k] = ClusterSpot.new
end

listings.each do |spot|
clustered[spot.get_box].update(spot)
end

This output code belongs to the first variant. Sorry for mixing this up
but my concentration suffers a bit from lying sick in bed. :-(
# output clustered information
clustered.keys.sort.each do |cl_spot|
printf "%6d entries for %p\n", clustered[cl_spot].size, cl_spot
end

Output for the second alternative could look like

clustered.sort.each do |spot, cl_spot|
printf "lat=%8.3f lng=%8.3f\n", cl_spot
end

The main advantage of both these approaches is that they do not need a
sorted input. This makes them more flexible. The single pass approach
is useful for arbitrary large sets of data - especially when using the
second approach where the size of the collection never grows beyond what
will be output in the end.

Cheers

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,767
Messages
2,569,570
Members
45,045
Latest member
DRCM

Latest Threads

Top