Searching through a sorted array

F

FireAphis

Hello,

I have a very big array of objects sorted by one of its numeric data
members. During the flow of my application I need occasionally to get
all the elements in a specific range. I could use find_all

partition = my_array.find_all {|x| (x > start) && (x < end)}

My problem is that, as far as I know, find_all just iterates through
all the elements of the array and that's very inefficient, especially
in my case, in which I have a sorted array. Is there any standard way
to search efficiently through a sorted array? A binary search for
example?

Thanks
FireAphis
 
F

FireAphis

FireAphis said:
I have a very big array of objects sorted by one of its numeric data
members. During the flow of my application I need occasionally to get
all the elements in a specific range.

Then why not

my_array[start..end] ?

Cheers,
Peter
__http://www.rubyrailways.comhttp://scrubyt.org

Sorry if my explanation has been misleading. The array isn't
necessarily comprised of continuous or consecutive numbers. Moreover
they don't necessarily start from zero. For example:

my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]

now I want to get all the elements bigger than 1 and smaller than
1000. AFAIK my_array[1..1000] doesn't help in this case.
 
A

Axel Etzold

-------- Original-Nachricht --------
Datum: Wed, 3 Oct 2007 16:30:03 +0900
Von: Peter Szinek <[email protected]>
An: (e-mail address removed)
Betreff: Re: Searching through a sorted array
FireAphis said:
Hello,

I have a very big array of objects sorted by one of its numeric data
members. During the flow of my application I need occasionally to get
all the elements in a specific range.

Then why not

my_array[start..end] ?

Dear FireAphis,

to find the values 'start' and 'end', you could use something like
the twenty questions game to find a number between 1 and a million
(roughly 2^20, hence 20 questions), starting by :

1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly 500000)?

2.) If the answer was "yes", is the index of 'start' ('end') bigger or smaller than 2^19+2^18 (roughly 750000)?
If the answer was "no", is the index of 'start' ('end') bigger or smaller than 2^18 (roughly 250000)?

etc..

The range in which "start" lies reduces its size by half with each question, and you need 2*log(2) n questions to find the 'start' and 'end'
values.

I am actually not aware whether that's implemented in the Array#index
method.

Best regards,

Axel
 
A

Axel Etzold

-------- Original-Nachricht --------
Datum: Wed, 3 Oct 2007 16:50:03 +0900
Von: FireAphis <[email protected]>
An: (e-mail address removed)
Betreff: Re: Searching through a sorted array
I have a very big array of objects sorted by one of its numeric data
members. During the flow of my application I need occasionally to get
all the elements in a specific range.

Then why not

my_array[start..end] ?

Cheers,
Peter
__http://www.rubyrailways.comhttp://scrubyt.org

Sorry if my explanation has been misleading. The array isn't
necessarily comprised of continuous or consecutive numbers. Moreover
they don't necessarily start from zero. For example:

my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]

now I want to get all the elements bigger than 1 and smaller than
1000. AFAIK my_array[1..1000] doesn't help in this case.
In that case, you can say:

interesting_array=my_array.delete_if{|entry| entry>1000 or entry<1}

Best regards,

Axel
 
F

FireAphis

-------- Original-Nachricht --------
Datum: Wed, 3 Oct 2007 16:30:03 +0900
Von: Peter Szinek <[email protected]>
An: (e-mail address removed)
Betreff: Re: Searching through a sorted array
Then why not
my_array[start..end] ?

Dear FireAphis,

to find the values 'start' and 'end', you could use something like
the twenty questions game to find a number between 1 and a million
(roughly 2^20, hence 20 questions), starting by :

1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly 500000)?

2.) If the answer was "yes", is the index of 'start' ('end') bigger or smaller than 2^19+2^18 (roughly 750000)?
If the answer was "no", is the index of 'start' ('end') bigger or smallerthan 2^18 (roughly 250000)?

etc..

The range in which "start" lies reduces its size by half with each question, and you need 2*log(2) n questions to find the 'start' and 'end'
values.

I am actually not aware whether that's implemented in the Array#index
method.

Best regards,

Axel

Of course that could be an excellent solution. Do you know if there is
already such a facility in Ruby that implements the algorithm? I'm
quite sure Array#index doesn't do that since in order to apply a
binary search the array has to be sorted.
 
A

Axel Etzold

-------- Original-Nachricht --------
Datum: Wed, 3 Oct 2007 17:15:12 +0900
Von: FireAphis <[email protected]>
An: (e-mail address removed)
Betreff: Re: Searching through a sorted array
-------- Original-Nachricht --------
Datum: Wed, 3 Oct 2007 16:30:03 +0900
Von: Peter Szinek <[email protected]>
An: (e-mail address removed)
Betreff: Re: Searching through a sorted array
FireAphis wrote:
Hello,
I have a very big array of objects sorted by one of its numeric data
members. During the flow of my application I need occasionally to get
all the elements in a specific range.
Then why not
my_array[start..end] ?

Dear FireAphis,

to find the values 'start' and 'end', you could use something like
the twenty questions game to find a number between 1 and a million
(roughly 2^20, hence 20 questions), starting by :

1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly 500000)?

2.) If the answer was "yes", is the index of 'start' ('end') bigger or
smaller than 2^19+2^18 (roughly 750000)?
If the answer was "no", is the index of 'start' ('end') bigger or
smaller than 2^18 (roughly 250000)?
etc..

The range in which "start" lies reduces its size by half with each
question, and you need 2*log(2) n questions to find the 'start' and 'end'
values.

I am actually not aware whether that's implemented in the Array#index
method.

Best regards,

Axel

Of course that could be an excellent solution. Do you know if there is
already such a facility in Ruby that implements the algorithm? I'm
quite sure Array#index doesn't do that since in order to apply a
binary search the array has to be sorted.

Dear FireAphis,

I don't know of an implemented solution, but I've implemented my own ... use at your own risk!

Best regards,

Axel


class Array
def find_lower_index(cond)
len=self.length
upper=len
max_nr=(Math.log(len)/Math.log(2)).floor
lower_index=0
pow=1
while eval(cond)==false
lower_index=2**pow
pow=pow+1
end
pow=pow-2
lower=2**pow
(pow-1).downto(0) do |x|
old_test=lower_index
lower_index=lower+2**x
if eval(cond)==false
else
lower_index=old_test
end
end
return lower_index+1
end

def find_upper_index(cond)
len=self.length
upper=len
max_nr=(Math.log(len)/Math.log(2)).floor
upper_index=0
pow=1
while eval(cond)==true
upper_index=2**pow
pow=pow+1
end
pow=pow-2
upper=2**pow
(pow-1).downto(0) do |x|
old_test=upper_index
upper_index=upper+2**x
if eval(cond)
else
upper_index=old_test
end
upper=upper_index
end
return upper_index-1
end


end

# some example
len=10**8
a=(1..len).to_a
b=5*len/2
cond='lower_index>10'
r=a.find_lower_index(cond)
cond='upper_index<100'
s=a.find_upper_index(cond)
p r,s
p a[r]
p a
 
7

7stud --

FireAphis said:
My problem is that, as far as I know, find_all just iterates through
all the elements of the array and that's very inefficient, especially
in my case, in which I have a sorted array. Is there any standard way
to search efficiently through a sorted array? A binary search for
example?

You can also turn your array into a hash, which will work just as well
for unsorted arrays:


class Dog
attr_reader :breed, :id

def initialize(breed, id)
@breed = breed
@id = id
end

end

#Create sorted array of Dog's:
#-----------------------------
dogs = []
(1..2_000).each do |r|
dogs << Dog.new("Akita", r*2)
end

=begin
dogs = (1..2000).inject([]) do |arr, r|
arr << Dog.new("Akita", r*2)
arr
end
=end
#----------------------------+


#Convert array to hash.
#key: dog.id
#val: Dog object
#---------------------
my_id_dog_hash = {}

class << my_id_dog_hash
attr_accessor :highest_id
end

dogs.each do |dog|
my_id_dog_hash[dog.id] = dog
end

my_id_dog_hash.highest_id = dogs[-1].id
----------------------+


#Find Dog's in range:
#-------------------
def get_dogs_in_range(id_dog_hash, range)

high_id = id_dog_hash.highest_id
if range.end > high_id
range = range.begin..high_id
end


dogs_in_range = []

range.each do |r|
dog = id_dog_hash[r]

if dog
dogs_in_range << id_dog_hash[r]
end

end
=begin
dogs_in_range = range.inject([]) do |arr1, r|
if dog = id_dog_hash[r]
arr1 << dog
end

arr1
end
=end

dogs_in_range
end
------------------+


#Test it out:
------------
results = get_dogs_in_range(my_id_dog_hash, 1_990..2000)
results.each {|dog| p dog}
-----------+

--output:--
#<Dog:0x1a70c @id=1990, @breed="Akita">
#<Dog:0x1a6e4 @id=1992, @breed="Akita">
#<Dog:0x1a6bc @id=1994, @breed="Akita">
#<Dog:0x1a694 @id=1996, @breed="Akita">
#<Dog:0x1a66c @id=1998, @breed="Akita">
#<Dog:0x1a644 @id=2000, @breed="Akita">
 
F

FireAphis

FireAphis said:
My problem is that, as far as I know, find_all just iterates through
all the elements of the array and that's very inefficient, especially
in my case, in which I have a sorted array. Is there any standard way
to search efficiently through a sorted array? A binary search for
example?

You can also turn your array into a hash, which will work just as well
for unsorted arrays:

class Dog
attr_reader :breed, :id

def initialize(breed, id)
@breed = breed
@id = id
end

end

#Create sorted array of Dog's:
#-----------------------------
dogs = []
(1..2_000).each do |r|
dogs << Dog.new("Akita", r*2)
end

=begin
dogs = (1..2000).inject([]) do |arr, r|
arr << Dog.new("Akita", r*2)
arr
end
=end
#----------------------------+

#Convert array to hash.
#key: dog.id
#val: Dog object
#---------------------
my_id_dog_hash = {}

class << my_id_dog_hash
attr_accessor :highest_id
end

dogs.each do |dog|
my_id_dog_hash[dog.id] = dog
end

my_id_dog_hash.highest_id = dogs[-1].id
----------------------+

#Find Dog's in range:
#-------------------
def get_dogs_in_range(id_dog_hash, range)

high_id = id_dog_hash.highest_id
if range.end > high_id
range = range.begin..high_id
end

dogs_in_range = []

range.each do |r|
dog = id_dog_hash[r]

if dog
dogs_in_range << id_dog_hash[r]
end

end
=begin
dogs_in_range = range.inject([]) do |arr1, r|
if dog = id_dog_hash[r]
arr1 << dog
end

arr1
end
=end

dogs_in_range
end
------------------+

#Test it out:
------------
results = get_dogs_in_range(my_id_dog_hash, 1_990..2000)
results.each {|dog| p dog}
-----------+

--output:--
#<Dog:0x1a70c @id=1990, @breed="Akita">
#<Dog:0x1a6e4 @id=1992, @breed="Akita">
#<Dog:0x1a6bc @id=1994, @breed="Akita">
#<Dog:0x1a694 @id=1996, @breed="Akita">
#<Dog:0x1a66c @id=1998, @breed="Akita">
#<Dog:0x1a644 @id=2000, @breed="Akita">

Neat. Not space efficient but undoubtfully more quick. But if the IDs
have large gaps ([10000, 20000, 30000]) do you think it still will be
more efficient compared to a simple iteration? Consider the fact that
a simple loop will have three iterations whereas your implementation
will have 30000 iteration.

Thanks,
FireAphis
 
J

Jesús Gabriel y Galán

Sorry if my explanation has been misleading. The array isn't
necessarily comprised of continuous or consecutive numbers. Moreover
they don't necessarily start from zero. For example:

my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]

now I want to get all the elements bigger than 1 and smaller than
1000. AFAIK my_array[1..1000] doesn't help in this case.

Here is my try:

class Array
# assumes a sorted array whose elements respond appropriately to <=>
def subrange(lower, higher)
low = find_index_lower(lower)
high = find_index_higher(higher)
if (low >= length) || (high < 0)
[]
else
self[low..high]
end

end

# Finds the lowest index whose value is greater than the specified parameter
# If all values in the array are lower, returns the index after the last one
# If all values are higher, returns 0
def find_index_lower(lower)
return length if ((last <=> lower) == -1)
return 0 if ((first <=> lower) == 1)
low = 0
high = self.length - 1
while low < high
index = low + (high - low) / 2
element = self[index]
test = element <=> lower
case test
when 0:
return index + 1
when -1:
return (index + 1) if ((self[index+1] <=> lower) == 1)
low = index + 1
when 1:
return (index) if ((self[index-1] <=> lower) <= 0)
high = index - 1
end
end
end

# Finds the highest index whose value is lower than the specified parameter
# If all values in the array are higher, returns -1
# If all values are lower, returns length-1
def find_index_higher(higher)
return -1 if ((first <=> higher) >= 0)
return length - 1 if ((last <=> higher) == -1)
low = 0
high = self.length - 1
while low < high
index = low + (high - low) / 2
element = self[index]
test = element <=> higher
case test
when 0:
return index - 1
when -1:
return (index) if ((self[index+1] <=> higher) >= 0)
low = index + 1
when 1:
return (index - 1) if ((self[index-1] <=> higher) == -1)
high = index - 1
end
end
end


end

a = [5, 6, 7, 8, 9, 10, 14]
p a
puts "6 - 13: #{a.subrange(6,13).inspect}"
puts "5 - 15: #{a.subrange(5,15).inspect}"
puts "5 - 100: #{a.subrange(5, 100).inspect}"
puts "2 - 100: #{a.subrange(2, 100).inspect}"
puts "-1 - 100: #{a.subrange(-1, 100).inspect}"
puts "100 - 150: #{a.subrange(100,150).inspect}"
puts "1 - 2: #{a.subrange(1,2).inspect}"

require 'benchmark'

n = 10_000
Benchmark.bmbm do |x|
x.report("Array subrange") {
n.times {a.subrange(6,13)}
}
end

-----------------------------------------------------------

$ ruby array_subrange.rb
[5, 6, 7, 8, 9, 10, 14]
6 - 13: [7, 8, 9, 10]
5 - 15: [6, 7, 8, 9, 10, 14]
5 - 100: [6, 7, 8, 9, 10, 14]
2 - 100: [5, 6, 7, 8, 9, 10, 14]
-1 - 100: [5, 6, 7, 8, 9, 10, 14]
100 - 150: []
1 - 2: []
Rehearsal --------------------------------------------------
Array subrange 0.220000 0.020000 0.240000 ( 0.274937)
----------------------------------------- total: 0.240000sec

user system total real
Array subrange 0.190000 0.010000 0.200000 ( 0.227276)

Jesus.
 
7

7stud --

Axel said:
# some example
len=10**8
a=(1..len).to_a
b=5*len/2
cond='lower_index>10'
r=a.find_lower_index(cond)
cond='upper_index<100'
s=a.find_upper_index(cond)
p r,s
p a[r]
p a


When I use this code:

a = [2, 4, 6, 8]
cond='lower_index>0'
r=a.find_lower_index(cond)
cond='upper_index<2'
s=a.find_upper_index(cond)
p r,s
p a[r]
p a

I get this output:

3
1
8
4

The output appears to be backwards, since the lower index would be 1 and
the upper index would be 3. And it's a little bit puzzling why a search
would be needed to find an index in an array that is greater than 0. I
can tell you the answer without searching: it's 1. The same goes for
the upper index: if the cond = "upper index < 2", then the upper index
is 1.
 
7

7stud --

Can you explain this output:

a = [2, 4, 6, 8]
cond='lower_index>0'
r=a.find_lower_index(cond)
cond='upper_index<4'
s=a.find_upper_index(cond)
p r,s
p a[r]
p a

--output:--
3
2
8
6
 
7

7stud --

FireAphis said:
Neat. Not space efficient

Yes, you're right. I wouldn't recommend it for really large arrays. The
original sorted array could also be abandoned in favor of a hash to
begin with.

But if the IDs
have large gaps ([10000, 20000, 30000]) do you think it still will be
more efficient compared to a simple iteration?

Depending on how you write the simple iteration, the hash method can be
a lot slower.
a simple loop will have three iterations whereas your implementation
will have 30000 iteration.

What if a simple iteration has 3 iterations x 3,000 range checks, e.g.:

dogs_in_range = []

dogs.each do |dog|
range.each do |r|
if dog.id == r
dogs_in_range << dog
end
end
end
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top