Searching through a sorted array

Discussion in 'Ruby' started by FireAphis, Oct 3, 2007.

  1. FireAphis

    FireAphis Guest

    Hello,

    I have a very big array of objects sorted by one of its numeric data
    members. During the flow of my application I need occasionally to get
    all the elements in a specific range. I could use find_all

    partition = my_array.find_all {|x| (x > start) && (x < end)}

    My problem is that, as far as I know, find_all just iterates through
    all the elements of the array and that's very inefficient, especially
    in my case, in which I have a sorted array. Is there any standard way
    to search efficiently through a sorted array? A binary search for
    example?

    Thanks
    FireAphis
     
    FireAphis, Oct 3, 2007
    #1
    1. Advertising

  2. FireAphis

    Peter Szinek Guest

    FireAphis wrote:
    > Hello,
    >
    > I have a very big array of objects sorted by one of its numeric data
    > members. During the flow of my application I need occasionally to get
    > all the elements in a specific range.


    Then why not

    my_array[start..end] ?

    Cheers,
    Peter
    __
    http://www.rubyrailways.com
    http://scrubyt.org
     
    Peter Szinek, Oct 3, 2007
    #2
    1. Advertising

  3. FireAphis

    FireAphis Guest

    On Oct 3, 9:30 am, Peter Szinek <> wrote:
    > FireAphis wrote:
    > > Hello,

    >
    > > I have a very big array of objects sorted by one of its numeric data
    > > members. During the flow of my application I need occasionally to get
    > > all the elements in a specific range.

    >
    > Then why not
    >
    > my_array[start..end] ?
    >
    > Cheers,
    > Peter
    > __http://www.rubyrailways.comhttp://scrubyt.org


    Sorry if my explanation has been misleading. The array isn't
    necessarily comprised of continuous or consecutive numbers. Moreover
    they don't necessarily start from zero. For example:

    my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]

    now I want to get all the elements bigger than 1 and smaller than
    1000. AFAIK my_array[1..1000] doesn't help in this case.
     
    FireAphis, Oct 3, 2007
    #3
  4. FireAphis

    Axel Etzold Guest

    -------- Original-Nachricht --------
    > Datum: Wed, 3 Oct 2007 16:30:03 +0900
    > Von: Peter Szinek <>
    > An:
    > Betreff: Re: Searching through a sorted array


    > FireAphis wrote:
    > > Hello,
    > >
    > > I have a very big array of objects sorted by one of its numeric data
    > > members. During the flow of my application I need occasionally to get
    > > all the elements in a specific range.

    >
    > Then why not
    >
    > my_array[start..end] ?


    Dear FireAphis,

    to find the values 'start' and 'end', you could use something like
    the twenty questions game to find a number between 1 and a million
    (roughly 2^20, hence 20 questions), starting by :

    1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly 500000)?

    2.) If the answer was "yes", is the index of 'start' ('end') bigger or smaller than 2^19+2^18 (roughly 750000)?
    If the answer was "no", is the index of 'start' ('end') bigger or smaller than 2^18 (roughly 250000)?

    etc..

    The range in which "start" lies reduces its size by half with each question, and you need 2*log(2) n questions to find the 'start' and 'end'
    values.

    I am actually not aware whether that's implemented in the Array#index
    method.

    Best regards,

    Axel

    --
    Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
    Ideal für Modem und ISDN: http://www.gmx.net/de/go/smartsurfer
     
    Axel Etzold, Oct 3, 2007
    #4
  5. FireAphis

    Axel Etzold Guest

    -------- Original-Nachricht --------
    > Datum: Wed, 3 Oct 2007 16:50:03 +0900
    > Von: FireAphis <>
    > An:
    > Betreff: Re: Searching through a sorted array


    > On Oct 3, 9:30 am, Peter Szinek <> wrote:
    > > FireAphis wrote:
    > > > Hello,

    > >
    > > > I have a very big array of objects sorted by one of its numeric data
    > > > members. During the flow of my application I need occasionally to get
    > > > all the elements in a specific range.

    > >
    > > Then why not
    > >
    > > my_array[start..end] ?
    > >
    > > Cheers,
    > > Peter
    > > __http://www.rubyrailways.comhttp://scrubyt.org

    >
    > Sorry if my explanation has been misleading. The array isn't
    > necessarily comprised of continuous or consecutive numbers. Moreover
    > they don't necessarily start from zero. For example:
    >
    > my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]
    >
    > now I want to get all the elements bigger than 1 and smaller than
    > 1000. AFAIK my_array[1..1000] doesn't help in this case.
    >

    In that case, you can say:

    interesting_array=my_array.delete_if{|entry| entry>1000 or entry<1}

    Best regards,

    Axel
    --
    Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
    Ideal für Modem und ISDN: http://www.gmx.net/de/go/smartsurfer
     
    Axel Etzold, Oct 3, 2007
    #5
  6. FireAphis

    FireAphis Guest

    On Oct 3, 9:56 am, "Axel Etzold" <> wrote:
    > -------- Original-Nachricht --------
    >
    > > Datum: Wed, 3 Oct 2007 16:30:03 +0900
    > > Von: Peter Szinek <>
    > > An:
    > > Betreff: Re: Searching through a sorted array
    > > FireAphis wrote:
    > > > Hello,

    >
    > > > I have a very big array of objects sorted by one of its numeric data
    > > > members. During the flow of my application I need occasionally to get
    > > > all the elements in a specific range.

    >
    > > Then why not

    >
    > > my_array[start..end] ?

    >
    > Dear FireAphis,
    >
    > to find the values 'start' and 'end', you could use something like
    > the twenty questions game to find a number between 1 and a million
    > (roughly 2^20, hence 20 questions), starting by :
    >
    > 1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly 500000)?
    >
    > 2.) If the answer was "yes", is the index of 'start' ('end') bigger or smaller than 2^19+2^18 (roughly 750000)?
    > If the answer was "no", is the index of 'start' ('end') bigger or smallerthan 2^18 (roughly 250000)?
    >
    > etc..
    >
    > The range in which "start" lies reduces its size by half with each question, and you need 2*log(2) n questions to find the 'start' and 'end'
    > values.
    >
    > I am actually not aware whether that's implemented in the Array#index
    > method.
    >
    > Best regards,
    >
    > Axel
    >
    > --
    > Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
    > Ideal für Modem und ISDN:http://www.gmx.net/de/go/smartsurfer


    Of course that could be an excellent solution. Do you know if there is
    already such a facility in Ruby that implements the algorithm? I'm
    quite sure Array#index doesn't do that since in order to apply a
    binary search the array has to be sorted.
     
    FireAphis, Oct 3, 2007
    #6
  7. FireAphis

    Axel Etzold Guest

    -------- Original-Nachricht --------
    > Datum: Wed, 3 Oct 2007 17:15:12 +0900
    > Von: FireAphis <>
    > An:
    > Betreff: Re: Searching through a sorted array


    > On Oct 3, 9:56 am, "Axel Etzold" <> wrote:
    > > -------- Original-Nachricht --------
    > >
    > > > Datum: Wed, 3 Oct 2007 16:30:03 +0900
    > > > Von: Peter Szinek <>
    > > > An:
    > > > Betreff: Re: Searching through a sorted array
    > > > FireAphis wrote:
    > > > > Hello,

    > >
    > > > > I have a very big array of objects sorted by one of its numeric data
    > > > > members. During the flow of my application I need occasionally to

    > get
    > > > > all the elements in a specific range.

    > >
    > > > Then why not

    > >
    > > > my_array[start..end] ?

    > >
    > > Dear FireAphis,
    > >
    > > to find the values 'start' and 'end', you could use something like
    > > the twenty questions game to find a number between 1 and a million
    > > (roughly 2^20, hence 20 questions), starting by :
    > >
    > > 1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly

    > 500000)?
    > >
    > > 2.) If the answer was "yes", is the index of 'start' ('end') bigger or

    > smaller than 2^19+2^18 (roughly 750000)?
    > > If the answer was "no", is the index of 'start' ('end') bigger or

    > smaller than 2^18 (roughly 250000)?
    > >
    > > etc..
    > >
    > > The range in which "start" lies reduces its size by half with each

    > question, and you need 2*log(2) n questions to find the 'start' and 'end'
    > > values.
    > >
    > > I am actually not aware whether that's implemented in the Array#index
    > > method.
    > >
    > > Best regards,
    > >
    > > Axel
    > >
    > > --
    > > Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
    > > Ideal für Modem und ISDN:http://www.gmx.net/de/go/smartsurfer

    >
    > Of course that could be an excellent solution. Do you know if there is
    > already such a facility in Ruby that implements the algorithm? I'm
    > quite sure Array#index doesn't do that since in order to apply a
    > binary search the array has to be sorted.
    >


    Dear FireAphis,

    I don't know of an implemented solution, but I've implemented my own ... use at your own risk!

    Best regards,

    Axel


    class Array
    def find_lower_index(cond)
    len=self.length
    upper=len
    max_nr=(Math.log(len)/Math.log(2)).floor
    lower_index=0
    pow=1
    while eval(cond)==false
    lower_index=2**pow
    pow=pow+1
    end
    pow=pow-2
    lower=2**pow
    (pow-1).downto(0) do |x|
    old_test=lower_index
    lower_index=lower+2**x
    if eval(cond)==false
    else
    lower_index=old_test
    end
    end
    return lower_index+1
    end

    def find_upper_index(cond)
    len=self.length
    upper=len
    max_nr=(Math.log(len)/Math.log(2)).floor
    upper_index=0
    pow=1
    while eval(cond)==true
    upper_index=2**pow
    pow=pow+1
    end
    pow=pow-2
    upper=2**pow
    (pow-1).downto(0) do |x|
    old_test=upper_index
    upper_index=upper+2**x
    if eval(cond)
    else
    upper_index=old_test
    end
    upper=upper_index
    end
    return upper_index-1
    end


    end

    # some example
    len=10**8
    a=(1..len).to_a
    b=5*len/2
    cond='lower_index>10'
    r=a.find_lower_index(cond)
    cond='upper_index<100'
    s=a.find_upper_index(cond)
    p r,s
    p a[r]
    p a

    --
    Psssst! Schon vom neuen GMX MultiMessenger gehört?
    Der kanns mit allen: http://www.gmx.net/de/go/multimessenger
     
    Axel Etzold, Oct 3, 2007
    #7
  8. FireAphis

    7stud -- Guest

    FireAphis wrote:
    > My problem is that, as far as I know, find_all just iterates through
    > all the elements of the array and that's very inefficient, especially
    > in my case, in which I have a sorted array. Is there any standard way
    > to search efficiently through a sorted array? A binary search for
    > example?
    >


    You can also turn your array into a hash, which will work just as well
    for unsorted arrays:


    class Dog
    attr_reader :breed, :id

    def initialize(breed, id)
    @breed = breed
    @id = id
    end

    end

    #Create sorted array of Dog's:
    #-----------------------------
    dogs = []
    (1..2_000).each do |r|
    dogs << Dog.new("Akita", r*2)
    end

    =begin
    dogs = (1..2000).inject([]) do |arr, r|
    arr << Dog.new("Akita", r*2)
    arr
    end
    =end
    #----------------------------+


    #Convert array to hash.
    #key: dog.id
    #val: Dog object
    #---------------------
    my_id_dog_hash = {}

    class << my_id_dog_hash
    attr_accessor :highest_id
    end

    dogs.each do |dog|
    my_id_dog_hash[dog.id] = dog
    end

    my_id_dog_hash.highest_id = dogs[-1].id
    ----------------------+


    #Find Dog's in range:
    #-------------------
    def get_dogs_in_range(id_dog_hash, range)

    high_id = id_dog_hash.highest_id
    if range.end > high_id
    range = range.begin..high_id
    end


    dogs_in_range = []

    range.each do |r|
    dog = id_dog_hash[r]

    if dog
    dogs_in_range << id_dog_hash[r]
    end

    end
    =begin
    dogs_in_range = range.inject([]) do |arr1, r|
    if dog = id_dog_hash[r]
    arr1 << dog
    end

    arr1
    end
    =end

    dogs_in_range
    end
    ------------------+


    #Test it out:
    ------------
    results = get_dogs_in_range(my_id_dog_hash, 1_990..2000)
    results.each {|dog| p dog}
    -----------+

    --output:--
    #<Dog:0x1a70c @id=1990, @breed="Akita">
    #<Dog:0x1a6e4 @id=1992, @breed="Akita">
    #<Dog:0x1a6bc @id=1994, @breed="Akita">
    #<Dog:0x1a694 @id=1996, @breed="Akita">
    #<Dog:0x1a66c @id=1998, @breed="Akita">
    #<Dog:0x1a644 @id=2000, @breed="Akita">



    --
    Posted via http://www.ruby-forum.com/.
     
    7stud --, Oct 3, 2007
    #8
  9. On Oct 3, 2007, at 3:15 AM, FireAphis wrote:

    > Of course that could be an excellent solution. Do you know if there is
    > already such a facility in Ruby that implements the algorithm?


    A recent Ruby Quiz centered around binary search algorithms, so you
    can find several examples reading those solutions:

    http://www.rubyquiz.com/quiz139.html

    James Edward Gray II
     
    James Edward Gray II, Oct 3, 2007
    #9
  10. FireAphis

    FireAphis Guest

    On Oct 3, 1:45 pm, 7stud -- <> wrote:
    > FireAphis wrote:
    > > My problem is that, as far as I know, find_all just iterates through
    > > all the elements of the array and that's very inefficient, especially
    > > in my case, in which I have a sorted array. Is there any standard way
    > > to search efficiently through a sorted array? A binary search for
    > > example?

    >
    > You can also turn your array into a hash, which will work just as well
    > for unsorted arrays:
    >
    > class Dog
    > attr_reader :breed, :id
    >
    > def initialize(breed, id)
    > @breed = breed
    > @id = id
    > end
    >
    > end
    >
    > #Create sorted array of Dog's:
    > #-----------------------------
    > dogs = []
    > (1..2_000).each do |r|
    > dogs << Dog.new("Akita", r*2)
    > end
    >
    > =begin
    > dogs = (1..2000).inject([]) do |arr, r|
    > arr << Dog.new("Akita", r*2)
    > arr
    > end
    > =end
    > #----------------------------+
    >
    > #Convert array to hash.
    > #key: dog.id
    > #val: Dog object
    > #---------------------
    > my_id_dog_hash = {}
    >
    > class << my_id_dog_hash
    > attr_accessor :highest_id
    > end
    >
    > dogs.each do |dog|
    > my_id_dog_hash[dog.id] = dog
    > end
    >
    > my_id_dog_hash.highest_id = dogs[-1].id
    > ----------------------+
    >
    > #Find Dog's in range:
    > #-------------------
    > def get_dogs_in_range(id_dog_hash, range)
    >
    > high_id = id_dog_hash.highest_id
    > if range.end > high_id
    > range = range.begin..high_id
    > end
    >
    > dogs_in_range = []
    >
    > range.each do |r|
    > dog = id_dog_hash[r]
    >
    > if dog
    > dogs_in_range << id_dog_hash[r]
    > end
    >
    > end
    > =begin
    > dogs_in_range = range.inject([]) do |arr1, r|
    > if dog = id_dog_hash[r]
    > arr1 << dog
    > end
    >
    > arr1
    > end
    > =end
    >
    > dogs_in_range
    > end
    > ------------------+
    >
    > #Test it out:
    > ------------
    > results = get_dogs_in_range(my_id_dog_hash, 1_990..2000)
    > results.each {|dog| p dog}
    > -----------+
    >
    > --output:--
    > #<Dog:0x1a70c @id=1990, @breed="Akita">
    > #<Dog:0x1a6e4 @id=1992, @breed="Akita">
    > #<Dog:0x1a6bc @id=1994, @breed="Akita">
    > #<Dog:0x1a694 @id=1996, @breed="Akita">
    > #<Dog:0x1a66c @id=1998, @breed="Akita">
    > #<Dog:0x1a644 @id=2000, @breed="Akita">
    >
    > --
    > Posted viahttp://www.ruby-forum.com/.


    Neat. Not space efficient but undoubtfully more quick. But if the IDs
    have large gaps ([10000, 20000, 30000]) do you think it still will be
    more efficient compared to a simple iteration? Consider the fact that
    a simple loop will have three iterations whereas your implementation
    will have 30000 iteration.

    Thanks,
    FireAphis
     
    FireAphis, Oct 3, 2007
    #10
  11. On 10/3/07, FireAphis <> wrote:
    > > > I have a very big array of objects sorted by one of its numeric data
    > > > members. During the flow of my application I need occasionally to get
    > > > all the elements in a specific range.

    > Sorry if my explanation has been misleading. The array isn't
    > necessarily comprised of continuous or consecutive numbers. Moreover
    > they don't necessarily start from zero. For example:
    >
    > my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]
    >
    > now I want to get all the elements bigger than 1 and smaller than
    > 1000. AFAIK my_array[1..1000] doesn't help in this case.


    Here is my try:

    class Array
    # assumes a sorted array whose elements respond appropriately to <=>
    def subrange(lower, higher)
    low = find_index_lower(lower)
    high = find_index_higher(higher)
    if (low >= length) || (high < 0)
    []
    else
    self[low..high]
    end

    end

    # Finds the lowest index whose value is greater than the specified parameter
    # If all values in the array are lower, returns the index after the last one
    # If all values are higher, returns 0
    def find_index_lower(lower)
    return length if ((last <=> lower) == -1)
    return 0 if ((first <=> lower) == 1)
    low = 0
    high = self.length - 1
    while low < high
    index = low + (high - low) / 2
    element = self[index]
    test = element <=> lower
    case test
    when 0:
    return index + 1
    when -1:
    return (index + 1) if ((self[index+1] <=> lower) == 1)
    low = index + 1
    when 1:
    return (index) if ((self[index-1] <=> lower) <= 0)
    high = index - 1
    end
    end
    end

    # Finds the highest index whose value is lower than the specified parameter
    # If all values in the array are higher, returns -1
    # If all values are lower, returns length-1
    def find_index_higher(higher)
    return -1 if ((first <=> higher) >= 0)
    return length - 1 if ((last <=> higher) == -1)
    low = 0
    high = self.length - 1
    while low < high
    index = low + (high - low) / 2
    element = self[index]
    test = element <=> higher
    case test
    when 0:
    return index - 1
    when -1:
    return (index) if ((self[index+1] <=> higher) >= 0)
    low = index + 1
    when 1:
    return (index - 1) if ((self[index-1] <=> higher) == -1)
    high = index - 1
    end
    end
    end


    end

    a = [5, 6, 7, 8, 9, 10, 14]
    p a
    puts "6 - 13: #{a.subrange(6,13).inspect}"
    puts "5 - 15: #{a.subrange(5,15).inspect}"
    puts "5 - 100: #{a.subrange(5, 100).inspect}"
    puts "2 - 100: #{a.subrange(2, 100).inspect}"
    puts "-1 - 100: #{a.subrange(-1, 100).inspect}"
    puts "100 - 150: #{a.subrange(100,150).inspect}"
    puts "1 - 2: #{a.subrange(1,2).inspect}"

    require 'benchmark'

    n = 10_000
    Benchmark.bmbm do |x|
    x.report("Array subrange") {
    n.times {a.subrange(6,13)}
    }
    end

    -----------------------------------------------------------

    $ ruby array_subrange.rb
    [5, 6, 7, 8, 9, 10, 14]
    6 - 13: [7, 8, 9, 10]
    5 - 15: [6, 7, 8, 9, 10, 14]
    5 - 100: [6, 7, 8, 9, 10, 14]
    2 - 100: [5, 6, 7, 8, 9, 10, 14]
    -1 - 100: [5, 6, 7, 8, 9, 10, 14]
    100 - 150: []
    1 - 2: []
    Rehearsal --------------------------------------------------
    Array subrange 0.220000 0.020000 0.240000 ( 0.274937)
    ----------------------------------------- total: 0.240000sec

    user system total real
    Array subrange 0.190000 0.010000 0.200000 ( 0.227276)

    Jesus.
     
    Jesús Gabriel y Galán, Oct 3, 2007
    #11
  12. FireAphis

    7stud -- Guest

    Axel Etzold wrote:
    >
    > # some example
    > len=10**8
    > a=(1..len).to_a
    > b=5*len/2
    > cond='lower_index>10'
    > r=a.find_lower_index(cond)
    > cond='upper_index<100'
    > s=a.find_upper_index(cond)
    > p r,s
    > p a[r]
    > p a


    When I use this code:

    a = [2, 4, 6, 8]
    cond='lower_index>0'
    r=a.find_lower_index(cond)
    cond='upper_index<2'
    s=a.find_upper_index(cond)
    p r,s
    p a[r]
    p a

    I get this output:

    3
    1
    8
    4

    The output appears to be backwards, since the lower index would be 1 and
    the upper index would be 3. And it's a little bit puzzling why a search
    would be needed to find an index in an array that is greater than 0. I
    can tell you the answer without searching: it's 1. The same goes for
    the upper index: if the cond = "upper index < 2", then the upper index
    is 1.
    --
    Posted via http://www.ruby-forum.com/.
     
    7stud --, Oct 3, 2007
    #12
  13. FireAphis

    7stud -- Guest

    Axel Etzold wrote:
    >


    Can you explain this output:

    a = [2, 4, 6, 8]
    cond='lower_index>0'
    r=a.find_lower_index(cond)
    cond='upper_index<4'
    s=a.find_upper_index(cond)
    p r,s
    p a[r]
    p a

    --output:--
    3
    2
    8
    6
    --
    Posted via http://www.ruby-forum.com/.
     
    7stud --, Oct 3, 2007
    #13
  14. FireAphis

    7stud -- Guest

    FireAphis wrote:
    >
    > Neat. Not space efficient
    >


    Yes, you're right. I wouldn't recommend it for really large arrays. The
    original sorted array could also be abandoned in favor of a hash to
    begin with.


    > But if the IDs
    > have large gaps ([10000, 20000, 30000]) do you think it still will be
    > more efficient compared to a simple iteration?
    >


    Depending on how you write the simple iteration, the hash method can be
    a lot slower.

    >
    > a simple loop will have three iterations whereas your implementation
    > will have 30000 iteration.
    >


    What if a simple iteration has 3 iterations x 3,000 range checks, e.g.:

    dogs_in_range = []

    dogs.each do |dog|
    range.each do |r|
    if dog.id == r
    dogs_in_range << dog
    end
    end
    end
    --
    Posted via http://www.ruby-forum.com/.
     
    7stud --, Oct 3, 2007
    #14
  15. 2007/10/3, FireAphis <>:
    > Of course that could be an excellent solution. Do you know if there is
    > already such a facility in Ruby that implements the algorithm? I'm
    > quite sure Array#index doesn't do that since in order to apply a
    > binary search the array has to be sorted.


    http://raa.ruby-lang.org/search.rhtml?search=binary search

    Kind regards

    robert
     
    Robert Klemme, Oct 4, 2007
    #15
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Rick

    Sorted Array

    Rick, Oct 12, 2003, in forum: Java
    Replies:
    13
    Views:
    797
    Alex Hunsley
    Oct 11, 2004
  2. Hunk
    Replies:
    4
    Views:
    376
  3. Replies:
    5
    Views:
    318
    Joe Greer
    Aug 23, 2008
  4. one man army

    Searching a sorted array of strings

    one man army, Feb 28, 2006, in forum: Perl Misc
    Replies:
    29
    Views:
    237
    Uri Guttman
    Mar 1, 2006
  5. stumblng.tumblr
    Replies:
    1
    Views:
    211
    stumblng.tumblr
    Feb 4, 2008
Loading...

Share This Page