algorithm help

A

Ara.T.Howard

any of you knuth fans know some slick (read: fast) way to do this

a = [ 1, 2, 3, 6, 7, 8 ,9, 42 ]

a.ranges #=> [ (1..3), (6..9), (42..42) ]

or am i doomed to O(n)?

cheers.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
 
A

Ara.T.Howard

What exactly are you trying to do ?
-s

i have a massive (100-500MB) grid of solar elevations and a coresponding data
set of binary format.

i need to find the areas in the solar elevations file that are less than a
certain amount and, based on those ranges, munge the data file in a certain
way to fix an issue with 'wobbling' sensor aboard the satelite imaging the
data. basically we have lines that look like

------------ | ------------
------------ | ------------

where the '|'s are supposed to line up. as it turns out this is only
happening in 'dark' areas of the data - therefore the solar elevations can be
used to predict where the problem will occur within a few pixels. (the
problem is caused by a photo-multiplier tube switching on when darkness is
sensed - we think).

so - i currently am munging the data, which is a terrible record based format,
using writable mmap's of the file and this is very fast. the issue is
constraining the munging to certain regions. narray allows me to say this
sort of thing __very__ quicky:

solar_elevations = NArray::sint 'solar_elevations.data', samples, lines

lines.times |j|
line = solar_elevations[ true, j ]
dark = line.lt(-3).where
...
...
end

here dark is the array having values like

[ 1, 2, 3, 6789, 6790, 6791 ]

that i want to reduce to a list of ranges.

in reality the ranges are huge and there will, in amost all cases except
crossing the poles, be only one. also note that 'dark' is a sorted list. my
current optimization is

min, max = dark[0], dark[dark.size - 1]

ranges =
if((max - min + 1) != dark.size)
slow_search dark
else
[ min .. max ]
end

ranges.each do |range|
#
# munge data based on ranges which are small and fast
#
end


i can't think of anything better than brute force for the 'slow_search' but
thought i'd throw it out there...

cheers.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
 
A

Ara.T.Howard

--8323328-254812897-1122958942=:22412
Content-Type: MULTIPART/MIXED; BOUNDARY="8323328-254812897-1122958942=:22412"

This message is in MIME format. The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.

--8323328-254812897-1122958942=:22412
Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE

He=B4s trying to identify all the sequences inside the original array.

based on the assumptions that

* array is sorted
* array is huge
* array probably contains zero or one range
* real upper limit of ranges is 2 but, in theory, could be n

which makes it a bit easier...

cheers.

-a
--=20
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D

--8323328-254812897-1122958942=:22412--
--8323328-254812897-1122958942=:22412--
 
M

Martin DeMello

Ara.T.Howard said:
here dark is the array having values like

[ 1, 2, 3, 6789, 6790, 6791 ]

that i want to reduce to a list of ranges.

in reality the ranges are huge and there will, in amost all cases except
crossing the poles, be only one. also note that 'dark' is a sorted list. my
current optimization is

min, max = dark[0], dark[dark.size - 1]

ranges =
if((max - min + 1) != dark.size)
slow_search dark
else
[ min .. max ]
end

ranges.each do |range|
#
# munge data based on ranges which are small and fast
#
end


i can't think of anything better than brute force for the 'slow_search' but
thought i'd throw it out there...

This might work better: binary search for {x | (x - min) == (i_x - i_min)},
set i_min to i_x+1 and repeat. For an added optimisation, run two loops
in parallel, searching from each end.

martin
 
D

Daniel Amelang

Instead of looking for ranges, why not look for gaps in the sequence
as you read in the input. Keep track of where each gap occurs
(beginning and end), like an inverse range. If you do this as you
populate your array, you won't have to go back and traverse the array
afterwards (hence the O(n)).

Hmmm...you would have thought about that by now if it were _that_ easy :)

Sorry, Ara, I guess I don't fully understand your problem.

Dan
 
J

Joel VanderWerf

Ara.T.Howard said:
On Tue, 2 Aug 2005, Luiz Esmiralha wrote:
=20 array.
=20
=20
based on the assumptions that
=20
* array is sorted
* array is huge
* array probably contains zero or one range
* real upper limit of ranges is 2 but, in theory, could be n
=20
which makes it a bit easier...
=20
cheers.
=20
-a

How could the array contain zero ranges? Empty array?

If I understand the practical assumptions we can make, then there are a
bounded number (1,2) of "gaps" between ranges. Why not just binary
search for them?

--=20
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
 
J

Joel VanderWerf

Daniel said:
Instead of looking for ranges, why not look for gaps in the sequence
as you read in the input. Keep track of where each gap occurs
(beginning and end), like an inverse range. If you do this as you
populate your array, you won't have to go back and traverse the array
afterwards (hence the O(n)).

It's already O(n) if you read the array sequentially. I guess we're
assuming the array is given as a block of memory somewhere (Ara said
something about mmap...), so if we're clever enough we may not need to
traverse it even once.
 
W

William James

Joel said:
How could the array contain zero ranges? Empty array?

If I understand the practical assumptions we can make, then there are a
bounded number (1,2) of "gaps" between ranges. Why not just binary
search for them?

[
[3, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41],
[3, 4, 5, 6, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41],
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 37, 38, 39, 40, 41],
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
[3, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42],
[3, 4, 5, 6, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42],
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 37, 38, 39, 40, 41, 42],
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 42]
].each { |dark|

min, max = dark[0], dark[-1]
if((max - min + 1) == dark.size)
ranges = [ min .. max ]
else
dif = min
lower, upper = 0, dark.size-1
while upper - lower > 1
print "."
i = (lower + upper)/2
if dark - i == dif
lower = i
else
upper = i
end
end
ranges = [ (dark[0]..dark[lower]), (dark[upper]..dark[-1]) ]
end

p ranges
}

-----
ouput:

....[3..3, 28..41]
.....[3..6, 31..41]
.....[3..12, 37..41]
[3..17]
....[3..3, 28..42]
.....[3..6, 31..42]
.....[3..12, 37..42]
.....[3..17, 42..42]
 
P

Pit Capitain

Martin said:
Ara.T.Howard said:
here dark is the array having values like

[ 1, 2, 3, 6789, 6790, 6791 ]

that i want to reduce to a list of ranges.

in reality the ranges are huge and there will, in amost all cases except
crossing the poles, be only one. also note that 'dark' is a sorted list. my
current optimization is

min, max = dark[0], dark[dark.size - 1]

ranges =
if((max - min + 1) != dark.size)
slow_search dark
else
[ min .. max ]
end

ranges.each do |range|
#
# munge data based on ranges which are small and fast
#
end

i can't think of anything better than brute force for the 'slow_search' but
thought i'd throw it out there...

This might work better: binary search for {x | (x - min) == (i_x - i_min)},
set i_min to i_x+1 and repeat. For an added optimisation, run two loops
in parallel, searching from each end.

Just in case you haven't implemented Martin's binary search idea yet:

def gap?( data, min, max )
data[ max ] - data[ min ] > max - min
end

def find_gaps( data, min, max, acc )
if gap?( data, min, max )
if max - min == 1
acc << max
else
mid = ( max + min ) / 2
find_gaps( data, min, mid, acc )
find_gaps( data, mid, max, acc )
end
end
end

def gaps( data )
acc = [ 0 ]
find_gaps( data, 0, data.size - 1, acc )
acc << data.size
end

def ranges( *data )
gaps = gaps( data )
( 1 ... gaps.size ).map { |idx|
data[ gaps[ idx - 1 ] ] .. data[ gaps[ idx ] - 1 ]
}
end

p ranges( 1, 2, 3, 6789, 6790, 6791 )
# => [1..3, 6789..6791]

Regards,
Pit
 
A

Ara.T.Howard

This might work better: binary search for {x | (x - min) == (i_x - i_min)},
set i_min to i_x+1 and repeat. For an added optimisation, run two loops
in parallel, searching from each end.

Just in case you haven't implemented Martin's binary search idea yet:

def gap?( data, min, max )
data[ max ] - data[ min ] > max - min
end

def find_gaps( data, min, max, acc )
if gap?( data, min, max )
if max - min == 1
acc << max
else
mid = ( max + min ) / 2
find_gaps( data, min, mid, acc )
find_gaps( data, mid, max, acc )
end
end
end

def gaps( data )
acc = [ 0 ]
find_gaps( data, 0, data.size - 1, acc )
acc << data.size
end

def ranges( *data )
gaps = gaps( data )
( 1 ... gaps.size ).map { |idx|
data[ gaps[ idx - 1 ] ] .. data[ gaps[ idx ] - 1 ]
}
end

p ranges( 1, 2, 3, 6789, 6790, 6791 )
# => [1..3, 6789..6791]

sorry it took me so long to respond pit and martin... got sidetracked. your
binary search idea was perfect! for the actual data i'll be working with this
should be either O(1) or O(log(n)) - thought it can obviously degrade in the
general case. i used your idea in essentially the reverse pit, here's what i
ended up with:


harp:~ > cat a.rb

def sequences list
distance = lambda do |a,b|
(b - a).abs
end

continuous = lambda do |range, list|
first, last = range.first, range.last
a, b = list[first], list[last]
(list.empty? or (distance[a,b] == distance[first,last])) ? (a .. b) : nil
end

discrete = lambda do |range, list|
first, last = range.first, range.last
edge = last + 1
edge >= list.length or distance[list[last], list[edge]] > 1
end

sequence = lambda do |range, list|
first, last = range.first, range.last
list[first] .. list[last]
end

last = list.length - 1
a, b = 0, last

accum = []

while a <= b and a < list.length and b < list.length
range = a .. b

if continuous[ range, list ]
if discrete[ range, list ]
accum << sequence[ range, list ]
a, b = b + 1, last # move a and b up
else
b = b + distance[b, last] / 2 # move b up
end
else
b = a + distance[a, b] / 2 # move b down
end
end

accum
end

tests = [
[],
[0],
[0,1],
[0,1,2],
[0,1,2,3],
[0,3],
[0,3,5],
[0,3,5,7],
[0,1,3,4],
[0,1,3,4,6,7],
[0,1,3,4,6,7,9,10],
[0,1,3,4,6,7,10],
[0,3,4,6,7,9,10],
[0,1,3],
[0,3,4],
[0,-1],
[0,-1,-2],
[0,-1,-3,-4],
[0,-1,-3,-4,-6],
[0,-1,-3,-4,-6,-7],
[-1,-3,-4,-6,-7],
[-3,-4],
[-3,-4,-6, *(-42 .. -8).to_a.reverse],
]

tests.each do |a|
printf "%-37.37s => %37.37s\n", a.inspect, sequences(a).inspect
a.reverse!
printf "%-37.37s => %37.37s\n", a.inspect, sequences(a).inspect
end



harp:~ > ruby a.rb
[] => []
[] => []
[0] => [0..0]
[0] => [0..0]
[0, 1] => [0..1]
[1, 0] => [1..0]
[0, 1, 2] => [0..2]
[2, 1, 0] => [2..0]
[0, 1, 2, 3] => [0..3]
[3, 2, 1, 0] => [3..0]
[0, 3] => [0..0, 3..3]
[3, 0] => [3..3, 0..0]
[0, 3, 5] => [0..0, 3..3, 5..5]
[5, 3, 0] => [5..5, 3..3, 0..0]
[0, 3, 5, 7] => [0..0, 3..3, 5..5, 7..7]
[7, 5, 3, 0] => [7..7, 5..5, 3..3, 0..0]
[0, 1, 3, 4] => [0..1, 3..4]
[4, 3, 1, 0] => [4..3, 1..0]
[0, 1, 3, 4, 6, 7] => [0..1, 3..4, 6..7]
[7, 6, 4, 3, 1, 0] => [7..6, 4..3, 1..0]
[0, 1, 3, 4, 6, 7, 9, 10] => [0..1, 3..4, 6..7, 9..10]
[10, 9, 7, 6, 4, 3, 1, 0] => [10..9, 7..6, 4..3, 1..0]
[0, 1, 3, 4, 6, 7, 10] => [0..1, 3..4, 6..7, 10..10]
[10, 7, 6, 4, 3, 1, 0] => [10..10, 7..6, 4..3, 1..0]
[0, 3, 4, 6, 7, 9, 10] => [0..0, 3..4, 6..7, 9..10]
[10, 9, 7, 6, 4, 3, 0] => [10..9, 7..6, 4..3, 0..0]
[0, 1, 3] => [0..1, 3..3]
[3, 1, 0] => [3..3, 1..0]
[0, 3, 4] => [0..0, 3..4]
[4, 3, 0] => [4..3, 0..0]
[0, -1] => [0..-1]
[-1, 0] => [-1..0]
[0, -1, -2] => [0..-2]
[-2, -1, 0] => [-2..0]
[0, -1, -3, -4] => [0..-1, -3..-4]
[-4, -3, -1, 0] => [-4..-3, -1..0]
[0, -1, -3, -4, -6] => [0..-1, -3..-4, -6..-6]
[-6, -4, -3, -1, 0] => [-6..-6, -4..-3, -1..0]
[0, -1, -3, -4, -6, -7] => [0..-1, -3..-4, -6..-7]
[-7, -6, -4, -3, -1, 0] => [-7..-6, -4..-3, -1..0]
[-1, -3, -4, -6, -7] => [-1..-1, -3..-4, -6..-7]
[-7, -6, -4, -3, -1] => [-7..-6, -4..-3, -1..-1]
[-3, -4] => [-3..-4]
[-4, -3] => [-4..-3]
[-3, -4, -6, -8, -9, -10, -11, -12, - => [-3..-4, -6..-6, -8..-42]
[-42, -41, -40, -39, -38, -37, -36, - => [-42..-8, -6..-6, -4..-3]



your ideas were essential to getting the peice of code i'm working on now
running in a reasonable amount of time - thanks to all the contributors!

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
 
D

Daniel Berger

Ara.T.Howard wrote:

your ideas were essential to getting the peice of code i'm working on now
running in a reasonable amount of time - thanks to all the contributors!

-a

No applause please. Just throw money (at RubyCentral).

:)

Regards,

Dan
 
A

Ara.T.Howard

Ara.T.Howard wrote:



No applause please. Just throw money (at RubyCentral).

how? i bought the new book...

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,045
Latest member
DRCM

Latest Threads

Top