How to program efficient pattern searches in a list of float numbers?

M

malv

Simple case:
In this list, how to find all occurences of intervals of n adjacent
indexes having at least one list-member with a value between given
limits.
Visualizing the list as a two-dimensional curve, this is like
horizontally dragging a given rectangle over the curve and finding the
x coordinates where the curve passes through the rectangle.(Define such
a x-index coordinate as the left corner of the rectangle.)

More complicated case:
Given a pair of rectangles spaced relatively to each other in a fixed
manner. Drag this rectangle pair horizontally over the above
two-dimensional curve and list the indexes of the occurences where the
curve passes simultaneously through both rectangles.
(Define such a x-index coordinate as the leftmost corner of the
rectangle pair).

These problems can be solved by programming a naive search advancing
index by index. It seems obvious that due to the localized properties
searched for, much more efficient searches should be possible.
After having found the occurence-indexes for one particular rectangle
set, how to find the pattern occurences after changing one or more
rectangle parameters?
 
C

Charles Krug

Simple case:
In this list, how to find all occurences of intervals of n adjacent
indexes having at least one list-member with a value between given
limits.
Visualizing the list as a two-dimensional curve, this is like
horizontally dragging a given rectangle over the curve and finding the
x coordinates where the curve passes through the rectangle.(Define such
a x-index coordinate as the left corner of the rectangle.)

More complicated case:
Given a pair of rectangles spaced relatively to each other in a fixed
manner. Drag this rectangle pair horizontally over the above
two-dimensional curve and list the indexes of the occurences where the
curve passes simultaneously through both rectangles.
(Define such a x-index coordinate as the leftmost corner of the
rectangle pair).

These problems can be solved by programming a naive search advancing
index by index. It seems obvious that due to the localized properties
searched for, much more efficient searches should be possible.
After having found the occurence-indexes for one particular rectangle
set, how to find the pattern occurences after changing one or more
rectangle parameters?

Make a picture of the problem. From your description, I'm not certain
anything more complicated than a linear search is called for.

Is this the phrasing of the original problem presentation? Seems to me
the word "superimpose" ought to occur here.

Another thought: What is the end result you're after? Often we start
thinking of a solution but lose sight of the end result. Then another
solution will pop into our mind that makes it Oh So Simple.
 
P

Paul McGuire

Have you tried coding even the brute-force naive search? It is far
easier to improve an algorithm when you have someplace relatively
concrete to start from. Plus, the naive approach is most likely to
return a correct result, so that you can regression test your exotic
interval-skipping, second-derivative conjugate interpolation algorithm.
:)

In sum, I started to think through your problem this morning, and yes,
I can visualize the rectangles floating across the 2D curve, and yes, I
can picture that there are likely to be shortcuts and optimizations
over the brute-force "check every index" approach. But really, you
*should* do some of the work yourself...

-- Paul
 
B

Bengt Richter

Simple case:
In this list, how to find all occurences of intervals of n adjacent
indexes having at least one list-member with a value between given
limits.
Visualizing the list as a two-dimensional curve, this is like
horizontally dragging a given rectangle over the curve and finding the
x coordinates where the curve passes through the rectangle.(Define such
a x-index coordinate as the left corner of the rectangle.)

More complicated case:
Given a pair of rectangles spaced relatively to each other in a fixed
manner. Drag this rectangle pair horizontally over the above
two-dimensional curve and list the indexes of the occurences where the
curve passes simultaneously through both rectangles.
(Define such a x-index coordinate as the leftmost corner of the
rectangle pair).

These problems can be solved by programming a naive search advancing
index by index. It seems obvious that due to the localized properties
searched for, much more efficient searches should be possible.
After having found the occurence-indexes for one particular rectangle
set, how to find the pattern occurences after changing one or more
rectangle parameters?

for the first problem, maybe (untested beyond the below):
>>> from itertools import groupby
>>> data = [5+i%5 for i in xrange(40)]
>>> data
[5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6,
7, 8, 9, 5, 6, 7, 8, 9]
>>> [g.next()[0] for k,g in groupby(enumerate(data), lambda t: 6<t[1]<9) if k]
[2, 7, 12, 17, 22, 27, 32, 37]

By the time you figure a more efficient way, this can have solved it for quite a long sequence.
But since an efficient solution seems a lot like homework, I'll leave it to you ;-)
Or is this a bioinfo problem in disguise?

Regards,
Bengt Richter
 
B

Bengt Richter

Simple case:
In this list, how to find all occurences of intervals of n adjacent
indexes having at least one list-member with a value between given
limits.
Visualizing the list as a two-dimensional curve, this is like
horizontally dragging a given rectangle over the curve and finding the
x coordinates where the curve passes through the rectangle.(Define such
a x-index coordinate as the left corner of the rectangle.)

More complicated case:
Given a pair of rectangles spaced relatively to each other in a fixed
manner. Drag this rectangle pair horizontally over the above
two-dimensional curve and list the indexes of the occurences where the
curve passes simultaneously through both rectangles.
(Define such a x-index coordinate as the leftmost corner of the
rectangle pair).

These problems can be solved by programming a naive search advancing
index by index. It seems obvious that due to the localized properties
searched for, much more efficient searches should be possible.
After having found the occurence-indexes for one particular rectangle
set, how to find the pattern occurences after changing one or more
rectangle parameters?

for the first problem, maybe (untested beyond the below):
from itertools import groupby
data = [5+i%5 for i in xrange(40)]
data
[5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6,
7, 8, 9, 5, 6, 7, 8, 9]
[g.next()[0] for k,g in groupby(enumerate(data), lambda t: 6<t[1]<9) if k]
[2, 7, 12, 17, 22, 27, 32, 37]

By the time you figure a more efficient way, this can have solved it for quite a long sequence.
But since an efficient solution seems a lot like homework, I'll leave it to you ;-)
Or is this a bioinfo problem in disguise?
Wrong problem, sorry. I misread too quickly. I'll try to misread more slowly next time ;-/

Regards,
Bengt Richter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top