How to identify which numbers in a list are within each others' range

E

erikcw

Hi,

I have a list of numbers each with a +/- margin of error. I need to
identify which ones overlab each other.

For example:
55 +/- 3
20 +/- 2
17 +/- 4
60 +/- 3

#base, max, min
list = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]

In this example the range of list[0] overlaps the range of list[3] AND
list[1] overlaps list[2]

What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.

Thanks!
Erik
 
P

Paul Rubin

erikcw said:
What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.

This sounds like a homework problem, so the first thing I'll suggest
is that you figure out exactly what it means for two of those
intervals to overlap. That should let you write a simple program that
gets the right answer, but that can run slowly if the number of lists
gets large. The next thing to do after that is figure out how to
speed it up, if necessary. But do the first part first.
 
A

Arnaud Delobelle

Hi,

I have a list of numbers each with a +/- margin of error.  I need to
identify which ones overlab each other.

For example:
55 +/- 3
20 +/- 2
17 +/- 4
60 +/- 3

#base, max, min
list = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]

In this example the range of list[0] overlaps the range of list[3] AND
list[1] overlaps list[2]

What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.

This is definitely the best way:

=======================
lst = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]

from itertools import chain

def overlaps(lst):
bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))
inside = {}
for x, i in sorted(bounds):
if inside.pop(i, None) is None:
for j, y in inside.iteritems():
if y != x: yield i, j
inside = x

==============================[(1, 2), (3, 0)]
 
A

attn.steven.kuo

Hi,

I have a list of numbers each with a +/- margin of error. I need to
identify which ones overlab each other.

For example:
55 +/- 3
20 +/- 2
17 +/- 4
60 +/- 3

#base, max, min
list = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]

In this example the range of list[0] overlaps the range of list[3] AND
list[1] overlaps list[2]

What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.


One way would be to use sets and check for intersection:

for idx, s in enumerate(mysets):
for next_idx, next_s in enumerate(mysets[idx+1:]):
if s.intersection(next_s):
print "mylist[%d] and mylist[%d] intersect" % (
idx, idx + next_idx + 1 )
 
A

attn.steven.kuo

One way would be to use sets and check for intersection:

for idx, s in enumerate(mysets):
for next_idx, next_s in enumerate(mysets[idx+1:]):
if s.intersection(next_s):
print "mylist[%d] and mylist[%d] intersect" % (
idx, idx + next_idx + 1 )


Um, that would have been more helpful if I hadn't forgotten
to preface that with:

mylist = [(55, 58, 52), (20, 22, 18), (17, 21, 13), (60, 63, 57),]
mysets = [set(range(x[2],x[1])) for x in mylist]
 
A

attn.steven.kuo

mysets = [set(range(x[2],x[1])) for x in mylist]

This is pretty horrible, each set can be arbitrarily large,
i.e. if x[2] and x[1] are 0 and 1000000, you get a set with
a million elements.



True... Any lighter-weight implementation of
sets out there? That is, one that would defer
use of resources until actually needed --
somewhat akin to the distinction between
range and xrange, and so on.

xset?
 
P

Paul Rubin

True... Any lighter-weight implementation of
sets out there? That is, one that would defer
use of resources until actually needed --
somewhat akin to the distinction between
range and xrange, and so on.

Don't even think of doing it that way, that solves the space problem
but leaves a speed problem. You probably want to use something like
sort the intervals and then use the bisect module to find the
intervals intersecting a given one.
 
G

George Sakkis

Don't even think of doing it that way, that solves the space problem
but leaves a speed problem. You probably want to use something like
sort the intervals and then use the bisect module to find the
intervals intersecting a given one.

I haven't actually used it but from the short description this package
seems to fit the bill: http://pypi.python.org/pypi/interval/1.0.0

George
 
T

Thomas Pani

Arnaud said:
This is definitely the best way:

from itertools import chain

def overlaps(lst):
bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))
inside = {}
for x, i in sorted(bounds):
if inside.pop(i, None) is None:
for j, y in inside.iteritems():
if y != x: yield i, j
inside = x


Why not just

def overlaps(lst):
for i,x in enumerate(lst):
for j,y in enumerate(lst):
if i != j and x[1] > y[2] > x[2]:
yield i,j

It's still n^2. Or am I missing something?

Cheers,
thomas
 
A

Arnaud Delobelle

Arnaud said:
This is definitely the best way:
from itertools import chain
def overlaps(lst):
    bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))
    inside = {}
    for x, i in sorted(bounds):
        if inside.pop(i, None) is None:
            for j, y in inside.iteritems():
                if y != x: yield i, j
            inside = x


Why not just

def overlaps(lst):
    for i,x in enumerate(lst):
        for j,y in enumerate(lst):
                if i != j and x[1] > y[2] > x[2]:
                    yield i,j

It's still n^2. Or am I missing something?


Yours is O(n^2) and \Omega(n^2). I think mine is O(max(c, nlogn))
(assuming constant time access for dictionaries, but 'inside' could be
replaced with a list) where c is the number of overlaps. I'll try to
post a proof later (if it's true!). So if we are in a situation where
overlaps are rare, it is in practice O(nlogn).

PS: the way I build 'bounds' is awkward.
 
N

Neil Cerutti

Yours is O(n^2) and \Omega(n^2). I think mine is O(max(c, nlogn))
(assuming constant time access for dictionaries, but 'inside' could be
replaced with a list) where c is the number of overlaps. I'll try to
post a proof later (if it's true!). So if we are in a situation where
overlaps are rare, it is in practice O(nlogn).

Here's another contender, basically the same as yours, but spelled
without iterators.

def overlaps(eranges):
"""Determine, from a list of numbers and a +/- range of uncertainty, which
number ranges overlap each other. Return the overlapping range indexes as
a list of overlapping pairs.
[(0, 3), (1, 2)]
"""
d = [(r[0] - r[1], r[0] + r[1], i) for i, r in enumerate(eranges)]
d.sort()
rv = []
x = 0
while x < len(d):
minx, maxx, ix = d[x]
y = x+1
while y < len(d):
miny, maxy, iy = d[y]
if miny < maxx:
rv.append((ix, iy) if ix < iy else (iy, ix))
else:
break
y += 1
x += 1
return rv
 
A

Arnaud Delobelle

Yours  is O(n^2) and \Omega(n^2).  I think mine is O(max(c, nlogn))
(assuming constant time access for dictionaries, but 'inside' could be
replaced with a list) where c is the number of overlaps.  I'll try to
post a proof later (if it's true!).  So if we are in a situation where
overlaps are rare, it is in practice O(nlogn).

I have slightly modified overlaps() in order to make the analysis
easier (I have made overlaps() consider all intervals open). Here is
the new version:

def open_interval(i, (a, b)):
return (a, 1, i), (b, 0, i)

def overlaps(lst, interval=open_interval):
bounds = chain(*starmap(interval, enumerate(lst))) # 1
inside = {} # 2
for x, _, i in sorted(bounds): # 3
if inside.pop(i, None) is None: # 4
for j, y in inside.iteritems(): yield i, j # 5
inside = x # 6

'Detailed' analysis of running overlaps(lst) follows, where

* lst is of length n
* the total number of overlaps in m

(assuming dict.__getitem__ and dic.__setitem__ are done in constant
time [1])

1. is a simple loop over lst, takes An
2. is constant (K)
3. The sorted(...) function will take Bnlogn
3,4. This loop is iterated 2n times, with the if condition it will
take Cn
5. This loop is iterated once for each overlap exacly. So it will take
Dm
6. This statement is executed n times, will take En

Altogether the time taken is:

K + (A+B+E)n + Bnlogn + Dm

So if m is dominated by nlogn, the algorithm is O(nlogn). Otherwise
one cannot hope for better thant O(m)!

--
Arnaud

[1] Otherwise one could use a combination of an array and a doubly
linked list to achieve constant time access, deletion and updating of
the 'inside' structure. Anyway even if it was log(n) the overall
complexity would be the same!
 
A

Arnaud Delobelle

Yours  is O(n^2) and \Omega(n^2).  I think mine is O(max(c, nlogn))
(assuming constant time access for dictionaries, but 'inside' could be
replaced with a list) where c is the number of overlaps.  I'll try to
post a proof later (if it's true!).  So if we are in a situation where
overlaps are rare, it is in practice O(nlogn).

Here's another contender, basically the same as yours, but spelled
without iterators.

def overlaps(eranges):
    """Determine, from a list of numbers and a +/- range of uncertainty, which
    number ranges overlap each other. Return the overlapping range indexes as
    a list of  overlapping pairs.

    >>> sorted(overlaps(((55, 3), (20, 2), (17, 4), (60, 3))))
    [(0, 3), (1, 2)]
    """
    d = [(r[0] - r[1], r[0] + r[1], i) for i, r in enumerate(eranges)]
    d.sort()
    rv = []
    x = 0
    while x < len(d):
        minx, maxx, ix = d[x]
        y = x+1
        while y < len(d):
            miny, maxy, iy = d[y]
            if miny < maxx:
                rv.append((ix, iy) if ix < iy else (iy, ix))
            else:
                break
            y += 1
        x += 1
    return rv

The total number of iterations is 1+2+...+n = n(n+1)/2 (when n is the
number of intervals) so this has quadratic behaviour regardless of
input. See my other post for a 'detailed' analysis of my solution.
 
K

Karthik Gurusamy

Hi,

I have a list of numbers each with a +/- margin of error. I need to
identify which ones overlab each other.

For example:
55 +/- 3
20 +/- 2
17 +/- 4
60 +/- 3

#base, max, min
list = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]

In this example the range of list[0] overlaps the range of list[3] AND
list[1] overlaps list[2]

What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.

Note you just need the left-end and right-end of each interval. Mean
is redundant information. Once you sort the interval, you can just go
from left to right, retaining only necessary information.

Below method is O(n log n) + O (nk) where k is the average overlaps
per interval.
On most average case, first term dominates and hence its O(n log n);
worst case is ofcourse O(n^2) (a simple case is all n intervals
overlapping with each other)


def strip_sort (a, b):
if a[0] < b[0]:
return -1
if a[0] > b[0]:
return 1
# To allow overlaps on a point, return L first then R
# If overlap on a point must not be allowed, return 1 below
if a[0] == 'L': return -1
return 0

def overlaps (strips_given):
# split each interval into two items. basically decorate with 'L'
for left-end of the interval, 'R' for right end of the interval
s2 = [((s[0], 'L', i) , (s[1], 'R', i)) for i,s in
enumerate(strips_given)]
strips = []
for s in s2: # flatten out the tuples
strips.append(s[0])
strips.append(s[1])

clique = set() # set of nodes where each overlap with everyone
else in the set
edges = [] # we are constructing a graph on N nodes where edge
i,j implies i and j overlap
# below is O(N log N) where is N is number of intervals
strips.sort(cmp=strip_sort)
for s in strips:
node = s[2]
if s[1] == 'L':
clique.add(node)
if s[1] == 'R':
# below is O(k) where k is clique size (overlaps per
interval)
new_edges = [(node, i) for i in clique if i != node]
edges += new_edges
clique.remove(node)
return edges

def main():
lst = [(52, 58), (18, 22), (13, 21), (57, 63)]
print overlaps(lst)

Output:
[(2, 1), (0, 3)]

Karthik
 
A

Arnaud Delobelle

def strip_sort (a, b):
    if a[0] < b[0]:
        return -1
    if a[0] > b[0]:
        return 1
    if a[0] == 'L': return -1
    return 0

def overlaps (strips_given):
    s2 = [((s[0], 'L', i) , (s[1], 'R', i)) for i,s in
enumerate(strips_given)]
    strips = []
    for s in s2:
        strips.append(s[0])
        strips.append(s[1])
    clique = set()
    edges = []
    strips.sort(cmp=strip_sort)
    for s in strips:
        node = s[2]
        if s[1] == 'L':
            clique.add(node)
        if s[1] == 'R':
            new_edges = [(node, i) for i in clique if i != node]
            edges += new_edges
            clique.remove(node)
    return edges

Interesting. This is a long version of the algorithm I posted
earlier.
 
N

Neil Cerutti

Here's another contender, basically the same as yours, but spelled
without iterators.

def overlaps(eranges):
"""Determine, from a list of numbers and a +/- range of uncertainty, which
number ranges overlap each other. Return the overlapping range indexes as
a list of overlapping pairs.
sorted(overlaps(((55, 3), (20, 2), (17, 4), (60, 3))))
[(0, 3), (1, 2)]
"""
d = [(r[0] - r[1], r[0] + r[1], i) for i, r in enumerate(eranges)]
d.sort()
rv = []
x = 0
while x < len(d):
minx, maxx, ix = d[x]
y = x+1
while y < len(d):
miny, maxy, iy = d[y]
if miny < maxx:
rv.append((ix, iy) if ix < iy else (iy, ix))
else:
break
y += 1
x += 1
return rv

The total number of iterations is 1+2+...+n = n(n+1)/2 (when n is the
number of intervals) so this has quadratic behaviour regardless of
input. See my other post for a 'detailed' analysis of my solution.

But it breaks out of the inner loop as soon as ranges on the right
cannot overlap. The loop is O(N) for all input if I define N as
"number of overlaps", a pretty reasonable definition--it's madness to
expect to report N overlaps with less than N complexity.

Unless I'm making a silly mistake. Again.
 
A

Arnaud Delobelle

The total number of iterations is 1+2+...+n = n(n+1)/2 (when n is the
number of intervals) so this has quadratic behaviour regardless of
input.[...]
But it breaks out of the inner loop as soon as ranges on the right
cannot overlap. The loop is O(N) for all input if I define N as
"number of overlaps", a pretty reasonable definition--it's madness to
expect to report N overlaps with less than N complexity.

Unless I'm making a silly mistake. Again.

No you're not mistaken, I am. I didn't see the 'else: break' bit
which of course makes all the difference. Sorry...

On the point of complexity though, it is only O(N) is N dominates
nlogn (n being the length of input).
 
B

Boris Borcic

Arnaud Delobelle wrote:
(...)
from itertools import chain

def overlaps(lst):
bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))

imho, this is a uselessly painful equivalent of

bounds = ((x[k],i) for i,x in enumerate(lst) for k in (1,2))

Cheers, BB
 
B

Boris Borcic

Boris said:
Arnaud Delobelle wrote:
(...)
from itertools import chain

def overlaps(lst):
bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))

imho, this is a uselessly painful equivalent of

bounds = ((x[k],i) for i,x in enumerate(lst) for k in (1,2))

even more readable :

bounds = ((bound(x),i) for bound in (min,max) for i,x in enumerate(lst))
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top