How to identify which numbers in a list are within each others' range

erikcw · Jan 31, 2008

Hi,

I have a list of numbers each with a +/- margin of error. I need to
identify which ones overlab each other.

For example:
55 +/- 3
20 +/- 2
17 +/- 4
60 +/- 3

#base, max, min
list = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]

In this example the range of list[0] overlaps the range of list[3] AND
list[1] overlaps list[2]

What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.

Thanks!
Erik

Paul Rubin · Jan 31, 2008

erikcw said:
What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.

This sounds like a homework problem, so the first thing I'll suggest
is that you figure out exactly what it means for two of those
intervals to overlap. That should let you write a simple program that
gets the right answer, but that can run slowly if the number of lists
gets large. The next thing to do after that is figure out how to
speed it up, if necessary. But do the first part first.

Arnaud Delobelle · Jan 31, 2008

Hi,

I have a list of numbers each with a +/- margin of error. I need to
identify which ones overlab each other.

For example:
55 +/- 3
20 +/- 2
17 +/- 4
60 +/- 3

#base, max, min
list = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]

In this example the range of list[0] overlaps the range of list[3] AND
list[1] overlaps list[2]

What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.

This is definitely the best way:

=======================
lst = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]

from itertools import chain

def overlaps(lst):
bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))
inside = {}
for x, i in sorted(bounds):
if inside.pop(i, None) is None:
for j, y in inside.iteritems():
if y != x: yield i, j
inside = x

==============================[(1, 2), (3, 0)]

attn.steven.kuo · Jan 31, 2008

Hi,

I have a list of numbers each with a +/- margin of error. I need to
identify which ones overlab each other.

For example:
55 +/- 3
20 +/- 2
17 +/- 4
60 +/- 3

#base, max, min
list = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]

In this example the range of list[0] overlaps the range of list[3] AND
list[1] overlaps list[2]

What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.

One way would be to use sets and check for intersection:

for idx, s in enumerate(mysets):
for next_idx, next_s in enumerate(mysets[idx+1:]):
if s.intersection(next_s):
print "mylist[%d] and mylist[%d] intersect" % (
idx, idx + next_idx + 1 )

attn.steven.kuo · Jan 31, 2008

One way would be to use sets and check for intersection:

for idx, s in enumerate(mysets):
for next_idx, next_s in enumerate(mysets[idx+1:]):
if s.intersection(next_s):
print "mylist[%d] and mylist[%d] intersect" % (
idx, idx + next_idx + 1 )

Um, that would have been more helpful if I hadn't forgotten
to preface that with:

mylist = [(55, 58, 52), (20, 22, 18), (17, 21, 13), (60, 63, 57),]
mysets = [set(range(x[2],x[1])) for x in mylist]

Paul Rubin · Jan 31, 2008

[email protected] said:
mysets = [set(range(x[2],x[1])) for x in mylist]

This is pretty horrible, each set can be arbitrarily large,
i.e. if x[2] and x[1] are 0 and 1000000, you get a set with
a million elements.

attn.steven.kuo · Jan 31, 2008

[email protected] said:
[email protected] said:

mysets = [set(range(x[2],x[1])) for x in mylist]

Click to expand...

This is pretty horrible, each set can be arbitrarily large,
i.e. if x[2] and x[1] are 0 and 1000000, you get a set with
a million elements.

True... Any lighter-weight implementation of
sets out there? That is, one that would defer
use of resources until actually needed --
somewhat akin to the distinction between
range and xrange, and so on.

xset?

Paul Rubin · Feb 1, 2008

[email protected] said:
True... Any lighter-weight implementation of
sets out there? That is, one that would defer
use of resources until actually needed --
somewhat akin to the distinction between
range and xrange, and so on.

Don't even think of doing it that way, that solves the space problem
but leaves a speed problem. You probably want to use something like
sort the intervals and then use the bisect module to find the
intervals intersecting a given one.

George Sakkis · Feb 1, 2008

Don't even think of doing it that way, that solves the space problem
but leaves a speed problem. You probably want to use something like
sort the intervals and then use the bisect module to find the
intervals intersecting a given one.

I haven't actually used it but from the short description this package
seems to fit the bill: http://pypi.python.org/pypi/interval/1.0.0

George

Thomas Pani · Feb 1, 2008

Arnaud said:
This is definitely the best way:

from itertools import chain

def overlaps(lst):
bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))
inside = {}
for x, i in sorted(bounds):
if inside.pop(i, None) is None:
for j, y in inside.iteritems():
if y != x: yield i, j
inside = x

Why not just

def overlaps(lst):
for i,x in enumerate(lst):
for j,y in enumerate(lst):
if i != j and x[1] > y[2] > x[2]:
yield i,j

It's still n^2. Or am I missing something?

Cheers,
thomas

Arnaud Delobelle · Feb 1, 2008

Arnaud said:
Arnaud said:

This is definitely the best way:

Click to expand...

from itertools import chain

Click to expand...

def overlaps(lst):
bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))
inside = {}
for x, i in sorted(bounds):
if inside.pop(i, None) is None:
for j, y in inside.iteritems():
if y != x: yield i, j
inside = x

Click to expand...

Why not just

def overlaps(lst):
for i,x in enumerate(lst):
for j,y in enumerate(lst):
if i != j and x[1] > y[2] > x[2]:
yield i,j

It's still n^2. Or am I missing something?

Yours is O(n^2) and \Omega(n^2). I think mine is O(max(c, nlogn))
(assuming constant time access for dictionaries, but 'inside' could be
replaced with a list) where c is the number of overlaps. I'll try to
post a proof later (if it's true!). So if we are in a situation where
overlaps are rare, it is in practice O(nlogn).

PS: the way I build 'bounds' is awkward.

Neil Cerutti · Feb 1, 2008

Yours is O(n^2) and \Omega(n^2). I think mine is O(max(c, nlogn))
(assuming constant time access for dictionaries, but 'inside' could be
replaced with a list) where c is the number of overlaps. I'll try to
post a proof later (if it's true!). So if we are in a situation where
overlaps are rare, it is in practice O(nlogn).

Here's another contender, basically the same as yours, but spelled
without iterators.

def overlaps(eranges):
"""Determine, from a list of numbers and a +/- range of uncertainty, which
number ranges overlap each other. Return the overlapping range indexes as
a list of overlapping pairs.
[(0, 3), (1, 2)]
"""
d = [(r[0] - r[1], r[0] + r[1], i) for i, r in enumerate(eranges)]
d.sort()
rv = []
x = 0
while x < len(d):
minx, maxx, ix = d[x]
y = x+1
while y < len(d):
miny, maxy, iy = d[y]
if miny < maxx:
rv.append((ix, iy) if ix < iy else (iy, ix))
else:
break
y += 1
x += 1
return rv

Arnaud Delobelle · Feb 1, 2008

Yours is O(n^2) and \Omega(n^2). I think mine is O(max(c, nlogn))
(assuming constant time access for dictionaries, but 'inside' could be
replaced with a list) where c is the number of overlaps. I'll try to
post a proof later (if it's true!). So if we are in a situation where
overlaps are rare, it is in practice O(nlogn).

I have slightly modified overlaps() in order to make the analysis
easier (I have made overlaps() consider all intervals open). Here is
the new version:

def open_interval(i, (a, b)):
return (a, 1, i), (b, 0, i)

def overlaps(lst, interval=open_interval):
bounds = chain(*starmap(interval, enumerate(lst))) # 1
inside = {} # 2
for x, _, i in sorted(bounds): # 3
if inside.pop(i, None) is None: # 4
for j, y in inside.iteritems(): yield i, j # 5
inside = x # 6

'Detailed' analysis of running overlaps(lst) follows, where

* lst is of length n
* the total number of overlaps in m

(assuming dict.__getitem__ and dic.__setitem__ are done in constant
time [1])

1. is a simple loop over lst, takes An
2. is constant (K)
3. The sorted(...) function will take Bnlogn
3,4. This loop is iterated 2n times, with the if condition it will
take Cn
5. This loop is iterated once for each overlap exacly. So it will take
Dm
6. This statement is executed n times, will take En

Altogether the time taken is:

K + (A+B+E)n + Bnlogn + Dm

So if m is dominated by nlogn, the algorithm is O(nlogn). Otherwise
one cannot hope for better thant O(m)!

--
Arnaud

[1] Otherwise one could use a combination of an array and a doubly
linked list to achieve constant time access, deletion and updating of
the 'inside' structure. Anyway even if it was log(n) the overall
complexity would be the same!

Arnaud Delobelle · Feb 1, 2008

Yours is O(n^2) and \Omega(n^2). I think mine is O(max(c, nlogn))
(assuming constant time access for dictionaries, but 'inside' could be
replaced with a list) where c is the number of overlaps. I'll try to
post a proof later (if it's true!). So if we are in a situation where
overlaps are rare, it is in practice O(nlogn).

Click to expand...

Here's another contender, basically the same as yours, but spelled
without iterators.

def overlaps(eranges):
"""Determine, from a list of numbers and a +/- range of uncertainty, which
number ranges overlap each other. Return the overlapping range indexes as
a list of overlapping pairs.

>>> sorted(overlaps(((55, 3), (20, 2), (17, 4), (60, 3))))
[(0, 3), (1, 2)]
"""
d = [(r[0] - r[1], r[0] + r[1], i) for i, r in enumerate(eranges)]
d.sort()
rv = []
x = 0
while x < len(d):
minx, maxx, ix = d[x]
y = x+1
while y < len(d):
miny, maxy, iy = d[y]
if miny < maxx:
rv.append((ix, iy) if ix < iy else (iy, ix))
else:
break
y += 1
x += 1
return rv

The total number of iterations is 1+2+...+n = n(n+1)/2 (when n is the
number of intervals) so this has quadratic behaviour regardless of
input. See my other post for a 'detailed' analysis of my solution.

Karthik Gurusamy · Feb 1, 2008

Hi,

I have a list of numbers each with a +/- margin of error. I need to
identify which ones overlab each other.

For example:
55 +/- 3
20 +/- 2
17 +/- 4
60 +/- 3

#base, max, min
list = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]

In this example the range of list[0] overlaps the range of list[3] AND
list[1] overlaps list[2]

What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.

Note you just need the left-end and right-end of each interval. Mean
is redundant information. Once you sort the interval, you can just go
from left to right, retaining only necessary information.

Below method is O(n log n) + O (nk) where k is the average overlaps
per interval.
On most average case, first term dominates and hence its O(n log n);
worst case is ofcourse O(n^2) (a simple case is all n intervals
overlapping with each other)

def strip_sort (a, b):
if a[0] < b[0]:
return -1
if a[0] > b[0]:
return 1
# To allow overlaps on a point, return L first then R
# If overlap on a point must not be allowed, return 1 below
if a[0] == 'L': return -1
return 0

def overlaps (strips_given):
# split each interval into two items. basically decorate with 'L'
for left-end of the interval, 'R' for right end of the interval
s2 = [((s[0], 'L', i) , (s[1], 'R', i)) for i,s in
enumerate(strips_given)]
strips = []
for s in s2: # flatten out the tuples
strips.append(s[0])
strips.append(s[1])

clique = set() # set of nodes where each overlap with everyone
else in the set
edges = [] # we are constructing a graph on N nodes where edge
i,j implies i and j overlap
# below is O(N log N) where is N is number of intervals
strips.sort(cmp=strip_sort)
for s in strips:
node = s[2]
if s[1] == 'L':
clique.add(node)
if s[1] == 'R':
# below is O(k) where k is clique size (overlaps per
interval)
new_edges = [(node, i) for i in clique if i != node]
edges += new_edges
clique.remove(node)
return edges

def main():
lst = [(52, 58), (18, 22), (13, 21), (57, 63)]
print overlaps(lst)

Output:
[(2, 1), (0, 3)]

Karthik

Arnaud Delobelle · Feb 1, 2008

On Feb 1 said:
def strip_sort (a, b):
if a[0] < b[0]:
return -1
if a[0] > b[0]:
return 1
if a[0] == 'L': return -1
return 0

def overlaps (strips_given):
s2 = [((s[0], 'L', i) , (s[1], 'R', i)) for i,s in
enumerate(strips_given)]
strips = []
for s in s2:
strips.append(s[0])
strips.append(s[1])
clique = set()
edges = []
strips.sort(cmp=strip_sort)
for s in strips:
node = s[2]
if s[1] == 'L':
clique.add(node)
if s[1] == 'R':
new_edges = [(node, i) for i in clique if i != node]
edges += new_edges
clique.remove(node)
return edges

Interesting. This is a long version of the algorithm I posted
earlier.

Neil Cerutti · Feb 1, 2008

Here's another contender, basically the same as yours, but spelled
without iterators.

def overlaps(eranges):
"""Determine, from a list of numbers and a +/- range of uncertainty, which
number ranges overlap each other. Return the overlapping range indexes as
a list of overlapping pairs.

sorted(overlaps(((55, 3), (20, 2), (17, 4), (60, 3))))

Click to expand...

[(0, 3), (1, 2)]
"""
d = [(r[0] - r[1], r[0] + r[1], i) for i, r in enumerate(eranges)]
d.sort()
rv = []
x = 0
while x < len(d):
minx, maxx, ix = d[x]
y = x+1
while y < len(d):
miny, maxy, iy = d[y]
if miny < maxx:
rv.append((ix, iy) if ix < iy else (iy, ix))
else:
break
y += 1
x += 1
return rv

Click to expand...

The total number of iterations is 1+2+...+n = n(n+1)/2 (when n is the
number of intervals) so this has quadratic behaviour regardless of
input. See my other post for a 'detailed' analysis of my solution.

But it breaks out of the inner loop as soon as ranges on the right
cannot overlap. The loop is O(N) for all input if I define N as
"number of overlaps", a pretty reasonable definition--it's madness to
expect to report N overlaps with less than N complexity.

Unless I'm making a silly mistake. Again.

Arnaud Delobelle · Feb 1, 2008

The total number of iterations is 1+2+...+n = n(n+1)/2 (when n is the
number of intervals) so this has quadratic behaviour regardless of
input.[...]

Click to expand...

But it breaks out of the inner loop as soon as ranges on the right
cannot overlap. The loop is O(N) for all input if I define N as
"number of overlaps", a pretty reasonable definition--it's madness to
expect to report N overlaps with less than N complexity.

Unless I'm making a silly mistake. Again.

No you're not mistaken, I am. I didn't see the 'else: break' bit
which of course makes all the difference. Sorry...

On the point of complexity though, it is only O(N) is N dominates
nlogn (n being the length of input).

Boris Borcic · Feb 8, 2008

Arnaud Delobelle wrote:
(...)

from itertools import chain

def overlaps(lst):
bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))

imho, this is a uselessly painful equivalent of

bounds = ((x[k],i) for i,x in enumerate(lst) for k in (1,2))

Cheers, BB

Boris Borcic · Feb 8, 2008

Boris said:
Arnaud Delobelle wrote:
(...)

from itertools import chain

def overlaps(lst):
bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))

Click to expand...

imho, this is a uselessly painful equivalent of

bounds = ((x[k],i) for i,x in enumerate(lst) for k in (1,2))

even more readable :

bounds = ((bound(x),i) for bound in (min,max) for i,x in enumerate(lst))

How to try a range of hex values in C# code ?	0	Nov 19, 2022
Minimum Total Difficulty	0	Nov 15, 2023
Taskcproblem calendar	4	Aug 31, 2023
Outputting signal values to terminal Within Character Array	0	Dec 10, 2021
How to use ufixed when it involves multiplication a number of times?(VHDL question)	0	Aug 22, 2016
How can I get my menu inside of a menu to function properly?	1	Jan 19, 2023
Google gauge chart	2	Jul 16, 2023
Strange behavior for a 2D list	1	Apr 18, 2013

How to identify which numbers in a list are within each others' range

erikcw

Paul Rubin

Arnaud Delobelle

attn.steven.kuo

attn.steven.kuo

Paul Rubin

attn.steven.kuo

Paul Rubin

George Sakkis

Thomas Pani

Arnaud Delobelle

Neil Cerutti

Arnaud Delobelle

Arnaud Delobelle

Karthik Gurusamy

Arnaud Delobelle

Neil Cerutti

Arnaud Delobelle

Boris Borcic

Boris Borcic

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads