Number set type

H

Heiko Wundram

Hi all!

I'm wondering whether there is some form of number set type implemented in
pure Python. I'm currently in the process of implementing one myself (for
an IPv4 address range type), and if there's some form of reference
implementation I'd love to see it. I've not found a reasonably complete
implementation anywhere so far.

To explain what I mean with number set: a set type that doesn't store
individual entries but as it only contains numbers can store ranges of
numbers.

Basically what it boils down to is implementing intersection reasonably
fast, because all other set operations can be reduced to intersection and
union (which is easy to implement). I have implementations for both, but
only union is fast with O(n), intersection is currently O(n^2).

But, anyway, if anybody knows of some implementation I can have a look at,
please let me know.

Thanks!

--- Heiko.
 
J

Justin Azoff

You could use IPy...
http://svn.23.nu/svn/repos/IPy/trunk/IPy.py is one location for it...

I wonder where you get O(n) and O(n^2) from... CIDR blocks are all
sequential.. All you need to store is the starting and ending address
or length. Then any set operation only has to deal with 4 numbers, and
should be literally a few lines of code with no loops.
 
H

Heiko Wundram

Justin said:
You could use IPy...
http://svn.23.nu/svn/repos/IPy/trunk/IPy.py is one location for it...

I know about IPy... But: it simply doesn't implement all I need it for.
I wonder where you get O(n) and O(n^2) from... CIDR blocks are all
sequential.. All you need to store is the starting and ending address
or length. Then any set operation only has to deal with 4 numbers, and
should be literally a few lines of code with no loops.

You make assumptions that my usage simply doesn't fill. An IP4Range must
consist of more than a single block of IP addresses (like 192.168.0.0/24
_and_ 192.168.10.0/24), that's why it's an IP4Range and not a IP4Network
(which I implement as an addr/mask pair in my IP4 abstraction).

An IP4Range can consist of several different ranges (which are of course
normalized and disjunct). I can specify what I meant by O(n) and O(n^2):

Union of two IP4Ranges is simply normalizing a concatenated list of both
IP4Range ranges. Normalizing takes O(log n)+O(n) = O(n) steps, where n is
the number of ranges in the combined IP4Range.

Intersection takes O(n^2) steps in my current implementation (which I know
is mathematically correct), where n is max(n_1,n_2) where n_1 is the number
of ranges in the first IP4Range and n_2 the number of ranges in the second
IP4Range respectively.

Intersecting two IP4Ranges can be done with fewer steps, and I think it
could be done in O(n) in the case of normalized and sorted ranges, and I
have a few ideas of myself, but I'm currently too lazy to try to prove them
correct.

Just as intersection and union can be done more efficiently, I guess that
several set operations can be done the simple way (like
self.difference(other) <=> self.union(other.inverse()), whereby inverse is
O(n) too), but still maybe there are better algorithms to do it directly.

Anyway, that's why I wanted to have a look at a ready implementation of a
number set type so that I might get a glimpse at somebody elses
implementation.

--- Heiko.
 
H

Heiko Wundram

Correction:

Heiko said:
You make assumptions that my usage simply doesn't fill. An IP4Range must

be able to
consist of more than a single block of IP addresses (like 192.168.0.0/24
_and_ 192.168.10.0/24), that's why it's an IP4Range and not a IP4Network
(which I implement as an addr/mask pair in my IP4 abstraction).

--- Heiko.
 
J

Justin Azoff

Heiko said:
Union of two IP4Ranges is simply normalizing a concatenated list of both
IP4Range ranges. Normalizing takes O(log n)+O(n) = O(n) steps, where n is
the number of ranges in the combined IP4Range.

I see now :) If the ranges are sorted, I bet you could just iterate
through both at the same time, merging intersecting ranges where
possible.
Intersection takes O(n^2) steps in my current implementation (which I know
is mathematically correct), where n is max(n_1,n_2) where n_1 is the number
of ranges in the first IP4Range and n_2 the number of ranges in the second
IP4Range respectively.

Intersecting two IP4Ranges can be done with fewer steps, and I think it
could be done in O(n) in the case of normalized and sorted ranges, and I
have a few ideas of myself, but I'm currently too lazy to try to prove them
correct.

Yes.. if they are sorted, something like this should work:

def intersection(self, other):
ret = []
ai=iter(self.ranges)
bi=iter(other.ranges)
try :
a = ai.next()
b = bi.next()
except StopIteration:
return IP4Range([])

while 1:
try :
if a.intersects(b):
ret.append(a.intersection(b))
a = ai.next()
b = bi.next()
elif a.start < b.start:
a = ai.next()
else :
b = bi.next()
except StopIteration:
break
return IP4Range(ret)
 
J

Justin Azoff

Justin said:
Yes.. if they are sorted, something like this should work:
Oops, that was almost right, but it would skip some ranges.

This should always work:

....
while 1:
try :
if a.intersects(b):
ret.append(a.intersection(b))
if a.end < b.end:
a = ai.next()
else :
b = bi.next()
elif a.start < b.start:
a = ai.next()
else :
b = bi.next()
except StopIteration:
break
return RangeList(ret)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,756
Messages
2,569,535
Members
45,007
Latest member
OrderFitnessKetoCapsules

Latest Threads

Top