fast pythonic algorithm question

  • Thread starter =?iso-8859-1?B?R3V5b24gTW9y6WU=?=
  • Start date
?

=?iso-8859-1?B?R3V5b24gTW9y6WU=?=

hi all,

i have a big list of tuples like this:

[ (host, port, protocol, startime, endtime), .. ] etc

now i have another big(ger) list of tuples like this:

[(src_host, src_port, dest_src, dest_port, protocol, time), ... ] etc

now i need to find all the items in the second list where either
src_host/src_port or dest_host/dest_port matches, protocol matches and
time is between starttime and end time.

After trynig some stuff out i actually found dictionary lookup pretty
fast. Putting the first list in a dict like this:

dict[(host,port,protocol)] = (starttime, endtime)

then:

if (((src_host,src_port, protocol) in dict or (dest_host, dest_port,
protocol) in dict) and starttime < time < endtime):
print "we have a winner"


I have also been looking at the bisect module, but couldnt figure out
if it was what I was looking for...

any ideas?


regards,

Guyon Moree
http;//gumuz.looze.net/
 
P

Paul Rubin

Guyon Morée said:
if (((src_host,src_port, protocol) in dict or (dest_host, dest_port,
protocol) in dict) and starttime < time < endtime):
print "we have a winner"

If you have enough memory to do it that way, what's the problem?
 
?

=?iso-8859-1?B?R3V5b24gTW9y6WU=?=

Memory is no problem. It just needs to be as fast as possible, if
that's what this is, fine.

If not, I'd like to find out what is :)


thanx,

Guyon Moree
http://gumuz.looze.net


Paul Rubin schreef:
 
D

Diez B. Roggisch

Guyon said:
Memory is no problem. It just needs to be as fast as possible, if
that's what this is, fine.

If not, I'd like to find out what is :)

I'd say it is as fast as it can get - using hashing for lookups is O(n) in
most cases, where bisection or other order-based lookups have O(log n)

Additionally, dict lookups are fully written in C.

Diez
 
D

David Reed

I'd say it is as fast as it can get - using hashing for lookups is O
(n) in


I know you meant O(1) for hash lookups, but just in case anyone is
confused, I figured I'd correct this.

most cases, where bisection or other order-based lookups have O(log n)

Additionally, dict lookups are fully written in C.

Diez

Dave
 
D

Diez B. Roggisch

I'd say it is as fast as it can get - using hashing for lookups is O
I know you meant O(1) for hash lookups, but just in case anyone is
confused, I figured I'd correct this.

Ooops. Thanks.

Diez
 
B

bryanjugglercryptographer

Guyon said:
i have a big list of tuples like this:

[ (host, port, protocol, startime, endtime), .. ] etc

now i have another big(ger) list of tuples like this:

[(src_host, src_port, dest_src, dest_port, protocol, time), ... ] etc

now i need to find all the items in the second list where either
src_host/src_port or dest_host/dest_port matches, protocol matches and
time is between starttime and end time.

After trynig some stuff out i actually found dictionary lookup pretty
fast. Putting the first list in a dict like this:

dict[(host,port,protocol)] = (starttime, endtime)

That only works if each (host,port,protocol) can appear with only
one (starttime, endtime) in your first big list. Do the variable
names mean what they look like? There's nothing unusual about
connecting to the same host and port with the same protocol, at
multiple times.

You might want your dict to associate (host,port,protocol) with a
list, or a set, of tuples of the form (starttime, endtime). If the
lists can be long, there are fancier methods for keeping the set
of intervals and searching them for contained times or overlapping
intervals. Google up "interval tree" for more.
 
?

=?iso-8859-1?B?R3V5b24gTW9y6WU=?=

Brian you are right, but in my case (host, port, protocol) is unique.


(e-mail address removed) schreef:
Guyon said:
i have a big list of tuples like this:

[ (host, port, protocol, startime, endtime), .. ] etc

now i have another big(ger) list of tuples like this:

[(src_host, src_port, dest_src, dest_port, protocol, time), ... ] etc

now i need to find all the items in the second list where either
src_host/src_port or dest_host/dest_port matches, protocol matches and
time is between starttime and end time.

After trynig some stuff out i actually found dictionary lookup pretty
fast. Putting the first list in a dict like this:

dict[(host,port,protocol)] = (starttime, endtime)

That only works if each (host,port,protocol) can appear with only
one (starttime, endtime) in your first big list. Do the variable
names mean what they look like? There's nothing unusual about
connecting to the same host and port with the same protocol, at
multiple times.

You might want your dict to associate (host,port,protocol) with a
list, or a set, of tuples of the form (starttime, endtime). If the
lists can be long, there are fancier methods for keeping the set
of intervals and searching them for contained times or overlapping
intervals. Google up "interval tree" for more.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,051
Latest member
CarleyMcCr

Latest Threads

Top