fast pythonic algorithm question

Discussion in 'Python' started by =?iso-8859-1?B?R3V5b24gTW9y6WU=?=, Aug 1, 2006.

  1. hi all,

    i have a big list of tuples like this:

    [ (host, port, protocol, startime, endtime), .. ] etc

    now i have another big(ger) list of tuples like this:

    [(src_host, src_port, dest_src, dest_port, protocol, time), ... ] etc

    now i need to find all the items in the second list where either
    src_host/src_port or dest_host/dest_port matches, protocol matches and
    time is between starttime and end time.

    After trynig some stuff out i actually found dictionary lookup pretty
    fast. Putting the first list in a dict like this:

    dict[(host,port,protocol)] = (starttime, endtime)

    then:

    if (((src_host,src_port, protocol) in dict or (dest_host, dest_port,
    protocol) in dict) and starttime < time < endtime):
    print "we have a winner"


    I have also been looking at the bisect module, but couldnt figure out
    if it was what I was looking for...

    any ideas?


    regards,

    Guyon Moree
    http;//gumuz.looze.net/
     
    =?iso-8859-1?B?R3V5b24gTW9y6WU=?=, Aug 1, 2006
    #1
    1. Advertising

  2. =?iso-8859-1?B?R3V5b24gTW9y6WU=?=

    Paul Rubin Guest

    "Guyon Morée" <> writes:
    > if (((src_host,src_port, protocol) in dict or (dest_host, dest_port,
    > protocol) in dict) and starttime < time < endtime):
    > print "we have a winner"


    If you have enough memory to do it that way, what's the problem?
     
    Paul Rubin, Aug 1, 2006
    #2
    1. Advertising

  3. Memory is no problem. It just needs to be as fast as possible, if
    that's what this is, fine.

    If not, I'd like to find out what is :)


    thanx,

    Guyon Moree
    http://gumuz.looze.net


    Paul Rubin schreef:

    > "Guyon Morée" <> writes:
    > > if (((src_host,src_port, protocol) in dict or (dest_host, dest_port,
    > > protocol) in dict) and starttime < time < endtime):
    > > print "we have a winner"

    >
    > If you have enough memory to do it that way, what's the problem?
     
    =?iso-8859-1?B?R3V5b24gTW9y6WU=?=, Aug 1, 2006
    #3
  4. Guyon Morée wrote:

    > Memory is no problem. It just needs to be as fast as possible, if
    > that's what this is, fine.
    >
    > If not, I'd like to find out what is :)


    I'd say it is as fast as it can get - using hashing for lookups is O(n) in
    most cases, where bisection or other order-based lookups have O(log n)

    Additionally, dict lookups are fully written in C.

    Diez
     
    Diez B. Roggisch, Aug 1, 2006
    #4
  5. =?iso-8859-1?B?R3V5b24gTW9y6WU=?=

    David Reed Guest

    On Aug 1, 2006, at 11:13 AM, Diez B. Roggisch wrote:

    > Guyon Morée wrote:
    >
    >> Memory is no problem. It just needs to be as fast as possible, if
    >> that's what this is, fine.
    >>
    >> If not, I'd like to find out what is :)

    >
    > I'd say it is as fast as it can get - using hashing for lookups is O
    > (n) in



    I know you meant O(1) for hash lookups, but just in case anyone is
    confused, I figured I'd correct this.


    > most cases, where bisection or other order-based lookups have O(log n)
    >
    > Additionally, dict lookups are fully written in C.
    >
    > Diez


    Dave
     
    David Reed, Aug 1, 2006
    #5
  6. >> I'd say it is as fast as it can get - using hashing for lookups is O
    >> (n) in

    >
    >
    > I know you meant O(1) for hash lookups, but just in case anyone is
    > confused, I figured I'd correct this.


    Ooops. Thanks.

    Diez
     
    Diez B. Roggisch, Aug 1, 2006
    #6
  7. =?iso-8859-1?B?R3V5b24gTW9y6WU=?=

    Guest

    Guyon Morée wrote:
    > i have a big list of tuples like this:
    >
    > [ (host, port, protocol, startime, endtime), .. ] etc
    >
    > now i have another big(ger) list of tuples like this:
    >
    > [(src_host, src_port, dest_src, dest_port, protocol, time), ... ] etc
    >
    > now i need to find all the items in the second list where either
    > src_host/src_port or dest_host/dest_port matches, protocol matches and
    > time is between starttime and end time.
    >
    > After trynig some stuff out i actually found dictionary lookup pretty
    > fast. Putting the first list in a dict like this:
    >
    > dict[(host,port,protocol)] = (starttime, endtime)


    That only works if each (host,port,protocol) can appear with only
    one (starttime, endtime) in your first big list. Do the variable
    names mean what they look like? There's nothing unusual about
    connecting to the same host and port with the same protocol, at
    multiple times.

    You might want your dict to associate (host,port,protocol) with a
    list, or a set, of tuples of the form (starttime, endtime). If the
    lists can be long, there are fancier methods for keeping the set
    of intervals and searching them for contained times or overlapping
    intervals. Google up "interval tree" for more.


    --
    --Bryan
     
    , Aug 1, 2006
    #7
  8. Brian you are right, but in my case (host, port, protocol) is unique.


    schreef:

    > Guyon Morée wrote:
    > > i have a big list of tuples like this:
    > >
    > > [ (host, port, protocol, startime, endtime), .. ] etc
    > >
    > > now i have another big(ger) list of tuples like this:
    > >
    > > [(src_host, src_port, dest_src, dest_port, protocol, time), ... ] etc
    > >
    > > now i need to find all the items in the second list where either
    > > src_host/src_port or dest_host/dest_port matches, protocol matches and
    > > time is between starttime and end time.
    > >
    > > After trynig some stuff out i actually found dictionary lookup pretty
    > > fast. Putting the first list in a dict like this:
    > >
    > > dict[(host,port,protocol)] = (starttime, endtime)

    >
    > That only works if each (host,port,protocol) can appear with only
    > one (starttime, endtime) in your first big list. Do the variable
    > names mean what they look like? There's nothing unusual about
    > connecting to the same host and port with the same protocol, at
    > multiple times.
    >
    > You might want your dict to associate (host,port,protocol) with a
    > list, or a set, of tuples of the form (starttime, endtime). If the
    > lists can be long, there are fancier methods for keeping the set
    > of intervals and searching them for contained times or overlapping
    > intervals. Google up "interval tree" for more.
    >
    >
    > --
    > --Bryan
     
    =?iso-8859-1?B?R3V5b24gTW9y6WU=?=, Aug 1, 2006
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    0
    Views:
    701
  2. Michele Simionato

    Python is darn fast (was: How fast is Python)

    Michele Simionato, Aug 23, 2003, in forum: Python
    Replies:
    13
    Views:
    591
  3. Carl J. Van Arsdall
    Replies:
    4
    Views:
    519
    Bruno Desthuilliers
    Feb 7, 2006
  4. Willi Richert

    Pythonic A*-Algorithm

    Willi Richert, Jan 11, 2007, in forum: Python
    Replies:
    0
    Views:
    342
    Willi Richert
    Jan 11, 2007
  5. Juha Nieminen
    Replies:
    22
    Views:
    1,074
    Kai-Uwe Bux
    Oct 12, 2007
Loading...

Share This Page