Is there such an idiom?

Discussion in 'Python' started by Per, Mar 20, 2006.

1. PerGuest

"""Use dictionaries for searching, not lists. To find items in common
between two lists, make the first into a dictionary and then look for
items in the second in it. Searching a list for an item is linear-time,
while searching a dict for an item is constant time. This can often let
you reduce search time from quadratic to linear."""

Is this correct?
s = [1,2,3,4,5...]
t = [4,5,6,,8,...]
how to find whether there is/are common item(s) between two list in
linear-time?
how to find the number of common items between two list in linear-time?

Per, Mar 20, 2006

2. Rene PijlmanGuest

Per:
>how to find whether there is/are common item(s) between two list
>in linear-time?

To find items in common between two lists, make the first into a
dictionary and then look for items in the second in it.

--

Wat wil jij leren? http://www.leren.nl

Rene Pijlman, Mar 20, 2006

Per wrote:
>
> """Use dictionaries for searching, not lists. To find items in common
> between two lists, make the first into a dictionary and then look for
> items in the second in it. Searching a list for an item is linear-time,
> while searching a dict for an item is constant time. This can often let
> you reduce search time from quadratic to linear."""
>
> Is this correct?
> s = [1,2,3,4,5...]
> t = [4,5,6,,8,...]
> how to find whether there is/are common item(s) between two list in
> linear-time?
> how to find the number of common items between two list in linear-time?

I'm not sure if it works in linear time, but if there are no duplicates
in each list, sets would be the easiest way to do these with.

>>> s = set([1,2,3,4,5,6])
>>> t = set([4,5,6,7,8,9])
>>> s.intersection(t)

set([4, 5, 6])
>>> len(s.intersection(t))

3

Cheers,
Ron

Ron Adam, Mar 20, 2006
4. Irmen de JongGuest

Per wrote:

> how to find the number of common items between two list in linear-time?
>

Not really sure about linear-time, but you could try the following:

>>> a=[1,2,3,4]
>>> b=[3,4,5,6]
>>> set(a) & set(b)

set([3, 4])

--Irmen

Irmen de Jong, Mar 20, 2006
5. PerGuest

Thanks Ron,
surely set is the simplest way to understand the question, to see
whether there is a non-empty intersection. But I did the following
thing in a silly way, still not sure whether it is going to be linear
time.
def foo():
l = [...]
s = [...]
dic = {}
for i in l:
dic = 0
k=0
while k <len(s):
if s[k] in dic:
return True
else: pass
k+=1
if k == len(s):
return False

I am still a rookie, and partly just migrated from Haskell...
I am not clear about how making one of the lists a dictionary is

Per, Mar 20, 2006
6. Michael SpencerGuest

Per wrote:
> Thanks Ron,
> surely set is the simplest way to understand the question, to see
> whether there is a non-empty intersection. But I did the following
> thing in a silly way, still not sure whether it is going to be linear
> time.
> def foo():
> l = [...]
> s = [...]
> dic = {}
> for i in l:
> dic = 0
> k=0
> while k <len(s):
> if s[k] in dic:
> return True
> else: pass
> k+=1
> if k == len(s):
> return False
>
>
> I am still a rookie, and partly just migrated from Haskell...
> I am not clear about how making one of the lists a dictionary is
>

The dict-based approach is to do something like this:

>>> ls1 = [1,3,5,7,9]
>>> ls2 = [3,5,6,7,10]
>>> d1 = dict.fromkeys(ls1)
>>> d1

{1: None, 3: None, 9: None, 5: None, 7: None}

Note the values, are irrelevant - we care only about the keys
Now, membership testing takes linear time:

>>> [item for item in ls2 if item in d1]

[3, 5, 7]
>>>

But, as you say, this approach is unnecessary, given sets.

HTH
Michael

Michael Spencer, Mar 20, 2006

Per wrote:
> Thanks Ron,
> surely set is the simplest way to understand the question, to see
> whether there is a non-empty intersection. But I did the following
> thing in a silly way, still not sure whether it is going to be linear
> time.
> def foo():
> l = [...]
> s = [...]
> dic = {}
> for i in l:
> dic = 0
> k=0
> while k <len(s):
> if s[k] in dic:
> return True
> else: pass
> k+=1
> if k == len(s):
> return False
>
>
> I am still a rookie, and partly just migrated from Haskell...
> I am not clear about how making one of the lists a dictionary is

Lets compare them by checking different length with no overlap which is
the worst case situation.

## is_interstion comparison

def ii_set(a, b):
return len(set(a).intersection(b))>0

def ii_dict(l, s):
dic = {}
for i in l:
dic = 0
for i in s:
if i in dic:
return True
return False

def ii_dict2(l, s):
dic = dict.fromkeys(l)
for i in s:
if i in dic:
return True
return False

import time

foos = [ii_set, ii_dict, ii_dict2]
lsts = [10, 100, 1000, 10000, 100000, 1000000]

for f in foos:
for lst in lsts:
a = range(lst)
b = range(lst, lst*2)
start = time.clock()
assert f(a,b) == False
t = time.clock()-start
print f.__name__, lst, t
print

ii_set 10 1.25714301678e-005
ii_set 100 2.45841301059e-005
ii_set 1000 0.000162031766607
ii_set 10000 0.0020256764477
ii_set 100000 0.0238681173166
ii_set 1000000 0.23067484834

ii_dict 10 2.31873045317e-005
ii_dict 100 6.73269926764e-005
ii_dict 1000 0.000442234976792
ii_dict 10000 0.0047891561637
ii_dict 100000 0.0502407428877
ii_dict 1000000 0.506360165887

ii_dict2 10 2.70984161395e-005
ii_dict2 100 5.55936578532e-005
ii_dict2 1000 0.000317358770458
ii_dict2 10000 0.00366638776716
ii_dict2 100000 0.0394256811969
ii_dict2 1000000 0.39200764343

The sets solution seems to be the fastest. And it is usually is when
you can move more of your task into the built-in methods which are
programmed in C.

From what I recently read here (not sure where) in another thread, in
python 2.3 sets were implemented as a python module using dictionaries,
and in 2.4 it was written in C code based on the dictionary C code.

Cheers,
Ron

Ron Adam, Mar 20, 2006
8. Alex MartelliGuest

Ron Adam <> wrote:
...
> From what I recently read here (not sure where) in another thread, in
> python 2.3 sets were implemented as a python module using dictionaries,
> and in 2.4 it was written in C code based on the dictionary C code.

....and in 2.5 they're due to be further optimized, using their own
specialized code rather than piggybacking on dictionaries'.

Alex

Alex Martelli, Mar 20, 2006
9. Bruce CropleyGuest

Per wrote:
[snip]

> Is this correct?
> s = [1,2,3,4,5...]
> t = [4,5,6,,8,...]
> how to find whether there is/are common item(s) between two list in
> linear-time?
> how to find the number of common items between two list in linear-time?

A common technique if both lists are sorted is to iterate through both
lists in parallel, advancing the smaller iterator each time. This is
the merge algorithm that is used by a merge sort, and it is O(s+t).

For two lists, the algorithm would go something like:

while not finished:
if s_iter_val < t_iter_val:
move s_iter on
elif s_iter_val > t_iter_val:
move t_iter on
else:
include / yield the value
move both iters on

For more on the standard merge algorithm, see:
http://en.wikipedia.org/wiki/Merge_algorithm

For an intersection merge, I hacked the recursive solution from
there...

def merge(a, b):
if len(a) == 0: return []
if len(b) == 0: return []
if a[0] < b[0]: return merge(a[1:], b)
elif a[0] > b[0]: return merge(a, b[1:])
else: return a[0:1] + merge(a[1:], b[1:])

#-------------------------8<-------------------------
import unittest

class TestMerge(unittest.TestCase):
def test_merge(self):
self.assertEquals(merge([1,2],[]), [])
self.assertEquals(merge([],[1,2]), [])
self.assertEquals(merge([1,3,5],[2,4,6]), [])
self.assertEquals(merge([1,2,3],[3,4,5]), [3])
self.assertEquals(merge([1,2,3,5,6,7],[3,4,5,7,8]), [3,5,7])

if __name__ == "__main__":
unittest.main()

HTH,
Bruce

Bruce Cropley, Mar 20, 2006