finding all sublists of list2 that are identical to list1

K

Klaus Neuner

Hello,

the function given below returns all indexes of list2 where a sublist
of list2 that is identical to list1 begins.

As I will need this function quite often, I would like to know if more
experienced programmers would agree with the way I defined the
function:

- Is there a more efficient way to do it? (Apart from building
automata.)
- Is there a more elegant/Pythonic way to write the function?

Klaus


def sublist(list1, list2):
if list1 == list2:
return [0]
elif len(list1) > len(list2):
return []
else:
result = []
shift = 0
i = 0
while shift <= len(list2)-len(list1):
if list1 == list2[shift+i]:
if i == len(list1)-1:
result.append(shift)
i = 0
shift += 1
else:
i += 1
else:
i = 0
shift += 1
return result
 
D

Diez B. Roggisch

- Is there a more efficient way to do it? (Apart from building
automata.)

Look into the shift-and algorithm for string-search- that should be much
faster. Its not trivial so I can't reproduce it right here out of my head,
but google helps.
- Is there a more elegant/Pythonic way to write the function?

Yes. You should work much more with slicing, and compare whole sublists like
this:

-------snip-----------
hs = [0, 1, 5, 3, 2, 4, 5, 3, 2]
ne = [5, 3, 2]

def sublist(needle, haystack):
hs_size = len(haystack)
n_size = len(needle)

if n_size > hs_size:
return []
elif hs_size == n_size and needle == haystack:
return []
else:
res = []
offset = 0
sublist = haystack[offset:eek:ffset + n_size]
while len(sublist) == n_size:
if sublist == needle:
res.append(offset)
offset += 1
sublist = haystack[offset:eek:ffset + n_size]
return res

if __name__ == "__main__":
print sublist(ne, hs)

-------snip-----------
 
D

Dang Griffith

Hello,

the function given below returns all indexes of list2 where a sublist
of list2 that is identical to list1 begins.

As I will need this function quite often, I would like to know if more
experienced programmers would agree with the way I defined the
function:

- Is there a more efficient way to do it? (Apart from building
automata.)
- Is there a more elegant/Pythonic way to write the function?

Klaus


def sublist(list1, list2):
if list1 == list2:
return [0]
elif len(list1) > len(list2):
return []
else:
result = []
shift = 0
i = 0
while shift <= len(list2)-len(list1):
if list1 == list2[shift+i]:
if i == len(list1)-1:
result.append(shift)
i = 0
shift += 1
else:
i += 1
else:
i = 0
shift += 1
return result


I don't know about it being Pythonic, but list comprehensions are
always fun. This one's 10-20% faster (based on timeit results) than
your code:

def sublist(list1, list2):
len1 = len(list1)
rnge = xrange(len(list2))
return [i for i in rnge if list2[i:len1+i] == list1]

If you need this for a specific purpose, you might be able to
transform the data and try a different technique which would be
executed internally at the "C" level. For example, if your list is
always integers, you might be able to join them into a tab-delimited
string and scan the string with a regular expression (unless you
consider that building automata).

The list doesn't have to be integers for the regex idea to be
used--you just have to be sure the values in list1 contain no regex
metacharacters, and that list1 and list2 don't contain your delimiter
(tab in this example). If list1 contains regex characters, you need
to escape them with a backslash first.

It's possible that if you know something about the size of list1, you
could use this information also, but without more details, I'm
shooting in the dark.

--dang
 
D

Diez B. Roggisch

def sublist(list1, list2):
len1 = len(list1)
rnge = xrange(len(list2))
return [i for i in rnge if list2[i:len1+i] == list1]

Nice one - looks like I optimized too much.
If you need this for a specific purpose, you might be able to
transform the data and try a different technique which would be
executed internally at the "C" level. For example, if your list is
always integers, you might be able to join them into a tab-delimited
string and scan the string with a regular expression (unless you
consider that building automata).

You don't gain anything by that - it would still perform the search in
O(n). But your idea makes sense when the created "strings" would then be
used in a better perforiming string searching algorithm like shift-and.
 
K

Klaus Neuner

def sublist(list1, list2):
len1 = len(list1)
rnge = xrange(len(list2))
return [i for i in rnge if list2[i:len1+i] == list1]
Thanks.

but without more details, I'm
shooting in the dark.

Actually, I want to leave open what list1 and list2 may contain. I
want to define a class Matcher, like so:

class Matcher(object):

def match(self, a, b):
if a == b:
return True
else:
return False

def all_matches(self, list1, list2):
len1 = len(list1)
rnge = xrange(len(list2))
return [i for i in rnge if self.match(list2[i:len1+i], list1)]

I use the function "match" instead of "==" here, because I want to
make subclasses that use a different concept of equivalence. One type
of equivalence
could be "equality in the first element".

class FirstElementMatcher(Matcher):

def match(self, a, b):
if a[0] == b[0]:
return True
else:
return False
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,007
Latest member
obedient dusk

Latest Threads

Top