testing if a list contains a sublist

Johannes · Aug 16, 2011

hi list,
what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?

for example:
l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2

my problem is the second example, which makes it impossible to work with
sets insteads of lists. But something like set.issubset for lists would
be nice.

greatz Johannes

Roy Smith · Aug 16, 2011

Johannes said:
hi list,
what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?

for example:
l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2

my problem is the second example, which makes it impossible to work with
sets insteads of lists. But something like set.issubset for lists would
be nice.

greatz Johannes

import re

def sublist(l1, l2):
s1 = ''.join(map(str, l1))
s2 = ''.join(map(str, l2))
return re.search(s1, s2)

assert sublist([1,2], [1,2,3,4,5])
assert not sublist ([1,2,2], [1,2,3,4,5])
assert not sublist([1,2,3], [1,3,5,7])

(running and ducking)

Steven D'Aprano · Aug 16, 2011

hi list,
what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?

This is not the most efficient algorithm, but for short lists it should be
plenty fast enough:

def contains(alist, sublist):
if len(sublist) == 0 or len(sublist) > len(alist):
return False
start = 0
while True:
try:
p = alist.index(sublist[0], start)
except ValueError:
return False
for i,x in enumerate(sublist):
if alist[p+i] != x:
start = p+1
break
else: # for loop exits without break
return True

ChasBrown · Aug 16, 2011

hi list,
what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?

for example:
l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2

my problem is the second example, which makes it impossible to work with
sets insteads of lists. But something like set.issubset for lists would
be nice.

greatz Johannes

My best guess:

from collections import Counter

def contains(alist, sublist):
alist = Counter(alist)
for k in sublist:
try:
alist[k] -= 1
if alist[k] < 0:
return False
except KeyError:
return False
return True

'Counter' being better understood as 'Multiset'.

If m = len(alist) and n = len(sublist), looks to be roughly O(m+n).

Cheers - Chas

ChasBrown · Aug 16, 2011

hi list,
what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?

for example:
l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2

my problem is the second example, which makes it impossible to work with
sets insteads of lists. But something like set.issubset for lists would
be nice.

greatz Johannes

My best guess:

from collections import Counter

def contains(alist, sublist):
alist = Counter(alist)
for k in sublist:
try:
alist[k] -= 1
if alist[k] < 0:
return False
except KeyError:
return False
return True

'Counter' being better understood as 'Multiset'.

If m = len(alist) and n = len(sublist), looks to be roughly O(m+n).

Cheers - Chas

ChasBrown · Aug 16, 2011

hi list,
what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?

for example:
l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2

my problem is the second example, which makes it impossible to work with
sets insteads of lists. But something like set.issubset for lists would
be nice.

greatz Johannes

My best guess:

from collections import Counter

def contains(alist, sublist):
alist = Counter(alist)
for k in sublist:
try:
alist[k] -= 1
if alist[k] < 0:
return False
except KeyError:
return False
return True

'Counter' being better understood as 'Multiset'.

If m = len(alist) and n = len(sublist), looks to be roughly O(m+n).

Cheers - Chas

Laszlo Nagy · Aug 16, 2011

hi list,
what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?

for example:
l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2

my problem is the second example, which makes it impossible to work with
sets insteads of lists. But something like set.issubset for lists would
be nice.

greatz Johannes

Click to expand...

Fastest, error-free and simplest solution is to use sets:

l1 = [1,2]
l2 = [1,2,3,4,5]
set(l1)-set(l2) set([])
set(l2)-set(l1) set([3, 4, 5])

Click to expand...

Click to expand...

Although with big lists, this is not very memory efficient. But I must
tell you, sometimes I use this method for lists with millions of
integers, and it is very fast and reliable, and memory is not a concern
for me, at least - some million integers will fit into a few MB of
memory. Read the docs about set operators for creating union, symmetric
difference etc.

Best,

Laszlo

alex23 · Aug 16, 2011

hi list,
what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?
for example:
l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2
my problem is the second example, which makes it impossible to work with
sets insteads of lists. But something like set.issubset for lists would
be nice.
greatz Johannes

Click to expand...

Click to expand...

Fastest, error-free and simplest solution is to use sets:

>>> l1 = [1,2]
>>> l2 = [1,2,3,4,5]
>>> set(l1)-set(l2)
set([])
>>> set(l2)-set(l1)
set([3, 4, 5])

Error free? Consider this stated requirement:

l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2

l1 = [1,2,2]
l2 = [1,2,3,4,5]
set(l1)-set(l2) set()
set(l2)-set(l1)

Click to expand...

Click to expand...

{3, 4, 5}

So set([1,2]) == set([1,2,2]), despite [1,2,2] not being a sublist of
the original.

It also completely ignores list order, which would make [9,8,7] a
sublist of [5,6,7,8,9].

alex23 · Aug 16, 2011

Laszlo Nagy said:
Fastest, error-free and simplest solution is to use sets:

>>> l1 = [1,2]
>>> l2 = [1,2,3,4,5]
>>> set(l1)-set(l2)
set([])
>>> set(l2)-set(l1)
set([3, 4, 5])
>>>

Error-free? Not given the stated requirements:

l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2

l1 = [1,2,2]
l2 = [1,2,3,4,5]
set(l1)-set(l2) set()
set(l2)-set(l1)

Click to expand...

Click to expand...

{3, 4, 5}

As you can see, using sets provides the exact same result for a non-
contained sub-list as it does for your valid example. The list
[1,2,2,2,2,2,2,2,2] is clearly not contained with [1,2,3], but your
code would say it is.

Similarly, [9,8,7] would appear to be a sub-list of [5,6,7,8,9],
something I'd also considered to be false in this instance.

(My apologies if this is a double-up, the original post hasn't
appeared yet)

ChasBrown · Aug 16, 2011

hi list,
what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?
for example:
l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2
my problem is the second example, which makes it impossible to work with
sets insteads of lists. But something like set.issubset for lists would
be nice.
greatz Johannes

Click to expand...

Click to expand...

Fastest, error-free and simplest solution is to use sets:

>>> l1 = [1,2]
>>> l2 = [1,2,3,4,5]
>>> set(l1)-set(l2)
set([])
>>> set(l2)-set(l1)
set([3, 4, 5])
>>>

This approach fails the OP's desires when

l1 = [1,2,2,]
l2 = [1,2,3,4,5]

Click to expand...

Click to expand...

The OP explicitly desires 'l2 contains l1' to be false in that case.

Cheers - Chas

Laszlo Nagy · Aug 16, 2011

Error free? Consider this stated requirement:

l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2

Click to expand...

If you look it the strict way, "containment" relation for lists is meant
this way:

l1 = []
l2 = [1,l1,2] # l2 CONTAINS l1

But you are right, I was wrong. So let's clarify what the OP wants!

For example:

l1 = [1,2,2,], l2 = [2,1,2,3,4,5]

What is the relation between these two lists? Does l2 contain l1 or not?
In other words, is this "containment" relation interpreted on multisets
not considering the order of the items?

It also completely ignores list order, which would make [9,8,7] a
sublist of [5,6,7,8,9].

Exactly. However, from the original post of Johannes it was not clear if
the order of the elements counts or not.

If It this is interpreted as a multiset relation, it would be easier to
use collections.Counter. If the order of elements is important then he
can start with a Boyer-Moore algorithm.

Best,

Laszlo

Steven D'Aprano · Aug 16, 2011

This is not the most efficient algorithm, but for short lists it should be
plenty fast enough:

Nope, sorry, that was buggy. Here's another version which should be accurate
but may be slower.

def search(source, target, start=0, end=None):
"""Naive search for target in source."""
m = len(source)
n = len(target)
if end is None:
end = m
if n == 0 or m < n:
return None
for i in range(start, end-n+1):
if source[i:i+n] == target:
return i
return None

Steven D'Aprano · Aug 16, 2011

hi list,
what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?

for example:
l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2

my problem is the second example, which makes it impossible to work with
sets insteads of lists. But something like set.issubset for lists would
be nice.

greatz Johannes

Click to expand...

My best guess:

from collections import Counter

There's no reason to think that the Original Poster wants a multiset based
solution. He asked about lists and sublists. That's a standard term, like
substring:

"12" is a substring of "01234".
"21" and "13" are not.

[1, 2] is a sublist of [0, 1, 2, 3, 4].
[2, 1] and [1, 3] are not.

Since lists are ordered, so are sublists.

If the OP does want a solution that ignores order, then he needs to describe
his problem better.

Nobody · Aug 16, 2011

what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?

"Best" is subjective. AFAIK, the theoretically-optimal algorithm is
Boyer-Moore. But that would require a fair amount of code, and Python
isn't a particularly fast language, so you're likely to get better results
if you can delegate most of the leg-work to a native library (along the
lines of Roy's suggestion of using regexps).

Alain Ketterlin · Aug 16, 2011

Roy Smith said:
what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?

Click to expand...

[...]
import re

def sublist(l1, l2):
s1 = ''.join(map(str, l1))
s2 = ''.join(map(str, l2))
return re.search(s1, s2)

This is complete nonsense (try it on [12] and [1,2]).

The original problem is "string searching" (where strings here are
sequences of numbers instead of characters). See

http://en.wikipedia.org/wiki/String_searching_algorithm

for any algorithm (Rabin-Karp seems appropriate to me).

-- Alain.

Roy Smith · Aug 16, 2011

Alain Ketterlin said:
Roy Smith said:

what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?

Click to expand...

[...]
import re

def sublist(l1, l2):
s1 = ''.join(map(str, l1))
s2 = ''.join(map(str, l2))
return re.search(s1, s2)

Click to expand...

This is complete nonsense (try it on [12] and [1,2]).

No, it's not complete nonsense, it works just fine for the limited data
set the OP presented

Of course, you are correct that it only works
for single-digit integers, hence my "running and ducking" comment.

The original problem is "string searching" (where strings here are
sequences of numbers instead of characters). See

http://en.wikipedia.org/wiki/String_searching_algorithm

for any algorithm (Rabin-Karp seems appropriate to me).

Yes, of course this is fundamentally a string searching problem. The
problem is that while Python comes with an excellent tool for doing
these kinds of string searches, its domain is limited to, as you pointed
out, character strings. However, for the limited data the OP presented,
there is a trivial way to map the original data domain (single digit
integers) into the domain the re library knows about (character
strings). My answer may be a sucky general-purpose solution, but it's
also a neat hack, and sometimes a neat hack is what you need.

It also occurs to me that this hack could be expanded a bit to handle a
much larger input domain. If you had sequences of arbitrary (but
hashable) objects, you could map this into a unicode string where each
"character" is the 32-bit hash of of the corresponding input object,
then run the regex search on that.

In theory, it should work. In practice, I can see a few potential
problems:

1) You might have to carefully craft your hash function to avoid magic
unicode values like null or BOM. No biggie there.

2) Hash collisions can lead to false positives, so you need to filter
those out with a subsequent linear comparison step. Of course, this is
true of any kind of hash lookup. No biggie there either.

3) While the re module is advertised to work on unicode data, I have no
idea how efficient it is on arbitrary values. I wouldn't be too
surprised if it's tuned for "mostly ascii utf-8" and performance suffers
on completely random strings. If its first step is to transcode to
utf-32 internally, I expect it would work just fine, but it would take
some experimentation (or reading of the source code) to validate this
assumption.

John Posner · Aug 16, 2011

"Best" is subjective. AFAIK, the theoretically-optimal algorithm is
Boyer-Moore. But that would require a fair amount of code, and Python
isn't a particularly fast language, so you're likely to get better results
if you can delegate most of the leg-work to a native library (along the
lines of Roy's suggestion of using regexps).

How about using Python's core support for "==" on list objects:

def contains(alist, slist):
"""
predicate: is *slist* a sublist of *alist*?
"""
alist_sz = len(alist)
slist_sz = len(slist)

# special cases
if slist_sz == 0:
return True # empty list *is* a sublist
elif slist_sz == alist_sz and alist == slist:
return True
elif slist_sz > alist_sz:
return False

# standard case
for i in range(alist_sz - slist_sz + 1):
if slist == alist[i:i+slist_sz]:
return True
# fell through "for" loop: no match found
return False

-John

John Posner · Aug 16, 2011

"Best" is subjective. AFAIK, the theoretically-optimal algorithm is
Boyer-Moore. But that would require a fair amount of code, and Python
isn't a particularly fast language, so you're likely to get better results
if you can delegate most of the leg-work to a native library (along the
lines of Roy's suggestion of using regexps).

How about using Python's core support for "==" on list objects:

def contains(alist, slist):
"""
predicate: is *slist* a sublist of *alist*?
"""
alist_sz = len(alist)
slist_sz = len(slist)

# special cases
if slist_sz == 0:
return True # empty list *is* a sublist
elif slist_sz == alist_sz and alist == slist:
return True
elif slist_sz > alist_sz:
return False

# standard case
for i in range(alist_sz - slist_sz + 1):
if slist == alist[i:i+slist_sz]:
return True
# fell through "for" loop: no match found
return False

-John

nn · Aug 16, 2011

Roy Smith said:
Roy Smith said:

what is the best way to check if a given list (lets call it l1) is
totally contained in a second list (l2)?

Click to expand...

[...]

import re

Click to expand...

def sublist(l1, l2):
s1 = ''.join(map(str, l1))
s2 = ''.join(map(str, l2))
return re.search(s1, s2)

Click to expand...

This is complete nonsense (try it on [12] and [1,2]).

The original problem is "string searching" (where strings here are
sequences of numbers instead of characters). See

http://en.wikipedia.org/wiki/String_searching_algorithm

for any algorithm (Rabin-Karp seems appropriate to me).

-- Alain.

That can be easily fixed:
s1 = ','.join(map(str, lst1))
s2 = ','.join(map(str, lst2))
return False if s2.find(s1)==-1 else True

sublist([1,2],[1,2,3,4,5]) True
sublist([1,2,2],[1,2,3,4,5]) False
sublist([1,2,3],[1,3,5,7]) False
sublist([12],[1,2]) False

Click to expand...

Click to expand...

I don't know about best, but it works for the examples given.

Laszlo Nagy · Aug 16, 2011

That can be easily fixed:

s1 = ','.join(map(str, lst1))
s2 = ','.join(map(str, lst2))
return False if s2.find(s1)==-1 else True

I don't know about best, but it works for the examples given.

For numbers, it will always work. But what about

lst1 = [",",",,"]
lst1 = [",",",",","]

Efficient way to break up a list into two pieces	16	Feb 21, 2010
Comparing lists	1	Aug 16, 2010
how to separate a list into two lists?	7	Aug 6, 2011
multirember&co	21	Apr 17, 2007
Best way to merge/sort two sorted lists?...	11	Dec 6, 2007
Statically declare a constant linked list	9	Nov 7, 2011
Default function arguments behaving badly	3	Aug 24, 2005
Iterating over several lists at once	17	Dec 13, 2006

testing if a list contains a sublist

Johannes

Roy Smith

Steven D'Aprano

ChasBrown

ChasBrown

ChasBrown

Laszlo Nagy

alex23

alex23

ChasBrown

Laszlo Nagy

Steven D'Aprano

Steven D'Aprano

Nobody

Alain Ketterlin

Roy Smith

John Posner

John Posner

nn

Laszlo Nagy

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads