Test if list contains another list

  • Thread starter Bruno Desthuilliers
  • Start date
B

Bruno Desthuilliers

mathieu a écrit :
Hi there,

I am trying to write something very simple to test if a list
contains another one:

a = [1,2,3]

b = [3,2,1,4]

but 'a in b' returns False.

Indeed. Lists are not sets, and the fact that all elements of list a
happens to also be part of list b doesn't make the list a itself an
element of list b.
>>> a = [1, 2, 3]
>>> b = [3,2,1,4]
>>> c = [b, a]
>>> a in c True
>>> b in c True
>>> c [[3, 2, 1, 4], [1, 2, 3]]
>>>

How do I check that a is indeed contained
in b ?

But that's what you did - you *did* checked if a was contained in b, and
this is not the case. What you want is to check if *all elements* of a
are contained in b, which is quite another problem. Now the answer to
your question : use sets.

HTH
 
M

mathieu

Hi there,

I am trying to write something very simple to test if a list
contains another one:

a = [1,2,3]

b = [3,2,1,4]

but 'a in b' returns False. How do I check that a is indeed contained
in b ?

thanks
 
M

mathieu

mathieu a écrit :
Hi there,
I am trying to write something very simple to test if a list
contains another one:
a = [1,2,3]
b = [3,2,1,4]
but 'a in b' returns False.

Indeed. Lists are not sets, and the fact that all elements of list a
happens to also be part of list b doesn't make the list a itself an
element of list b.
a = [1, 2, 3]
b = [3,2,1,4]
c = [b, a]
a in c True
b in c True
c [[3, 2, 1, 4], [1, 2, 3]]
How do I check that a is indeed contained
in b ?

But that's what you did - you *did* checked if a was contained in b, and
this is not the case. What you want is to check if *all elements* of a
are contained in b, which is quite another problem. Now the answer to
your question : use sets.

thanks all !
 
M

Matimus

mathieu a écrit :
Hi there,
  I am trying to write something very simple to test if a list
contains another one:
a = [1,2,3]
b = [3,2,1,4]
but 'a in b' returns False.

Indeed. Lists are not sets, and the fact that all elements of list a
happens to also be part of list b doesn't make the list a itself an
element of list b.

 >>> a = [1, 2, 3]
 >>> b = [3,2,1,4]
 >>> c = [b, a]
 >>> a in c
True
 >>> b in c
True
 >>> c
[[3, 2, 1, 4], [1, 2, 3]]
 >>>
How do I check that a is indeed contained
in b ?

But that's what you did - you *did* checked if a was contained in b, and
this is not the case. What you want is to check if *all elements* of a
are contained in b, which is quite another problem. Now the answer to
your question : use sets.

 >>> set(a).issubset(set(b))
True
 >>>

HTH

Just to clarify, doing it using sets is not going to preserve order OR
number of elements that are the same.

That is:
a = [1,1,2,3,4]
b = [4,5,3,7,2,6,1]
set(a).issubset(set(b))
True

This will return True if b contains at least on of each element found
in a. If the OP wanted to check that list `a` appeared in order
somewhere in list `b` then sets won't work.

Matt
 
B

Bruno Desthuilliers

Matimus a écrit :
On Sep 8, 12:32 am, Bruno Desthuilliers
Just to clarify, doing it using sets is not going to preserve order OR
number of elements that are the same.

That is:
a = [1,1,2,3,4]
b = [4,5,3,7,2,6,1]
set(a).issubset(set(b))
True

This will return True if b contains at least on of each element found
in a. If the OP wanted to check that list `a` appeared in order
somewhere in list `b` then sets won't work.

Indeed, and I should have mentionned this myself. Thanks for this reminder.
 
J

J. Cliff Dyer

Matimus a écrit :
On Sep 8, 12:32 am, Bruno Desthuilliers
set(a).issubset(set(b))
True
Just to clarify, doing it using sets is not going to preserve order OR
number of elements that are the same.

That is:
a = [1,1,2,3,4]
b = [4,5,3,7,2,6,1]
set(a).issubset(set(b))
True

This will return True if b contains at least on of each element found
in a. If the OP wanted to check that list `a` appeared in order
somewhere in list `b` then sets won't work.

Indeed, and I should have mentionned this myself. Thanks for this reminder.

If preserving order is important, strings have many of the properties
you're looking for, but it might take some work to figure out a suitable
way to convert to a string. The problem is easier if you know something
about the inputs. If all inputs are known to be numbers between 0 and
15, you can just do:

if ''.join(map(hex, a)) in ''.join(map(hex, b)):
return True

Hmm... actually, with the '0x' prefix that hex() puts on numbers, I
think this works for arbitrary integers.

Cheers,
Cliff
 
G

gauravatnet

Matimus a écrit :
On Sep 8, 12:32 am, Bruno Desthuilliers
 >>> set(a).issubset(set(b))
True
Just to clarify, doing it using sets is not going to preserve order OR
number of elements that are the same.
That is:
a = [1,1,2,3,4]
b = [4,5,3,7,2,6,1]
set(a).issubset(set(b))
True
This will return True if b contains at least on of each element found
in a. If the OP wanted to check that list `a` appeared in order
somewhere in list `b` then sets won't work.
Indeed, and I should have mentionned this myself. Thanks for this reminder.

If preserving order is important, strings have many of the properties
you're looking for, but it might take some work to figure out a suitable
way to convert to a string.  The problem is easier if you know something
about the inputs.  If all inputs are known to be numbers between 0 and
15, you can just do:

if ''.join(map(hex, a)) in ''.join(map(hex, b)):
    return True

Hmm... actually, with the '0x' prefix that hex() puts on numbers, I
think this works for arbitrary integers.

Cheers,
Cliff

Hi,

I looked inside this thread for my query which brought me the
following google search result
"Test if list contains another list - comp.lang.python | Google
Groups"

But then I was disappointed to see the question asked was not exactly
right. Other programmers have already answered to the main question.
But what if you actually have to find out if a list has all its
element inside another list in the same order. For that I wrote the
code and that's what I came up with.. let me know if there are any
bugs in this code.

#!C:\Python24

def findAllMatchingList(mainList, subList):
resultIndex = []
globalIndex = 0
for i in range(len(mainList)):
if i < globalIndex:
continue
globalIndex = i
increment = 0
for j in range(len(subList)):
if mainList[globalIndex] == subList[j]:
globalIndex += 1
increment += 1
if j == (len(subList)-1):
resultIndex.append(globalIndex-increment)
else:
break

return resultIndex

if __name__ == "__main__":
#Test case
mL = [ 'a', 'b', 'c', 1, 2, 4, 1, 2, 1, 1, 1, 2, 9, 1, 1, 1, 2, 3,
'a', 1, 2, 3, 4 ];
#mL = [ 'a', 'a', 'b', 1 ,2 ,3, 5, 6]
sL = [ 1, 2, 3 ]
result = findList( mL, sL )
for i in result:
print str(i)

Regards,
Gaurav.
 
G

gauravatnet

Matimus a écrit :
On Sep 8, 12:32 am, Bruno Desthuilliers
(snip)
 >>> set(a).issubset(set(b))
True
Just to clarify, doing it using sets is not going to preserve order OR
number of elements that are the same.
That is:
a = [1,1,2,3,4]
b = [4,5,3,7,2,6,1]
set(a).issubset(set(b))
True
This will return True if b contains at least on of each element found
in a. If the OP wanted to check that list `a` appeared in order
somewhere in list `b` then sets won't work.
Indeed, and I should have mentionned this myself. Thanks for this reminder.
If preserving order is important, strings have many of the properties
you're looking for, but it might take some work to figure out a suitable
way to convert to a string.  The problem is easier if you know something
about the inputs.  If all inputs are known to be numbers between 0 and
15, you can just do:
if ''.join(map(hex, a)) in ''.join(map(hex, b)):
    return True
Hmm... actually, with the '0x' prefix that hex() puts on numbers, I
think this works for arbitrary integers.
Cheers,
Cliff

Hi,

I looked inside this thread for my query which brought me the
following google search result
"Test if list contains another list - comp.lang.python | Google
Groups"

But then I was disappointed to see the question asked was not exactly
right. Other programmers have already answered to the main question.
But what if you actually have to find out if a list has all its
element inside another list in the same order. For that I wrote the
code and that's what I came up with.. let me know if there are any
bugs in this code.

#!C:\Python24

def findAllMatchingList(mainList, subList):
    resultIndex = []
    globalIndex = 0
    for i in range(len(mainList)):
        if i < globalIndex:
            continue
        globalIndex = i
        increment = 0
        for j in range(len(subList)):
            if mainList[globalIndex] == subList[j]:
                globalIndex += 1
                increment += 1
                if j == (len(subList)-1):
                    resultIndex.append(globalIndex-increment)
            else:
                break

    return resultIndex

if __name__ == "__main__":
    #Test case
    mL = [ 'a', 'b', 'c', 1, 2, 4, 1, 2, 1, 1, 1, 2, 9, 1, 1, 1, 2, 3,
'a', 1, 2, 3, 4 ]
    #mL = [ 'a', 'a', 'b', 1 ,2 ,3, 5, 6]
    sL = [ 1, 2, 3 ]
    result = findAllMatchingList( mL, sL )
    for i in result:
        print str(i)

Regards,
Gaurav.
 
D

Derek Martin

I looked inside this thread for my query which brought me the
following google search result
"Test if list contains another list - comp.lang.python | Google
Groups"

But then I was disappointed to see the question asked was not exactly
right. [...]
def findAllMatchingList(mainList, subList):
resultIndex = []
globalIndex = 0
for i in range(len(mainList)):
if i < globalIndex:
continue
globalIndex = i
increment = 0
for j in range(len(subList)):
if mainList[globalIndex] == subList[j]:
globalIndex += 1
increment += 1
if j == (len(subList)-1):
resultIndex.append(globalIndex-increment)
else:
break

return resultIndex

I didn't time them to compare, but how about this instead:
.... r = []
.... L = len(needle)
.... for i in range(len(haystack)):
.... if haystack[i:i+L] == needle:
.... r.append(i)
.... return r
# this fails because "3" is not 3...
find_needle_in_haystack([1,2,3], ["a","b",1,2,"3","9"]) []
find_needle_in_haystack([1,2,3], [1,2,3]) [0]
find_needle_in_haystack([1,2,3], ["a","b",1,2,3,"9"]) [2]
find_needle_in_haystack([1,2,3], ["a","b",1,2,3,"9","q",1,2,3])
[2, 7]

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFI3TcUdjdlQoHP510RAlc0AJ9CaqXxWSN6FeWsgApaNNf973medQCeKJUZ
AAZ4X3LFezSlIXRp3H3W748=
=3OrB
-----END PGP SIGNATURE-----
 
B

bearophileHUGS

I suggest Python programmers to fill the holes in the Python std lib
with some debugged & tuned implementations, their "bag of tricks", so
they don't have to re-invent and debug things all the time. This works
well with Psyco:


def issubseq(sub, items):
"""issubseq(sub, items): return true if the sequence 'sub' is a
contiguous
subsequence of the 'items' sequence.
Traceback (most recent call last):
...
TypeError: issubseq() takes exactly 2 arguments (0 given) Traceback (most recent call last):
...
TypeError: issubseq() takes exactly 2 arguments (1 given)
Traceback (most recent call last):
...
TypeError: 'int' object is not iterable
>>> isi = lambda s,i: int(issubseq(s,i))
>>> isi([], []) 1
>>> isi("a", "") 0
>>> isi("", "a") 1
>>> isi("", "aaa") 1
>>> isi("a", "a") 1
>>> isi("ab", "bab") 1
>>> isi("ab", "bab") 1
>>> isi("ba", "bbb") 0
>>> isi("bab", "ab") 0
>>> isi(("a", "b"), ("a","a","b")) 1
>>> isi(("a", "b"), ("a","a","c")) 0
>>> isi([1,2,1], [3,5, 1,2,4, 1,2,1, 6]) 1
>>> isi([1,2,1], [3,5, 1,2,4, 1,2,3, 6]) 0
>>> l = [1] * 50 + [1,2,3] + [4] * 50
>>> isi([1,2,3], l) 1
>>> l = [1] * 50 + [1,2,4] + [5] * 50
>>> isi([1,2,3], l)
0
"""
if not hasattr(sub, "__getitem__"):
sub = list(sub)

len_sub = len(sub)
if len_sub == 0:
return True

try:
if not len(items) or len(items) < len_sub:
return False
except TypeError:
pass

if sub == items:
return True

table = [0] * (len_sub + 1)

# building prefix-function
m = 0
for i in xrange(1, len_sub):
while m > 0 and sub[m] != sub:
m = table[m - 1]
if sub[m] == sub:
m += 1
table = m

# searching
m, i = 0, 0
for x in items:
while m > 0 and sub[m] != x:
m = table[m - 1]
if sub[m] == x:
m += 1
if m == len_sub:
return True
i += 1

return False


Usage:
(True, 0.20000000000000001)

Bye,
bearophile
 
D

Derek Martin

# building prefix-function
m = 0
for i in xrange(1, len_sub):
while m > 0 and sub[m] != sub:
m = table[m - 1]
if sub[m] == sub:
m += 1
table = m

# searching
m, i = 0, 0
for x in items:
while m > 0 and sub[m] != x:
m = table[m - 1]
if sub[m] == x:
m += 1
if m == len_sub:
return True
i += 1

return False


Quite a lot faster than mine... even without using psyco. Which is
interesting, especially because if I change my search loop to work
like yours, but leave out all your other optimizations, mine runs
roughly as fast as yours (i.e. execution time is negligible to a user
running the program in both cases, even with the large example data
you gave). This leads me to point out two caveats with your version:

1. Guavat posted a version which returns a list of all the indexes
where a match is found... which is what I mimiced. Yours returns only
true or false indicating whether or not it found a match. The main
difference in performance seems to come due to your iteration over the
list items, versus my comparing a sublist-sized slice of the whole to
the sublist. But this alone is an invalid optimization, because it
doesn't produce the same results... If you only want to know if a
match exists, it's great; but if you want to know *where*, or *how
many times*, you lose. That could be fixed though, by counting the
elements as you loop through them... I didn't attempt to determine
the performance hit for doing that, but I assume it's negligible.
I also imagine that that was your original purpose for the unused
variable i...

2. Your secondary optimizations add a great deal of complexity to your
code, making the algorithm much harder to understand. However they
don't appear to buy you much, given that the cases they optimize would
probably be rare, and the difference in execution time gained by the
optimization is not noticable to the user. Unless you're doing lots
and lots of these in your application, or maybe if you know in advance
that your data will contain many instances of the cases you optimized,
I think you're better off leaving the optimizations out, for the sake
of code clarity.

At the very least, if you're going to write complicated optimizations,
you ought to have explained what you were doing in comments... :)

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFI4HJ2djdlQoHP510RAhrKAJ9HhSX5lXuQrX565dv6MF0T/jMlXgCfaQWe
kJBxLoxYmzRu6/3C7XXHobU=
=CAjG
-----END PGP SIGNATURE-----
 
B

bearophileHUGS

Derek Martin:
Quite a lot faster than mine... even without using psyco.<

It's designed for Psyco.

However they don't appear to buy you much, given that the cases they optimize would probably be rare, and the difference in execution time gained by the optimization is not noticable to the user.<
http://www-igm.univ-mlv.fr/~lecroq/string/node7.html


Unless you're doing lots and lots of these in your application,<

I don't agree. That's library code, so it has to be efficient and
flexible, because it's designed to be used in many different
situations (if you don't agree, then you can try to suggest a
replacement of the C code of the string search of CPython written by
effbot with some slower code).

Bye,
bearophile
 
D

Derek Martin

Derek Martin:

I don't agree. That's library code, so it has to be efficient and
flexible, because it's designed to be used in many different
situations

That's fair, but lots of folks writing Python code will look at that
and say, "What the %$#@! is this doing?!?" As I already suggested,
code that implements non-obvious algorithms ought to explain what it's
doing in comments, so that the neophyte programmers charged with
maintaining the library aren't tempted to rewrite the code so that
it's easier to understand what it's doing. It can be as simple as:

# Use Morris-Pratt algorithm to search data

Then, anyone not familiar with the algorithm can easily look it up,
and see why it was written that way.

I think it's just as important to do that in code you post on the
list, since a) the person asking the question obviously doesn't know
what you're doing, or they wouldn't have needed to ask the question,
and b) there are lots of other folks reading the list who could
benefit from the same knowledge. :)

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFI4PqudjdlQoHP510RAi3cAJsH344P41yMojDLqVL2z29qGodldgCfQ5u8
4N48WpmWHjjnBNwuDx2vbGE=
=OT7l
-----END PGP SIGNATURE-----
 
B

bearophileHUGS

Derek Martin:
code that implements non-obvious algorithms ought to explain what it's
doing in comments,

I am sorry, you are right, of course.

In my libs/code there are always docstrings and doctests/tests, and
most times comments too, like you say. When I post code here I often
strip away part of those things, mostly to reduce the length of my
posts -.- I guess that I have to leave more of that meta-
information...

Bye,
bearophile
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,042
Latest member
icassiem

Latest Threads

Top