testing if a list contains a sublist

Discussion in 'Python' started by Johannes, Aug 16, 2011.

  1. Johannes

    Johannes Guest

    hi list,
    what is the best way to check if a given list (lets call it l1) is
    totally contained in a second list (l2)?

    for example:
    l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
    l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
    l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2

    my problem is the second example, which makes it impossible to work with
    sets insteads of lists. But something like set.issubset for lists would
    be nice.

    greatz Johannes
    Johannes, Aug 16, 2011
    #1
    1. Advertising

  2. Johannes

    Roy Smith Guest

    In article <>,
    Johannes <> wrote:

    > hi list,
    > what is the best way to check if a given list (lets call it l1) is
    > totally contained in a second list (l2)?
    >
    > for example:
    > l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
    > l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
    > l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2
    >
    > my problem is the second example, which makes it impossible to work with
    > sets insteads of lists. But something like set.issubset for lists would
    > be nice.
    >
    > greatz Johannes


    import re

    def sublist(l1, l2):
    s1 = ''.join(map(str, l1))
    s2 = ''.join(map(str, l2))
    return re.search(s1, s2)

    assert sublist([1,2], [1,2,3,4,5])
    assert not sublist ([1,2,2], [1,2,3,4,5])
    assert not sublist([1,2,3], [1,3,5,7])


    (running and ducking)
    Roy Smith, Aug 16, 2011
    #2
    1. Advertising

  3. On Tue, 16 Aug 2011 09:26 am Johannes wrote:

    > hi list,
    > what is the best way to check if a given list (lets call it l1) is
    > totally contained in a second list (l2)?


    This is not the most efficient algorithm, but for short lists it should be
    plenty fast enough:


    def contains(alist, sublist):
    if len(sublist) == 0 or len(sublist) > len(alist):
    return False
    start = 0
    while True:
    try:
    p = alist.index(sublist[0], start)
    except ValueError:
    return False
    for i,x in enumerate(sublist):
    if alist[p+i] != x:
    start = p+1
    break
    else: # for loop exits without break
    return True


    --
    Steven
    Steven D'Aprano, Aug 16, 2011
    #3
  4. Johannes

    ChasBrown Guest

    On Aug 15, 4:26 pm, Johannes <> wrote:
    > hi list,
    > what is the best way to check if a given list (lets call it l1) is
    > totally contained in a second list (l2)?
    >
    > for example:
    > l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
    > l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
    > l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2
    >
    > my problem is the second example, which makes it impossible to work with
    > sets insteads of lists. But something like set.issubset for lists would
    > be nice.
    >
    > greatz Johannes


    My best guess:

    from collections import Counter

    def contains(alist, sublist):
    alist = Counter(alist)
    for k in sublist:
    try:
    alist[k] -= 1
    if alist[k] < 0:
    return False
    except KeyError:
    return False
    return True

    'Counter' being better understood as 'Multiset'.

    If m = len(alist) and n = len(sublist), looks to be roughly O(m+n).

    Cheers - Chas
    ChasBrown, Aug 16, 2011
    #4
  5. Johannes

    ChasBrown Guest

    On Aug 15, 4:26 pm, Johannes <> wrote:
    > hi list,
    > what is the best way to check if a given list (lets call it l1) is
    > totally contained in a second list (l2)?
    >
    > for example:
    > l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
    > l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
    > l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2
    >
    > my problem is the second example, which makes it impossible to work with
    > sets insteads of lists. But something like set.issubset for lists would
    > be nice.
    >
    > greatz Johannes


    My best guess:

    from collections import Counter

    def contains(alist, sublist):
    alist = Counter(alist)
    for k in sublist:
    try:
    alist[k] -= 1
    if alist[k] < 0:
    return False
    except KeyError:
    return False
    return True

    'Counter' being better understood as 'Multiset'.

    If m = len(alist) and n = len(sublist), looks to be roughly O(m+n).

    Cheers - Chas
    ChasBrown, Aug 16, 2011
    #5
  6. Johannes

    ChasBrown Guest

    On Aug 15, 4:26 pm, Johannes <> wrote:
    > hi list,
    > what is the best way to check if a given list (lets call it l1) is
    > totally contained in a second list (l2)?
    >
    > for example:
    > l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
    > l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
    > l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2
    >
    > my problem is the second example, which makes it impossible to work with
    > sets insteads of lists. But something like set.issubset for lists would
    > be nice.
    >
    > greatz Johannes


    My best guess:

    from collections import Counter

    def contains(alist, sublist):
    alist = Counter(alist)
    for k in sublist:
    try:
    alist[k] -= 1
    if alist[k] < 0:
    return False
    except KeyError:
    return False
    return True

    'Counter' being better understood as 'Multiset'.

    If m = len(alist) and n = len(sublist), looks to be roughly O(m+n).

    Cheers - Chas
    ChasBrown, Aug 16, 2011
    #6
  7. Johannes

    Laszlo Nagy Guest


    >> hi list,
    >> what is the best way to check if a given list (lets call it l1) is
    >> totally contained in a second list (l2)?
    >>
    >> for example:
    >> l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
    >> l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
    >> l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2
    >>
    >> my problem is the second example, which makes it impossible to work with
    >> sets insteads of lists. But something like set.issubset for lists would
    >> be nice.
    >>
    >> greatz Johannes

    Fastest, error-free and simplest solution is to use sets:

    >>> l1 = [1,2]
    >>> l2 = [1,2,3,4,5]
    >>> set(l1)-set(l2)

    set([])
    >>> set(l2)-set(l1)

    set([3, 4, 5])
    >>>


    Although with big lists, this is not very memory efficient. But I must
    tell you, sometimes I use this method for lists with millions of
    integers, and it is very fast and reliable, and memory is not a concern
    for me, at least - some million integers will fit into a few MB of
    memory. Read the docs about set operators for creating union, symmetric
    difference etc.

    Best,

    Laszlo
    Laszlo Nagy, Aug 16, 2011
    #7
  8. Johannes

    alex23 Guest

    On Aug 16, 4:51 pm, Laszlo Nagy <> wrote:
    > >> hi list,
    > >> what is the best way to check if a given list (lets call it l1) is
    > >> totally contained in a second list (l2)?

    >
    > >> for example:
    > >> l1 = [1,2], l2 = [1,2,3,4,5] ->  l1 is contained in l2
    > >> l1 = [1,2,2,], l2 = [1,2,3,4,5] ->  l1 is not contained in l2
    > >> l1 = [1,2,3], l2 = [1,3,5,7] ->  l1 is not contained in l2

    >
    > >> my problem is the second example, which makes it impossible to work with
    > >> sets insteads of lists. But something like set.issubset for lists would
    > >> be nice.

    >
    > >> greatz Johannes

    >
    > Fastest, error-free and simplest solution is to use sets:
    >
    >  >>> l1 = [1,2]
    >  >>> l2 = [1,2,3,4,5]
    >  >>> set(l1)-set(l2)
    > set([])
    >  >>> set(l2)-set(l1)
    > set([3, 4, 5])


    Error free? Consider this stated requirement:
    > l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2


    >>> l1 = [1,2,2]
    >>> l2 = [1,2,3,4,5]
    >>> set(l1)-set(l2)

    set()
    >>> set(l2)-set(l1)

    {3, 4, 5}

    So set([1,2]) == set([1,2,2]), despite [1,2,2] not being a sublist of
    the original.

    It also completely ignores list order, which would make [9,8,7] a
    sublist of [5,6,7,8,9].
    alex23, Aug 16, 2011
    #8
  9. Johannes

    alex23 Guest

    Laszlo Nagy <> wrote:
    > Fastest, error-free and simplest solution is to use sets:
    >
    >  >>> l1 = [1,2]
    >  >>> l2 = [1,2,3,4,5]
    >  >>> set(l1)-set(l2)
    > set([])
    >  >>> set(l2)-set(l1)
    > set([3, 4, 5])
    >  >>>


    Error-free? Not given the stated requirements:

    > l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2


    >>> l1 = [1,2,2]
    >>> l2 = [1,2,3,4,5]
    >>> set(l1)-set(l2)

    set()
    >>> set(l2)-set(l1)

    {3, 4, 5}

    As you can see, using sets provides the exact same result for a non-
    contained sub-list as it does for your valid example. The list
    [1,2,2,2,2,2,2,2,2] is clearly not contained with [1,2,3], but your
    code would say it is.

    Similarly, [9,8,7] would appear to be a sub-list of [5,6,7,8,9],
    something I'd also considered to be false in this instance.

    (My apologies if this is a double-up, the original post hasn't
    appeared yet)
    alex23, Aug 16, 2011
    #9
  10. Johannes

    ChasBrown Guest

    On Aug 15, 11:51 pm, Laszlo Nagy <> wrote:
    > >> hi list,
    > >> what is the best way to check if a given list (lets call it l1) is
    > >> totally contained in a second list (l2)?

    >
    > >> for example:
    > >> l1 = [1,2], l2 = [1,2,3,4,5] ->  l1 is contained in l2
    > >> l1 = [1,2,2,], l2 = [1,2,3,4,5] ->  l1 is not contained in l2
    > >> l1 = [1,2,3], l2 = [1,3,5,7] ->  l1 is not contained in l2

    >
    > >> my problem is the second example, which makes it impossible to work with
    > >> sets insteads of lists. But something like set.issubset for lists would
    > >> be nice.

    >
    > >> greatz Johannes

    >
    > Fastest, error-free and simplest solution is to use sets:
    >
    >  >>> l1 = [1,2]
    >  >>> l2 = [1,2,3,4,5]
    >  >>> set(l1)-set(l2)
    > set([])
    >  >>> set(l2)-set(l1)
    > set([3, 4, 5])
    >  >>>
    >


    This approach fails the OP's desires when

    >>> l1 = [1,2,2,]
    >>> l2 = [1,2,3,4,5]


    The OP explicitly desires 'l2 contains l1' to be false in that case.

    Cheers - Chas
    ChasBrown, Aug 16, 2011
    #10
  11. Johannes

    Laszlo Nagy Guest


    > Error free? Consider this stated requirement:
    >> l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2

    If you look it the strict way, "containment" relation for lists is meant
    this way:


    l1 = []
    l2 = [1,l1,2] # l2 CONTAINS l1

    But you are right, I was wrong. So let's clarify what the OP wants!

    For example:

    l1 = [1,2,2,], l2 = [2,1,2,3,4,5]


    What is the relation between these two lists? Does l2 contain l1 or not?
    In other words, is this "containment" relation interpreted on multisets
    not considering the order of the items?

    >
    > It also completely ignores list order, which would make [9,8,7] a
    > sublist of [5,6,7,8,9].

    Exactly. However, from the original post of Johannes it was not clear if
    the order of the elements counts or not.

    If It this is interpreted as a multiset relation, it would be easier to
    use collections.Counter. If the order of elements is important then he
    can start with a Boyer-Moore algorithm.

    Best,

    Laszlo
    Laszlo Nagy, Aug 16, 2011
    #11
  12. On Tue, 16 Aug 2011 12:12 pm Steven D'Aprano wrote:

    > On Tue, 16 Aug 2011 09:26 am Johannes wrote:
    >
    >> hi list,
    >> what is the best way to check if a given list (lets call it l1) is
    >> totally contained in a second list (l2)?

    >
    > This is not the most efficient algorithm, but for short lists it should be
    > plenty fast enough:


    Nope, sorry, that was buggy. Here's another version which should be accurate
    but may be slower.

    def search(source, target, start=0, end=None):
    """Naive search for target in source."""
    m = len(source)
    n = len(target)
    if end is None:
    end = m
    if n == 0 or m < n:
    return None
    for i in range(start, end-n+1):
    if source[i:i+n] == target:
    return i
    return None



    --
    Steven
    Steven D'Aprano, Aug 16, 2011
    #12
  13. On Tue, 16 Aug 2011 04:14 pm ChasBrown wrote:

    > On Aug 15, 4:26 pm, Johannes <> wrote:
    >> hi list,
    >> what is the best way to check if a given list (lets call it l1) is
    >> totally contained in a second list (l2)?
    >>
    >> for example:
    >> l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
    >> l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
    >> l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2
    >>
    >> my problem is the second example, which makes it impossible to work with
    >> sets insteads of lists. But something like set.issubset for lists would
    >> be nice.
    >>
    >> greatz Johannes

    >
    > My best guess:
    >
    > from collections import Counter


    There's no reason to think that the Original Poster wants a multiset based
    solution. He asked about lists and sublists. That's a standard term, like
    substring:

    "12" is a substring of "01234".
    "21" and "13" are not.

    [1, 2] is a sublist of [0, 1, 2, 3, 4].
    [2, 1] and [1, 3] are not.

    Since lists are ordered, so are sublists.

    If the OP does want a solution that ignores order, then he needs to describe
    his problem better.



    --
    Steven
    Steven D'Aprano, Aug 16, 2011
    #13
  14. Johannes

    Nobody Guest

    On Tue, 16 Aug 2011 01:26:54 +0200, Johannes wrote:

    > what is the best way to check if a given list (lets call it l1) is
    > totally contained in a second list (l2)?


    "Best" is subjective. AFAIK, the theoretically-optimal algorithm is
    Boyer-Moore. But that would require a fair amount of code, and Python
    isn't a particularly fast language, so you're likely to get better results
    if you can delegate most of the leg-work to a native library (along the
    lines of Roy's suggestion of using regexps).
    Nobody, Aug 16, 2011
    #14
  15. Roy Smith <> writes:

    >> what is the best way to check if a given list (lets call it l1) is
    >> totally contained in a second list (l2)?


    [...]
    > import re
    >
    > def sublist(l1, l2):
    > s1 = ''.join(map(str, l1))
    > s2 = ''.join(map(str, l2))
    > return re.search(s1, s2)


    This is complete nonsense (try it on [12] and [1,2]).

    The original problem is "string searching" (where strings here are
    sequences of numbers instead of characters). See

    http://en.wikipedia.org/wiki/String_searching_algorithm

    for any algorithm (Rabin-Karp seems appropriate to me).

    -- Alain.
    Alain Ketterlin, Aug 16, 2011
    #15
  16. Johannes

    Roy Smith Guest

    In article <-strasbg.fr>,
    Alain Ketterlin <-strasbg.fr> wrote:

    > Roy Smith <> writes:
    >
    > >> what is the best way to check if a given list (lets call it l1) is
    > >> totally contained in a second list (l2)?

    >
    > [...]
    > > import re
    > >
    > > def sublist(l1, l2):
    > > s1 = ''.join(map(str, l1))
    > > s2 = ''.join(map(str, l2))
    > > return re.search(s1, s2)

    >
    > This is complete nonsense (try it on [12] and [1,2]).


    No, it's not complete nonsense, it works just fine for the limited data
    set the OP presented :) Of course, you are correct that it only works
    for single-digit integers, hence my "running and ducking" comment.

    > The original problem is "string searching" (where strings here are
    > sequences of numbers instead of characters). See
    >
    > http://en.wikipedia.org/wiki/String_searching_algorithm
    >
    > for any algorithm (Rabin-Karp seems appropriate to me).


    Yes, of course this is fundamentally a string searching problem. The
    problem is that while Python comes with an excellent tool for doing
    these kinds of string searches, its domain is limited to, as you pointed
    out, character strings. However, for the limited data the OP presented,
    there is a trivial way to map the original data domain (single digit
    integers) into the domain the re library knows about (character
    strings). My answer may be a sucky general-purpose solution, but it's
    also a neat hack, and sometimes a neat hack is what you need.

    It also occurs to me that this hack could be expanded a bit to handle a
    much larger input domain. If you had sequences of arbitrary (but
    hashable) objects, you could map this into a unicode string where each
    "character" is the 32-bit hash of of the corresponding input object,
    then run the regex search on that.

    In theory, it should work. In practice, I can see a few potential
    problems:

    1) You might have to carefully craft your hash function to avoid magic
    unicode values like null or BOM. No biggie there.

    2) Hash collisions can lead to false positives, so you need to filter
    those out with a subsequent linear comparison step. Of course, this is
    true of any kind of hash lookup. No biggie there either.

    3) While the re module is advertised to work on unicode data, I have no
    idea how efficient it is on arbitrary values. I wouldn't be too
    surprised if it's tuned for "mostly ascii utf-8" and performance suffers
    on completely random strings. If its first step is to transcode to
    utf-32 internally, I expect it would work just fine, but it would take
    some experimentation (or reading of the source code) to validate this
    assumption.
    Roy Smith, Aug 16, 2011
    #16
  17. Johannes

    John Posner Guest

    On 2:59 PM, Nobody wrote:
    > On Tue, 16 Aug 2011 01:26:54 +0200, Johannes wrote:
    >
    >> what is the best way to check if a given list (lets call it l1) is
    >> totally contained in a second list (l2)?

    > "Best" is subjective. AFAIK, the theoretically-optimal algorithm is
    > Boyer-Moore. But that would require a fair amount of code, and Python
    > isn't a particularly fast language, so you're likely to get better results
    > if you can delegate most of the leg-work to a native library (along the
    > lines of Roy's suggestion of using regexps).
    >
    >

    How about using Python's core support for "==" on list objects:

    def contains(alist, slist):
    """
    predicate: is *slist* a sublist of *alist*?
    """
    alist_sz = len(alist)
    slist_sz = len(slist)

    # special cases
    if slist_sz == 0:
    return True # empty list *is* a sublist
    elif slist_sz == alist_sz and alist == slist:
    return True
    elif slist_sz > alist_sz:
    return False

    # standard case
    for i in range(alist_sz - slist_sz + 1):
    if slist == alist[i:i+slist_sz]:
    return True
    # fell through "for" loop: no match found
    return False

    -John
    John Posner, Aug 16, 2011
    #17
  18. Johannes

    John Posner Guest

    On 2:59 PM, Nobody wrote:
    > On Tue, 16 Aug 2011 01:26:54 +0200, Johannes wrote:
    >
    >> what is the best way to check if a given list (lets call it l1) is
    >> totally contained in a second list (l2)?

    > "Best" is subjective. AFAIK, the theoretically-optimal algorithm is
    > Boyer-Moore. But that would require a fair amount of code, and Python
    > isn't a particularly fast language, so you're likely to get better results
    > if you can delegate most of the leg-work to a native library (along the
    > lines of Roy's suggestion of using regexps).
    >
    >

    How about using Python's core support for "==" on list objects:

    def contains(alist, slist):
    """
    predicate: is *slist* a sublist of *alist*?
    """
    alist_sz = len(alist)
    slist_sz = len(slist)

    # special cases
    if slist_sz == 0:
    return True # empty list *is* a sublist
    elif slist_sz == alist_sz and alist == slist:
    return True
    elif slist_sz > alist_sz:
    return False

    # standard case
    for i in range(alist_sz - slist_sz + 1):
    if slist == alist[i:i+slist_sz]:
    return True
    # fell through "for" loop: no match found
    return False

    -John
    John Posner, Aug 16, 2011
    #18
  19. Johannes

    nn Guest

    On Aug 16, 8:23 am, Alain Ketterlin <-strasbg.fr>
    wrote:
    > Roy Smith <> writes:
    > >> what is the best way to check if a given list (lets call it l1) is
    > >> totally contained in a second list (l2)?

    >
    > [...]
    >
    > > import re

    >
    > > def sublist(l1, l2):
    > >     s1 = ''.join(map(str, l1))
    > >     s2 = ''.join(map(str, l2))
    > >     return re.search(s1, s2)

    >
    > This is complete nonsense (try it on [12] and [1,2]).
    >
    > The original problem is "string searching" (where strings here are
    > sequences of numbers instead of characters). See
    >
    > http://en.wikipedia.org/wiki/String_searching_algorithm
    >
    > for any algorithm (Rabin-Karp seems appropriate to me).
    >
    > -- Alain.


    That can be easily fixed:

    >>> def sublist(lst1, lst2):

    s1 = ','.join(map(str, lst1))
    s2 = ','.join(map(str, lst2))
    return False if s2.find(s1)==-1 else True

    >>> sublist([1,2],[1,2,3,4,5])

    True
    >>> sublist([1,2,2],[1,2,3,4,5])

    False
    >>> sublist([1,2,3],[1,3,5,7])

    False
    >>> sublist([12],[1,2])

    False
    >>>


    I don't know about best, but it works for the examples given.
    nn, Aug 16, 2011
    #19
  20. Johannes

    Laszlo Nagy Guest


    > That can be easily fixed:
    >
    >>>> def sublist(lst1, lst2):

    > s1 = ','.join(map(str, lst1))
    > s2 = ','.join(map(str, lst2))
    > return False if s2.find(s1)==-1 else True
    >
    > I don't know about best, but it works for the examples given.
    >

    For numbers, it will always work. But what about

    lst1 = [",",",,"]
    lst1 = [",",",",","]
    Laszlo Nagy, Aug 16, 2011
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Paul Whipp
    Replies:
    1
    Views:
    365
    Alex Martelli
    Nov 7, 2003
  2. Bernard A.

    inner sublist positions ?

    Bernard A., Apr 20, 2005, in forum: Python
    Replies:
    2
    Views:
    340
    tiissa
    Apr 21, 2005
  3. giampiero mu

    finding sublist

    giampiero mu, Aug 2, 2005, in forum: Python
    Replies:
    5
    Views:
    379
    Johan Lindberg
    Aug 3, 2005
  4. Helmut Jarausch

    append to a sublist - please help

    Helmut Jarausch, Apr 6, 2008, in forum: Python
    Replies:
    2
    Views:
    347
  5. Matthias Gallé

    find sublist inside list

    Matthias Gallé, May 4, 2009, in forum: Python
    Replies:
    13
    Views:
    1,196
    mzdude
    May 5, 2009
Loading...

Share This Page