Best strategy for finding a pattern in a sequence of integers

Discussion in 'Python' started by Slaunger, Nov 21, 2008.

  1. Slaunger

    Slaunger Guest

    Hi all,

    I am a Python novice, and I have run into a problem in a project I am
    working on, which boils down to identifying the patterns in a sequence
    of integers, for example

    ..... 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 ...

    I want to process this such that I get out two patterns, like:
    (9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0)
    and
    (10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)

    I am pretty sure I can figure out how to do that, but I would like to
    have some guidance on the most pythonic approach to this.

    Two paths I have considered is:
    1. Convert the sequence of integers to a hex string, i.e., "...
    16616616616616619330330330330A66166..." and use the re module to find
    the patterns. Use the string positions to go back to the sequence
    2. Put them in a list or an array and manually look for the patterns
    by iterating and filtering the elements compare with sets.

    I am not looking for a "solution" to this specific problem, just some
    guidance

    The rules for the sequence is:
    1. The sequence may start in the middle of a pattern
    2. There are one or two patterns, Pattern A and Pattern B in the
    sequence
    3. Pattern A only consists of the numbers 0, 3, and 9. 3, 3 is always
    followed by 0
    4. Pattern B only consists of the numbers 1, 6, and 10. 6, 6, is
    always followed by 1
    5. There may be other numbers interspersed within the sequence, but
    they can be ignored
    6. The relative position of 9 or 10 in the patterns varies from case
    to case, but is consistent throughout a sequence.
    7. There is always one 9 or one 10 in a pattern
    7. The beginning of a pattern is marked by the transision from oner
    pattern to the other.
    8. If there is only one pattern in the sequence, the pattern beginning
    is marked by the first occurance of either 9 or 10
    9. The pattern is repetitive in the sequence,
    e.g., ...ABABABAB..., ...AAA..., or ...BBB...

    Thank you,
    -- Slaunger
     
    Slaunger, Nov 21, 2008
    #1
    1. Advertisements

  2. Maybe:

    #-----------------------------------------------------------------
    data = '''
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''

    data = [int(x) for x in data.split()]

    from itertools import groupby

    S1 = [0, 3, 9]

    s = set()
    for k, g in groupby(data, lambda x: x in S1):
    seq = tuple(g)
    # maybe the next line should be 'if 9 in seq or 10 in seq'?
    if seq[0] in [9, 10]:
    s.add(seq)

    print s
    #------------------------------------------------------------------
    set(
    [(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0),
    (10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)])

    hth

    G.
     
    Gerard flanagan, Nov 21, 2008
    #2
    1. Advertisements

  3. Then it would be a good starting point to write some code. Then you
    could post it and ask how it can be made more 'pythonic'.

    HTH
     
    Arnaud Delobelle, Nov 21, 2008
    #3
  4. Slaunger

    Mensanator Guest

    Your rules appear to be incomplete and inconsistent.
    But does a 3 always follow a 3? Can you have 3, 0, 3, 0?
    Can 0's occur without 3's, such as 0, 0, 0?
    So, I can have 3, 3, 0, 7, 3, 3, 0?

    What if the 7 occurs after the pair of 3's? Is the number following
    the 7 forced to be 0, i.e., is 3, 3, 7, 3, 3, 0 legal?
    Can there be an ignored number between the patterns? Is
    9,3,3,0,7,10,6,6,1
    legal? If NO, you violate Rule 5. If YES, you violate the second Rule
    7.
     
    Mensanator, Nov 21, 2008
    #4
  5. But groupby needs sorted data?

    Suppose the rules do not conflict or overlap and between them divide
    all the values, then maybe this would work:

    class StateMachine:

    def __init__(self,*rules):
    self.rules = rules
    self.state = len(rules) #deliberately unreachable
    self.first = True

    def change(self,x):
    #check and/or change state
    for i,rule in enumerate(self.rules):
    if rule(x):
    if i == self.state: #no state change
    return False
    else: #maybe state change
    self.state = i
    if self.first: #set initial state, no change
    self.first = False
    return False
    else:
    return True #state is changed
    raise ValueError

    def test():

    data = '''
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10
    6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10
    6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10
    6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''

    data = map(int, data.split())

    def rule1(x):
    return x in set((0, 3, 9))
    def rule2(x):
    return x in set((6, 1, 10))

    state = StateMachine(rule1,rule2)
    L = []
    res = []
    for x in data:
    if state.change(x):
    res.append(list(L))
    L =[]
    L.append(x)
    res.append(list(L))
    print res

    if __name__=='__main__':
    test()
     
    Anton Vredegoor, Nov 21, 2008
    #5
  6. Slaunger

    Slaunger Guest

    That is actually a good point. I will do that.
    -- ~~~~
     
    Slaunger, Nov 22, 2008
    #6
  7. Slaunger

    Slaunger Guest

    OK. Let me try to clarify then...
    Yes, 3s always comes in pairs. So, 3, 0, 3, 0 is not allowed.
    And of the numbers 0, 3, and 9; 0 will always be the first after the
    pair of 3s
    Yes, there is a point I did not mention propery in my first
    description:
    The number 7 for instance could appear in that position, but it would
    not be repetitive;
    as a matter of fact these other numbers can be filtered away before
    looking for the pattern,
    so let us just forgot about those.
    No, it would have to be 3, 3, 0, 7, 3, 3, 0, it is sequeezed in - but
    as mentioned they can be prefiltered out of the problem
    Yes you are right. This complication is again eliminated by
    prefiltering "other" numbers out

    -- Slaunger
     
    Slaunger, Nov 22, 2008
    #7
  8. Slaunger

    Slaunger Guest

    Hi Gerard,
    This definitely looks like a path to walk along, and I think your code
    does the trick, although I have to play a little around with the
    groupby method, of which I had no prior knowledge. I think I will
    write some unit test cases to stress test you concept (on Monday, when
    I am back at work). I appreciate your almost full implementation - it
    would have sufficed to point me to the itertools module, and then I
    think I would have figured out.
    -- ~~~~
     
    Slaunger, Nov 22, 2008
    #8
  9. Slaunger

    Paul McGuire Guest

    So I think you just need to find the first two complete sequences of
    1,6,10 and 0,3,9, remove any repetitions and then you're done.

    data = '''
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 7 3 0 3 3 0 3 3 0 10 6 6
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 10 6 6
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6
    6
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6
    6
    1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''


    data = [int(x) for x in data.split()]

    s1 = frozenset([1,6,10])
    s2 = frozenset([0,3,9])

    diter = iter(data)

    i = diter.next()
    curset = (s1,s2)[i in s2]
    otherset = lambda : (s1,s2)[curset is s1]
    seq = { s1 : [], s2 : [] }

    # read until there is the first change in state - discard
    # these, since we may have started in the middle of a sequence
    other = otherset()
    while i not in other:
    i = diter.next()

    # read in 2 sequences
    for _ in range(2):
    other = curset
    curset = otherset()
    tmp = []
    while i not in other:
    if i in curset:
    tmp.append(i)
    i = diter.next()
    seq[curset] = tmp[:]

    # look for repeats in a seq, truncate
    def truncateReps(s,sentinel):
    if s.count(sentinel) > 1:
    loc1 = s.index(sentinel)
    loc2 = s.index(sentinel,loc1+1)
    s[:] = s[:loc2-loc1]

    truncateReps(seq[s1],10)
    truncateReps(seq[s2],9)

    # the answer
    print seq[s1]
    print seq[s2]

    Prints:
    [10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1]
    [9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0]

    Your original sample was only the nominal, most friendly case, so it
    is hard to know if any submitted solutions will work will all of your
    other conditions. Please try this with more challenging data, such as
    starting a sequence in the middle, numbers not in the set
    (0,1,3,6,9,10), repeated patterns, sequences that don't start with 9
    or 10, etc.

    -- Paul
     
    Paul McGuire, Nov 22, 2008
    #9
  10. I think the data is sorted - that's my reading of the given rules, at least.

    G.
     
    Gerard flanagan, Nov 22, 2008
    #10
  11. Yes, it would need plenty of testing - it will not handle the presence
    of other integers for example. It's a 'quickie' solution of course, but
    if the data is as regular as you say, then it may be good enough.
    Apologies. I was in two minds whether to post actual code, but I thought
    that you could spend a lot of time on a more low-level solution when a
    'batteries-included' method would suffice. But, at least you are now
    aware of itertools - lot's of goodness there, groupby in particular is
    always my first thought for any kind of data-partitioning problem.

    regards

    Gerard
     
    Gerard flanagan, Nov 22, 2008
    #11
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.