# Best strategy for finding a pattern in a sequence of integers

Discussion in 'Python' started by Slaunger, Nov 21, 2008.

1. ### SlaungerGuest

Hi all,

I am a Python novice, and I have run into a problem in a project I am
working on, which boils down to identifying the patterns in a sequence
of integers, for example

..... 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 ...

I want to process this such that I get out two patterns, like:
(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0)
and
(10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)

I am pretty sure I can figure out how to do that, but I would like to
have some guidance on the most pythonic approach to this.

Two paths I have considered is:
1. Convert the sequence of integers to a hex string, i.e., "...
16616616616616619330330330330A66166..." and use the re module to find
the patterns. Use the string positions to go back to the sequence
2. Put them in a list or an array and manually look for the patterns
by iterating and filtering the elements compare with sets.

I am not looking for a "solution" to this specific problem, just some
guidance

The rules for the sequence is:
1. The sequence may start in the middle of a pattern
2. There are one or two patterns, Pattern A and Pattern B in the
sequence
3. Pattern A only consists of the numbers 0, 3, and 9. 3, 3 is always
followed by 0
4. Pattern B only consists of the numbers 1, 6, and 10. 6, 6, is
always followed by 1
5. There may be other numbers interspersed within the sequence, but
they can be ignored
6. The relative position of 9 or 10 in the patterns varies from case
to case, but is consistent throughout a sequence.
7. There is always one 9 or one 10 in a pattern
7. The beginning of a pattern is marked by the transision from oner
pattern to the other.
8. If there is only one pattern in the sequence, the pattern beginning
is marked by the first occurance of either 9 or 10
9. The pattern is repetitive in the sequence,
e.g., ...ABABABAB..., ...AAA..., or ...BBB...

Thank you,
-- Slaunger

Slaunger, Nov 21, 2008

2. ### Gerard flanaganGuest

Maybe:

#-----------------------------------------------------------------
data = '''
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''

data = [int(x) for x in data.split()]

from itertools import groupby

S1 = [0, 3, 9]

s = set()
for k, g in groupby(data, lambda x: x in S1):
seq = tuple(g)
# maybe the next line should be 'if 9 in seq or 10 in seq'?
if seq[0] in [9, 10]:

print s
#------------------------------------------------------------------
set(
[(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0),
(10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)])

hth

G.

Gerard flanagan, Nov 21, 2008

3. ### Arnaud DelobelleGuest

Then it would be a good starting point to write some code. Then you
could post it and ask how it can be made more 'pythonic'.

HTH

Arnaud Delobelle, Nov 21, 2008
4. ### MensanatorGuest

Your rules appear to be incomplete and inconsistent.
But does a 3 always follow a 3? Can you have 3, 0, 3, 0?
Can 0's occur without 3's, such as 0, 0, 0?
So, I can have 3, 3, 0, 7, 3, 3, 0?

What if the 7 occurs after the pair of 3's? Is the number following
the 7 forced to be 0, i.e., is 3, 3, 7, 3, 3, 0 legal?
Can there be an ignored number between the patterns? Is
9,3,3,0,7,10,6,6,1
legal? If NO, you violate Rule 5. If YES, you violate the second Rule
7.

Mensanator, Nov 21, 2008
5. ### Anton VredegoorGuest

But groupby needs sorted data?

Suppose the rules do not conflict or overlap and between them divide
all the values, then maybe this would work:

class StateMachine:

def __init__(self,*rules):
self.rules = rules
self.state = len(rules) #deliberately unreachable
self.first = True

def change(self,x):
#check and/or change state
for i,rule in enumerate(self.rules):
if rule(x):
if i == self.state: #no state change
return False
else: #maybe state change
self.state = i
if self.first: #set initial state, no change
self.first = False
return False
else:
return True #state is changed
raise ValueError

def test():

data = '''
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10
6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10
6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10
6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''

data = map(int, data.split())

def rule1(x):
return x in set((0, 3, 9))
def rule2(x):
return x in set((6, 1, 10))

state = StateMachine(rule1,rule2)
L = []
res = []
for x in data:
if state.change(x):
res.append(list(L))
L =[]
L.append(x)
res.append(list(L))
print res

if __name__=='__main__':
test()

Anton Vredegoor, Nov 21, 2008
6. ### SlaungerGuest

That is actually a good point. I will do that.
-- ~~~~

Slaunger, Nov 22, 2008
7. ### SlaungerGuest

OK. Let me try to clarify then...
Yes, 3s always comes in pairs. So, 3, 0, 3, 0 is not allowed.
And of the numbers 0, 3, and 9; 0 will always be the first after the
pair of 3s
Yes, there is a point I did not mention propery in my first
description:
The number 7 for instance could appear in that position, but it would
not be repetitive;
as a matter of fact these other numbers can be filtered away before
looking for the pattern,
so let us just forgot about those.
No, it would have to be 3, 3, 0, 7, 3, 3, 0, it is sequeezed in - but
as mentioned they can be prefiltered out of the problem
Yes you are right. This complication is again eliminated by
prefiltering "other" numbers out

-- Slaunger

Slaunger, Nov 22, 2008
8. ### SlaungerGuest

Hi Gerard,
This definitely looks like a path to walk along, and I think your code
does the trick, although I have to play a little around with the
groupby method, of which I had no prior knowledge. I think I will
write some unit test cases to stress test you concept (on Monday, when
I am back at work). I appreciate your almost full implementation - it
would have sufficed to point me to the itertools module, and then I
think I would have figured out.
-- ~~~~

Slaunger, Nov 22, 2008
9. ### Paul McGuireGuest

So I think you just need to find the first two complete sequences of
1,6,10 and 0,3,9, remove any repetitions and then you're done.

data = '''
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 7 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6
6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6
6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''

data = [int(x) for x in data.split()]

s1 = frozenset([1,6,10])
s2 = frozenset([0,3,9])

diter = iter(data)

i = diter.next()
curset = (s1,s2)[i in s2]
otherset = lambda : (s1,s2)[curset is s1]
seq = { s1 : [], s2 : [] }

# read until there is the first change in state - discard
# these, since we may have started in the middle of a sequence
other = otherset()
while i not in other:
i = diter.next()

for _ in range(2):
other = curset
curset = otherset()
tmp = []
while i not in other:
if i in curset:
tmp.append(i)
i = diter.next()
seq[curset] = tmp[:]

# look for repeats in a seq, truncate
def truncateReps(s,sentinel):
if s.count(sentinel) > 1:
loc1 = s.index(sentinel)
loc2 = s.index(sentinel,loc1+1)
s[:] = s[:loc2-loc1]

truncateReps(seq[s1],10)
truncateReps(seq[s2],9)

print seq[s1]
print seq[s2]

Prints:
[10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1]
[9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0]

Your original sample was only the nominal, most friendly case, so it
is hard to know if any submitted solutions will work will all of your
other conditions. Please try this with more challenging data, such as
starting a sequence in the middle, numbers not in the set
or 10, etc.

-- Paul

Paul McGuire, Nov 22, 2008
10. ### Gerard flanaganGuest

I think the data is sorted - that's my reading of the given rules, at least.

G.

Gerard flanagan, Nov 22, 2008
11. ### Gerard flanaganGuest

Yes, it would need plenty of testing - it will not handle the presence
of other integers for example. It's a 'quickie' solution of course, but
if the data is as regular as you say, then it may be good enough.
Apologies. I was in two minds whether to post actual code, but I thought
that you could spend a lot of time on a more low-level solution when a
'batteries-included' method would suffice. But, at least you are now
aware of itertools - lot's of goodness there, groupby in particular is
always my first thought for any kind of data-partitioning problem.

regards

Gerard

Gerard flanagan, Nov 22, 2008