Denis McMahon
Hi
I have a list of data that presents as:
timestamp: value
Timestamps are used solely to determine the sequence of items in the list.
I want to find the longest repeated sequence of values in the list.
For example, in the following list:
data = { 0: "d", 1: "x", 2: "y", 3: "t", 4: "d", 5: "y", 77: "g",
         78: "h", 79: "x", 80: "y", 206: "t", 210: "d", 211: "x" }
I would pull out the sequence x-y-t-d (starting at 1 and 79)
I need to keep the timestamp / data association because I need to
generate output that identifies (a) the longest repeated sequence, (b) how
many elements it contains, and (c) the timestamp at which each occurrence
started.
I'm not sure of the best way, programmatically, to approach this task,
which means I'm unsure whether e.g. a list of tuples ( time, data ) or an
OrderedDict keyed on the timestamp is the best starting point.
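To make the list option concrete, here is a rough sketch of one way the association could be kept, assuming the timestamps are sortable dict keys. The names pairs, times, values and start_index are only illustrative: the idea is to sort once, search on the values alone, and map any index back to its timestamp.

```python
data = {0: "d", 1: "x", 2: "y", 3: "t", 4: "d", 5: "y", 77: "g", 78: "h",
        79: "x", 80: "y", 206: "t", 210: "d", 211: "x"}

# Sort once by timestamp, then split into two parallel lists.
pairs = sorted(data.items())
times = [t for t, _ in pairs]
values = [v for _, v in pairs]

# A sequence search only needs `values`; any index found there maps
# back to its timestamp through `times` (start_index is illustrative).
start_index = 8
start_time = times[start_index]
```

With this layout the gaps between timestamps stop mattering, because only the position in the sorted order is used.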
I can make a list of tuples using:
d = [ (k,v) for k,v in data.items() ]
and with the list of tuples, I can do something like:
d.sort( key=lambda tup: tup[0] )  # order by timestamp
max_start_a = 0
max_start_b = 0
max_len = 0
i = 0
while i < len( d ):
    j = i + 1
    while j < len( d ):
        # count how many consecutive values match starting at i and at j
        o = 0
        while j+o < len( d ) and d[i+o][1] == d[j+o][1]:
            o += 1
        if o > max_len:
            max_len = o  # remember the new best length and both starts
            max_start_a = i
            max_start_b = j
        j += 1
    i += 1
print d[max_start_a][0], d[max_start_b][0], max_len
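The loop above only remembers two start positions, whereas requirement (c) asks for every occurrence. As a sketch of one possible alternative (not necessarily the best one), the comparison could be done with a table of common-suffix lengths, which avoids the innermost loop. The function name longest_repeated and its return shape are my own invention, and I am assuming overlapping occurrences are acceptable, as they are in the loop above.

```python
def longest_repeated(data):
    """Return (sequence, length, start_timestamps) for the longest
    repeated run of values. Overlapping occurrences are allowed."""
    items = sorted(data.items())          # order by timestamp
    times = [t for t, _ in items]
    values = [v for _, v in items]
    n = len(values)

    best_len, best_end = 0, 0
    prev = [0] * (n + 1)                  # previous row of the table
    for i in range(1, n + 1):
        cur = [0] * (n + 1)
        for j in range(i + 1, n + 1):
            # cur[j]: length of the common run ending at i-1 and j-1
            if values[i - 1] == values[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur

    seq = values[best_end - best_len:best_end]
    # Collect the timestamp of every position where the sequence occurs.
    starts = [times[k] for k in range(n - best_len + 1)
              if best_len and values[k:k + best_len] == seq]
    return seq, best_len, starts


data = {0: "d", 1: "x", 2: "y", 3: "t", 4: "d", 5: "y", 77: "g", 78: "h",
        79: "x", 80: "y", 206: "t", 210: "d", 211: "x"}
seq, length, starts = longest_repeated(data)
# seq == ['x', 'y', 't', 'd'], length == 4, starts == [1, 79]
```

This is quadratic in the number of items rather than cubic in the worst case, and if several sequences tie for longest it reports only the first one found.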
Is there a better way to do this?