Sequences in list

T

temp

Hi All,

I wonder could someone help me with this?

What I want to do is search through a list of letters and look for
adjacent groups of letters that form sequences, not in the usual way of
matching say abc to another appearance later on in the list but to look
for transposed patterns.

The groups of letters can be made up of between 2 to 4 letters.
It is not know beforehand how the groups are formed, so I can't say
here is a group, look for a sequence formed from this.
There must be at least three appearances of the groups for the sequence
to be pass.
More than one sequence may show in a list, i.e. first abc, bcd and cde
followed by bd, df, fa, ac at some later point.
Ideally I would like to know the start and end position of the
sequence/sequences.

I'm looking for patterns such as these:

['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

But these would be wrong:

['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] - This one
fails because of the 'jump' of the last group of three letters in
relation to the others. To pass the last group would need to have been
def. However, the previous three groups are ok and would be passed as a
sequence.

['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] - Here, although the
first and last groups have a similar sequence this would fail because
of the intervening group (afe) not following the pattern.



Thanks for any help,

Malcolm
 
M

Michael Spencer

Hi All,

I wonder could someone help me with this?

What I want to do is search through a list of letters and look for
adjacent groups of letters that form sequences, not in the usual way of
matching say abc to another appearance later on in the list but to look
for transposed patterns.

The groups of letters can be made up of between 2 to 4 letters.
It is not know beforehand how the groups are formed, so I can't say
here is a group, look for a sequence formed from this.
There must be at least three appearances of the groups for the sequence
to be pass.
More than one sequence may show in a list, i.e. first abc, bcd and cde
followed by bd, df, fa, ac at some later point.
Ideally I would like to know the start and end position of the
sequence/sequences.

I'm looking for patterns such as these:

['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

But these would be wrong:

['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] - This one
fails because of the 'jump' of the last group of three letters in
relation to the others. To pass the last group would need to have been
def. However, the previous three groups are ok and would be passed as a
sequence.

['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] - Here, although the
first and last groups have a similar sequence this would fail because
of the intervening group (afe) not following the pattern.



Thanks for any help,

Malcolm
I'm not sure I understand exactly what you want to do, but some of the following
may help

# Good sequences
s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

# Bad sequences
b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd']
b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c']


def make_groups(sequence, n):
"""return groups of length n from sequence
Usage: [['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']]
"""
length = len(sequence)
if length % n:
raise ValueError, "Sequence length %s is not multiple of %s"\
% (length, n)
for i in xrange(0, length, n):
yield sequence[i:i+n]

def make_diffs(groups):
"""return groups in canonical form where the first note
in each sequence is 0, and each subsequent note is expressed as
a positive interval from the first.
Example:
>>> list(make_diffs([['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']]))
[(0, 1, 2), (0, 1, 2), (0, 1, 2)]
"""
for group in groups:
base, basechr = [0], ord(group[0])
for char in group[1:]:
diff = ord(char) - basechr
# although not specified it seems that you want
# to compare diffs mod 7 i.e. -2 == +5
base.append(diff % 7)
yield tuple(base)

def compare_forms(sequence):
"""test whether all groups of a sequence have the same form"""
for length in range(4,2,-1): # look for longest match first
forms = {}
try:
for group in make_diffs(make_groups(sequence, length)):
forms[group] = forms.get(group,0)+1
print forms
except ValueError:
# raised if the sequence is not a multiple of length
pass
{(0, 1, 3, 6): 1, (0, 1, 2, 1): 1, (0, 1, 0, 1): 1} # found 3 4-tuples
{(0, 1, 2): 3, (0, 2, 5): 1} # found 3 instances of (0,1,2) and 1 other
HTH
Michael
 
M

Michael Spencer

Michael said:
def compare_forms(sequence):
"""test whether all groups of a sequence have the same form"""
for length in range(4,2,-1): # look for longest match first

oops, make that range(4,1,-1) or, simply (4,3,2)

Michael
 
T

temp

Hi Michael,

Thanks for a quick response, I appreciate it. I will have to get back
to you tomorrow, as I can't check what you've given me right now. I
presume the fact that you mention the word note in your code you have
realised that I'm trying to search for musical sequences. I tried to
put my problem across in the way I did, not mentioning music, so as not
to confuse the matter. So simply put, I'm checking a list for musical
sequences, without having any idea as to how they might be formed.

Thanks,

Malcolm
 
M

Michael Spencer

Hi Michael,

Thanks for a quick response, I appreciate it. I will have to get back
to you tomorrow, as I can't check what you've given me right now. I
presume the fact that you mention the word note in your code you have
realised that I'm trying to search for musical sequences.

Yes, I thought that was the most likely application

I tried to
put my problem across in the way I did, not mentioning music, so as not
to confuse the matter. So simply put, I'm checking a list for musical
sequences, without having any idea as to how they might be formed.

What I put together does not yet address the general problem. In particular, it
looks only at groups that start at positions that are an integral multiple of
their length i.e., 3-tuples starting only at 0,3,6,9... If you really want to
look for repetition of any sequence anywhere, then you would need a small
modification to the make_groups function:

def make_groups(sequence, n, anywhere = False):
"""return groups of length n from sequence.

bool: anywhere
False: look only at positions n*i for i = 1,2,3...
True: return every (overlapping sequence) of length n

Usage:
>>> list(make_groups(s1,3)) [['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']]
>>> list(make_groups(s1,3,anywhere = True))
[['a', 'b', 'c'], ['b', 'c', 'b'], ['c', 'b', 'c'], ['b', 'c', 'd'],
['c', 'd', 'c'], ['d', 'c', 'd'], ['c', 'd', 'e']] """
length = len(sequence)
if (length % n) and not anywhere:
raise ValueError, "Sequence length %s is not multiple of %s"\
% (length, n)
for i in xrange(0, length-n+1, anywhere or n):
yield sequence[i:i+n]

and a corresponding tweak to the compare_forms:

def compare_forms(sequence, anywhere = False):
"""test whether all groups of a sequence have the same form"""
for length in (4,3,2): # look for longest match first
forms = {}
try:
for group in make_diffs(make_groups(sequence, length, anywhere)):
forms[group] = forms.get(group,0)+1
print forms
except ValueError:
# raised if the sequence is not a multiple of length
pass
{(0, 6, 0, 1): 2, (0, 1, 2, 1): 2, (0, 1, 0, 1): 2}
{(0, 1, 2): 3, (0, 1, 0): 2, (0, 6, 0): 2}
{(0, 1): 6, (0, 6): 2}

Other questions - are your search sequences really limited to 4 notes? How long
are the pieces you are searching? Do you not need to care about note lengths,
only pitches. Are you really (as I inferred from your examples) looking only at
a 7-tone scale etc... etc... Anyway, I'd be happy to discuss this on-list or off.

Michael
 
T

temp

I’ve had chance to look at your code. I had an idea that I the answer
to my problem some how involved breaking down the list into groups and
that the notes would be easier to work with as numbers. However, I
think your ‘make_diffs’ function is really a very cunning way to
go about things ïŠ I would have never have thought of that!

As to the latest update, that is very welcome and will allow me to
analyse the results given in more detail.

In answer to your questions –

Are your search sequences really limited to 4 notes? – No, just used
that as an example
How long are the pieces you are searching? – Anything from the length
of a phrase to perhaps 32 bars, though maybe longer at some later date.
Do you not need to care about note lengths, only pitches. – While
I’ve been trying to figure out a way to do this I’ve just
concentrated on pitches, but it would be great to be able to look at
the note lengths as well.
Are you really (as I inferred from your examples) looking only at
a 7-tone scale – Yes, for the time being

Your code will identify sequences in a list, but how to index them? I
have an idea, which seems ridiculously long-winded, but should work.
First, put the groups from the ‘make_diffs’ function into a list
and do a search for the sequence identified from the
‘compare_forms’ function using the ‘Knuth-Morris-Pratt’
function in the python cookbook. Then the positions in that list will
obviously be the same as those in the original.

Next is the problem of the interval between the individual groups of
the sequence. Using something along the lines of your ‘make_diffs’
it should be a simple matter to see by what interval the groups in the
sequence are transposed.

Hmm… those last two paragraphs come across as the ranting of a mad
man. I’m off home to see if something like that works…

All the best and thanks again for the help,

Malcolm
 
M

Michael Spencer

Your code will identify sequences in a list, but how to index them? I
have an idea, which seems ridiculously long-winded, but should work.
First, put the groups from the ‘make_diffs’ function into a list
and do a search for the sequence identified from the
‘compare_forms’ function using the ‘Knuth-Morris-Pratt’
function in the python cookbook. Then the positions in that list will
obviously be the same as those in the original.

Next is the problem of the interval between the individual groups of
the sequence. Using something along the lines of your ‘make_diffs’
it should be a simple matter to see by what interval the groups in the
sequence are transposed.

Hold on! Adding the index position and transposition offset is a small change to
the existing code:

# Good sequences
s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

# Bad sequences
b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd']
b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c']


def make_groups(sequence, n, anywhere = False):
"""yields tuples of sub-sequences of length n, and their position
from sequence.

bool: anywhere
False: look only at positions n*i for i = 1,2,3...
True: return every (overlapping sequence) of length n

Usage: [(['a', 'b', 'c'], (0, 3)), (['b', 'c', 'd'], (3, 6)), (['c', 'd', 'e'],
(6, 9))]
"""
length = len(sequence)
if (length % n) and not anywhere:
raise ValueError, "Sequence length %s is not multiple of %s"\
% (length, n)
for i in xrange(0, length-n+1, anywhere or n):
yield sequence[i:i+n], (i, i+n)

def ord_note(note, note_base = "a"):
"""Return the ordinal value of a note, or the interval between two notes"""
return (ord(note)-ord(note_base)) % 7

def make_diffs(groups):
"""return groups in canonical form where the first note
in each sequence is 0, and each subsequent note is expressed as
a positive interval from the first.
Example: [((0, 1, 2), (0, 3)), ((0, 1, 2), (3, 6)), ((0, 1, 2), (6, 9))]
"""
for group, index in groups:
seq, base_note = [0], group[0]
for note in group[1:]:
interval = ord_note(note, base_note)
seq.append(interval)
yield tuple(seq), index, base_note

def group_forms(sequence, anywhere = False, max_length = 4):
"""group sequences of similar form

yields {form_tuple: [index tuples where form occurs]}
starting with longer forms
"""

for length in range(max_length, 1, -1):
forms = {}
try:
for group, index, base_note in make_diffs(make_groups(sequence,
length, anywhere)):
forms.setdefault(group,[]).append((index,base_note))
yield forms
except ValueError:
# raised if the sequence is not a multiple of length
pass

def search(sequence, max_length = 4, min_repeats = 3):
"""print all sequences that occur least min_repeats times"""
found = 0
print "Finding forms length 2-%s, occuring at minimum %s times" \
% (max_length, min_repeats)
for form_dict in group_forms(sequence, anywhere = True):
for form, data in form_dict.iteritems():
if len(data) >= min_repeats:
print "%s:%s" % (form, ["%s:%s" % (index, ord_note(offset))
for index, offset in data])
found +=1
print "Found: %s" % found
Finding forms length 2-4, occuring at minimum 3 times
(0, 1, 2):['(0, 3):0', '(3, 6):1', '(6, 9):2']
(0, 1):['(0, 2):0', '(1, 3):1', '(3, 5):1', '(4, 6):2', '(6, 8):2', '(7, 9):3']
Found: 2


Given the increasingly complex tuples between the different stages it would
probably be worth defining a class to represent a form_match object

Michael
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,780
Messages
2,569,608
Members
45,247
Latest member
crypto tax software1

Latest Threads

Top