recombination variations

David Siedband

The problem I'm solving is to take a sequence like 'ATSGS' and make all
the DNA sequences it represents. The A, T, and G are fine but the S
represents C or G. I want to take this input:

[ [ 'A' ] , [ 'T' ] , [ 'C' , 'G' ], [ 'G' ] , [ 'C' , 'G' ] ]

and make the list:

[ 'ATCGC' , 'ATCGG' , 'ATGGC' , 'ATGGG' ]

The code below is what I have so far: 'alphabet' is a dictionary that
designates the set of base pairs that each letter represents (for
example, for S above it gives C and G). I call these ambiguous base
pairs because they could be more than one. Thus the function name
'unambiguate'. It makes a list of sequences with only A, T, C, and G and
none of the ambiguous base pair designations.

The function 'unambiguate_bp' takes a sequence and the index of a base
pair in it, and returns a list of sequences with that base pair replaced
by each of its unambiguous possibilities.

The function unambiguate_seq takes a sequence and runs unambiguate_bp
on each base pair in the sequence. Each time it does a base pair it
replaces the set of things it's working on with the output from the
unambiguate_bp. It's a bit confusing. I'd like it to be clearer.

Is there a better way to do this?
--
David Siedband
generation-xml.com



def unambiguate_bp(seq, bp):
    seq_set = []
    for i in alphabet[seq[bp]]:
        seq_set.append(seq[:bp]+i+seq[bp+1:])
    return seq_set

def unambiguate_seq(seq):
    result = [seq]
    for i in range(len(seq)):
        result_tmp=[]
        for j in result:
            result_tmp = result_tmp + unambiguate_bp(j,i)
        result = result_tmp
    return result



alphabet = {
    'A' : ['A'],
    'T' : ['T'],
    'C' : ['C'],
    'G' : ['G'],
    'W' : ['A','T'],
    'M' : ['A','C'],
    'R' : ['A','G'],
    'Y' : ['T','C'],
    'K' : ['T','G'],
    'S' : ['C','G'],
    'H' : ['A','T','C'],
    'D' : ['A','T','G'],
    'V' : ['A','G','C'],
    'B' : ['C','T','G'],
    'N' : ['A','T','C','G']
}
 
Peter Otten

David said:
The problem I'm solving is to take a sequence like 'ATSGS' and make all
the DNA sequences it represents. The A, T, and G are fine but the S
represents C or G. I want to take this input:

[ [ 'A' ] , [ 'T' ] , [ 'C' , 'G' ], [ 'G' ] , [ 'C' , 'G' ] ]

and make the list:

[ 'ATCGC' , 'ATCGG' , 'ATGGC' , 'ATGGG' ]

[...]

The code you provide only addresses the first part of your problem, and so
does mine:

>>> def disambiguate(seq):
...     return [list(alphabet.get(c, c)) for c in seq]
...
>>> alphabet = {
...     "W": "AT",
...     "S": "CG"
... }
>>> disambiguate("ATSGS")
[['A'], ['T'], ['C', 'G'], ['G'], ['C', 'G']]

Note that "identity entries" (e.g. mapping "A" to "A") in the alphabet
dictionary are no longer necessary, because alphabet.get(c, c) falls back
to the character itself. The list() call in disambiguate() is most likely
superfluous, but I put it in to meet your spec accurately.

Now on to the next step :)
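
One possible sketch of that next step, turning the list of alternatives into the finished strings. It assumes a Python that ships itertools.product (2.6 or later; written here for Python 3), and the helper name join_alternatives is made up for illustration, not something from the thread:

```python
from itertools import product

def join_alternatives(groups):
    """Return every string formed by picking one base from each group, in order."""
    # product(*groups) yields one tuple per combination, with the last
    # position varying fastest.
    return ["".join(choice) for choice in product(*groups)]

groups = [["A"], ["T"], ["C", "G"], ["G"], ["C", "G"]]
print(join_alternatives(groups))
# ['ATCGC', 'ATCGG', 'ATGGC', 'ATGGG']
```

The result comes out in the same order as the list in the original question, because the rightmost ambiguous position changes first.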

Peter
 
Hung Jung Lu

alphabet = {
    'A': 'A',
    'T': 'T',
    'C': 'C',
    'G': 'G',
    'W': 'AT',
    'M': 'AC',
    'R': 'AG',
    'Y': 'TC',
    'K': 'TG',
    'S': 'CG',
    'H': 'ATC',
    'D': 'ATG',
    'V': 'AGC',
    'B': 'CTG',
    'N': 'ATCG'
}

expand = lambda t: reduce(lambda r, s: [x+y for x in r for y in
                          alphabet[s]], t, [''])

print expand('ATSGS')
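
For readers who find the reduce one-liner dense, here is a rough unrolling of the same fold as an explicit loop. It is only a sketch: it is written for Python 3 (where reduce lives in functools), it assumes the inner loop is meant to iterate over alphabet[s], and the name expand_step_by_step is invented for this example.

```python
from functools import reduce  # reduce is a builtin only in Python 2

alphabet = dict(A="A", T="T", C="C", G="G", W="AT", M="AC", R="AG", Y="TC",
                K="TG", S="CG", H="ATC", D="ATG", V="AGC", B="CTG", N="ATCG")

def expand_step_by_step(seq):
    results = [""]                 # same starting value that reduce receives
    for symbol in seq:             # one fold step per input symbol
        # Extend every partial sequence with every base the symbol allows.
        results = [prefix + base
                   for prefix in results
                   for base in alphabet[symbol]]
    return results

# The one-liner, for comparison.
expand = lambda t: reduce(
    lambda r, s: [x + y for x in r for y in alphabet[s]], t, [""])

print(expand_step_by_step("ATSGS"))
# ['ATCGC', 'ATCGG', 'ATGGC', 'ATGGG']
```

Each pass multiplies the number of partial sequences by the number of bases the current symbol stands for, so the list grows from 1 to 1, 1, 2, 2, and finally 4 entries for 'ATSGS'.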
 
Scott David Daniels

Hung said:
expand = lambda t: reduce(lambda r, s: [x+y for x in r
                          for y in alphabet[s]], t, [''])
print expand('ATSGS')


Or, for a more verbose version:

multis = dict(W='AT', M='AC', R='AG', Y='TC', K='TG', S='CG',
              H='ATC', D='ATG', V='AGC', B='CTG', N='ATCG')

def expanded(string, expansions=multis):
    # Find the first ambiguous character, if there is one.
    for pos, char in enumerate(string):
        if char in expansions:
            break
    else:
        # Nothing ambiguous left: the string is its own only expansion.
        yield string
        return
    parts = expansions[char]
    prelude, string = string[:pos], string[pos+1:]
    # Expand the remainder recursively, then try each base in this position.
    for expansion in expanded(string, expansions):
        for middle in parts:
            yield prelude + middle + expansion
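
A generator pays off when the input has many ambiguous positions, since the caller can stop early instead of building the whole list. The sketch below illustrates that with a simplified recursive generator for Python 3; it is a variant written for this example (the names MULTIS and expand_lazily are assumptions, not from the code above), and it peels off one character at a time rather than searching for the first ambiguous one.

```python
from itertools import islice

MULTIS = dict(W="AT", M="AC", R="AG", Y="TC", K="TG", S="CG",
              H="ATC", D="ATG", V="AGC", B="CTG", N="ATCG")

def expand_lazily(seq, expansions=MULTIS):
    """Yield the unambiguous sequences for seq one at a time."""
    if not seq:
        yield ""                   # the empty sequence has one expansion
        return
    head, tail = seq[0], seq[1:]
    # Unambiguous characters fall back to themselves.
    for base in expansions.get(head, head):
        for rest in expand_lazily(tail, expansions):
            yield base + rest

# 'N' * 30 stands for 4**30 sequences; take only the first three.
first_three = list(islice(expand_lazily("N" * 30), 3))
print(first_three)
print(list(expand_lazily("ATSGS")))
# ['ATCGC', 'ATCGG', 'ATGGC', 'ATGGG']
```

Only the requested items are ever computed, so the first call returns immediately even though the full expansion would not fit in memory.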


--Scott David Daniels
 
