recombination variations

David Siedband · Nov 30, 2004

The problem I'm solving is to take a sequence like 'ATSGS' and make all
the DNA sequences it represents. The A, T, and G are fine but the S
represents C or G. I want to take this input:

[ [ 'A' ] , [ 'T' ] , [ 'C' , 'G' ], [ 'G' ] , [ 'C' , 'G' ] ]

and make the list:

[ 'ATCGC' , 'ATCGG' , 'ATGGC' , 'ATGGG' ]

The code below is what I have so far. 'alphabet' is a dictionary that
designates the set of base pairs that each letter represents (for
example, for S above it gives C and G). I call these ambiguous base
pairs because they could be more than one. Thus the function name
'unambiguate'. It makes a list of sequences with only A, T, C and G and
none of the ambiguous base pair designations.

The function 'unambiguate_bp' takes a sequence and a base pair in it
and returns a set of sequences with that base pair replaced by each of
its unambiguous possibilities.

The function unambiguate_seq takes a sequence and runs unambiguate_bp
on each base pair in the sequence. Each time it does a base pair it
replaces the set of things it's working on with the output from the
unambiguate_bp. It's a bit confusing. I'd like it to be clearer.

Is there a better way to do this?
--
David Siedband
generation-xml.com

def unambiguate_bp(seq, bp):
    seq_set = []
    for i in alphabet[seq[bp]]:
        seq_set.append(seq[:bp]+i+seq[bp+1:])
    return seq_set

def unambiguate_seq(seq):
    result = [seq]
    for i in range(len(seq)):
        result_tmp=[]
        for j in result:
            result_tmp = result_tmp + unambiguate_bp(j,i)
        result = result_tmp
    return result

alphabet = {
    'A' : ['A'],
    'T' : ['T'],
    'C' : ['C'],
    'G' : ['G'],
    'W' : ['A','T'],
    'M' : ['A','C'],
    'R' : ['A','G'],
    'Y' : ['T','C'],
    'K' : ['T','G'],
    'S' : ['C','G'],
    'H' : ['A','T','C'],
    'D' : ['A','T','G'],
    'V' : ['A','G','C'],
    'B' : ['C','T','G'],
    'N' : ['A','T','C','G']
}

Dennis Benzinger · Nov 30, 2004

David said:
[...]
Is there a better way to do this?
[...]

Take a look at Biopython: http://biopython.org/

Your problem may be solved there already.

Peter Otten · Dec 1, 2004

David said:
The problem I'm solving is to take a sequence like 'ATSGS' and make all
the DNA sequences it represents. The A, T, and G are fine but the S
represents C or G. I want to take this input:

[ [ 'A' ] , [ 'T' ] , [ 'C' , 'G' ], [ 'G' ] , [ 'C' , 'G' ] ]

and make the list:

[ 'ATCGC' , 'ATCGG' , 'ATGGC' , 'ATGGG' ]

[...]

The code you provide only addresses the first part of your problem, and so
does mine:

>>> def disambiguate(seq, alphabet):
...     return [list(alphabet.get(c, c)) for c in seq]
...
>>> alphabet = {
...     "W": "AT",
...     "S": "CG"
... }
>>> disambiguate("ATSGS", alphabet)
[['A'], ['T'], ['C', 'G'], ['G'], ['C', 'G']]

Note that "identity entries" (e. g. mapping "A" to "A") in the alphabet
dictionary are no longer necessary. The list() call in disambiguate() is
most likely superfluous, but I put it in to meet your spec accurately.

Now on to the next step

Peter
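
The next step mentioned above, combining the per-position options into full sequences, might be sketched as follows. This is only an illustration, not code from the thread: it assumes Python 3 and uses itertools.product, which was added to the standard library (in Python 2.6) after this thread was written. The names AMBIGUOUS and expand_all are hypothetical.

```python
from itertools import product

# Hypothetical table name; ambiguity codes only. Plain A/T/C/G fall
# through unchanged because dict.get returns the character itself.
AMBIGUOUS = {
    "W": "AT", "M": "AC", "R": "AG", "Y": "TC", "K": "TG", "S": "CG",
    "H": "ATC", "D": "ATG", "V": "AGC", "B": "CTG", "N": "ATCG",
}

def disambiguate(seq, alphabet=AMBIGUOUS):
    """Step 1: list the concrete bases allowed at each position."""
    return [list(alphabet.get(c, c)) for c in seq]

def expand_all(seq, alphabet=AMBIGUOUS):
    """Step 2 (hypothetical helper): pick one base per position, in every combination."""
    return ["".join(choice) for choice in product(*disambiguate(seq, alphabet))]

print(expand_all("ATSGS"))  # ['ATCGC', 'ATCGG', 'ATGGC', 'ATGGG']
```

product varies the last position fastest, so the results come out in the same order as the list in the original question.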

Hung Jung Lu · Dec 2, 2004

alphabet = {
    'A': 'A',
    'T': 'T',
    'C': 'C',
    'G': 'G',
    'W': 'AT',
    'M': 'AC',
    'R': 'AG',
    'Y': 'TC',
    'K': 'TG',
    'S': 'CG',
    'H': 'ATC',
    'D': 'ATG',
    'V': 'AGC',
    'B': 'CTG',
    'N': 'ATCG'
}

expand = lambda t: reduce(lambda r, s: [x+y for x in r for y in
                          alphabet[s]], t, [''])

print expand('ATSGS')
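
On Python 3, reduce has moved to functools and print is a function, so the one-liner above needs small changes to run. A possible sketch of the same fold, written as a named function and looking up each symbol in the table (the parameter names partial and symbol are my own):

```python
from functools import reduce  # reduce is not a builtin on Python 3

alphabet = {
    'A': 'A', 'T': 'T', 'C': 'C', 'G': 'G',
    'W': 'AT', 'M': 'AC', 'R': 'AG', 'Y': 'TC', 'K': 'TG', 'S': 'CG',
    'H': 'ATC', 'D': 'ATG', 'V': 'AGC', 'B': 'CTG', 'N': 'ATCG',
}

def expand(seq):
    # Start from [''] and, for each symbol in turn, extend every partial
    # sequence with every base that symbol can stand for.
    return reduce(
        lambda partial, symbol: [x + y for x in partial for y in alphabet[symbol]],
        seq,
        [''],
    )

print(expand('ATSGS'))  # ['ATCGC', 'ATCGG', 'ATGGC', 'ATGGG']
```

The accumulator holds all prefixes built so far, so after the first 'S' it is ['ATC', 'ATG'], and each later symbol multiplies its length by the number of bases that symbol allows.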

Scott David Daniels · Dec 2, 2004

Hung said:
... expand = lambda t: reduce(lambda r, s: [x+y for x in r
for y in alphabet[s]], t, [''])
print expand('ATSGS')

Or, for a more verbose version:

multis = dict(W='AT', M='AC', R='AG', Y='TC', K='TG', S='CG',
              H='ATC', D='ATG', V='AGC', B='CTG', N='ATCG')

def expanded(string, expansions=multis):
    # Find the first ambiguous character, if any.
    for pos, char in enumerate(string):
        if char in expansions:
            break
    else:
        # No ambiguous characters left: the string stands for itself.
        yield string
        return
    parts = expansions[char]
    prelude, string = string[:pos], string[pos+1:]
    # Expand the remainder recursively, then splice in each alternative.
    for expansion in expanded(string, expansions):
        for middle in parts:
            yield prelude + middle + expansion

--Scott David Daniels
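
Since a generator like the one above produces sequences lazily, it can be useful to know how many to expect before consuming it: the count is the product of the number of alternatives at each position. A small sketch, assuming Python 3.8+ for math.prod; count_expansions is a hypothetical helper, not part of the original post.

```python
from math import prod  # available from Python 3.8

multis = dict(W='AT', M='AC', R='AG', Y='TC', K='TG', S='CG',
              H='ATC', D='ATG', V='AGC', B='CTG', N='ATCG')

def count_expansions(string, expansions=multis):
    # Hypothetical helper: unambiguous letters contribute a factor of 1,
    # because dict.get falls back to the single character itself.
    return prod(len(expansions.get(char, char)) for char in string)

print(count_expansions('ATSGS'))   # 4
print(count_expansions('N' * 10))  # 1048576
```

A run of N codes grows as 4 to the power of its length, which is one reason to iterate over the generator rather than build the full list.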