slicing functionality for strings / Python suitability for bioinformatics


jbperez808

>>> rs='AUGCUAGACGUGGAGUAG'
>>> rs[12:15]='GAG'
Traceback (most recent call last):
  File "<pyshell#119>", line 1, in ?
    rs[12:15]='GAG'
TypeError: object doesn't support slice assignment

You can't assign to a section of a sliced string in
Python 2.3 and there doesn't seem to be mention of this
as a Python 2.4 feature (don't have time to actually try
2.4 yet).

Q1. Does extended slicing make use of the Sequence protocol?
Q2. Don't strings also support the Sequence protocol?
Q3. Why then can't you make extended slicing assignment work
when dealing with strings?
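To make Q1 to Q3 concrete, here is a minimal sketch in modern Python 3 (not the 2.3/2.4 interpreter used in this thread). In Python 3 an assignment such as obj[1:9:2] = v is handed to the type's __setitem__ method together with a slice object, so a type supports slice assignment exactly when it implements a mutating __setitem__. The SpliceLog class is a hypothetical toy that only records what it receives.

```python
class SpliceLog:
    """Hypothetical toy class: records each key/value passed to __setitem__."""

    def __init__(self):
        self.calls = []

    def __setitem__(self, key, value):
        # obj[a:b:c] = v arrives here with key == slice(a, b, c)
        self.calls.append((key, value))


log = SpliceLog()
log[12:15] = 'GAG'     # ordinary slice
log[1:9:2] = 'AAAA'    # extended slice
print(log.calls)       # [(slice(12, 15, None), 'GAG'), (slice(1, 9, 2), 'AAAA')]

# str defines no __setitem__, so neither item nor slice assignment can work.
print(hasattr(str, '__setitem__'))   # False

rs = 'AUGCUAGACGUGGAGUAG'
try:
    rs[12:15] = 'GAG'
    str_rejected = False
except TypeError:
    str_rejected = True
```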

This sort of operation (slicing/splicing of sequences represented
as strings) would seem to be a very fundamental operation when doing
RNA/DNA/protein sequencing algorithms, and it would greatly enhance
Python's appeal to those doing bioinformatics work if the slicing
and extended slicing operators worked to their logical limit.

Doing a cursory search doesn't seem to reveal any current PEPs
dealing with extending the functionality of slicing/extended
slicing operators.

Syntax and feature-wise, is there a reason why Python can't kick
Perl's butt as the dominant language for bioinformatics and
eventually become the lingua franca of this fast-growing and
funding-rich field?
 

Reinhold Birkenfeld


Strings are immutable in Python, which is why assignment to
slices won't work.
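Because a string cannot be changed in place, the usual workaround is to build a new string from the pieces and rebind the name. A minimal Python 3 sketch; splice is a hypothetical helper name, and 'CCC' is an arbitrary replacement chosen because rs[12:15] is already 'GAG', so writing 'GAG' back would change nothing.

```python
rs = 'AUGCUAGACGUGGAGUAG'

def splice(s, start, stop, piece):
    """Hypothetical helper: return a copy of s with s[start:stop] replaced by piece."""
    return s[:start] + piece + s[stop:]

new_rs = splice(rs, 12, 15, 'CCC')
print(new_rs)   # AUGCUAGACGUGCCCUAG
print(rs)       # AUGCUAGACGUGGAGUAG (the original string is untouched)

# Rebinding the name gives the effect of "assigning" to the slice.
rs = splice(rs, 12, 15, 'CCC')
```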

But why not use lists?

rs = list('AUGC...')
rs[12:15] = list('GAG')
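A short Python 3 sketch of the list route, including the step back to a string with ''.join(). The list() call on the right-hand side is not strictly needed, since list slice assignment accepts any iterable, and assigning to an empty slice inserts without replacing anything.

```python
rs = list('AUGCUAGACGUGGAGUAG')   # mutable list of one-character strings
rs[12:15] = list('CCC')            # replace three bases in place
rs[3:3] = ['A', 'A']               # empty slice: insert two bases at index 3
result = ''.join(rs)               # back to an ordinary str when done
print(result)                      # AUGAACUAGACGUGCCCUAG
```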

Reinhold
 

Terry Reedy

Reinhold Birkenfeld said:
But why not use lists?

rs = list('AUGC...')
rs[12:15] = list('GAG')

Or arrays of characters: see the array module.
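In Python 3 the array module no longer has the 'c' typecode, so the closest built-in equivalent of a mutable character array for plain ASCII sequence data is bytearray. A minimal sketch, assuming the sequence contains only ASCII letters:

```python
seq = bytearray(b'AUGCUAGACGUGGAGUAG')

seq[12:15] = b'UAA'          # replace a codon in place
del seq[15:]                 # truncate after the new stop codon
print(seq.decode('ascii'))   # AUGCUAGACGUGUAA
```

Unlike a str, the bytearray is modified in place, and decode() produces an ordinary string only when one is needed.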

Terry J. Reedy
 

jbperez808

Great suggestion... I was naively trying to turn the string into a list
and slice that, which I reckon would be significantly slower.
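Whether the list round trip is really slower is easy to measure rather than guess. A rough timeit sketch in Python 3; the test sequence length and repetition count are arbitrary assumptions, and timings vary by machine and Python version, so no numbers are claimed here.

```python
import timeit

SEQ = 'AUGCUAGACGUGGAGUAG' * 1000   # hypothetical test sequence, 18,000 bases

def via_list():
    s = list(SEQ)
    s[12:15] = 'CCC'
    return ''.join(s)

def via_bytearray():
    b = bytearray(SEQ, 'ascii')
    b[12:15] = b'CCC'
    return b.decode('ascii')

def via_concat():
    return SEQ[:12] + 'CCC' + SEQ[15:]

for fn in (via_list, via_bytearray, via_concat):
    print(fn.__name__, timeit.timeit(fn, number=200))
```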
 

Steven D'Aprano

Having to do an array.array('c',...):
>>> x=array.array('c','ATCTGACGTC')
>>> x[1:9:2]=array.array('c','AAAA')
>>> x.tostring()
'AACAGACATC'

is a bit clunkier than one would want, but I guess
the efficient performance is the silver lining here.
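The same extended-slice example in Python 3, using bytearray in place of array.array('c', ...) and decode() in place of tostring() (Python 3 has no 'c' typecode, and array's tostring() was removed in Python 3.9). With a step other than 1, the right-hand side must contain exactly as many items as the slice selects.

```python
x = bytearray(b'ATCTGACGTC')
x[1:9:2] = b'AAAA'            # extended slice: positions 1, 3, 5 and 7
result = x.decode('ascii')
print(result)                 # AACAGACATC

# With step != 1 the lengths must match exactly.
try:
    x[1:9:2] = b'AA'
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```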

There are a number of ways to streamline that. The simplest is to merely
create an alias to array.array:

from array import array as str

Then you can say x = str('c', 'ATCTGACGTC').

A little more sophisticated would be to use currying:

import array

def str(value):
    return array.array('c', value)

x = str('ATCTGACGTC')

although to be frank I'm not sure that something as simple as this
deserves to be dignified with the name currying.


Lastly, you could create a wrapper class that implements everything you
want. For a serious application, this is probably what you want to do
anyway:

import array

class DNA_Sequence:
    alphabet = 'ACGT'

    def __init__(self, value):
        for c in value:
            if c not in self.__class__.alphabet:
                raise ValueError('Illegal character "%s".' % c)
        self.value = array.array('c', value)

    def __repr__(self):
        return self.value.tostring()

and so on. Obviously you will need more work than this, and it may be
possible to subclass array directly.
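A Python 3 sketch of the wrapper-class idea, storing the bases in a bytearray. The class name DNASequence and its methods are illustrative assumptions, not an existing library; it validates every write against the alphabet and supports both ordinary and extended slice assignment.

```python
class DNASequence:
    """Hypothetical wrapper: a validated, mutable DNA sequence (sketch only)."""

    alphabet = 'ACGT'

    def __init__(self, value):
        self._check(value)
        self.value = bytearray(value, 'ascii')

    @classmethod
    def _check(cls, text):
        # Reject any character outside the allowed alphabet.
        for c in text:
            if c not in cls.alphabet:
                raise ValueError('Illegal character %r.' % c)

    def __len__(self):
        return len(self.value)

    def __getitem__(self, key):
        # A slice gives a new DNASequence; a single index gives a 1-char str.
        if isinstance(key, slice):
            return DNASequence(self.value[key].decode('ascii'))
        return chr(self.value[key])

    def __setitem__(self, key, text):
        self._check(text)
        data = text.encode('ascii')
        if isinstance(key, slice):
            # bytearray enforces equal lengths for extended slices (step != 1).
            self.value[key] = data
        elif len(data) == 1:
            self.value[key] = data[0]   # bytearray items are ints
        else:
            raise ValueError('expected exactly one base')

    def __str__(self):
        return self.value.decode('ascii')

    def __repr__(self):
        return 'DNASequence(%r)' % str(self)


s = DNASequence('ATCTGACGTC')
s[1:9:2] = 'AAAA'
print(s)          # AACAGACATC
s[0] = 'G'
print(repr(s))    # DNASequence('GACAGACATC')
```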
 

Tom Anderson


There are a number of ways to streamline that. The simplest is to merely
create an alias to array.array:

from array import array as str

Then you can say x = str('c', 'ATCTGACGTC').

A little more sophisticated would be to use currying:

def str(value):
    return array.array('c', value)

x = str('ATCTGACGTC')

There's a special hell for people who override builtins.
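A small Python 3 sketch of why shadowing a built-in is risky: once str names array.array, any later code in the same module that expects the built-in str breaks. The describe function is hypothetical; the standard builtins module still gives access to the original.

```python
import builtins
from array import array as str   # shadows the built-in str in this module

def describe(n):
    # Meant to format a number, but 'str' now refers to array.array.
    return str(n)

try:
    describe(42)
    shadow_failed = False
except TypeError:
    shadow_failed = True

print(builtins.str(42))   # the original built-in is still reachable
del str                   # drop the module-level alias
print(str(42))            # name lookup falls back to the built-in again
```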
although to be frank I'm not sure that something as simple as this
deserves to be dignified with the name currying.

It's definitely not currying - it doesn't create a new function. Currying
would be:

import array

def arraytype(kind):
    def mkarray(value):
        return array.array(kind, value)
    return mkarray

chars = arraytype('c')
seq = chars("tacatcgtcgacgtcgatcagtaccc")

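Python 3's standard library also has functools.partial, which fixes leading arguments in advance much as Tom's arraytype does. A sketch comparing the two; since Python 3 has no 'c' typecode, this port assumes 'B' (unsigned bytes) with bytes literals.

```python
import array
from functools import partial

# Tom's closure-returning factory, unchanged apart from the typecode.
def arraytype(kind):
    def mkarray(value):
        return array.array(kind, value)
    return mkarray

chars = arraytype('B')

# The standard-library equivalent: fix the first argument with partial.
chars2 = partial(array.array, 'B')

seq = chars2(b"tacatcgtcgacgtcgatcagtaccc")
seq[0:3] = chars(b"ATG")          # array slices accept only arrays of the same typecode
print(seq.tobytes().decode('ascii'))   # ATGatcgtcgacgtcgatcagtaccc
```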
Lastly, you could create a wrapper class that implements everything you
want. For a serious application, this is probably what you want to do
anyway:

Definitely - there are lots of things to know about DNA molecules or parts
of them that aren't captured by the sequence.

tom
 

Tom Anderson

which is, most likely, chock full of highly experienced python
programmers.

You reckon? I've never felt the need to do it myself, and instinctively,
it seems like a bad idea. Perhaps I've been missing something, though -
could you give me some examples of when overriding a builtin is a good
thing to do?

tom
 