In-place array modification

U

umdehoon

Hello everybody,

Say I have an array (Python array, not Numpy) that contains characters
representing a DNA sequence. Something like

Now I want to calculate the reverse complement of this sequence. I could
make a simple loop:
d = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
s.reverse()
s array('c', 'TCGTCGATA')
n = len(s)
for i in range(n): .... s = d[s]
s

array('c', 'AGCAGCTAT')

But usually DNA sequences are very long, so the loop might take a long
time to finish. So I thought I would use the map function instead:
s = map(lambda c: d[c], s)
s = array('c', s)

But then I am creating memory for s from scratch. So what I would really
like to do is to have a function similar to map, but have it modify the
array in place. Does such a function exist in Python?


Many thanks,

Michiel de Hoon
Human Genome Center, U Tokyo.
 
P

Peter Otten

umdehoon said:
Say I have an array (Python array, not Numpy) that contains characters
representing a DNA sequence. Something like


Now I want to calculate the reverse complement of this sequence. I could

The following does not meet your spec (it's not in-place), but might be
worth considering as well:
array('c', 'AGCAGCTAT')

Peter
 
I

Istvan Albert

umdehoon said:
> Say I have an array (Python array, not Numpy) that contains characters
>for i in range(n):
> s = d[s]


For large lists 'range' will affect the performance since it
creates another huge list. You might think about using xrange or
enumerate. With xrange I could replace 10 million bases in
20 seconds whereas I had to kill the test program using
range b/c it started gobbling up resources...

I suspected that psyco might do a good job of optimizing
this kind of looping. Indeed, the same job took less than 3
seconds with psyco enabled.

Istvan.
 
J

Jeff Epler

With 2.4's generator expressions and slice assignment, you should be
able to perform these kinds of operations in-place. You would write
something like
s[:] = (d[si] for si in s)
... unfortunately, this doesn't work.
TypeError: can only assign array (not "generator") to array slice

I thought that numarray arrays might accept generators, but I get an odd
error there too:
n = numarray.zeros((5,))
n[:] = iter(range(5))
Traceback (most recent call last):
File "<stdin>", line 1, in ?
libnumarray.error: Type object lookup returned NULL for type -1

If you can identify a small set of operations that need to be in-place
(like this simple base-to-base mapping), you could code them in C or
Pyrex---as a bonus, they'll be fast, too. The map() variant is not likely
to be very fast (in fact, I'd be surprised if it was faster than the
inplace for-loop version). string.translate() would work in this case,
very quickly, but create an extra copy just like the map() solution would.
s = map(d.get, s) might beat
s = map(lambda c: d[c], s)
but probably not by enough to matter.

Here's a run with my implementation of "itranslate" in Pyrex:array('c', 'TATCGACGA')

And here's the Pyrex source for itranslate:

cdef extern from "Python.h":
int PyObject_AsWriteBuffer(object, void **, int *) except -1
int PyString_AsStringAndSize(object, char **, int *) except -1

def itranslate(object obuf, object ot):
cdef int blen, tlen, i
cdef unsigned char *buf, *t

PyObject_AsWriteBuffer(obuf, <void **>&buf, &blen)
PyString_AsStringAndSize(ot, <char **>&t, &tlen)

if tlen != 256:
raise ValueError, "Translation must be length-256 string"

for i from 0 <= i < blen:
buf = t[buf]

On my 650MHz x86 machine, a translation of a 1 megabyte buffer with
itranslate takes 15ms according to "timeit" (that's 66 megs per second,
which doesn't seem that great, but probably beats the plain Python code
handily)

Jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFA+946Jd01MZaTXX0RAvvyAJ4hgupdX+YLxzMGg2ijRzIF85dzXACdG4YH
vd35MQG+0Vm2dAinEIZwP58=
=2ELj
-----END PGP SIGNATURE-----
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,042
Latest member
icassiem

Latest Threads

Top