Line segments, overlap, and bits

Sean Davis

I am working with genomic data. Basically, it consists of many tuples
of (start,end) on a line. I would like to convert these tuples of
(start,end) to a string of bits where a bit is 1 if it is covered by
any of the regions described by the (start,end) tuples and 0 if it is
not. I then want to do set operations on multiple bit strings (AND,
OR, NOT, etc.). Any suggestions on how to (1) set up the bit string
and (2) operate on 1 or more of them? Java has a BitSet class that
keeps this kind of thing pretty clean and high-level, but I haven't
seen anything like it for Python.

Thanks,
Sean
 
bearophileHUGS

Sean Davis>Java has a BitSet class that keeps this kind of thing
pretty clean and high-level, but I haven't seen anything like it for
python.<

If you look around you can usually find Python code able to do most of
the things you want, like (you can modify this code to add the boolean
operations):
http://svn.zope.org/*checkout*/Zope.../ricecode.py?content-type=text/plain&rev=8532
(I later improved that code for personal use.)

But operations on such bit arrays are very common, and you may need
them to be very fast, so you may use Cython (Pyrex) to write a small
class able to do those things in a much faster way.
Or you can use Pyd plus an easy 30-line D program to write a class
that does those operations very quickly (using a dynamic array of
uint, with the help of
www.digitalmars.com/d/1.0/phobos/std_intrinsic.html).
Maybe you can use GMPY numbers as bit arrays too. (I don't know if you
can use NumPy for this using a compact representation of the bits).
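
As a rough illustration of the "big number as bit array" idea, here is a minimal sketch that uses Python's built-in arbitrary-precision int in place of GMPY. The helper names (regions_to_bits, bits_to_regions) and the half-open (start, end) convention are assumptions made for this example; they do not come from any of the libraries mentioned above.

```python
def regions_to_bits(regions):
    """Return an int whose bit i is 1 if position i is covered by any region.

    Assumption: each region is a half-open (start, end) pair, 0 <= start <= end.
    """
    bits = 0
    for start, end in regions:
        # (1 << n) - 1 is a run of n ones; shifting left by `start`
        # places that run over positions start .. end - 1.
        bits |= ((1 << (end - start)) - 1) << start
    return bits


def bits_to_regions(bits):
    """Convert a non-negative int bitmask back to merged (start, end) pairs."""
    regions = []
    pos = 0
    while bits:
        # Skip the run of zeros below the lowest set bit.
        zeros = (bits & -bits).bit_length() - 1
        bits >>= zeros
        pos += zeros
        # Measure the run of ones that now starts at bit 0.
        ones = ((bits + 1) & ~bits).bit_length() - 1
        regions.append((pos, pos + ones))
        bits >>= ones
        pos += ones
    return regions


a = regions_to_bits([(2, 5), (8, 10)])
b = regions_to_bits([(4, 9)])
length = 12                      # hypothetical length of the line
universe = (1 << length) - 1     # NOT needs a fixed width to mask against

both = bits_to_regions(a & b)            # AND
either = bits_to_regions(a | b)          # OR
only_one = bits_to_regions(a ^ b)        # XOR
not_a = bits_to_regions(~a & universe)   # NOT, limited to [0, length)
```

With these inputs, `both` should come out as [(4, 5), (8, 9)] and `either` as [(2, 10)]. Each int costs roughly one bit per position, so whether this is practical depends on how long the line is.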

Bye,
bearophile
 
castironpi

bearophileHUGS said:
Sean Davis>Java has a BitSet class that keeps this kind of thing
pretty clean and high-level, but I haven't seen anything like it for
python.<

If you look around you can usually find Python code able to do most of
the things you want, like (you can modify this code to add the boolean
operations):
http://svn.zope.org/*checkout*/Zope3/trunk/src/zope/textindex/ricecod...
(Later I have improved that code for personal use).

But operations on such bit arrays are very common, and you may need
them to be very fast, so you may use Cython (Pyrex) to write a small
class able to do those things in a much faster way.
Or you can also use Pyd plus an easy 30-lines long D program to write
a class that does those operations very quickly (using a dynamic
array of uint, with the help of
www.digitalmars.com/d/1.0/phobos/std_intrinsic.html).
Maybe you can use GMPY numbers as bit arrays too. (I don't know if you
can use NumPy for this using a compact representation of the bits).

I believe you're talking about collision detection, and my bartender
is entertaining a proof that serial memory can't do it in time. So
far the idea I see here is primitive memory couplings that change
value based on primitive conditions of other memory. Peripherals
including a chipset might show promise, esp. a flash chip, s.l.
'autopersistent' memory. Even if you still poll results, it's still a
factor off O( n ).
 
George Sakkis

Sean Davis said:
I am working with genomic data. Basically, it consists of many tuples
of (start,end) on a line. I would like to convert these tuples of
(start,end) to a string of bits where a bit is 1 if it is covered by
any of the regions described by the (start,end) tuples and 0 if it is
not. I then want to do set operations on multiple bit strings (AND,
OR, NOT, etc.). Any suggestions on how to (1) set up the bit string
and (2) operate on 1 or more of them? Java has a BitSet class that
keeps this kind of thing pretty clean and high-level, but I haven't
seen anything like it for python.

Thanks,
Sean

Have you looked at PyPI [1]? From a quick search, there are at least
four potentially relevant packages: BitPacket, BitVector, BitBuffer,
BitBucket.

George


[1] http://pypi.python.org/pypi?:action=search&term=bit&submit=search
 
Paul Rubin

Sean Davis said:
OR, NOT, etc.). Any suggestions on how to (1) set up the bit string
and (2) operate on 1 or more of them? Java has a BitSet class that
keeps this kind of thing pretty clean and high-level, but I haven't
seen anything like it for python.

You could wrap something around the array module, or the bit vector
objects from numarray, or use something like PyJudy (google for it).
Or consider using something like a tree data structure that supports
set operations, rather than a bitmap, if that's closer to what you're
really trying to do.
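
To make the non-bitmap alternative concrete, here is a minimal sketch that does the set operations directly on regions. A sorted list of merged intervals stands in for the tree here, which is a simplification chosen for brevity. The function names and the half-open (start, end) convention are assumptions for this example, not part of the array module, numarray, or PyJudy.

```python
def normalize(regions):
    """Sort half-open (start, end) pairs and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(regions):
        if start >= end:
            continue  # ignore empty regions
        if merged and start <= merged[-1][1]:
            # Extend the previous region instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a, b):
    """OR: pool both region lists and re-merge."""
    return normalize(list(a) + list(b))


def intersection(a, b):
    """AND: two-pointer sweep over two normalized region lists."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever region ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def complement(a, length):
    """NOT: gaps of a normalized list, assuming all regions lie in [0, length)."""
    result = []
    pos = 0
    for start, end in a:
        if start > pos:
            result.append((pos, start))
        pos = end
    if pos < length:
        result.append((pos, length))
    return result


a = normalize([(8, 10), (2, 5), (4, 6)])   # merges to [(2, 6), (8, 10)]
b = normalize([(4, 9)])
a_and_b = intersection(a, b)
a_or_b = union(a, b)
not_a = complement(a, 12)                  # 12 is a hypothetical line length
```

The cost of this representation grows with the number of regions rather than with the length of the line, which may matter when the regions are sparse.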
 
castironpi

Paul Rubin said:
You could wrap something around the array module, or the bit vector
objects from numarray, or use something like PyJudy (google for it).
Or consider using something like a tree data structure that supports
set operations, rather than a bitmap, if that's closer to what you're
really trying to do.

Keep it continuous. You're looking at holographic or specific on tree
leaves. I can store a GPS a -lot- easier than words. Force impact
from a direction? (Strong A, two Bs.) Record wind. Repose. On
idle, broadcast at unique intervals, wait for something to do.
(developmental reposition)

I suppose you want perception in the medium, but I think a special
mechanism is always more efficient. Can you have minispiders hand off
instructions by rapid tactile contact? Dock incl. clock sync, dual
transmit, undock, work. Sounds like modem (modulator/demodulator) and
ATM.

The DNA of an organism doesn't change over time, but it manages to
perform multi-cellular operations. Each cell has its very own copy,
read-only, of the whole's DNA. Can lightning bugs turn on and off
instruction sets (exec, donotexec)?

Mem. block:
0000 Yes 0 1 add
0001 Yes 0 2 add
0010 No 0 3 add
0011 Yes 0 4 add

runs: ( add 1, add 2, add 4 ).

Local can transmit modifications to neighbor sames. Local gradient is
stronger, which takes us to chemical. There is no living model for
bees that sync post-mortem live -- they get orders for the next day
"at dance". By the way, Queen bees, Workers and Drones have a
developmental period of 16, 21, 24 days respectively. Conclude
Workers are the special.

Thing is, two adjacent cells in the animal body can vary pretty widely
in trait. Stem cells can "differentiate into a diverse range of
specialized cell types". Is time a factor in response to mechanistic
signal propagation? Make a uniform response to every signal, and
you're a functional programmer. input f o output.
 
Istvan Albert

Sean Davis said:
I am working with genomic data. Basically, it consists of many tuples
of (start,end) on a line. I would like to convert these tuples of
(start,end) to a string of bits where a bit is 1 if it is covered by
any of the regions described by the (start,end) tuples and 0 if it is
not. I then want to do set operations on multiple bit strings (AND,
OR, NOT, etc.). Any suggestions on how to (1) set up the bit string
and (2) operate on 1 or more of them? Java has a BitSet class that
keeps this kind of thing pretty clean and high-level, but I haven't
seen anything like it for python.

The solution depends on what size of genomes you want to work with.

There is a BitVector class that could probably do what you want,
though it has some scaling issues because it is pure Python.

http://cobweb.ecn.purdue.edu/~kak/dist/BitVector-1.2.html

If you want high-speed code (implemented in C and Pyrex) that works
for large-scale genomic data analysis, the bx-python package might do
what you need (and even things that you don't yet know that you really
want to do):

http://bx-python.trac.bx.psu.edu/

but of course this one is a lot more complicated

i.
 
