Most efficient way of storing 1024*1024 bits

  • Thread starter Tor Erik Sønvisen
  • Start date
T

Tor Erik Sønvisen

Hi

I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.

regards tores
 
P

Paul Rubin

Tor Erik Sønvisen said:
I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.

Umm, what kind of searches do you want to do? For speed you want to
use built-in functions wherever you can, string.find and that kind of
thing. So choose your data format accordingly.
 
M

Mike Meyer

Tor Erik Sønvisen said:
I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.

Six megabytes is pretty much nothing on a modern computer. I'd store
the things as a string of "0" and "1", and then use .find (or maybe
the in keyword) for doing the searches.

This doesn't work very well if you're going to mutate the string,
though.

<mike
 
P

Paul Rubin

Mike Meyer said:
Six megabytes is pretty much nothing on a modern computer. I'd store
the things as a string of "0" and "1", and then use .find (or maybe
the in keyword) for doing the searches.

This doesn't work very well if you're going to mutate the string,
though.

You can use re.search on array.array byte vectors. I don't know how
the speed compares with string.find.
 
A

Alex Martelli

Paul Rubin said:
You can use re.search on array.array byte vectors. I don't know how
the speed compares with string.find.

Pretty well, though of course one should measure on representative cases
for one's specific application needs:

Helen:~ alex$ python -mtimeit -s'import re, array' -s'x=999999*"x"'
-s'x=x+"a"+x' \
'x.find("a")'
100 loops, best of 3: 13.3 msec per loop

Helen:~ alex$ python -mtimeit -s'import re, array' -s'x=999999*"x"'
-s'x=x+"a"+x' 're.search("a", x)'
100 loops, best of 3: 8.73 msec per loop

Helen:~ alex$ python -mtimeit -s'import re, array' -s'x=999999*"x"'
-s'x=x+"a"+x' -s'x=array.array("c",x)' 're.search("a", x)'
100 loops, best of 3: 8.72 msec per loop


Alex
 
B

Bengt Richter

Hi

I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.
Very dependent on what kind of "searches" -- e.g., 1024*1024 suggests the
possibility of two dimensions. Quad-trees? How sparse is the data? Etc.
What kinds of patterns are you going to search for?

Regards,
Bengt Richter
 
S

Steven D'Aprano

Hi

I need a time and space efficient way of storing up to 6 million bits.

[inserts pinky in corner of mouth]

Six MILLION bits!!!

That's almost 750K. Are you sure your computer will handle that much data?

Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.

Could you be more vague if you tried? Searching for what? Does your data
have structure you can exploit? Can you put it in a tree structure? Are
you going to be inserting and deleting data?

If you can handle your six million bits in lots of eight, store them as a
character string.

If you can keep your data sorted, then you can do binary searches instead
of linear searches. If your data is hashable, you can store it in a
dictionary and potentially get up to constant-time search speeds.

Or forget about manipulating bits, and just store your data as bools in
a list.

Explain your problem a little better, and you may get some better advice.
 
D

Dan Bishop

Tor said:
Hi

I need a time and space efficient way of storing up to 6 million bits.

The most space-efficient way of storing bits is to use the bitwise
operators on an array of bytes:

import array

class BitList(object):
def __init__(self, data=None):
self._data = array.array('B')
self._length = 0
if data is not None:
self.extend(data)
def __len__(self):
return self._length
def __getitem__(self, index):
if index < 0:
index += self._length
if index < 0 or index >= self._length:
raise IndexError('BitList index out of range')
byte_index, bit_index = divmod(index, 8)
return int(bool(self._data[byte_index] & (1 << bit_index)))
def __setitem__(self, index, value):
if index < 0:
index += self._length
if index < 0 or index >= self._length:
raise IndexError('BitList index out of range')
byte_index, bit_index = divmod(index, 8)
byte = self._data[byte_index]
bitmask = 1 << bit_index
byte &= ~bitmask & 0xFF
if value:
byte |= bitmask
self._data[byte_index] = byte
def __repr__(self):
return 'BitList([%s])' % ', '.join(str(bit) for bit in self)
def append(self, value):
if not self._length % 8:
self._data.append(0)
self._length += 1
self[-1] = value
def extend(self, values):
for v in values:
self.append(v)
Time efficency is more important then space efficency

In that case, you're better off simply using a list of bools.
as I'm going to do searches through the bit-set.

What kind of searches?
 
B

Brandon K

BTW, it'd be 6 megabits or 750kb ;)
Six megabytes is pretty much nothing on a modern computer.



----== Posted via Newsgroups.com - Usenet Access to over 100,000 Newsgroups ==----
Get Anonymous, Uncensored, Access to West and East Coast Server Farms!
----== Highest Retention and Completion Rates! HTTP://WWW.NEWSGROUPS.COM ==----
 
A

Alex Martelli

BTW, it'd be 6 megabits or 750kb ;)

....but Mike was proposing using one digit per bit, hence, 6 megabytes.
That makes it easy to search for bitpatterns with re or string.find; if
the bits were packed 8 to a byte, such searches would be very hard.


Alex
 
A

Alex Stapleton

...but Mike was proposing using one digit per bit, hence, 6 megabytes.
That makes it easy to search for bitpatterns with re or
string.find; if
the bits were packed 8 to a byte, such searches would be very hard.

They would just require some out-of-the-box thinking using character
arrays and stuff I think. It's definately still doable with regex's
if you really want to.
 
A

Alex Martelli

Alex Stapleton said:
They would just require some out-of-the-box thinking using character
arrays and stuff I think. It's definately still doable with regex's
if you really want to.

"Doable", yes, of course -- that's pretty obvious, and I'm not sure
what's "out of the box" about it -- but orders of magnitude harder. For
example, to look for a fixed sequence of X bits, you'd have to search
for any of 8 sequences of slightly more than X/8 characters (where the
first and last are normally charsets) built by the possible shifts of
the bitsequence by 0, 1, ... 7 bits. I also suspect that performance
would badly suffer (dealing with 750 KB of data, plus all the auxiliary
stuff for Python and re, isn't going to fit in cache anyway). All in
all, doesn't feel worth pursuing, particularly when the OP mentioned
time mattering more than space right from the word go.


Alex
 
T

Tom Anderson

The most space-efficient way of storing bits is to use the bitwise
operators on an array of bytes:

Actually, no, it's to xor all the bits together and store them in a single
boolean.

Getting them back out is kinda tricky though.
In that case, you're better off simply using a list of bools.

Unlikely - the indexing is a bit simpler, but the cache hit rate is going
to go through the floor.

tom
 
B

Ben Sizer

Tom said:
Actually, no, it's to xor all the bits together and store them in a single
boolean.

I'd use 'or' rather than 'xor'. The or operator is more likely to yield
a '1' at the end of it, and a 1 is narrower than a 0, obviously making
it more efficient to store.
 
A

Alex Stapleton

I'd use 'or' rather than 'xor'. The or operator is more likely to
yield
a '1' at the end of it, and a 1 is narrower than a 0, obviously making
it more efficient to store.

<xahlee>
Typical gas guzzling, SUV driving american logic. A would obviously
use more POWER and hence INCREASE GLOBAL WARMING leading to the
ultimate DEATH of EVERYBODY you know and LOVE!
</xahlee>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,578
Members
45,052
Latest member
LucyCarper

Latest Threads

Top