Most efficient way of storing 1024*1024 bits

Discussion in 'Python' started by Tor Erik Sønvisen, Nov 2, 2005.

  1. Hi

    I need a time and space efficient way of storing up to 6 million bits. Time
    efficency is more important then space efficency as I'm going to do searches
    through the bit-set.

    regards tores
     
    Tor Erik Sønvisen, Nov 2, 2005
    #1
    1. Advertising

  2. Tor Erik Sønvisen

    Guest

    C ?

    Tor Erik Sønvisen wrote:
    > Hi
    >
    > I need a time and space efficient way of storing up to 6 million bits. Time
    > efficency is more important then space efficency as I'm going to do searches
    > through the bit-set.
    >
    > regards tores
     
    , Nov 2, 2005
    #2
    1. Advertising

  3. Tor Erik Sønvisen

    Paul Rubin Guest

    "Tor Erik Sønvisen" <> writes:
    > I need a time and space efficient way of storing up to 6 million bits. Time
    > efficency is more important then space efficency as I'm going to do searches
    > through the bit-set.


    Umm, what kind of searches do you want to do? For speed you want to
    use built-in functions wherever you can, string.find and that kind of
    thing. So choose your data format accordingly.
     
    Paul Rubin, Nov 2, 2005
    #3
  4. Tor Erik Sønvisen

    Mike Meyer Guest

    "Tor Erik Sønvisen" <> writes:
    > I need a time and space efficient way of storing up to 6 million bits. Time
    > efficency is more important then space efficency as I'm going to do searches
    > through the bit-set.


    Six megabytes is pretty much nothing on a modern computer. I'd store
    the things as a string of "0" and "1", and then use .find (or maybe
    the in keyword) for doing the searches.

    This doesn't work very well if you're going to mutate the string,
    though.

    <mike
    --
    Mike Meyer <> http://www.mired.org/home/mwm/
    Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
     
    Mike Meyer, Nov 2, 2005
    #4
  5. Tor Erik Sønvisen

    Paul Rubin Guest

    Mike Meyer <> writes:
    > Six megabytes is pretty much nothing on a modern computer. I'd store
    > the things as a string of "0" and "1", and then use .find (or maybe
    > the in keyword) for doing the searches.
    >
    > This doesn't work very well if you're going to mutate the string,
    > though.


    You can use re.search on array.array byte vectors. I don't know how
    the speed compares with string.find.
     
    Paul Rubin, Nov 2, 2005
    #5
  6. Paul Rubin <http://> wrote:
    ...
    > You can use re.search on array.array byte vectors. I don't know how
    > the speed compares with string.find.


    Pretty well, though of course one should measure on representative cases
    for one's specific application needs:

    Helen:~ alex$ python -mtimeit -s'import re, array' -s'x=999999*"x"'
    -s'x=x+"a"+x' \
    > 'x.find("a")'

    100 loops, best of 3: 13.3 msec per loop

    Helen:~ alex$ python -mtimeit -s'import re, array' -s'x=999999*"x"'
    -s'x=x+"a"+x' 're.search("a", x)'
    100 loops, best of 3: 8.73 msec per loop

    Helen:~ alex$ python -mtimeit -s'import re, array' -s'x=999999*"x"'
    -s'x=x+"a"+x' -s'x=array.array("c",x)' 're.search("a", x)'
    100 loops, best of 3: 8.72 msec per loop


    Alex
     
    Alex Martelli, Nov 2, 2005
    #6
  7. On Wed, 2 Nov 2005 13:55:10 +0100, "Tor Erik Sønvisen" <> wrote:

    >Hi
    >
    >I need a time and space efficient way of storing up to 6 million bits. Time
    >efficency is more important then space efficency as I'm going to do searches
    >through the bit-set.
    >

    Very dependent on what kind of "searches" -- e.g., 1024*1024 suggests the
    possibility of two dimensions. Quad-trees? How sparse is the data? Etc.
    What kinds of patterns are you going to search for?

    Regards,
    Bengt Richter
     
    Bengt Richter, Nov 2, 2005
    #7
  8. On Wed, 02 Nov 2005 13:55:10 +0100, Tor Erik Sønvisen wrote:

    > Hi
    >
    > I need a time and space efficient way of storing up to 6 million bits.


    [inserts pinky in corner of mouth]

    Six MILLION bits!!!

    That's almost 750K. Are you sure your computer will handle that much data?


    > Time
    > efficency is more important then space efficency as I'm going to do searches
    > through the bit-set.


    Could you be more vague if you tried? Searching for what? Does your data
    have structure you can exploit? Can you put it in a tree structure? Are
    you going to be inserting and deleting data?

    If you can handle your six million bits in lots of eight, store them as a
    character string.

    If you can keep your data sorted, then you can do binary searches instead
    of linear searches. If your data is hashable, you can store it in a
    dictionary and potentially get up to constant-time search speeds.

    Or forget about manipulating bits, and just store your data as bools in
    a list.

    Explain your problem a little better, and you may get some better advice.


    --
    Steven.
     
    Steven D'Aprano, Nov 2, 2005
    #8
  9. Tor Erik Sønvisen

    Dan Bishop Guest

    Tor Erik Sønvisen wrote:
    > Hi
    >
    > I need a time and space efficient way of storing up to 6 million bits.


    The most space-efficient way of storing bits is to use the bitwise
    operators on an array of bytes:

    import array

    class BitList(object):
    def __init__(self, data=None):
    self._data = array.array('B')
    self._length = 0
    if data is not None:
    self.extend(data)
    def __len__(self):
    return self._length
    def __getitem__(self, index):
    if index < 0:
    index += self._length
    if index < 0 or index >= self._length:
    raise IndexError('BitList index out of range')
    byte_index, bit_index = divmod(index, 8)
    return int(bool(self._data[byte_index] & (1 << bit_index)))
    def __setitem__(self, index, value):
    if index < 0:
    index += self._length
    if index < 0 or index >= self._length:
    raise IndexError('BitList index out of range')
    byte_index, bit_index = divmod(index, 8)
    byte = self._data[byte_index]
    bitmask = 1 << bit_index
    byte &= ~bitmask & 0xFF
    if value:
    byte |= bitmask
    self._data[byte_index] = byte
    def __repr__(self):
    return 'BitList([%s])' % ', '.join(str(bit) for bit in self)
    def append(self, value):
    if not self._length % 8:
    self._data.append(0)
    self._length += 1
    self[-1] = value
    def extend(self, values):
    for v in values:
    self.append(v)

    > Time efficency is more important then space efficency


    In that case, you're better off simply using a list of bools.

    > as I'm going to do searches through the bit-set.


    What kind of searches?
     
    Dan Bishop, Nov 2, 2005
    #9
  10. Tor Erik Sønvisen

    Brandon K Guest

    BTW, it'd be 6 megabits or 750kb ;)
    > Six megabytes is pretty much nothing on a modern computer.




    ----== Posted via Newsgroups.com - Usenet Access to over 100,000 Newsgroups ==----
    Get Anonymous, Uncensored, Access to West and East Coast Server Farms!
    ----== Highest Retention and Completion Rates! HTTP://WWW.NEWSGROUPS.COM ==----
     
    Brandon K, Nov 3, 2005
    #10
  11. Brandon K <> wrote [inverting his topposting!]:

    > > Six megabytes is pretty much nothing on a modern computer.


    > BTW, it'd be 6 megabits or 750kb ;)


    ....but Mike was proposing using one digit per bit, hence, 6 megabytes.
    That makes it easy to search for bitpatterns with re or string.find; if
    the bits were packed 8 to a byte, such searches would be very hard.


    Alex
     
    Alex Martelli, Nov 3, 2005
    #11
  12. On 3 Nov 2005, at 05:03, Alex Martelli wrote:

    > Brandon K <> wrote [inverting his topposting!]:
    >
    >
    >>> Six megabytes is pretty much nothing on a modern computer.
    >>>

    >
    >
    >> BTW, it'd be 6 megabits or 750kb ;)
    >>

    >
    > ...but Mike was proposing using one digit per bit, hence, 6 megabytes.
    > That makes it easy to search for bitpatterns with re or
    > string.find; if
    > the bits were packed 8 to a byte, such searches would be very hard.
    >


    They would just require some out-of-the-box thinking using character
    arrays and stuff I think. It's definately still doable with regex's
    if you really want to.
     
    Alex Stapleton, Nov 3, 2005
    #12
  13. Alex Stapleton <> wrote:
    ...
    > >>> Six megabytes is pretty much nothing on a modern computer.

    > >
    > >> BTW, it'd be 6 megabits or 750kb ;)

    > >
    > > ...but Mike was proposing using one digit per bit, hence, 6 megabytes.
    > > That makes it easy to search for bitpatterns with re or
    > > string.find; if
    > > the bits were packed 8 to a byte, such searches would be very hard.

    >
    > They would just require some out-of-the-box thinking using character
    > arrays and stuff I think. It's definately still doable with regex's
    > if you really want to.


    "Doable", yes, of course -- that's pretty obvious, and I'm not sure
    what's "out of the box" about it -- but orders of magnitude harder. For
    example, to look for a fixed sequence of X bits, you'd have to search
    for any of 8 sequences of slightly more than X/8 characters (where the
    first and last are normally charsets) built by the possible shifts of
    the bitsequence by 0, 1, ... 7 bits. I also suspect that performance
    would badly suffer (dealing with 750 KB of data, plus all the auxiliary
    stuff for Python and re, isn't going to fit in cache anyway). All in
    all, doesn't feel worth pursuing, particularly when the OP mentioned
    time mattering more than space right from the word go.


    Alex
     
    Alex Martelli, Nov 3, 2005
    #13
  14. Tor Erik Sønvisen

    Tom Anderson Guest

    On Wed, 2 Nov 2005, Dan Bishop wrote:

    > Tor Erik Sønvisen wrote:
    >
    >> I need a time and space efficient way of storing up to 6 million bits.

    >
    > The most space-efficient way of storing bits is to use the bitwise
    > operators on an array of bytes:


    Actually, no, it's to xor all the bits together and store them in a single
    boolean.

    Getting them back out is kinda tricky though.

    >> Time efficency is more important then space efficency

    >
    > In that case, you're better off simply using a list of bools.


    Unlikely - the indexing is a bit simpler, but the cache hit rate is going
    to go through the floor.

    tom

    --
    power to the people and the beats
     
    Tom Anderson, Nov 3, 2005
    #14
  15. Tor Erik Sønvisen

    Ben Sizer Guest

    Tom Anderson wrote:
    > On Wed, 2 Nov 2005, Dan Bishop wrote:
    >
    > > Tor Erik Sønvisen wrote:
    > >
    > >> I need a time and space efficient way of storing up to 6 million bits.

    > >
    > > The most space-efficient way of storing bits is to use the bitwise
    > > operators on an array of bytes:

    >
    > Actually, no, it's to xor all the bits together and store them in a single
    > boolean.


    I'd use 'or' rather than 'xor'. The or operator is more likely to yield
    a '1' at the end of it, and a 1 is narrower than a 0, obviously making
    it more efficient to store.

    --
    Ben Sizer
     
    Ben Sizer, Nov 4, 2005
    #15
  16. On 4 Nov 2005, at 10:26, Ben Sizer wrote:

    > Tom Anderson wrote:
    >
    >> On Wed, 2 Nov 2005, Dan Bishop wrote:
    >>
    >>
    >>> Tor Erik Sønvisen wrote:
    >>>
    >>>
    >>>> I need a time and space efficient way of storing up to 6 million
    >>>> bits.
    >>>>
    >>>
    >>> The most space-efficient way of storing bits is to use the bitwise
    >>> operators on an array of bytes:
    >>>

    >>
    >> Actually, no, it's to xor all the bits together and store them in
    >> a single
    >> boolean.
    >>

    >
    > I'd use 'or' rather than 'xor'. The or operator is more likely to
    > yield
    > a '1' at the end of it, and a 1 is narrower than a 0, obviously making
    > it more efficient to store.


    <xahlee>
    Typical gas guzzling, SUV driving american logic. A would obviously
    use more POWER and hence INCREASE GLOBAL WARMING leading to the
    ultimate DEATH of EVERYBODY you know and LOVE!
    </xahlee>
     
    Alex Stapleton, Nov 4, 2005
    #16
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Brent Minder
    Replies:
    3
    Views:
    414
    Brent
    Dec 28, 2003
  2. Peter
    Replies:
    1
    Views:
    379
    Steve C. Orr [MVP, MCSD]
    Nov 9, 2004
  3. Linus Nikander
    Replies:
    5
    Views:
    550
  4. Razvan
    Replies:
    11
    Views:
    546
    Dale King
    Oct 12, 2004
  5. davout
    Replies:
    0
    Views:
    350
    davout
    Oct 28, 2004
Loading...

Share This Page