Reading bz2 file into numpy array

Discussion in 'Python' started by Johannes Korn, Nov 22, 2010.

  1. Hi,

    is there a convenient way to read bz2 files into a numpy array?

    I tried:

    from bz2 import *
    from numpy import *
    fd = BZ2File(filename, 'rb')
    read_data = fromfile(fd, float32)

    but BZ2File doesn't seem to produce a transparent filehandle.

    Kind regards!

    Johannes
    Johannes Korn, Nov 22, 2010

  2. Johannes Korn

    Peter Otten Guest

    Johannes Korn wrote:

    > I tried:
    >
    > from bz2 import *
    > from numpy import *
    > fd = BZ2File(filename, 'rb')
    > read_data = fromfile(fd, float32)
    >
    > but BZ2File doesn't seem to produce a transparent filehandle.


    > is there a convenient way to read bz2 files into a numpy array?


    Try

    import numpy
    import bz2

    filename = ...

    f = bz2.BZ2File(filename)
    data = numpy.fromstring(f.read(), numpy.float32)

    print(data)
    Peter Otten, Nov 22, 2010
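
For readers on Python 3, the same read-everything-then-convert idea can be sketched as below. This is only an illustration, not the poster's code: the helper name `read_bz2_array` is hypothetical, and it uses `numpy.frombuffer` because binary-mode `numpy.fromstring` was deprecated in later NumPy releases. `frombuffer` wraps the decompressed bytes without copying them, so the returned array is read-only; call `.copy()` on it if a writable array is needed.

```python
import bz2

import numpy as np


def read_bz2_array(path, dtype=np.float32):
    """Decompress a .bz2 file fully and view its bytes as an array.

    Hypothetical helper for illustration; assumes the file holds raw
    binary values of `dtype` in native byte order, with no header.
    """
    with bz2.open(path, "rb") as f:
        raw = f.read()  # the whole decompressed payload is held in memory
    # frombuffer shares memory with `raw`, so the result is read-only.
    return np.frombuffer(raw, dtype=dtype)
```

Usage would be along the lines of `data = read_bz2_array(filename)`.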

  3. Johannes Korn

    Nobody Guest

    On Mon, 22 Nov 2010 11:37:22 +0100, Peter Otten wrote:

    >> is there a convenient way to read bz2 files into a numpy array?

    >
    > Try


    > f = bz2.BZ2File(filename)
    > data = numpy.fromstring(f.read(), numpy.float32)


    That's going to hurt if the file is large.

    You might be better off either extracting to a temporary file, or creating
    a pipe with numpy.fromfile() reading the pipe and either a thread or
    subprocess decompressing the data into the pipe.

    E.g.:

    import os
    import threading

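    class Pipe(threading.Thread):
        """Copy a file-like object into an OS pipe from a background thread."""
        def __init__(self, f, blocksize=65536):
            super(Pipe, self).__init__()
            self.f = f
            self.blocksize = blocksize
            rd, wr = os.pipe()
            self.rd = rd
            self.wr = wr
            self.daemon = True
            self.start()

        def run(self):
            # Feed blocks read from f into the write end until EOF.
            while True:
                s = self.f.read(self.blocksize)
                if not s:
                    break
                os.write(self.wr, s)
            # Closing the write end signals EOF to the reader.
            os.close(self.wr)

    def make_real(f):
        # Return a real file object backed by the pipe's read end.
        return os.fdopen(Pipe(f).rd, 'rb')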

    Given the number of situations where you need a "real" (OS-level) file
    handle or descriptor rather than a Python "file-like object",
    something like this should really be part of the standard library.
    Nobody, Nov 23, 2010
  4. Johannes Korn

    Peter Otten Guest

    Nobody wrote:

    > On Mon, 22 Nov 2010 11:37:22 +0100, Peter Otten wrote:
    >
    >>> is there a convenient way to read bz2 files into a numpy array?

    >>
    >> Try

    >
    >> f = bz2.BZ2File(filename)
    >> data = numpy.fromstring(f.read(), numpy.float32)

    >
    > That's going to hurt if the file is large.


    Yes, but memory usage will peak at about 2*sizeof(data), and most scripts
    need more data than just a single numpy.array.
    In short: the OP is unlikely to run into the problem.

    > You might be better off either extracting to a temporary file, or creating
    > a pipe with numpy.fromfile() reading the pipe and either a thread or
    > subprocess decompressing the data into the pipe.


    I like to keep it simple, so if available RAM turns out to be the limiting
    factor I think extracting the data into a temporary file is a good backup
    plan.

    Peter
    Peter Otten, Nov 23, 2010
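
The temporary-file fallback mentioned above could look roughly like this in Python 3. It is a sketch under assumptions, not code from the thread: the function name `read_bz2_via_tempfile` and the `blocksize` default are made up for illustration. The file is decompressed block by block, so only the final array (plus one block) has to fit in memory, and `numpy.fromfile` then reads from an ordinary file on disk.

```python
import bz2
import os
import shutil
import tempfile

import numpy as np


def read_bz2_via_tempfile(path, dtype=np.float32, blocksize=1 << 20):
    """Decompress `path` to a temporary file, then load it with fromfile.

    Hypothetical helper; assumes raw binary values of `dtype` with no header.
    """
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        # Stream the decompressed data to disk in blocksize chunks.
        with tmp, bz2.open(path, "rb") as src:
            shutil.copyfileobj(src, tmp, blocksize)
        # tmp is closed here, so fromfile can open it by name.
        return np.fromfile(tmp.name, dtype=dtype)
    finally:
        os.remove(tmp.name)  # clean up even if decompression failed
```

The trade-off is extra disk I/O in exchange for avoiding a second in-memory copy of the decompressed bytes.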
