Reading bz2 file into numpy array

Johannes Korn

Hi,

is there a convenient way to read bz2 files into a numpy array?

I tried:

from bz2 import *
from numpy import *
fd = BZ2File(filename, 'rb')
read_data = fromfile(fd, float32)

but BZ2File doesn't seem to produce a transparent filehandle.

Kind regards!

Johannes
 
Peter Otten

Johannes said:
I tried:

from bz2 import *
from numpy import *
fd = BZ2File(filename, 'rb')
read_data = fromfile(fd, float32)

but BZ2File doesn't seem to produce a transparent filehandle.
is there a convenient way to read bz2 files into a numpy array?

Try

import numpy
import bz2

filename = ...

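# Decompress the whole file into memory, then reinterpret the bytes as float32.
with bz2.BZ2File(filename) as f:
    data = numpy.frombuffer(f.read(), dtype=numpy.float32)

# frombuffer returns a read-only view; call data.copy() for a writable array.
print(data)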
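
To try this without an existing data file, a small round trip can be sketched as below. The sample array and the file name sample.bz2 are made up for illustration. The sketch uses numpy.frombuffer, which current NumPy recommends over numpy.fromstring for binary data, and assumes the file holds nothing but raw float32 values in native byte order.

```python
import bz2
import os
import tempfile

import numpy

# Hypothetical sample data; any float32 array would do.
original = numpy.arange(10, dtype=numpy.float32)

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "sample.bz2")  # hypothetical file name

    # Write the raw float32 bytes through the bz2 compressor.
    with bz2.BZ2File(filename, "wb") as f:
        f.write(original.tobytes())

    # Read everything back in one go (the whole file is held in memory).
    with bz2.BZ2File(filename, "rb") as f:
        raw = f.read()

# Reinterpret the decompressed bytes as float32 values (read-only view).
restored = numpy.frombuffer(raw, dtype=numpy.float32)
print(numpy.array_equal(original, restored))
```

Because the array is a view onto the bytes object, it is read-only; restored.copy() gives a writable array at the cost of a second copy in memory.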
 
Nobody

f = bz2.BZ2File(filename)
data = numpy.frombuffer(f.read(), dtype=numpy.float32)

That's going to hurt if the file is large.

You might be better off either extracting to a temporary file, or creating
a pipe with numpy.fromfile() reading the pipe and either a thread or
subprocess decompressing the data into the pipe.

E.g.:

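import os
import threading

class Pipe(threading.Thread):
    """Copy a file-like object into an OS-level pipe on a background thread."""

    def __init__(self, f, blocksize=65536):
        super(Pipe, self).__init__()
        self.f = f
        self.blocksize = blocksize
        # rd is the read end handed to the consumer; wr is fed by run().
        self.rd, self.wr = os.pipe()
        self.daemon = True
        self.start()

    def run(self):
        try:
            while True:
                s = self.f.read(self.blocksize)
                if not s:
                    break
                os.write(self.wr, s)
        finally:
            # Closing the write end signals end-of-file to the reader.
            os.close(self.wr)

def make_real(f):
    # Wrap the read end of the pipe in a real (OS-level) file object.
    return os.fdopen(Pipe(f).rd, 'rb')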

Given the number of situations where you need a "real" (OS-level) file
handle or descriptor rather than a Python "file-like object",
something like this should really be part of the standard library.
 
Peter Otten

Nobody said:
That's going to hurt if the file is large.

Yes, but memory usage will peak at about 2*sizeof(data), and most scripts
need more data than just a single numpy.array.
In short: the OP is unlikely to run into the problem.
You might be better off either extracting to a temporary file, or creating
a pipe with numpy.fromfile() reading the pipe and either a thread or
subprocess decompressing the data into the pipe.

I like to keep it simple, so if available RAM turns out to be the limiting
factor I think extracting the data into a temporary file is a good backup
plan.
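
A minimal sketch of that backup plan, assuming the file contains only raw float32 values. The helper name load_bz2_via_tempfile, the chunk size, and the temporary file name are all made up for illustration; none of them come from numpy or bz2.

```python
import bz2
import os
import shutil
import tempfile

import numpy


def load_bz2_via_tempfile(path, dtype=numpy.float32, chunk_size=1024 * 1024):
    """Decompress `path` to a temporary file, then load it with numpy.fromfile.

    Hypothetical helper for illustration; not part of numpy or bz2.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        raw_path = os.path.join(tmpdir, "decompressed.raw")
        # Stream the decompressed bytes to disk in fixed-size chunks so the
        # whole file never has to sit in memory as a Python bytes object.
        with bz2.BZ2File(path, "rb") as src, open(raw_path, "wb") as dst:
            shutil.copyfileobj(src, dst, chunk_size)
        # numpy.fromfile reads from a real on-disk file; the temporary
        # directory is removed once the array has been loaded.
        return numpy.fromfile(raw_path, dtype=dtype)
```

It would be called as data = load_bz2_via_tempfile(filename). The price is extra disk I/O and enough free space for the decompressed data, in exchange for holding only the final array in memory.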

Peter
 
