Progress when parsing a large file with SAX

marc.omorain

Hi there,

I have a 28 MB XML file which I parse with SAX. I have some processing
to do in the startElement / endElement callbacks, which slows the
parsing down to about 60 seconds on my machine.

My application is unresponsive for this time, so I would like to show
a progress bar. I could show a spinner to show that the application is
responsive, but I would prefer to show a percentage. Is there any way
to query the parser to see how many bytes of the input file have been
processed so far?

Thanks,

Marc
 
Diez B. Roggisch

I'd create a file-like object that does this for you. It should wrap the
original file, and count the number of bytes delivered. Something along
these lines (untested!!!):

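import os

class PercentageFile(object):
    """File-like wrapper that counts the bytes handed to the reader."""

    def __init__(self, filename):
        self.size = os.path.getsize(filename)
        self.delivered = 0
        # open() in binary mode; the file() builtin exists only in Python 2
        self.f = open(filename, "rb")

    def read(self, size=None):
        if size is None or size < 0:
            # Unbounded read: everything that is left gets delivered at once
            self.delivered = self.size
            return self.f.read()
        data = self.f.read(size)
        self.delivered += len(data)
        return data

    def close(self):
        # Newer xml.sax versions close the source when parsing ends
        self.f.close()

    @property
    def percentage(self):
        return float(self.delivered) / self.size * 100.0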

Diez
 
Anastasios Hatzis

Diez B. Roggisch wrote:

....

I have the same problem with large XML files as Marc.

So you also deserve my thanks for the example. :)
I guess some client implementation needs to call read() on the wrapped XML
file until all portions of the file are read.
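
For illustration, here is a minimal sketch of what such a read loop could look like if written by hand, instead of leaving it to the parser. It assumes Python 3 and the standard expat-based reader returned by xml.sax.make_parser(), which supports incremental feed() and close() calls. The names parse_in_chunks, on_progress and chunk_size are made up for this example and are not part of xml.sax.

```python
import io
import xml.sax


def parse_in_chunks(stream, total_size, handler, on_progress, chunk_size=64 * 1024):
    """Feed a binary stream to a SAX parser chunk by chunk (hypothetical helper)."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    done = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break                      # end of input reached
        parser.feed(chunk)             # callbacks fire for complete tokens
        done += len(chunk)
        on_progress(100.0 * done / total_size)
    parser.close()                     # flush the parser and fire endDocument


# Usage with an in-memory document standing in for the large file
data = b"<root>" + b"<item/>" * 1000 + b"</root>"
progress = []
parse_in_chunks(io.BytesIO(data), len(data), xml.sax.ContentHandler(),
                progress.append, chunk_size=1024)
```

With this variant no wrapper class is needed, at the cost of driving the parser manually.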
@property?

What is that supposed to do?

Anastasios
 
Diez B. Roggisch

You should feed the PercentageFile object to the XML parser, which then
calls read() on it repeatedly by itself, like this:

import xml.sax

parser = xml.sax.make_parser()
pf = PercentageFile(filename)
parser.parse(pf)
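
To show how the percentage could actually be consumed while parsing runs, here is a rough end-to-end sketch for Python 3. It repeats a compact variant of the wrapper class so that it runs on its own. ProgressHandler and parse_with_progress are hypothetical names invented for this example; a real application would update its progress bar where the sketch appends to a list.

```python
import os
import xml.sax


class PercentageFile(object):
    """Counts the bytes handed to the parser (compact variant of the wrapper)."""

    def __init__(self, filename):
        self.size = os.path.getsize(filename)
        self.delivered = 0
        self.f = open(filename, "rb")

    def read(self, size=-1):
        data = self.f.read(size)
        self.delivered += len(data)
        return data

    def close(self):
        self.f.close()

    @property
    def percentage(self):
        # Treat an empty file as fully read to avoid dividing by zero
        return 100.0 * self.delivered / self.size if self.size else 100.0


class ProgressHandler(xml.sax.ContentHandler):
    """Hypothetical handler: records the percentage whenever it changes."""

    def __init__(self, source):
        xml.sax.ContentHandler.__init__(self)
        self.source = source
        self.reported = []

    def endElement(self, name):
        pct = int(self.source.percentage)
        if not self.reported or pct != self.reported[-1]:
            self.reported.append(pct)  # a GUI would update its progress bar here


def parse_with_progress(filename):
    pf = PercentageFile(filename)
    handler = ProgressHandler(pf)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(pf)
    return handler.reported
```

Note that the figure reflects bytes read by the parser, not bytes fully processed. The parser reads the file in buffer-sized chunks, so the value advances in steps and runs slightly ahead of the callbacks.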

As for @property: it makes percentage a property, so that you can access it like this:

pf.percentage

instead of

pf.percentage()

Google "python property" for details, or run pydoc property.
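
A small self-contained sketch of the difference, using a made-up Counter class (not part of the code in this thread) with the same value exposed once as a property and once as a plain method:

```python
class Counter(object):
    """Hypothetical example class contrasting a property with a method."""

    def __init__(self, done, total):
        self.done = done
        self.total = total

    @property
    def percentage(self):
        # Recomputed on every attribute access; no parentheses at the call site
        return 100.0 * self.done / self.total

    def percentage_method(self):
        # Same computation as an ordinary method; needs parentheses
        return 100.0 * self.done / self.total


c = Counter(1, 4)
print(c.percentage)           # 25.0
print(c.percentage_method())  # 25.0
```

Because no setter is defined, the property is read-only: assigning to c.percentage raises AttributeError.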

Diez
 
