bz2 & cpu usage

Brad Tilley

I'd like to keep at least 50% of the cpu free while doing bz2 file
compression. Currently, bz2 compression takes between 80 & 100 percent
of the cpu and the Windows GUI becomes almost useless. How can I lower
the strain on the cpu and still do compression? I'm willing for the
compression process to take longer.

Thanks,

Brad

import bz2, os, time

def compress_file(filename):
    path = r"C:\repository_backup"
    print path
    for root, dirs, files in os.walk(path):
        for f in files:
            if f == filename:
                print "Compressing", f
                x = file(os.path.join(root, f), 'rb')
                os.chdir(path)
                y = bz2.BZ2File(f + ".bz2", 'w')
                # Copy in roughly 1 MB chunks, sleeping between steps
                # in an attempt to leave some CPU for everything else.
                while True:
                    data = x.read(1024000)
                    time.sleep(0.1)
                    if not data:
                        break
                    y.write(data)
                    time.sleep(0.1)
                y.close()
                x.close()
            else:
                return
 
Stefan Behnel

Brad said:
I'd like to keep at least 50% of the cpu free while doing bz2 file
compression. Currently, bz2 compression takes between 80 & 100 percent
of the cpu and the Windows GUI becomes almost useless. How can I lower
the strain on the cpu and still do compression? I'm willing for the
compression process to take longer.

Three approaches:

1) Use a thread for the I/O part (i.e. the de/compression) and another one
for the GUI.

2) Collect the files first, then handle one after the other and return to
the GUI in between.

3) Use a non-blocking scheme for the whole thing (may be the trickiest).

Stefan
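
A minimal sketch of approach 1, assuming Python 3 (the original script is Python 2). The helper name `compress_in_background` and its `on_done` callback are hypothetical, chosen only for illustration. Note that a worker thread keeps the calling thread responsive; it does not by itself reduce total CPU use.

```python
import bz2
import threading

def compress_in_background(data, on_done):
    """Compress `data` on a worker thread and pass the result to `on_done`.

    Returns the started thread so the caller can join() it later.
    """
    def worker():
        on_done(bz2.compress(data))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The caller can carry on with other work (for example, servicing a GUI event loop) and call `join()` on the returned thread when it needs the result.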
 
Brad Tilley

Stefan said:
Three approaches:

1) Use a thread for the I/O part (i.e. the de/compression) and another
one for the GUI.

2) Collect the files first, then handle one after the other and return
to the GUI in between.

3) Use a non-blocking scheme for the whole thing (may be the trickiest).

Stefan

The part I wrote about the "Windows GUI" is not clear. I meant the
Windows XP Desktop. My script has no GUI. Sorry for the mix-up.

Brad
 
Benjamin Niemann

Under Linux/Unix you would set the process's 'nice' value to tell the system to
give your process a lower priority. Under Windows you can do something similar
using the Task Manager -> Set Priority... There should be some Windows API call
to do this from your program, probably involving the win32 extension.
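
For the Unix half of this suggestion, a minimal sketch assuming Python 3 on a POSIX system. `lower_priority` is a hypothetical helper name; `os.nice` is not available on Windows, in which case the function simply returns None and changes nothing.

```python
import os

def lower_priority(increment=10):
    """Add `increment` to this process's niceness (lowering its priority).

    Returns the new niceness, or None where os.nice is unavailable
    (for example on Windows).
    """
    if not hasattr(os, "nice"):
        return None
    return os.nice(increment)

# Typical use: call lower_priority() once, before starting the compression.
```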
 
Jeremy Bowers

Brad said:
The part I wrote about the "Windows GUI" is not clear. I meant the Windows
XP Desktop. My script has no GUI. Sorry for the mix-up.

Before going too deeply into programmatic solutions (except perhaps setting
the priority, which is a good idea anyhow), I'd try your program on a
completely different machine (different manufacturer or something, so the
hardware is different).

While this was par for the course for Win9x, XP really shouldn't be
behaving like that in my experience. A little jumpy, maybe, but not
"unusable". Either:

* You're running it on a machine well below the stated XP requirements
(I've jammed XP onto a Pentium 233 w/ 96MB ram, but I don't recommend that
for most people; it was basically a server for me doing one specific
task.), or
* You're taking *way* too much data at a time, filling up your RAM, and
causing excessive swapping (though you ought to notice this), or
* You've got crappy hardware and your hardware can't handle the interrupt
load from the IO and the graphics at the same time; this is typically
caused by cheap motherboards, often from Via.

If your problem is hardware, you've pretty much already lost and your only
hope is different hardware.

Otherwise, as a hack, you can compress as a stream and use the sleep
commands to voluntarily sleep for some period of time after some amount of
time or data has gone by, but this will still cause some jumpiness.
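
As a rough illustration of the sleep-based hack described above, here is a minimal sketch in Python 3 (the original script is Python 2). The function name `throttled_compress` and the `duty_cycle` parameter are made up for this example. It sleeps after each chunk in proportion to the time just spent working, so it limits average CPU use rather than guaranteeing a ceiling.

```python
import bz2
import time

def throttled_compress(src, dst, chunk_size=64 * 1024, duty_cycle=0.5):
    """Stream-compress file object `src` into file object `dst`.

    After each chunk, sleep long enough that roughly `duty_cycle` of the
    wall-clock time is spent working (0.5 means work and sleep equally).
    """
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    compressor = bz2.BZ2Compressor()
    while True:
        started = time.perf_counter()
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(compressor.compress(chunk))
        busy = time.perf_counter() - started
        # Sleep in proportion to the time just spent reading and compressing.
        time.sleep(busy * (1 - duty_cycle) / duty_cycle)
    # Emit whatever the compressor is still buffering.
    dst.write(compressor.flush())
```

Both arguments are expected to be binary file objects, for example `open(name, "rb")` and `open(name + ".bz2", "wb")`. Smaller chunks give the scheduler more frequent pauses at the cost of a little overhead.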
 
Kirk Job-Sluder

Sorry for the late post, the original scrolled off the server.
> I'd like to keep at least 50% of the cpu free while doing bz2 file
> compression. Currently, bz2 compression takes between 80 & 100 percent
> of the cpu and the Windows GUI becomes almost useless. How can I lower
> the strain on the cpu and still do compression? I'm willing for the
> compression process to take longer.
>
> Thanks,
>
> Brad
>
> def compress_file(filename):
>     path = r"C:\repository_backup"
>     print path
>     for root, dirs, files in os.walk(path):
>         for f in files:
>             if f == filename:
>                 print "Compressing", f
>                 x = file(os.path.join(root, f), 'rb')
>                 os.chdir(path)
>                 y = bz2.BZ2File(f + ".bz2", 'w')
>                 while True:
>                     data = x.read(1024000)
>                     time.sleep(0.1)
>                     if not data:
>                         break
>                     y.write(data)
>                     time.sleep(0.1)
>                 y.close()
>                 x.close()
>             else:
>                 return

One of the issues you may be running into is memory. Under Windows,
using up 90% of the CPU shouldn't affect GUI performance (much), but
swapping does. According to the bzip2 man page, the maximum block size
is 900KB, so you might be running into problems reading your file 1024KB
at a time. Use the system monitor control panel to check for excessive
swapping. Bzip2 uses about 8 x <blocksize> of memory when compressing, so
with the default setting of a 900KB block size you are looking at 7.2MB
plus some bookkeeping memory.
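
To make the arithmetic concrete, here is a small sketch in Python 3. It assumes the figure given in the bzip2 man page for compression (400k plus eight times the block size) and that compression level N means a block size of N x 100k; both helper names are hypothetical.

```python
import bz2

def bzip2_compress_memory_kb(compresslevel=9):
    """Approximate compressor memory in KB: 400k + 8 x block size,
    where block size is compresslevel x 100k (levels 1 to 9)."""
    if not 1 <= compresslevel <= 9:
        raise ValueError("compresslevel must be between 1 and 9")
    return 400 + 8 * 100 * compresslevel

def compress_small_blocks(data, compresslevel=1):
    """Compress with a smaller block size to reduce memory use,
    usually at some cost in compression ratio."""
    return bz2.compress(data, compresslevel)
```

With these assumptions level 9 comes to 7600 KB (the 7.2MB above plus 400 KB of bookkeeping) and level 1 to 1200 KB, so dropping the level is one way to test whether memory is the bottleneck.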

Another issue is that you might be better off downloading bzip2 for
Windows and letting the standalone bzip2 program handle file input and
output. Using a shell command here might be more efficient in spite of
spawning a new process.
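
A minimal sketch of the external-program route in Python 3, assuming a `bzip2` executable is installed and on the PATH. The helper names are hypothetical; `-k` is bzip2's flag for keeping the input file.

```python
import shutil
import subprocess

def bzip2_command(path, keep_original=True):
    """Build the argument list for compressing `path` with bzip2."""
    cmd = ["bzip2"]
    if keep_original:
        cmd.append("-k")  # keep the input file instead of deleting it
    cmd.append(path)
    return cmd

def compress_with_bzip2(path):
    """Run bzip2 on `path` and return the name of the resulting archive."""
    if shutil.which("bzip2") is None:
        raise RuntimeError("bzip2 executable not found on PATH")
    subprocess.run(bzip2_command(path), check=True)
    return path + ".bz2"
```

Running the compressor as a separate process also makes it easy to lower the priority of just that process rather than the whole script.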

A third issue is that bzip2 achieves high compression efficiency at the
expense of CPU time and memory. It might be worth considering whether
gzip might occupy the sweet spot compromise between minimal archive size
and minimal cpu usage.
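
One way to check this on your own data is to time both codecs on a sample. The sketch below is Python 3 and uses the standard `gzip` and `bz2` modules; `compare_codecs` is a hypothetical helper, and the numbers it returns will depend heavily on the data and the machine.

```python
import bz2
import gzip
import time

def compare_codecs(data):
    """Return {name: (compressed_size_in_bytes, seconds)} for gzip and bz2."""
    results = {}
    for name, compress in (("gzip", gzip.compress), ("bz2", bz2.compress)):
        started = time.perf_counter()
        compressed = compress(data)
        results[name] = (len(compressed), time.perf_counter() - started)
    return results
```

Feeding it a representative chunk of one of the backup files should show whether the extra space saved by bz2 is worth the extra CPU time in this case.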

Fourth, how many of those files are incompressible? I've noticed that
bzip2 tries really hard to eke out some form of savings from
incompressible files. A filename filter for files that should not be
compressed (png, jpg, gif, sx*) might be worth doing here.
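
Such a filter might look like the sketch below (Python 3). The pattern list just mirrors the extensions mentioned above, and `should_compress` is a hypothetical name; extend the list to suit your own files.

```python
import fnmatch

# Extensions that are already compressed and rarely shrink further.
SKIP_PATTERNS = ("*.png", "*.jpg", "*.gif", "*.sx*")

def should_compress(filename):
    """Return False for files whose names match an already-compressed type."""
    name = filename.lower()
    return not any(fnmatch.fnmatchcase(name, p) for p in SKIP_PATTERNS)
```

Files that fail the check can be copied into the backup as they are.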
 
David Rushby

Benjamin Niemann said:
Under Windows you can do something similar using the Task Manager -> Set
Priority... There should be some Windows API call to do this from your
program, probably involving the win32 extension.

Indeed.
----------------------------------------
import win32api, win32process

# Drop to below-normal priority for the duration of the compression.
win32process.SetPriorityClass(win32api.GetCurrentProcess(),
                              win32process.BELOW_NORMAL_PRIORITY_CLASS)

# ... invoke the compression portion of the program ...

# Restore normal priority afterwards.
win32process.SetPriorityClass(win32api.GetCurrentProcess(),
                              win32process.NORMAL_PRIORITY_CLASS)
 
Brad Tilley

Kirk said:
Sorry for the late post, the original scrolled off the server.


One of the issues you may be running into is memory. Under Windows,
using up 90% of the CPU shouldn't affect GUI performance (much), but
swapping does. According to the bzip2 man page, the maximum block size
is 900KB, so you might be running into problems reading your file 1024KB
at a time. Use the system monitor control panel to check for excessive
swapping. Bzip2 uses about 8 x <blocksize> of memory when compressing, so
with the default setting of a 900KB block size you are looking at 7.2MB
plus some bookkeeping memory.

Another issue is that you might be better off downloading bzip2 for
Windows and letting the standalone bzip2 program handle file input and
output. Using a shell command here might be more efficient in spite of
spawning a new process.

A third issue is that bzip2 achieves high compression efficiency at the
expense of CPU time and memory. It might be worth considering whether
gzip might occupy the sweet spot compromise between minimal archive size
and minimal cpu usage.

Fourth, how many of those files are incompressible? I've noticed that
bzip2 tries really hard to eke out some form of savings from
incompressible files. A filename filter for files that should not be
compressed (png, jpg, gif, sx*) might be worth doing here.

Thanks for the tips. I installed 512MB of ECC RAM and the problem went away.
 
