Newbie queue question

Jure Erznožnik

Hi,
I'm pretty new to Python (2.6) and I've run into a problem I just
can't seem to solve.
I'm using dbfpy to access DBF tables as part of a little test project.
I've programmed two separate functions, one that reads the DBF in main
thread and the other which reads the DBF asynchronously in a separate
thread.
Here's the code:

def demo_01():
    '''DBF read speed only'''

    dbf1 = Dbf('D:\\python\\testdbf\\promet.dbf', readOnly=1)
    for i1 in xrange(len(dbf1)):
        rec = dbf1[i1]
    dbf1.close()

def demo_03():
    '''DBF read speed into a FIFO queue'''

    class mt(threading.Thread):

        q = Queue.Queue(64)

        def run(self):
            dbf1 = Dbf('D:\\python\\testdbf\\promet.dbf', readOnly=1)
            for i1 in xrange(len(dbf1)):
                self.q.put(dbf1[i1])
            dbf1.close()
            del dbf1
            self.q.join()

    t = mt()
    t.start()
    while t.isAlive():
        try:
            rec = t.q.get(False, 0.2)
            t.q.task_done();
        except:
            pass

    del t


However I'm having serious issues with the second method. It seems
that as soon as I start accessing the queue from both threads, the
reading speed effectively halves.

I have tried the following:
1. using deque instead of queue (same speed)
2. reading 10 records at a time and inserting them in a separate loop
(I hoped reducing the queue contention would help)
3. Increasing queue size to infinite and waiting 10 seconds in main
thread before I started reading - this one yielded full reading speed,
but the waiting took away all the threading benefits

I'm sure I'm doing something very wrong here, I just can't figure out
what.

Can anyone help me with this?

Thanks,
Jure
 
Piet van Oostrum

Jure Erznožnik said:
JE> Hi,
JE> I'm pretty new to Python (2.6) and I've run into a problem I just
JE> can't seem to solve.
JE> I'm using dbfpy to access DBF tables as part of a little test project.
JE> I've programmed two separate functions, one that reads the DBF in main
JE> thread and the other which reads the DBF asynchronously in a separate
JE> thread.
JE> Here's the code:
JE> def demo_01():
JE>     '''DBF read speed only'''
JE>     dbf1 = Dbf('D:\\python\\testdbf\\promet.dbf', readOnly=1)
JE>     for i1 in xrange(len(dbf1)):
JE>         rec = dbf1[i1]
JE>     dbf1.close()
JE> def demo_03():
JE>     '''DBF read speed into a FIFO queue'''
JE>     class mt(threading.Thread):
JE>         q = Queue.Queue(64)
JE>         def run(self):
JE>             dbf1 = Dbf('D:\\python\\testdbf\\promet.dbf', readOnly=1)
JE>             for i1 in xrange(len(dbf1)):
JE>                 self.q.put(dbf1[i1])
JE>             dbf1.close()
JE>             del dbf1
JE>             self.q.join()
JE>     t = mt()
JE>     t.start()
JE>     while t.isAlive():
JE>         try:
JE>             rec = t.q.get(False, 0.2)
JE>             t.q.task_done();
JE>         except:
JE>             pass
JE>     del t

JE> However I'm having serious issues with the second method. It seems
JE> that as soon as I start accessing the queue from both threads, the
JE> reading speed effectively halves.
JE> I have tried the following:
JE> 1. using deque instead of queue (same speed)
JE> 2. reading 10 records at a time and inserting them in a separate loop
JE> (hoped the congestion would help)
JE> 3. Increasing queue size to infinite and waiting 10 seconds in main
JE> thread before I started reading - this one yielded full reading speed,
JE> but the waiting took away all the threading benefits
JE> I'm sure I'm doing something very wrong here, I just can't figure out
JE> what.

For a start, the thread switching and the queue administration simply take
time that you avoid by doing everything sequentially. Threading has an
advantage only when there is a possibility of overlap, and there is none in
your example, so it is pure overhead. It would be different if your
processing did something substantial, or if reading the file were I/O-bound.
You don't say how big the file is; it may also already be in your OS cache,
in which case reading it is essentially CPU-bound.

And then there is this code:

while t.isAlive():
    try:
        rec = t.q.get(False, 0.2)
        t.q.task_done();
    except:
        pass

t.q.get(False, 0.2) means: do a non-blocking get, so when there is nothing
in the queue it returns immediately and takes the exception path, which is
also substantial overhead. Whether this happens depends on timing, which
depends on the scheduling of the OS. For example, when the OS schedules the
main task first, it will busy-wait loop quite a lot before the first item
arrives in the queue. If the file is small, the worker will probably then
put all the items in the queue at once and there will be no more busy-wait
looping. But busy-wait looping just consumes CPU time.

By the way, the second parameter (0.2), which is supposed to be the timeout
period, is simply ignored when the first parameter is False. You might be
better off passing True as the first parameter to get.
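To make that concrete, here is a small sketch of the two call styles (in today's Python 3 spelling, where the Queue module is called queue; the thread's code is Python 2.6, but the semantics of get(block, timeout) are the same):

```python
import queue
import time

q = queue.Queue()  # empty queue, so every get() will fail

# Non-blocking get: the 0.2 timeout is ignored, Empty is raised immediately.
t0 = time.monotonic()
try:
    q.get(False, 0.2)
except queue.Empty:
    pass
nonblocking = time.monotonic() - t0  # essentially zero

# Blocking get with a timeout: waits up to 0.2 s before raising Empty.
t0 = time.monotonic()
try:
    q.get(True, 0.2)
except queue.Empty:
    pass
blocking = time.monotonic() - t0  # roughly 0.2 seconds

print(nonblocking, blocking)
```

The blocking form sleeps on a condition variable instead of spinning, which is exactly why it wastes no CPU while waiting.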

I dislike any form of busy-wait loop. It would be better to use a plain
blocking get(), but that conflicts with your end-of-data detection. while
t.isAlive() is not a particularly good way to detect that processing is
finished, I think, because of timing issues: after the last
t.q.task_done() [which doesn't need a semicolon, by the way] it takes some
time before the self.q.join() is processed and the thread finishes. In the
meantime, while t.isAlive() is being tested constantly, again wasting CPU
time.

IMHO a better way is to put a sentinel object in the queue:

    def run(self):
        dbf1 = Dbf('D:\\python\\testdbf\\promet.dbf', readOnly=1)
        for i1 in xrange(len(dbf1)):
            self.q.put(dbf1[i1])
        self.q.put(None)
        dbf1.close()
        del dbf1
        self.q.join()

while True:
    rec = t.q.get()
    t.q.task_done()
    if rec is None: break

And then you probably can also get rid of the self.q.join() and
t.q.task_done()
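Stripped down to its essentials, the sentinel pattern looks like this runnable sketch (Python 3 spellings, and a plain list standing in for the Dbf rows, so it is self-contained):

```python
import queue
import threading

SENTINEL = None  # unique "end of data" marker

def reader(q, records):
    # producer: push every record, then the sentinel
    for rec in records:
        q.put(rec)
    q.put(SENTINEL)

q = queue.Queue(64)                   # bounded, like the original example
records = list(range(1000))           # stand-in for the DBF rows
t = threading.Thread(target=reader, args=(q, records))
t.start()

consumed = []
while True:
    rec = q.get()                     # plain blocking get: no busy waiting
    if rec is SENTINEL:
        break
    consumed.append(rec)

t.join()
print(consumed == records)
```

Because the consumer learns about the end of data in-band, neither q.join() nor task_done() is needed, and no isAlive() polling occurs.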
 

Jure Erznožnik

Thanks for the suggestions.
I've been looking at the source code of threading support objects and
I saw that non-blocking requests in queues use events, while blocking
requests just use InterlockedExchange.
So plain old put/get is much faster and I've managed to confirm this
today with further testing.

Sorry about the semicolon; I just can't seem to shake it with my Pascal
& C++ background :)

Currently, I've managed to get the code to this stage:

class mt(threading.Thread):

    q = Queue.Queue()

    def run(self):
        dbf1 = Dbf('D:\\python\\testdbf\\promet.dbf', readOnly=1)
        for i1 in xrange(len(dbf1)):
            self.q.put(dbf1[i1])
        dbf1.close()
        del dbf1
        self.q.put(None)

t = mt()
t.start()
time.sleep(22)
rec = 1
while rec is not None:
    rec = t.q.get()

del t

Note the time.sleep(22). It takes about 22 seconds to read the DBF
with the 200K records (71MB). It's entirely in cache, yes.

So, if I put this sleep in there, the whole procedure finishes in 22
seconds with 100% CPU (core) usage. Almost as fast as the single
threaded procedure. There is very little overhead.
When I remove the sleep, the procedure finishes in 30 seconds with
~80% CPU (core) usage.
So the threading overhead only happens when I actually cause thread
interaction.

This never happened to me before. Usually (C, Pascal) there was some
threading overhead, but I could always measure it in tenths of a
percent. In this case it's 50% and I'm pretty sure InterlockedExchange
is the fastest thing there can be.

My example currently really is a dummy one. It doesn't do much, only
the reading thread is implemented, but that will change with time.
Reading the data source is one task, I will proceed with calculations
and with a rendering engine, both of which will be pretty CPU
intensive as well.

I'd like to at least make the reading part behave like I want it to
before I proceed. It's clear to me I don't understand Python's
threading concepts yet.

I'd still appreciate further advice on what to do to make this sample
work with less overhead.
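One common way to cut the per-record synchronization cost is to put records on the queue in batches rather than one at a time, so the lock is taken once per batch. A sketch of that idea (Python 3 spellings; the names BATCH and reader are illustrative, not from the original code):

```python
import queue
import threading

BATCH = 100  # records per queue item; tune to taste

def reader(q, records):
    # producer: accumulate records locally, push them in chunks
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) >= BATCH:
            q.put(batch)      # one lock acquisition per 100 records
            batch = []
    if batch:                 # flush any partial final batch
        q.put(batch)
    q.put(None)               # sentinel: end of data

q = queue.Queue()
records = list(range(10000))  # stand-in for the DBF rows
t = threading.Thread(target=reader, args=(q, records))
t.start()

out = []
while True:
    batch = q.get()
    if batch is None:
        break
    out.extend(batch)

t.join()
```

This keeps the blocking-get structure while reducing the number of put/get round-trips by a factor of BATCH.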
 

Jure Erznožnik

I've done some further testing on the subject:

I also added some calculations in the main loop to see what effect
they would have on speed. Of course, I also added the same
calculations to the single threaded functions.
They were simple summary functions, like average, sum, etc. Almost no
interaction with the buffers was added, just retrieval of a single
field's value.

Single threaded, the calculations added another 4.3 seconds to the
processing time (~18%)
MultiThreaded, they added 1.8 seconds. CPU usage remained below 100%
of one core at all times. Made me check the process affinity.

I know the main thread uses far less CPU than the DBF reading thread
(4 s vs. 22 s), so I thought adding these calculations would have only a
minimal impact on the threaded execution time.

Instead, the execution time increases!!!
I'm beginning to think that Python's memory management / functions
introduce quite a significant overhead for threading.

I think I'll just write this program in one of the compilers today to
verify just how stupid I've become.
 

Jure Erznožnik

Digging further, I found this:
http://www.oreillynet.com/onlamp/blog/2005/10/does_python_have_a_concurrency.html

Looking up on this info, I found this:
http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock

If this is correct, no amount of threading would ever help in Python
since only one core / CPU could *by design* ever be utilized. Except
for the code that accesses *no* functions / memory at all.

This does seem to be a bit harsh though.
I'm now writing a simple test program to verify this: multiple
data-independent threads, just so I can see whether more than one core
can be utilized at all.

:(
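A minimal version of that test can be sketched as follows (Python 3 spellings): two CPU-bound threads doing independent work. Under CPython's GIL the threaded run is typically no faster than running the same two computations back to back, because only one thread executes Python bytecode at a time.

```python
import threading
import time

def burn(n, out, i):
    # pure-Python CPU-bound work; holds the GIL while it runs
    s = 0
    for k in range(n):
        s += k
    out[i] = s

N = 2_000_000
out = [0, 0]

t0 = time.monotonic()
threads = [threading.Thread(target=burn, args=(N, out, i)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.monotonic() - t0

t0 = time.monotonic()
seq = [0, 0]
burn(N, seq, 0)
burn(N, seq, 1)
sequential = time.monotonic() - t0

print(out == seq, threaded, sequential)
```

Watching a process monitor while this runs shows total CPU usage hovering around one core, regardless of how many cores the machine has.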
 

Tim Harig

If this is correct, no amount of threading would ever help in Python
since only one core / CPU could *by design* ever be utilized. Except
for the code that accesses *no* functions / memory at all.

Don't multithread...multiprocess. By running multiple python instances,
the operating system handles the processor scheduling for each so that you
can use all available CPUs/cores. It also tends to make debugging easier.
It does create more overhead -- significantly more on some OSs.
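Python 2.6 also added the multiprocessing module, which packages this idea behind a threading-like API: each worker is a separate interpreter process, so the GIL no longer serializes the computation. A sketch in modern spelling (the helper names total and run_demo are illustrative):

```python
import multiprocessing

def total(n):
    # CPU-bound work executed in a child interpreter process
    return sum(range(n))

def run_demo():
    # two worker processes, scheduled by the OS onto separate cores
    pool = multiprocessing.Pool(processes=2)
    try:
        return pool.map(total, [10 ** 6, 2 * 10 ** 6])
    finally:
        pool.close()
        pool.join()

if __name__ == '__main__':
    print(run_demo())
```

The __main__ guard matters on platforms that spawn rather than fork, since the children re-import the module.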
 
P

Piet van Oostrum

Piet van Oostrum

Jure Erznožnik said:
JE> If this is correct, no amount of threading would ever help in Python
JE> since only one core / CPU could *by design* ever be utilized. Except
JE> for the code that accesses *no* functions / memory at all.

It is not the design of the Python language, but of the *CPython*
implementation. And yes, it will not benefit from more than one core.

You should watch/read this:
http://blip.tv/file/2232410
http://www.dabeaz.com/python/GIL.pdf
 

Jure Erznožnik

1. How many DBF files do you have?
2. Why DBF?

It was just a test. It was the most compatible format I could get
between Python and the business application I work with without using
SQL servers and such.
Otherwise it's of no consequence. The final application will have a
separate input engine that will support multiple databases as input.

Jure
 
