problems with threaded socket app

Gordon Messmer

I've been working on a threaded daemon application to filter email. The
source for the program is here:

http://phantom.dragonsdawn.net/~gordon/courier-patches/courier-pythonfilter/

The daemon loads individual filters as modules and hands the names of the
message and control files to each module in turn for processing. One of
the modules (filters/dialback.py) checks the address of the sender,
connects to the MX servers for the sender's domain, and verifies that the
sender address is valid. In order to implement a timeout on the dialback,
each message is processed by two threads. The first thread creates an
SMTP object and then starts a second thread to do the lookup using that
SMTP object. If the lookup takes too long, the first thread closes the
SMTP object's socket and collects the failure from the second thread.
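
The two-thread arrangement described above can be sketched roughly as follows. This is a minimal illustration written for Python 3, not the code from filters/dialback.py: the names `lookup` and `run_with_timeout` are hypothetical, and a `time.sleep` stands in for the SMTP conversation. The monitor waits on the read end of a status pipe with `select`, and treats an empty result as a timeout.

```python
import os
import select
import threading
import time


def lookup(wpipe, delay):
    # Hypothetical stand-in for the dialback worker: pretend to talk to
    # an MX server for `delay` seconds, then report a status on the pipe.
    time.sleep(delay)
    os.write(wpipe, b"250 ok")


def run_with_timeout(delay, timeout):
    rpipe, wpipe = os.pipe()
    worker = threading.Thread(target=lookup, args=(wpipe, delay))
    worker.start()
    # Wait until the worker has written a status or the timeout expires.
    ready, _, _ = select.select([rpipe], [], [], timeout)
    timed_out = rpipe not in ready
    # A real monitor would cancel the lookup here when timed_out is true.
    # This read blocks until the worker has written its status.
    status = os.read(rpipe, 1024)
    worker.join()
    os.close(rpipe)
    os.close(wpipe)
    return timed_out, status
```

Calling `run_with_timeout(0.3, 0.05)` reports a timeout, while `run_with_timeout(0.0, 2.0)` does not; in both cases the status is only collected after the worker has written it.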

During testing, all of that works fine. In real-world use, however, the
program eventually deadlocks. When it does, several dialback threads are
in progress, and the first of each pair seems to be reading from
the status pipe. I cannot attach a debugger to the second of the pair to
see what state it's in.

I'm running this application on python2-2.2.2-11.7.3 under Red Hat Linux
7.3.

Does anyone have any suggestions for where I can start looking for the
problem?
 
Anthony McDonald

if rpipe not in ready_pipes[0]:
    # Time to cancel this SMTP conversation
    smtpi.close()
    # The dialback thread will now write a failure message to
    # its status pipe, and we'll need to clear that out.
    os.read( rpipe, 1024 )
    continue

The code creates a race condition. To work correctly, it requires the
worker thread to raise and handle an exception, and to write that result
to the pipe, BEFORE your main thread attempts to read the pipe.

If the worker thread loses the race, the next MX result you process will
receive the previous MX's 400 error code, and you're left with one thread at
the end of the sequence that can't terminate, because it stays active until
what it has written to the pipe is read from the pipe.

That's simple enough to fix: just add a select call between closing the SMTP
connection and reading the expected 400 error response.
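
One way to read that suggestion, sketched in Python 3 under stated assumptions: the names `worker`, `cancel_and_drain`, and `grace` are hypothetical, and a `threading.Event` stands in for closing the SMTP socket, since the real cancellation path lives in dialback.py. After cancelling, the monitor waits with `select` until the failure status is actually in the pipe, and only then reads it.

```python
import os
import select
import threading


def worker(cancelled, wpipe):
    # Hypothetical stand-in for the dialback thread: block until the
    # conversation is cancelled, then report the failure on the pipe.
    cancelled.wait()
    os.write(wpipe, b"400 cancelled")


def cancel_and_drain(grace):
    rpipe, wpipe = os.pipe()
    cancelled = threading.Event()
    t = threading.Thread(target=worker, args=(cancelled, wpipe))
    t.start()
    # Assumption: setting the event plays the role of smtpi.close().
    cancelled.set()
    # Wait (at most `grace` seconds) for the failure status to arrive,
    # so the read below never consumes some other message by mistake.
    ready, _, _ = select.select([rpipe], [], [], grace)
    status = os.read(rpipe, 1024) if rpipe in ready else None
    t.join()
    os.close(rpipe)
    os.close(wpipe)
    return status
```

Because the `select` call has a timeout, the monitor in this sketch is not stuck forever if the worker never reports; it gets `None` back and can decide what to do next.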

Anthony McDonald
 
Gordon Messmer

I can do that, but can you explain why, if the monitor thread loses the
race, it wouldn't block until the worker thread writes its status to the
pipe?

The code below should have the same race condition, but it works properly.

[root@deerhunter root]# python2
Python 2.2.2 (#1, Jan 30 2003, 21:26:22)
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-112)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
...     time.sleep(20)
...     os.write( wpipe, 'my status' )
...
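
A self-contained experiment along the same lines, written for Python 3 rather than the Python 2.2 session above, might look like the sketch below. It is not a reconstruction of that session: the function name `blocking_read_demo` and the `delay` parameter are hypothetical. It starts a thread that sleeps and then writes a status to a pipe, while the main thread calls `os.read` on the still-empty pipe straight away.

```python
import os
import threading
import time


def blocking_read_demo(delay):
    rpipe, wpipe = os.pipe()

    def writer():
        # Write the status only after `delay` seconds have passed.
        time.sleep(delay)
        os.write(wpipe, b"my status")

    t = threading.Thread(target=writer)
    start = time.monotonic()
    t.start()
    # The pipe is empty at this point, so the read blocks until the
    # writer thread has written its status.
    data = os.read(rpipe, 1024)
    elapsed = time.monotonic() - start
    t.join()
    os.close(rpipe)
    os.close(wpipe)
    return data, elapsed
```

With a blocking pipe, the read returns the writer's data only once the delay has elapsed, which matches the behaviour described in the question: a reader that arrives first simply waits for the writer.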