Problem: 'Threads' in Python?

Ralph Sluiters

Hi,
I've got a small problem with my Python script. It is a CGI script which is
called regularly (e.g. every 5 minutes) and returns an XML data structure.
This script calls a very slow function, with a duration of 10-40 seconds. To
avoid delays, I inserted a cache for the data. So, if the script is called,
it returns the last calculated data structure, and then the function is
called again and the new data is stored in the cache. (There is no problem
with using older but faster data.)

My problem is that the client (a Java program, browser, or command line)
waits until the whole script has ended, so the cache is worthless. How can
I tell the client/browser/... that after the last print line there is no
more data and it can proceed? Or how can I tell the Python script that
everything after the return of the data (the retrieval of the new data and
its storage in a file) can be done in another thread or in the background?

Greetings

Ralph
 
Francis Avila

Ralph Sluiters wrote in message ...
> Hi,
> I've got a small problem with my Python script. It is a CGI script which
> is called regularly (e.g. every 5 minutes) and returns an XML data
> structure. This script calls a very slow function, with a duration of
> 10-40 seconds. To avoid delays, I inserted a cache for the data. So, if
> the script is called, it returns the last calculated data structure, and
> then the function is called again and the new data is stored in the
> cache. (There is no problem with using older but faster data.)
>
> My problem is that the client (a Java program, browser, or command line)
> waits until the whole script has ended, so the cache is worthless. How
> can I tell the client/browser/... that after the last print line there
> is no more data and it can proceed? Or how can I tell the Python script
> that everything after the return of the data (the retrieval of the new
> data and its storage in a file) can be done in another thread or in the
> background?

Wouldn't a better approach be to decouple the cache mechanism from the cgi
script? Have a long-running Python process act as a memoizing cache and
delegate requests to the slow function. The cgi scripts then connect to
this cache process (via your favorite IPC mechanism). If the cache process
has a record of the call/request, it returns the previous value immediately,
and updates its cache in the meantime. If it doesn't have a record, then it
blocks the cgi script until it gets a result.
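
Such a cache process might look roughly like this (a sketch only; the
socket path, the wire protocol, and the dummy get_data are all made up):

import os
import socket
import threading

SOCKET_PATH = '/tmp/xmlcache.sock'     # made-up path

def get_data(folder_id):
    # dummy stand-in for the poster's slow (10-40 s) function
    return '<data folder="%s"/>' % folder_id

cache = {}                             # folder ID -> last computed XML
cache_lock = threading.Lock()

def refresh(folder_id):
    # run the slow function, then store the fresh result
    result = get_data(folder_id)
    with cache_lock:
        cache[folder_id] = result

def serve():
    if os.path.exists(SOCKET_PATH):
        os.unlink(SOCKET_PATH)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(SOCKET_PATH)
    server.listen(5)
    while True:
        conn, _ = server.accept()
        folder_id = conn.recv(1024).decode().strip()
        with cache_lock:
            cached = cache.get(folder_id)
        if cached is None:
            # no record yet: block this one request until computed
            refresh(folder_id)
            with cache_lock:
                cached = cache[folder_id]
        else:
            # answer at once and refresh in the background
            threading.Thread(target=refresh, args=(folder_id,)).start()
        conn.sendall(cached.encode())
        conn.close()

serve()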

How can threading help you if the cgi-process dies after each request unless
you store the value somewhere else? And if you store the value somewhere,
why not have another process manage that storage? If it's possible to
output a complete page before the cgi script terminates (I don't know if the
server blocks until the script terminates), then you could do the cache
updating afterwards. In this case I guess you could use a pickled
dictionary or something as your cache, and you don't need a separate
process. But even here you wouldn't necessarily use threads.
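
The pickled-dictionary variant could be as simple as this (again a
sketch; CACHE_FILE is a made-up name, and ID and get_data are the names
from the original script):

import os
import pickle

CACHE_FILE = 'cache.pickle'        # made-up name

def load_cache():
    # return the cached dict, or an empty one on the first run
    if os.path.exists(CACHE_FILE):
        f = open(CACHE_FILE, 'rb')
        data = pickle.load(f)
        f.close()
        return data
    return {}

def save_cache(cache):
    f = open(CACHE_FILE, 'wb')
    pickle.dump(cache, f)
    f.close()

# in the cgi script:
cache = load_cache()
if ID in cache:
    print(cache[ID])               # serve the stale copy at once
    cache[ID] = get_data(ID)       # then recompute for next time
else:
    cache[ID] = get_data(ID)       # first request: must block
    print(cache[ID])
save_cache(cache)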

Threads are up there with regexps: powerful, but avoid as much as possible.
 
Ralph Sluiters

> Wouldn't a better approach be to decouple the cache mechanism from the
> cgi script? Have a long-running Python process act as a memoizing cache
> and delegate requests to the slow function. The cgi scripts then connect
> to this cache process (via your favorite IPC mechanism). If the cache
> process has a record of the call/request, it returns the previous value
> immediately, and updates its cache in the meantime. If it doesn't have a
> record, then it blocks the cgi script until it gets a result.

The caching cannot be decoupled, because the cgi script gets a folder ID
and returns only data from this "folder". So if I decouple the processes,
I don't know which folders to cache, and I cannot cache all folders,
because the routine is too slow. So I must get the current folder from the
cgi script, cache that one for as long as the user stays in the folder and
pulls data every 2 minutes, and cache another folder if the user changes
his folder.

> How can threading help you if the cgi-process dies after each request
> unless you store the value somewhere else? And if you store the value
> somewhere, why not have another process manage that storage? If it's
> possible to output a complete page before the cgi script terminates (I
> don't know if the server blocks until the script terminates), then you
> could do the cache updating afterwards. In this case I guess you could
> use a pickled dictionary or something as your cache, and you don't need
> a separate process. But even here you wouldn't necessarily use threads.

The data is too large to store in memory, and with this method, as you
said, threading wouldn't help, but I store the data on disk.

My code:

# Read the cached result from the file
try:
    oldfile = open(filename, "r")          # filename is set earlier
    oldresult = oldfile.read()
    oldfile.close()
except IOError:
    # No cache file yet: run the slow routine once and block
    oldresult = get_data(ID)               # get XML data

# Print the header, so that it is returned via HTTP
print('\r\n'.join(header))
print(oldresult)

# ***

# Start the slow routine again
result = get_data(ID)                      # get XML data

# Save the fresh data to the cache file for the next request
newfile = open(filename, "w")
newfile.write(result)
newfile.close()
# END

At the position *** the rest of the script must be decoupled, so that the
client can proceed with the current data, while the new data for the next
request is generated and stored in a file.
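
For illustration, on a Unix host the split at *** could be done with
os.fork (a sketch; whether the client is really released before the child
finishes also depends on the web server):

import os
import sys

# ... print the header and oldresult exactly as above ...

sys.stdout.flush()

if os.fork() == 0:
    # child: detach and close the inherited stdout so the web server
    # can finish the response
    os.setsid()
    sys.stdout.close()
    result = get_data(ID)              # slow refresh
    newfile = open(filename, "w")
    newfile.write(result)
    newfile.close()
    os._exit(0)
# parent: falls through and exits; the client keeps the cached page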

Ralph
 
Dennis Lee Bieber

Ralph Sluiters fed this fish to the penguins on Tuesday 06 January 2004
02:07 am:

> The caching cannot be decoupled, because the cgi script gets a folder
> ID and returns only data from this "folder". So if I decouple the
> processes, I don't know which folders to cache, and I cannot cache all
> folders, because the routine is too slow. So I must get the current
> folder from the cgi script, cache that one for as long as the user
> stays in the folder and pulls data every 2 minutes, and cache another
> folder if the user changes his folder.

I've been having some difficulty following this thread but...

Isn't this what Cookies are for? Obtaining some sort of user ID/state
that can be passed into the processing to allow for continuing from a
previous connection?

HTTP is normally stateless. The client requests a page, the page
contents are obtained (either a static page, or some CGI-style
computation generates the immediate page data), the page is returned,
and the connection ends. If the page needs to be updated, that is a
completely separate transaction.

Cookies are used to link these separate transactions into one "whole";
the first time the client requests the page, a cookie is generated. On
subsequent requests (updates) the (now) existing cookie is sent back to
the server to identify the user and allow for selecting the proper
continuation state.
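
In a Python CGI script the cookie handshake looks roughly like this (a
sketch; the 'session' name and the timeout are made up, and the module is
named Cookie on older Pythons):

import os
import uuid
from http.cookies import SimpleCookie

# parse whatever cookie the client sent back, if any
cookie = SimpleCookie(os.environ.get('HTTP_COOKIE', ''))
if 'session' in cookie:
    session_id = cookie['session'].value    # returning client
else:
    session_id = uuid.uuid4().hex           # first request: new cookie
    cookie['session'] = session_id
    cookie['session']['max-age'] = 600      # let it expire when idle

# the Set-Cookie line must go out with the HTTP headers
print(cookie.output())
print('Content-Type: text/xml')
print()
# ... page body follows ...
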
> At the position *** the rest of the script must be decoupled, so that
> the client can proceed with the current data, while the new data for
> the next request is generated and stored in a file.

I've not coded CGI stuff (don't have access to a server that permits
user CGI) but my rough view of this task would be:

CGI******
    if no cookie
        generate a cookie for this user
    endif
    pass the (received or generated) cookie to the background process
    wait for return data from the background process (if a new cookie,
        this will take time to compute; otherwise the background process
        should already have computed it)
    return web page with cookie and data

Background********
    loop
        scan "cache" list for expired cookies (unused threads)
        terminate the related processing thread (the thread should clean
            up any disk files it used)
        clean up (delete) the cookie from the "cache" list
        get request (and cookie) from CGI
        if the cookie is not in the "cache" list
            create a new processing thread
        endif
        use the cookie data to identify the (existing) processing thread
            and read the next data batch from it (queue.Queue perhaps,
            one queue per cookie)
        return data (the processing thread continues to compute the next
            update)
    endloop
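
In Python the background loop could be sketched like this (the IPC with
the CGI side is left out, get_data is the slow function from the earlier
posts, and abandoned daemon threads stand in for real termination):

import queue
import threading
import time

EXPIRY = 300         # drop a worker after 5 minutes without requests
workers = {}         # cookie -> (queue.Queue, time of last request)
lock = threading.Lock()

def worker(folder_id, out_queue):
    # keep exactly one fresh batch parked in the queue; put() blocks
    # until the previous batch has been picked up
    while True:
        out_queue.put(get_data(folder_id))

def handle_request(cookie, folder_id):
    now = time.time()
    with lock:
        # expire entries nobody has polled recently
        for key in [k for k, (q, t) in workers.items()
                    if now - t > EXPIRY]:
            del workers[key]
        if cookie not in workers:
            q = queue.Queue(maxsize=1)
            t = threading.Thread(target=worker, args=(folder_id, q))
            t.daemon = True
            t.start()
        else:
            q = workers[cookie][0]
        workers[cookie] = (q, now)
    # the first request blocks for one computation; later requests pick
    # up the precomputed batch immediately
    return q.get()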


You probably want to include, in "Background" a bit of logic to track
"last request time" and terminate processing threads if no client has
asked for an update in some period of time. The Cookies should also
have expiration times associated so that reconnecting after a period of
time will force a new cookie.

As for the folder? If the user physically navigates to other folders,
that can be passed to the background process and used to update the
threads (or create a new thread, if you assume the cookie identifies a
folder).

Caching would be semi-automatic here. The processing threads could be
folder specific, and when the thread is terminated (on lack of update
requests... let's see, you expect a 2-minute update period, allow for a
slow net, say you terminate a process after 5 minutes of disuse...) you
can clean up the disk space (folder) that process was using. The cookie
expiration time would be updated on each update.

The master web page should have whatever HTML tags force a timed
reload to do a new request every 2 minutes.
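
A <meta http-equiv="refresh" content="120"> tag in the page's head
section, for example, makes the browser re-request the page every 120
seconds.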

 
Ralph Sluiters

You did everything but answer my question. I know what cookies are, but I
don't need cookies here. And you said in your answer "start background
process"; that was exactly my question: how can I start a background
process?

But I've solved it now,

Ralph
 
Ralph Sluiters

Simply put the last part in an extra file 'cachedata.py', then use

import os
os.spawnlp(os.P_NOWAIT, 'python', 'python', 'cachedata.py')

to start it as a child process and DON'T wait for it.
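
If the child needs the folder ID, one way is to hand it over on the
command line ('cachedata.py' below is a guess at the decoupled last part;
the original post doesn't show it):

# cachedata.py -- refresh the cache, detached from the cgi request
# (get_data and filename must come from the original script; the
# names below are placeholders)
import sys

ID = sys.argv[1]                    # folder ID passed by the parent
result = get_data(ID)               # the slow function
newfile = open(filename, "w")       # same cache file the cgi script reads
newfile.write(result)
newfile.close()

and in the cgi script:

os.spawnlp(os.P_NOWAIT, 'python', 'python', 'cachedata.py', ID)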

Ralph
 
