Help making this script better

J

jakecjacobson

Hi,

After much Google searching and trial & error, I was able to write a
Python script that posts XML files to a REST API using HTTPS and
passing PEM cert & key file. It seems to be working but would like
some pointers on how to handle errors. I am using Python 2.4, I don't
have the capability to upgrade even though I would like to. I am very
new to Python so help will be greatly appreciated and I hope others
can use this script.

#!/usr/bin/python
#################################################################
# catalog_feeder.py
#################################################################
# This sciript will process a directory of XML files and push them to
the Enterprise Catalog.
# You configure this script by using a configuration file that
describes the required variables.
# The path to this file is either passed into the script as a command
line argument or hard coded
# in the script. The script will terminate with an error if it can't
process the XML file.
#################################################################

# IMPORT STATEMENTS
import httplib
import mimetypes
import os
import sys
import shutil
import time
from urllib import *
from time import strftime
from xml.dom import minidom

def main(c):
start_time = time.time()
# Set configuration parameters
try:
# Process the XML conf file ....
xmldoc = minidom.parse(c)
catalog_host = readConfFile(xmldoc, 'catalog_host')
catalog_port = int(readConfFile(xmldoc, 'catalog_port'))
catalog_path = readConfFile(xmldoc, 'catalog_path')
collection_name = readConfFile(xmldoc, 'collection_name')
cert_file = readConfFile(xmldoc, 'cert_file')
key_file = readConfFile(xmldoc, 'key_file')
log_file = readConfFile(xmldoc, 'log_file')
input_dir = readConfFile(xmldoc, 'input_dir')
archive_dir = readConfFile(xmldoc, 'archive_dir')
hold_dir = readConfFile(xmldoc, 'hold_dir')
except Exception, inst:
# I had an error so report it and exit script
print "Unexpected error opening %s: %s" % (c, inst)
sys.exit(1)
# Log Starting
logOut = verifyLogging(log_file)
if logOut:
log(logOut, "Processing Started ...")
# Get list of XML files to process
if os.path.exists(input_dir):
files = getFiles2Post(input_dir)
else:
if logOut:
log(logOut, "WARNING!!! Couldn't find input directory: " +
input_dir)
cleanup(logOut)
else:
print "Dir doen't exist: " + input_dir
sys.exit(1)
try:
# Process each file to the catalog
connection = httplib.HTTPSConnection(catalog_host, catalog_port,
key_file, cert_file)
for file in files:
log(logOut, "Processing " + file + " ...")
try:
response = post2Catalog(connection, catalog_path, os.path.join
(input_dir, file), collection_name)
if response.status == 200:
msg = "Succesfully posted " + file + " to cataloge ..."
print msg
log(logOut, msg)
# Move file to done directory
shutil.move(os.path.join(input_dir, file), os.path.join
(archive_dir, file))
else:
msg = "Error posting " + file + " to cataloge [" + response.read
() + "] ..."
print msg
log(logOut, response.read())
# Move file to error dir
shutil.move(os.path.join(input_dir, file), os.path.join(hold_dir,
file))
except IOError, (errno):
print "%s" % (errno)

except httplib.HTTPException, (e):
print "Unexpected error %s " % (e)

run_time = time.time() - start_time
print 'Run time: %f seconds' % run_time

# Clean up
connection.close()
cleanup(logOut)

# Get an arry of files from the input_dir
def getFiles2Post(d):
return (os.listdir(d))

# Read out the conf file and set the needed global variable
def readConfFile(xmldoc, tag):
return (xmldoc.getElementsByTagName(tag)[0].firstChild.data)

# Write out the message to log file
def log(f, m):
f.write(strftime("%Y-%m-%d %H:%M:%S") + " : " + m + '\n')

# Clean up and exit
def cleanup(logOut):
if logOut:
log(logOut, "Processing Ended ...\n")
logOut.close()
sys.exit(0)

# Verify if we can write to the log file and return a file handle if
we can
def verifyLogging(l):
fh='' # No logging default behavior
if os.path.exists(os.path.dirname(l)):
fh = open(l, 'a')
return(fh)

# Print out the usage of this script
def printusage():
print""
print
"**************************************************************************************************"
print " Must provide the path to the configuration file when calling
this script ..."
print " Example: " + sys.argv[0] + " /path_2_configuration_file/
conf.xml"
print
"**************************************************************************************************"
print ""
sys.exit(1)

#############################
# Posting XML file to the Catelog Service
#############################
def post2Catalog(connection, path, file, collection):
head = {"Content-Type" : "application/x-www-form-urlencoded",
"Accept" : "text/plain"}
parameters = urlencode({"collection" : collection, "entryxml" : open
(file,'r').read()})
connection.request('POST', path, parameters, head)
response = connection.getresponse()
# To reuse the connection, I have to read the response. I really
don't care about it so I do nothing with it.
body = response.read()
return response

# Main
if __name__ == "__main__":
_pathname = os.path.dirname(sys.argv[0])
total=len(sys.argv)
if len(sys.argv) > 1:
main(sys.argv[1])
else:
printusage()
 
G

Gabriel Genellina

En Thu, 06 Aug 2009 11:50:07 -0300, jakecjacobson
After much Google searching and trial & error, I was able to write a
Python script that posts XML files to a REST API using HTTPS and
passing PEM cert & key file. It seems to be working but would like
some pointers on how to handle errors. I am using Python 2.4, I don't
have the capability to upgrade even though I would like to. I am very
new to Python so help will be greatly appreciated and I hope others
can use this script.

Just a few remarks, mostly on style rather than functionality:
#!/usr/bin/python
#################################################################
# catalog_feeder.py
#################################################################
# This sciript will process a directory of XML files and push them to
the Enterprise Catalog.
# You configure this script by using a configuration file that
describes the required variables.
# The path to this file is either passed into the script as a command
line argument or hard coded
# in the script. The script will terminate with an error if it can't
process the XML file.
#################################################################

Note that Python has "docstrings" - the __doc__ attribute attached to
every module, class and function. The very first string in the
module/function/class becomes its docstring. The interactive help system
-and other tools like pydoc- can inspect and show such info.
The above comment could serve perfectly as this module's docstring - just
remove all the #'s and enclose the whole text in """triple quotes"""
(required as it spawns many lines).
By example, in that case you could print the text in your usage() function
like this:
print __doc__
try:
# Process the XML conf file ....
xmldoc = minidom.parse(c)
catalog_host = readConfFile(xmldoc, 'catalog_host')
catalog_port = int(readConfFile(xmldoc, 'catalog_port'))
catalog_path = readConfFile(xmldoc, 'catalog_path')
collection_name = readConfFile(xmldoc, 'collection_name')
cert_file = readConfFile(xmldoc, 'cert_file')
key_file = readConfFile(xmldoc, 'key_file')
log_file = readConfFile(xmldoc, 'log_file')
input_dir = readConfFile(xmldoc, 'input_dir')
archive_dir = readConfFile(xmldoc, 'archive_dir')
hold_dir = readConfFile(xmldoc, 'hold_dir')
except Exception, inst:
# I had an error so report it and exit script
print "Unexpected error opening %s: %s" % (c, inst)
sys.exit(1)

Ok, an unexpected error: but *which* one? doing exactly *what*? You're
hiding important information (the error type, the full traceback, the
source line that raised the error...) that's very valuable when something
goes wrong and you have to determine what happened.
In this case, you're adding a bit of information: the name of the file
being processed. That's good. But at the same time, hiding all the other
info, and that's not so good. What about this:

except Exception:
print >>sys.stderr, "Unexpected error opening %s" % c
raise

The final raise will propagate the exception; by default, Python will
print the exception type, the exception value, the full traceback
including source lines, and exit the script with a status code of 1. The
same effect that you intended, but more complete.

In other cases, where you don't have anything to add to the default
exception handling, the best thing to do is: nothing. That is, don't catch
an exception unless you have something good to do with it. (Exceptions
aren't Pokémon: you don't have to "catch 'em all!")
# Log Starting
logOut = verifyLogging(log_file)
if logOut:
log(logOut, "Processing Started ...")

I would move the decision into the log function (that is, just write
log("something") and make the log function decide whether to write to file
or not). For your next script, look at the logging module:
http://docs.python.org/library/logging.html
# Get an arry of files from the input_dir
def getFiles2Post(d):
return (os.listdir(d))

Note that return isn't a function but a statement. You don't need the
outer (). Also, using a docstring instead of a comment:

def getFiles2Post(input_dir):
"Return the list of files in input_dir to process"
return os.listdir(input_dir)
# Read out the conf file and set the needed global variable
def readConfFile(xmldoc, tag):
return (xmldoc.getElementsByTagName(tag)[0].firstChild.data)

Same as above. Which "needed global variable"?
def cleanup(logOut):
[...] sys.exit(0)

Exiting the script from everywhere makes it harder to reuse some of its
functions later. Just return the desired status code to the caller, which
in turn will return to main(), until it gets to the outermost level, which
is the only place I'd use sys.exit()
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,767
Messages
2,569,570
Members
45,045
Latest member
DRCM

Latest Threads

Top