Code Review

A

ad

Hello all,

Please review the code pasted below. I am wondering what other ways
there are of performing the same tasks. This was typed using version
3.2. The script is designed to clean up a directory (FTP, Logs, etc.)
Basically you pass two arguments. The first argument is an number of
days old to delete. The second argument is the directory where the
files and folders should be deleted. I imagine one enhancement would
be to create a function out of some of this.

### BEGIN ###

import os

import time

import shutil

import argparse



CurrentTime = time.time()

epocDay = 86400 # seconds





parser = argparse.ArgumentParser(description = "Delete files and
folders in a directory N days old", add_help=False,
prog='directorycleaner', usage='%(prog)s 7 c:\\temp')

parser.add_argument('days', type=int, help="Numeric value: delete
files and folders older then N days")

parser.add_argument('directory', help="delete files and folders in
this directory")

parser.print_help()

args = parser.parse_args()



dictKeys = (vars(args))



HowManyDays = dictKeys['days']

WhatDirectory = dictKeys['directory']

print (HowManyDays)

print (WhatDirectory)



DaysToDelete = HowManyDays * epocDay





dirExists = os.path.exists(WhatDirectory)



if dirExists == False: print ("The directory is missing")



DirListing = os.listdir(WhatDirectory)



for files in DirListing:

# Get the absolute path of the file name

abspath = (os.path.join(WhatDirectory, files))

# Get the current creation time of the file in epoc format
(midnight 1/1/1970)

FileCreationTime = (os.path.getctime(abspath))

# time.ctime converts epoch to a normal date

#print (time.ctime(CurrentTime))

# Get the date from seven days ago

WeekOldFileDate = CurrentTime - DaysToDelete

#print (CurrentTime)

#print (FileCreationTime)

#print (WeekOldFileDate)



#If the file is older than seve days doe something

if FileCreationTime < WeekOldFileDate:

#check if the object is a file

if os.path.isfile(abspath): os.remove(abspath)

# It is not a file it is a directory

elif os.path.isdir(abspath): shutil.rmtree(abspath)



##### END ####
 
P

Peter Otten

ad said:
Please review the code pasted below. I am wondering what other ways
there are of performing the same tasks. This was typed using version
3.2. The script is designed to clean up a directory (FTP, Logs, etc.)
Basically you pass two arguments. The first argument is an number of
days old to delete. The second argument is the directory where the
files and folders should be deleted. I imagine one enhancement would
be to create a function out of some of this.
CurrentTime = time.time()

Read PEP 8 on naming conventions etc., see
http://python.org/dev/peps/pep-0008/
epocDay = 86400 # seconds





parser = argparse.ArgumentParser(description = "Delete files and

What's the purpose of those many empty lines?
folders in a directory N days old", add_help=False,
prog='directorycleaner', usage='%(prog)s 7 c:\\temp')

parser.add_argument('days', type=int, help="Numeric value: delete
files and folders older then N days")

parser.add_argument('directory', help="delete files and folders in
this directory")

parser.print_help()

What's the idea behind add_help=False and the explicit print_help()?
dictKeys = (vars(args))
HowManyDays = dictKeys['days']

This can be simplified to

HowManyDays = args.days
if dirExists == False: print ("The directory is missing")

if x == False: ...

is normally written as

if not x: ...
DirListing = os.listdir(WhatDirectory)
for files in DirListing:

You might write this as

for filename in os.listdir(WhatDirectory): ...

or even

for filename in os.listdir(args.directory): ...

Personally I would put this stuff in a separate function like

def remove_old_files_and_folders(parent_directory, age_in_seconds):
...
# time.ctime converts epoch to a normal date

#print (time.ctime(CurrentTime))

# Get the date from seven days ago

WeekOldFileDate = CurrentTime - DaysToDelete

#print (CurrentTime)

#print (FileCreationTime)

#print (WeekOldFileDate)

Don't let out-commented code eclipse the actual code; remove it. If you want
to preserve it for eternity, have a look at version control systems, e. g.
Mercurial.
 
C

Chris Torek

Please review the code pasted below. I am wondering what other ways
there are of performing the same tasks. ... I imagine one enhancement
would be to create a function out of some of this.

Indeed -- "recommended styles" include defining a main()
function and calling main at the bottom, e.g., with:

if __name__ == '__main__':
main()

or:

if __name__ == '__main__':
main(sys.argv)

Something has also double-spaced your code; I will undo that below.
import os
import time
import shutil
import argparse

So far so good.

Let me start by putting the next parts into main(). I will use
the no-argument format since we can simply let parser.parse_args()
fetch sys.argv the same way you did (we could def main(argv) and
pass argv; this is sometimes useful for testing purposes, but here
it makes little difference one way or another).

def main():
CurrentTime = time.time()
epocDay = 86400 # seconds

(You could write 24 * 60 * 60, but again this is sort of a six-of-one
half-a-dozen-of-the-other situation.) What is not clear is why you
are setting these now, rather than later, closer to where they are
going to be used.

Many style guides, including Guido's and pylint's, dislike
UppercaseVariableName and camelCase style names within functions.
(UppercaseNames are reserved to classes.) I am leaving these alone
for now though.
parser = argparse.ArgumentParser(description = "Delete files and
folders in a directory N days old", add_help=False,
prog='directorycleaner', usage='%(prog)s 7 c:\\temp')

Line wrap has done bad things to this. Still, there is something
odd about it other than the line wrap:

- sometimes you write:
param_name = value
but sometimes you use:
param_name=value
and in one case you even use both:
usage= [string]

- you switch back and forth between single and double quoted
strings, without much reason.

Consistency is not *required* but it is generally a good idea.
(This is also minor, like deciding between 86400 or 24*60*60.
Eventually, though, lots of minor things add up.)

(I have to admit here that I tend to switch back and forth on
quotes too. :) )

Another "sixes" case occurs here with the backslashes: to get
a literal backslash in a string, you can use "raw strings". Here
it makes no difference but I will go ahead and do it just for
illustration:

parser = argparse.ArgumentParser(
description='Delete files and folders in a directory N days old',
add_help=False,
prog='directorycleaner',
usage=r'%(prog)s 7 c:\temp')

Finally, I am not sure why you do not want to allow a -h / --help
option (but if you take out the add_help=False, it's probably best
to take out the call to parser.print_help() too), and there is no
need to supply a usage= argument at all -- argparse() will build
one for you.
parser.add_argument('days', type=int, help="Numeric value: delete
files and folders older then N days")
parser.add_argument('directory', help="delete files and folders in
this directory")

(Again, line wrap has broken these; the fixes are obvious so I skip
over them here.)
parser.print_help()
args = parser.parse_args()

(So far so good, although again, you probably want to remove the call
to print_help() if you allow argparse to add a -h / --help option,
at least.)
dictKeys = (vars(args))

There is no *need* to do this, although it is documented as allowed.
I prefer to just use args. said:
HowManyDays = dictKeys['days']
WhatDirectory = dictKeys['directory']

so this would become:

HowManyDays = args.days
WhatDirectory = args.directory
print (HowManyDays)
print (WhatDirectory)

These are presumably debug statements and should be removed, but
until then, it might be good to prefix the output with what is
being printed (i.e., a debug message). (I have taken them out
of my copy, for output shown below.)

(In a fancier program, you could use the logging module and
logging.debug().)
DaysToDelete = HowManyDays * epocDay

Right before this would be a good place to create epocDay.
dirExists = os.path.exists(WhatDirectory)
if dirExists == False: print ("The directory is missing")

An explicit "if expr == False" is generally a bad idea -- if an
expression can be "considered boolean" (and the return value of
os.path.exists certainly can), just write "if not expr".

Most style guides suggest putting subsequent statements on new
lines, rather than right after the ":".

Checking that the directory exists seems reasonable enough. However,
after printing that it does not, you continue on with code that is
going to immediately raise an OSError exception:
DirListing = os.listdir(WhatDirectory)

In general, it is better to try to do the operation, and catch the
failure and do something about it at that point, than to test to
see if the operation is going to succeed. (Among other things,
this avoids a "race" between program 1 that says "does some directory
exist" and program 2 that says "delete the directory". If program
1 "wins" this race, the directory does exist at the point of the
test, then program 2 deletes it, then program 1 goes on to access
the now-deleted directory ... and crashes.)

I am using a Unix-like system so what I get may not be quite the
same as what you would get on a Windows-like system, but:

% cd /tmp
% python foo.py 1 /tmp/nosuchdir
The directory is missing
Traceback (most recent call last):
...
OSError: [Errno 2] No such file or directory: '/tmp/nosuchdir'
%

More significantly, consider this:

% python foo.py 1 /tmp/foo.py
Traceback (most recent call last):
...
OSError: [Errno 20] Not a directory: '/tmp/foo.py'
%

So instead of the three previous lines, consider:

try:
DirListing = os.listdir(WhatDirectory)
except OSError as err:
sys.exit("can't read %s: %s" % (WhatDirectory, err))

(you will need to "import sys", and I am using an older version of
Python and the "except OSError, err" syntax, but the effect is the
same). Now the second example results in:

% python foo.py 1 /tmp/foo.py
can't read /tmp/foo.py: [Errno 20] Not a directory: '/tmp/foo.py'
%
for files in DirListing:
# Get the absolute path of the file name
abspath = (os.path.join(WhatDirectory, files))

This is not necessarily an absolute path -- for instance, if the
program is run as:

python foo.py 7 rel\ative\path

the joined file names (on a Windows-like system) will be things
like "rel\ative\path\file.txt" and so on. I would suggest shortening
the variable name to just "path".

The outermost set of parentheses are unnecessary, too.
# Get the current creation time of the file in epoc format
# (midnight 1/1/1970)
FileCreationTime = (os.path.getctime(abspath))

Beware: on Unix-like systems, this gets a "time of last change"
rather than a "time of create". Even if you are sure you will use
Windows you may wish to use the file's mtime (time of last
"modification"; on Unix-like systems, the distinction between a
"modification" and a "change" is that "change" covers alterations
to the file's meta-data as well as the data, e.g., "chmod a-w file",
making it read-only, changes its ctime but not its mtime, while
"echo foo >> file" changes both its ctime and its mtime -- provided
it is not read-only, of course :) ).

Again, the outermost parentheses here are unnecessary.
# time.ctime converts epoch to a normal date
#print (time.ctime(CurrentTime))
# Get the date from seven days ago
WeekOldFileDate = CurrentTime - DaysToDelete
#print (CurrentTime)
#print (FileCreationTime)
#print (WeekOldFileDate)

#If the file is older than seve days doe something

Apparently, the program has evolved somewhat: originally you had
"seven days" hardcoded, now you have a variable number. The
comments, however, have not changed -- and the final variable name
is no longer appropriate.

It is probably also time to ditch the commented-out debug print
statements (and fix the comments, including the typo on the last
one above).
if FileCreationTime < WeekOldFileDate:
#check if the object is a file
if os.path.isfile(abspath): os.remove(abspath)
# It is not a file it is a directory
elif os.path.isdir(abspath): shutil.rmtree(abspath)

Again, the comment and code do not quite agree: the comment says
"if it is not a file it *is* a directory" but the code says "if it
is not a file, check to see if it is a directory", which leaves
open the possibility that it is some other kind of entity (this is
definitely possible on a Unix-like system, where it could be a
socket, symbolic link, or device node, for instance).

In this particular program, written the way it is, there is no
actual benefit (yet) to doing this, but I suggest moving the guts
of the "clean out a directory" process into a function. What
this allows is the ability to list more than one directory,
provided of course you also change the argument parser a bit.

Having put this all together (and neutered the actual file-or-directory
removing code) gives me the code below. There are still plenty of
things you could do with it -- for instance, exiting partway through
processing a list of directories if one in the middle does not exist
is perhaps not optimal:

% python foo.py 7 /tmp/dir1 /tmp/oops /tmp/dir2

(where /tmp/dir1 and /tmp/dir2 do exist, but /tmp/oops does not)
will clean out /tmp/dir1 but then exit without ever processing
/tmp/dir2. (There are lots of ways to handle this; you would
have to choose one and implement it.) Or, instead of the kind of
processing done here, you could generalize it into a Unix-like
"find" command. (Perhaps with a less-ugly syntax. :) ) The
"find" command can do what this script does:

find DIRLIST -ctime +N ( -type d -o -type f ) -exec rm -rf {} \;

but can also a great deal more since (a) it has many other options
than just -ctime, and (b) -exec will execute any arbitrary command.

---------------------------

import os
import time
import shutil
import argparse
import sys

def main():
"""
main program: parse arguments, and clean out directories.
"""
parser = argparse.ArgumentParser(
description="Delete files and folders in a directory N days old",
prog="directorycleaner")
parser.add_argument("days", type=int,
help="Numeric value: delete files and folders older than N days")
parser.add_argument("directory", nargs="+",
help="delete files and folders in this directory")

args = parser.parse_args()

for dirname in args.directory:
clean_dir(dirname, args.days)

def clean_dir(dirname, n_days):
"""
Clean one directory of files / subdirectories older than
the given number of days.
"""
time_to_live = n_days * 86400 # 86400 = seconds-per-day
current_time = time.time()

try:
contents = os.listdir(dirname)
except OSError, err:
sys.exit("can't read %s: %s" % (dirname, err))
for filename in contents:
# Get the path of the file name
path = os.path.join(dirname, filename)
# Get the creation time of the file
# NOTE: this only works on Windows-like systems
when_created = os.path.getctime(path)
# If the file/directory has expired, remove it
if when_created + time_to_live < current_time:
if os.path.isfile(path):
print "os.remove(%s)" % path
# It is not a file it is a directory
elif os.path.isdir(path):
print "shutil.rmtree(%s)" % path

if __name__ == "__main__":
main()
 
U

Ulrich Eckhardt

ad said:
Please review the code pasted below. I am wondering what other ways
there are of performing the same tasks.

On a unix system, you would call "find" with according arguments and then
handle the found files with "-exec rm ..." or something like that, but I see
you are on MS Windows.

args = parser.parse_args()

dictKeys = (vars(args))

The first of these looks okay, while I don't get the additional brackets in
the second one. Another habit I observe here is the Hungarian notation of
prefixing the type to the name and using camelCaps. PEP 8 (IIRC) has
something to say on the preferred naming. I'm not 100% against encoding the
type in the variable name in Python, since it lacks static type checking, I
would have chosen "key_dict" here though, or, due to the small size of the
overall program just "keys".
print (HowManyDays)

This puzzled me at first, again the useless additional brackets I thought.
However, in Python 3, "print" is a function, so that is correct. Still, it
should be "print(foo)" not "print (foo)".
for files in DirListing:

# Get the absolute path of the file name
abspath = (os.path.join(WhatDirectory, files))

"files" is just the name of a single file, right? In that case the name is a
bit confusing.
# Get the date from seven days ago
WeekOldFileDate = CurrentTime - DaysToDelete

You are repeating this calculation for every file in the loop.
if FileCreationTime < WeekOldFileDate:
#check if the object is a file
if os.path.isfile(abspath): os.remove(abspath)
# It is not a file it is a directory
elif os.path.isdir(abspath): shutil.rmtree(abspath)

I'm not sure, but I believe you could use shutil.rmtree() for both files and
directories. In any case, be prepared for the file still being open or
otherwise read-only, i.e. for having to handle errors.

Also, what if a directory is old but the content is new? Would this cause
the non-old content to be deleted?


Cheers!

Uli
 
A

ad

On a unix system, you would call "find" with according arguments and then
handle the found files with "-exec rm ..." or something like that, but I see
you are on MS Windows.



The first of these looks okay, while I don't get the additional brackets in
the second one. Another habit I observe here is the Hungarian notation of
prefixing the type to the name and using camelCaps. PEP 8 (IIRC) has
something to say on the preferred naming. I'm not 100% against encoding the
type in the variable name in Python, since it lacks static type checking,I
would have chosen "key_dict" here though, or, due to the small size of the
overall program just "keys".


This puzzled me at first, again the useless additional brackets I thought..
However, in Python 3, "print" is a function, so that is correct. Still, it
should be "print(foo)" not "print (foo)".



"files" is just the name of a single file, right? In that case the name is a
bit confusing.


You are repeating this calculation for every file in the loop.


I'm not sure, but I believe you could use shutil.rmtree() for both files and
directories. In any case, be prepared for the file still being open or
otherwise read-only, i.e. for having to handle errors.

Also, what if a directory is old but the content is new? Would this cause
the non-old content to be deleted?

Cheers!

Uli

Thank you guys very much for the excellent points. I will use this
information as a reference as I write more code and fix up the
existing script.

Chris, thank you for putting so much time into your post!

Until we type again...........
 
I

Iain King

Thank you guys very much for the excellent points. I will use this
information as a reference as I write more code and fix up the
existing script.

Chris, thank you for putting so much time into your post!

Until we type again...........


Wrote something to do the same basic thing a little while ago. Less
verbose than yours, and it only does files, not folders. If I was
going to do folders though, I'd do them by recursing into them and
pruning files, and not go by the folder modify date, which I don't
think changes the way you think it changes (for example, if a file
inside a folder got updated such that it shouln't be deleted it still
will be with your code if the folder modify date is old [this is on
Windows])

import os, glob, time, sys

if len(sys.argv) < 3:
print "USAGE: %s <PATTERN> <DAYS>" % sys.argv[0]
sys.exit(1)

pattern = sys.argv[1]
days = int(sys.argv[2])
threshold = days * 24 * 60 * 60
t = time.time()

for f in glob.glob(pattern):
if t - os.stat(f)[9] > threshold:
print f
os.remove(f)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,038
Messages
2,570,374
Members
47,019
Latest member
don ghjh

Latest Threads

Top