using functions and file renaming problem

Discussion in 'Python' started by hokiegal99, Jul 18, 2003.

  1. hokiegal99

    A few questions about the following code. How would I "wrap" this in a
    function, and do I need to?

    Also, how can I make the code smart enough to realize that, when a file
    has 2 or more bad characters in it, the code needs to run until all
    bad characters are gone? For example, if a file has the name
    "<bad*mac\file", the program has to run 3 times to get all three bad
    chars out of the file name.

    The passes look like this:

    1. <bad*mac\file becomes -bad*mac\file
    2. -bad*mac\file becomes -bad-mac\file
    3. -bad-mac\file becomes -bad-mac-file

    I think the problem is that once the program finds a bad char in a
    filename it replaces it with a dash, which creates a new filename that
    wasn't present when the program first ran, thus it has to run again to
    see the new filename.

    import os, re, string
    bad = re.compile(r'%2f|%25|[*?<>/\|\\]') #search for these.
    print " "
    setpath = raw_input("Path to the dir that you would like to clean: ")
    print " "
    print "--- Remove Bad Characters From Filenames ---"
    print " "
    for root, dirs, files in os.walk(setpath):
        for file in files:
            badchars = bad.findall(file)
            newfile = ''
            for badchar in badchars:
                newfile = file.replace(badchar,'-') #replace bad chars.
            if newfile:
                newpath = os.path.join(root,newfile)
                oldpath = os.path.join(root,file)
                os.rename(oldpath,newpath)
                print oldpath
                print newpath
    print " "
    print "--- Done ---"
    print " "
     
    hokiegal99, Jul 18, 2003

  2. Andy Jewell

    On Friday 18 Jul 2003 2:33 am, hokiegal99 wrote:
    > A few questions about the following code. How would I "wrap" this in a
    > function, and do I need to?
    >
    > Also, how can I make the code smart enough to realize that, when a file
    > has 2 or more bad characters in it, the code needs to run until all
    > bad characters are gone?


    It almost is ;-)

    > For example, if a file has the name
    > "<bad*mac\file" the program has to run 3 times to get all three bad
    > chars out of the file name.
    >
    > The passes look like this:
    >
    > 1. <bad*mac\file becomes -bad*mac\file
    > 2. -bad*mac\file becomes -bad-mac\file
    > 3. -bad-mac\file becomes -bad-mac-file
    >
    > I think the problem is that once the program finds a bad char in a
    > filename it replaces it with a dash, which creates a new filename that
    > wasn't present when the program first ran, thus it has to run again to
    > see the new filename.


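    No, the problem is that you're throwing away all but the last correction:
    each pass of the inner loop calls replace() on the original name (file),
    so only the last replacement survives. Read my comments below: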

    import os, re, string
    bad = re.compile(r'%2f|%25|[*?<>/\|\\]') #search for these.
    print " "
    setpath = raw_input("Path to the dir that you would like to clean: ")
    print " "
    print "--- Remove Bad Characters From Filenames ---"
    print " "
    for root, dirs, files in os.walk(setpath):
        for file in files:
            badchars = bad.findall(file) # find any bad characters
            newfile = file
            for badchar in badchars: # loop through each character in badchars
                # note that if badchars is empty, this loop is not entered
                # show what's happening
                print "replacing",badchar,"in",newfile,":",
                # replace all occurrences of this badchar with '-' and remember
                # it for next iteration of loop:
                newfile = newfile.replace(badchar,'-') #replace bad chars.
                print newfile
            if badchars: # were there any bad characters in the name?
                newpath = os.path.join(root,newfile)
                oldpath = os.path.join(root,file)
                os.rename(oldpath,newpath)
                print oldpath
                print newpath
    print " "
    print "--- Done ---"
    print " "
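A side note on a one-pass alternative: since `bad` is already a compiled regular expression, its `sub()` method can replace every match in a single call, which removes the need for the inner loop altogether. The sketch below is written for Python 3 (unlike the Python 2 code in this thread) and reuses the pattern from the original post; `clean_name` is a hypothetical helper name.

```python
import re

# Pattern from the original post: the URL-encoded sequences %2f ('/') and
# %25 ('%'), plus the single characters * ? < > / | and backslash.
bad = re.compile(r'%2f|%25|[*?<>/\|\\]')


def clean_name(name):
    """Return name with every match of `bad` replaced by '-' in one pass."""
    return bad.sub('-', name)


print(clean_name('<bad*mac\\file'))   # -bad-mac-file
```

The pattern is case-sensitive, so `%2F` would not be matched; passing `re.IGNORECASE` to `re.compile()` would cover both spellings if that matters for your files.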


    wrt wrapping it up in a function, here's a starter for you... "fill in the
    blanks":


    -------8<-----------
    def cleanup(setpath):
        # compile regex for finding bad characters

        # walk directory tree...

            # find any bad characters

            # loop through each character in badchars

                # replace all occurrences of this badchar with '-' and remember
                # it for the next iteration of the loop

            # were there any bad characters in the name?

        pass  # placeholder so the skeleton compiles; replace with your code
    -------8<-----------

    To call this you could do (on Linux; Mac paths are different):

    -------8<-----------
    cleanup("/home/andy/Python")
    -------8<-----------
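One way the blanks in the starter above might be filled in, following its comments step by step. This is a sketch in Python 3 rather than the thread's Python 2; returning a list of `(oldpath, newpath)` pairs instead of printing them is an assumption made so the result can be checked.

```python
import os
import re


def cleanup(setpath):
    """Rename files under setpath whose names contain bad characters.

    Returns a list of (oldpath, newpath) pairs, one per renamed file.
    """
    # compile regex for finding bad characters
    bad = re.compile(r'%2f|%25|[*?<>/\|\\]')
    renamed = []
    # walk directory tree...
    for root, dirs, files in os.walk(setpath):
        for name in files:
            # find any bad characters
            badchars = bad.findall(name)
            newname = name
            # loop through each character in badchars
            for badchar in badchars:
                # replace all occurrences of this badchar with '-' and
                # remember the result for the next iteration
                newname = newname.replace(badchar, '-')
            # were there any bad characters in the name?
            if badchars:
                oldpath = os.path.join(root, name)
                newpath = os.path.join(root, newname)
                os.rename(oldpath, newpath)
                renamed.append((oldpath, newpath))
    return renamed
```

It is called the same way as in the example above. One thing the thread doesn't cover: if the cleaned name already exists, `os.rename()` silently replaces that file on POSIX systems (and raises an error on Windows), so a real script may want to check `os.path.exists(newpath)` first.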

    hope that helps
    -andyj
     
    Andy Jewell, Jul 18, 2003

  3. hokiegal99

    (Bengt Richter) wrote in message
    > This looks like an old post that ignores some responses you got to your original post like this.
    > Did some mail get lost? Or was this an accidental repost of something old? I still see
    > indentation misalignments, probably due to mixing tabs and spaces (bad news in python ;-)
    >
    > Regards,
    > Bengt Richter


    Sorry Bengt, I overlooked some responses to an earlier, similar
    question. I have too many computers in too many places, so forgive me.
    Thanks for taking the time to tell me again!!! I'll clean up the
    indentation... I promise.
     
    hokiegal99, Jul 18, 2003
  4. Andy Jewell

    On Friday 18 Jul 2003 11:16 pm, hokiegal99 wrote:
    > Thanks again for the help Andy! One last question: What is the advantage
    > of placing code in a function? I don't see how having this bit of code
    > in a function improves it any. Could someone explain this?
    >
    > Thanks!

    8<--- (old quotes)

    The 'benefit' of functions is only really reaped when you have a specific need
    for them! You don't *have* to use them if you don't *need* to (but they can
    still improve the readability of your code).

    Consider the following contrived example:

    ----------8<------------
    # somewhere in the dark recesses of a large project...
    . . .
    for filename in os.listdir(cfg.userdir):
        newname = filename
        for ch in cfg.badchars:
            newname = newname.replace(ch,"-")
        if newname != filename:
            os.rename(os.path.join(cfg.userdir,filename),
                      os.path.join(cfg.userdir,newname))
    . . .
    . . .
    # in another dark corner...

    . . .
    for filename in os.listdir(cfg.tempdir):
        newname = filename
        for ch in cfg.badchars:
            newname = newname.replace(ch,"-")
        if newname != filename:
            os.rename(os.path.join(cfg.tempdir,filename),
                      os.path.join(cfg.tempdir,newname))
    . . .
    # somewhere else...

    . . .
    for filename in os.listdir(cfg.extradir):
        newname = filename
        for ch in cfg.badchars:
            newname = newname.replace(ch,"-")
        if newname != filename:
            os.rename(os.path.join(cfg.extradir,filename),
                      os.path.join(cfg.extradir,newname))
    . . .
    ----------8<------------

    See the repetition? ;-)

    Imagine a situation where you need to do something far more complicated over,
    and over again... It's not very programmer-efficient, and it makes the code
    longer, too - thus costing more to write (time) and more to store (disks).

    Imagine having to change the behaviour of this 'hard-coded' routine, and what
    would happen if you missed one... however, if it is in a function, you only
    have *one* place to change it.

    When we generalise the algorithm and put it into a function we can do:

    ----------8<------------

    . . .
    . . .

    # somewhere near the top of the project code...
    def cleanup_filenames(dir):

        """ renames any files within dir that contain bad characters
        (i.e. ones in cfg.badchars). Does not walk the directory tree.
        """

        for filename in os.listdir(dir):
            newname = filename
            for ch in cfg.badchars:
                newname = newname.replace(ch,"-")
            if newname != filename:
                os.rename(os.path.join(dir,filename),
                          os.path.join(dir,newname))

    . . .
    . . .

    # somewhere in the dark recesses of a large project...
    . . .
    cleanup_filenames(cfg.userdir)
    . . .
    . . .
    # in another dark corner...
    . . .
    cleanup_filenames(cfg.tempdir)
    . . .
    # somewhere else...
    . . .
    cleanup_filenames(cfg.extradir)
    . . .

    ----------8<------------

    Even in this small, contrived example, we've saved about 13 lines of code (ok,
    that's notwithstanding the blank lines and the """ docstring """ at the top
    of the function).

    There's another twist, too. In the docstring for cleanup_filenames it says
    "Does not walk the directory tree." because we didn't code it to deal with
    subdirectories. But we could, without using os.walk...

    Directories form a tree structure, and the easiest way to process trees is by
    using /recursion/, which means functions that call themselves. An old
    programmer's joke is this:

    Recursion, defn. [if not understood] see Recursion.

    Each time you call a function, it gets a brand new environment, called the
    'local scope'. All variables inside this scope are private; they may have
    the same names, but they refer to different objects. This can be really
    handy...
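A tiny illustration of that point (Python 3; `describe` is a made-up example function): every recursive call has its own `label`, so the inner calls do not overwrite the value held by the outer ones.

```python
def describe(level, max_level):
    # 'label' is local: each call to describe() gets its own copy
    label = "level %d" % level
    if level < max_level:
        deeper = describe(level + 1, max_level)   # inner call, fresh scope
    else:
        deeper = []                               # stop recursing here
    # this call's 'label' is unchanged by the inner calls above
    return [label] + deeper


print(describe(0, 2))   # ['level 0', 'level 1', 'level 2']
```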

    ----------8<------------

    def cleanup_filenames(dir):

        """ renames any files within dir that contain bad characters
        (i.e. ones in cfg.badchars). Walks the directory tree to process
        subdirectories.
        """

        for filename in os.listdir(dir):
            newname = filename
            for ch in cfg.badchars:
                newname = newname.replace(ch,"-")
            if newname != filename:
                os.rename(os.path.join(dir,filename),
                          os.path.join(dir,newname))
            # recurse if subdirectory...
            if os.path.isdir(os.path.join(dir,newname)):
                cleanup_filenames(os.path.join(dir,newname))

    ----------8<------------

    This version *DOES* deal with subdirectories... with only two extra lines,
    too! Trying to write this without recursion would be a nightmare (even in
    Python).
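For experimenting, here is a self-contained sketch of the recursive version in Python 3. It assumes the bad characters are passed in as a string rather than read from the `cfg` object (which isn't shown in the thread), uses `dirpath` instead of `dir` to avoid shadowing the built-in, and skips symbolic links so that a link pointing back up the tree cannot cause endless recursion. `BADCHARS` and the return value (a count of renamed entries) are illustrative choices.

```python
import os

# Hypothetical stand-in for cfg.badchars: * ? < > | and backslash
BADCHARS = '*?<>|\\'


def cleanup_filenames(dirpath, badchars=BADCHARS):
    """Rename entries in dirpath that contain any of badchars, then
    recurse into subdirectories. Returns the number of entries renamed."""
    renamed = 0
    for filename in os.listdir(dirpath):
        newname = filename
        for ch in badchars:
            # str.replace() returns a new string, so keep the result
            newname = newname.replace(ch, '-')
        if newname != filename:
            os.rename(os.path.join(dirpath, filename),
                      os.path.join(dirpath, newname))
            renamed += 1
        path = os.path.join(dirpath, newname)
        # recurse into real subdirectories only (not symlinks to them)
        if os.path.isdir(path) and not os.path.islink(path):
            renamed += cleanup_filenames(path, badchars)
    return renamed
```

The recursion stops on its own at directories that contain no subdirectories, since nothing in them triggers another call.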

    A very important thing to note, however, is that there is a HARD LIMIT on how
    deeply function calls can nest, called the recursion limit (1000 by default
    in CPython; see sys.getrecursionlimit() and sys.setrecursionlimit()):

    ----------8<------------
    >>> n = 1
    >>> def rec():
    ...     global n
    ...     n = n + 1
    ...     rec()
    ...
    >>> rec()
    . . .
    (huge traceback list)
    . . .
    RuntimeError: maximum recursion depth exceeded
    >>> n
    991
    ----------8<------------
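In current Python 3 versions the error is `RecursionError` (a subclass of `RuntimeError`, added in Python 3.5), and the limit can be inspected with `sys.getrecursionlimit()`. The probe below is only a demonstration, assuming default settings; the exact depth it reports depends on how many frames are already on the stack, so no particular number is promised.

```python
import sys


def probe(n=1):
    """Recurse with no get-out clause until Python refuses to go deeper.

    Returns the depth of the deepest call that succeeded."""
    try:
        return probe(n + 1)
    except RecursionError:
        return n


print(sys.getrecursionlimit())   # the configured limit (1000 by default)
print(probe())                   # a little below the limit
```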

    Another very important thing about recursion is that a recursive function
    should *ALWAYS* have a 'get-out-clause', a condition that stops the
    recursion. Guess what happens if you don't have one ... ;-)
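A small example of a get-out clause, using a directory tree modelled as nested dictionaries (a hypothetical representation chosen so the sketch runs without touching the disk): a value of `None` is a file, and a file is where the recursion stops.

```python
def count_files(tree):
    """Count the files in a nested dict: None marks a file,
    a dict marks a subdirectory."""
    total = 0
    for name, child in tree.items():
        if child is None:
            total += 1                    # get-out clause: files have no children
        else:
            total += count_files(child)   # recurse into the subdirectory
    return total


tree = {"a": None, "sub": {"b": None, "deeper": {"c": None}}, "empty": {}}
print(count_files(tree))   # 3
```

An empty directory also ends the recursion, because its loop body never runs.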

    Finally (at least for now), functions also provide a way to break down your
    code into logical sections. Many programmers will write the higher-level
    functions first, delegating 'complicated bits' to further sub-functions as
    they go, and worry about implementing them once they've got the overall
    algorithm finished. This allows one to concentrate on the right level of
    detail, rather than getting bogged down in the finer points: you just make up
    names for functions that you're *going* to implement later. Sometimes, you
    might make a 'stub' like:

    def doofer(dooby, doo):
        pass

    so that your program is /syntactically/ correct, and will run (to a certain
    degree). This allows debugging to proceed before you have written
    everything. You'd do this for functions which aren't *essential* to the
    program, but maybe add 'special features', for example, additional
    error-checking or output formatting.
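A sketch of that top-down style (all function names here are made up for illustration): the high-level `process()` is written first, and `check()` is left as a stub, yet the program already runs.

```python
def process(names):
    """High-level routine, written first; details are delegated."""
    results = []
    for name in names:
        cleaned = clean(name)
        check(name, cleaned)      # extra error-checking, not written yet
        results.append(cleaned)
    return results


def clean(name):
    # the essential part, implemented
    return name.replace('*', '-')


def check(old, new):
    # stub: syntactically valid, does nothing for now
    pass


print(process(['a*b', 'plain']))   # ['a-b', 'plain']
```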

    A sort of extension of the function idea is 'modules', which make functions
    and other objects available to other 'client' programs. When you say:

    import os

    you are making all the functions and objects of the os module available to
    your own program (as os.listdir, os.rename and so on), without having to
    re-write them. This enables programmers to share their functions and other
    code as convenient 'black boxes'. Modules, however, are a slightly more
    advanced topic.


    Hope that helps.

    -andyj
     
    Andy Jewell, Jul 19, 2003
  5. hokiegal99

    My scripts aren't long and complex, so I don't really *need* to use
    functions. But the idea of using them is appealing to me because it
    seems the right thing to do from a design point of view. I can see how
    larger, more complex programs would get out of hand if the programmer
    did not use functions so they'd be absolutely necessary there. But if
    they allow larger programs to have a better overall design that's more
    compact and readable (like your examples showed) then one could argue
    that they would do the same for smaller, simpler programs too.

    Thanks for the indepth explanation. It was very helpful. I'm going to
    try using functions within my fix_files.py script.


     
    hokiegal99, Jul 19, 2003
  6. Peter Hansen

    hokiegal99 wrote:
    >
    > My scripts aren't long and complex, so I don't really *need* to use
    > functions. But the idea of using them is appealing to me because it
    > seems the right thing to do from a design point of view. I can see how
    > larger, more complex programs would get out of hand if the programmer
    > did not use functions so they'd be absolutely necessary there. But if
    > they allow larger programs to have a better overall design that's more
    > compact and readable (like your examples showed) then one could argue
    > that they would do the same for smaller, simpler programs too.


    Uh oh! You're being poisoned with the meme of "right design".

    Don't use functions because they "seem the right thing to do", use
    them because *they reduce repetition*, or simplify code by *making
    it more readable*.

    If you aren't reducing repetition with them you are losing the
    primary benefit. If you also aren't making your code more readable,
    don't use functions.

    Whatever you do, don't use them just because somebody has convinced
    you "they're the right thing".

    -Peter
     
    Peter Hansen, Jul 19, 2003
