[perl-python] find & replace strings for all files in a dir

Discussion in 'Python' started by Xah Lee, Jan 31, 2005.

  1. Xah Lee

    Xah Lee Guest

    Suppose you want to find & replace a string in all the files in a
    directory.
    Here's the code:

    # -*- coding: utf-8 -*-
    # Python 2 (uses the print statement and os.path.walk)

    import os,sys

    mydir= '/Users/t/web'

    findStr='<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 FINAL//EN">'
    repStr='<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'

    def replaceStringInFile(findStr,repStr,filePath):
        "replaces all findStr by repStr in file filePath"
        tempName=filePath+'~~~'
        input = open(filePath)
        output = open(tempName,'w')

        for s in input:
            output.write(s.replace(findStr,repStr))
        output.close()
        input.close()
        os.rename(tempName,filePath)
        print filePath

    def myfun(dummy, dirr, filess):
        for child in filess:
            if '.html' == os.path.splitext(child)[1] and os.path.isfile(dirr+'/'+child):
                replaceStringInFile(findStr,repStr,dirr+'/'+child)

    os.path.walk(mydir, myfun, 3)


    Note that files will be overwritten.
    Be sure to back up the folder before you run it.

    Try editing the code to suit your needs.
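
For readers on Python 3, where `os.path.walk` and the `print` statement no longer exist, the same idea can be sketched roughly as follows. This is only an illustration, not the code from the post: the names `replace_in_file` and `replace_in_tree` are made up here, and it assumes the files are UTF-8 text small enough to read into memory.

```python
import os


def replace_in_file(find_str, rep_str, file_path):
    """Replace every find_str with rep_str in file_path; return the match count."""
    # newline="" leaves the file's existing line endings untouched.
    with open(file_path, encoding="utf-8", newline="") as src:
        text = src.read()
    count = text.count(find_str)
    if count:
        # Write to a temporary file first, then swap it into place.
        tmp_path = file_path + "~~~"
        with open(tmp_path, "w", encoding="utf-8", newline="") as dst:
            dst.write(text.replace(find_str, rep_str))
        os.replace(tmp_path, file_path)
    return count


def replace_in_tree(root, find_str, rep_str, suffix=".html"):
    """Walk root recursively; return the paths of the files that were changed."""
    changed = []
    for dir_path, _dir_names, file_names in os.walk(root):
        for name in file_names:
            path = os.path.join(dir_path, name)
            if os.path.splitext(name)[1] == suffix and os.path.isfile(path):
                if replace_in_file(find_str, rep_str, path):
                    changed.append(path)
    return changed
```

Unlike the script above, this sketch rewrites a file only when the search string actually occurs in it, so untouched files keep their timestamps.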

    previous tips can be found at:
    http://xahlee.org/perl-python/python.html

    ---------------------------------------
    The following is a Perl version I wrote a few years ago.
    Note: if regex is turned on, correctness is not guaranteed.
    It is very difficult, if not impossible, in Perl to move regex patterns
    around and preserve their meanings.
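
The Perl script switches between literal and regex matching by passing the search string through `quotemeta`. A rough Python counterpart is `re.escape`; the sketch below is an illustration only, and the helper name `replace_all` and its `use_regex` flag are hypothetical. In literal mode the replacement is supplied through a function so that backslashes in it are inserted verbatim rather than read as escapes.

```python
import re


def replace_all(text, find, rep, use_regex=False):
    """Return (new_text, count); find is a regex only when use_regex is true."""
    if use_regex:
        # In regex mode rep may contain group references such as \1.
        return re.subn(find, rep, text)
    # A callable replacement is inserted as-is, with no escape processing.
    return re.subn(re.escape(find), lambda _match: rep, text)


print(replace_all("a.b a*b", "a.b", "X"))        # ('X a*b', 1)
print(replace_all("a.b a*b", "a.b", "X", True))  # ('X X', 2)
```

The second call shows why the two modes differ: read as a regex, the dot in `a.b` matches any character, so `a*b` is replaced as well.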


    #!/usr/local/bin/perl

    =pod

    Description:
    This script does find and replace on a given folder recursively.

    Features:
    * multiple Find and Replace string pairs can be given.
    * The find/replace strings can be set to regex or literal.
    * Files can be filtered according to file name suffix matching or other
    criteria.
    * Backup copies of original files will be made at a user specified
    folder that preserves all folder structures of original folder.
    * A report will be generated that indicates which files have been
    changed, how many changes, and total number of files changed.
    * Files will retain their owner/group/permissions settings.

    usage:
    1. edit the parts under the section '#-- arguments --'.
    2. edit the subroutine fileFilterQ to set which file will be checked or
    skipped.

    to do:
    * in the report, print the strings that are changed, possibly with
    surrounding lines.
    * allow just find without replace.
    * add the GNU syntax for unix command prompt.
    * Report if backup directory exists already, or provide toggle to
    overwrite, or some other smarties.

    Date created: 2000/02
    Author: Xah

    =cut

    #-- modules --

    use strict;
    use File::Find;
    use File::Path;
    use File::Copy;
    use Data::Dumper;

    #-- arguments --

    # the folder to search.
    my $folderPath = q[/Users/t/web/UnixResource_dir];

    # this is the backup folder path.
    my $backupFolderPath = q[/Users/t/xxxb];

    my %findReplaceH = (
        q[<pre><a href="freebooks.html">back to Unix Pestilence</a><pre>] =>
            q[<pre>? Back to <a href="freebooks.html">Unix Pestilence</a></pre>],
    );

    # $useRegexQ has values 1 or 0. If 1, interprets the pairs in
    # %findReplaceH to be regex.
    my $useRegexQ = 0;

    # in bytes. larger files will be skipped
    my $fileSizeLimit = 500 * 1000;


    #-- globals --

    $folderPath =~ s[/$][]; # e.g. '/home/joe/public_html'
    $backupFolderPath =~ s[/$][]; # e.g. '/tmp/joe_back';

    $folderPath =~ m[/(\w+)$];
    my $previousDir = $`; # e.g. '/home/joe'
    my $lastDir = $1; # e.g. 'public_html'
    my $backupRoot = $backupFolderPath . '/' . $1; # e.g. '/tmp/joe_back/public_html'

    my $refLargeFiles = [];
    my $totalFileChangedCount = 0;

    #-- subroutines --

    # fileFilterQ($fullFilePath) return true if file is desired.
    sub fileFilterQ ($) {
        my $fileName = $_[0];

        if ((-s $fileName) > $fileSizeLimit) {
            push (@$refLargeFiles, $fileName);
            return 0;
        };
        if ($fileName =~ m{\.html$}) {
            print "processing: $fileName\n";
            return 1;
        };

        ## if (-d $fileName) {return 0;}; # directory
        ## if (not (-T $fileName)) {return 0;}; # not text file

        return 0;
    };

    # go through each file, accumulate a hash.
    sub processFile {
        my $currentFile = $File::Find::name; # full path spec
        my $currentDir = $File::Find::dir;
        my $currentFileName = $_;

        if (not fileFilterQ($currentFile)) {
            return 1;
        }

        # open file. Read in the whole file.
        if (not(open FILE, "<$currentFile")) {die("Error opening file: $!");};
        my $wholeFileString;
        {local $/ = undef; $wholeFileString = <FILE>;};
        if (not(close(FILE))) {die("Error closing file: $!");};

        # do the replacement.
        my $replaceCount = 0;

        foreach my $key1 (keys %findReplaceH) {
            my $pattern = ($useRegexQ ? $key1 : quotemeta($key1));
            $replaceCount = $replaceCount + ($wholeFileString =~ s/$pattern/$findReplaceH{$key1}/g);
        };

        if ($replaceCount > 0) { # replacement has happened
            $totalFileChangedCount++;
            # do backup
            # make a directory in the backup path, make a backup copy.
            # \Q...\E keeps regex metacharacters in the folder path literal.
            my $pathAdd = $currentDir; $pathAdd =~ s[^\Q$folderPath\E][];
            mkpath("$backupRoot/$pathAdd", 0, 0777);
            copy($currentFile, "$backupRoot/$pathAdd/$currentFileName") or
                die "error: copying file failed on $currentFile\n$!";

            # write to the original
            # get the file mode.
            my ($mode, $uid, $gid) = (stat($currentFile))[2,4,5];

            # write out a new file.
            if (not(open OUTFILE, ">$currentFile")) {die("Error opening file: $!");};
            print OUTFILE $wholeFileString;
            if (not(close(OUTFILE))) {die("Error closing file: $!");};

            # set the file mode.
            chmod($mode, $currentFile);
            chown($uid, $gid, $currentFile);

            # single quotes keep $* and $@ from being interpolated.
            print '-----^$*%$@#-------------------------------', "\n";
            print "$replaceCount replacements made at\n";
            print "$currentFile\n";
        }

    };


    #-- main body --

    find(\&processFile, $folderPath);

    print "--------------------------------------------\n\n\n";
    print "Total of $totalFileChangedCount files changed.\n";

    if (scalar @$refLargeFiles > 0) {
        print "The following large files are skipped:\n";
        print Dumper($refLargeFiles);
    }


    __END__
    Xah

    http://xahlee.org/PageTwo_dir/more.html
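
For comparison with the Python version at the top of the post, the backup step of the Perl script (copy the original into a backup folder that mirrors the source tree, then overwrite in place while keeping the permission bits) might look something like this in Python 3. It is a sketch under assumptions, not a translation of the script: `backup_then_write`, `source_root` and `backup_root` are hypothetical names, the text is assumed to be UTF-8, and ownership (`chown`) is not handled.

```python
import os
import shutil
import stat


def backup_then_write(file_path, new_text, source_root, backup_root):
    """Copy file_path into backup_root, then overwrite it with new_text."""
    # Mirror the file's position relative to source_root inside backup_root.
    rel_path = os.path.relpath(file_path, source_root)
    backup_path = os.path.join(backup_root, rel_path)
    os.makedirs(os.path.dirname(backup_path), exist_ok=True)
    shutil.copy2(file_path, backup_path)  # copy2 also copies mode and timestamps

    # Remember the permission bits, rewrite the file, then reapply them.
    mode = stat.S_IMODE(os.stat(file_path).st_mode)
    with open(file_path, "w", encoding="utf-8", newline="") as out:
        out.write(new_text)
    os.chmod(file_path, mode)
    return backup_path
```

Taking the backup before the original is opened for writing means a failed copy raises an exception while the original file is still intact.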
    Xah Lee, Jan 31, 2005
    #1

  2. YYusenet

    YYusenet Guest

    Xah Lee wrote:
    > suppose you want to do find & replace of string of all files in a
    > directory.
    > here's the code:

    [snip]
    > Xah
    >
    > http://xahlee.org/PageTwo_dir/more.html
    >


    When are you going to take the hint (from everybody in
    comp.lang.perl.misc and comp.lang.python) to stop posting? Your posts
    do not help anybody and will only hurt a beginner. *PLEASE STOP POSTING*!

    --
    k g a b e r t (@at@) x m i s s i o n (.dot.) c o m

    * After "extensive" research, I noticed
    * that I received 12
    * spam e-mail messages after just two
    * posts on usenet groups. If you want
    * to email me, use the "encrypted"
    * email address at the beginning of my
    * signature.
    YYusenet, Jan 31, 2005
    #2
