combining the path and fileinput modules

wo_shi_big_stomach

Newbie to python writing a script to recurse a directory tree and delete
the first line of a file if it contains a given string. I get the same
error on a Mac running OS X 10.4.8 and FreeBSD 6.1.

Here's the script:

# start of program

# p.pl - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

import fileinput
import os
import re
import string
import sys
from path import path

# recurse dirs
dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
    #
    # test:
    # print f
    #
    # open file, search, change if necessary, write backup
    for line in fileinput.input(f, inplace=1, backup='.bak'):
        # check first line only
        if fileinput.isfirstline():
            if not re.search('^From ',line):
                print line.rstrip('\n')
        # just print all other lines
        if not fileinput.isfirstline():
            print line.rstrip('\n')
    fileinput.close()
# end of program

The script produces this error:

Traceback (most recent call last):
  File "./p", line 22, in ?
    for line in fileinput.input(f, inplace=1, backup='.bak'):
  File "/sw/lib/python2.4/fileinput.py", line 231, in next
    line = self.readline()
  File "/sw/lib/python2.4/fileinput.py", line 300, in readline
    os.rename(self._filename, self._backupfilename)
OSError: [Errno 21] Is a directory

If I uncomment that test routine, and comment out the fileinput stuff,
the program DOES print the full pathname/filename for the variable f.

Many thanks for clues as to why fileinput.input doesn't like f.
 
Rob Wolfe

wo_shi_big_stomach said:
Newbie to python writing a script to recurse a directory tree and delete
the first line of a file if it contains a given string. I get the same
error on a Mac running OS X 10.4.8 and FreeBSD 6.1.

Here's the script:

# start of program

# p.pl - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

import fileinput
import os
import re
import string
import sys
from path import path

# recurse dirs
dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
    #
    # test:
    # print f

Are you absolutely sure that f never holds the path
to a directory rather than a file?
Add this:

f = filter(os.path.isfile, f)

and try one more time.
 
wo_shi_big_stomach

Are you absolutely sure that f never holds the path
to a directory rather than a file?
Add this:

f = filter(os.path.isfile, f)

and try one more time.

Sorry, no joy. Printing f then produces:

rppp
rppppp
rppppp
rpppr
rppppp
rpppP
rppppp
rppppp

which I assure you are not the filenames in this directory.

I've tried this with f and f.name. The former prints the full pathname
and filename; the latter prints just the filename. But neither works
with the fileinput.input() call below.

I get the same error with the filtered mod as before:

  File "./p", line 23, in ?
    for line in fileinput.input(f, inplace=1, backup='.bak'):

Thanks again for info on what to feed fileinput.input()
 
Gabriel Genellina

wo_shi_big_stomach said:
Sorry, no joy. Printing f then produces:

rppp
rppppp
rppppp

The filter should be applied to walkfiles. Something like this:

dir = path('/home/wsbs/Maildir')
for f in filter(os.path.isfile, dir.walkfiles('*')):
    #
    # test:
    # print f
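
For readers on Python 3, a small sketch of why the placement of the filter matters. filter() iterates over whatever it is handed, so a single path string is tested one character at a time, while a sequence of paths is tested one path at a time. The scratch directory and the names msg1 and cur are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    msg = os.path.join(root, "msg1")   # hypothetical mail file
    sub = os.path.join(root, "cur")    # hypothetical subdirectory
    with open(msg, "w") as fh:
        fh.write("From someone\n")
    os.mkdir(sub)

    # What filter() would test if handed the single string msg:
    # its individual characters, not the path as a whole.
    tested_items = list(msg)

    # Filtering the sequence of paths keeps the file and drops the directory.
    kept = list(filter(os.path.isfile, [msg, sub]))

print(kept == [msg])
```

This is consistent with the odd single-letter output reported earlier in the thread, although the thread itself does not spell out that explanation.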


--
Gabriel Genellina
Softlab SRL

__________________________________________________
Correo Yahoo!
Espacio para todos tus mensajes, antivirus y antispam ¡gratis!
¡Abrí tu cuenta ya! - http://correo.yahoo.com.ar
 
wo_shi_big_stomach

Gabriel said:
The filter should be applied to walkfiles. Something like this:

dir = path('/home/wsbs/Maildir')
for f in filter(os.path.isfile, dir.walkfiles('*')):
    #
    # test:
    # print f

Thanks, this way f will print the full pathname/filename. But f already
does that using Jason Orendorff's path module:

dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
    print f

Printing the full path/filename isn't the problem. The problem instead
is how to supply f to fileinput.input().

Either the path or the os.path methods cause this line:

for line in fileinput.input(f, inplace=1, backup='.bak'):

to throw this error:

  File "./p2.py", line 23, in ?
    for line in fileinput.input(f, inplace=1, backup='.bak'):

At this point I believe the error has to do with fileinput, not the path
or os.path modules.

If I give fileinput.input() a hardcoded path/filename in place of 'f'
the program runs. However the program will not accept either f or 'f' as
an argument to fileinput.input().

Again, thanks for guidance on the care and feeding of fileinput.input()

/wsbs

import fileinput
import os
import re
import string
import sys
from path import path

# p.pl - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

# recurse dirs
dir = path('/home/wsbs/Maildir')
#for f in dir.walkfiles('*'):
for f in filter(os.path.isfile, dir.walkfiles('*')):
    #
    # test: this will print the full path/filename of each file
    print f
    #
    # open file, search, change if necessary, write backup
    # for line in fileinput.input('f', inplace=1, backup='.bak'):
    #     # just print 2nd and subsequent lines
    #     if not fileinput.isfirstline():
    #         print line.rstrip('\n')
    #     # check first line only
    #     elif fileinput.isfirstline():
    #         if not re.search('^From ',line):
    #             print line.rstrip('\n')
    # fileinput.close()
 
Gabriel Genellina

wo_shi_big_stomach said:
Thanks, this way f will print the full pathname/filename. But f already
does that using Jason Orendorff's path module:

dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
    print f

The filter is used to exclude directories. fileinput can't handle directories.

At this point I believe the error has to do with fileinput, not the path
or os.path modules.

If I give fileinput.input() a hardcoded path/filename in place of 'f'
the program runs. However the program will not accept either f or 'f' as
an argument to fileinput.input().

Have you tried it with (f,) ?
Notice that *this* error is not the same as your previous error.
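
A side note on the (f,) spelling: a trailing comma inside a call's parentheses does not build a tuple; only the extra pair of parentheses does. A minimal Python 3 sketch, where describe is a made-up helper (not part of fileinput) and the path is illustrative only:

```python
def describe(arg):
    """Return the type name of whatever was passed in."""
    return type(arg).__name__

f = "/home/wsbs/Maildir/cur/msg1"  # illustrative path, not from the thread

plain = describe(f,)      # exactly the same call as describe(f)
wrapped = describe((f,))  # passes a one-element tuple

print(plain, wrapped)
```

So fileinput.input(f,) hands over f itself, while fileinput.input((f,)) hands over a one-element sequence of file names.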


--
Gabriel Genellina
Softlab SRL

 
wo_shi_big_stomach

Gabriel said:
The filter is used to exclude directories. fileinput can't handle
directories.

???

Both routines above produce identical output -- full path/filenames.
Neither prints just a directory name.

Tried with (f,) ?
Notice that *this* error is not the same as your previous error.

  File "p2.py", line 23, in ?
    for line in fileinput.input(f,):
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 231, in next
    line = self.readline()
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 320, in readline
    self._file = open(self._filename, "r")

This looks similar to before -- fileinput.input() still isn't operating
on the input.

Again, I'm looking to 1) walk through all files in a directory tree and 2)
using fileinput, evaluate and possibly edit the files.

The current version of the program is below.

thanks!

/wsbs

# start of program
import fileinput
import os
import re
import string
import sys
from path import path

# p2.py - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

# recurse dirs
dir = path('/home/wsbs/Maildir')
#for f in dir.walkfiles('*'):
for f in filter(os.path.isfile, dir.walkfiles('*')):
    #
    # test: this will print the full path/filename of each file
    # print f
    #
    # open file, search, change if necessary, write backup
    for line in fileinput.input(f,):
        # just print 2nd and subsequent lines
        if not fileinput.isfirstline():
            print line.rstrip('\n')
        # check first line only
        elif fileinput.isfirstline():
            if not re.search('^From ',line):
                print line.rstrip('\n')
    fileinput.close()

# end of program
 
Dennis Lee Bieber

  File "p2.py", line 23, in ?
    for line in fileinput.input(f,):
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 231, in next
    line = self.readline()
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 320, in readline
    self._file = open(self._filename, "r")

This looks similar to before -- fileinput.input() still isn't operating
on the input.

And where is the actual exception message line -- the one with the
error code/description?

dir = path('/home/wsbs/Maildir')
#for f in dir.walkfiles('*'):
for f in filter(os.path.isfile, dir.walkfiles('*')):

If I understand the documentation of fileinput, you shouldn't even
need this output loop; fileinput is designed to expect a list of files
(that it works with a single file seems an afterthought)
    for line in fileinput.input(f,):

for line in fileinput.input(filter(os.path.isfile,
                                   dir.walkfiles("*")),
                            inplace=1):

should handle all the files... I don't have this third party path
module, so the directory tree walking isn't active, but...
...             os.listdir(os.getcwd()))): #no inplace, just a test
...     if fileinput.isfirstline():
...         print "\n%s" % fileinput.filename()
...         print line
...     else:
...         print ".",

I do seem to be getting some sort of error at the end, but the other
files are seen:


01_Title.jpg
ÿØÿà
.. . .
405themovie_320x224.mpg

73P.gif
GIF89a1r
÷
.. . .
ana_cm_d_04.mov

.. . . . . . . . . . . . . .
ana_cm_d_15.mov

..
APRS101.pdf
%PDF-1.2

.. . . . . . . . . . . . . . . . . . . . . . . . . .
Bestiaria 20060328 1018.sql.zip
PK

Blue Ball Machine.htm
<html><head>

.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
<snip>
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . .
booklist
SQLite format 3
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
booklist.html
<html>

.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . .
booklist.py
#

.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . .
BookList.sql
..echo ON

.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
<snip>
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . . . . . . .
booklist.zip
PK

.. .
cp.py
import sys

.. . . . . . .
crash-me.php.htm
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
<snip>
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . . . . . . . . . . . . .
C__PROGRAM FILES_THE MASTER GENEALOGIST_RRW_DC_L1P0505.pdf
%PDF-1.4

..
desktop.ini
[DeleteOnCopy]

.. . .
eclipse_code.xml
<?xml version="1.0" encoding="UTF-8"?>

.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
<snip>
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . . . . . . . . . . . . .
FGS.pdf
%PDF-1.4

.. .
Glucose.xls
ÐÏࡱ

GunV1.wmv
0&²uŽfϦÙ
lasersaber.wmv
0&²uŽfϦÙ
MySQLdb.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
<snip>
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. .
Once upon a time.doc
ÐÏࡱ

oth_tsr_rm_750.ram
http://www.dreamworks.com/trailers/oth/oth_tsr_rm_750.rm

pub_ax25.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
<snip>
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . . . . . .
t.html
<html>

.. . . . . . . . . . . . . . . . . . .
t.rx
var. = "Rotten"

.. . . . .
Thumbs.db
ÐÏࡱ

TMG Sources.htm
<html>

.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
<snip>
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. . . . . . . . . . . . . . . . .
usage-guide.doc
ÐÏࡱ

_Scout_Ship Database.lnk
L
  File "<interactive input>", line 5, in ?
  File "E:\Python24\Lib\site-packages\pythonwin\pywin\framework\winout.py", line 172, in write
    return self.template.write(msg)
  File "E:\Python24\Lib\site-packages\pythonwin\pywin\framework\winout.py", line 490, in write
    self.HandleOutput(message)
  File "E:\Python24\Lib\site-packages\pythonwin\pywin\framework\winout.py", line 482, in HandleOutput
    win32api.OutputDebugString(message)
TypeError: OutputDebugString() argument 1 must be string without null bytes, not str

--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
Dennis Lee Bieber

If I understand the documentation of fileinput, you shouldn't even
need this output loop; fileinput is designed to expect a list of files
(that it works with a single file seems an afterthought)

Correction: "output loop" above should read OUTER LOOP.

for line in fileinput.input(filter(os.path.isfile,
                                   dir.walkfiles("*")),
                            inplace=1):

should handle all the files... I don't have this third party path
module, so the directory tree walking isn't active, but...

Since I was also using "curdir", I didn't have the worry about
joining the file names with full path, but...

-=-=-=-=-=-=-=-
import os
import os.path
import fileinput

def processPath(aPath):
    print "Processing: %s" % aPath
    files = filter(os.path.isfile, [os.path.join(aPath, fid)
                                    for fid in os.listdir(aPath)])
    if files: #needed in case directory has no files
              #as fileinput would try to read from stdin then...
        for line in fileinput.input(files, inplace = 0):
            if fileinput.isfirstline():
                print "\n\n%s" % fileinput.filename()
                print line
            else:
                pass #save the periods!
        fileinput.close()
    for bPath in filter(os.path.isdir, [os.path.join(aPath, path)
                                        for path in os.listdir(aPath)]):
        processPath(bPath)


if __name__ == "__main__":
    processPath("e:/userdata/Dennis Lee Bieber/My Documents")

 
wo_shi_big_stomach

Dennis said:
And where is the actual exception message line -- the one with the
error code/description.



If I understand the documentation of fileinput, you shouldn't even
need this output loop; fileinput is designed to expect a list of files
(that it works with a single file seems an afterthought)

Yes, thanks. This is the key point.

Feeding fileinput.input() a list rather than a single file (or whatever
it's called in Python) got my program working. Thanks!

for line in fileinput.input(filter(os.path.isfile,
                                   dir.walkfiles("*")),
                            inplace=1):

should handle all the files...

Indeed it does -- too many times.

Sorry, but this (and the program you provided) iterates over the entire
list N times, where N is the number of files, rather than doing one
iteration on each file.

For instance, using your program with inplace editing and a ".bak" file
extension for the originals, I ended up with filenames like
name.bak.bak.bak.bak.bak in a directory with five files in it.

I don't have this third party path
module, so the directory tree walking isn't active, but...

The path module:

http://www.jorendorff.com/articles/python/path/

is a *lot* cleaner than os.path; see the examples at that URL.

Thanks for the great tip about fileinput.input(), and thanks to all who
answered my query. I've pasted the working code below.

/wsbs

import fileinput
import os
import re
import string
import sys
from path import path

# p2.py - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

# recurse dirs
dir = path('/home/wsbs/Maildir')
g = dir.walkfiles('*')
for line in fileinput.input(g, inplace=1, backup='.bak'):
    # just print 2nd and subsequent lines
    if not fileinput.isfirstline():
        print line.rstrip('\n')
    # check first line only
    elif fileinput.isfirstline():
        if not re.search('^From ',line):
            print line.rstrip('\n')
fileinput.close()
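
For anyone reading this on Python 3, here is a hedged sketch of the same idea: hand fileinput one list of files and edit in place. It assumes os.walk in place of the third-party path module, and the function name strip_from_lines and the demo file names are invented for the illustration. It builds the file list before editing and guards against an empty list, since fileinput falls back to stdin when given no files.

```python
import fileinput
import os
import tempfile

def strip_from_lines(root, backup=".bak"):
    """Delete the first line of each file under root if it starts with 'From '."""
    # Build the whole list up front, so backups written during the run
    # are never picked up as new input.
    files = [os.path.join(dirpath, name)
             for dirpath, _dirnames, names in os.walk(root)
             for name in names]
    if not files:
        return files  # an empty list would make fileinput read stdin
    with fileinput.input(files, inplace=True, backup=backup) as stream:
        for line in stream:
            if stream.isfirstline() and line.startswith("From "):
                continue  # drop the stray header line
            print(line, end="")  # stdout is redirected into the file
    return files

# Demo on a scratch directory with made-up file names.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "cur"))
    broken = os.path.join(root, "cur", "broken")
    clean = os.path.join(root, "clean")
    with open(broken, "w") as fh:
        fh.write("From someone\nSubject: hi\n")
    with open(clean, "w") as fh:
        fh.write("Subject: ok\nFrom the body\n")
    processed = strip_from_lines(root)
    with open(broken) as fh:
        broken_after = fh.read()
    with open(clean) as fh:
        clean_after = fh.read()
    backup_made = os.path.exists(broken + ".bak")
```

Only a "From " on the first line of a file is removed; the same text further down is left alone.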
 
Dennis Lee Bieber

Indeed it does -- too many times.

Sorry, but this (and the program you provided) iterate over the entire
list N times, where N is the number of files, rather than doing one
iteration on each file.

Probably a case of the listdir() function working in iteration mode
and discovering the .bak file used by fileinput. Doing the listdir()
first, into a variable, would avoid that...
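
One hedged way to act on that suggestion, sketched in Python 3 with os.walk standing in for the directory listing: collect the names into a list before any editing starts, and skip anything that already carries the backup suffix. The helper name snapshot_files and the sample file names are invented for the illustration.

```python
import os
import tempfile

def snapshot_files(root, backup=".bak"):
    """Return a sorted list of files under root, leaving out earlier backups."""
    found = []
    for dirpath, _dirnames, names in os.walk(root):
        for name in names:
            if not name.endswith(backup):
                found.append(os.path.join(dirpath, name))
    return sorted(found)

# Demo: one leftover backup sits next to two mail files.
with tempfile.TemporaryDirectory() as root:
    for name in ("msg1", "msg2", "msg1.bak"):
        with open(os.path.join(root, name), "w") as fh:
            fh.write("From someone\n")
    names = [os.path.basename(p) for p in snapshot_files(root)]

print(names)
```

Because the list is complete before the first file is touched, backups created later cannot show up in it.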
 
John Machin

wo_shi_big_stomach said:
Thanks for the great tip about fileinput.input(), and thanks to all who
answered my query. I've pasted the working code below.
[snip]
    # check first line only
    elif fileinput.isfirstline():
        if not re.search('^From ',line):

This "works", and in this case you are doing it on only the first line
in each file, but for future reference:

1. Read the re docs section about when to use search and when to use
match; the "^" anchor in your pattern means that search and match give
the same result here.

However the time they take to do it can differ quite a bit :-0

C:\junk>\python25\python -mtimeit -s"import re;text='x'*100" "re.match('^From ',text)"
100000 loops, best of 3: 4.39 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*1000" "re.match('^From ',text)"
100000 loops, best of 3: 4.41 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*10000" "re.match('^From ',text)"
100000 loops, best of 3: 4.4 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*100" "re.search('^From ',text)"
100000 loops, best of 3: 6.54 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*1000" "re.search('^From ',text)"
10000 loops, best of 3: 26 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*10000" "re.search('^From ',text)"
1000 loops, best of 3: 219 usec per loop

Aside: I noticed this years ago but assumed that the simple
optimisation of search was not done as a penalty on people who didn't
RTFM, and so didn't report it :)

2. Then realise that your test is equivalent to

if not line.startswith('^From '):

which is much easier to understand without the benefit of comments, and
(bonus!) is also much faster than re.match:

C:\junk>\python25\python -mtimeit -s"text='x'*100" "text.startswith('^From ')"
1000000 loops, best of 3: 0.584 usec per loop

C:\junk>\python25\python -mtimeit -s"text='x'*1000" "text.startswith('^From ')"
1000000 loops, best of 3: 0.583 usec per loop

C:\junk>\python25\python -mtimeit -s"text='x'*10000" "text.startswith('^From ')"
1000000 loops, best of 3: 0.612 usec per loop

HTH,
John
 
John Machin

John Machin wrote:
[snip]
2. Then realise that your test is equivalent to

if not line.startswith('^From '):

Whoops!

That '^From ' (and all later ones) should have been 'From '

(the perils of over-hasty copy/paste)

The timings are, if anything, a tiny bit faster than before.
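
With that correction applied, the three spellings of the test agree on a single line of text. A quick Python 3 check, using made-up sample lines (no timings are shown here, since they vary by machine):

```python
import re

# Made-up sample lines: a stray header, two look-alikes, and an empty line.
lines = ["From alice@example.com Mon Jan  1\n",
         "Subject: From here on\n",
         "from lowercase\n",
         ""]

by_search = [bool(re.search("^From ", s)) for s in lines]
by_match = [bool(re.match("From ", s)) for s in lines]
by_startswith = [s.startswith("From ") for s in lines]

print(by_search == by_match == by_startswith)
```

Only the first sample line is flagged by any of the three tests.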

Cheers,
John
 
Gabriel Genellina

wo_shi_big_stomach said:
for line in fileinput.input(g, inplace=1, backup='.bak'):
    # just print 2nd and subsequent lines
    if not fileinput.isfirstline():
        print line.rstrip('\n')
    # check first line only
    elif fileinput.isfirstline():
        if not re.search('^From ',line):
            print line.rstrip('\n')

Just a note: the elif is redundant, use a simple else clause.
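
A minimal Python 3 sketch of the same decision written with a plain else. The helper keep_line is hypothetical, and it uses startswith as suggested earlier in the thread rather than re.search:

```python
def keep_line(is_first, line):
    """Return True if the line should be written back to the file."""
    if not is_first:
        # second and later lines are always kept
        return True
    else:
        # the first line is kept unless it is the stray "From " header
        return not line.startswith("From ")
```

Since a line is either the first one or it is not, the second isfirstline() test can never be false when it is reached, which is why else is enough.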


--
Gabriel Genellina
Softlab SRL

 
