Reading files in from the proper directory

S

SMac2347

Hello. I am admittedly a Python novice, and ran into some trouble
trying to write a program that will pull multiple excel files all into
one file, with each file on a different sheet.

I am confident most of the code is correct, as the program runs
without any errors and I found the base of it online, making changes
as necessary for my own purposes. However, I am having trouble
specifying the exact directory where my code should be pulling the
files from. All the files are in the same folder, and I have put the
folder on my desktop. Am I correct in thinking that I need to change
the current working directory to this folder in order for Python to
read in these files, then generate my output? Or should I be doing
something else?

Any and all help is appreciated, thanks!
 
D

Dave Angel

Hello. I am admittedly a Python novice, and ran into some trouble
trying to write a program that will pull multiple excel files all into
one file, with each file on a different sheet.

I am confident most of the code is correct, as the program runs
without any errors and I found the base of it online, making changes
as necessary for my own purposes. However, I am having trouble
specifying the exact directory where my code should be pulling the
files from. All the files are in the same folder, and I have put the
folder on my desktop. Am I correct in thinking that I need to change
the current working directory to this folder in order for Python to
read in these files,
No, Python certainly does not constrain you to working with files only
in the current working directory. My rule of thumb is never to change
the cwd in a Python program. You can use relative paths to open files,
or you can use absolute paths. There is even a library function
os.path.abspath() for converting a relative path to an absolute one.

If you do change cwd during the running of a program, then relative
filenames that worked earlier might no longer work. You could convert
them all to absolute paths, but that's more work.

You can piece together path strings using os.path.join(). It's smart
enough to know the path separator for your particular platform.

Check out this page: http://docs.python.org/library/os.path.html
 
P

Peter Otten

Hello. I am admittedly a Python novice, and ran into some trouble
trying to write a program that will pull multiple excel files all into
one file, with each file on a different sheet.

I am confident most of the code is correct, as the program runs
without any errors and I found the base of it online, making changes
as necessary for my own purposes.

That confidence usually evaporates once you write the first unit test ;)
However, I am having trouble
specifying the exact directory where my code should be pulling the
files from. All the files are in the same folder, and I have put the
folder on my desktop. Am I correct in thinking that I need to change
the current working directory to this folder in order for Python to
read in these files, then generate my output? Or should I be doing
something else?

Do it properly, allow specifying the files on the commandline:

import argparse

def process_files(files, destfile):
# put your real code here
print "merge " + "\n ".join(files)
print "into " + destfile

if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("files", metavar="file", nargs="+")
parser.add_argument("destfile")
args = parser.parse_args()
process_files(args.files, args.destfile)

If you have standard locations for sources and destination you can wrap your
python script into a little batch file containing something like

python \source\path\*.xls \dest\path\merged.xls

and invoke that to get both flexibility and convenience.
 
S

SMac2347

On 02/07/2012 01:14 PM, (e-mail address removed) wrote:> Hello. I am admittedly a Python novice, and ran into some trouble


No, Python certainly does not constrain you to working with files only
in the current working directory.  My rule of thumb is never to change
the cwd in a Python program.  You can use relative paths to open files,
or you can use absolute paths.  There is even a library function
os.path.abspath() for converting a relative path to an absolute one.

If you do change cwd during the running of a program, then relative
filenames that worked earlier might no longer work.  You could convert
them all to absolute paths, but that's more work.

You can piece together path strings using os.path.join().  It's smart
enough to know the path separator for your particular platform.

Check out this page:http://docs.python.org/library/os.path.html

Thanks Dave. I am a bit lost as to what the problem is then - the
program runs glitch free, but then only prints: "NOTE *** No xls files
in C:/Documents and Settings/smacdon/." as specified below by my
program. Any idea what the issue might be (my code is below):


import xlrd, xlwt
import glob, os.path
def merge_xls (in_dir, out_file="C:\Documents and Settings\smacdon
\Desktop\09 Aggregate JWS\09_merged_data.xls"):
xls_files = glob.glob(in_dir + "*.xls")
sheet_names = [os.path.basename(v)[:-4] for v in xls_files]
sheet_excl = [os.path.basename(v)[:-4] for v in xls_files if len
(os.path.basename(v)[:-4]) > 29]
merged_book = xlwt.Workbook()
if in_dir[-1:] != "/": in_dir = in_dir + "/"
xls_files.sort()
if xls_files:
for k, xls_file in enumerate(xls_files):
print "---> Processing file %s" % (xls_file)
if len (sheet_names[k]) <= 29:
book = xlrd.open_workbook(xls_file)
if book.nsheets == 1:
ws = merged_book.add_sheet(sheet_names[k])
sheet = book.sheet_by_index(0)
for rx in range(sheet.nrows):
for cx in range(sheet.ncols):
ws.write(rx, cx, sheet.cell_value(rx,
cx))
elif book.nsheets in range(2, 100):
for sheetx in range(book.nsheets):
sheet0n = sheet_names[k]+str(sheetx
+1).zfill(2)
ws = merged_book.add_sheet(sheet0n)
sheet = book.sheet_by_index(sheetx)
for rx in range(sheet.nrows):
for cx in range(sheet.ncols):
ws.write(rx, cx, sheet.cell_value(rx,
cx))
else:
print "ERROR *** File %s has %s sheets (maximum is
99)" % (xls_file, book.nsheets)
raise
else:
print "WARNING *** File name too long: <%s.xls>
(maximum is 29 chars) " % (sheet_names[k])
print "WARNING *** File <%s.xls> was skipped." %
(sheet_names[k])
merged_book.save(out_file)
print
print "---> Merged xls file written to %s using the following
source files: " % (out_file)
for k, v in enumerate(sheet_names):
if len(v) <= 29:
print "\t", str(k+1).zfill(3), "%s.xls" % (v)
print
if sheet_excl:
print "--> The following files were skipped because the
file name exceeds 29 characters: "
for k, v in enumerate(sheet_excl):
print "\t", str(k+1).zfill(3), v
else:
print "NOTE *** No xls files in %s." % (in_dir)
merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09
Aggregate JWS"
 
S

SMac2347

Thanks for the responses.


Below is the code I have thus far. while the program runs glitch-free,
it only results in the printing of the message: "NOTE *** No xls files
in C:/Documents and Settings/smacdon/." as specified by my code. Any
idea as to why it might be unable to find the .xls documents (yes they
are .xls documents and not .xlsx). Thanks!


import xlrd, xlwt
import glob, os.path
def merge_xls (in_dir, out_file="C:\Documents and Settings\smacdon
\Desktop\09 Aggregate JWS\09_merged_data.xls"):
xls_files = glob.glob(in_dir + "*.xls")
sheet_names = [os.path.basename(v)[:-4] for v in xls_files]
sheet_excl = [os.path.basename(v)[:-4] for v in xls_files if len
(os.path.basename(v)[:-4]) > 29]
merged_book = xlwt.Workbook()
if in_dir[-1:] != "/": in_dir = in_dir + "/"
xls_files.sort()
if xls_files:
for k, xls_file in enumerate(xls_files):
print "---> Processing file %s" % (xls_file)
if len (sheet_names[k]) <= 29:
book = xlrd.open_workbook(xls_file)
if book.nsheets == 1:
ws = merged_book.add_sheet(sheet_names[k])
sheet = book.sheet_by_index(0)
for rx in range(sheet.nrows):
for cx in range(sheet.ncols):
ws.write(rx, cx, sheet.cell_value(rx,
cx))
elif book.nsheets in range(2, 100):
for sheetx in range(book.nsheets):
sheet0n = sheet_names[k]+str(sheetx
+1).zfill(2)
ws = merged_book.add_sheet(sheet0n)
sheet = book.sheet_by_index(sheetx)
for rx in range(sheet.nrows):
for cx in range(sheet.ncols):
ws.write(rx, cx, sheet.cell_value(rx,
cx))
else:
print "ERROR *** File %s has %s sheets (maximum is
99)" % (xls_file, book.nsheets)
raise
else:
print "WARNING *** File name too long: <%s.xls>
(maximum is 29 chars) " % (sheet_names[k])
print "WARNING *** File <%s.xls> was skipped." %
(sheet_names[k])
merged_book.save(out_file)
print
print "---> Merged xls file written to %s using the following
source files: " % (out_file)
for k, v in enumerate(sheet_names):
if len(v) <= 29:
print "\t", str(k+1).zfill(3), "%s.xls" % (v)
print
if sheet_excl:
print "--> The following files were skipped because the
file name exceeds 29 characters: "
for k, v in enumerate(sheet_excl):
print "\t", str(k+1).zfill(3), v
else:
print "NOTE *** No xls files in %s." % (in_dir)
merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09
Aggregate JWS"
 
J

John Gordon

In said:
Am I correct in thinking that I need to change the current working
directory to this folder in order for Python to read in these files,
then generate my output?

You don't have to do it that way, no.

In general, when opening a file, you can do it two ways: Either provide
a full pathname, or provide a relative pathname.

If you provide a full pathname (for example "/usr/home/smith/myfile.txt"),
that file will be opened and it does not matter what the current working
directory is.

If you provide a relative pathname (for example "myfile.txt"), python
will attempt to open that file starting from the current working dir.
 
J

John Gordon

In said:
xls_files = glob.glob(in_dir + "*.xls")

You may want to put a directory separator character in between the
directory name and the filename glob pattern.
 
P

Peter Otten

xls_files = glob.glob(in_dir + "*.xls")

Try changing that to

pattern = os.path.join(in_dir, "*.xls")
xls_files = glob.glob(pattern)

os.path.join() inserts a (back)slash between directory and filename if
necessary.
merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS")

If you paste the directory name literal into the interactive interpreter
you'll be surprised:
'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'

"\09" is intrpreted as chr(9). Use a raw string to prevent Python from
interpreting a backslash as the start of an escape sequence
'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'

or use forward slashes as directory separators.
 
S

SMac2347

Try changing that to

pattern = os.path.join(in_dir, "*.xls")
xls_files = glob.glob(pattern)

os.path.join() inserts a (back)slash between directory and filename if
necessary.


If you paste the directory name literal into the interactive interpreter
you'll be surprised:


'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'

"\09" is intrpreted as chr(9). Use a raw string to prevent Python from
interpreting a backslash as the start of an escape sequence


'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'

or use forward slashes as directory separators.

Peter, thanks so much for your help, your suggestions were spot on. So
now my program runs and is able to find and process the files
correctly, but I end up getting the following message:

Traceback (most recent call last):
File "C:/Documents and Settings/smacdon/My Documents/
excel_merge_files_indirectory v2.py", line 49, in <module>
merge_xls(in_dir=r"C:\Documents and Settings\smacdon\Desktop\09
Aggregate JWS")
File "C:/Documents and Settings/smacdon/My Documents/
excel_merge_files_indirectory v2.py", line 36, in merge_xls
merged_book.save(out_file)
File "C:\Python27\lib\site-packages\xlwt\Workbook.py", line 634, in
save
doc.save(filename, self.get_biff_data())
File "C:\Python27\lib\site-packages\xlwt\CompoundDoc.py", line 507,
in save
f = open(file_name_or_filelike_obj, 'wb')
TypeError: file() argument 1 must be encoded string without NULL
bytes, not str


If I am interpreting correctly, am I to understand that it would
appear the issue is tracing back to functions in the xlwt module? If
so, what can I do to fix this? Again, any and all help is appreciated!
 
S

SMac2347

Try changing that to

pattern = os.path.join(in_dir, "*.xls")
xls_files = glob.glob(pattern)

os.path.join() inserts a (back)slash between directory and filename if
necessary.


If you paste the directory name literal into the interactive interpreter
you'll be surprised:


'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'

"\09" is intrpreted as chr(9). Use a raw string to prevent Python from
interpreting a backslash as the start of an escape sequence


'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'

or use forward slashes as directory separators.

Disregard my last post, I was able to figure it out, I also had to
cover the out_file file name into a raw string as well. Thanks again
for all the help!!!
 
P

Peter Otten

Sorry, I was wrong here. "\09" is actually "\0" (i. e. chr(0))
followed by "9". Escape sequences starting with 0 are octal numbers
in Python 2 and thus may never contain digits > 7.
Peter, thanks so much for your help, your suggestions were spot on. So
now my program runs and is able to find and process the files
correctly, but I end up getting the following message:

Traceback (most recent call last):
File "C:/Documents and Settings/smacdon/My Documents/
excel_merge_files_indirectory v2.py", line 49, in <module>
merge_xls(in_dir=r"C:\Documents and Settings\smacdon\Desktop\09
Aggregate JWS")
File "C:/Documents and Settings/smacdon/My Documents/
excel_merge_files_indirectory v2.py", line 36, in merge_xls
merged_book.save(out_file)
File "C:\Python27\lib\site-packages\xlwt\Workbook.py", line 634, in
save
doc.save(filename, self.get_biff_data())
File "C:\Python27\lib\site-packages\xlwt\CompoundDoc.py", line 507,
in save
f = open(file_name_or_filelike_obj, 'wb')
TypeError: file() argument 1 must be encoded string without NULL
bytes, not str


If I am interpreting correctly, am I to understand that it would
appear the issue is tracing back to functions in the xlwt module? If
so, what can I do to fix this? Again, any and all help is appreciated!

You probably forgot to convert the default value for out_file into a
raw string:

def merge_xls(in_dir, out_file=
r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS\09_merged_data.xls"):

"\0" is therefore interpreted as chr(0) which marks the end of a C string and
may not occur in a file name. chr(0) is called "NULL byte" in the error message
you get.
 
D

Dennis Lee Bieber

Or use
os.path.join([list, of, path, parts])
and have the correct separator for whatever operating system (though it
does have a problem with a drive letter component -- that does need a
"\" to indicate the root of the drive rather than what the OS thinks of
as the drive's "current directory"
File "C:/Documents and Settings/smacdon/My Documents/
excel_merge_files_indirectory v2.py", line 36, in merge_xls
merged_book.save(out_file)
If I am interpreting correctly, am I to understand that it would
appear the issue is tracing back to functions in the xlwt module? If
so, what can I do to fix this? Again, any and all help is appreciated!

Where does out_file come from? I suspect THAT is what is corrupt.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,432
Messages
2,571,680
Members
48,796
Latest member
Greg L.

Latest Threads

Top