Reading files in from the proper directory

Discussion in 'Python' started by SMac2347@comcast.net, Feb 7, 2012.

  1. Guest

    Hello. I am admittedly a Python novice, and ran into some trouble
    trying to write a program that will pull multiple excel files all into
    one file, with each file on a different sheet.

    I am confident most of the code is correct, as the program runs
    without any errors and I found the base of it online, making changes
    as necessary for my own purposes. However, I am having trouble
    specifying the exact directory where my code should be pulling the
    files from. All the files are in the same folder, and I have put the
    folder on my desktop. Am I correct in thinking that I need to change
    the current working directory to this folder in order for Python to
    read in these files, then generate my output? Or should I be doing
    something else?

    Any and all help is appreciated, thanks!
     
    , Feb 7, 2012
    #1
    1. Advertising

  2. Dave Angel Guest

    On 02/07/2012 01:14 PM, wrote:
    > Hello. I am admittedly a Python novice, and ran into some trouble
    > trying to write a program that will pull multiple excel files all into
    > one file, with each file on a different sheet.
    >
    > I am confident most of the code is correct, as the program runs
    > without any errors and I found the base of it online, making changes
    > as necessary for my own purposes. However, I am having trouble
    > specifying the exact directory where my code should be pulling the
    > files from. All the files are in the same folder, and I have put the
    > folder on my desktop. Am I correct in thinking that I need to change
    > the current working directory to this folder in order for Python to
    > read in these files,

    No, Python certainly does not constrain you to working with files only
    in the current working directory. My rule of thumb is never to change
    the cwd in a Python program. You can use relative paths to open files,
    or you can use absolute paths. There is even a library function
    os.path.abspath() for converting a relative path to an absolute one.

    If you do change cwd during the running of a program, then relative
    filenames that worked earlier might no longer work. You could convert
    them all to absolute paths, but that's more work.

    You can piece together path strings using os.path.join(). It's smart
    enough to know the path separator for your particular platform.

    Check out this page: http://docs.python.org/library/os.path.html

    > then generate my output? Or should I be doing
    > something else?
    >
    > Any and all help is appreciated, thanks!
    >



    --

    DaveA
     
    Dave Angel, Feb 7, 2012
    #2
    1. Advertising

  3. Peter Otten Guest

    wrote:

    > Hello. I am admittedly a Python novice, and ran into some trouble
    > trying to write a program that will pull multiple excel files all into
    > one file, with each file on a different sheet.
    >
    > I am confident most of the code is correct, as the program runs
    > without any errors and I found the base of it online, making changes
    > as necessary for my own purposes.


    That confidence usually evaporates once you write the first unit test ;)

    > However, I am having trouble
    > specifying the exact directory where my code should be pulling the
    > files from. All the files are in the same folder, and I have put the
    > folder on my desktop. Am I correct in thinking that I need to change
    > the current working directory to this folder in order for Python to
    > read in these files, then generate my output? Or should I be doing
    > something else?


    Do it properly, allow specifying the files on the commandline:

    import argparse

    def process_files(files, destfile):
    # put your real code here
    print "merge " + "\n ".join(files)
    print "into " + destfile

    if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("files", metavar="file", nargs="+")
    parser.add_argument("destfile")
    args = parser.parse_args()
    process_files(args.files, args.destfile)

    If you have standard locations for sources and destination you can wrap your
    python script into a little batch file containing something like

    python \source\path\*.xls \dest\path\merged.xls

    and invoke that to get both flexibility and convenience.
     
    Peter Otten, Feb 7, 2012
    #3
  4. Guest

    On Feb 7, 1:40 pm, Dave Angel <> wrote:
    > On 02/07/2012 01:14 PM, wrote:> Hello. I am admittedly a Python novice, and ran into some trouble
    > > trying to write a program that will pull multiple excel files all into
    > > one file, with each file on a different sheet.

    >
    > > I am confident most of the code is correct, as the program runs
    > > without any errors and I found the base of it online, making changes
    > > as necessary for my own purposes. However, I am having trouble
    > > specifying the exact directory where my code should be pulling the
    > > files from. All the files are in the same folder, and I have put the
    > > folder on my desktop. Am I correct in thinking that I need to change
    > > the current working directory to this folder in order for Python to
    > > read in these files,

    >
    > No, Python certainly does not constrain you to working with files only
    > in the current working directory.  My rule of thumb is never to change
    > the cwd in a Python program.  You can use relative paths to open files,
    > or you can use absolute paths.  There is even a library function
    > os.path.abspath() for converting a relative path to an absolute one.
    >
    > If you do change cwd during the running of a program, then relative
    > filenames that worked earlier might no longer work.  You could convert
    > them all to absolute paths, but that's more work.
    >
    > You can piece together path strings using os.path.join().  It's smart
    > enough to know the path separator for your particular platform.
    >
    > Check out this page:http://docs.python.org/library/os.path.html
    >
    > > then generate my output? Or should I be doing
    > > something else?

    >
    > > Any and all help is appreciated, thanks!

    >
    > --
    >
    > DaveA


    Thanks Dave. I am a bit lost as to what the problem is then - the
    program runs glitch free, but then only prints: "NOTE *** No xls files
    in C:/Documents and Settings/smacdon/." as specified below by my
    program. Any idea what the issue might be (my code is below):


    import xlrd, xlwt
    import glob, os.path
    def merge_xls (in_dir, out_file="C:\Documents and Settings\smacdon
    \Desktop\09 Aggregate JWS\09_merged_data.xls"):
    xls_files = glob.glob(in_dir + "*.xls")
    sheet_names = [os.path.basename(v)[:-4] for v in xls_files]
    sheet_excl = [os.path.basename(v)[:-4] for v in xls_files if len
    (os.path.basename(v)[:-4]) > 29]
    merged_book = xlwt.Workbook()
    if in_dir[-1:] != "/": in_dir = in_dir + "/"
    xls_files.sort()
    if xls_files:
    for k, xls_file in enumerate(xls_files):
    print "---> Processing file %s" % (xls_file)
    if len (sheet_names[k]) <= 29:
    book = xlrd.open_workbook(xls_file)
    if book.nsheets == 1:
    ws = merged_book.add_sheet(sheet_names[k])
    sheet = book.sheet_by_index(0)
    for rx in range(sheet.nrows):
    for cx in range(sheet.ncols):
    ws.write(rx, cx, sheet.cell_value(rx,
    cx))
    elif book.nsheets in range(2, 100):
    for sheetx in range(book.nsheets):
    sheet0n = sheet_names[k]+str(sheetx
    +1).zfill(2)
    ws = merged_book.add_sheet(sheet0n)
    sheet = book.sheet_by_index(sheetx)
    for rx in range(sheet.nrows):
    for cx in range(sheet.ncols):
    ws.write(rx, cx, sheet.cell_value(rx,
    cx))
    else:
    print "ERROR *** File %s has %s sheets (maximum is
    99)" % (xls_file, book.nsheets)
    raise
    else:
    print "WARNING *** File name too long: <%s.xls>
    (maximum is 29 chars) " % (sheet_names[k])
    print "WARNING *** File <%s.xls> was skipped." %
    (sheet_names[k])
    merged_book.save(out_file)
    print
    print "---> Merged xls file written to %s using the following
    source files: " % (out_file)
    for k, v in enumerate(sheet_names):
    if len(v) <= 29:
    print "\t", str(k+1).zfill(3), "%s.xls" % (v)
    print
    if sheet_excl:
    print "--> The following files were skipped because the
    file name exceeds 29 characters: "
    for k, v in enumerate(sheet_excl):
    print "\t", str(k+1).zfill(3), v
    else:
    print "NOTE *** No xls files in %s." % (in_dir)
    merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09
    Aggregate JWS"
     
    , Feb 7, 2012
    #4
  5. Guest

    Thanks for the responses.


    Below is the code I have thus far. while the program runs glitch-free,
    it only results in the printing of the message: "NOTE *** No xls files
    in C:/Documents and Settings/smacdon/." as specified by my code. Any
    idea as to why it might be unable to find the .xls documents (yes they
    are .xls documents and not .xlsx). Thanks!


    import xlrd, xlwt
    import glob, os.path
    def merge_xls (in_dir, out_file="C:\Documents and Settings\smacdon
    \Desktop\09 Aggregate JWS\09_merged_data.xls"):
    xls_files = glob.glob(in_dir + "*.xls")
    sheet_names = [os.path.basename(v)[:-4] for v in xls_files]
    sheet_excl = [os.path.basename(v)[:-4] for v in xls_files if len
    (os.path.basename(v)[:-4]) > 29]
    merged_book = xlwt.Workbook()
    if in_dir[-1:] != "/": in_dir = in_dir + "/"
    xls_files.sort()
    if xls_files:
    for k, xls_file in enumerate(xls_files):
    print "---> Processing file %s" % (xls_file)
    if len (sheet_names[k]) <= 29:
    book = xlrd.open_workbook(xls_file)
    if book.nsheets == 1:
    ws = merged_book.add_sheet(sheet_names[k])
    sheet = book.sheet_by_index(0)
    for rx in range(sheet.nrows):
    for cx in range(sheet.ncols):
    ws.write(rx, cx, sheet.cell_value(rx,
    cx))
    elif book.nsheets in range(2, 100):
    for sheetx in range(book.nsheets):
    sheet0n = sheet_names[k]+str(sheetx
    +1).zfill(2)
    ws = merged_book.add_sheet(sheet0n)
    sheet = book.sheet_by_index(sheetx)
    for rx in range(sheet.nrows):
    for cx in range(sheet.ncols):
    ws.write(rx, cx, sheet.cell_value(rx,
    cx))
    else:
    print "ERROR *** File %s has %s sheets (maximum is
    99)" % (xls_file, book.nsheets)
    raise
    else:
    print "WARNING *** File name too long: <%s.xls>
    (maximum is 29 chars) " % (sheet_names[k])
    print "WARNING *** File <%s.xls> was skipped." %
    (sheet_names[k])
    merged_book.save(out_file)
    print
    print "---> Merged xls file written to %s using the following
    source files: " % (out_file)
    for k, v in enumerate(sheet_names):
    if len(v) <= 29:
    print "\t", str(k+1).zfill(3), "%s.xls" % (v)
    print
    if sheet_excl:
    print "--> The following files were skipped because the
    file name exceeds 29 characters: "
    for k, v in enumerate(sheet_excl):
    print "\t", str(k+1).zfill(3), v
    else:
    print "NOTE *** No xls files in %s." % (in_dir)
    merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09
    Aggregate JWS"
     
    , Feb 7, 2012
    #5
  6. John Gordon Guest

    In <> writes:

    > Am I correct in thinking that I need to change the current working
    > directory to this folder in order for Python to read in these files,
    > then generate my output?


    You don't have to do it that way, no.

    In general, when opening a file, you can do it two ways: Either provide
    a full pathname, or provide a relative pathname.

    If you provide a full pathname (for example "/usr/home/smith/myfile.txt"),
    that file will be opened and it does not matter what the current working
    directory is.

    If you provide a relative pathname (for example "myfile.txt"), python
    will attempt to open that file starting from the current working dir.

    --
    John Gordon A is for Amy, who fell down the stairs
    B is for Basil, assaulted by bears
    -- Edward Gorey, "The Gashlycrumb Tinies"
     
    John Gordon, Feb 7, 2012
    #6
  7. John Gordon Guest

    In <> writes:

    > xls_files = glob.glob(in_dir + "*.xls")


    You may want to put a directory separator character in between the
    directory name and the filename glob pattern.

    --
    John Gordon A is for Amy, who fell down the stairs
    B is for Basil, assaulted by bears
    -- Edward Gorey, "The Gashlycrumb Tinies"
     
    John Gordon, Feb 7, 2012
    #7
  8. Peter Otten Guest

    wrote:

    > xls_files = glob.glob(in_dir + "*.xls")


    Try changing that to

    pattern = os.path.join(in_dir, "*.xls")
    xls_files = glob.glob(pattern)

    os.path.join() inserts a (back)slash between directory and filename if
    necessary.

    > merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS")


    If you paste the directory name literal into the interactive interpreter
    you'll be surprised:

    >>> "C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"

    'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'

    "\09" is intrpreted as chr(9). Use a raw string to prevent Python from
    interpreting a backslash as the start of an escape sequence

    >>> r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"

    'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'

    or use forward slashes as directory separators.
     
    Peter Otten, Feb 7, 2012
    #8
  9. Guest

    On Feb 7, 3:16 pm, Peter Otten <> wrote:
    > wrote:
    > > xls_files   = glob.glob(in_dir + "*.xls")

    >
    > Try changing that to
    >
    > pattern = os.path.join(in_dir, "*.xls")
    > xls_files = glob.glob(pattern)
    >
    > os.path.join() inserts a (back)slash between directory and filename if
    > necessary.
    >
    > > merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS")

    >
    > If you paste the directory name literal into the interactive interpreter
    > you'll be surprised:
    >
    > >>> "C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"

    >
    > 'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'
    >
    > "\09" is intrpreted as chr(9). Use a raw string to prevent Python from
    > interpreting a backslash as the start of an escape sequence
    >
    > >>> r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"

    >
    > 'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'
    >
    > or use forward slashes as directory separators.


    Peter, thanks so much for your help, your suggestions were spot on. So
    now my program runs and is able to find and process the files
    correctly, but I end up getting the following message:

    Traceback (most recent call last):
    File "C:/Documents and Settings/smacdon/My Documents/
    excel_merge_files_indirectory v2.py", line 49, in <module>
    merge_xls(in_dir=r"C:\Documents and Settings\smacdon\Desktop\09
    Aggregate JWS")
    File "C:/Documents and Settings/smacdon/My Documents/
    excel_merge_files_indirectory v2.py", line 36, in merge_xls
    merged_book.save(out_file)
    File "C:\Python27\lib\site-packages\xlwt\Workbook.py", line 634, in
    save
    doc.save(filename, self.get_biff_data())
    File "C:\Python27\lib\site-packages\xlwt\CompoundDoc.py", line 507,
    in save
    f = open(file_name_or_filelike_obj, 'wb')
    TypeError: file() argument 1 must be encoded string without NULL
    bytes, not str


    If I am interpreting correctly, am I to understand that it would
    appear the issue is tracing back to functions in the xlwt module? If
    so, what can I do to fix this? Again, any and all help is appreciated!
     
    , Feb 9, 2012
    #9
  10. Guest

    On Feb 7, 3:16 pm, Peter Otten <> wrote:
    > wrote:
    > > xls_files   = glob.glob(in_dir + "*.xls")

    >
    > Try changing that to
    >
    > pattern = os.path.join(in_dir, "*.xls")
    > xls_files = glob.glob(pattern)
    >
    > os.path.join() inserts a (back)slash between directory and filename if
    > necessary.
    >
    > > merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS")

    >
    > If you paste the directory name literal into the interactive interpreter
    > you'll be surprised:
    >
    > >>> "C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"

    >
    > 'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'
    >
    > "\09" is intrpreted as chr(9). Use a raw string to prevent Python from
    > interpreting a backslash as the start of an escape sequence
    >
    > >>> r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"

    >
    > 'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'
    >
    > or use forward slashes as directory separators.


    Disregard my last post, I was able to figure it out, I also had to
    cover the out_file file name into a raw string as well. Thanks again
    for all the help!!!
     
    , Feb 9, 2012
    #10
  11. Peter Otten Guest

    wrote:

    > On Feb 7, 3:16 pm, Peter Otten <> wrote:
    >> wrote:
    >> > xls_files = glob.glob(in_dir + "*.xls")

    >>
    >> Try changing that to
    >>
    >> pattern = os.path.join(in_dir, "*.xls")
    >> xls_files = glob.glob(pattern)
    >>
    >> os.path.join() inserts a (back)slash between directory and filename if
    >> necessary.
    >>
    >> > merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09
    >> > Aggregate JWS")

    >>
    >> If you paste the directory name literal into the interactive interpreter
    >> you'll be surprised:
    >>
    >> >>> "C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"

    >>
    >> 'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'
    >>
    >> "\09" is intrpreted as chr(9). Use a raw string to prevent Python from


    Sorry, I was wrong here. "\09" is actually "\0" (i. e. chr(0))
    followed by "9". Escape sequences starting with 0 are octal numbers
    in Python 2 and thus may never contain digits > 7.

    >> interpreting a backslash as the start of an escape sequence
    >>
    >> >>> r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"

    >>
    >> 'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'
    >>
    >> or use forward slashes as directory separators.

    >
    > Peter, thanks so much for your help, your suggestions were spot on. So
    > now my program runs and is able to find and process the files
    > correctly, but I end up getting the following message:
    >
    > Traceback (most recent call last):
    > File "C:/Documents and Settings/smacdon/My Documents/
    > excel_merge_files_indirectory v2.py", line 49, in <module>
    > merge_xls(in_dir=r"C:\Documents and Settings\smacdon\Desktop\09
    > Aggregate JWS")
    > File "C:/Documents and Settings/smacdon/My Documents/
    > excel_merge_files_indirectory v2.py", line 36, in merge_xls
    > merged_book.save(out_file)
    > File "C:\Python27\lib\site-packages\xlwt\Workbook.py", line 634, in
    > save
    > doc.save(filename, self.get_biff_data())
    > File "C:\Python27\lib\site-packages\xlwt\CompoundDoc.py", line 507,
    > in save
    > f = open(file_name_or_filelike_obj, 'wb')
    > TypeError: file() argument 1 must be encoded string without NULL
    > bytes, not str
    >
    >
    > If I am interpreting correctly, am I to understand that it would
    > appear the issue is tracing back to functions in the xlwt module? If
    > so, what can I do to fix this? Again, any and all help is appreciated!


    You probably forgot to convert the default value for out_file into a
    raw string:

    def merge_xls(in_dir, out_file=
    r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS\09_merged_data.xls"):

    "\0" is therefore interpreted as chr(0) which marks the end of a C string and
    may not occur in a file name. chr(0) is called "NULL byte" in the error message
    you get.
     
    Peter Otten, Feb 9, 2012
    #11
  12. On Thu, 9 Feb 2012 12:32:35 -0800 (PST), wrote:

    >On Feb 7, 3:16 pm, Peter Otten <> wrote:


    >> or use forward slashes as directory separators.

    >

    Or use
    os.path.join([list, of, path, parts])
    and have the correct separator for whatever operating system (though it
    does have a problem with a drive letter component -- that does need a
    "\" to indicate the root of the drive rather than what the OS thinks of
    as the drive's "current directory"

    > File "C:/Documents and Settings/smacdon/My Documents/
    >excel_merge_files_indirectory v2.py", line 36, in merge_xls
    > merged_book.save(out_file)

    <snip>
    >
    >If I am interpreting correctly, am I to understand that it would
    >appear the issue is tracing back to functions in the xlwt module? If
    >so, what can I do to fix this? Again, any and all help is appreciated!


    Where does out_file come from? I suspect THAT is what is corrupt.
    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, Feb 10, 2012
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. TJS
    Replies:
    1
    Views:
    519
    Eliyahu Goldin
    Jun 23, 2004
  2. =?Utf-8?B?QmFzIEhlbmRyaWtz?=

    Directory and files names in temporary asp.net directory

    =?Utf-8?B?QmFzIEhlbmRyaWtz?=, Jan 28, 2005, in forum: ASP .Net
    Replies:
    5
    Views:
    843
    Juan T. Llibre
    Jan 31, 2005
  3. Replies:
    1
    Views:
    585
    Evertjan.
    Sep 30, 2005
  4. Java and Swing
    Replies:
    2
    Views:
    6,542
    Daniel
    May 18, 2006
  5. user
    Replies:
    3
    Views:
    678
    =?ISO-8859-1?Q?G=F6ran_Andersson?=
    Mar 31, 2007
Loading...

Share This Page