How to refer to data files without hardcoding paths?


Matthew Wilson

When a python package includes data files like templates or images,
what is the orthodox way of referring to these in code?

I'm working on an application installable through the Python package
index. Most of the app is just python code, but I use a few jinja2
templates. Today I realized that I'm hardcoding paths in my app. They
are relative paths based on os.getcwd(), but at some point I'll be
running scripts that use this code from other directories, and then these open(...) calls will fail.

I found several posts that talk about using __file__ and then walking
to nearby directories.

I also came across pkg_resources, and that seems to work, but I don't
think I understand it all yet.

Matt
 

Dave Angel

Ben said:
The conventional solution to this is:

* Read configuration settings, whether directory paths or anything else,
from a configuration file of declarative options.

* Have the program read that configuration file from one location (or a
small number of locations), and make those locations well-known in the
documentation of the program.

Python's standard library has the ‘configparser’ module, which is one
possible implementation of this.

Before you can decide what libraries to use, you need to determine your
goal. Usually, you can separate the data files your application uses
into two groups. One is the read-only files. Those ship with the
application, and won't be edited after installation, or if they are,
they would be deliberate changes by the administrator of the machine,
not the individual user. Those should be located with the shipped .py
and .pyc files.

The other group (which might in turn be subdivided) is files that are
either created by the application for configuration purposes (config
files), or for the user (documents), or temp files (temp).

The first group of files can (and should) be found by looking up the full path to a
module at run time. Use the module's __file__ to get the full path, and
os.path.dirname() to extract the directory part.
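A minimal sketch of that approach, assuming the data files live in a hypothetical `templates` subdirectory next to the module that needs them (the `data_path` helper name is also just for illustration):

```python
import os

# Directory containing this module. __file__ can be a relative path in some
# situations, so make it absolute before taking the directory part.
HERE = os.path.dirname(os.path.abspath(__file__))


def data_path(*parts):
    """Return the absolute path of a data file shipped alongside this module."""
    return os.path.join(HERE, *parts)


# Hypothetical template location; this no longer depends on os.getcwd().
template_file = data_path("templates", "index.html")
print(template_file)
```

This only works when the module really is a file on disk; zipped or frozen distributions need a loader-aware approach such as pkgutil.get_data().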

The second group of files can be located by various methods, such as
using the HOMEPATH environment variable on Windows (HOME on Unix-like
systems). But if there is more than one such location, one
should generally create a config file first, and have it store the
locations of the other files, after consulting with the end-user.
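A minimal sketch of the "create a config file first" idea using configparser. The file name `.myapp.ini`, the `paths` section and the default `MyAppDocuments` folder are all hypothetical, and a real application would ask the user rather than silently picking a default:

```python
import configparser
import os

# Hypothetical per-user config file; expanduser("~") resolves the user's
# home directory on both Unix-like systems and Windows.
DEFAULT_CONFIG = os.path.join(os.path.expanduser("~"), ".myapp.ini")


def load_or_create_config(config_path=DEFAULT_CONFIG, default_docs_dir=None):
    """Read the user's config, creating it with default locations on first run."""
    parser = configparser.ConfigParser()
    # read() returns the list of files it parsed; empty means "not there yet".
    if parser.read(config_path):
        return parser
    # First run: record where the user's other files should live.
    if default_docs_dir is None:
        default_docs_dir = os.path.join(os.path.expanduser("~"), "MyAppDocuments")
    parser["paths"] = {"documents": default_docs_dir}
    with open(config_path, "w") as f:
        parser.write(f)
    return parser
```

On later runs the stored location is read back instead of being recomputed, so the user can edit it by hand.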

Once you've thought about your goals, you should then look at supporting
libraries to help with it. configparser is one such library, though both
its name and specs have changed over the years.
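On the name change: the module was ConfigParser in Python 2 and configparser in Python 3, so code meant to run on both commonly uses an import fallback. The sketch below also shows that read() accepts a list of candidate files (the "small number of well-known locations" Ben mentions), skipping missing ones and returning the ones it actually parsed. The two locations listed are hypothetical:

```python
import os

# ConfigParser (Python 2) was renamed configparser in Python 3.
try:
    import configparser
except ImportError:  # Python 2
    import ConfigParser as configparser


def read_settings(candidate_files):
    """Read settings from a short list of well-known locations.

    Missing files are silently skipped; values from later files override
    values from earlier ones.
    """
    parser = configparser.ConfigParser()
    found = parser.read(candidate_files)
    return parser, found


# Hypothetical locations: system-wide defaults first, then per-user overrides.
CANDIDATES = [
    "/etc/myapp.ini",
    os.path.join(os.path.expanduser("~"), ".myapp.ini"),
]
```

Documenting CANDIDATES in the program's manual is what makes these locations "well-known".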

DaveA
 

Timothy Madden

Matthew said:
When a python package includes data files like templates or images,
what is the orthodox way of referring to these in code?

sys.path[0] should give you the path to your script. By reading the
documentation I would say it would give the path to the first script
passed to the interpreter at launch, but after using it I find it also
gives the current script path inside an imported file. So I use it to
group the script files in my application into subdirectories, and import
them as necessary from there.

My app works regardless of the current working directory, and can import
scripts and load icons from its various subdirectories.

Still I would like to know why it works in imported scripts, since the
doc page says sys.path[0] is the path to the script that caused the
interpreter to launch. What does that mean?


Timothy Madden
 

Gabriel Genellina

On Sun, 06 Sep 2009 10:44:38 -0300, Timothy Madden wrote:
Matthew Wilson wrote:
When a python package includes data files like templates or images,
what is the orthodox way of referring to these in code?
I also came across pkg_resources, and that seems to work, but I don't
think I understand it all yet.

sys.path[0] should give you the path to your script. By reading the
documentation I would say it would give the path to the first script
passed to the interpreter at launch, but after using it I find it also
gives the current script path inside an imported file. So I use it to
group the script files in my application into subdirectories, and import
them as necessary from there.

No, I think you got it wrong. sys.argv[0] is the name of the script being
executed; you can get its full path using os.path.abspath(sys.argv[0]).
sys.path[0] is the directory containing the script being executed right
when the program starts. Later, any module is free to add and remove
entries from sys.path, so one should not rely on sys.path[0] being that
specific directory.
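A small sketch of the difference described above: sys.argv[0] is fixed when the interpreter starts, while sys.path is a single process-wide list that every module sees (which is why sys.path[0] looks right inside imported modules too) and that any module may edit. The `plugins` directory is hypothetical and the change is undone immediately:

```python
import os
import sys


def main_script_path():
    """Absolute path of the script named on the command line."""
    # sys.argv[0] is set once at startup; importing modules does not change it.
    return os.path.abspath(sys.argv[0])


def sys_path_can_change():
    """Show that sys.path[0] stays the script directory only until someone edits sys.path."""
    before = sys.path[0]
    # Hypothetical: some module prepends its own plugin directory.
    plugin_dir = os.path.join(os.getcwd(), "plugins")
    sys.path.insert(0, plugin_dir)
    try:
        after = sys.path[0]  # now the plugin directory, not the script directory
    finally:
        sys.path.pop(0)  # undo the change so the rest of the program is unaffected
    return before, after


before, after = sys_path_can_change()
print("script:", main_script_path())
print("sys.path[0] at first:", before)
print("sys.path[0] after insert:", after)
```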

What you refer to as "script files" are actually modules, and they're
imported, not executed. There is only one script being executed, the one
named in the command line (either as `python scriptname.py` or
`scriptname.py` or just `scriptname` or by double-clicking scriptname.py).

My app works regardless of the current working directory, and can import
scripts and load icons from its various subdirectories.

Still I would like to know why it works in imported scripts, since the
doc page says sys.path[0] is the path to the script that caused the
interpreter to launch. What does that mean?

The script that is being executed, scriptname.py in the example above.
Even if you later import module `foo` from package `bar`, sys.argv[0]
doesn't change.

To determine the directory containing the main script being executed, put
these lines near the top of it:

import os, sys
main_directory = os.path.dirname(os.path.abspath(sys.argv[0]))

You may locate other files relative to that directory. But that doesn't
work if some components aren't actually on the filesystem (egg files,
zipped libraries, or programs deployed using py2exe or similar). I prefer
to use pkgutil.get_data(packagename, resourcename) because it can handle
those cases too.
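A self-contained sketch of pkgutil.get_data(). To keep it runnable, it builds a throwaway package named `demo_templates_pkg` (hypothetical) in a temporary directory; in a real installed application you would simply call get_data() with your own package name:

```python
import importlib
import os
import pkgutil
import sys
import tempfile


def make_demo_package(root):
    """Create a throwaway package 'demo_templates_pkg' containing one data file."""
    pkg_dir = os.path.join(root, "demo_templates_pkg")
    os.makedirs(os.path.join(pkg_dir, "templates"))
    # An empty __init__.py makes the directory an importable package.
    with open(os.path.join(pkg_dir, "__init__.py"), "wb") as f:
        f.write(b"")
    with open(os.path.join(pkg_dir, "templates", "hello.txt"), "wb") as f:
        f.write(b"Hello, {{ name }}!\n")


root = tempfile.mkdtemp()
make_demo_package(root)
sys.path.insert(0, root)
importlib.invalidate_caches()  # make sure the new package is noticed

# The resource name is relative to the package directory and always uses "/",
# whatever the OS. get_data() returns bytes (or None if the package's loader
# cannot fetch data), so decode it yourself.
data = pkgutil.get_data("demo_templates_pkg", "templates/hello.txt")
text = data.decode("utf-8")
print(text, end="")
```

Inside the package itself, a module could call pkgutil.get_data(__package__, "templates/index.html"), where `templates/index.html` is a placeholder name.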
 

Matthew Wilson

I prefer
to use pkgutil.get_data(packagename, resourcename) because it can handle
those cases too.

I didn't know about pkgutil until now. I thought I had to use setuptools to
do that kind of stuff. Thanks!

Matt
 