How to refer to data files without hardcoding paths?

Matthew Wilson · Sep 6, 2009

When a python package includes data files like templates or images,
what is the orthodox way of referring to these in code?

I'm working on an application installable through the Python package
index. Most of the app is just python code, but I use a few jinja2
templates. Today I realized that I'm hardcoding paths in my app. They
are relative paths based on os.getcwd(), so at some point, when I'm
running scripts that use this code from another directory, these
open(...) calls will fail.

I found several posts that talk about using __file__ and then walking
to nearby directories.

I also came across pkg_resources, and that seems to work, but I don't
think I understand it all yet.

Matt

Dave Angel · Sep 6, 2009

Ben said:
The conventional solution to this is:

* Read configuration settings, whether directory paths or anything else,
from a configuration file of declarative options.

* Have the program read that configuration file from one location (or a
small number of locations), and make those locations well-known in the
documentation of the program.

Python's standard library has the ‘configparser’ module, which is one
possible implementation of this.

Before you can decide what libraries to use, you need to determine your
goal. Usually, you can separate the data files your application uses
into two groups. One is the read-only files. Those ship with the
application, and won't be edited after installation, or if they are,
they would be deliberate changes by the administrator of the machine,
not the individual user. Those should be located with the shipped .py
and .pyc files.

The other group (which might in turn be subdivided) is files that the
application creates at run time: configuration files, the user's
documents, and temporary files.

Files in the first group can, and should, be found by looking up the
full path to a module at run time. Use the module's __file__ to get the
full path, and os.path.dirname() to extract the directory from it.
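
A minimal sketch of the __file__ approach described above, assuming the data files sit in a subdirectory next to the module. The names data_path, "templates" and "index.html" are hypothetical and only illustrate the idea.

```python
import os

# Directory that contains this module. __file__ is undefined in some
# contexts (for example an interactive session), so fall back to the
# current directory there.
try:
    PACKAGE_DIR = os.path.dirname(os.path.abspath(__file__))
except NameError:
    PACKAGE_DIR = os.getcwd()


def data_path(*parts):
    """Build an absolute path to a file shipped next to this module."""
    return os.path.join(PACKAGE_DIR, *parts)


# "templates" and "index.html" are made-up names for illustration.
template_path = data_path("templates", "index.html")
```

Because the path is anchored to the module rather than to os.getcwd(), it stays the same no matter which directory the program is started from.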

The second group of files can be located by various methods, such as
using the HOMEPATH environment variable. But if there is more than one
such location, one should generally create a config file first, and have
it store the locations of the other files, after consulting with the
end-user.
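
One possible way to find per-user locations without reading an environment variable directly is os.path.expanduser, which works on both Windows and Unix-like systems. This is only a sketch: the dot-directory layout, the function names and the application name "myapp" are assumptions, not something the thread prescribes.

```python
import os
import tempfile


def user_config_path(app_name, filename="config.ini"):
    """Return a per-user config file path under the home directory."""
    home = os.path.expanduser("~")  # portable home-directory lookup
    return os.path.join(home, "." + app_name, filename)


def scratch_path(app_name, filename):
    """Return a path for a temporary file in the system temp directory."""
    return os.path.join(tempfile.gettempdir(), app_name, filename)


# "myapp" is a hypothetical application name.
config_file = user_config_path("myapp")
```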

Once you've thought about your goals, you should then look at supporting
libraries to help with it. configparser is one such library, though both
its name and specs have changed over the years.
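
As an illustration of a config file that stores the locations of other files, here is a short sketch using the Python 3 spelling of the module, configparser. The section name "paths", the option "documents" and the function load_paths are hypothetical.

```python
import configparser
import os
import tempfile


def load_paths(config_file):
    """Return the options of the [paths] section as a dict (empty if absent)."""
    # interpolation=None keeps '%' characters in paths from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    # read() skips files that do not exist and returns the ones it parsed.
    found = parser.read(config_file)
    if not found or not parser.has_section("paths"):
        return {}
    return dict(parser.items("paths"))


# Demonstration with a throwaway config file.
with tempfile.TemporaryDirectory() as tmp:
    cfg = os.path.join(tmp, "myapp.ini")
    with open(cfg, "w") as f:
        f.write("[paths]\ndocuments = /srv/myapp/docs\n")
    paths = load_paths(cfg)
```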

DaveA

Timothy Madden · Sep 6, 2009

Matthew said:
When a python package includes data files like templates or images,
what is the orthodox way of referring to these in code?
sys.path[0] should give you the path to your script. By reading the
documentation I would say it would give the path to the first script
passed to the interpreter at launch, but after using it I find it also
gives the current script path inside an imported file. So I use it to
group the script files in my application into subdirectories, and import
them as necessary from there.

My app works regardless of the current working directory, and can import
scripts and load icons from its various subdirectories.

Still, I would like to know why it works in imported scripts, since the
doc page says sys.path[0] is the path to the script that caused the
interpreter to launch. What would that mean?

Timothy Madden

Gabriel Genellina · Sep 8, 2009

On Sun, 06 Sep 2009 10:44:38 -0300, Timothy Madden wrote:

Matthew Wilson wrote:

When a python package includes data files like templates or images,
what is the orthodox way of referring to these in code?
I also came across pkg_resources, and that seems to work, but I don't
think I understand it all yet.

sys.path[0] should give you the path to your script. By reading the
documentation I would say it would give the path to the first script
passed to the interpreter at launch, but after using it I find it also
gives the current script path inside an imported file. So I use it to
group the script files in my application into subdirectories, and import
them as necessary from there.

No, I think you got it wrong. sys.argv[0] is the name of the script being
executed; you can get its full path using os.path.abspath(sys.argv[0]).
sys.path[0] is the directory containing the script being executed, but
only right when the program starts. Later, any module is free to add and
remove entries from sys.path, so one should not rely on sys.path[0] being
that specific directory.
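
A small sketch of why sys.path[0] is fragile: any code can insert an entry in front of it. The directory name below is made up, and the snippet restores sys.path afterwards so it has no lasting effect.

```python
import sys

saved_path = list(sys.path)           # remember the original search path
argv0_before = sys.argv[0]

try:
    # Any module may do this, e.g. to make a plugin directory importable.
    sys.path.insert(0, "/hypothetical/plugin/dir")
    first_entry_after_insert = sys.path[0]
    argv0_after = sys.argv[0]         # not affected by changes to sys.path
finally:
    sys.path[:] = saved_path          # restore the original search path
```

After the insert, sys.path[0] names the new directory, while sys.argv[0] still names the script that was launched.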

What you refer to as "script files" are actually modules, and they're
imported, not executed. There is only one script being executed, the one
named in the command line (either as `python scriptname.py` or
`scriptname.py` or just `scriptname` or by double-clicking scriptname.py).

Timothy said:
My app works regardless of the current working directory, and can import
scripts and load icons from its various subdirectories.

Still, I would like to know why it works in imported scripts, since the
doc page says sys.path[0] is the path to the script that caused the
interpreter to launch. What would that mean?

The script that is being executed, scriptname.py in the example above.
Even if you later import module `foo` from package `bar`, sys.argv[0]
doesn't change.

To determine the directory containing the main script being executed, put
these lines near the top of it:

import os
import sys
# Directory of the script named on the command line.
main_directory = os.path.dirname(os.path.abspath(sys.argv[0]))

You may locate other files relative to that directory. But that doesn't
work if some components aren't actually on the filesystem (egg files,
zipped libraries, or programs deployed using py2exe or similar). I prefer
to use pkgutil.get_data(packagename, resourcename) because it can handle
those cases too.
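
To show how pkgutil.get_data is called, the sketch below builds a throwaway package in a temporary directory and reads a resource from it. The package name demo_pkg_for_get_data, the resource templates/hello.txt and the helper read_text_resource are all hypothetical; in a real project the package would already be installed.

```python
import os
import pkgutil
import sys
import tempfile


def read_text_resource(package, resource, encoding="utf-8"):
    """Read a data file shipped inside a package and decode it to text."""
    data = pkgutil.get_data(package, resource)  # returns bytes, or None
    if data is None:
        raise FileNotFoundError(resource)
    return data.decode(encoding)


# Build a temporary package so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = os.path.join(tmp, "demo_pkg_for_get_data")
    os.makedirs(os.path.join(pkg_dir, "templates"))
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "templates", "hello.txt"), "w",
              encoding="utf-8") as f:
        f.write("Hello, {{ name }}!")

    sys.path.insert(0, tmp)  # make the throwaway package importable
    try:
        # The resource name is relative to the package and uses '/'.
        text = read_text_resource("demo_pkg_for_get_data",
                                  "templates/hello.txt")
    finally:
        sys.path.remove(tmp)
```

Note that get_data returns the file contents as bytes rather than a filesystem path, which is what lets it work for zipped packages.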

Matthew Wilson · Sep 9, 2009

I prefer
to use pkgutil.get_data(packagename, resourcename) because it can handle
those cases too.

I didn't know about pkgutil until now. I thought I had to use setuptools
to do that kind of stuff. Thanks!

Matt
