how to organize a module that requires a data file

Steven Bethard · Nov 17, 2005

Ok, so I have a module that is basically a Python wrapper around a big
lookup table stored in a text file[1]. The module needs to provide a
few functions::

get_stem(word, pos, default=None)
stem_exists(word, pos)
...

Because there should only ever be one lookup table, I feel like these
functions ought to be module globals. That way, you could just do
something like::

import morph
assist = morph.get_stem('assistance', 'N')
...

My problem is with the text file. Where should I keep it? If I want to
keep the module simple, I need to be able to identify the location of
the file at module import time. That way, I can read all the data into
the appropriate Python structure, and all my module-level functions will
work immediately after import.

I can only think of a few obvious places where I could find the text
file at import time -- in the same directory as the module (e.g.
lib/site-packages), in the user's home directory, or in a directory
indicated by an environment variable. The first seems weird because the
text file is large (about 10MB) and I don't really see any other
packages putting data files into lib/site-packages. The second seems
weird because it's not a per-user configuration - it's a data file
shared by all users. And the third seems weird because my
experience with a configuration depending heavily on environment
variables is that this is difficult to maintain.

If I don't mind complicating the module functions a bit (e.g. by
starting each function with "if _lookup_table is not None"), I could
allow users to specify a location for the file after the module is
imported, e.g.::

import morph
morph.setfile(r'C:\resources\morph_english.flat')
...

Then all the module-level functions would have to raise Exceptions until
setfile() was called. I don't like that the user would have to
configure the module each time they wanted to use it, but perhaps that's
unavoidable.
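
A minimal sketch of that setfile() approach might look like the following. The tab-separated word/pos/stem layout and the _table() helper are assumptions made for illustration, not the actual format of the UPenn file.

```python
# Sketch of a "configure after import" module.
# ASSUMPTION: each line of the data file is "word<TAB>pos<TAB>stem".
_lookup_table = None  # stays None until setfile() is called


def setfile(path):
    """Load the lookup table from `path`, replacing any earlier table."""
    global _lookup_table
    table = {}
    with open(path) as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 3:  # skip lines that do not match the assumed layout
                word, pos, stem = fields
                table[word, pos] = stem
    _lookup_table = table


def _table():
    """Return the loaded table, or fail loudly if setfile() was never called."""
    if _lookup_table is None:
        raise RuntimeError("call setfile() before using this module")
    return _lookup_table


def get_stem(word, pos, default=None):
    return _table().get((word, pos), default)


def stem_exists(word, pos):
    return (word, pos) in _table()
```

Routing every lookup through one helper keeps the "has the file been set?" check in a single place instead of at the top of each function.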

Any suggestions? Is there an obvious place to put the text file that
I'm missing?

Thanks in advance,

STeVe

[1] In case you're curious, the file is a list of words and their
morphological stems provided by the University of Pennsylvania.

Paul Boddie · Nov 17, 2005

Steven Bethard wrote:

[Text file for a module's internal use.]

My problem is with the text file. Where should I keep it? If I want to
keep the module simple, I need to be able to identify the location of
the file at module import time. That way, I can read all the data into
the appropriate Python structure, and all my module-level functions will
work immediately after import.

I tend to make use of the __file__ attribute available in every module.
For example:

resource_dir = os.path.join(os.path.split(__file__)[0], "Resources")

This assigns to resource_dir the path to the Resources directory
alongside the module itself in the filesystem. Of course, if you just
wanted the text file to reside alongside the module, rather than a
whole directory of stuff, you'd replace "Resources" with the name of
your file (and change the variable name, of course). For example:

filename = os.path.join(os.path.split(__file__)[0],
                        "morph_english.flat")
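
Wrapped up as a helper, the same idea could be sketched as below. The function name data_path and its module_file parameter are hypothetical; the parameter only exists so the helper can be tried out without a real installed module.

```python
import os


def data_path(filename, module_file=None):
    """Return the path of `filename` inside the directory holding the module.

    `module_file` defaults to this module's own __file__; passing it
    explicitly is only meant for experimenting with the helper.
    """
    if module_file is None:
        module_file = __file__
    # abspath() guards against __file__ being a relative path
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, filename)
```

Inside the module you would then call data_path("morph_english.flat") once at import time and open the result.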

Having posted this solution, and in the tradition of Usenet, I'd be
interested to hear whether this is a particularly bad idea.

Paul

Terry Hancock · Nov 17, 2005

Steven Bethard wrote:
My problem is with the text file. Where should I keep it?

I can only think of a few obvious places where I could
find the text file at import time -- in the same
directory as the module (e.g. lib/site-packages), in the
user's home directory, or in a directory indicated by an
environment variable.

Why don't you search those places in order for it?

Check ~/.mymod/myfile, then /etc/mymod/myfile, then
/lib/site-packages/mymod/myfile or whatever. It won't take
long, just do the existence checks on import of the module.
If you don't find it after checking those places, *then*
raise an exception.
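
Roughly, the search could be sketched like this. The names find_data_file and default_candidates are made up for the example, and mymod/myfile are the same placeholders as above.

```python
import os


def find_data_file(candidates):
    """Return the first existing file in `candidates`, else raise IOError."""
    for path in candidates:
        path = os.path.expanduser(path)  # turns "~" into the user's home directory
        if os.path.isfile(path):
            return path
    raise IOError("data file not found in any of: %s" % ", ".join(candidates))


def default_candidates(module_file):
    """Hypothetical search order: per-user, system-wide, then beside the module."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return [
        "~/.mymod/myfile",
        "/etc/mymod/myfile",
        os.path.join(module_dir, "myfile"),
    ]
```

At import time the module would call find_data_file(default_candidates(__file__)) and let the exception propagate if nothing turns up.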

You don't say what this data file is or whether it is
subject to change or customization. If it is, then there is
a real justification for this approach, because an
individual user might want to shadow the system install with
his own version of the data.

That's pretty typical behavior for configuration files on
any Posix system.

Cheers,
Terry

Steven Bethard · Nov 17, 2005

Terry said:
Why don't you search those places in order for it?

Check ~/.mymod/myfile, then /etc/mymod/myfile, then
/lib/site-packages/mymod/myfile or whatever. It won't take
long, just do the existence checks on import of the module.
If you don't find it after checking those places, *then*
raise an exception.

You don't say what this data file is or whether it is
subject to change or customization. If it is, then there is
a real justification for this approach, because an
individual user might want to shadow the system install with
his own version of the data.

The file is a lookup table of word stems distributed by the University
of Pennsylvania. It doesn't really make sense for users to customize
it, because it's not a configuration file, but it is possible that UPenn
would distribute a new version at some point. That's what I meant when
I said "it's not a per-user configuration - it's a data file shared by
all users". There should be exactly one copy of the file, so I
shouldn't have to deal with shadowing.

Of course, even with only one copy of the file, that doesn't mean that I
couldn't search a few places. Maybe I could by default put it in
lib/site-packages, but allow an option to setup.py to put it somewhere
else for anyone who was worried about putting 10MB into
lib/site-packages. Those folks would then have to use an environment
variable, say $MORPH_FLAT, to identify the directory they chose. At module
import I would just check both locations...
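
Something like the following is what I have in mind, as a rough sketch only. The function name locate_flat_file and its module_file and environ parameters are hypothetical (the parameters are there so the lookup can be exercised without touching the real environment).

```python
import os


def locate_flat_file(filename="morph_english.flat", module_file=None, environ=None):
    """Look in $MORPH_FLAT first (if set), then in the module's own directory."""
    if environ is None:
        environ = os.environ
    if module_file is None:
        module_file = __file__

    directories = []
    if environ.get("MORPH_FLAT"):
        directories.append(environ["MORPH_FLAT"])
    directories.append(os.path.dirname(os.path.abspath(module_file)))

    for directory in directories:
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            return path
    raise IOError("%s not found in: %s" % (filename, ", ".join(directories)))
```

With this, nobody has to set the environment variable unless they chose a non-default location at install time.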

I'll have to think about this some more...

STeVe

Larry Bates · Nov 17, 2005

Personally I would do this as a class and pass a path to where
the file is stored as an argument to instantiate it (maybe try
to help user if they don't pass it). Something like:

class morph:
    def __init__(self, pathtodictionary=None):
        if pathtodictionary is None:
            #
            # Insert code here to see if it is in the current
            # directory and/or look in other directories.
            #
            pass

        try:
            self.fp = open(pathtodictionary, 'r')
        except (IOError, TypeError):  # TypeError if the path is still None
            print("unable to locate dictionary at: %s" % pathtodictionary)
        else:
            #
            # Insert code here to load data from .txt file
            #
            self.fp.close()

    def get_stem(self, arg1, arg2):
        #
        # Code for get_stem method
        #
        pass

The other way I've done this is to have a .INI file that always lives
in the same directory as the class with an entry in it that points me
to where the .txt file lives.
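
A rough sketch of the .INI idea, written against Python 3's configparser module. The [morph] section name, the dictionary key, and the function name are all assumptions made up for the example.

```python
import configparser


def dictionary_path_from_ini(ini_path):
    """Read the location of the data file from an INI file.

    Assumed layout of the INI file:

        [morph]
        dictionary = /path/to/morph_english.flat
    """
    # interpolation=None so a "%" in a path is taken literally
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(ini_path):  # read() returns the list of files it parsed
        raise IOError("could not read %s" % ini_path)
    return parser.get("morph", "dictionary")
```

The INI file itself can be found with the same "next to the module" trick, so only one small file has a fixed location.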

Hope this helps.

-Larry Bates

Steven Bethard · Nov 17, 2005

Larry said:
Personally I would do this as a class and pass a path to where
the file is stored as an argument to instantiate it (maybe try
to help user if they don't pass it). Something like:

class morph:
    def __init__(self, pathtodictionary=None):
        if pathtodictionary is None:
            # Insert code here to see if it is in the current
            # directory and/or look in other directories.
            pass
        try:
            self.fp = open(pathtodictionary, 'r')
        except (IOError, TypeError):  # TypeError if the path is still None
            print("unable to locate dictionary at: %s" % pathtodictionary)
        else:
            # Insert code here to load data from .txt file
            self.fp.close()

    def get_stem(self, arg1, arg2):
        # Code for get_stem method
        pass

Actually, this is basically what I have right now. It bothers me a
little because you can get two instances of "morph", with two separate
dictionaries loaded. Since they're all loading the same file, it
doesn't seem like there should be multiple instances. I know I could
use a singleton pattern, but aren't modules basically the singletons of
Python?
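
To make the "module as singleton" idea concrete, here is a sketch in which the table is parsed at most once per path and then shared by every caller. The tab-separated word/pos/stem layout, the DATA_FILE default, and the path keyword are assumptions for illustration only.

```python
import functools

DATA_FILE = "morph_english.flat"  # hypothetical default location


@functools.lru_cache(maxsize=None)
def load_table(path):
    """Parse the file once per path; later calls return the same dict."""
    table = {}
    with open(path) as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 3:  # assumed layout: word, pos, stem
                word, pos, stem = fields
                table[word, pos] = stem
    return table


def get_stem(word, pos, default=None, path=DATA_FILE):
    # Every call shares the single cached table for `path`.
    return load_table(path).get((word, pos), default)
```

Unlike the class version, two callers can never end up with two separate copies of the same dictionary, and the file is only read on first use rather than at import.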

The other way I've done this is to have a .INI file that always lives
in the same directory as the class with an entry in it that points me
to where the .txt file lives.

That's a thought. Thanks.

Steve

manuelg · Nov 18, 2005

I have tried several ways; this is the one I like best (I develop on
Windows, but this technique should work on *NIX for your application):

\whereever\whereever\            (the directory your module is in, obviously
                                  somewhere where PYTHONPATH can see it)
    stevemodule.py               (your module)
    stevemodule_workfiles\       (a subdirectory in the same directory as
                                  your module)
        __init__.py              (an empty file in stevemodule_workfiles\,
                                  only here to make stevemodule_workfiles\
                                  look like a package)
        stevelargetextfile.txt   (your large textfile in
                                  stevemodule_workfiles\)

Now, to load the large textfile, I agree that it should be done with
module functions, so if it gets used several times in the same process,
it is only loaded once. The Python module itself follows the
"singleton" pattern, so you get that behavior for free.

Here is the Python code for loading the file:

import os.path
import stevemodule_workfiles

# directory that holds the stevemodule_workfiles package
workfiles_path = os.path.split(stevemodule_workfiles.__file__)[0]

stevelargetextfile_fullpath = os.path.join(
    workfiles_path, 'stevelargetextfile.txt')

stevelargetextfile_file = open(stevelargetextfile_fullpath)
