Automatically generating file dependency information from Python tools

Moosebumps

Say you have a group of 20 programmers, and they're all writing python
scripts that are simple data crunchers -- i.e. command line tools that read
from one or more files and output one or more files.

I want to set up some sort of system that would automatically generate
makefile type information from the source code of these tools. Can anyone
think of a good way of doing it? You could make everyone call a special
function that wraps file() and detects whether they are opening the file
for reading or writing. If reading, it's an input; if writing, it's an output
file (assume there is no r/w access). Then I guess your special function would
output the info to some sort of repository, which collects such info from
all the individual data crunchers.
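As a rough sketch of what the "repository" could hand to make: assuming each tool run has been reduced to a record of (tool script, input files, output files), the records can be turned into dependency-only rules, much like the .d files a C compiler can emit. The record layout and the name make_rules are made up for illustration, and paths containing spaces are not escaped.

```python
def make_rules(records):
    """Turn (tool, inputs, outputs) records into make-style dependency lines.

    Each output depends on the tool script itself plus every input it read,
    so editing the tool also marks its outputs as out of date.
    """
    lines = []
    for tool, inputs, outputs in records:
        deps = " ".join([tool] + sorted(inputs))
        for out in sorted(outputs):
            lines.append(f"{out}: {deps}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [("crunch_model.py", {"ship.obj", "ship.mtl"}, {"ship.bin"})]
    print(make_rules(example), end="")
```

The rules carry no recipes; how each tool is invoked would still have to come from somewhere else.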

The other thing I could think of is statically analyzing the source code --
but what if the filenames are generated dynamically? I'd be interested in
any ideas or links on this, I just started thinking about it today. For
some reason it seems to be a sort of problem to solve with metaclasses --
but I haven't thought of exactly how.

thanks,
MB
 
Jack Diederich

In answer to the question you /almost/ asked:

http://www.google.com/search?q=python+make+replacement
 
John Roth

I'm not entirely clear on what the purpose of this is. I normally
think of "makefile" type information as something needed to compile
a program. This is something that isn't usually needed for Python
unless you're dealing with C extensions. Then I'd suggest looking at
SCons (www.scons.org).

What I'm getting is that you want to tie the individual programs
to the files that they're processing. In other words, build a catalog
of "if you have this kind of file, these are the available programs that
will process it."

So the basic question is: are the files coming in from the command
line or are they built in? If the latter, I'd probably start out by pulling
strings that have a "." or a "/" or a "\" in them, and examining the
context. Or look at calls to modules from the os.path library.
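A minimal sketch of the "pull out strings that look like paths" idea, using the standard ast module (Python 3.8 or later) rather than a text search. The function name candidate_paths and the set of hint characters are assumptions for illustration. It only sees string literals, so names assembled at runtime show up as fragments at best, and docstrings containing a "." will match too.

```python
import ast

# Characters that suggest a string literal might be a file name.
PATH_HINTS = (".", "/", "\\")


def candidate_paths(source):
    """Return the sorted, de-duplicated string literals that look like paths."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if any(hint in node.value for hint in PATH_HINTS):
                found.add(node.value)
    return sorted(found)


if __name__ == "__main__":
    sample = 'src = open("models/ship.obj")\nout = open(name + ".bin", "wb")\n'
    print(candidate_paths(sample))
```

Each hit would still need its context examined by hand, as suggested above.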

More than likely you'll find a number of patterns that can be
processed and that will deal with the majority of programs. The
thing is, if you've got a bunch of programmers doing that kind
of work, they've probably fallen into habitual ways of coding
the repetitive stuff.

HTH

John Roth
 
Moosebumps

John Roth said:
I'm not entirely clear on what the purpose of this is. I normally
think of "makefile" type information as something needed to compile
a program. This is something that isn't usually needed for Python
unless you're dealing with C extensions. Then I'd suggest looking at
SCons (www.scons.org).

Well sorry for being so abstract, let me be a little more concrete. I am
working at a video game company, and I have had some success using Python
for tools. I am just thinking about ways to convince other people to use
it. One way would be to improve the build processes, and be able to do
incremental builds of art assets without any additional effort from
programmers. Basically I'm trying to find a way to do some work for free
with python.

The idea is that there are many different types of assets, e.g. 3D models,
textures/other images, animations, audio, spreadsheet data, etc. Each of
these generally has some tool that converts it from the source format to the
format that is stored in the game on disk / in memory. Hence they are
usually simple command line data crunchers. They take some files as input
and just produce other files as output.

Currently, we don't have time to generate the dependency information
necessary for incremental building, so we generally just build everything
over again from scratch, which takes 20 PCs the entire night. The problem
is that the pipeline changes frequently, and nothing is really documented,
especially the dependencies. It would be nice if there were a way to
automatically get these from the individual data crunchers, which may be
written by many different people. That would eliminate the redundancy of
having dependency information both in the source code of the individual tools
and in a separate file that specifies dependency info (like a makefile).

So instead of rebuilding the whole game, or having to know exactly which files
to rebuild (which some people know, but many others don't), the "make" tool
would be able to read the generated dependency information, check dates
on the source files to see what changed, and build the minimum number of
things to get the game up to date. Currently lots of unnecessary things are
rebuilt constantly.
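The date check described above might look roughly like this, assuming the dependency information is already available as a dict mapping each output file to the list of inputs it was built from. The names is_stale and stale_targets and the dict layout are illustrative only. A missing source file raises FileNotFoundError here rather than being handled.

```python
import os


def is_stale(target, sources):
    """True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_time for src in sources)


def stale_targets(deps):
    """Return the outputs in deps ({output: [inputs]}) that need rebuilding."""
    return sorted(t for t, srcs in deps.items() if is_stale(t, srcs))
```

This covers a single level of dependencies; a chain of tools would need the outputs visited in dependency order.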

John Roth said:
What I'm getting is that you want to tie the individual programs
to the files that they're processing. In other words, build a catalog
of "if you have this kind of file, these are the available programs that
will process it."

Well, that is not exactly the point, but hopefully that information would
fall out of the automatic processing of the individual command line tools.

John Roth said:
So the basic question is: are the files coming in from the command
line or are they built in? If the latter, I'd probably start out by pulling
strings that have a "." or a "/" or a "\" in them, and examining the
context. Or look at calls to modules from the os.path library.

They could be either "statically" specified in the source code, or only
known at runtime.

John Roth said:
More than likely you'll find a number of patterns that can be
processed and that will deal with the majority of programs. The
thing is, if you've got a bunch of programmers doing that kind
of work, they've probably fallen into habitual ways of coding
the repetitive stuff.

Yes, that is true, and everything works OK now, but there are thousands and
thousands of lines of redundant code, and the build process is very slow.
I'm just trying to separate out the common parts of every tool, rather than
having all that information duplicated in dozens of little command line
utilities.

MB
 
Moosebumps

Jack Diederich said:
In answer to the question you /almost/ asked:

http://www.google.com/search?q=python+make+replacement

That is definitely of interest to me, but I would want to go one step
further and automatically generate the dependency info. I haven't looked
specifically at these make replacements, but I would assume you have to use
a makefile or specify dependency info in some form like a text file. What I
am looking for is a way to automatically generate it from the source code of
the individual tools that the make program will run, or by running the tools
in some special mode where they just spit out which files they will
read/write.
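A sketch of what such a "special mode" might look like for a single tool, assuming the tool can work out its file names from its arguments before doing any real work. The --list-deps flag, plan_files, and the .bin naming rule are all hypothetical, and the conversion is a stand-in byte copy.

```python
import argparse
import json
import os
import shutil


def plan_files(model_path):
    """Decide which files a run would read and write, without touching them."""
    base, _ext = os.path.splitext(model_path)
    return {"inputs": [model_path], "outputs": [base + ".bin"]}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Toy asset cruncher")
    parser.add_argument("model", help="source asset to convert")
    parser.add_argument("--list-deps", action="store_true",
                        help="print planned inputs/outputs as JSON and exit")
    args = parser.parse_args(argv)

    plan = plan_files(args.model)
    if args.list_deps:
        # Dry run: report the plan so a build tool can collect it.
        print(json.dumps(plan))
        return plan

    # Stand-in for the real conversion step: copy the bytes across.
    shutil.copyfile(plan["inputs"][0], plan["outputs"][0])
    return plan
```

Calling main(["ship.obj", "--list-deps"]) prints the plan and returns it without reading or writing any asset. The catch is that every tool has to keep its planning and its real work in sync.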

MB
 
Peter Hansen

Moosebumps said:
Say you have a group of 20 programmers, and they're all writing python
scripts that are simple data crunchers -- i.e. command line tools that read
from one or more files and output one or more files.

Shall we read into this the implication that there is no
coding standard of any kind being used for these tools? So
no hope of saying something as simple as "use constants for
all filenames, using the following conventions..."?

Moosebumps said:
I want to set up some sort of system that would automatically generate
makefile type information from the source code of these tools. Can anyone
think of a good way of doing it? You could make everyone call a special
function that wraps the file() and detects whether they are opening the file
for read or write.

I think you've mixed up your two ideas in the above. You don't really
mean "source code" here, do you? You mean catching the information
dynamically from the running program, I think. That is something
that is probably quite easy to do with Python. For example, just
have everyone import a particular magic module that you create for
this purpose at the top of their scripts. That module installs a
replacement open() (or file()) function in the builtins module, and
then any file that is opened for reading or writing can be noticed
and relevant notes about it recorded in your repository.

Moosebumps said:
The other thing I could think of is statically analyzing the source code --
but what if the filenames are generated dynamically?

As you've guessed, much harder to do. Especially with a language
that is not statically typed... (dare I say? ;-)

-Peter
 
Steven Knight

Moosebumps said:
Well sorry for being so abstract, let me be a little more concrete. I am
working at a video game company, and I have had some success using Python
for tools. I am just thinking about ways to convince other people to use
it. One way would be to improve the build processes, and be able to do
incremental builds of art assets without any additional effort from
programmers. Basically I'm trying to find a way to do some work for free
with python.

The idea is that there are many different types of assets, e.g. 3D models,
textures/other images, animations, audio, spreadsheet data, etc. Each of
these generally has some tool that converts it from the source format to the
format that is stored in the game on disk / in memory. Hence they are
usually simple command line data crunchers. They take some files as input
and just produce other files as output.

Check out SCons; it's specifically designed to be extensible in just
this way to handle different utilities for building different file types,
as well as allowing you to write scanners to return dependencies based on
any mechanism you can code up in Python. SCons is already in use by a
number of gaming companies to speed up and improve their builds.

--SK
 
John Roth

Moosebumps said:
That is definitely of interest to me, but I would want to go one step
further and automatically generate the dependency info. I haven't looked
specifically at these make replacements, but I would assume you have to use
a makefile or specify dependency info in some form like a text file. What I
am looking for is a way to automatically generate it from the source code of
the individual tools that the make program will run, or by running the tools
in some special mode where they just spit out which files they will
read/write.

SCons is what you want, then. It's got a scanner built in that can
be subclassed to scan anything to pull out dependency information
on the fly. Converting a build monstrosity to SCons isn't exactly
simple, but it's a lot simpler than any of the alternatives I can think
of.

John Roth
 
