automatically generating file dependency information from python tools

Discussion in 'Python' started by Moosebumps, Apr 9, 2004.

  1. Moosebumps

    Moosebumps Guest

    Say you have a group of 20 programmers, and they're all writing python
    scripts that are simple data crunchers -- i.e. command line tools that read
    from one or more files and output one or more files.

    I want to set up some sort of system that would automatically generate
    makefile type information from the source code of these tools. Can anyone
    think of a good way of doing it? You could make everyone call a special
    function that wraps the file() and detects whether they are opening the file
    for read or write. If read, it's an input, if write, it's an output file
    (assume there is no r/w access). Then I guess your special function would
    output the info in some sort of repository, which collects such info from
    all the individual data crunchers.

    The other thing I could think of is statically analyzing the source code --
    but what if the filenames are generated dynamically? I'd be interested in
    any ideas or links on this, I just started thinking about it today. For
    some reason it seems to be a sort of problem to solve with metaclasses --
    but I haven't thought of exactly how.

    thanks,
    MB
     
    Moosebumps, Apr 9, 2004
    #1

  2. Re: automatically generating file dependency information from python tools

    On Fri, Apr 09, 2004 at 10:16:39PM +0000, Moosebumps wrote:
    > Say you have a group of 20 programmers, and they're all writing python
    > scripts that are simple data crunchers -- i.e. command line tools that read
    > from one or more files and output one or more files.
    >
    > I want to set up some sort of system that would automatically generate
    > makefile type information from the source code of these tools. Can anyone
    > think of a good way of doing it? You could make everyone call a special
    > function that wraps the file() and detects whether they are opening the file
    > for read or write. If read, it's an input, if write, it's an output file
    > (assume there is no r/w access). Then I guess your special function would
    > output the info in some sort of repository, which collects such info from
    > all the individual data crunchers.
    >
    > The other thing I could think of is statically analyzing the source code --
    > but what if the filenames are generated dynamically? I'd be interested in
    > any ideas or links on this, I just started thinking about it today. For
    > some reason it seems to be a sort of problem to solve with metaclasses --
    > but I haven't thought of exactly how.
    >


    In answer to the question you /almost/ asked:

    http://www.google.com/search?q=python make replacement
     
    Jack Diederich, Apr 9, 2004
    #2

  3. John Roth

    John Roth Guest

    "Moosebumps" <> wrote in message
    news:bhFdc.50011$...
    > Say you have a group of 20 programmers, and they're all writing python
    > scripts that are simple data crunchers -- i.e. command line tools that read
    > from one or more files and output one or more files.
    >
    > I want to set up some sort of system that would automatically generate
    > makefile type information from the source code of these tools. Can anyone
    > think of a good way of doing it? You could make everyone call a special
    > function that wraps the file() and detects whether they are opening the file
    > for read or write. If read, it's an input, if write, it's an output file
    > (assume there is no r/w access). Then I guess your special function would
    > output the info in some sort of repository, which collects such info from
    > all the individual data crunchers.
    >
    > The other thing I could think of is statically analyzing the source code --
    > but what if the filenames are generated dynamically? I'd be interested in
    > any ideas or links on this, I just started thinking about it today. For
    > some reason it seems to be a sort of problem to solve with metaclasses --
    > but I haven't thought of exactly how.


    I'm not entirely clear on what the purpose of this is. I normally
    think of "makefile" type information as something needed to compile
    a program. This is something that isn't usually needed for Python
    unless you're dealing with C extensions. Then I'd suggest looking at
    SCons (www.scons.org).

    What I'm getting is that you want to tie the individual programs
    to the files that they're processing. In other words, build a catalog
    of "if you have this kind of file, these are the available programs that
    will process it."

    So the basic question is: are the files coming in from the command
    line or are they built in? If the latter, I'd probably start out by pulling
    strings that have a "." or a "/" or a "\" in them, and examining the
    context. Or look at calls to modules from the os.path library.

    More than likely you'll find a number of patterns that can be
    processed and that will deal with the majority of programs. The
    thing is, if you've got a bunch of programmers doing that kind
    of work, they've probably fallen into habitual ways of coding
    the repetitive stuff.

    HTH

    John Roth


    >
    > thanks,
    > MB
    >
    >
     
    John Roth, Apr 9, 2004
    #3
  4. Moosebumps

    Moosebumps Guest

    > I'm not entirely clear on what the purpose of this is. I normally
    > think of "makefile" type information as something needed to compile
    > a program. This is something that isn't usually needed for Python
    > unless you're dealing with C extensions. Then I'd suggest looking at
    > SCons (www.scons.org).


    Well sorry for being so abstract, let me be a little more concrete. I am
    working at a video game company, and I have had some success using Python
    for tools. I am just thinking about ways to convince other people to use
    it. One way would be to improve the build processes, and be able to do
    incremental builds of art assets without any additional effort from
    programmers. Basically I'm trying to find a way to do some work for free
    with python.

    The idea is that there are many different types of assets, e.g. 3D models,
    textures/other images, animations, audio, spreadsheet data, etc. Each of
    these generally has some tool that converts it from the source format to the
    format that is stored in the game on disk / in memory. Hence they are
    usually simple command line data crunchers. They take some files as input
    and just produce other files as output.

    Currently, we don't have time to generate the dependency information
    necessary for incremental building, so we generally just build everything
    over again from scratch, which takes 20 PCs the entire night. The problem
    is that the pipeline changes frequently, and nothing is really documented,
    especially the dependencies. It would be nice if there was a way to
    automatically get these from the individual data crunchers, which may be
    written by many different people. It eliminates the redundancy of having
    dependency information in the source code of the individual tools, and also
    in a separate file that specifies dependency info (like a makefile).

    So instead of rebuilding the whole game, or having to know exactly which files
    to rebuild (which some people know, but many others don't), the "make" tool
    would be able to read the generated dependency information, check dates
    on the source files to see what changed, and build the minimum number of
    things to get the game up to date. Currently lots of unnecessary things are
    rebuilt constantly.
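
A minimal sketch of the date check described above, assuming the dependency information has already been collected into a table that maps each build step to its input and output files. The names `is_stale`, `steps_to_rebuild` and the table layout are hypothetical, chosen only for this example.

```python
import os


def is_stale(outputs, inputs):
    """Return True if the outputs need rebuilding.

    A step is stale when it has no outputs on disk yet, or when the newest
    input was modified after the oldest output. A missing input raises
    OSError, which is left to the caller to handle.
    """
    if not outputs or not all(os.path.exists(p) for p in outputs):
        return True
    newest_input = max((os.path.getmtime(p) for p in inputs), default=0.0)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return newest_input > oldest_output


def steps_to_rebuild(dep_table):
    """dep_table maps a step name to an (inputs, outputs) pair of path lists."""
    return [name for name, (inputs, outputs) in dep_table.items()
            if is_stale(outputs, inputs)]
```

Comparing timestamps is the same test make performs; a build tool could swap in content hashes instead without changing the shape of the table.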

    > What I'm getting is that you want to tie the individual programs
    > to the files that they're processing. In other words, build a catalog
    > of "if you have this kind of file, these are the available programs that
    > will process it."


    Well, that is not exactly the point, but hopefully that information would
    fall out of the automatic processing of the individual command line tools.

    > So the basic question is: are the files coming in from the command
    > line or are they built in? If the latter, I'd probably start out by pulling
    > strings that have a "." or a "/" or a "\" in them, and examining the
    > context. Or look at calls to modules from the os.path library.


    They could be either "statically" specified in the source code, or only
    known at runtime.

    > More than likely you'll find a number of patterns that can be
    > processed and that will deal with the majority of programs. The
    > thing is, if you've got a bunch of programmers doing that kind
    > of work, they've probably fallen into habitual ways of coding
    > the repetitive stuff.


    Yes, that is true, and everything works OK now, but there are thousands and
    thousands of lines of redundant code, and the build process is very slow.
    I'm just trying to separate out the common parts of every tool, rather than
    having all that information duplicated in dozens of little command line
    utilities.

    MB
     
    Moosebumps, Apr 10, 2004
    #4
  5. Moosebumps

    Moosebumps Guest

    Re: automatically generating file dependency information from python tools

    >
    > In answer to the question you /almost/ asked:
    >
    > http://www.google.com/search?q=python make replacement
    >


    That is definitely of interest to me, but I would want to go one step
    further and automatically generate the dependency info. I haven't looked
    specifically at these make replacements, but I would assume you have to use
    a makefile or specify dependency info in some form like a text file. What I
    am looking for is a way to automatically generate it from the source code of
    the individual tools that the make program will run, or by running the tools
    in some special mode where they just spit out which files they will
    read/write.
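
One way the "special mode" could look, sketched under the assumption that every tool agrees on a common flag. The flag name `--list-deps`, the `plan` helper and the `.tex` output extension are all invented for this example; nothing in the thread prescribes them.

```python
import argparse
import json
import os


def plan(source, out_dir):
    """Work out which files a hypothetical texture converter would touch."""
    stem = os.path.splitext(os.path.basename(source))[0]
    return {"inputs": [source],
            "outputs": [os.path.join(out_dir, stem + ".tex")]}


def main(argv=None):
    parser = argparse.ArgumentParser(description="hypothetical texture cruncher")
    parser.add_argument("source")
    parser.add_argument("out_dir")
    parser.add_argument("--list-deps", action="store_true",
                        help="print inputs/outputs as JSON and exit without converting")
    args = parser.parse_args(argv)

    files = plan(args.source, args.out_dir)
    if args.list_deps:
        # Dry run: report the plan so a build tool can record it.
        print(json.dumps(files))
        return files
    # The real conversion would go here, driven by the same `files` plan,
    # so the reported dependencies cannot drift from what the tool does.
    return files
```

A build driver could then run each tool with `--list-deps`, parse the JSON, and only invoke the real conversion for steps whose inputs changed.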

    MB
     
    Moosebumps, Apr 10, 2004
    #5
  6. Peter Hansen

    Peter Hansen Guest

    Re: automatically generating file dependency information from python tools

    Moosebumps wrote:

    > Say you have a group of 20 programmers, and they're all writing python
    > scripts that are simple data crunchers -- i.e. command line tools that read
    > from one or more files and output one or more files.


    Shall we read into this the implication that there is no
    coding standard of any kind being used for these tools? So
    no hope of saying something as simple as "use constants for
    all filenames, using the following conventions..."?

    > I want to set up some sort of system that would automatically generate
    > makefile type information from the source code of these tools. Can anyone
    > think of a good way of doing it? You could make everyone call a special
    > function that wraps the file() and detects whether they are opening the file
    > for read or write.


    I think you've mixed up your two ideas in the above. You don't really
    mean "source code" here, do you? You mean catching the information
    dynamically from the running program, I think. That is something
    that is probably quite easy to do with Python. For example, just
    have everyone import a particular magic module that you create for
    this purpose at the top of their scripts. That module installs a
    replacement open() (or file()) function in the builtins module, and
    then any file that is opened for reading or writing can be noticed
    and relevant notes about it recorded in your repository.

    > The other thing I could think of is statically analyzing the source code --
    > but what if the filenames are generated dynamically?


    As you've guessed, much harder to do. Especially with a language
    that is not statically typed... (dare I say? ;-)

    -Peter
     
    Peter Hansen, Apr 10, 2004
    #6
  7. Steven Knight

    > > I'm not entirely clear on what the purpose of this is. I normally
    > > think of "makefile" type information as something needed to compile
    > > a program. This is something that isn't usually needed for Python
    > > unless you're dealing with C extensions. Then I'd suggest looking at
    > > SCons (www.scons.org).

    >
    > Well sorry for being so abstract, let me be a little more concrete. I am
    > working at a video game company, and I have had some success using Python
    > for tools. I am just thinking about ways to convince other people to use
    > it. One way would be to improve the build processes, and be able to do
    > incremental builds of art assets without any additional effort from
    > programmers. Basically I'm trying to find a way to do some work for free
    > with python.
    >
    > The idea is that there are many different types of assets, e.g. 3D models,
    > textures/other images, animations, audio, spreadsheet data, etc. Each of
    > these generally has some tool that converts it from the source format to the
    > format that is stored in the game on disk / in memory. Hence they are
    > usually simple command line data crunchers. They take some files as input
    > and just produce other files as output.


    Check out SCons; it's specifically designed to be extensible in just
    this way to handle different utilities for building different file types,
    as well as allowing you to write scanners to return dependencies based on
    any mechanism you can code up in Python. SCons is already in use by a
    number of gaming companies to speed up and improve their builds.

    --SK
     
    Steven Knight, Apr 10, 2004
    #7
  8. John Roth

    John Roth Guest

    Re: automatically generating file dependency information from python tools

    "Moosebumps" <> wrote in message
    news:vnHdc.50052$...
    > >
    > > In answer to the question you /almost/ asked:
    > >
    > > http://www.google.com/search?q=python make replacement
    > >

    >
    > That is definitely of interest to me, but I would want to go one step
    > further and automatically generate the dependency info. I haven't looked
    > specifically at these make replacements, but I would assume you have to use
    > a makefile or specify dependency info in some form like a text file. What I
    > am looking for is a way to automatically generate it from the source code of
    > the individual tools that the make program will run, or by running the tools
    > in some special mode where they just spit out which files they will
    > read/write.


    SCons is what you want, then. It's got a scanner built in that can
    be subclassed to scan anything to pull out dependency information
    on the fly. Converting a build monstrosity to SCons isn't exactly
    simple, but it's a lot simpler than any of the alternatives I can think
    of.

    John Roth
    >
    > MB
    >
    >
     
    John Roth, Apr 10, 2004
    #8
