Parsing and Editing Source

Paul Wilson

Hi all,

I'd like to be able to do the following to a python source file
programmatically:
* Read in a source file
* Add/Remove/Edit Classes, methods, functions
* Add/Remove/Edit Decorators
* List the Classes
* List the imported modules
* List the functions
* List methods of classes

And then save out the result back to the original file (or elsewhere).

I've begun by using the tokenize module to generate a token-tuple list
and am building data structures around it that enable the above
operations. I'm finding that I'm getting a little caught up in the
details and thought I'd step back and ask if there's a more elegant way
to approach this, or if anyone knows a library that could assist.

So far, I've got code that generates a dictionary mapping each line
number to a list of token tuples, and am working on a data structure
describing where the classes begin and end, indexed by their name, so
that they can be modified later.

Any thoughts?
Thanks,
Paul
 
eliben

Consider using the 'compiler' module which will lend you more help
than 'tokenize'.

For example, the following demo lists all the method names in a file:

import compiler  # Python 2 only; the compiler module was removed in Python 3
import sys

class MethodFinder:
    """ Print the names of all the methods

        Each visit method takes two arguments, the node and its
        current scope.
        The scope is the name of the current class or None.
    """

    def visitClass(self, node, scope=None):
        # Walk the class body with the class name as the scope.
        self.visit(node.code, node.name)

    def visitFunction(self, node, scope=None):
        if scope is not None:
            print("%s.%s" % (scope, node.name))
        # Functions nested in this one are not methods, so reset the scope.
        self.visit(node.code, None)

def main(files):
    mf = MethodFinder()
    for path in files:
        f = open(path)
        buf = f.read()
        f.close()
        ast = compiler.parse(buf)
        compiler.walk(ast, mf)

if __name__ == "__main__":
    # Skip argv[0], which is the script name rather than a file to scan.
    main(sys.argv[1:])
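
For anyone reading this on Python 3, where the compiler module no longer exists, the same idea can probably be sketched with the standard ast module instead. This is a swapped-in technique, not the code from the post above; the names SourceOutline and outline are made up for illustration.

```python
import ast


class SourceOutline(ast.NodeVisitor):
    """Collect classes, top-level functions, methods and imported modules."""

    def __init__(self):
        self.classes = []
        self.functions = []
        self.methods = []
        self.imports = []
        self._stack = []  # enclosing (kind, name) pairs, innermost last

    def _descend(self, kind, node):
        # Visit the body while remembering what encloses it.
        self._stack.append((kind, node.name))
        self.generic_visit(node)
        self._stack.pop()

    def visit_ClassDef(self, node):
        self.classes.append(node.name)
        self._descend("class", node)

    def visit_FunctionDef(self, node):
        if not self._stack:
            self.functions.append(node.name)
        elif self._stack[-1][0] == "class":
            self.methods.append("%s.%s" % (self._stack[-1][1], node.name))
        # Functions nested inside other functions are deliberately skipped.
        self._descend("function", node)

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Import(self, node):
        self.imports.extend(alias.name for alias in node.names)

    def visit_ImportFrom(self, node):
        # Relative imports without a module name are recorded as dots.
        self.imports.append(node.module or "." * node.level)


def outline(source):
    """Parse source text and return the populated SourceOutline."""
    finder = SourceOutline()
    finder.visit(ast.parse(source))
    return finder
```

Calling outline(open(path).read()) and reading the .classes, .functions, .methods and .imports lists covers the "list" items from the original question; it does not modify anything.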
 
Wilson

Thanks! Will I be able to make changes to the AST, such as "rename
decorator" or "add decorator", and write them back out to a file as
Python source?

Regards,
Paul
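
For readers on Python 3.9 or later, one way to experiment with this kind of edit is the standard ast module: ast.NodeTransformer can rewrite decorator lists and ast.unparse turns the tree back into source. The sketch below is not from the thread; DecoratorEditor, edit_decorators and their parameters are hypothetical names, and ast.unparse drops comments and the original formatting.

```python
import ast


class DecoratorEditor(ast.NodeTransformer):
    """Rename one bare-name decorator and optionally add another.

    Hypothetical helper: only decorators written as plain names
    (@foo, not @pkg.foo or @foo(...)) are renamed.
    """

    def __init__(self, old, new, add_to=None, extra=None):
        self.old, self.new = old, new
        self.add_to, self.extra = add_to, extra

    def _edit(self, node):
        self.generic_visit(node)  # handle nested definitions first
        for dec in node.decorator_list:
            if isinstance(dec, ast.Name) and dec.id == self.old:
                dec.id = self.new
        if self.extra and node.name == self.add_to:
            # Index 0 is the outermost decorator (applied last).
            node.decorator_list.insert(
                0, ast.Name(id=self.extra, ctx=ast.Load()))
        return node

    visit_FunctionDef = visit_AsyncFunctionDef = visit_ClassDef = _edit


def edit_decorators(source, **kwargs):
    """Apply a DecoratorEditor to source text and return new source."""
    tree = DecoratorEditor(**kwargs).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

For example, edit_decorators(text, old="old", new="renamed", add_to="C", extra="registered") renames every @old and puts @registered on the definition named C. Because formatting is lost, this suits generated code better than hand-maintained files.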
 
Rafe

I can't help much...yet, but I am also heavily interested in this as I
will be approaching a project which will require me to write code
which writes code back to a file or new file after being manipulated.
I had planned on using the inspect module's getsource(), getmodule()
and getmembers() methods rather than doing any sort of file reading.
Have you tried any of these yet? Have you found any insurmountable
limitations?

It looks like everything needed is there. Some quick thoughts
regarding inspect.getmembers(module) results...
* Module objects can be written based on their attribute name and
__name__ values. If they are the same, then just write "import %s" %
mod.__name__. If they are different, write "import %s as %s" %
(mod.__name__, name), so that the attribute name becomes the alias.

* Skipping built-in stuff is easy, and everything else is either an
attribute name/value pair or an object of type 'function' or 'class',
both of which work with inspect.getsource(), I believe.

* If the module used any from-import-* lines, it doesn't look like
there is any difference between items defined in the module and those
imported into the module's namespace. Writing this back directly
would 'flatten' this call to individual module imports and local
module attributes. Maybe reading the file just to test for this would
be the answer. You could then import the module and subtract items
which haven't changed. This is easy for attributes but harder for
functions and classes...right?
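
The first bullet above can be sketched roughly as follows. The function name import_lines is made up for illustration; inspect.getmembers and inspect.ismodule are the real standard-library calls. One known limitation of this approach: "from os import path" binds a module whose __name__ is platform-specific (posixpath or ntpath), so the generated line will not match what the author originally wrote.

```python
import inspect


def import_lines(module):
    """Return import statements for the modules bound in `module`."""
    lines = []
    # getmembers returns (name, value) pairs sorted by name.
    for name, mod in inspect.getmembers(module, inspect.ismodule):
        if name == mod.__name__:
            lines.append("import %s" % mod.__name__)
        else:
            # The attribute name is the alias; __name__ is the real module.
            lines.append("import %s as %s" % (mod.__name__, name))
    return lines
```

Passing an already imported module object, for example import_lines(sys.modules["mypackage.mymodule"]) with a module name of your own, gives a list of lines ready to write at the top of the regenerated file.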


Beyond this initial bit of code, I'm hoping to be able to write new
code where I only want the new object to have attributes which were
changed. So if I have an instance of a Person object whose name has
been changed from its default, I only want a new class which inherits
from the Person class and has an attribute 'name' with the new value.
Basically using Python as a text-based storage format instead of
something like XML. Thoughts on this would be great for me if it
doesn't hijack the thread ;) I know there are quite a few who have
done this already.
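
A minimal sketch of that idea, under two loud assumptions: defaults live as class attributes, and every changed value has a repr() that evaluates back to the same value (strings, numbers, simple containers). The Person class and the override_source helper are illustrative names only.

```python
class Person:
    # Class attributes act as the defaults in this sketch.
    name = "Unknown"
    age = 0


def override_source(obj, new_name):
    """Return source for a subclass holding only the attributes whose
    instance value differs from the default on the instance's class."""
    cls = type(obj)
    missing = object()  # sentinel for attributes the class lacks
    changed = {
        attr: value
        for attr, value in sorted(vars(obj).items())
        if getattr(cls, attr, missing) != value
    }
    lines = ["class %s(%s):" % (new_name, cls.__name__)]
    # Assumes repr(value) is valid Python source for the value.
    lines += ["    %s = %r" % item for item in changed.items()] or ["    pass"]
    return "\n".join(lines) + "\n"
```

With p = Person() and p.name = "Alice", override_source(p, "Alice") yields a two-line class that subclasses Person and sets only name. The generated text still needs Person to be importable wherever it is loaded.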


Cheers,

- Rafe
 
Wilson

I can't help much...yet, but I am also heavily interested in this as I
will be approaching a project which will require me to write code
which writes code back to a file or new file after being manipulated.
I had planned on using the inspect module's getsource(), getmodule()
and getmembers() methods rather than doing any sort of file reading.
Have you tried any of these yet? Have you found any insurmountable
limitations?

The inspect module's getsource() returns the source code as originally
defined. It does not return any changes that have been made at
runtime. So, if you attached a new class to a module, I don't believe
that getsource() would be any use for extracting the code again to be
saved. I have rejected this approach for this reason. getmembers()
seems to be fine for this purpose; however, I don't see any way to get
class decorators and method decorators out.

It looks like everything needed is there. Some quick thoughts
regarding inspect.getmembers(module) results...
 * Module objects can be written based on their attribute name and
__name__ values. If they are the same, then just write "import %s" %
mod.__name__. If they are different, write "import %s as %s" %
(mod.__name__, name), so that the attribute name becomes the alias.

 * Skipping built-in stuff is easy, and everything else is either an
attribute name/value pair or an object of type 'function' or 'class',
both of which work with inspect.getsource(), I believe.

True, but if you add a function or class at runtime,
inspect.getsource() will not pick it up. It's reading the source from
a file, not doing some sort of AST unparse magic as I'd hoped. You'll
also have to check whether getsource() returns an object's decorators
too.

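
Both points can be checked with a small experiment. The sketch below writes a throwaway module to a temporary directory, loads it, and then attaches a function that was created at runtime. The names scratch_mod, tag, greet, added, load_from_file and source_or_none are all invented for the demonstration.

```python
import importlib.util
import inspect
import os
import tempfile

MODULE_SOURCE = '''\
def tag(func):
    return func

@tag
def greet():
    return "hi"
'''


def load_from_file(path, name="scratch_mod"):
    """Import a module object directly from a file path."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def source_or_none(obj):
    """Return inspect.getsource(obj), or None when no source is found."""
    try:
        return inspect.getsource(obj)
    except (OSError, TypeError):
        return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scratch_mod.py")
    with open(path, "w") as fh:
        fh.write(MODULE_SOURCE)
    mod = load_from_file(path)

    # Defined in the file: source is found, decorator line included.
    from_file = source_or_none(mod.greet)

    # Created at runtime and attached afterwards: there is no file text.
    namespace = {"__name__": mod.__name__}
    code = compile("def added():\n    return 1\n", "<runtime>", "exec")
    exec(code, namespace)
    mod.added = namespace["added"]
    at_runtime = source_or_none(mod.added)
```

Here from_file holds the text of greet starting at its @tag line, while at_runtime is None, which matches the behaviour described above: getsource() reads text from the defining file and knows nothing about objects added later.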
 * If the module used any from-import-* lines, it doesn't look like
there is any difference between items defined in the module and those
imported into the module's namespace. Writing this back directly
would 'flatten' this call to individual module imports and local
module attributes. Maybe reading the file just to test for this would
be the answer. You could then import the module and subtract items
which haven't changed. This is easy for attributes but harder for
functions and classes...right?

Does getmodule() not tell you where objects are defined?

Beyond this initial bit of code, I'm hoping to be able to write new
code where I only want the new object to have attributes which were
changed. So if I have an instance of a Person object whose name has
been changed from its default, I only want a new class which inherits
from the Person class and has an attribute 'name' with the new value.
Basically using Python as a text-based storage format instead of
something like XML. Thoughts on this would be great for me if it
doesn't hijack the thread ;) I know there are quite a few who have
done this already.

You want to be able to make class attribute changes and then have some
automated way of generating overriding subclasses that reflect those
changes? Sounds difficult. Be sure to keep me posted on your journey!

Regards,
Paul
 
Benjamin

Look at the 2to3 tool, which is good at this sort of thing. It lets you
define custom "fixers" that work on a fairly high-level representation
of the parse tree, and then writes the source back with everything you
didn't touch left exactly unchanged.
 
Wilson

Thanks for the hint. I've looked at lib2to3 and there might be some
useful stuff in there!

Thank you,
Paul
 
