Understanding other people's code

Discussion in 'Python' started by L O'Shea, Jul 12, 2013.

  1. L O'Shea

    L O'Shea Guest

    Hi all,
    I've been asked to take over a project from someone else and to extend its functionality. The project is written in Python, which I haven't had any real experience with (although I do really like it), so I've spent the last week or two settling in, trying to get my head around Python and the way in which this code works.

    The problem is the code was clearly written by someone who is exceptionally good and seems to inherit everything from everywhere else. It all seems very dynamic; nothing is written statically except in some configuration files.
    Basically the problem is I am new to the language and this was clearly written by someone who at the moment is far better at it than I am!

    I'm starting to get pretty worried about my lack of overall progress, so I wondered if anyone out there had some tips and techniques for understanding other people's code. There must be 10 to 15 different scripts, with at least 10 functions in each file, I would say.

    Literally any idea will help, pen and paper, printing off all the code and doing some sort of highlighting session - anything! I keep reading bits of code and thinking "well where the hell has that been defined and what does it mean" to find it was inherited from 3 modules up the chain. I really need to get a handle on how exactly all this slots together! Any techniques, tricks or methodologies that people find useful would be much appreciated.
    L O'Shea, Jul 12, 2013

  2. The first thing I'd recommend is getting yourself familiar with the
    language itself, and (to some extent) the standard library. Then
    you'll know that any unrecognized wotzit must have come from your own
    project, so you'll be able to search up its definition. Then I'd
    tackle source files one at a time, and look at the very beginning. If
    the original coder was at all courteous, each file will start off with
    a block of 'import' statements, looking something like this:

    import re
    import itertools

    Or possibly like this:

    from itertools import cycle, islice

    Or, if you're unlucky, like this:

    from tkinter import *

    The first form is easy. You'll find references to "re.sub" or
    "itertools.takewhile"; the second form at least names what it's
    grabbing (so you'll find "cycle" or "islice" in the code), and the
    third just dumps a whole lot of stuff into your namespace.

    Actually, if the programmer's been really nice, there'll be a block
    comment or a docstring at the top of the file, which might even be
    up-to-date and accurate. But I'm guessing you already know to look for
    that. :)

    The other thing I would STRONGLY recommend: Keep the interactive
    interpreter handy. Any line of code you don't understand, paste it
    into the interpreter. Chances are it won't wipe out your entire hard
    drive :) But seriously, there is much to gain and nothing to lose by
    keeping IDLE or the command-line interpreter handy.
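
    As a small illustration of that habit, suppose a line like the one below turned up in the project. The line is made up here, reusing the itertools names from the import examples above; pasting it into the interpreter shows straight away what it produces:

```python
from itertools import cycle, islice

# Hypothetical line lifted from the code being studied.
first_five = list(islice(cycle("abc"), 5))
print(first_five)   # ['a', 'b', 'c', 'a', 'b']

# type() and dir() show what kind of object you are holding
# and what you can do with it.
print(type(first_five))
print([name for name in dir(first_five) if not name.startswith("_")])
```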

    Chris Angelico, Jul 12, 2013

  3. L O'Shea

    Peter Otten Guest

    That sounds like the project is well-organised and not too big. If you take
    one day per module you're there in two weeks...
    As you put it here, the project is too complex. So now we have a mixed
    message. Of course your impression may stem from lack of experience...
    Is there any documentation? Read that. Do the functions have docstrings?
    Import the modules (start with the main entry point) in the interactive
    interpreter and use help():
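
    A minimal sketch of such a session, using the standard library's json module as a stand-in for one of the project's own modules (the real module names are not known here):

```python
import inspect
import json  # stand-in for a project module

# help(json) pages through the full documentation interactively;
# the calls below pull out the same information piece by piece.
print(inspect.getdoc(json.dumps).splitlines()[0])   # first docstring line
print(inspect.signature(json.dumps))                # parameters and defaults
print(inspect.getsourcefile(json))                  # file the module lives in

# Public functions visible in the module's namespace.
public = [name for name, obj in inspect.getmembers(json, inspect.isfunction)
          if not name.startswith("_")]
print(public)
```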

    Or use

    $ python -m pydoc -g

    and hit "open browser" (the project directory has to be in PYTHONPATH).

    See if you can talk to the author/previous maintainer. He may be willing to
    give you the big picture or hints for the parts where he did "clever" things.

    Try to improve your Python by writing unrelated scripts.

    Make little changes to the project (add print statements, invoke functions
    from your own driver script, make a local variable global for further
    inspection in the interactive interpreter using dir() -- whatever you can
    think of).

    The latter should of course be done in a test installation rather than the
    production environment.

    Rely on version control once you start making modifications for real -- but
    I think you knew that already...
    Peter Otten, Jul 12, 2013
  4. Glad to hear you're having a WTF moment (what's that function). My suggestion
    would be index cards, each containing notes on a class. Truly understand
    what each parent class is and which methods are to be overridden. Then look
    at one child and understand how it works. Work your way breadth-first down the
    inheritance tree.
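
    To see which parent class a method actually comes from, which is the kind of note worth writing on each card, something like the following may help. The classes are invented for illustration, and defining_class is a hypothetical helper, not part of any library:

```python
class Base:
    def connect(self):
        return "base connect"

    def run(self):
        return "base run"


class Middle(Base):
    def run(self):          # overrides Base.run
        return "middle run"


class Leaf(Middle):
    pass


def defining_class(cls, attr_name):
    """Return the first class in cls's MRO whose own namespace defines attr_name."""
    for klass in cls.__mro__:
        if attr_name in vars(klass):
            return klass
    return None


print([k.__name__ for k in Leaf.__mro__])        # ['Leaf', 'Middle', 'Base', 'object']
print(defining_class(Leaf, "run").__name__)      # Middle
print(defining_class(Leaf, "connect").__name__)  # Base
```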
    Eric S. Johansson, Jul 12, 2013
  5. L O'Shea

    Terry Reedy Guest

    If the functions are not documented in prose, is there a test suite that
    you can dive into?
    Terry Reedy, Jul 12, 2013
  6. L O'Shea

    CM Guest

    Basically the problem is I am new to the language and this was clearly written by someone who at the moment is far better at it than I am!
    Sure, as a beginner, yes, but also it sounds like the programmer didn't document it much at all, and that doesn't help you. I bet s/he didn't always use very human-readable names for objects/methods/classes, either, eh?
    Unless the programmer was really super spaghetti coding, I would think that there would be some method to the madness, and that the 10-15 scripts each have some specific kind of purpose. The first thing, I'd think (not having seen your codebase), would be to sketch out what those scripts do, and familiarize yourself with their names.

    Did the coder use this form for importing from modules?

    from client_utils import *

    If so, that's going to make your life much harder, because all of the names of the module will now be available to the script it was imported into, and yet they are not defined in that script. If s/he had written:

    import client_utils

    Then at least you would expect lines like this in the script you're looking at:

    customer_name = client_utils.GetClient()

    Or, if the naming is abstruse, at very least:

    cn = client_utils.GC()

    It's awful, but at least then you know that GC() is a function within the client_utils.py script and you don't have to go searching for it.

    If s/he did use "from module import *", then maybe it'd be worth it to re-do all the imports in the "import module" style, which will break everything, but then force you to go through all the errors and make the names like module.FunctionName() instead of just FunctionName().
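
    Before rewriting the imports, it can help to ask Python itself which names a star import would bring in and where an object was defined. A minimal sketch, using the standard json module as a stand-in because the real client_utils module is not available here; star_import_names and where_defined are hypothetical helpers:

```python
import inspect
import json  # stand-in for client_utils


def star_import_names(module):
    """Names that 'from module import *' would copy into the importing script."""
    if hasattr(module, "__all__"):
        return list(module.__all__)
    # Without __all__, every name not starting with an underscore is copied.
    return [name for name in vars(module) if not name.startswith("_")]


def where_defined(obj):
    """Return the name of the module an object was defined in, if known."""
    module = inspect.getmodule(obj)
    return module.__name__ if module else None


names = star_import_names(json)
print(names)
print(where_defined(json.dumps))   # json
```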

    Some of that depends on how big this project is, of course.
    What tools are you using to work on this code? Do you have an IDE that has a "browse to" function that allows you to click on a name and see where in the code above it was defined? Or does it have UML or something like that?
    CM, Jul 14, 2013
  7. L O'Shea

    Azureaus Guest

    Thanks for all the suggestions. I'm afraid I didn't get a chance to view them over the weekend, but I will get started with them this morning. I'm currently using Sublime Text 2 for my text editor and tried to create a UML diagram using Pylint to try and get a map overview of what's going on. Unfortunately it seemed to map the classes into groups such as StringIO, ThreadPool, GrabOut etc. rather than into the modules they belong to and how they fit together. Maybe this is just my inexperience showing through, or I'm using the program wrong. If anyone has any 'mapping' programs they use to help them visualise program flow, that would be a great bonus.

    To be fair to whoever programmed it, most functions are commented and I can't complain about the messiness of the code; it's actually very tidy. (I suppose Python forcing its formatting is another reason it's an easily readable language!) Luckily no blanket "import *" statements were used, otherwise I really would be up the creek without a paddle.
    Azureaus, Jul 15, 2013
  8. L O'Shea

    CM Guest

    Oh, good! OK, so then here is a simple strategy for getting clear without any fancy tools:

    Learn what each module is for. In my own application programming, I don't just put random classes and functions in any old module--the modules have some order to them. So, for example, one module may represent one panel in the application, or all the database stuff, or all the graphing stuff, or some other set of logic, or whatever. One might be the main GUI frame, etc. So I'd get a notebook or file and make notes for yourself about what each module is for, and the name. Even tack a piece of paper above your workstation with the module names and a one-line note about what they do, like:


    Map_panel: Displays a panel with the map of the city, with a few buttons.
    Dbases: Has all utility functions relevant to the database.
    Utils: Has a collection of utility functions to format time, i18n, etc.

    Now, there's a cheat sheet. So, if you come across a line in your code like:

    pretty_time = Utils.GetPrettyTime(datetime)

    You can quickly look at Utils module and read more about that function.

    Does this approach make sense to at least clear the cobwebs?
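
    A first draft of such a cheat sheet can be generated from the module docstrings, where they exist. A rough sketch using the standard ast module; module_summaries is a hypothetical helper, and the directory passed in is whatever the project root happens to be:

```python
import ast
from pathlib import Path


def module_summaries(project_dir):
    """Map each .py file under project_dir to the first line of its docstring."""
    root = Path(project_dir)
    summaries = {}
    for path in sorted(root.rglob("*.py")):
        key = str(path.relative_to(root))
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            # e.g. code written for a different Python version
            summaries[key] = "(could not parse)"
            continue
        doc = ast.get_docstring(tree)
        summaries[key] = doc.splitlines()[0] if doc else "(no docstring)"
    return summaries
```

    Calling module_summaries(".") from the project root and printing the name/summary pairs gives a starting list to annotate by hand.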
    CM, Jul 15, 2013
  9. L O'Shea

    asimjalis Guest

    Here are some techniques I use in these situations.

    1. Do a superficial scan of the code looking at names of classes, functions, variables, and speculate where the modification that I have to make will go. Chances are you don't need to understand the entire system to make your change.

    2. Build some hypotheses about how the system works and use print statements or some other debugging technique to run the program and see if you get the result you expect.

    3. Insert your code into a separate class and function and see if you can inject a call to your new code from the existing code so that it now works with the new functionality.

    If you have to understand the details of some code, one approach is to try to summarize blocks of code with a single comment to wrap your mind around it.
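
    For the second technique, a small tracing decorator can stand in for scattered print statements. This is only a sketch: traced and parse_price are invented names, and the decorated function is a placeholder for whatever project function the hypothesis is about.

```python
import functools


def traced(func):
    """Print the arguments and return value each time func is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"CALL {func.__name__} args={args!r} kwargs={kwargs!r}")
        result = func(*args, **kwargs)
        print(f"RET  {func.__name__} -> {result!r}")
        return result
    return wrapper


@traced
def parse_price(text):
    # Placeholder for an existing project function.
    return round(float(text.strip("$")), 2)


# Hypothesis: the function strips the currency sign and returns a float.
value = parse_price("$19.99")
```

    If the printed call and return lines match the hypothesis, move on; if not, the mismatch points at the code worth reading next.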

    asimjalis, Jul 16, 2013
  10. Literally any idea will help, pen and paper, printing off all the code
    and doing some sort of highlighting session - anything! I keep reading
    bits of code and thinking "well where the hell has that
    been defined and what does it mean" to find it was inherited from 3
    modules up the chain.
    Any techniques, tricks or methodologies that people find useful would be
    much appreciated.

    I'd highly recommend Eclipse with PyDev, unless you have some strong
    reason not to. That's what I use, and it saves pretty much all of those
    "what's this thing?" problems, as well as lots of others...

    David M Chess, Jul 16, 2013
  11. L O'Shea

    David Hutto Guest

    Any program, to me, is just like speaking English. The class or function
    name might not fully mesh with what your cognitive structure assumes it to
    be. Read through the imports first, and see the classes and functions come
    alive. With experience comes intuition of what each one does, and of the
    instances that can be utilized with it. The term RTFM and Google always come
    to mind as well.
    David Hutto, Jul 17, 2013
  12. L O'Shea

    Azureaus Guest

    Thank you to everyone who replied constructively; the various suggestions all helped a lot. I'd like to suggest to anyone who reads this in the future who is in a similar situation to do as David Chess suggested and install Eclipse with PyDev. Although I prefer to use Sublime to actually write code, Eclipse turned out to be invaluable in helping me jump around and understand the code (especially how things were passed around) and for debugging things over the last few days. Success!
    Cheers everyone.
    Azureaus, Jul 25, 2013
  13. If the code is really tidy, it is possible to understand a function
    using only the *documentation* (not the code itself) of any function
    or data it uses. In OO you also need some context about what an object
    is supposed to do. The next step is to prove to yourself that the
    function does exactly what is promised in its own documentation.

    And you get nowhere without domain knowledge. If you're in railways
    and don't know the difference between a "normal" and an "English"
    whathaveyou, then you're lost, plain and simple.

    Don't treat the original comment as sacred. Any time it is unclear
    rewrite it. You may get it wrong, but that's what source control
    systems are for. If at all possible, if you add a statement about
    a function, try to add a test that proves that statement.
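
    One lightweight way to attach a test to a tentative comment is a doctest in the docstring itself. A sketch, with normalise_name as a made-up function standing in for project code:

```python
import doctest


def normalise_name(name):
    """Tentative: strips surrounding whitespace and capitalises each word.

    >>> normalise_name("  ada lovelace ")
    'Ada Lovelace'
    >>> normalise_name("")
    ''
    """
    return " ".join(part.capitalize() for part in name.split())


# Runs the examples in the docstring; nothing is printed when the
# documented claims hold, and a failure report is printed otherwise.
doctest.run_docstring_examples(
    normalise_name, {"normalise_name": normalise_name}, name="normalise_name"
)
```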

    Any time you come across something that is insufficiently documented,
    you document it tentatively yourself, keeping in mind that what
    you write down may be wrong. This does no harm! Because you must
    keep in mind that everything written by the original programmer
    may be wrong, there is actually no difference! Now study the places
    where it is called and check whether it makes sense.
    This is an infinite process. After one round of improvements you
    have to go through everything again. I've got pretty bad stuff under
    control this way.

    You'll find bugs this way. They may or may not let you fix them.

    There is however not much point in "working in" by reading through
    the code. Time is probably better spent by running and studying, maybe
    creating test cases.

    Trying to understand any substantial code body in detail is
    a waste of time.
    For example: I once had to change the call code of the gcc compiler
    to be able to use a 68000 assembler library (regarding which registers
    contain what data passed to the function). There is absolutely no
    point in studying the whole gcc compiler. You must have an overview,
    then zoom in on the relevant part. In the end maybe only a couple
    of lines need to change. A couple of days, and a pretty hairy problem
    was solved. (The assembler library was totally undocumented.
    Nobody even tried to study it.)

    There is an indication that the original programmer made it all very
    easy and maybe you go about it not quite the right way.
    If you have a tower of abstractions, then you must *not* go down
    all the way to find out "exactly" what happens. You must pick
    a level in the middle and understand it in terms of usage, then
    understand what is on top of that in terms of that usage.
    That is how good programmers build their programs. Once there is
    a certain level they don't think about what's underneath, but
    concentrate on how to use it. If it is done really well, each
    source module can be understood on its own.

    All this is of course general, not just for Python.
    Albert van der Horst, Jul 27, 2013
  14. I'd broaden that slightly to the function's signature, which consists
    of the declaration line and any associated comments (which in Python
    should be in the docstring). The docstring kinda violates this
    concept, but what I generally try to explain is that you should be
    able to understand a function without reading any of the indented code.

    Chris Angelico, Jul 27, 2013