How does python build its AST

Discussion in 'Python' started by MonkeeSage, Dec 7, 2007.

  1. MonkeeSage

    MonkeeSage Guest

    A quick question about how python parses a file into compiled
    bytecode. Does it parse the whole file into AST first and then compile
    the AST, or does it build and compile the AST on the fly as it reads
    expressions? (If the former case, why can't functions be called before
    their definitions?)

    Thanks,
    Jordan
     
    MonkeeSage, Dec 7, 2007
    #1
    1. Advertisements

  2. MonkeeSage

    Kay Schluehr Guest

    Python uses a highly optimized table based LL(1) parser to create a
    syntax tree. In Python 2.5 it transforms the concrete syntax tree
    ( CST ) into an AST before compilation. Before that it compiled the
    CST directly. I'm not sure what you are asking for ( in parentheses )?
    Parser actions or preprocessing the tree? The latter is definitely
    possible and you can build your own compilation machinery using the
    parser module and the compile function.

    Kay
     
    Kay Schluehr, Dec 7, 2007
    #2
    1. Advertisements

  3. MonkeeSage

    MonkeeSage Guest

    Thanks for your reply. You answered my main question. The secondary
    question is why is it a NameError to try to use a variable/function
    prior to the declaration in a source file, since python has already
    seen the declaration on the first pass building the CST/AST? At
    compile time, shouldn't it already know about it? (Forgive my
    ignorance.)

    Regards,
    Jordan
     
    MonkeeSage, Dec 7, 2007
    #3
  4. MonkeeSage

    Jason Guest

    Remember that Python is a highly dynamic language. You can't
    guarantee that a name will be accessible until the actual execution
    point that you try to access it. For example:
    Traceback (most recent call last):
    .... def __call__(self):
    .... print 'Hi-diddly-ho, neighbor!'
    ....
    If this was in a file, which version of Hello should the first call go
    to? Should the 'Hello' name be bound to the function, the callable
    class instance, the result of new.function? What if another
    statement

    The problem is that, in Python, functions are first-class objects,
    just like class definitions and other variables. The 'def' statement
    is something that gets executed. It compiles the code block for the
    function into a function object, then assigns it to the name of the
    function.

    When Python hits a function call, it must look up the name in the
    current name dictionary. In any but that most simple cases, the name
    may not refer to its previous values: any intervening calls or
    statements could have rebound that name to another object, either
    directly, through side-effects, or even through the C interface. You
    can't call a function before its def statement is parsed any more than
    you can print the value of a variable before you assign anything to
    it.

    Unlike C, functions are not special under Python. Unlike C++, classes
    aren't terribly special either. They have syntactic sugar to help the
    coding go down, but they are only Python objects, and are bound to the
    same rules as all other Python objects.

    --Jason
     
    Jason, Dec 7, 2007
    #4
  5. Python (2.5+) does parse the whole file top-to-bottom into an AST, and then
    compile the entire AST, resulting in a module code object that may contain other
    compiled code objects for the function bodies.

    However, this isn't relevant to your parenthesized question, since compiling
    functions does not make them callable. A function becomes available only once
    its def statement has been executed. Since execution (generally) proceeds top
    to bottom, this is why functions can't be called before their definitions.

    HTH
    Michael
     
    Michael Spencer, Dec 7, 2007
    #5
  6. MonkeeSage

    Kay Schluehr Guest

    When the compiler finds a name not being bound in the function scope
    it supposes the name is global and a LOAD_GLOBAL opcode is created
    instead of LOAD_FAST ( accessing locals ). Other than the local/
    function scope the global ( module level ) scope is dynamic and the
    compiler can't know whether a name will be present at runtime.

    Kay
     
    Kay Schluehr, Dec 7, 2007
    #6
  7. MonkeeSage

    Terry Reedy Guest

    |A quick question about how python parses a file into compiled
    | bytecode. Does it parse the whole file into AST first and then compile
    | the AST, or does it build and compile the AST on the fly as it reads
    | expressions? (If the former case, why can't functions be called before
    | their definitions?)

    The direct answer is that names cannot be entered into namespaces and bound
    to objects to be looked up until the corresponding object is created by
    executing the corresponding code. Compiling Python code creates the
    internal code needed to create Python objects, but only exceptionally
    creates Python objects in the process. In particular, compiling a function
    may create code objects (since the code is a constant) referenced by the
    function creation code, but not function objects themselves.

    A less direct answer is the Python is designed to by usable interactively.
    In CPython interactive mode, you enter and the interpreter compiles and
    executes one top(module)-level statement at a time. Calling a function you
    have not yet entered would be magical.

    Terry Jan Reedy
     
    Terry Reedy, Dec 7, 2007
    #7
  8. MonkeeSage

    MonkeeSage Guest

    Thanks for your replies Kay, Michael and Terry. To summarize my
    understanding of your answers:

    - Python (meaning CPython here) first does a parsing pass over the
    entire file, with 2.5+ building a syntax tree and prior versions
    building a parse tree.

    - It then compiles the tree into bytecode, by walking down the nodes
    recursively from the top down.

    - When in interactive mode (e.g., python prompt), it builds the tree
    and compiles it on the fly, as individual expressions are parsed.

    Is this correct? If so, may I pick your brain on two more points?

    1.) What is the benefit of doing a two phase compilation (parsing/
    compiling), rather than a single, joint parse + compile phase (as in
    interactive mode)?

    2.) Wouldn't it be possible on the parsing phase to "tag" names as
    valid, even if they occur prior to the assignment of the name, if on a
    later branch that assignment is found (and have the compiler be aware
    of such tags)?

    The reason I'm wondering about these things is that on a different
    group, it came up that perl allows referencing before assignment,
    which seems to require a two-phase compilation, which made me wonder
    how python does things. And since (if I haven't misunderstood), python
    does use two-phase compilation, I just wondered if it would be
    possible to do what perl does. I'm not advocating it as a feature for
    python (it seems bass-ackwards to me to reference a name before it's
    assigned to in the script), this is just curiosity.

    Thanks,
    Jordan
     
    MonkeeSage, Dec 8, 2007
    #8
  9. MonkeeSage

    Terry Reedy Guest

    | 1.) What is the benefit of doing a two phase compilation (parsing/
    | compiling), rather than a single, joint parse + compile phase (as in
    | interactive mode)?

    As far as I know (without looking at the code), there is no difference
    between interactive and batch mode except that the unit processed is a
    statement versus file.

    | 2.) Wouldn't it be possible on the parsing phase to "tag" names as
    | valid, even if they occur prior to the assignment of the name, if on a
    | later branch that assignment is found (and have the compiler be aware
    | of such tags)?

    What would be the point? The semantics of Python code is essentially
    independent of whether it is executed in interactive or batch mode. (The
    exceptions are not relevant to your question.) So there is no point I can
    see to doing something in file mode that could not be done in statement
    mode.

    tjr
     
    Terry Reedy, Dec 8, 2007
    #9
  10. MonkeeSage

    MonkeeSage Guest

    I see.
    It would pretty much be pointless. Referencing a name before it's
    assigned seems confusing and strange to me. I suppose one sane use
    would be in implementing something like Haskell's "where" clause, but
    I don't think that would work well (or even be very useful) in python.
    Anyhow, this was more of an academic curiosity on how CPython does
    things, and I appreciate your answers.
    Regards,
    Jordan
     
    MonkeeSage, Dec 8, 2007
    #10
  11. MonkeeSage

    Carl Banks Guest

    Python creates certain objects at compile time but doesn't bind them
    to names in the modulespace until run time.

    Python could--and many other languages do--automatically bind these
    objects to names upon import. Python doesn't do it because it sees a
    module as "code to be executed" rather than a "list of global object
    definitions".

    Something like this would be awkward if Python bound the names at
    import time:

    if X:
    def a(): do_this()
    else:
    def a(): do_that()

    Which one gets bound to a?

    To do something similar in C would require preprocessor macros (ick).


    Carl Banks
     
    Carl Banks, Dec 8, 2007
    #11
  12. MonkeeSage

    MonkeeSage Guest

    Gotcha. Thanks again everyone for your answers.

    Regards,
    Jordan
     
    MonkeeSage, Dec 8, 2007
    #12
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.