Module/package hierarchy and its separation from file structure

Discussion in 'Python' started by Peter Schuller, Jan 23, 2008.

  1. Hello,

    In writing some non-trivial amount of Python code I keep running into
    an organizational issue. I will try to state the problem fairly
    generally, and follow up with a (contrived) example.

    The root cause of my difficulties is that by default, the relationship
    between a module hierarchy and the structure of files on disk is too
    strong for my taste. I want to separate the two as much as possible,
    but I do not want to resort to non-conventional "hacks" to do it. I am
    posting this in an attempt to present what I perceive to be a
    practical problem, and to get suggestions for solutions, or opinions
    on the most practical policy for how to deal with it.

    Like I said, I would like a weaker relationship between file system
    structure and module hierarchy. In particular there are two things I
    would like:

    * Least importantly, I don't like jamming code into __init__.py,
    as a personal preference.
    * Most importantly, I do not like to jam large amounts of code
    into a single source file, just for the purpose of keeping
    the public interface in the same package.

    A contrived but hopefully illustrative example:

    We have an organization "Org", which has a library, and as part of
    that library is code that relates to doing something with animals. As
    a result, the interesting top-level package for this example is:

    org.lib.animal

    Suppose now that I want an initial implementation of the most
    important animal. I want to create the class (but see [1]):

    org.lib.animal.Monkey

    The public interface consists of that class only (and possibly a small
    handful of functions). The implementation is quite significant however
    - it is 500 lines of code long.

    At this point, we had to jam those 500 lines of code into
    __init__.py. Let's ignore my personal preference of not liking to put
    code in __init__.py; the fact remains that we have 500 lines of code
    in a single source file.

    Now, we want to continue working on this library, adding ten
    additional animals.

    At this point, we have these choices (it seems to me):

    (1) Simply add these to __init__.py, resulting in
    __init__.py being 5000 lines long[2].

    (2) Put each animal into its own file, resulting in
    org.lib.animal.Monkey now becoming
    org.lib.animal.monkey.Monkey, and animal X becoming
    org.lib.animal.x.X.

    The problem I have is that both of these solutions are, in my opinion,
    very ugly:

    * (1) is ugly from a source code management perspective, because jamming
    5000 lines of code for ten different animals into a single file
    is bad for obvious reasons.

    * (2) is ugly because we introduce org.lib.animal.x.X for
    animal X, which:
    (a) is redundant in terms of naming
    (b) redundant in function since we have a single package for
    each animal containing nothing but a single class of
    the same name

    Clearly, (1) is bad due to file/source structure reasons, and (2) is
    bad for module organizational reasons. So we are back to my original
    wish - I want to separate the two, so that I can solve (1)
    independently of (2).

    Now, I realize that __init__.py can contain arbitrary code, and that
    one can override __import__. However, I do not want to resort to
    "hacks" just to solve this problem; I would prefer some established
    convention in the community, or at least something that is elegant.

    What are people's thoughts on this problem?

    Let me just shoot down one possible suggestion right away, to show you
    what I am trying to accomplish:

    I do *not* want to simply break out X into org.lib.animal.x, and have
    org.lib.animal import org.lib.animal.x.X as X. While this naively
    solves the problem of being able to refer to X as org.lib.animal.X,
    the solution is anything but consistent because the *identity* of X is
    still org.lib.animal.x.X. Examples of ways this breaks things:

    * X().__class__.__name__ gives unexpected results.
    * Automatically generated documentation will document using the "real"
    package name.
    * Moving the *actual* classes around by way of this aliasing would
    break things like pickled data structure as a result of the change
    of actual identity, unless one *always* pre-emptively maintains
    this shadow hierarchy (which is a problem in and of itself).

    Thus, it's not clean. It breaks the module abstraction and as a result
    has unintended consequences. I am looking for some kind of clean
    solution. What do people do about this in practice?
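
    To make the aliasing concern concrete, here is a minimal sketch. It
    assumes a throwaway package named "animal" (shortened from
    org.lib.animal) written to a temporary directory, and Python 3; all
    names are illustrative. Note that __name__ on the class is just
    "Monkey" either way; it is __module__ that records where the class
    was defined, and pickle stores that module name.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: animal/__init__.py re-exports Monkey from animal/monkey.py.
root = Path(tempfile.mkdtemp())
pkg = root / "animal"
pkg.mkdir()
(pkg / "monkey.py").write_text("class Monkey(object):\n    pass\n")
(pkg / "__init__.py").write_text("from animal.monkey import Monkey\n")

sys.path.insert(0, str(root))
animal = importlib.import_module("animal")

# The short name works, and both names refer to the very same class object.
same_object = animal.Monkey is sys.modules["animal.monkey"].Monkey

# But the class still reports the module it was defined in.
defining_module = animal.Monkey.__module__

# Pickle records that defining module, not the alias used to reach the class.
payload = pickle.dumps(animal.Monkey())

print(same_object)                  # True
print(defining_module)              # animal.monkey
print(b"animal.monkey" in payload)  # True
```

    So the alias changes how the class can be reached, not what the
    class says about itself.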

    [1] Optionally, we might introduce an "animals" package such that it
    would become org.lib.animal.animals.Monkey, if we thought we were
    going to have a lot of public API outside of the animals themselves.
    This does not affect this discussion however, as the exact same thing
    would apply to org.lib.animal.animals as applies to org.lib.animal in
    the above example.

    [2] Ignoring for now that it may not be realistic that every animal
    implementation would be that long; in many cases a lot of code would
    be in common. But feel free to substitute something else (a Zoo
    say).

    --
    / Peter Schuller

    PGP userID: 0xE9758B7D or 'Peter Schuller <>'
    Key retrieval: Send an E-Mail to
    E-Mail: Web: http://www.scode.org
     
    Peter Schuller, Jan 23, 2008
    #1

  2. On Wed, 23 Jan 2008 03:49:56 -0600, Peter Schuller wrote:

    > Let me just shoot down one possible suggestion right away, to show you
    > what I am trying to accomplish:
    >
    > I do *not* want to simply break out X into org.lib.animal.x, and have
    > org.lib.animal import org.lib.animal.x.X as X.


    Then you shoot down the idiomatic answer I guess. That's what most people
    do.

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Jan 23, 2008
    #2

  3. Ben Finney

    Peter Schuller <> writes:

    > Let me just shoot down one possible suggestion right away, to show
    > you what I am trying to accomplish:
    >
    > I do *not* want to simply break out X into org.lib.animal.x, and
    > have org.lib.animal import org.lib.animal.x.X as X.


    Nevertheless, that seems the best (indeed, the Pythonic) solution to
    your problem as stated. Rather than just shooting it down, we'll have
    to know more about what actual problem you're trying to solve to
    understand why this solution doesn't fit.

    > While this naively solves the problem of being able to refer to X as
    > org.lib.animal.X, the solution is anything but consistent because
    > the *identity* of X is still org.lib.animal.x.X.


    The term "identity" in Python means something separate from this
    concept; you seem to mean "the name of X".

    > Examples of ways this breaks things:
    >
    > * X().__class__.__name__ gives unexpected results.


    Who is expecting them otherwise, and why is that a problem?

    > * Automatically generated documentation will document using the
    > "real" package name.


    Here I lose all track of what problem you're trying to solve. You want
    the documentation to say exactly where the class "is" (by name), but
    you don't want the class to actually be defined at that location? I
    can't make sense of that, so probably I don't understand the
    requirement.

    --
    \ "If it ain't bust don't fix it is a very sound principle and |
    `\ remains so despite the fact that I have slavishly ignored it |
    _o__) all my life." —Douglas Adams |
    Ben Finney
     
    Ben Finney, Jan 23, 2008
    #3
  4. >> I do *not* want to simply break out X into org.lib.animal.x, and
    >> have org.lib.animal import org.lib.animal.x.X as X.

    >
    > Nevertheless, that seems the best (indeed, the Pythonic) solution to
    > your problem as stated. Rather than just shooting it down, we'll have
    > to know more about what actual problem you're trying to solve to
    > understand why this solution doesn't fit.


    That is exactly what my original post was trying very hard to
    explain. The problem is the discrepancy that I described between the
    organization desired in terms of file system structure, and the
    organization required in terms of module hierarchy. The reason it is a
    problem is that, by default, there is an (in my opinion) too strong
    connection between file system structure and module hierarchy in
    Python.

    >> While this naively solves the problem of being able to refer to X as
    >> org.lib.animal.X, the solution is anything but consistent because
    >> the *identity* of X is still org.lib.animal.x.X.

    >
    > The term "identity" in Python means something separate from this
    > concept; you seem to mean "the name of X".


    Not necessarily. In part it is the name, in that __name__ will be
    different. But to the extent that calling code can potentially import
    them under different names, it's identity. Because importing the same
    module under two names results in two distinct modules (two distinct
    module objects) that have no relation with each other. So for
    example, if a module has a single global protected by a mutex, there
    are suddenly two copies of that. In short: identity matters.

    >> Examples of ways this breaks things:
    >>
    >> * X().__class__.__name__ gives unexpected results.

    >
    > Who is expecting them otherwise, and why is that a problem?


    Depends on situation. One example is that if your policy is that
    instances log using a logger named by the fully qualified name of the
    class, then someone importing and using x.y.z.Class will expect to be
    able to grep for x.y.z.Class in the output of the log file.
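
    A small sketch of such a logging policy, for illustration only. The
    helper logger_for is hypothetical, and the class's module name is
    set by hand here to simulate a class that lives in
    org/lib/animal/monkey.py but is used as org.lib.animal.Monkey.

```python
import logging


def logger_for(obj):
    """Return a logger named after the fully qualified class of obj."""
    cls = type(obj)
    return logging.getLogger(cls.__module__ + "." + cls.__name__)


class Monkey(object):
    pass


# Simulate a class defined in org/lib/animal/monkey.py and re-exported
# from org.lib.animal; only the recorded module name is faked here.
Monkey.__module__ = "org.lib.animal.monkey"

log = logger_for(Monkey())
print(log.name)  # org.lib.animal.monkey.Monkey
```

    Someone grepping the log for org.lib.animal.Monkey would find
    nothing, because the logger name follows the defining module.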

    >> * Automatically generated documentation will document using the
    >> "real" package name.

    >
    > Here I lose all track of what problem you're trying to solve. You want
    > the documentation to say exactly where the class "is" (by name), but
    > you don't want the class to actually be defined at that location? I
    > can't make sense of that, so probably I don't understand the
    > requirement.


    You are baffled that what I seem to want is that the definition of the
    class (file on disk) be different from the location inferred by the
    module name. Well, this is *exactly* what I want because, like I said,
    I do not want the strong connection between file system structure and
    module hierarchy. The fact that this connection exists, is what is
    causing my problems.

    Please note that this is not any kind of crazy-brained idea; lots of
    languages have absolutely zero relationship between file location and
    modules/namespaces.

    I realize that technically Python does not have this either. Like I
    said in the original post, I do realize that I can override __import__
    with any arbitrary function, and/or do magic in __init__. But I also
    did not want to resort to hacks, and would prefer that there be some
    kind of well-established solution to the problem.

    Although I was originally hesitant to use an actual example for fear
    of giving the sense that I was trying to start a language war, your
    answer above prompts me to do so anyway, to show in concrete terms
    what I mean, for those that wonder why/how it would work.

    So for example, in Ruby, there is no problem having:

    File monkey.rb:

    module Org
      module Lib
        module Animal
          class Monkey ...
            ..
          end
        end
      end
    end

    File tiger.rb:

    module Org
      module Lib
        module Animal
          class Tiger ...
            ..
          end
        end
      end
    end

    This is possible because the act of addressing code to be loaded into
    the interpreter is not connected to the namespace/module system, but
    rather to the file system.

    Some languages avoid (but do not eliminate) the problem I am having
    without having this disconnect. For example, Java does have a strong
    connection between file system structure and class names. However the
    critical difference is that in Java, everything is modeled around
    classes, and class names map directly to the file system structure. So
    in Java, you would have the class

    org.lib.animal.Monkey

    in

    <wherever>/org/lib/animal/Monkey.java

    and

    org.lib.animal.Tiger

    in

    <wherever>/org/lib/animal/Tiger.java

    In other words, introducing a separate file does not introduce a new
    package. This works well as long as you are fine with having
    everything related to a class in the same file.

    The problem is that with Python, not everything is a class, and a
    file translates to a module, not a class. So you cannot have your
    source in different files without introducing as many packages as you
    introduce files.

    --
    / Peter Schuller

     
    Peter Schuller, Jan 24, 2008
    #4
  5. Carl Banks

    On Jan 23, 4:49 am, Peter Schuller <>
    wrote:
    > I do *not* want to simply break out X into org.lib.animal.x, and have
    > org.lib.animal import org.lib.animal.x.X as X. While this naively
    > solves the problem of being able to refer to X as org.lib.animal.X,
    > the solution is anything but consistent because the *identity* of X is
    > still org.lib.animal.x.X. Examples of ways this breaks things:
    >
    > * X().__class__.__name__ gives unexpected results.
    > * Automatically generated documentation will document using the "real"
    > package name.
    > * Moving the *actual* classes around by way of this aliasing would
    > break things like pickled data structure as a result of the change
    > of actual identity, unless one *always* pre-emptively maintains
    > this shadow hierarchy (which is a problem in and of itself).



    You can reassign the class's module:

    from org.lib.animal.monkey import Monkey
    Monkey.__module__ = 'org.lib.animal'


    (Which, I must admit, is not a bad idea in some cases.)


    Carl Banks
     
    Carl Banks, Jan 24, 2008
    #5
  6. En Thu, 24 Jan 2008 05:16:51 -0200, Peter Schuller
    <> escribió:

    >>> I do *not* want to simply break out X into org.lib.animal.x, and
    >>> have org.lib.animal import org.lib.animal.x.X as X.

    >>
    >>> While this naively solves the problem of being able to refer to X as
    >>> org.lib.animal.X, the solution is anything but consistent because
    >>> the *identity* of X is still org.lib.animal.x.X.

    >>
    >> The term "identity" in Python means something separate from this
    >> concept; you seem to mean "the name of X".

    >
    > Not necessarily. In part it is the name, in that __name__ will be
    > different. But to the extent that calling code can potentially import
    > them under different names, it's identity. Because importing the same
    > module under two names results in two distinct modules (two distinct
    > module objects) that have no relation with each other. So for
    > example, if a module has a single global protected by a mutex, there
    > are suddenly two copies of that. In short: identity matters.


    That's not true. It doesn't matter if you import a module several times
    at different places and with different names, it's always the same module
    object.

    py> from xml.etree import ElementTree
    py> import xml.etree.ElementTree as ET2
    py> import xml.etree
    py> ET3 = getattr(xml.etree, 'ElementTree')
    py> ElementTree is ET2
    True
    py> ET2 is ET3
    True

    Ok, there is one exception: the main script is loaded as __main__, but if
    you import it using its own file name, you get a duplicate module.
    You could also confuse Python by adding a package root to sys.path and
    then importing the same module both from inside that package and from
    the outside under different names, but... just don't do that!
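
    A minimal sketch of that sys.path pitfall, assuming a throwaway
    package with the made-up names zoo_pkg and zoo_state in a temporary
    directory. With both the parent directory and the package directory
    on sys.path, the same file is loaded twice under two names.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "zoo_pkg"  # hypothetical package name
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "zoo_state.py").write_text("counter = []\n")

# Put both the parent directory and the package directory itself on sys.path.
sys.path[:0] = [str(root), str(pkg)]

inside = importlib.import_module("zoo_pkg.zoo_state")  # name via the package
outside = importlib.import_module("zoo_state")         # name via the extra root

# Two module objects were created from the same source file.
print(inside is outside)  # False

# Module-level state is therefore duplicated as well.
inside.counter.append(1)
print(inside.counter)     # [1]
print(outside.counter)    # []
```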

    > I realize that technically Python does not have this either. Like I
    > said in the original post, I do realize that I can override __import__
    > with any arbitrary function, and/or do magic in __init__. But I also
    > did not want to resort to hacks, and would prefer that there be some
    > kind of well-established solution to the problem.


    I don't really understand what your problem is exactly, but I think you
    don't require any __import__ magic or arcane hacks. Perhaps the __path__
    package attribute may be useful to you. You can add arbitrary directories
    to this list, which are searched for submodules of the package. This way
    you can (partially) decouple the file structure from the logical package
    structure. But I don't think it's a good thing...
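
    A rough sketch of the __path__ idea, under the assumption of a
    package called "farm" whose __init__.py appends a second directory
    ("elsewhere") to its own __path__; both names are invented for the
    example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "farm"         # the logical package (hypothetical name)
extra = root / "elsewhere"  # a directory outside the package
pkg.mkdir()
extra.mkdir()

# The submodule's source file lives outside the package directory.
(extra / "tiger.py").write_text("class Tiger(object):\n    pass\n")

# The package extends its own submodule search path at import time.
(pkg / "__init__.py").write_text("__path__.append({!r})\n".format(str(extra)))

sys.path.insert(0, str(root))
tiger = importlib.import_module("farm.tiger")

print(tiger.Tiger.__module__)            # farm.tiger
print(Path(tiger.__file__).parent.name)  # elsewhere
```

    This relocates whole submodules; it still yields farm.tiger.Tiger
    rather than farm.Tiger, so it does not merge several files into one
    module.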

    > in Java, you would have the class
    >
    > org.lib.animal.Monkey
    >
    > in
    >
    > <wherever>/org/lib/animal/Monkey.java
    >
    > and
    >
    > org.lib.animal.Tiger
    >
    > in
    >
    > <wherever>/org/lib/animal/Tiger.java
    >
    > In other words, introducing a separate file does not introduce a new
    > package. This works well as long as you are fine with having
    > everything related to a class in the same file.
    >
    > The problem is that with Python, not everything is a class, and a
    > file translates to a module, not a class. So you cannot have your
    > source in different files without introducing as many packages as you
    > introduce files.


    Isn't org.lib.animal a package, reflected as a directory on disk? That's
    the same both for Java and Python. Monkey.py and Tiger.py would be modules
    inside that directory, just like Monkey.java and Tiger.java. Aren't they the
    same thing?

    --
    Gabriel Genellina
     
    Gabriel Genellina, Jan 24, 2008
    #6
  7. >> Not necessarily. In part it is the name, in that __name__ will be
    >> different. But to the extent that calling code can potentially import
    >> them under different names, it's identity. Because importing the same
    >> module under two names results in two distinct modules (two distinct
    >> module objects) that have no relation with each other. So for
    >> example, if a module has a single global protected by a mutex, there
    >> are suddenly two copies of that. In short: identity matters.

    >
    > That's not true. It doesn't matter if you Import a module several times
    > at different places and with different names, it's always the same module
    > object.


    Sorry, this is all my stupidity. I was being daft. When I said
    importing under different names, I meant exactly that. As in, applying
    hacks to import a module under a different name by doing it relative
    to a different root directory. This is however not what anyone is
    suggesting in this discussion. I got my wires crossed. I fully
    understand that "import x.y.z" or "import x.y.z as B", and so on, do
    not affect the identity of the module.

    > Ok, there is one exception: the main script is loaded as __main__, but if
    > you import it using its own file name, you get a duplicate module.
    > You could confuse Python adding a package root to sys.path and doing
    > imports from inside that package and from the outside with different
    > names, but... just don't do that!


    Right :)

    > I don't really understand what your problem is exactly, but I think you
    > don't require any __import__ magic or arcane hacks. Perhaps the __path__
    > package attribute may be useful to you. You can add arbitrary directories
    > to this list, which are searched for submodules of the package. This way
    > you can (partially) decouple the file structure from the logical package
    > structure. But I don't think it's a good thing...


    That sounds useful if I want to essentially put the contents of a
    directory somewhere else, without using a symlink. In this case my
    problem is more related to the "file == module" and "directory ==
    module" semantics, since I want to break contents in a single module
    out into several files.

    > Isn't org.lib.animal a package, reflected as a directory on disk? That's
    > the same both for Java and Python. Monkey.py and Tiger.py would be modules
    > inside that directory, just like Monkey.java and Tiger.java. Aren't they the
    > same thing?


    No, because in Java Monkey.java is a class. So we have class Monkey in
    package org.lib.animal. In Python we would have class Monkey in module
    org.lib.animal.monkey, which is redundant and does not reflect the
    intended hierarchy. I have to either live with this, or put Monkey in
    ..../animal/__init__.py. Neither option is what I would want, ideally.

    Java does still suffer from the same problem since it forces "class ==
    file" (well, "public class == file"). However it is less of a problem
    since you tend to want to keep a single class in a single file, while
    I have a lot more incentive to split up a module into different files
    (because you may have a lot of code hiding behind the public interface
    of a module).

    So essentially, Java and Python have the same problem, but certain
    aspects of Java happen to mitigate the effects of it. Languages like
    Ruby do not have the problem at all, because the relationship between
    files and modules is non-existent.

    --
    / Peter Schuller

     
    Peter Schuller, Jan 24, 2008
    #7
  8. En Thu, 24 Jan 2008 11:57:49 -0200, Peter Schuller
    <> escribió:

    > In this case my
    > problem is more related to the "file == module" and "directory ==
    > module" semantics, since I want to break contents in a single module
    > out into several files.


    You already can do that, just import the public interface of those several
    files into the desired container module. See below for an example.

    >> Isn't org.lib.animal a package, reflected as a directory on disk? That's
    >> the same both for Java and Python. Monkey.py and Tiger.py would be
    >> modules
    >> inside that directory, just like Monkey.java and Tiger.java. Aren't they the
    >> same thing?

    >
    > No, because in Java Monkey.java is a class. So we have class Monkey in
    > package org.lib.animal. In Python we would have class Monkey in module
    > org.lib.animal.monkey, which is redundant and does not reflect the
    > intended hierarchy. I have to either live with this, or put Monkey in
    > .../animal/__init__.py. Neither option is what I would want, ideally.


    You can also put, in animal/__init__.py:
    from monkey import Monkey
    and now you can refer to it as org.lib.animal.Monkey, but keep the
    implementation of the Monkey class and all related stuff in
    ..../animal/monkey.py

    --
    Gabriel Genellina
     
    Gabriel Genellina, Jan 25, 2008
    #8
  9. Ben Finney

    "Gabriel Genellina" <> writes:

    > You can also put, in animal/__init__.py:
    > from monkey import Monkey
    > and now you can refer to it as org.lib.animal.Monkey, but keep the
    > implementation of Monkey class and all related stuff into
    > .../animal/monkey.py


    This (as far as I can understand) is exactly the solution the original
    poster desired to "shoot down", for reasons I still don't understand.

    --
    \ "Reichel's Law: A body on vacation tends to remain on vacation |
    `\ unless acted upon by an outside force." -- Carol Reichel |
    _o__) |
    Ben Finney
     
    Ben Finney, Jan 25, 2008
    #9
  10. Carl Banks

    On Jan 25, 6:45 pm, Ben Finney <>
    wrote:
    > "Gabriel Genellina" <> writes:
    > > You can also put, in animal/__init__.py:
    > > from monkey import Monkey
    > > and now you can refer to it as org.lib.animal.Monkey, but keep the
    > > implementation of Monkey class and all related stuff into
    > > .../animal/monkey.py

    >
    > This (as far as I can understand) is exactly the solution the original
    > poster desired to "shoot down", for reasons I still don't understand.


    Come on, the OP explained it quite clearly in his original post. Did
    you guys even read it?

    The module where org.lib.animal.Monkey is actually defined should be
    an implementation detail of the library, but simply importing Monkey
    into org.lib.animal doesn't quite make it one.

    If a user pickles a Monkey class, and then the OP decides to refactor
    the Monkey class into a new module (say
    org.lib.animal.primate.monkey), then the user would not be able to
    unpickle it. Because, you see, pickles record the module a class is
    defined in. So, now the user has to worry about where Monkey is
    actually defined. It is not an implementation detail.

    The solution is to modify the class's __module__ attribute as well as
    importing it, as I've already pointed out:

    from org.lib.animal.monkey import Monkey
    Monkey.__module__ = 'org.lib.animal'

    This should be enough to satisfy the OP's requirements, at least for
    classes, without softening the one-to-one module-to-file relationship,
    or using "hacks".

    In fact, I'd say this is good practice.
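
    A sketch of how this plays out with pickle, assuming a made-up
    package name "menagerie" and Python 3 (hence the explicit relative
    import in __init__.py). Once __module__ is reassigned, the pickle no
    longer mentions the implementation module.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "menagerie"  # hypothetical package name
pkg.mkdir()
(pkg / "monkey.py").write_text("class Monkey(object):\n    pass\n")

# Re-export the class and relabel it as belonging to the package itself.
(pkg / "__init__.py").write_text(
    "from .monkey import Monkey\n"
    "Monkey.__module__ = __name__\n"
)

sys.path.insert(0, str(root))
menagerie = importlib.import_module("menagerie")

payload = pickle.dumps(menagerie.Monkey())
restored = pickle.loads(payload)

print(menagerie.Monkey.__module__)         # menagerie
print(b"menagerie.monkey" in payload)      # False
print(type(restored) is menagerie.Monkey)  # True
```

    Because the pickle refers only to menagerie.Monkey, the class could
    later move to another implementation file as long as the package
    keeps re-exporting it.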


    Carl Banks
     
    Carl Banks, Jan 26, 2008
    #10
  11. Ben Finney

    Carl Banks <> writes:

    > On Jan 25, 6:45 pm, Ben Finney <>
    > wrote:
    > > "Gabriel Genellina" <> writes:
    > > > You can also put, in animal/__init__.py:
    > > > from monkey import Monkey
    > > > and now you can refer to it as org.lib.animal.Monkey, but keep the
    > > > implementation of Monkey class and all related stuff into
    > > > .../animal/monkey.py

    > >
    > > This (as far as I can understand) is exactly the solution the
    > > original poster desired to "shoot down", for reasons I still don't
    > > understand.

    >
    > The solution is to modify the class's __module__ attribute as well as
    > importing it, as I've already pointed out:
    >
    > from org.lib.animal.monkey import Monkey
    > Monkey.__module__ = 'org.lib.animal'


    Thanks, that makes it clear.

    > This should be enough to satisfy the OP's requirements, at least for
    > classes, without softening the one-to-one module-to-file
    > relationship, or using "hacks".
    >
    > In fact, I'd say this is good practice.


    I've not seen that before, but it seems an elegant way to address what
    the OP is asking for.

    --
    \ "Madness is rare in individuals, but in groups, parties, |
    `\ nations and ages it is the rule." -- Friedrich Nietzsche |
    _o__) |
    Ben Finney
     
    Ben Finney, Jan 26, 2008
    #11
  12. > You can also put, in animal/__init__.py:
    > from monkey import Monkey
    > and now you can refer to it as org.lib.animal.Monkey, but keep the
    > implementation of Monkey class and all related stuff into
    > .../animal/monkey.py


    The problem is that we are now back to the identity problem. The class
    won't actually *BE* org.lib.animal.Monkey. Perhaps manipulating
    __module__ is enough; perhaps not (for example, what about
    sys.modules?). Looks like I'll just live with putting more than I
    would like in the same file.

    --
    / Peter Schuller

     
    Peter Schuller, Jan 29, 2008
    #12
  13. > You can reassign the class's module:
    >
    > from org.lib.animal.monkey import Monkey
    > Monkey.__module__ = 'org.lib.animal'
    >
    >
    > (Which, I must admit, is not a bad idea in some cases.)


    Is there a sense whether this is truly a supported way of doing this,
    in terms of not running into various unintended side-effects? One
    example would be sys.modules that I mentioned in the previous
    post. Another, possibly related, might be interaction with the import
    keyword and its implementation.

    I will probably have to read up more on the semantics of __import__
    and related machinery.

    --
    / Peter Schuller

     
    Peter Schuller, Jan 29, 2008
    #13
  14. Carl Banks

    On Jan 29, 7:48 am, Peter Schuller <>
    wrote:
    > > You can also put, in animal/__init__.py:
    > > from monkey import Monkey
    > > and now you can refer to it as org.lib.animal.Monkey, but keep the
    > > implementation of Monkey class and all related stuff into
    > > .../animal/monkey.py

    >
    > The problem is that we are now back to the identity problem. The class
    > won't actually *BE* org.lib.animal.Monkey.


    The usage is the same; it works in all cases once you redefine
    __module__. Who cares what it really is?


    > Perhaps manipulating
    > __module__ is enough; perhaps not (for example, what about
    > sys.modules?).


    It's enough. It satisfies the criteria you listed. sys.modules has
    nothing to do with it. Monkey is a class, not a module.

    If you set __module__, the only remaining discernable difference is
    that the global variables accessed from the Monkey class will be in
    org.lib.animal.monkey instead of org.lib.animal. This has no ill
    effects when unpickling or instantiating the class from
    org.lib.animal.
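
    The point about globals can be checked with a short sketch. It
    assumes an invented package "jungle" with a module-level constant
    SOUND in jungle/monkey.py; the method still resolves its globals in
    the defining module even after __module__ is changed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "jungle"  # hypothetical package name
pkg.mkdir()
(pkg / "monkey.py").write_text(
    "SOUND = 'ook'\n"
    "class Monkey(object):\n"
    "    def speak(self):\n"
    "        return SOUND\n"
)
(pkg / "__init__.py").write_text(
    "from .monkey import Monkey\n"
    "Monkey.__module__ = __name__\n"
)

sys.path.insert(0, str(root))
Monkey = importlib.import_module("jungle").Monkey

# The label on the class says "jungle" ...
print(Monkey.__module__)                     # jungle

# ... but its methods look up globals in the module that defined them.
print(Monkey.speak.__globals__["__name__"])  # jungle.monkey
print(Monkey().speak())                      # ook
```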

    > Looks like I'll just live with putting more than I
    > would like in the same file.


    Whatever. ISTM you came here looking for a particular means and not a
    particular end. Python already has the power to meet your stated
    needs, but you won't use that solution because it's "hacky".
    Apparently all you really wanted was the loosened file structure in
    the first place.


    Carl Banks
     
    Carl Banks, Jan 29, 2008
    #14
  15. Robert Kern

    Carl Banks wrote:
    > On Jan 29, 7:48 am, Peter Schuller <>
    > wrote:
    >>> You can also put, in animal/__init__.py:
    >>> from monkey import Monkey
    >>> and now you can refer to it as org.lib.animal.Monkey, but keep the
    >>> implementation of Monkey class and all related stuff into
    >>> .../animal/monkey.py

    >> The problem is that we are now back to the identity problem. The class
    >> won't actually *BE* org.lib.animal.Monkey.

    >
    > The usage is the same; it works in all cases once you redefine
    > __module__. Who cares what it really is?


    The inspect module.

    [animals]$ ls
    animals
    [animals]$ rm animals/*.pyc
    [animals]$ ls
    animals
    [animals]$ ls animals
    __init__.py monkey.py
    [animals]$ cat animals/monkey.py
    class Monkey(object):
        pass
    [animals]$ cat animals/__init__.py
    from animals.monkey import Monkey
    Monkey.__module__ = 'animals'
    [animals]$ python
    Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
    [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from animals import Monkey
    >>> import inspect
    >>> inspect.getsource(Monkey)

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/inspect.py", line 629, in getsource
        lines, lnum = getsourcelines(object)
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/inspect.py", line 618, in getsourcelines
        lines, lnum = findsource(object)
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/inspect.py", line 494, in findsource
        raise IOError('could not find class definition')
    IOError: could not find class definition
    >>>
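
    For anyone who wants to try this on Python 3 without creating the
    files by hand, here is a rough sketch of the same layout. The package
    name animals_demo and the temporary directory are assumptions made
    for illustration. It only checks which module and file inspect
    attributes the class to once __module__ has been overwritten; it does
    not reproduce the 2.5 traceback, since getsource() behaves
    differently between Python versions.

```python
import importlib
import inspect
import os
import sys
import tempfile


def make_package(root):
    """Write a two-file package (hypothetical name: animals_demo) under root."""
    pkg = os.path.join(root, "animals_demo")
    os.mkdir(pkg)
    with open(os.path.join(pkg, "monkey.py"), "w") as f:
        f.write("class Monkey(object):\n    pass\n")
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        # Re-export the class and overwrite __module__, as in the transcript.
        f.write("from animals_demo.monkey import Monkey\n"
                "Monkey.__module__ = 'animals_demo'\n")
    return pkg


root = tempfile.mkdtemp()
make_package(root)
sys.path.insert(0, root)
importlib.invalidate_caches()

from animals_demo import Monkey

# inspect resolves a class through sys.modules[cls.__module__], so it now
# looks at the package's __init__.py rather than at monkey.py.
reported_module = inspect.getmodule(Monkey)
reported_file = inspect.getsourcefile(Monkey)

print(reported_module.__name__)
print(os.path.basename(reported_file))
```

    The class statement lives in monkey.py, but inspect is pointed at
    __init__.py, which is why the source lookup above goes wrong.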


    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
    Robert Kern, Jan 29, 2008
    #15
  16. >> The problem is that we are now back to the identity problem. The class
    >> won't actually *BE* org.lib.animal.Monkey.

    >
    > The usage is the same; it works in all cases once you redefine
    > __module__. Who cares what it really is?


    The cases I listed were just examples. My point was that I wanted it
    to *be* the right class, to avoid unintended consequences. If I knew
    what all those possible consequences were, there would not be a
    problem to begin with.

    The other follow-up to your E-Mail points out one such possible
    problem. I would not have come up with that, but that does not mean
    the effect does not exist. And committing to a solution that "seems to
    work", only to break massively for some particular use case in the
    future, is exactly why I don't want a "hack" for a solution.

    I don't know Python internals enough to state or believe with any
    authority whether, let's say, stomping __module__ and hacking
    sys.modules would be enough to *truly* do it correctly in a proper way
    such that it is entirely transparent. This is why I care about whether
    it truly changes the real identity of the class; it's not about
    satisfying my particular list of examples (because they *were* just
    examples).

    > Whatever. ISTM you came here looking for a particular means and not a
    > particular end.


    My particular preferred end is to be able to separate file hierarchy
    from module hierarchy without causing unforeseen consequences. This was
    the stated goal all along.

    > Python already has the power to meet your stated
    > needs, but you won't use that solution because it's "hacky".
    > Apparently all you really wanted was the loosened file structure in
    > the first place.


    Yes, or failing that an alternative that mitigates the problem. And it
    *is* hacky, in my opinion, if things break as a result of it (such as
    the other poster's inspect example).

    --
    / Peter Schuller

    PGP userID: 0xE9758B7D or 'Peter Schuller <>'
    Key retrieval: Send an E-Mail to
    E-Mail: Web: http://www.scode.org
     
    Peter Schuller, Jan 30, 2008
    #16
  17. On Tue, 29 Jan 2008 13:44:33 -0600, Robert Kern wrote:

    > Carl Banks wrote:
    >> On Jan 29, 7:48 am, Peter Schuller <> wrote:
    >>>> You can also put, in animal/__init__.py:
    >>>> from monkey import Monkey
    >>>> and now you can refer to it as org.lib.animal.Monkey, but keep the
    >>>> implementation of Monkey class and all related stuff into
    >>>> .../animal/monkey.py
    >>> The problem is that we are now back to the identity problem. The class
    >>> won't actually *BE* org.lib.animal.Monkey.

    >>
    >> The usage is the same; it works in all cases once you redefine
    >> __module__. Who cares what it really is?

    >
    > The inspect module.


    [snip example]

    I call that a bug in the inspect module. In fact, looking at the source
    for the findsource() function, I can see no fewer than two bugs, just in
    the way it handles classes:

    (1) it assumes that the only way to create a class is with a class
    statement, which is wrong; and

    (2) it assumes that the first occurrence of "class <name>" must be the
    correct definition, which is also wrong.


    It isn't hard to break the inspect module. Here's an example:


    >>> import broken
    >>> import inspect
    >>> lines, lineno = inspect.findsource(broken.Parrot)
    >>> lines[lineno]

    'class Parrot which will be defined later.\n'
    >>>
    >>> lines, lineno = inspect.findsource(broken.Wensleydale)
    >>> lines[lineno]

    'class Wensleydale: # THIS IS GONE\n'

    Here's the source of broken.py:


    $ cat broken.py
    """Here is a doc string, where I happen to discuss the
    class Parrot which will be defined later.
    """
    class Parrot:
        pass

    class Wensleydale: # THIS IS GONE
        pass

    del Wensleydale
    class Wensleydale(object): # but this exists
        pass



    It isn't often that I would come right out and say that part of the
    Python standard library is buggy, but this is one of those cases.
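
    Point (1) can be tried directly. The sketch below builds a class by
    calling type() instead of using a class statement; the name
    DynamicParrot and the helper try_getsource are made up for
    illustration. The details of how inspect searches for a class have
    changed since 2.5, so treat this as a demonstration of the general
    limitation rather than of the exact code path discussed above.

```python
import inspect

# DynamicParrot is a hypothetical name. It is built by calling type()
# directly, so no class statement for it appears in any source file.
DynamicParrot = type("DynamicParrot", (object,), {"voltage": 4000})


def try_getsource(cls):
    """Return the source text inspect finds for cls, or None if the lookup fails."""
    try:
        return inspect.getsource(cls)
    except (OSError, TypeError):
        # OSError: no matching definition was found in the file.
        # TypeError: inspect could not associate the class with a file at all.
        return None


source = try_getsource(DynamicParrot)
print(source)
```

    The class is perfectly usable, yet there is no definition in the file
    for inspect to locate.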


    --
    Steven
     
    Steven D'Aprano, Jan 30, 2008
    #17
  18. On Tue, 29 Jan 2008 06:48:59 -0600, Peter Schuller wrote:

    >> You can also put, in animal/__init__.py:
    >> from monkey import Monkey
    >> and now you can refer to it as org.lib.animal.Monkey, but keep the
    >> implementation of Monkey class and all related stuff into
    >> .../animal/monkey.py

    >
    > The problem is that we are now back to the identity problem. The class
    > won't actually *BE* org.lib.animal.Monkey.


    In what sense will it not be? Why do you care so much about where the
    source code for Monkey is defined? If you actually want to read the
    source, you might need to follow the chain from "animal", see that Monkey
    is imported from "monkey", and go look at that. But the rest of the time,
    why would you care?

    There is a very good reason to care *in practice*: if there is code out
    there that assumes that the source code for Monkey is in the file it was
    found in. In practice, you might be stuck needing to work around that.
    But that's not a good reason to care *in principle*. In principle, the
    actual location of the source code should be an implementation detail of
    which we care nothing. It's possible that the source for Monkey doesn't
    exist *anywhere*.

    It is important to deal with buggy tools. But the correct way to do so is
    to fix the bugs, not to throw away perfectly good abstractions.



    --
    Steven
     
    Steven D'Aprano, Jan 30, 2008
    #18
  19. On Jan 30, 12:00, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:

    > I call that a bug in the inspect module. In fact, looking at the source
    > for the findsource() function, I can see no fewer than two bugs, just in
    > the way it handles classes:
    >
    > (1) it assumes that the only way to create a class is with a class
    > statement, which is wrong; and
    >
    > (2) it assumes that the first occurrence of "class <name>" must be the
    > correct definition, which is also wrong.


    Yes, it's broken. But I'm afraid that's the only thing inspect can
    do.
    Python stores filename and line number information in code objects
    (only). If you have a reference to any code object (a method, a
    function, a traceback...) inspect can use it to retrieve that
    information.
    Once a class is defined, there is no code object attached to it. (The
    class statement is executed when the module is loaded and initialized,
    but that code object is discarded afterwards because it's not required
    anymore).
    If you *know* that a certain method is defined in a class, you can use
    it to find the real module. But in general, there is nothing to start
    with.
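
    A small sketch of that last point, assuming the class has at least
    one method defined in its body (the method name swing is made up for
    the example): the function's code object still records where it was
    compiled, whatever __module__ has been set to.

```python
class Monkey(object):
    def swing(self):
        return "swinging"


# Cosmetic relabelling, as suggested earlier in the thread.
Monkey.__module__ = "org.lib.animal"

# The class itself carries no code object, but its method does.
code = Monkey.swing.__code__
defining_file = code.co_filename    # file the method was compiled from
first_line = code.co_firstlineno    # line of the "def swing" statement

# The function's globals belong to the module that really defined it.
real_module_name = Monkey.swing.__globals__.get("__name__")

print(Monkey.__module__)
print(real_module_name, defining_file, first_line)
```

    This only helps when such a method exists and was not itself
    borrowed from another module.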
    I'm eagerly waiting for someone to come and say I'm wrong...

    --
    Gabriel Genellina
     
    Gabriel Genellina, Jan 30, 2008
    #19
  20. On Jan 30, 4:31 am, Peter Schuller <> wrote:
    > I don't know Python internals enough to state or believe with any
    > authority whether, let's say, stomping __module__ and hacking
    > sys.modules would be enough to *truly* do it correctly in a proper way
    > such that it is entirely transparent. This is why I care about whether
    > it truly changes the real identity of the class; it's not about
    > satisfying my particular list of examples (because they *were* just
    > examples).


    Well, all I will say is that many people on this list, myself
    included, do know Python internals, and we use the method we've been
    suggesting here, without problems.

    I think you're slipping to a level of paranoia that's more harmful
    than helpful now.


    The ironic thing is, breaking the one-to-one module-to-file
    relationship is more likely to have "unintended consequences", by a
    very large margin. Python has always been one-to-one module-to-file
    (excepting modules built into the interpreter), and a lot of code and
    tools have come to depend on it.


    Carl Banks
     
    Carl Banks, Jan 30, 2008
    #20