Automatic debugging of copy by reference errors?

Discussion in 'Python' started by Niels L Ellegaard, Dec 9, 2006.

  1. Is there a module that allows me to find errors that occur due to copy
    by reference? I am looking for something like the following:

    >>> import mydebug
    >>> mydebug.checkcopybyreference = True
    >>> a=2
    >>> b=[a]
    >>> a=4

    Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    CopyByReferenceError: Variable b refers to variable a, so please do not
    change variable a.

    Does such a module exist?
    Would it be possible to code such a module?
    Would it be possible to add the functionality to array-copying in
    numpy?
    What would be the extra cost in terms of memory and CPU power?

    I realize that this module would disable some useful features of the
    language. On the other hand it could be helpful for new python users.

    Niels
     
    Niels L Ellegaard, Dec 9, 2006
    #1
    1. Advertising

  2. On 9 dic, 02:22, "Niels L Ellegaard" <> wrote:

    > Is there a module that allows me to find errors that occur due to copy
    > by reference?


    What do you mean by "copy by reference"?

    > I am looking for something like the following:
    >
    > >>> import mydebug
    > >>> mydebug.checkcopybyreference = True
    > >>> a=2
    > >>> b=[a]
    > >>> a=4

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > CopyByReferenceError: Variable b refers to variable a, so please do not
    > change variable a.


    (I won't be pedantic to say that Python has no "variables"). What's
    wrong with that code? You are *not* changing "variable a", you are
    binding the integer 4 to the name "a". That name used previously to be
    bound to another integer, 2 - what's wrong with it? Anyway it has no
    effect on the list referred by the name "b", still holds b==[2]
    What do you want? Forbid the re-use of names? So once you say a=2, you
    can't modify it? That could be done, yes, a "write-once-read-many"
    namespace. But I don't see the usefullness.

    > Does such a module exist?

    No
    > Would it be possible to code such a module?

    I don't know what do you want to do exactly, but I feel it's not
    useful.
    > Would it be possible to add the functionality to array-copying in
    > numpy?

    No idea.
    > What would be the extra cost in terms of memory and CPU power?

    No idea yet.
    > I realize that this module would disable some useful features of the
    > language.

    Not only "some useful features", most programs would not even work!
    > On the other hand it could be helpful for new python users.

    I think you got in trouble with something and you're trying to avoid it
    again - but perhaps this is not the right way. Could you provide some
    example?

    --
    Gabriel Genellina
     
    Gabriel Genellina, Dec 9, 2006
    #2
    1. Advertising

  3. Gabriel Genellina wrote:
    > I think you got in trouble with something and you're trying to avoid it
    > again - but perhaps this is not the right way. Could you provide some
    > example?


    I have been using scipy for some time now, but in the beginning I made
    a few mistakes with copying by reference. The following example is
    exagerated
    for clarity, but the principle is the same:

    import os
    output=[]
    firstlines =[0,0]
    for filename in os.listdir('.'):
    try:
    firstlines[0] = open(filename,"r").readlines()[0]
    firstlines[1] = open(filename,"r").readlines()[1]
    output.append((filename,firstlines))
    except:continue
    print output

    Now some of my fortran-using friends would like to use python to
    analyze their data files. I wanted them to avoid making the same
    mistakes as I did so I thought it would be good if they could get some
    nanny-like warnings saying: "Watch out young man. If do this, then
    python will behave differently from fortran and matlab". That could
    teach them to do things the right way.

    I am not an expert on all this, but I guessed that it would be possible
    to make a set of constraints that could catch a fair deal of simple
    errors such as the one above, but still allow for quite a bit of
    programming.

    Niels
     
    Niels L Ellegaard, Dec 9, 2006
    #3
  4. In <>, Niels L
    Ellegaard wrote:

    > I have been using scipy for some time now, but in the beginning I made
    > a few mistakes with copying by reference.


    But "copying by reference" is the way Python works. Python never copies
    objects unless you explicitly ask for it. So what you want is a warning
    for *every* assignment.

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Dec 9, 2006
    #4
  5. Marc 'BlackJack' Rintsch wrote:
    > In <>, Niels L
    > Ellegaard wrote:
    > > I have been using scipy for some time now, but in the beginning I made
    > > a few mistakes with copying by reference.

    > But "copying by reference" is the way Python works. Python never copies
    > objects unless you explicitly ask for it. So what you want is a warning
    > for *every* assignment.


    Maybe I am on the wrong track here, but just to clarify myself:

    I wanted a each object to know whether or not it was being referred to
    by a living object, and I wanted to warn the user whenever he tried to
    change an object that was being refered to by a living object. As far
    as I can see the garbage collector module would allow to do some of
    this, but one would still have to edit the assignment operators of each
    of the standard data structures:

    http://docs.python.org/lib/module-gc.html

    Anyway you are probably right that the end result would be a somewhat
    crippled version of python


    Niels
     
    Niels L Ellegaard, Dec 9, 2006
    #5
  6. Niels L Ellegaard wrote:


    > I wanted a each object to know whether or not it was being referred to
    > by a living object, and I wanted to warn the user whenever he tried to
    > change an object that was being refered to by a living object. As far
    > as I can see the garbage collector module would allow to do som


    *all* objects in Python are referred to by "living" objects; those that
    don't are garbage, and are automatically destroyed sooner or later.

    (maybe you've missed that namespaces are objects too?)

    </F>
     
    Fredrik Lundh, Dec 9, 2006
    #6
  7. On Sat, 09 Dec 2006 05:58:22 -0800, Niels L Ellegaard wrote:

    > I wanted a each object to know whether or not it was being referred to
    > by a living object, and I wanted to warn the user whenever he tried to
    > change an object that was being refered to by a living object. As far
    > as I can see the garbage collector module would allow to do some of
    > this, but one would still have to edit the assignment operators of each
    > of the standard data structures:


    I think what you want is a namespace that requires each object to have
    exactly one reference - the namespace. Of course, additional references
    will be created during evaluation of expressions. So the best you can do
    is provide a function that checks reference counts for a namespace when
    called, and warns about objects with multiple references. If that could
    be called for every statement (i.e. not during expression evaluation -
    something like C language "sequence points"), it would probably catch the
    type of error you are looking for. Checking such a thing efficiently would
    require deep changes to the interpreter.

    The better approach is to revel in the ease with which data can be
    referenced rather than copied. I'm not sure it's worth turning python
    into fortran - even for selected namespaces.

    --
    Stuart D. Gathman <>
    Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
    "Confutatis maledictis, flamis acribus addictis" - background song for
    a Microsoft sponsored "Where do you want to go from here?" commercial.
     
    Stuart D. Gathman, Dec 9, 2006
    #7
  8. > I think what you want is a namespace that requires each object to have
    > exactly one reference - the namespace. Of course, additional references
    > will be created during evaluation of expressions. So the best you can do


    It's not enough. It won't catch the case where a list holds many
    references to the same object - the example provided by the OP.

    --
    Gabriel Genellina
     
    Gabriel Genellina, Dec 9, 2006
    #8
  9. On 9 dic, 09:08, "Niels L Ellegaard" <> wrote:

    > Now some of my fortran-using friends would like to use python to
    > analyze their data files. I wanted them to avoid making the same
    > mistakes as I did so I thought it would be good if they could get some
    > nanny-like warnings saying: "Watch out young man. If do this, then
    > python will behave differently from fortran and matlab". That could
    > teach them to do things the right way.


    Best bet is to learn to use python the right way. It's not so hard...

    > I am not an expert on all this, but I guessed that it would be possible
    > to make a set of constraints that could catch a fair deal of simple
    > errors such as the one above, but still allow for quite a bit of
    > programming.


    The problem is, it's an error only becasuse it's not the *intent* of
    the programmer - but it's legal Python code, and useful. It's hard for
    the computer to guess the intent of the programmer -yet!...-

    --
    Gabriel Genellina
     
    Gabriel Genellina, Dec 10, 2006
    #9
  10. Niels L Ellegaard

    greg Guest

    Niels L Ellegaard wrote:
    > I wanted to warn the user whenever he tried to
    > change an object that was being refered to by a living object.


    To see how fundamentally misguided this idea is,
    consider that, under your proposal, the following
    program would produce a warning:

    a = 1

    The reason being that the assignment is modifying
    the dictionary holding the namespace of the main
    module, which is referred to by the main module
    itself. So you are "changing an object that is
    being referred to by a living object".

    --
    Greg
     
    greg, Dec 10, 2006
    #10
  11. Niels L Ellegaard

    Russ Guest

    Niels L Ellegaard wrote:
    > Is there a module that allows me to find errors that occur due to copy
    > by reference? I am looking for something like the following:


    I don't have a solution for you, but I certainly appreciate your
    concern. The copy by reference semantics of Python give it great
    efficiency but are also its achille's heel for tough-to-find bugs.

    I once wrote a simple "vector" class in Python for doing basic vector
    operations such as adding and subtracting vectors and multiplying them
    by a scalar. (I realize that good solutions exist for that problem, but
    I was more or less just experimenting.) The constructor took a list as
    an argument to initialize the vector. I later discovered that a
    particularly nasty bug was due to the fact that my constructor "copied"
    the initializing list by reference. When I made the constructor
    actually make its own copy using "deepcopy," the bug was fixed. But
    deecopy is less efficient than copy by reference, of course.

    So a fundamental question in Python, it seems to me, is when to take
    the performance hit and use "copy" or "deepcopy."

    If a debugger could tell you how many references exist to an object,
    that would be helpful. For all I know, maybe some of them already do. I
    confess I don't use debuggers as much as I should.
     
    Russ, Dec 11, 2006
    #11
  12. Niels L Ellegaard

    Carl Banks Guest

    Russ wrote:
    > If a debugger could tell you how many references exist to an object,
    > that would be helpful.


    import sys
    sys.getrefcount(a)

    But I doubt it would be very helpful.

    Carl Banks
     
    Carl Banks, Dec 11, 2006
    #12
  13. Niels L Ellegaard

    Carl Banks Guest

    Niels L Ellegaard wrote:
    > Marc 'BlackJack' Rintsch wrote:
    > > In <>, Niels L
    > > Ellegaard wrote:
    > > > I have been using scipy for some time now, but in the beginning I made
    > > > a few mistakes with copying by reference.

    > > But "copying by reference" is the way Python works. Python never copies
    > > objects unless you explicitly ask for it. So what you want is a warning
    > > for *every* assignment.

    >
    > Maybe I am on the wrong track here, but just to clarify myself:
    >
    > I wanted a each object to know whether or not it was being referred to
    > by a living object, and I wanted to warn the user whenever he tried to
    > change an object that was being refered to by a living object.


    This really wouldn't work, trust us. Objects do not know who
    references them, and are not notified when bound to a symbol or added
    to a container. However, I do think you're right about one thing: it
    would be nice to have a tool that can catch errors of this sort, even
    if it's imperfect (as it must be).

    ISTM the big catch for Fortran programmers is when a mutable container
    is referenced from multiple places; thus a change via one reference
    will confusingly show up via the other one. (This, of course, is
    something that happens all the time in Python, but it's not something
    people writing glue for numerical code need to do a lot.) Instead of
    the interpreter doing it automatically, it might be better (read:
    possible) to use an explicit check. Here is a simple, untested
    example:


    class DuplicateReference(Exception):
    pass

    def _visit_container(var,cset):
    if id(var) in cset:
    raise DuplicateReference("container referenced twice")
    cset.add(id(var))

    def assert_no_duplicate_container_refs(datalist,cset=None):
    if cset is None:
    cset = set()
    for var in data:
    if isinstance(var,(list,set)):
    _visit_container(var,cset)
    assert_no_duplicate_container_refs(var,cset)
    elif isinstance(var,dict):
    _visit_container(var,cset)
    assert_no_duplicate_container_refs(var.itervalues(),cset)


    Then, at important points along the way, call this function with data
    you provide yourself. For example, we might modify your example as
    follows:

    import os
    output=[]
    firstlines =[0,0]
    for filename in os.listdir('.'):
    try:
    firstlines[0] = open(filename,"r").readlines()[0]
    firstlines[1] = open(filename,"r").readlines()[1]
    output.append((filename,firstlines))
    except (IOError, IndexError):
    # please don't use bare except unless reraising exception
    continue
    assert_no_duplicate_container_refs([output,firstlines])
    print output


    We passed the function output and firstlines because those are the two
    names you defined yourself. It'll check to see if any containers are
    referenced more than once. And it turns out there are--firstlines is
    referenced many times. It'll raise DuplicateReference on you.

    There's plenty of room for improvement in the recipe, for sure. It'll
    catch real basic stuff, but it doesn't account for all containers, and
    doesn't show you the loci of the duplicate references.

    Now, to be honest, the biggest benefit of this check is it gives
    newbies a chance to learn about references some way other than the hard
    way. It's not meant to catch a common mistake, so much as a
    potentially very confusing one. (It's really not a common mistake for
    programmers who are aware of references.) Once the lesson is learned,
    this check should be put aside, as it unnecessarily restricts
    legitimate use of referencing.


    Carl Banks
     
    Carl Banks, Dec 11, 2006
    #13
  14. Niels L Ellegaard

    greg Guest

    Russ wrote:
    > The copy by reference semantics of Python give it great
    > efficiency but are also its achille's heel for tough-to-find bugs.


    You need to stop using the term "copy by reference",
    because it's meaningless. Just remember that assignment
    in Python is always reference assignment. If you want
    something copied, you need to be explicit about it.

    > I later discovered that a
    > particularly nasty bug was due to the fact that my constructor "copied"
    > the initializing list by reference.


    The reason you made that mistake was that you were
    using the wrong mental model for how assignment works
    in Python -- probably one that you brought over from
    some other language.

    When you become more familiar with Python, you won't
    make mistakes like that anywhere near as often. And
    if you do, you'll be better at recognising the
    symptoms, so the cause won't be hard to track down.

    > So a fundamental question in Python, it seems to me, is when to take
    > the performance hit and use "copy" or "deepcopy."


    Again, this is something you'll find easier when
    you've had more experience with Python. Generally,
    you only need to copy something when you want an
    independent object that you can manipulate without
    affecting anything else, although that probably
    doesn't sound very helpful.

    In your vector example, it depends on whether you
    want your vectors to be mutable or immutable. It
    sounds like you were treating them as mutable, i.e.
    able to be modified in-place. In that case, each
    vector obviously needs to be a new object with the
    initial values copied into it.

    The alternative would be to treat your vectors as
    immutable, i.e. once created you never change their
    contents, and any operation, such as adding two
    vectors, produces a new vector holding the result.
    In that case, two vectors could happily share a
    reference to a list of values (as long as there is
    nothing else that might modify the contents of the
    list).

    --
    Greg
     
    greg, Dec 11, 2006
    #14
  15. Niels L Ellegaard

    Beliavsky Guest

    Carl Banks wrote:
    > Niels L Ellegaard wrote:
    > > Marc 'BlackJack' Rintsch wrote:
    > > > In <>, Niels L
    > > > Ellegaard wrote:
    > > > > I have been using scipy for some time now, but in the beginning I made
    > > > > a few mistakes with copying by reference.
    > > > But "copying by reference" is the way Python works. Python never copies
    > > > objects unless you explicitly ask for it. So what you want is a warning
    > > > for *every* assignment.

    > >
    > > Maybe I am on the wrong track here, but just to clarify myself:
    > >
    > > I wanted a each object to know whether or not it was being referred to
    > > by a living object, and I wanted to warn the user whenever he tried to
    > > change an object that was being refered to by a living object.

    >
    > This really wouldn't work, trust us. Objects do not know who
    > references them, and are not notified when bound to a symbol or added
    > to a container. However, I do think you're right about one thing: it
    > would be nice to have a tool that can catch errors of this sort, even
    > if it's imperfect (as it must be).
    >
    > ISTM the big catch for Fortran programmers is when a mutable container
    > is referenced from multiple places; thus a change via one reference
    > will confusingly show up via the other one.


    As a Fortranner, I agree. Is there an explanation online of why Python
    treats lists the way it does? I did not see this question in the Python
    FAQs at http://www.python.org/doc/faq/ .

    Here is a short Python code and a Fortran 95 equivalent.

    a = [1]
    c = a[:]
    b = a
    b[0] = 10
    print a,b,c

    output: [10] [10] [1]

    program xalias
    implicit none
    integer, target :: a(1)
    integer :: c(1)
    integer, pointer :: b:))
    a = [1]
    c = a
    b => a
    b(1) = 10
    print*,a,b,c
    end program xalias

    output: 10 10 1

    It is possible to get similar behavior when assigning an array (list)
    in Fortran as in Python, but one must explicitly use a pointer and "=>"
    instead of "=". This works well IMO, causing fewer surprises, and I
    have never heard Fortranners complain about it.

    Another way of writing the Fortran code so that "a" and "b" occupy the
    same memory is to use EQUIVALENCE.

    program xequivalence
    implicit none
    integer :: a(1),b(1)
    integer :: c(1)
    equivalence (a,b)
    a = [1]
    c = a
    b = a
    b(1) = 10
    print*,a,b,c
    end program xequivalence

    output: 10 10 1

    EQUIVALENCE is considered a "harmful" feature of early FORTRAN
    http://www.ibiblio.org/pub/languages/fortran/ch1-5.html .
     
    Beliavsky, Dec 11, 2006
    #15
  16. In <>, Beliavsky wrote:

    >> ISTM the big catch for Fortran programmers is when a mutable container
    >> is referenced from multiple places; thus a change via one reference
    >> will confusingly show up via the other one.

    >
    > As a Fortranner, I agree. Is there an explanation online of why Python
    > treats lists the way it does? I did not see this question in the Python
    > FAQs at http://www.python.org/doc/faq/ .


    This question sounds as if lists are treated somehow special. They are
    treated like any other object in Python. Any assignment binds an object
    to a name or puts a reference into a "container" object.

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Dec 11, 2006
    #16
  17. Niels L Ellegaard

    Russ Guest

    greg wrote:

    > You need to stop using the term "copy by reference",
    > because it's meaningless. Just remember that assignment


    I agree that "copy by reference" is a bad choice of words. I meant pass
    by reference and assign by reference. But the effect is to make a
    virtual copy, so although the phrase is perhaps misleading, it is not
    "meaningless."

    <cut>

    > Again, this is something you'll find easier when
    > you've had more experience with Python. Generally,
    > you only need to copy something when you want an
    > independent object that you can manipulate without
    > affecting anything else, although that probably
    > doesn't sound very helpful.


    Yes, I realize that, but deciding *when* you need a copy is the hard
    part.
     
    Russ, Dec 11, 2006
    #17
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Mark Goldin

    Errors, errors, errors

    Mark Goldin, Jan 17, 2004, in forum: ASP .Net
    Replies:
    2
    Views:
    965
    Mark Goldin
    Jan 17, 2004
  2. Guest
    Replies:
    1
    Views:
    771
    Guest
    Jun 29, 2004
  3. Alex
    Replies:
    2
    Views:
    1,236
  4. Replies:
    26
    Views:
    2,120
    Roland Pibinger
    Sep 1, 2006
  5. puzzlecracker
    Replies:
    8
    Views:
    431
    James Kanze
    Apr 15, 2008
Loading...

Share This Page