change of random state when pyc created??

Alan G Isaac

Robert said:
http://docs.python.org/lib/typesmapping.html
"""
Keys and values are listed in an arbitrary order which is non-random, varies
across Python implementations, and depends on the dictionary's history of
insertions and deletions.
"""


Even this does not tell me that if I use a specified implementation
that my results can vary from run to run. That is, it still does
not communicate that rerunning an *unchanged* program with an
*unchanged* implementation can produce a change in results.

Alan Isaac
 
Chris Mellon

Even this does not tell me that if I use a specified implementation
that my results can vary from run to run. That is, it still does
not communicate that rerunning an *unchanged* program with an
*unchanged* implementation can produce a change in results.

Well, now you know. I'm not sure why you expect any given program to
be idempotent unless you take specific measures to ensure that anyway.
 
Robert Kern

Alan said:
Even this does not tell me that if I use a specified implementation
that my results can vary from run to run. That is, it still does
not communicate that rerunning an *unchanged* program with an
*unchanged* implementation can produce a change in results.

The last clause does tell me that.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Carsten Haese

Even this does not tell me that if I use a specified implementation
that my results can vary from run to run. That is, it still does
not communicate that rerunning an *unchanged* program with an
*unchanged* implementation can produce a change in results.

It doesn't say that rerunning the program won't produce a change in
results. It doesn't say that the order depends *only* on those factors
in a deterministic and reproducible manner.

The documentation shouldn't be expected to list every little thing that
might change the order of keys in a dictionary. The documentation does
say explicitly what *is* guaranteed: Order of keys is preserved as long
as no intervening modifications happen to the dictionary. Tearing down
the interpreter, starting it back up, and rebuilding the dictionary from
scratch is very definitely an intervening modification.

Regards,
 
Alan Isaac

Robert Kern said:
The last clause does tell me that.

1. About your reading of the current language:
I believe you, of course, but can you tell me **how** it tells you that?
To be concrete, let us suppose parallel language were added to
the description of sets. What about that language should allow
me to anticipate Peter's example (in this thread)?

2. About possibly changing the docs:
You are much more sophisticated than ordinary users.
Did this thread not demonstrate that even sophisticated users
do not see into this "implication" immediately? Replicability
of results is a huge deal in some circles. I think the docs
for sets and dicts should include a red flag: do not use
these as iterators if you want replicable results.
(Side note to Carsten: this does not require listing "every little thing".)

Cheers,
Alan Isaac
 
Robert Kern

Alan said:
1. About your reading of the current language:
I believe you, of course, but can you tell me **how** it tells you that?
To be concrete, let us suppose parallel language were added to
the description of sets. What about that language should allow
me to anticipate Peter's example (in this thread)?

Actually, the root cause of Peter's specific example is the fact that the
default implementations of __hash__() and __eq__() rely on identity comparisons.
Two separate invocations of the same script give different objects by identity
and thus the "history of insertions and deletions" is different.

Alan said:
2. About possibly changing the docs:
You are much more sophisticated than ordinary users.
Did this thread not demonstrate that even sophisticated users
do not see into this "implication" immediately?

Well, if you had a small test case that demonstrated the problem, we would have.
Your example was large, complicated, and involved other semi-deterministic red
herrings (the PRNG). It's quite easy to see the problem with Peter's example.

Alan said:
Replicability
of results is a huge deal in some circles. I think the docs
for sets and dicts should include a red flag: do not use
these as iterators if you want replicable results.
(Side note to Carsten: this does not require listing "every little thing".)

They do. They say very explicitly that they are not ordered and that the
sequence of iteration should not be relied upon. The red flags are there.

But I'm not going to stop you from writing up something that's even more explicit.

--
Robert Kern

 
Carsten Haese

Did this thread not demonstrate that even sophisticated users
do not see into this "implication" immediately?

Knowing that maps don't have reproducible ordering is one thing.
Realizing that that's the cause of the problem that's arbitrarily and
wrongly attributed to the 'random' module, in a piece of code that's not
posted to the public, and presumably not trimmed down to the shortest
possible example of the problem, is quite another.

I'll venture the guess that most Python programmers with a modicum of
experience will, when asked point blank if it's safe to rely on a
dictionary to be iterated in a particular order, answer no.

Replicability
of results is a huge deal in some circles.

Every software engineer wants their results to be replicable. Software
engineers also know that they can only expect their results to be
replicable if they use deterministic functions. You wouldn't expect
time.time() to return the same result just because you're running the
same code, would you?

I think the docs
for sets and dicts should include a red flag: do not use
these as iterators if you want replicable results.

It does, at least for dicts: "Keys and values are listed in an arbitrary
order." If this wording is not present for sets, something to this
effect should be added.

Regards,
 
Steven D'Aprano

The last clause does tell me that.

Actually it doesn't. If you run a program twice, with the same inputs,
and no other source of randomness (or at most have pseudo-randomness
starting with the same seed), then the dictionary will have the same
history of insertions and deletions from run to run.

Go back to Peter Otten's diagnosis of the issue:

"... your GridPlayer instances are located in different memory locations
and get different hash values. This in turn affects the order in which
they occur when you iterate over the GridPlayer.players_played set."

There is nothing in there about the dictionary having a different history
of insertions and deletions. It is having the same insertions and
deletions each run, but the items being inserted are located at different
memory locations, and _that_ changes their hash value and hence the order
they occur in when you iterate over the set.
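
The mechanism can be imitated without touching real memory addresses. What follows is only a sketch under stated assumptions: `FakePlayer` and its `fake_address` attribute are hypothetical stand-ins for the GridPlayer instances and their memory locations, and `__hash__` is overridden to return that number so that two "runs" can be compared inside a single process. Whatever order gets printed is an implementation detail, not a guarantee.

```python
class FakePlayer:
    """Hypothetical stand-in for an object whose hash comes from its address."""

    def __init__(self, name, fake_address):
        self.name = name
        self.fake_address = fake_address  # assumption: plays the role of id()

    def __hash__(self):
        # Mimic an address-derived hash; __eq__ stays identity-based.
        return self.fake_address


def iteration_order(addresses):
    """Insert the same names in the same order; only the 'addresses' differ."""
    players = set()
    for name in ("alice", "bob", "carol"):
        players.add(FakePlayer(name, addresses[name]))
    return [p.name for p in players]


run_1 = iteration_order({"alice": 1, "bob": 2, "carol": 3})
run_2 = iteration_order({"alice": 3, "bob": 1, "carol": 2})

print(run_1)
print(run_2)
# Same insertions in both "runs", yet the two lists need not match.
print(sorted(run_1) == sorted(run_2))
```

The history of insertions is identical in both calls; only the hash values differ, and that alone is enough to let the iteration order differ.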

That's quite a subtle thread to follow, and with all respect Robert, it's
easy to say it is obvious in hindsight, but I didn't notice you solving
the problem in the first place. Maybe you would have, if you had tried...
and maybe you would have scratched your head too. Who can tell?

As Carsten Haese says in another post:

"The documentation shouldn't be expected to list every little thing that
might change the order of keys in a dictionary. The documentation does say
explicitly what *is* guaranteed: Order of keys is preserved as long as no
intervening modifications happen to the dictionary. Tearing down the
interpreter, starting it back up, and rebuilding the dictionary from
scratch is very definitely an intervening modification."

That's all very true, but nevertheless it is a significant gotcha. It is
natural to expect two runs of any program to give the same result if there
are (1) no random numbers involved; (2) the same input data; and (3) no
permanent storage from run to run. One doesn't normally expect the output
of a well-written, bug-free program to depend on the memory location of
objects. And that's the gotcha -- with dicts and sets, they can.
 
Alan Isaac

Robert Kern said:
Actually, the root cause of Peter's specific example is the fact that the
default implementations of __hash__() and __eq__() rely on identity comparisons.
Two separate invocations of the same script give different objects by identity
and thus the "history of insertions and deletions" is different.


OK. Thank you.
Alan
 
Alan Isaac

Carsten Haese said:
Knowing that maps don't have reproducible ordering is one thing.
Realizing that that's the cause of the problem that's arbitrarily and
wrongly attributed to the 'random' module, in a piece of code that's not
posted to the public, and presumably not trimmed down to the shortest
possible example of the problem, is quite another.

There is no reason to be unfriendly about this.
I posted an observation about my code behavior
and my best understanding of it. I asked for an
explanation and did not assert a bug, although when
someone doubted that the presence or absence of the
.pyc file mattered for the results I said that *if* it should
not matter *then* there was a bug. I offered the code
to all that asked for it. I did not post it **because**
I had not adequately isolated the problem. (But indeed,
I was not isolating the problem due to misconceptions.)

Carsten Haese said:
I'll venture the guess that most Python programmers with a modicum of
experience will, when asked point blank if it's safe to rely on a
dictionary to be iterated in a particular order, answer no.

Again, that misses the point. This is clearly documented.
I would have said the same thing: no, that's not safe. But
the question is whether the same people will be surprised when
*unchanged* code rerun with an *unchanged* implementation
produces *changed* results. I do not see how a reader of
this thread cannot conclude that yes, even some sophisticated
users (who received my code) will be surprised. The docs
should not be useful only to the most sophisticated users.

Carsten Haese said:
It does, at least for dicts: "Keys and values are listed in an arbitrary
order." If this wording is not present for sets, something to this
effect should be added.

Even Robert did not claim that *that* phrase was adequate.
I note that you cut off "which is non-random"!

Alan Isaac
 
Steven D'Aprano

Actually, the root cause of Peter's specific example is the fact that the
default implementations of __hash__() and __eq__() rely on identity comparisons.
Two separate invocations of the same script give different objects by identity
and thus the "history of insertions and deletions" is different.

The history is the same. The objects inserted are the same (by equality).
The memory address those objects are located at is different.

Would you expect that "hello world".find("w") should depend on the address
of the string "w"? No, of course not. Programming in a high level language
like Python, we hope to never need to think about memory addresses. And
that's the gotcha.
 
Robert Kern

Steven said:
The history is the same. The objects inserted are the same (by equality).

No, they *were* different by equality (identity being the default implementation
of equality, which was not overridden in either Peter's code or Alan's).
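
A small illustration of that default, offered as a sketch only: the `Player` class below is hypothetical and deliberately overrides neither `__eq__` nor `__hash__`, so two instances with identical state still compare unequal.

```python
class Player:
    """Hypothetical class that overrides neither __eq__ nor __hash__."""

    def __init__(self, name):
        self.name = name


a = Player("alice")
b = Player("alice")

same_state = a.name == b.name   # True: the attributes match
equal = a == b                  # False: default __eq__ compares identity
distinct_in_set = len({a, b})   # 2: the set treats them as different elements

print(same_state, equal, distinct_in_set)
```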

--
Robert Kern

 
Carsten Haese

There is no reason to be unfriendly about this.

I did not mean this to be unfriendly. I'm sorry if you got that impression. I
was simply pointing out all the ways in which you made it difficult for the
community to explain your problem.

Again, that misses the point. This is clearly documented.
I would have said the same thing: no, that's not safe. But
the question is whether the same people will be surprised when
*unchanged* code rerun with an *unchanged* implementation
produces *changed* results.

That only means that a program can behave non-deterministically if you're not
carefully restricting it to functions that are guaranteed to be deterministic.
No experienced software engineer, whether they are experienced in Python or
some other programming language, should be surprised by this notion.

I don't think that the cause of non-determinism in your case was exceptionally
subtle, you just made it harder to find.

The docs
should not be useful only to the most sophisticated users.

Please feel free to suggest specific wording changes to make the documentation
more useful.

Even Robert did not claim that *that* phrase was adequate.
I note that you cut off "which is non-random"!

In my opinion, that phrase is adequate. I did cut off the non-random part
because it's irrelevant. Non-random doesn't mean deterministic.

Regards,

Carsten.
 
Carsten Haese

It is natural to expect two runs of any program to give the same
result if there are (1) no random numbers involved; (2) the same
input data; and (3) no permanent storage from run to run.

Which of those three categories does time.time() fall into? What about
id("hello")?

-Carsten
 
Steven D'Aprano

Which of those three categories does time.time() fall into? What about
id("hello")?

I didn't say there were no exceptions to the heuristic "expect any
computer program to do the same thing on subsequent runs". I said it was a
natural expectation.

Obviously one of the differences between a naive programmer and a
sophisticated programmer is that the sophisticated programmer has learnt
more exceptions to the rule.

And that's why I have described this behaviour as a gotcha, not as a bug
or a mis-feature or anything else.
 
Steven D'Aprano

No, they *were* different by equality (identity being the default implementation
of equality, which was not overridden in either Peter's code or Alan's).

Ah yes, you are right in the sense that Python's notion of equality for
class instances is to fall back on identity by default.

But in the vernacular human sense, an instance X with the same state as an
instance Y is "equal", despite being at another memory address. I was
using equality in the sense that two copies of the same edition of a book
are the same, despite being in different places.

For the record, and for the avoidance of all confusion, I'm not suggesting
that Python's default behaviour is "wrong" or even "bad", merely pointing
out to all those wise in hindsight that the behaviour was extremely
puzzling for the reasons I've given. But you can be sure that I'll never
forget this lesson :)
 
Alan Isaac

Carsten Haese said:
I was simply pointing out all the ways in which you made it difficult for the
community to explain your problem.

And without that community, I would still not have a clue.
Thanks to all!

Carsten Haese said:
Please feel free to suggest specific wording changes to make the documentation
more useful.

I'm sure my first pass will be flawed, but here goes:

http://docs.python.org/lib/typesmapping.html:
to footnote (3), add phrase "which may depend on the memory location of the
keys" to get:

Keys and values are listed in an arbitrary order,
which may depend on the memory location of the keys.
This order is non-random, varies across Python implementations,
and depends on the dictionary's history of insertions and deletions.

http://docs.python.org/lib/types-set.html: append a new sentence to 2nd
paragraph

Iteration over a set returns elements in an arbitrary order,
which may depend on the memory location of the elements.

fwiw,
Alan Isaac
 
Raymond Hettinger

Is there
a warning anywhere in the docs? Should
there be?

I do not think additional documentation here would be helpful. One
could note that the default hash value is the object id. Somewhere
else you could write that the placement of objects in memory is
arbitrary and can be affected by a number of factors not explicitly
under user control.

With those notes scattered throughout the documentation, I'm not sure
that you would have found them and recognized the implications with
respect to your design and with respect to the deletion of pyc files
(which is just one factor among many that could cause different
placements in memory).

Also, the existing docs describe behavior at a more granular level.
How the parts interact is typically left to third-party documentation
(i.e. the set docs say what the set methods do but do not give advice
on when to use them instead of a dict or list).

Out of this thread, the more important lesson is that the docs
intentionally do not comment on implementation-specific details. When the
docs do not make specific guarantees and behavior is left undefined,
it is not a good practice to make assumptions about invariants that
may or may not be true (in your case, you assumed that objects would
be located in the same memory locations between runs -- while that
sometimes happens to be true, it is certainly not guaranteed behavior
as you found out -- moreover, you've made non-guaranteed assumptions
about the arbitrary ordering of an unordered collection -- a definite
no-no).
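
One way to avoid leaning on the arbitrary order, sketched here under assumptions: the `GridPlayer` stand-in below is minimal, and its `name` attribute and the `in_stable_order` helper are hypothetical, not taken from the original code. The idea is simply to sort on some stable, value-based key before iterating.

```python
class GridPlayer:
    """Minimal stand-in; the `name` attribute is an assumption."""

    def __init__(self, name):
        self.name = name


# An unordered collection, as in the thread's players_played set.
players_played = {GridPlayer(n) for n in ("carol", "alice", "bob")}


def in_stable_order(players):
    """Return the players sorted by a key that does not depend on memory layout."""
    return sorted(players, key=lambda p: p.name)


names = [p.name for p in in_stable_order(players_played)]
print(names)  # ['alice', 'bob', 'carol']
```

Any key works as long as it is derived from the objects' state rather than from their identity, and as long as it is unique enough to break ties.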


Raymond Hettinger
 
Robert Kern

Alan said:
And without that community, I would still not have a clue.
Thanks to all!


I'm sure my first pass will be flawed, but here goes:

http://docs.python.org/lib/typesmapping.html:
to footnote (3), add phrase "which may depend on the memory location of the
keys" to get:

Keys and values are listed in an arbitrary order,
which may depend on the memory location of the keys.
This order is non-random, varies across Python implementations,
and depends on the dictionary's history of insertions and deletions.

http://docs.python.org/lib/types-set.html: append a new sentence to 2nd
paragraph

Iteration over a set returns elements in an arbitrary order,
which may depend on the memory location of the elements.

It's misleading. It only depends on the memory location of the elements if
__hash__() is implemented as id() (the default).
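
For comparison, here is a sketch of a class whose hash does not depend on memory location. `KeyedPlayer` and its `player_id` field are hypothetical. An integer key is used on purpose, since string hashes can themselves vary between processes when hash randomization is enabled.

```python
class KeyedPlayer:
    """Hypothetical class with value-based equality and hashing."""

    def __init__(self, player_id, name):
        self.player_id = player_id  # assumption: a stable integer key
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, KeyedPlayer):
            return NotImplemented
        return self.player_id == other.player_id

    def __hash__(self):
        # Derived from state, not from id(), so it is the same in every run.
        return hash(self.player_id)


a = KeyedPlayer(7, "alice")
b = KeyedPlayer(7, "alice")

equal = a == b                   # True: same player_id
same_hash = hash(a) == hash(b)   # True: equal objects hash alike
unique = len({a, b})             # 1: the set sees a single element

print(equal, same_hash, unique)
```

Even with a value-based hash, the documentation still promises nothing about iteration order, so this removes one source of variation rather than making the order something to rely on.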

How about this?

"""Never rely on the order of dictionaries and sets."""

--
Robert Kern

 
Carsten Haese

"""Never rely on the order of dictionaries and sets."""

Easy, Robert, there's a baby in that bathwater.

I think it's useful to note that the arbitrary ordering returned by
dict.keys() et al. is locally stable in the absence of intervening
modifications, as long as the guarantee is worded in a way that prevents
overly optimistic reliance on that ordering.
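
A quick check of that local stability, as a sketch with made-up data: iterating twice with no modification in between gives the same sequence, and keys() and values() line up with each other.

```python
d = {"b": 2, "a": 1, "c": 3}

# Two passes with no intervening modification see the same order.
first_pass = list(d.keys())
second_pass = list(d.keys())
stable = first_pass == second_pass

# keys() and values() correspond position by position.
paired = list(zip(d.keys(), d.values())) == list(d.items())

# The same holds for repeated iteration over an unmodified set.
s = {"x", "y", "z"}
set_stable = list(s) == list(s)

print(stable, paired, set_stable)
```

This says nothing about what the order *is*, or whether it will be the same in another run. Later Python versions (3.7 and up) additionally guarantee insertion order for dicts, while sets still make no such promise.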

-Carsten
 
