change of random state when pyc created??

Alan Isaac · May 5, 2007

This may seem very strange, but it is true.
If I delete a .pyc file, my program executes with a different state!

In a single directory I have
module1 and module2.

module1 imports random and MyClass from module2.
module2 does not import random.

module1 sets a seed like this::

    if __name__ == "__main__":
        random.seed(314)
        main()

I execute module1.py from the (Windows) shell.
I get a result, let's call it result1.
I execute it again. I get another result, say result2.
Running it again and again, I get result2.

Now I delete module2.pyc.
I execute module1.py from the shell.
I get result1.
I execute it again; I get result2.
From then on I get result2,
unless I delete module2.pyc again,
in which case I once again get result1.

Can someone explain this to me?

Thank you,
Alan Isaac

Dustan · May 5, 2007

I can't imagine why that would be, and I was unable to reproduce that
behavior, using Microsoft Windows XP and Python 2.5:

<module1.py>
import module2
import random

def main():
    for i in range(10): print module2.aRandom()

if __name__ == '__main__':
    random.seed(314)
    main()
</module1.py>

<module2.py>
import random
print "module2 imported"

def aRandom():
    return random.randrange(1000000)
</module2.py>

C:\Documents and Settings\DUSTAN\Desktop\apackage>module1.py
module2 imported
196431
111465
2638
628136
234231
207699
546775
449804
633844
179171

C:\Documents and Settings\DUSTAN\Desktop\apackage>module1.py
module2 imported
196431
111465
2638
628136
234231
207699
546775
449804
633844
179171

C:\Documents and Settings\DUSTAN\Desktop\apackage>module1.py
module2 imported
196431
111465
2638
628136
234231
207699
546775
449804
633844
179171

C:\Documents and Settings\DUSTAN\Desktop\apackage>module1.py
module2 imported
196431
111465
2638
628136
234231
207699
546775
449804
633844
179171

I deleted module2.pyc right before that last call.

Alan Isaac · May 6, 2007

I have documented this behavior
on two completely different systems
(Win 2000 and Win XP SP2), using Python 2.5.1.

It involves two modules,
as described before.
If it should not happen, there is a bug.
I am looking for potential explanation,
since I realize that finding bugs is unlikely.

Alan Isaac

John Machin · May 6, 2007

I have documented this behavior
on two completely different systems
(Win 2000 and Win XP SP2), using Python 2.5.1.

You can't say that you have "documented" the behaviour when you
haven't published files that exhibit the alleged behaviour.

Alan Isaac · May 6, 2007

John Machin said:
You can't say that you have "documented" the behaviour when you
haven't published files that exhibit the alleged behaviour.

Fine. I have "observed" this behavior.
The files are not appropriate for posting.
I do not yet have a "minimum" case.
But surely I am not the first to notice this!
Alan Isaac
PS I'll send you the files off list.

John Machin · May 6, 2007

This may seem very strange, but it is true.
If I delete a .pyc file, my program executes with a different state!

In a single directory I have
module1 and module2.

module1 imports random and MyClass from module2.

That's rather ambiguous. Do you mean
(a) module1 imports random and (MyClass from module2)
or
(b) module1 imports (random and MyClass) from module2

module2 does not import random.

This statement would *appear* to rule out option (b) but appearances
can be deceptive

It's a bit of a worry that you call the first file "module1" and not
"the_script". Does module2 import module1, directly or indirectly?

module1 sets a seed like this::

    if __name__ == "__main__":
        random.seed(314)
        main()

I execute module1.py from the (Windows) shell.
I get a result, let's call it result1.
I execute it again. I get another result, say result2.
Running it again and again, I get result2.

Stop right there. Never mind what happens when you delete module2.pyc.
Should you not expect to get the same result each time? Is that not
the point of setting a constant seed each time you run the script?
====>>> Problem 1.

Now I delete module2.pyc.
I execute module1.py from the shell.
I get result1.
I execute it again; I get result2.
From then on I get result2,
unless I delete module2.pyc again,
in which case I once again get result1.

Can someone explain this to me?

Thank you,
Alan Isaac

Compiling module2 is causing code to be executed that probably
shouldn't be executed. ===>>> Problem 2.

With all due respect to your powers of description, no, it can't be
explained properly without seeing the contents of the source files. I
strongly suggest that if you continue to experience Problem 1 and/or
Problem 2, you cut your two files down to the bare minima and post
them here.

Meanwhile, my deja-vu detector is kicking in ...

uh-huh (1), from 25 April:
===
%%%%% test2.py %%%%%%%%%%%%%
from random import seed
seed(314)
class Trivial:
    pass
===
Is module2 (still) doing that?
Is module1 importing itself (directly or indirectly)?

uh-huh (2), the long thread about relative imports allegedly being
broken ...

It appears to me that you need to divorce the two concepts "module"
and "script" in your mind.

Modules when executed should produce only exportables: classes,
functions, NAMED_CONSTANTS, etc. It is OK to do things like process
the easier-to-create
_ds = """\
foo 1
bar 42
zot 666"""
into the easier-to-use
USEFUL_DICT = {'foo': 1, 'bar': 42, 'zot': 666}
but not to change global state.

Scripts which use functions etc from a module or package should be
independent of the module/package such that they don't need anything
more complicated than simple importing of the module/package. The
notion of inspecting the script's path to derive the module/package
path and then stuffing that into sys.path is mind boggling. Are
module1/script1 and module2 parts of a package?

Here's a suggestion for how you should structure scripts:

def main():
    # All productive code is inside a function to take advantage
    # of access to locals being faster than access to globals
    import mymodule
    mymodule.do_something()

if __name__ == "__main__":
    main()
else:
    raise Exception("Attempt to import script containing nothing importable")

and your modules should *start* with:

if __name__ == "__main__":
    raise Exception("Attempt to execute hopefully-pure module as a script")

HTH,
John

Alan Isaac · May 6, 2007

John Machin said:
(a) module1 imports random and (MyClass from module2)
Right.

It's a bit of a worry that you call the first file "module1" and not
"the_script". Does module2 import module1, directly or indirectly?

No.
I call a module any file meant to be imported by others.
Many of my modules include a "main" function,
which allows the module to be executed as a script.
I do not think this is unusual, even as terminology.

Should you not expect to get the same result each time? Is that not
the point of setting a constant seed each time you run the script?

Yes. That is the problem.
If I delete module2.pyc,
I do not get the same result.

With all due respect to your powers of description no, it can't be
explained properly, without seeing the contents of the source files.

I sent them to you.
What behavior did you see?

from random import seed
seed(314)
class Trivial:
    pass
===
Is module2 ... doing that?
Is module1 importing itself (directly or indirectly)?

No.

Separate issue
==============

Here's a suggestion for how you should structure scripts:

def main():
    # All productive code is inside a function to take advantage
    # of access to locals being faster than access to globals
    import mymodule
    mymodule.do_something()

if __name__ == "__main__":
    main()
else:
    raise Exception("Attempt to import script containing nothing importable")

and your modules should *start* with:

if __name__ == "__main__":
    raise Exception("Attempt to execute hopefully-pure module as a script")

I'm not going to call this a bad practice, since it has clear virtues.
I will say that it does not seem to be a common practice, although that
may be my lack of exposure to others' code. And it still does not
address the common need of playing with a "package in progress"
or a "package under consideration" without installing it.

Cheers,
Alan Isaac

Dustan · May 6, 2007

Fine. I have "observed" this behavior.
The files are not appropriate for posting.
I do not yet have a "minimum" case.
But surely I am not the first to notice this!
Alan Isaac
PS I'll send you the files off list.

I got the files and tested them, and indeed got different results
depending on whether or not there was a pyc file. I haven't looked at
the source files in great detail yet, but I will. I would certainly
agree that there's a bug going on here; we just need to narrow down
the problem (ie come up with a "minimum" case).

Steven D'Aprano · May 6, 2007

Yes. That is the problem.
If I delete module2.pyc,
I do not get the same result.

I think you have missed what John Machin is pointing out. According to
your original description, you get different results even if you DON'T
delete module2.pyc.

According to your original post, you get the _same_ behaviour the first
time you run the script, regardless of the pyc file being deleted or not.
You wrote:

module1 sets a seed like this::

    if __name__ == "__main__":
        random.seed(314)
        main()

I execute module1.py from the (Windows) shell.
I get a result, let's call it result1.
I execute it again. I get another result, say result2.
Running it again and again, I get result2.
[end quote]

So, with module2.pyc file existing, you get result1 the first time you
execute module1.py, and then you get result2 every time from then onwards.

How is that different from what you wrote next?

Now I delete module2.pyc.
I execute module1.py from the shell.
I get result1.
I execute it again; I get result2.
From then on I get result2,
unless I delete module2.pyc again,
in which case I once again get result1.
[end quote]

You get the same behaviour with or without module2.pyc: the first run of
the script gives different results from subsequent runs. You can reset
that first run by deleting module2.pyc.

I'm still perplexed how this is possible, but now I'm more perplexed.

If you want to send me the modules, I will have a look at them as well.
Many eyes make for shallow bugs...

Dustan · May 6, 2007

Yes. That is the problem.
If I delete module2.pyc,
I do not get the same result.

I think you have missed what John Machin is pointing out. According to
your original description, you get different results even if you DON'T
delete module2.pyc.

According to your original post, you get the _same_ behaviour the first
time you run the script, regardless of the pyc file being deleted or not.

You wrote:

module1 sets a seed like this::

    if __name__ == "__main__":
        random.seed(314)
        main()

I execute module1.py from the (Windows) shell.
I get a result, let's call it result1.
I execute it again. I get another result, say result2.
Running it again and again, I get result2.
[end quote]

So, with module2.pyc file existing, you get result1 the first time you
execute module1.py, and then you get result2 every time from then onwards.

Umm... no.

module2.pyc is created by the first run.

John Machin · May 6, 2007

I think you have missed what John Machin is pointing out. According to
your original description, you get different results even if you DON'T
delete module2.pyc.

According to your original post, you get the _same_ behaviour the first
time you run the script, regardless of the pyc file being deleted or not.

You wrote:

module1 sets a seed like this::

    if __name__ == "__main__":
        random.seed(314)
        main()

I execute module1.py from the (Windows) shell.
I get a result, let's call it result1.
I execute it again. I get another result, say result2.
Running it again and again, I get result2.
[end quote]

So, with module2.pyc file existing, you get result1 the first time you
execute module1.py, and then you get result2 every time from then onwards.

Umm... no.

module2.pyc is created by the first run.

Yes, I've realised that too.

Some notes:
(1) Alan has sent Dustan and me a second, smaller version of the two
files. I'll forward them on to Steven -- they're now called test.py and
test1.py, but I'll continue to call them module[12].py
(2) The problem is definitely reproducible -- whether or not
module2.pyc has been deleted or retained from the previous run is
affecting the results. [Windows XP SP2; Python 2.5.1]
(3) module2.py appears to me not to be guilty of causing any changes
in state; it contains only rather innocuous functions and classes.
(4) I have put
print random.getstate()
before and after the call to main() in the executed script
(module1.py). Diffing the stdout of a no-pyc run and a with-pyc run
shows differences in Alan's output but NO DIFFERENCES in either the
"before" or the "after" random.getstate() output. Looks like the
problem is nothing to do with the random module.
(5) I have backported the files to Python 2.4 by replacing the use of
defaultdict in module2.py with explicit "if key in the_dict" code --
now the problem is reproducible with Python 2.4.3 as well as with
2.5.1.
(6) I've changed the "from module2 import foo, bar, zot; foo()" to use
the "import module2; module2.foo()" style -- no effect; still has the
problem.
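
The check in (4) can be approximated inside a single process. This is only a sketch under stated assumptions: it uses a private `random.Random` instance and a hypothetical helper `run_with_seed`, whereas the check in the thread printed the global `random.getstate()` in two separate runs and diffed the output.

```python
import random


def run_with_seed(seed, draws=5):
    """Seed a private generator; return (state_before, values, state_after)."""
    rng = random.Random(seed)  # private instance, global generator untouched
    before = rng.getstate()
    values = [rng.randrange(1000000) for _ in range(draws)]
    after = rng.getstate()
    return before, values, after


first = run_with_seed(314)
second = run_with_seed(314)

# Each comparison is True when the two seeded runs agree.
same_before = first[0] == second[0]
same_values = first[1] == second[1]
same_after = first[2] == second[2]
print(same_before, same_values, same_after)
```

If all three comparisons agree while the program's final output still differs, the difference has to come from somewhere other than the generator.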

Cheers,
John

Alan Isaac · May 8, 2007

Steven D'Aprano said:
If you want to send me the modules, I will have a look at them as well.
Many eyes make for shallow bugs...

Dustan and John Machin have confirmed the
apparent bug, and I have sent you the files.
Explanation welcome!!

Cheers,
Alan Isaac

Steven D'Aprano · May 8, 2007

Dustan and John Machin have confirmed the apparent bug, and I have sent
you the files. Explanation welcome!!

My testing suggests the bug is *not* to do with pyc files at all. I'm
getting different results when running the files, even when the directory
is read-only (and therefore no pyc files can be created).

My results suggest that setting the seed to the same value does NOT give
identical results, *even though* the random number generator is giving
the same results.

So I think we can discount the issue being anything to do with either
the .pyc files or the random number generator.

Alan Isaac · May 8, 2007

Steven D'Aprano said:
My testing suggests the bug is *not* to do with pyc files at all. I'm
getting different results when running the files, even when the directory
is read-only (and therefore no pyc files can be created).

My results suggest that setting the seed to the same value does NOT give
identical results, *even though* the random number generator is giving
the same results.

So I think we can discount the issue being anything to do with either
the .pyc files or the random number generator.

I do not know how Python handles your use of a readonly directory.
What I have seen is:

- when a test1.pyc file is present, I always get the
same outcome (result1)
- when a test1.pyc file is NOT present, I always get
the same outcome (result2)
- the two outcomes are different (result1 != result2)

Do you see something different than this if you run the
test as I suggested? If not, how can it not involve the
.pyc file (in some sense)?

Cheers,
Alan Isaac

Gabriel Genellina · May 9, 2007

What I have seen is:

- when a test1.pyc file is present, I always get the
same outcome (result1)
- when a test1.pyc file is NOT present, I always get
the same outcome (result2)
- the two outcomes are different (result1 != result2)

I've logged all Random calls (it appears to be only one shuffle call,
after the initial seed) and in both cases they get the same numbers. So
the program always starts with the same "shuffled" values.

Perhaps there is a tiny discrepancy in the marshal representation of some
floating point values. When there is no .pyc, Python parses the literal
from source; when a .pyc is found, Python loads the value from there; they
could be slightly different.
I'll investigate further... tomorrow.
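
A rough way to probe this hypothesis is to round-trip some float literals through `marshal` and compare them with the values parsed from source. The sample values below are arbitrary assumptions; the literals in the real program are not shown in this thread, and the outcome may depend on the Python version and marshal format in use.

```python
import marshal

# Arbitrary sample literals (assumed for illustration only).
samples = [0.1, 1.0 / 3.0, 2.5e-7, 314.0]

# Collect every value that does not survive a marshal round trip unchanged.
mismatches = []
for value in samples:
    restored = marshal.loads(marshal.dumps(value))
    if restored != value:
        mismatches.append((value, restored))

print(mismatches)
```

An empty list would suggest that, for these values at least, loading from a .pyc gives the same floats as parsing the source.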

Peter Otten · May 9, 2007

Alan said:
This may seem very strange, but it is true.
If I delete a .pyc file, my program executes with a different state!

Can someone explain this to me?

There is nothing wrong with the random module -- you get the same numbers on
every run. When there is no pyc-file Python uses some RAM to create it and
therefore your GridPlayer instances are located in different memory
locations and get different hash values. This in turn affects the order in
which they occur when you iterate over the GridPlayer.players_played set.

Here is a minimal example:

import test # sic

class T:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return "T(name=%r)" % self.name

if __name__ == "__main__":
    print set(T(i) for i in range(4))

$ python2.5 test.py
set([T(name=2), T(name=1), T(name=0), T(name=3)])
$ python2.5 test.py
set([T(name=3), T(name=1), T(name=0), T(name=2)])
$ python2.5 test.py
set([T(name=3), T(name=1), T(name=0), T(name=2)])
$ rm test.pyc
$ python2.5 test.py
set([T(name=2), T(name=1), T(name=0), T(name=3)])
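
One way to see why the order can move between runs: a class that defines neither `__eq__` nor `__hash__` is hashed by identity, which in CPython is derived from the object's memory address. The sketch below contrasts that with a hash based on the name. `Player` and `StablePlayer` are hypothetical stand-ins, not the classes from the files under discussion.

```python
class Player(object):
    """Hypothetical stand-in; uses the default identity-based hash."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "Player(name=%r)" % (self.name,)


class StablePlayer(Player):
    """Same data, but equality and hashing depend only on the name."""

    def __eq__(self, other):
        return isinstance(other, StablePlayer) and self.name == other.name

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self.name)


# Identity-based hashes: may differ from run to run with memory layout.
default_hashes = [hash(Player(i)) for i in range(4)]
# Name-based hashes: depend only on the integer names.
stable_hashes = [hash(StablePlayer(i)) for i in range(4)]
```

Defining equality by name also means two players with the same name collapse into one set element, so this only fits when names are unique. Set iteration order stays undocumented either way, and on Python 3.3 and later string hashes are randomized per process unless PYTHONHASHSEED is set, so an explicit sort is the safer route to repeatable output.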

Peter

Alan Isaac · May 9, 2007

Peter Otten said:
There is nothing wrong with the random module -- you get the same numbers on
every run. When there is no pyc-file Python uses some RAM to create it and
therefore your GridPlayer instances are located in different memory
locations and get different hash values. This in turn affects the order in
which they occur when you iterate over the GridPlayer.players_played set.

Thanks!!
This also explains Steven's results.

If I sort the set before iterating over it,
the "anomaly" disappears.
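
A minimal sketch of that workaround, assuming each object carries a sortable `name` attribute. `Player` and `in_stable_order` are hypothetical names, not taken from the actual files:

```python
class Player(object):
    """Hypothetical stand-in for the objects stored in the set."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "Player(name=%r)" % (self.name,)


def in_stable_order(items):
    """Return the items as a list sorted by name, independent of hash order."""
    return sorted(items, key=lambda player: player.name)


players = set(Player(i) for i in (3, 0, 2, 1))

# Iterating over the sorted list visits players in the same order on every run.
ordered_names = [player.name for player in in_stable_order(players)]
```

Sorting by an explicit key avoids relying on comparison between the objects themselves, which would otherwise fall back to something just as arbitrary as the hash order.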

This means that currently the use of sets
(and, I assume, dictionaries) as iterators
compromises replicability. Is that a fair
statement?

For me (and apparently for a few others)
this was a very subtle problem. Is there
a warning anywhere in the docs? Should
there be?

Thanks again!!

Alan Isaac

Diez B. Roggisch · May 9, 2007

Alan said:
Thanks!!
This also explains Steven's results.

If I sort the set before iterating over it,
the "anomaly" disappears.

This means that currently the use of sets
(and, I assume, dictionaries) as iterators
compromises replicability. Is that a fair
statement?

Yes.

For me (and apparently for a few others)
this was a very subtle problem. Is there
a warning anywhere in the docs? Should
there be?

Not really, but that depends on what you know about the concept of sets and
maps as collections of course.

The contract for sets and dicts doesn't imply any order whatsoever. Which is
essentially the reason why

set(xrange(10))[0]

doesn't exist, and why cries for an ordered dictionary as part of the
standard library have been made quite a few times.

Diez

Alan G Isaac · May 9, 2007

Diez said:
Not really, but that depends on what you know about the concept of sets and
maps as collections of course.

The contract for sets and dicts doesn't imply any order whatsoever. Which is
essentially the reason why

set(xrange(10))[0]

doesn't exist, and why cries for an ordered dictionary as part of the
standard library have been made quite a few times.

It seems to me that you are missing the point,
but maybe I am missing your point.

The question of whether a set or dict guarantees
some order seems quite different from the question
of whether rerunning an **unchanged program** yields the
**unchanged results**. The latter question is the question
of replicability.

Again I point out that some sophisticated users
(among which I am not numbering myself) did not
see into the source of this "anomaly". This
suggests that an explicit warning is warranted.

Cheers,
Alan Isaac

PS I know ordered dicts are under discussion;
what about ordered sets?

Robert Kern · May 9, 2007

Alan said:
Diez said:

Not really, but that depends on what you know about the concept of sets and
maps as collections of course.

The contract for sets and dicts doesn't imply any order whatsoever. Which is
essentially the reason why

set(xrange(10))[0]

doesn't exist, and why cries for an ordered dictionary as part of the
standard library have been made quite a few times.

It seems to me that you are missing the point,
but maybe I am missing your point.

The question of whether a set or dict guarantees
some order seems quite different from the question
of whether rerunning an **unchanged program** yields the
**unchanged results**. The latter question is the question
of replicability.

Again I point out that some sophisticated users
(among which I am not numbering myself) did not
see into the source of this "anomaly". This
suggests that an explicit warning is warranted.

http://docs.python.org/lib/typesmapping.html
"""
Keys and values are listed in an arbitrary order which is non-random, varies
across Python implementations, and depends on the dictionary's history of
insertions and deletions.
"""

The sets documentation is a bit less explicit, though.

http://docs.python.org/lib/types-set.html
"""
Like other collections, sets support x in set, len(set), and for x in set. Being
an unordered collection, sets do not record element position or order of insertion.
"""

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
