difference between random module in python 2.6 and 3.2?

Discussion in 'Python' started by Matej Cepl, Feb 6, 2012.

  1. Matej Cepl

    Matej Cepl Guest

    Hi,

    I have this working function:

    def as_xml(self):
    out = etree.Element("or")
    for k in sorted(self.keys()):
    out.append(etree.Element("hostname",
    attrib={'op': '=', 'value': random.choice(self[k])}))

    # ... return somehow string representing XML

    and this unit test

    def test_XML_print(self):
    random.seed(1)
    expected = ... # expected XML
    observed = self.data.as_xml()
    self.assertEqual(observed, expected,
    "Verbose print (including PCI IDs)")

    Strange thing is that this unit tests correctly with python3, but fails
    with python2. The problem is that apparently python3 random.choice picks
    different element of self[k] than the one python2 (at least, both of
    them are constant in their choice).

    Is it known that there is this difference? Is there a way how to make
    both random.choice select the same?

    Best,

    Matěj
    Matej Cepl, Feb 6, 2012
    #1
    1. Advertising

  2. On Mon, 06 Feb 2012 02:27:38 +0100, Matej Cepl wrote:

    > Strange thing is that this unit tests correctly with python3, but fails
    > with python2. The problem is that apparently python3 random.choice picks
    > different element of self[k] than the one python2 (at least, both of
    > them are constant in their choice).


    Confirmed:

    steve@runes:~$ python2.6 -c "from random import choice, seed; seed(1);
    print choice(range(1000))"
    134
    steve@runes:~$ python3.2 -c "from random import choice, seed; seed(1);
    print(choice(list(range(1000))))"
    137

    steve@runes:~$ python2.6 -c "from random import choice, seed; seed(42);
    print choice(range(1000))"
    639
    steve@runes:~$ python3.2 -c "from random import choice, seed; seed(42);
    print(choice(list(range(1000))))"
    654


    > Is it known that there is this difference? Is there a way how to make
    > both random.choice select the same?


    Reading the docs, I would expect that when using an int as seed, you
    should get identical results. There is no mention that the PRNG has
    changed between 2.6 and 3.2; both should use the given int as seed. There
    is a change of behaviour when using strings/bytes/bytearrays, and
    Python3.2 provides a "version=N" argument to seed to set the old
    behaviour. But this doesn't apply to integer seeds.

    I call this a bug.

    It appears to be a bug in 3.2, because 3.1 gives the same results as 2.6:

    steve@runes:~$ python3.1 -c "from random import choice, seed; seed(42);
    print(choice(list(range(1000))))"
    639
    steve@runes:~$ python3.1 -c "from random import choice, seed; seed(1);
    print(choice(list(range(1000))))"
    134


    --
    Steven
    Steven D'Aprano, Feb 6, 2012
    #2
    1. Advertising

  3. Matej Cepl

    Terry Reedy Guest

    On 2/5/2012 11:01 PM, Steven D'Aprano wrote:

    > Reading the docs, I would expect that when using an int as seed, you
    > should get identical results.


    That is similar to expecting hash to be consistent from version to version.

    > There is no mention that the PRNG has changed between 2.6 and 3.2;


    There is at best an informal policy. This was discussed in
    http://bugs.python.org/issue9025
    Antoine argued that if there were a written policy, it should be limited
    to bug-fix releases within a version. I agree.

    > It appears to be a bug in 3.2, because 3.1 gives the same results as 2.6:


    This change is a side effect of fixing the bug of non-uniformity
    discussed in that issue. In any case, in 2.7 and probably 3.1:

    def choice(self, seq):
    """Choose a random element from a non-empty sequence."""
    return seq[int(self.random() * len(seq))] # raises IndexError

    whereas in 3.2:

    def choice(self, seq):
    """Choose a random element from a non-empty sequence."""
    try:
    i = self._randbelow(len(seq))
    except ValueError:
    raise IndexError('Cannot choose from an empty sequence')
    return seq

    The change was announced in What's New in 3.2

    random
    The integer methods in the random module now do a better job of
    producing uniform distributions. Previously, they computed selections
    with int(n*random()) which had a slight bias whenever n was not a power
    of two. Now, multiple selections are made from a range up to the next
    power of two and a selection is kept only when it falls within the range
    0 <= x < n. The functions and methods affected are randrange(),
    randint(), choice(), shuffle() and sample().

    --
    Terry Jan Reedy
    Terry Reedy, Feb 6, 2012
    #3
  4. On Mon, 06 Feb 2012 00:07:04 -0500, Terry Reedy wrote:

    > On 2/5/2012 11:01 PM, Steven D'Aprano wrote:
    >
    >> Reading the docs, I would expect that when using an int as seed, you
    >> should get identical results.

    >
    > That is similar to expecting hash to be consistent from version to
    > version.


    No. hash is not intended to be consistent across versions, or even across
    runs of the interpreter. Of course it may be, but that's not an implicit
    or explicit promise. Seeding a pseudo-random number generator, on the
    other hand, is explicitly for generating the same repeatable, consistent
    set of results. That's what seed is *for*.

    It is even documented that way:

    http://docs.python.org/py3k/library/random.html#notes-on-reproducibility

    although the docs weasel out of promising anything other than
    random.random() will be predictable.

    When the Mersenne Twister was introduced, the old Wichman-Hill PRNG was
    provided for those who needed repeatability. (I see it's gone now, but if
    people haven't migrated their code from 2.3 yet, shame on them.)


    >> There is no mention that the PRNG has changed between 2.6 and 3.2;

    >
    > There is at best an informal policy. This was discussed in
    > http://bugs.python.org/issue9025
    > Antoine argued that if there were a written policy, it should be limited
    > to bug-fix releases within a version. I agree.


    I think this thread demonstrates that there are people who depend on
    repeatability of the random number routines, and not just for
    random.random().

    I think it is ironic (and annoying) that the same release of Python that
    introduced a version argument to seed() to provide a backward compatible
    seeding algorithm, also introduced a backward incompatible change to
    choice().

    This, plus Raymond Hettinger's comments on the bug report, make me think
    that the change in behaviour of randrange and choice is not deliberate
    and should be treated as a bug. Raymond made a strong case arguing for
    repeatability, and then approved a bug fix that broke repeatability. I
    doubt that was deliberate.


    --
    Steven
    Steven D'Aprano, Feb 6, 2012
    #4
  5. Matej Cepl

    Terry Reedy Guest

    On 2/6/2012 12:56 AM, Steven D'Aprano wrote:
    > On Mon, 06 Feb 2012 00:07:04 -0500, Terry Reedy wrote:
    >
    >> On 2/5/2012 11:01 PM, Steven D'Aprano wrote:
    >>
    >>> Reading the docs, I would expect that when using an int as seed, you
    >>> should get identical results.

    >>
    >> That is similar to expecting hash to be consistent from version to
    >> version.

    >
    > No. hash is not intended to be consistent across versions, or even across


    Oh, but it was. Changing it will break tests, including some in the
    Python test suite.

    ....
    > This, plus Raymond Hettinger's comments on the bug report, make me think
    > that the change in behaviour of randrange and choice is not deliberate


    As I said, it was a necessary consequence of a bug fix.

    > and should be treated as a bug. Raymond made a strong case arguing for
    > repeatability, and then approved a bug fix that broke repeatability. I
    > doubt that was deliberate.


    It was deliberate that randrange was changed to an even distribution
    from a slightly uneven distribute. That implies a change in the pattern.
    That was known and the main subject of discussion. As Antoine said,
    making functions exactly repeatable across versions means not fixing
    bugs or otherwise improving them. This statement is not limited to the
    random module.

    You have persuaded me that the doc should be more explicit that while
    the basic random.random sequence will be kept repeatable with seed set
    (except perhaps after a changeover of several releases), the convenience
    transformations can be changed if improvements are needed or thought
    sufficiently desirable.

    --
    Terry Jan Reedy
    Terry Reedy, Feb 6, 2012
    #5
  6. On Mon, 06 Feb 2012 02:27:14 -0500, Terry Reedy wrote:

    [...]
    >> and should be treated as a bug. Raymond made a strong case arguing for
    >> repeatability, and then approved a bug fix that broke repeatability. I
    >> doubt that was deliberate.

    >
    > It was deliberate that randrange was changed to an even distribution
    > from a slightly uneven distribute. That implies a change in the pattern.


    Okay, granted.


    > That was known and the main subject of discussion. As Antoine said,
    > making functions exactly repeatable across versions means not fixing
    > bugs or otherwise improving them. This statement is not limited to the
    > random module.
    >
    > You have persuaded me that the doc should be more explicit that while
    > the basic random.random sequence will be kept repeatable with seed set
    > (except perhaps after a changeover of several releases), the convenience
    > transformations can be changed if improvements are needed or thought
    > sufficiently desirable.


    A more explicit note will help, but the basic problem applies: how do you
    write deterministic tests given that the random.methods (apart from
    random.random itself) can be changed without warning?


    --
    Steven
    Steven D'Aprano, Feb 6, 2012
    #6
  7. Matej Cepl

    Matej Cepl Guest

    On 6.2.2012 09:05, Steven D'Aprano wrote:
    >> You have persuaded me that the doc should be more explicit that while
    >> the basic random.random sequence will be kept repeatable with seed set
    >> (except perhaps after a changeover of several releases), the convenience
    >> transformations can be changed if improvements are needed or thought
    >> sufficiently desirable.

    >
    > A more explicit note will help, but the basic problem applies: how do you
    > write deterministic tests given that the random.methods (apart from
    > random.random itself) can be changed without warning?


    Also, how could I write a re-implementation of random.choice which would
    work same on python 2.6 and python 3.2? It is not only matter of unit
    tests, but I would really welcome if the results on both versions
    produce the same results.

    Could we get some hint in the release notes?

    Thanks for the help,

    Matěj
    Matej Cepl, Feb 6, 2012
    #7
  8. Matej Cepl

    Matej Cepl Guest

    On 6.2.2012 09:45, Matej Cepl wrote:
    > Also, how could I write a re-implementation of random.choice which would
    > work same on python 2.6 and python 3.2? It is not only matter of unit
    > tests, but I would really welcome if the results on both versions
    > produce the same results.


    Silly, of course, the solution is obvious ... I have just embedded
    random.choice from 2.6 to my code.

    Matěj
    Matej Cepl, Feb 6, 2012
    #8
  9. Matej Cepl

    Mel Wilson Guest

    Steven D'Aprano wrote:

    > A more explicit note will help, but the basic problem applies: how do you
    > write deterministic tests given that the random.methods (apart from
    > random.random itself) can be changed without warning?


    Biting the bullet would mean supplying your own PRNG, under your control.
    Jon Bentley somewhere, sometime, published a portable PRNG for that exact
    reason. (I wish I could find that article.) Specifically he wanted
    portability across various manufacturer's O/Ss.

    Mel.
    Mel Wilson, Feb 6, 2012
    #9
  10. Matej Cepl

    Tim Chase Guest

    On 02/06/12 12:48, Aaron France wrote:
    > On 02/06/2012 09:57 AM, Matej Cepl wrote:
    >> Silly, of course, the solution is obvious ... I have just
    >> embedded random.choice from 2.6 to my code.
    >>
    >> Matěj

    > Is the above actually a good idea though?
    >
    > What I understand you're doing is embedding the source from
    > the Python2.6 random.py file, /into/ your project files?


    In an ideal world, the code wouldn't have broken backwards
    compat. However, given the conditions, if Matej is willing to
    forgo bug-fixes, it's a reasonable solution. The alternate might
    be to try moving the recent/fixed version into the old project
    and updating tests/data to work with it. I have some 2.4
    production code in which I've pulled 2.6's zipfile module in to
    give access to the iterator access (rather than the .read()
    method which tries to read in umpteen gigs of data that I just
    want to spool out to a file on disk).

    -tkc
    Tim Chase, Feb 6, 2012
    #10
  11. Matej Cepl

    Matej Cepl Guest

    On 6.2.2012 20:26, Tim Chase wrote:
    > In an ideal world, the code wouldn't have broken backwards compat.
    > However, given the conditions, if Matej is willing to forgo bug-fixes,
    > it's a reasonable solution. The alternate might be to try moving the
    > recent/fixed version into the old project and updating tests/data to
    > work with it. I have some 2.4 production code in which I've pulled 2.6's
    > zipfile module in to give access to the iterator access (rather than the
    > .read() method which tries to read in umpteen gigs of data that I just
    > want to spool out to a file on disk).


    Given I really don't care that much about distribution of my choice,
    this function

    def _choice(self, seq):
    """Choose a random element from a non-empty sequence.

    Embedding random.choice from 2.6 in order to get an uniform
    results between 2.6 and 3.2, where random module has been
    changed because of http://bugs.python.org/issue9025.
    See also http://groups.google.com/group/comp.lang.python\
    /browse_thread/thread/2b000b8ca8c5e98e

    Raises IndexError if seq is empty
    """
    return seq[int(random.random() * len(seq))]

    doesn't seem like something so terrible (and maintenance intense). :)

    Thanks for the help though

    Matěj
    Matej Cepl, Feb 6, 2012
    #11
  12. 07.02.12 00:06, Matej Cepl напиÑав(ла):
    > return seq[int(random.random() * len(seq))]
    >
    > doesn't seem like something so terrible (and maintenance intense). :)


    _choice('abc') returns 'a' with probability P('a') = 1501199875790165/4503599627370496 = 1/3 - 1/13510798882111488 and 'b' with probability P('b') = 3002399751580331/9007199254740992 = 1/3 + 1/27021597764222976. P('b') - P('a') = 1/9007199254740992.

    This may be acceptable for your application, but for applications that are concerned not only about the repeatability of results, but the accuracy and consistency with the results obtained by other (not Python) applications, it is better to get rid of such a notorious bias.
    Serhiy Storchaka, Feb 7, 2012
    #12
  13. Am 06.02.2012 09:45, schrieb Matej Cepl:
    > Also, how could I write a re-implementation of random.choice which would
    > work same on python 2.6 and python 3.2? It is not only matter of unit
    > tests, but I would really welcome if the results on both versions
    > produce the same results.


    Two approaches come to mind:
    1. make two runs and verify that they differ
    This is a bit problematic, because there is a small but nonzero chance
    that two runs don't differ. Random numbers are just not random enough.
    Anyhow, afterwards, sort the results again and verify that it was just
    the order that differs.

    2. hack the random module into something nonrandom
    I think you can "import debug_random as random" and then have your
    testee automatically pick up that module instead of the real one. Since
    you already hardcoded the results to check against ("expected = ..." in
    your code), you can also hard-code the sequence of random numbers
    corresponding to that. This is even cleaner, as you don't rely on
    something being deterministic that is supposed to be random, which is
    not totally surprising but still somehow paradox.

    Uli
    Ulrich Eckhardt, Feb 7, 2012
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. jakk
    Replies:
    4
    Views:
    12,039
  2. Xiao Jianfeng
    Replies:
    2
    Views:
    758
    Ben Finney
    Dec 16, 2005
  3. globalrev
    Replies:
    4
    Views:
    742
    Gabriel Genellina
    Apr 20, 2008
  4. sintral
    Replies:
    9
    Views:
    4,312
    Ben Bacarisse
    Dec 7, 2008
  5. VK
    Replies:
    15
    Views:
    1,119
    Dr J R Stockton
    May 2, 2010
Loading...

Share This Page