Request for source code review of simple Ising model

Discussion in 'C Programming' started by Udyant Wig, Apr 10, 2014.

  1. Udyant Wig

    Udyant Wig Guest

    | His code initialized the memory as follows:
    | /* Clear second bit */
    | *cell &= ~0x02;
    | /* Set second bit randomly */
    | *cell |= ((byte) (rand () % 2)) << 1;
    |
    | Since *cell has the type "unsigned char", it cannot have a trap
    | representation. Therefore, in the original version of his code, if the
    | use of uninitialized memory had been intentional (perhaps as some
    | bizarre substitute for proper randomization?), such code would make
    | sense. However, since he's corrected the code to memset() the entire
    | lattice (presumably to 0), this code is overly complicated. The second
    | bit doesn't need to be cleared, because it's already guaranteed to be
    | clear. The "|" in the "|=" is unnecessary, because the original value
    | was already 0. Therefore, those two lines can be simplified to:
    |
    | *cell = ((byte) (rand () % 2)) << 1;
    |
    | And that change, in turn, renders the memset() unnecessary, since the
    | result no longer depends, in any way, upon the pre-existing value of
    | *cell.

    The changes have been made. The initialization code now does only one
    thing: initialization of the memory allocated by allocate_lattice().
     
    Udyant Wig, Apr 18, 2014
    #41
    1. Advertisements

  2. Valgrind isn't clever enough t o realise that only one bit is
     
    Malcolm McLean, Apr 18, 2014
    #42
    1. Advertisements

  3. Udyant Wig

    Udyant Wig Guest

    | On 04/15/2014 03:42 PM, Udyant Wig wrote:
    | ...
    |> *cell |= ((byte) (rand () % 2)) << 1;
    |
    | The C standard doesn't impose any significant restrictions on the
    | quality of implementation of rand(), and on many implementations it's
    | not very good. It might be good enough, if your needs are simple enough.
    | However, if you're going to rely upon rand() despite the fact that it
    | might not be very good, you should be aware of the fact that, in poor
    | quality implementations, lower-order bits are likely to be significantly
    | less random than higher-order ones. RAND_MAX is required to be >= 32767,
    | so 0x4000 is the highest bit that guaranteed to be <RAND_MAX. If you're
    | only going extract one bit from rand(), I'd recommend extracting that
    | one: ((rand()%0x4000) == 0x4000 ? 2 : 1).

    How does this work? Is it ever the case that

    x mod 16384 = 16384?

    | If, instead, you have access to high-quality random number generator,
    | it's a waste to extract only one bit from each number: you should use
    | each of the bits to initialize a different cell of your lattice.

    It took me a while to work this out. I used the exact integers
    facility of C99 via <stdint.h>. I allocated an array of int16_t and
    then, for as many cells in the lattice, I took bits from a 16-bit
    integer in the array until I had run through all bits in the one,
    whereupon I went to the next one.

    This necessitated changes to the allocators and initializers:

    /* 16-bit integer bit source */
    int16_t *allocate_bitsource (size_t size)
    {
    int16_t *bitsource;
    bitsource = malloc (size);
    if (bitsource == NULL) {
    fprintf (stderr, "allocate_bitsource: %s\n", strerror (errno));
    exit (EXIT_FAILURE);
    }

    return bitsource;
    }

    void initialize_bitsource (int16_t *bitsource)
    {
    int i;
    int int_count = minimum_multiple_16 (dimension * dimension);

    for (i = 0; i < int_count; i++) {
    *bitsource = (int16_t)(rand () % RAND_MAX);
    }
    }


    void initialize_lattice (byte *lattice, int16_t *bitsource)
    {
    int16_t *ip = bitsource;
    byte *bp = lattice, *end = bp + dimension * dimension;
    int i;

    for (i = 0; bp < end; i++) {
    if (i == 16) {
    ip++;
    i = 0;
    }

    *bp = ((byte)(*ip >> i)) & 0x01;
    bp++;
    }
    }

    /* defined in utilities.c */
    int minimum_multiple_16 (int n)
    {
    int mm16 = n >> 4;
    if (n % 16 != 0) {
    mm16++;
    }
    return mm16;
    }


    | And that brings to mind another issue. Once you're sure that your
    | program is working, and you've reached the point where it's reasonable
    | to worry about performance, you might consider an alternative data
    | storage strategy. You're using one byte to store one cell representing
    | a spin 1/2 object: it therefore has only two quantized orientations,
    | up and down, so you're storing only one bit of information for that
    | cell. It might be better to use, for example, a 32-bit integer to
    | store the values of 32 consecutive cells. Not only will this use up a
    | lot less memory, but with (quite) a bit of ingenuity, you can use
    | bit-wise operations to handle all 32 cells at the same time, for a
    | naive speed-up by a factor of 32. In practice, the speed up will
    | actually be less than that, but should still be substantial. On the
    | other hand, if you implement this idea your code will be a lot more
    | complicated, and much harder to understand. You'll have to decide
    | whether that's a acceptable cost for the increased processing speed
    | and decreased memory requirements.

    This would be another rewrite of the program, as much of the existing
    source would have to be modified for this data representation.

    It should be worth looking into, because, as I said elsewhere, a given
    run of a 10x10 lattice can take upwards of three hours and not
    complete, while a 6x6 one finishes in mere minutes.
     
    Udyant Wig, Apr 18, 2014
    #43
  4. Udyant Wig

    James Kuyper Guest

    It doesn't - that was a typo. It should have been

    (rand()&0x4000 == 0x4000) ? 2 : 1

    Sorry!
     
    James Kuyper, Apr 18, 2014
    #44
  5. Udyant Wig

    James Kuyper Guest

    On 04/18/2014 03:02 AM, Udyant Wig wrote:
    ....
    The basic source and execution character sets are defined as containing
    the following characters, which I've listed in the order in which they
    are referred to in section 5.2.1:

    "\0ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~ \t\v\f"

    In addition, the basic execution character set must also include "\a\b\r\n"

    Note: The basic source character set is NOT required to contain '\n'.
    Source code files are required to have some method of indicating the end
    of a line. That method is treated as if it were a newline character at
    the end of each line, regardless of whether or not that is the method
    actually used.

    Implementations are free to support additional characters, these are
    referred to as being members of the extended character set.

    "If a member of the basic execution character set is stored in a
    char object, its value is guaranteed to be nonnegative. If any other
    character is stored in a char object, the resulting value is
    implementation-defined but shall be within the range of values that can
    be represented in that type." (6.2.5p3)

    "The implementation shall define char to have the same range,
    representation, and behavior as either signed char or unsigned char."
    (6.2.5p15).

    This wasn't an arbitrary decision by the committee - at the time that
    the standard was written, there were many implementations where plain
    char was signed, an many others where it was unsigned.

    If an implementation chooses for char to have the same range as signed
    char, then for any member of the extended execution character set, if
    that character is stored in a char object, the implementation is free to
    define it to have a negative value. This is normally the case for some
    of those characters on any system where char is signed that uses just
    about any character encoding other than 7-bit ASCII.
     
    James Kuyper, Apr 18, 2014
    #45
  6. Udyant Wig

    Udyant Wig Guest

    | void initialize_bitsource (int16_t *bitsource)
    | {
    | int i;
    | int int_count = minimum_multiple_16 (dimension * dimension);
    |
    | for (i = 0; i < int_count; i++) {
    | *bitsource = (int16_t)(rand () % RAND_MAX);
    | }
    | }

    Note that this code does not work as would be expected (it keeps
    setting the same integer.) Either add

    bitsource++;

    to the loop body, or use this rewrite (using the loop form suggested by
    Ben Bacarisse):

    void initialize_bitsource (int16_t *bitsource)
    {
    int int_count = minimum_multiple_16 (dimension * dimension);

    int16_t *ip = bitsource, *end = ip + int_count * sizeof *ip;
    while (ip < end) {
    *ip = (int16_t)(rand () % RAND_MAX);
    ip++;
    }
    }
     
    Udyant Wig, Apr 18, 2014
    #46
  7. Udyant Wig

    Udyant Wig Guest

    | (rand()&0x4000 == 0x4000) ? 2 : 1

    Is this what this does: if RAND_MAX is in fact 32767, then this piece
    of code produces 2 or 1 with roughly equal probability?

    | Sorry!

    No harm done. The original piece was puzzling, is all.
     
    Udyant Wig, Apr 18, 2014
    #47
  8. Udyant Wig

    James Kuyper Guest

    Yes. If RAND_MAX is not 2^n-1 for some integer n (which is permitted),
    then the equality would be much rougher. For best results, there's no
    substitute for paying close attention to the value of RAND_MAX, and
    adjusting your method for using the values returned by rand()
    accordingly, but if you need to achieve a precisely specified
    probability, that's not easy to do. For instance, your code uses

    double random_value = (double) (rand () % 100) / 100.0;

    It is guaranteed that 0 <= random_value && random_value < 1.0, and
    (after allowing for floating point round-off), random_value will be an
    integer multiple of 0.01. However, it will not produce all 100
    different possible values with equal probability (not even if rand()
    were an ideal random number generator). See if you can figure out why.
    Hint: if RAND_MAX were 32799, it would work.

    For this purpose, I'd normally use rand()/(RAND_MAX+1.0), which would
    produce each possible value with equal probability if rand() were an
    ideal random number generator. However, if you need to implement a
    precise rational probability m/n, where m and n are small integers, it's
    a very real issue.
     
    James Kuyper, Apr 18, 2014
    #48
  9. (As discussed downthread, that "%" was meant to be "&".)

    Is it still the case that the low-order bits are commonly less random
    than the high-order bits? Question 13.18 of the FAQ
    <http://www.c-faq.com/> says so, but I suspect it's out of date.

    I think I've seen PRNGs where the low-order bit actually alternates
    0, 1, 0, 1, 0, 1, ..., but the sample implementation in the standard
    doesn't do that.

    Here's a test program if anyone wants to try it. If it prints either
    "01010101..." or "10101010...", please let us know what implementation
    you're using.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
    for (int i = 0; i < 64; i ++) {
    printf("%d", rand() % 2);
    }
    putchar('\n');
    }

    On one of my systems, the output is

    1011110011010110000010110001111000111010111101001010100100011101
     
    Keith Thompson, Apr 18, 2014
    #49
  10. Udyant Wig

    James Kuyper Guest

    I don't know, which is why I deliberately prefixed my remarks with "in
    poor quality implementations". The average quality of implementation of
    rand() may be improving, but I believe that what I've written is still
    true of poor quality implementations. However, that belief is entirely
    second hand - I can't claim to have performed relevant studies myself.
     
    James Kuyper, Apr 18, 2014
    #50
  11. And the question is, are such poor quality implementations still common
    enough that they're worth worrying about? I'm not expecting you to have
    a definitive answer to that (nor do I), but if someone else can provide
    a concrete example it would be very interesting.

    Given that the sample implementation that's been in the standard for the
    last 25 years doesn't appear to suffer from this problem[*], there's
    really no excuse for any current real-world implementation to be that
    bad.

    [*] At least it doesn't return 0, 1, 0, 1, ... for the low-order bit.
    It *might* still be the case that the low-order bits are lower quality
    than the high-order bits. I lack the expertise to analyze it.
     
    Keith Thompson, Apr 18, 2014
    #51
  12. Udyant Wig

    Tim Rentsch Guest

    Note: "representations ... shall define values by ...". I take
    this statement to concern only those bits in the representation that
    define the value, not necessarily all bits in the representation.
    I read these two statements as saying rather different things, with
    neither being a subset of the other. The provision in C89/C90 I
    read as giving a constraint on those bits that define the value of
    the corresponding type, but not on other bits (ie, what we now call
    "value bits"). The provision in C99 I read as giving a constraint
    on /all/ bits in the object representation of the corresponding type
    (ie, that value bits make up all bits of the representation).
    Thus the C99 statement is a stronger condition, but applies less
    generally - it requires more, but only in the two cases listed.
    Focusing on the "pure binary ..." phrase is a misdirection. In all
    cases it refers to all the bits under consideration, but which bits
    those are is determined by context outside the phrase itself. In
    some cases that is just the value bits, in others all bits in the
    object representation.

    In any case, the question of whether C89/C90 allows padding bits, or
    trap representations, in integral types is addressed in Defect
    Report #069, dated December 3, 1993. The short answer is that it
    allows both, not counting some exceptions in the case of character
    types. Since I have the page open here is the link:

    http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_069.html
     
    Tim Rentsch, Apr 18, 2014
    #52
  13. Udyant Wig

    Udyant Wig Guest

    | I think I've seen PRNGs where the low-order bit actually alternates 0,
    | 1, 0, 1, 0, 1, ..., but the sample implementation in the standard
    | doesn't do that.

    There is a discussion of this very issue (or something extremely close
    to it) in one of the assignments of Steve Summit's (fine) online C
    programming classes found here:

    http://www.eskimo.com/~scs/cclass/cclass.html

    It is Exercise 8 of the third assignment:

    http://www.eskimo.com/~scs/cclass/asgn.beg/PS3.html

    The explanation is here in the answers section:

    http://www.eskimo.com/~scs/cclass/asgn.beg/PS3a.html

    | Here's a test program if anyone wants to try it. If it prints either
    | "01010101..." or "10101010...", please let us know what implementation
    | you're using.
    |
    | #include <stdio.h>
    | #include <stdlib.h>
    |
    | int main(void) {
    | for (int i = 0; i < 64; i ++) {
    | printf("%d", rand() % 2);
    | }
    | putchar('\n');
    | }

    I changed it thusly to calm gcc down:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
    int i;
    for (i = 0; i < 64; i ++) {
    printf("%d", rand() % 2);
    }
    putchar('\n');

    return 0;
    }

    1011110011010110000010110001111000111010111101001010100100011101

    | On one of my systems, the output is| 1011110011010110000010110001111000111010111101001010100100011101

    The two outputs are equal. Independent confirmations by Emacs Lisp,

    (string-equal
    "1011110011010110000010110001111000111010111101001010100100011101"
    "1011110011010110000010110001111000111010111101001010100100011101")
    => t

    and their MD5 sums,

    $ md5sum # Keith Thompson's output
    1011110011010110000010110001111000111010111101001010100100011101
    89e91cff9eefca6a24afaee2809c0859 -

    $ md5sum # Udyant Wig's output
    1011110011010110000010110001111000111010111101001010100100011101
    89e91cff9eefca6a24afaee2809c0859 -

    verify this.
     
    Udyant Wig, Apr 18, 2014
    #53
  14. Udyant Wig

    Udyant Wig Guest

    The output using the rand() from FreeBSD libc:


    /* freebsd-rand.c begins */

    /*-
    * Copyright (c) 1990, 1993
    * The Regents of the University of California. All rights reserved.
    *
    * Redistribution and use in source and binary forms, with or without
    * modification, are permitted provided that the following conditions
    * are met:
    * 1. Redistributions of source code must retain the above copyright
    * notice, this list of conditions and the following disclaimer.
    * 2. Redistributions in binary form must reproduce the above copyright
    * notice, this list of conditions and the following disclaimer in the
    * documentation and/or other materials provided with the distribution.
    * 3. Neither the name of the University nor the names of its contributors
    * may be used to endorse or promote products derived from this software
    * without specific prior written permission.
    *
    * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
    * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
    * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
    * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
    * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
    * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
    * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
    * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
    * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
    * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
    * SUCH DAMAGE.
    */

    #include <stdio.h>
    #include <stdlib.h>

    static int
    do_rand(unsigned long *ctx)
    {
    #ifdef USE_WEAK_SEEDING
    /*
    * Historic implementation compatibility.
    * The random sequences do not vary much with the seed,
    * even with overflowing.
    */
    return ((*ctx = *ctx * 1103515245 + 12345) % ((u_long)RAND_MAX + 1));
    #else /* !USE_WEAK_SEEDING */
    /*
    * Compute x = (7^5 * x) mod (2^31 - 1)
    * without overflowing 31 bits:
    * (2^31 - 1) = 127773 * (7^5) + 2836
    * From "Random number generators: good ones are hard to find",
    * Park and Miller, Communications of the ACM, vol. 31, no. 10,
    * October 1988, p. 1195.
    */
    long hi, lo, x;

    /* Must be in [1, 0x7ffffffe] range at this point. */
    hi = *ctx / 127773;
    lo = *ctx % 127773;
    x = 16807 * lo - 2836 * hi;
    if (x < 0)
    x += 0x7fffffff;
    *ctx = x;
    /* Transform to [0, 0x7ffffffd] range. */
    return (x - 1);
    #endif /* !USE_WEAK_SEEDING */
    }

    static u_long next =
    #ifdef USE_WEAK_SEEDING
    1;
    #else
    2;
    #endif

    int
    rand()
    {
    return (do_rand(&next));
    }

    int main(void) {
    int i;
    for (i = 0; i < 64; i ++) {
    printf("%d", rand() % 2);
    }
    putchar('\n');

    return 0;
    }

    /* freebsd-rand.c ends */


    $ ./freebsd-rand
    1101011000100110011110000010100011010001100001001101111110010001
     
    Udyant Wig, Apr 18, 2014
    #54
  15. Udyant Wig

    Kaz Kylheku Guest

    A nice little PRNG is WELL. Reference C code is available (not freeware,
    though: non-commercial use only!)

    www.iro.umontreal.ca/~panneton/WELLRNG.html
    http://en.wikipedia.org/wiki/Well_Equidistributed_Long-period_Linear

    I've implemented this myself also, using a "clean-room" approach (not reusing any
    of the above source code), and put it under a BSD license:

    http://www.kylheku.com/cgit/txr/tree/rand.c

    This is not packaged for general reuse, however. The heart of it is the
    struct rand_state declaration, the rand32 function, and some useful
    cases in the make_random_state function for seeding in various ways.
    (The reference implementation has only an API for seeding using a vector
    of ints, which must have the same size as the state vector.)
     
    Kaz Kylheku, Apr 18, 2014
    #55
  16. That merely asserts that:

    In particular, it's not uncommon for the rand funciton to produce
    alternately even and odd numbers, such that if you repeatedly
    compute rand % 2, you'll get the decidedly non-random sequence 0,
    1, 0, 1, 0, 1, 0, 1... .

    I'm looking for concrete examples of real-world implementations that
    behave this way.
    [...]

    Compiling with "-std=c99" also works.

    It's a pity that gcc rejects this particular C99 feature by default.
    Even if the gcc maintainers don't feel they're ready to make C99 the
    default standard, they could easily treat this as a GNU extension on top
    of C90.
     
    Keith Thompson, Apr 18, 2014
    #56
  17. You have to be careful with Ising models. Obviously it depends what you're
    using the model for. They're statistical models of random but coherent
    systems, with emergent properties. The so called critical temperature
    might well be affected by the quality of the RNG.
     
    Malcolm McLean, Apr 18, 2014
    #57
  18. Udyant Wig

    Tim Rentsch Guest

    Apparently you misunderstood or are misremembering. C90 does not
    mention one's complement or sign-magnitude, and mentions two's
    complement only in footnotes and examples, not in normative text.
    The only types[*] required to have all bits of the representation
    participate in the value are the character types, and all other
    integral types may have what are now called padding bits, in both
    C90 and C99. There are no requirements placed on what sets of
    values of any padding bits correspond to numeric values in C90 or
    the original version of C99; in C90 they are not mentioned, in
    C99 explicitly unspecified. The wording used in C90 can be read
    different ways, but this aspect was addressed and resolved quite
    clearly in the official response of Defect Report 069. So the
    requirement that an object representation of all zero bits must
    be a valid representation of an integer-type zero was not
    expressed in the Standard until N1256.

    [*] In C99 there also are some optional integer types where all
    bits of the representation must participate in the value. Also
    I think IEEE-compliant floating point values have the property
    that all bits of the representation participate in the value,
    but again this is optional, not required of all implementations.
     
    Tim Rentsch, Apr 19, 2014
    #58
  19. Udyant Wig

    Ike Naar Guest

    Here it prints

    0101010101010101010101010101010101010101010101010101010101010101

    The implementation is gcc version 4.5.3 (NetBSD nb2 20110806).
     
    Ike Naar, Apr 19, 2014
    #59
  20. Udyant Wig

    Ike Naar Guest

    This produces 1 or 2, depending on the output of rand().

    The original expression was

    ((byte) (rand () % 2)) << 1

    which produces 0 or 2, depending on the output of rand().
     
    Ike Naar, Apr 19, 2014
    #60
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.