Request for source code review of simple Ising model

U

Udyant Wig

| His code initialized the memory as follows:
| /* Clear second bit */
| *cell &= ~0x02;
| /* Set second bit randomly */
| *cell |= ((byte) (rand () % 2)) << 1;
|
| Since *cell has the type "unsigned char", it cannot have a trap
| representation. Therefore, in the original version of his code, if the
| use of uninitialized memory had been intentional (perhaps as some
| bizarre substitute for proper randomization?), such code would make
| sense. However, since he's corrected the code to memset() the entire
| lattice (presumably to 0), this code is overly complicated. The second
| bit doesn't need to be cleared, because it's already guaranteed to be
| clear. The "|" in the "|=" is unnecessary, because the original value
| was already 0. Therefore, those two lines can be simplified to:
|
| *cell = ((byte) (rand () % 2)) << 1;
|
| And that change, in turn, renders the memset() unnecessary, since the
| result no longer depends, in any way, upon the pre-existing value of
| *cell.

The changes have been made. The initialization code now does only one
thing: initialization of the memory allocated by allocate_lattice().
 
M

Malcolm McLean

His code initialized the memory as follows:
/* Clear second bit */
*cell &= ~0x02;

/* Set second bit randomly */
*cell |= ((byte) (rand () % 2)) << 1;
Valgrind isn't clever enough t o realise that only one bit is
 
U

Udyant Wig

| On 04/15/2014 03:42 PM, Udyant Wig wrote:
| ...
|> *cell |= ((byte) (rand () % 2)) << 1;
|
| The C standard doesn't impose any significant restrictions on the
| quality of implementation of rand(), and on many implementations it's
| not very good. It might be good enough, if your needs are simple enough.
| However, if you're going to rely upon rand() despite the fact that it
| might not be very good, you should be aware of the fact that, in poor
| quality implementations, lower-order bits are likely to be significantly
| less random than higher-order ones. RAND_MAX is required to be >= 32767,
| so 0x4000 is the highest bit that guaranteed to be <RAND_MAX. If you're
| only going extract one bit from rand(), I'd recommend extracting that
| one: ((rand()%0x4000) == 0x4000 ? 2 : 1).

How does this work? Is it ever the case that

x mod 16384 = 16384?

| If, instead, you have access to high-quality random number generator,
| it's a waste to extract only one bit from each number: you should use
| each of the bits to initialize a different cell of your lattice.

It took me a while to work this out. I used the exact integers
facility of C99 via <stdint.h>. I allocated an array of int16_t and
then, for as many cells in the lattice, I took bits from a 16-bit
integer in the array until I had run through all bits in the one,
whereupon I went to the next one.

This necessitated changes to the allocators and initializers:

/* 16-bit integer bit source */
int16_t *allocate_bitsource (size_t size)
{
int16_t *bitsource;
bitsource = malloc (size);
if (bitsource == NULL) {
fprintf (stderr, "allocate_bitsource: %s\n", strerror (errno));
exit (EXIT_FAILURE);
}

return bitsource;
}

void initialize_bitsource (int16_t *bitsource)
{
int i;
int int_count = minimum_multiple_16 (dimension * dimension);

for (i = 0; i < int_count; i++) {
*bitsource = (int16_t)(rand () % RAND_MAX);
}
}


void initialize_lattice (byte *lattice, int16_t *bitsource)
{
int16_t *ip = bitsource;
byte *bp = lattice, *end = bp + dimension * dimension;
int i;

for (i = 0; bp < end; i++) {
if (i == 16) {
ip++;
i = 0;
}

*bp = ((byte)(*ip >> i)) & 0x01;
bp++;
}
}

/* defined in utilities.c */
int minimum_multiple_16 (int n)
{
int mm16 = n >> 4;
if (n % 16 != 0) {
mm16++;
}
return mm16;
}


| And that brings to mind another issue. Once you're sure that your
| program is working, and you've reached the point where it's reasonable
| to worry about performance, you might consider an alternative data
| storage strategy. You're using one byte to store one cell representing
| a spin 1/2 object: it therefore has only two quantized orientations,
| up and down, so you're storing only one bit of information for that
| cell. It might be better to use, for example, a 32-bit integer to
| store the values of 32 consecutive cells. Not only will this use up a
| lot less memory, but with (quite) a bit of ingenuity, you can use
| bit-wise operations to handle all 32 cells at the same time, for a
| naive speed-up by a factor of 32. In practice, the speed up will
| actually be less than that, but should still be substantial. On the
| other hand, if you implement this idea your code will be a lot more
| complicated, and much harder to understand. You'll have to decide
| whether that's a acceptable cost for the increased processing speed
| and decreased memory requirements.

This would be another rewrite of the program, as much of the existing
source would have to be modified for this data representation.

It should be worth looking into, because, as I said elsewhere, a given
run of a 10x10 lattice can take upwards of three hours and not
complete, while a 6x6 one finishes in mere minutes.
 
J

James Kuyper

| so 0x4000 is the highest bit that guaranteed to be <RAND_MAX. If you're
| only going extract one bit from rand(), I'd recommend extracting that
| one: ((rand()%0x4000) == 0x4000 ? 2 : 1).

How does this work? Is it ever the case that

x mod 16384 = 16384?
It doesn't - that was a typo. It should have been

(rand()&0x4000 == 0x4000) ? 2 : 1

Sorry!
 
J

James Kuyper

On 04/18/2014 03:02 AM, Udyant Wig wrote:
....
But when might this situation arise? That is, could there be a string
some of whose elements were characters < 0?

The basic source and execution character sets are defined as containing
the following characters, which I've listed in the order in which they
are referred to in section 5.2.1:

"\0ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!"
"!\"#%&'()*+,-./:;<=>?[\\]^_{|}~ \t\v\f"

In addition, the basic execution character set must also include "\a\b\r\n"

Note: The basic source character set is NOT required to contain '\n'.
Source code files are required to have some method of indicating the end
of a line. That method is treated as if it were a newline character at
the end of each line, regardless of whether or not that is the method
actually used.

Implementations are free to support additional characters, these are
referred to as being members of the extended character set.

"If a member of the basic execution character set is stored in a
char object, its value is guaranteed to be nonnegative. If any other
character is stored in a char object, the resulting value is
implementation-defined but shall be within the range of values that can
be represented in that type." (6.2.5p3)

"The implementation shall define char to have the same range,
representation, and behavior as either signed char or unsigned char."
(6.2.5p15).

This wasn't an arbitrary decision by the committee - at the time that
the standard was written, there were many implementations where plain
char was signed, an many others where it was unsigned.

If an implementation chooses for char to have the same range as signed
char, then for any member of the extended execution character set, if
that character is stored in a char object, the implementation is free to
define it to have a negative value. This is normally the case for some
of those characters on any system where char is signed that uses just
about any character encoding other than 7-bit ASCII.
 
U

Udyant Wig

| void initialize_bitsource (int16_t *bitsource)
| {
| int i;
| int int_count = minimum_multiple_16 (dimension * dimension);
|
| for (i = 0; i < int_count; i++) {
| *bitsource = (int16_t)(rand () % RAND_MAX);
| }
| }

Note that this code does not work as would be expected (it keeps
setting the same integer.) Either add

bitsource++;

to the loop body, or use this rewrite (using the loop form suggested by
Ben Bacarisse):

void initialize_bitsource (int16_t *bitsource)
{
int int_count = minimum_multiple_16 (dimension * dimension);

int16_t *ip = bitsource, *end = ip + int_count * sizeof *ip;
while (ip < end) {
*ip = (int16_t)(rand () % RAND_MAX);
ip++;
}
}
 
U

Udyant Wig

| (rand()&0x4000 == 0x4000) ? 2 : 1

Is this what this does: if RAND_MAX is in fact 32767, then this piece
of code produces 2 or 1 with roughly equal probability?

| Sorry!

No harm done. The original piece was puzzling, is all.
 
J

James Kuyper

| (rand()&0x4000 == 0x4000) ? 2 : 1

Is this what this does: if RAND_MAX is in fact 32767, then this piece
of code produces 2 or 1 with roughly equal probability?

Yes. If RAND_MAX is not 2^n-1 for some integer n (which is permitted),
then the equality would be much rougher. For best results, there's no
substitute for paying close attention to the value of RAND_MAX, and
adjusting your method for using the values returned by rand()
accordingly, but if you need to achieve a precisely specified
probability, that's not easy to do. For instance, your code uses

double random_value = (double) (rand () % 100) / 100.0;

It is guaranteed that 0 <= random_value && random_value < 1.0, and
(after allowing for floating point round-off), random_value will be an
integer multiple of 0.01. However, it will not produce all 100
different possible values with equal probability (not even if rand()
were an ideal random number generator). See if you can figure out why.
Hint: if RAND_MAX were 32799, it would work.

For this purpose, I'd normally use rand()/(RAND_MAX+1.0), which would
produce each possible value with equal probability if rand() were an
ideal random number generator. However, if you need to implement a
precise rational probability m/n, where m and n are small integers, it's
a very real issue.
 
K

Keith Thompson

James Kuyper said:
On 04/15/2014 03:42 PM, Udyant Wig wrote:
...

The C standard doesn't impose any significant restrictions on the
quality of implementation of rand(), and on many implementations it's
not very good. It might be good enough, if your needs are simple enough.
However, if you're going to rely upon rand() despite the fact that it
might not be very good, you should be aware of the fact that, in poor
quality implementations, lower-order bits are likely to be significantly
less random than higher-order ones. RAND_MAX is required to be >= 32767,
so 0x4000 is the highest bit that guaranteed to be <RAND_MAX. If you're
only going extract one bit from rand(), I'd recommend extracting that
one: ((rand()%0x4000) == 0x4000 ? 2 : 1).

(As discussed downthread, that "%" was meant to be "&".)

Is it still the case that the low-order bits are commonly less random
than the high-order bits? Question 13.18 of the FAQ
<http://www.c-faq.com/> says so, but I suspect it's out of date.

I think I've seen PRNGs where the low-order bit actually alternates
0, 1, 0, 1, 0, 1, ..., but the sample implementation in the standard
doesn't do that.

Here's a test program if anyone wants to try it. If it prints either
"01010101..." or "10101010...", please let us know what implementation
you're using.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
for (int i = 0; i < 64; i ++) {
printf("%d", rand() % 2);
}
putchar('\n');
}

On one of my systems, the output is

1011110011010110000010110001111000111010111101001010100100011101
 
J

James Kuyper

(As discussed downthread, that "%" was meant to be "&".)

Is it still the case that the low-order bits are commonly less random
than the high-order bits? Question 13.18 of the FAQ
<http://www.c-faq.com/> says so, but I suspect it's out of date.

I don't know, which is why I deliberately prefixed my remarks with "in
poor quality implementations". The average quality of implementation of
rand() may be improving, but I believe that what I've written is still
true of poor quality implementations. However, that belief is entirely
second hand - I can't claim to have performed relevant studies myself.
 
K

Keith Thompson

James Kuyper said:
I don't know, which is why I deliberately prefixed my remarks with "in
poor quality implementations". The average quality of implementation of
rand() may be improving, but I believe that what I've written is still
true of poor quality implementations. However, that belief is entirely
second hand - I can't claim to have performed relevant studies myself.

And the question is, are such poor quality implementations still common
enough that they're worth worrying about? I'm not expecting you to have
a definitive answer to that (nor do I), but if someone else can provide
a concrete example it would be very interesting.

Given that the sample implementation that's been in the standard for the
last 25 years doesn't appear to suffer from this problem[*], there's
really no excuse for any current real-world implementation to be that
bad.

[*] At least it doesn't return 0, 1, 0, 1, ... for the low-order bit.
It *might* still be the case that the low-order bits are lower quality
than the high-order bits. I lack the expertise to analyze it.
 
T

Tim Rentsch

Keith Thompson said:
C90 did not say much about the representation of signed integer
types. It says:

The representations of integral types shall define values by
use of a pure binary numeration system.

Note: "representations ... shall define values by ...". I take
this statement to concern only those bits in the representation that
define the value, not necessarily all bits in the representation.
There's no mention of 2's-complement or any other signed
representation scheme. But it did require any value representable
both as a signed int and as an unsigned int to have the same
representation in both (likewise for other signed/unsigned paris),
which eliminates some possible schemes.

C99 limited the scope of that statement:

Values stored in unsigned bit-fields and objects of type
unsigned char shall be represented using a pure binary
notation.

I read these two statements as saying rather different things, with
neither being a subset of the other. The provision in C89/C90 I
read as giving a constraint on those bits that define the value of
the corresponding type, but not on other bits (ie, what we now call
"value bits"). The provision in C99 I read as giving a constraint
on /all/ bits in the object representation of the corresponding type
(ie, that value bits make up all bits of the representation).
Thus the C99 statement is a stronger condition, but applies less
generally - it requires more, but only in the two cases listed.
It introduced the distinction between value bits and padding bits
(which C90 didn't mention) and required signed integer types to be
represented using either sign and magnitude, two's complement, or
one's complement.

I *think* that C90 implicitly permitted padding bits, even though it
didn't mention them. One could argue that the phrase "pure binary
numeration system" is inconsistent with padding bits, but C99 does
use the phrase "pure binary representation" in reference to just the
value bits. And it would be odd for C99 to loosen the requirements
for integer representation relative to C90 while simultaneously
tightening the requirements for signed representation by listing the
three permitted forms.

Focusing on the "pure binary ..." phrase is a misdirection. In all
cases it refers to all the bits under consideration, but which bits
those are is determined by context outside the phrase itself. In
some cases that is just the value bits, in others all bits in the
object representation.

In any case, the question of whether C89/C90 allows padding bits, or
trap representations, in integral types is addressed in Defect
Report #069, dated December 3, 1993. The short answer is that it
allows both, not counting some exceptions in the case of character
types. Since I have the page open here is the link:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_069.html
 
U

Udyant Wig

| I think I've seen PRNGs where the low-order bit actually alternates 0,
| 1, 0, 1, 0, 1, ..., but the sample implementation in the standard
| doesn't do that.

There is a discussion of this very issue (or something extremely close
to it) in one of the assignments of Steve Summit's (fine) online C
programming classes found here:

http://www.eskimo.com/~scs/cclass/cclass.html

It is Exercise 8 of the third assignment:

http://www.eskimo.com/~scs/cclass/asgn.beg/PS3.html

The explanation is here in the answers section:

http://www.eskimo.com/~scs/cclass/asgn.beg/PS3a.html

| Here's a test program if anyone wants to try it. If it prints either
| "01010101..." or "10101010...", please let us know what implementation
| you're using.
|
| #include <stdio.h>
| #include <stdlib.h>
|
| int main(void) {
| for (int i = 0; i < 64; i ++) {
| printf("%d", rand() % 2);
| }
| putchar('\n');
| }

I changed it thusly to calm gcc down:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
int i;
for (i = 0; i < 64; i ++) {
printf("%d", rand() % 2);
}
putchar('\n');

return 0;
}

1011110011010110000010110001111000111010111101001010100100011101

| On one of my systems, the output is| 1011110011010110000010110001111000111010111101001010100100011101

The two outputs are equal. Independent confirmations by Emacs Lisp,

(string-equal
"1011110011010110000010110001111000111010111101001010100100011101"
"1011110011010110000010110001111000111010111101001010100100011101")
=> t

and their MD5 sums,

$ md5sum # Keith Thompson's output
1011110011010110000010110001111000111010111101001010100100011101
89e91cff9eefca6a24afaee2809c0859 -

$ md5sum # Udyant Wig's output
1011110011010110000010110001111000111010111101001010100100011101
89e91cff9eefca6a24afaee2809c0859 -

verify this.
 
U

Udyant Wig

The output using the rand() from FreeBSD libc:


/* freebsd-rand.c begins */

/*-
* Copyright (c) 1990, 1993
* The Regents of the University of California. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/

#include <stdio.h>
#include <stdlib.h>

static int
do_rand(unsigned long *ctx)
{
#ifdef USE_WEAK_SEEDING
/*
* Historic implementation compatibility.
* The random sequences do not vary much with the seed,
* even with overflowing.
*/
return ((*ctx = *ctx * 1103515245 + 12345) % ((u_long)RAND_MAX + 1));
#else /* !USE_WEAK_SEEDING */
/*
* Compute x = (7^5 * x) mod (2^31 - 1)
* without overflowing 31 bits:
* (2^31 - 1) = 127773 * (7^5) + 2836
* From "Random number generators: good ones are hard to find",
* Park and Miller, Communications of the ACM, vol. 31, no. 10,
* October 1988, p. 1195.
*/
long hi, lo, x;

/* Must be in [1, 0x7ffffffe] range at this point. */
hi = *ctx / 127773;
lo = *ctx % 127773;
x = 16807 * lo - 2836 * hi;
if (x < 0)
x += 0x7fffffff;
*ctx = x;
/* Transform to [0, 0x7ffffffd] range. */
return (x - 1);
#endif /* !USE_WEAK_SEEDING */
}

static u_long next =
#ifdef USE_WEAK_SEEDING
1;
#else
2;
#endif

int
rand()
{
return (do_rand(&next));
}

int main(void) {
int i;
for (i = 0; i < 64; i ++) {
printf("%d", rand() % 2);
}
putchar('\n');

return 0;
}

/* freebsd-rand.c ends */


$ ./freebsd-rand
1101011000100110011110000010100011010001100001001101111110010001
 
K

Kaz Kylheku

| I think I've seen PRNGs where the low-order bit actually alternates 0,
| 1, 0, 1, 0, 1, ..., but the sample implementation in the standard
| doesn't do that.

There is a discussion of this very issue (or something extremely close
to it) in one of the assignments of Steve Summit's (fine) online C
programming classes found here:

A nice little PRNG is WELL. Reference C code is available (not freeware,
though: non-commercial use only!)

www.iro.umontreal.ca/~panneton/WELLRNG.html
http://en.wikipedia.org/wiki/Well_Equidistributed_Long-period_Linear

I've implemented this myself also, using a "clean-room" approach (not reusing any
of the above source code), and put it under a BSD license:

http://www.kylheku.com/cgit/txr/tree/rand.c

This is not packaged for general reuse, however. The heart of it is the
struct rand_state declaration, the rand32 function, and some useful
cases in the make_random_state function for seeding in various ways.
(The reference implementation has only an API for seeding using a vector
of ints, which must have the same size as the state vector.)
 
K

Keith Thompson

Udyant Wig said:
| I think I've seen PRNGs where the low-order bit actually alternates 0,
| 1, 0, 1, 0, 1, ..., but the sample implementation in the standard
| doesn't do that.

There is a discussion of this very issue (or something extremely close
to it) in one of the assignments of Steve Summit's (fine) online C
programming classes found here:

http://www.eskimo.com/~scs/cclass/cclass.html

It is Exercise 8 of the third assignment:

http://www.eskimo.com/~scs/cclass/asgn.beg/PS3.html

The explanation is here in the answers section:

http://www.eskimo.com/~scs/cclass/asgn.beg/PS3a.html

That merely asserts that:

In particular, it's not uncommon for the rand funciton to produce
alternately even and odd numbers, such that if you repeatedly
compute rand % 2, you'll get the decidedly non-random sequence 0,
1, 0, 1, 0, 1, 0, 1... .

I'm looking for concrete examples of real-world implementations that
behave this way.
| Here's a test program if anyone wants to try it. [...]
| for (int i = 0; i < 64; i ++) { [...]

I changed it thusly to calm gcc down:
[...]
int i;
for (i = 0; i < 64; i ++) {
[...]

Compiling with "-std=c99" also works.

It's a pity that gcc rejects this particular C99 feature by default.
Even if the gcc maintainers don't feel they're ready to make C99 the
default standard, they could easily treat this as a GNU extension on top
of C90.
 
M

Malcolm McLean

I don't know, which is why I deliberately prefixed my remarks with "in
poor quality implementations". The average quality of implementation of
rand() may be improving, but I believe that what I've written is still
true of poor quality implementations. However, that belief is entirely
second hand - I can't claim to have performed relevant studies myself.
You have to be careful with Ising models. Obviously it depends what you're
using the model for. They're statistical models of random but coherent
systems, with emergent properties. The so called critical temperature
might well be affected by the quality of the RNG.
 
T

Tim Rentsch

Kaz Kylheku said:
Yes, it is required in C90.

C90 requires a pure binary representation for unsigned integers,
and one of three choices for signed integers: two's complement,
one's complement or sign-magnitude for signed ones. All these
representations have an all bits zero.

Apparently you misunderstood or are misremembering. C90 does not
mention one's complement or sign-magnitude, and mentions two's
complement only in footnotes and examples, not in normative text.
The only types[*] required to have all bits of the representation
participate in the value are the character types, and all other
integral types may have what are now called padding bits, in both
C90 and C99. There are no requirements placed on what sets of
values of any padding bits correspond to numeric values in C90 or
the original version of C99; in C90 they are not mentioned, in
C99 explicitly unspecified. The wording used in C90 can be read
different ways, but this aspect was addressed and resolved quite
clearly in the official response of Defect Report 069. So the
requirement that an object representation of all zero bits must
be a valid representation of an integer-type zero was not
expressed in the Standard until N1256.

[*] In C99 there also are some optional integer types where all
bits of the representation must participate in the value. Also
I think IEEE-compliant floating point values have the property
that all bits of the representation participate in the value,
but again this is optional, not required of all implementations.
 
I

Ike Naar

Here's a test program if anyone wants to try it. If it prints either
"01010101..." or "10101010...", please let us know what implementation
you're using.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
for (int i = 0; i < 64; i ++) {
printf("%d", rand() % 2);
}
putchar('\n');
}

On one of my systems, the output is

1011110011010110000010110001111000111010111101001010100100011101

Here it prints

0101010101010101010101010101010101010101010101010101010101010101

The implementation is gcc version 4.5.3 (NetBSD nb2 20110806).
 
I

Ike Naar

It doesn't - that was a typo. It should have been

(rand()&0x4000 == 0x4000) ? 2 : 1

This produces 1 or 2, depending on the output of rand().

The original expression was

((byte) (rand () % 2)) << 1

which produces 0 or 2, depending on the output of rand().
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,049
Latest member
Allen00Reed

Latest Threads

Top