Request for source code review of simple Ising model

glen herrmannsfeldt

(snip, someone wrote)
| | If you don't mind making your code a bit non-portable (and in fact
| | systems where 0.0 and NULL *aren't* represented as all-bits-zero are
| | rare), then calloc() or memset() can be reasonable. It will at least
| | make the initial contents of the allocated memory consistent, even if
| | it's not necessarily correct.
|
| As I did not use floats or doubles, I suppose the program is safe.

Is there a processor built in the last 50 years that doesn't use
all bits zero for floating point zero?

Using a biased exponent, an exponent of zero bits is the smallest
exponent, which a zero should have.

According to Blaauw and Brooks, "Computer Architecture: Concepts and
Evolution", there are processors with sign magnitude, ones complement, or
twos complement exponents. Some such machines are the IBM 1620,
Ferranti Atlas, CDC 6600, Burroughs B5500, and the CDC STAR-100.

Even so, they could special case zero, which I believe some of
those do. I believe you will only find them running in a museum.

-- glen
 
BartC

glen herrmannsfeldt said:
| (snip, someone wrote)
|
| Is there a processor built in the last 50 years that doesn't use
| all bits zero for floating point zero?
|
| Using a biased exponent, an exponent of zero bits is the smallest
| exponent, which a zero should have.
|
| According to Blaauw and Brooks, "Computer Architecture: Concepts and
| Evolution", there are processors with sign magnitude, ones complement, or
| twos complement exponents. Some such machines are the IBM 1620,
| Ferranti Atlas, CDC 6600, Burroughs B5500, and the CDC STAR-100.
|
| Even so, they could special case zero, which I believe some of
| those do. I believe you will only find them running in a museum.

That's easy enough to test for. Just put a range of checks at the start of
an application, which determine whether in fact 0.0 floats or doubles, and
null pointers, are all-bits-zero, if this is the assumption made in the rest
of the program.

Then it can simply abort on any rogue machine (and maybe you can then decide
to deal with that if you think it's worthwhile).

Otherwise no point crippling the software on every other machine just in
case.
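
A minimal sketch of such a startup check, assuming C99. The names bytes_all_zero and zero_is_all_bits_zero are hypothetical and not part of the program under review. It only covers float, double and void *; other floating-point or pointer types could in principle differ. It inspects the stored bytes of genuine zero values through unsigned char, so it never reads zero-filled memory as a floating-point or pointer value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the n bytes at p are all zero. */
static bool bytes_all_zero(const void *p, size_t n)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++) {
        if (b[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical startup check: true if 0.0f, 0.0 and a null void pointer
 * are all stored as all-bits-zero on this implementation. */
bool zero_is_all_bits_zero(void)
{
    float f = 0.0f;
    double d = 0.0;
    void *p = NULL;

    return bytes_all_zero(&f, sizeof f)
        && bytes_all_zero(&d, sizeof d)
        && bytes_all_zero(&p, sizeof p);
}
```

A program relying on the assumption could call zero_is_all_bits_zero() once at the top of main() and abort with a message if it returns false.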
 
Malcolm McLean

| Is there a processor built in the last 50 years that doesn't use
| all bits zero for floating point zero?

People still produce processors without floating point units, not normally for general-purpose
computers, but for small embedded applications.
C floating point arithmetic is then implemented in a quick and dirty way, often by having one
byte of sign/exponent and two of mantissa. Probably most still use biased exponents, but a few
might use a signed exponent with -128 coded to zero. I don't actually know of any examples.
 
Les Cargill

Malcolm said:
| People still produce processors without floating point units, not normally for general-purpose
| computers, but for small embedded applications.
| C floating point arithmetic is then implemented in a quick and dirty way, often by having one
| byte of sign/exponent and two of mantissa. Probably most still use biased exponents, but a few
| might use a signed exponent with -128 coded to zero. I don't actually know of any examples.

I am always willing to be proven wrong, but I don't think strong
portability will be of that much value in those cases.
 
Keith Thompson

Udyant Wig said:
| | If you don't mind making your code a bit non-portable (and in fact
| | systems where 0.0 and NULL *aren't* represented as all-bits-zero are
| | rare), then calloc() or memset() can be reasonable. It will at least
| | make the initial contents of the allocated memory consistent, even if
| | it's not necessarily correct.
|
| As I did not use floats or doubles, I suppose the program is safe.

It should be (at least as far as that's concerned). The standard
guarantees that all-bits-zero is a representation of 0 for any
integer type.

(Interestingly, this was not guaranteed by the C90 or C99 standard.
It was added to one of the C99 Technical Corrigenda, presumably
because *all* implementations already work that way.)
 
Kaz Kylheku

| It should be (at least as far as that's concerned). The standard
| guarantees that all-bits-zero is a representation of 0 for any
| integer type.
|
| (Interestingly, this was not guaranteed by the C90 or C99 standard.
| It was added to one of the C99 Technical Corrigenda, presumably
| because *all* implementations already work that way.)

Yes, it is required in C90.

C90 requires a pure binary representation for unsigned integers, and one of
three choices for signed integers: two's complement, one's complement or
sign-magnitude. All of these have an all-bits-zero representation of zero.
 
Keith Thompson

BartC said:
| | Is there a processor built in the last 50 years that doesn't use
| | all bits zero for floating point zero?
| [...]
| That's easy enough to test for. Just put a range of checks at the start of
| an application, which determine whether in fact 0.0 floats or doubles, and
| null pointers, are all-bits-zero, if this is the assumption made in the rest
| of the program.
|
| Then it can simply abort on any rogue machine (and maybe you can then decide
| to deal with that if you think it's worthwhile).
In principle, that could fail if all-bits-zero is a trap
representation for floating-point or pointer types. For that matter,
it's conceivable that all-bits-zero could be a representation of
0.0 for some floating-point types but not for others; likewise for
pointer types.

Another problem is that if you write such checks, you have code in your
application that will never be executed (unless you somehow manage to
run it on an exotic system with non-zero representations for 0.0 and/or
NULL). If you get the checking code wrong, you might never know.

It might be better just to document the assumption in a comment.

| Otherwise no point crippling the software on every other machine just in
| case.

Crippling? I haven't found it particularly difficult to write code
that doesn't *care* whether all-bits-zero is a valid representation
for 0.0 or NULL. Sure, it's sometimes convenient to be able to use
memset() to clear an array or structure that might contain data
of arbitrary types, but a { 0 } initializer will do the same thing
more portably.
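
As a small illustration of the { 0 } initializer, here is a sketch with a hypothetical struct (struct sample and make_zero_sample are made-up names, not from the program under review). The initializer sets each member to the zero value of its own type, so the double becomes 0.0 and the pointer becomes a null pointer regardless of how the implementation represents them.

```c
#include <stddef.h>

/* Hypothetical record mixing integer, floating-point and pointer members. */
struct sample {
    int count;
    double weight;
    char *name;
};

/* Returns a sample whose members are 0, 0.0 and a null pointer,
 * whatever bit patterns the implementation uses for those values. */
struct sample make_zero_sample(void)
{
    struct sample s = { 0 };
    return s;
}
```

The same initializer works for arrays, for example `double row[16] = { 0 };`.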

I don't advocate going (very far) out of your way to allow for the
possibility that your code might run on some exotic, and perhaps
nonexistent, architecture. What I do advocate is being aware of
the issues, and understanding the distinction between the guarantees
made by the C standard and the somewhat larger set of things you can
(reasonably) safely assume.
 
Keith Thompson

Udyant Wig said:
| | There's no need because you go and set every cell to some initial
| | value. Calling calloc might have zero extra cost, but it's
| | unnecessary, and calling memset is just a waste when you subsequently
| | set the cells yourself.
|
| I have erred on the side of caution in the latest revision and
| explicitly initialized the lattice memory with memset(). At the very
| least, it made Valgrind report no errors.
| [...]

Here's a question (I don't know the answer in your case because I
haven't studied your code; perhaps I should).

If you allocate a chunk of memory, it's clear that accessing that
memory before you've stored anything in it is a bug.

If you allocate a chunk of memory and then use memset() to set
it to all zeros (let's assume all-bits-zero represents 0 for all
relevant types), is accessing that memory before you've stored some
meaningful value still a bug?

If a zero value is meaningful, then using memset is probably a good
idea. (For example, zeroing an array of char that's intended to store
a string will give you an empty string.) But there's a risk that
arbitrarily initializing a chunk of memory could just mask errors.
Valgrind, for example, doesn't know what all-bits-zero *means*,
whether it's a meaningful value or just a marker for something you
haven't yet initialized.

If you remove the memset() and Valgrind starts complaining again, it
might be pointing to something that you need to fix. A bug with
consistent behavior is no better than a bug with random behavior.

Taking a quick look at some of your source code, I see the following
in utilities.c:

    bool is_positive_integer (char *string)
    {
        char *sp;

        for (sp = string; *sp != '\0'; sp++) {
            if (!isdigit (*sp)) {
                return false;
            }
        }

        return (is_all_zeroes (string) ? false : true);
    }

isdigit() takes an int argument, not a char, and it requires the
argument to be either within the range of *unsigned* char or EOF;
otherwise its behavior is undefined. If plain char is signed, you have
undefined behavior if *sp < 0. You need to write

isdigit((unsigned char)*sp)

(Yes, it's annoying and counterintuitive, but we're stuck with it.)
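
A sketch of the loop with the cast applied. The function name all_digits is hypothetical, and rejecting the empty string is an added assumption that the code under review does not make.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical variant of the loop above: true if s is non-empty and
 * consists only of decimal digits. */
bool all_digits(const char *s)
{
    if (*s == '\0') {
        return false;   /* assumption: an empty string is not a number */
    }
    for (; *s != '\0'; s++) {
        /* Convert to unsigned char first so isdigit() never receives
         * a negative value other than EOF. */
        if (!isdigit((unsigned char)*s)) {
            return false;
        }
    }
    return true;
}
```

With the cast in place, a byte such as 0xE9 in the input is simply reported as a non-digit instead of invoking undefined behavior.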

This:

return (is_all_zeroes (string) ? false : true);

is more clearly written as:

return !is_all_zeroes (string);
 
James Kuyper

| | I have erred on the side of caution in the latest revision and
| | explicitly initialized the lattice memory with memset(). At the very
| | least, it made Valgrind report no errors.
| [...]
|
| Here's a question (I don't know the answer in your case because I
| haven't studied your code; perhaps I should).
|
| If you allocate a chunk of memory, it's clear that accessing that
| memory before you've stored anything in it is a bug.
|
| If you allocate a chunk of memory and then use memset() to set
| it to all zeros (let's assume all-bits-zero represents 0 for all
| relevant types), is accessing that memory before you've stored some
| meaningful value still a bug?
|
| If a zero value is meaningful, then using memset is probably a good
| idea. (For example, zeroing an array of char that's intended to store
| a string will give you an empty string.) But there's a risk that
| arbitrarily initializing a chunk of memory could just mask errors.
| Valgrind, for example, doesn't know what all-bits-zero *means*,
| whether it's a meaningful value or just a marker for something you
| haven't yet initialized.

His code initialized the memory as follows:
/* Clear second bit */
*cell &= ~0x02;
/* Set second bit randomly */
*cell |= ((byte) (rand () % 2)) << 1;

Since *cell has the type "unsigned char", it cannot have a trap
representation. Therefore, in the original version of his code, if the
use of uninitialized memory had been intentional (perhaps as some
bizarre substitute for proper randomization?), such code would make
sense. However, since he's corrected the code to memset() the entire
lattice (presumably to 0), this code is overly complicated. The second
bit doesn't need to be cleared, because it's already guaranteed to be
clear. The "|" in the "|=" is unnecessary, because the original value
was already 0. Therefore, those two lines can be simplified to:

*cell = ((byte) (rand () % 2)) << 1;

And that change, in turn, renders the memset() unnecessary, since the
result no longer depends, in any way, upon the pre-existing value of *cell.
 
James Kuyper

On 04/15/2014 03:42 PM, Udyant Wig wrote:
| ....
| *cell |= ((byte) (rand () % 2)) << 1;

The C standard doesn't impose any significant restrictions on the
quality of implementation of rand(), and on many implementations it's
not very good. It might be good enough, if your needs are simple enough.
However, if you're going to rely upon rand() despite the fact that it
might not be very good, you should be aware of the fact that, in poor
quality implementations, lower-order bits are likely to be significantly
less random than higher-order ones. RAND_MAX is required to be >= 32767,
so 0x4000 is the highest bit that is guaranteed to be < RAND_MAX. If you're
only going to extract one bit from rand(), I'd recommend extracting that
one: ((rand() & 0x4000) ? 2 : 0).
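
A minimal sketch of taking the spin bit from bit 14 of rand() instead of bit 0. The function name random_spin_bit is hypothetical; the byte typedef and the 0-or-2 encoding (spin stored in the second bit) follow the code under review. On implementations where RAND_MAX is larger than 32767, bit 14 is a middle bit rather than the top one, which should still be no worse than the lowest bit.

```c
#include <stdlib.h>

typedef unsigned char byte;

/* Returns 0 or 2, taken from bit 14 (0x4000) of rand(). That is the
 * highest bit guaranteed to be available when RAND_MAX is the minimum
 * permitted value of 32767. */
byte random_spin_bit(void)
{
    return (rand() & 0x4000) ? 2 : 0;
}
```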

If, instead, you have access to a high-quality random number generator,
it's a waste to extract only one bit from each number: you should use
each of the bits to initialize a different cell of your lattice.

And that brings to mind another issue. Once you're sure that your
program is working, and you've reached the point where it's reasonable
to worry about performance, you might consider an alternative data
storage strategy. You're using one byte to store one cell representing a
spin 1/2 object: it therefore has only two quantized orientations, up
and down, so you're storing only one bit of information for that cell.
It might be better to use, for example, a 32-bit integer to store the
values of 32 consecutive cells. Not only will this use up a lot less
memory, but with (quite) a bit of ingenuity, you can use bit-wise
operations to handle all 32 cells at the same time, for a naive speed-up
by a factor of 32. In practice, the speed up will actually be less than
that, but should still be substantial.

On the other hand, if you implement this idea your code will be a lot
more complicated, and much harder to understand. You'll have to decide
whether that's an acceptable cost for the increased processing speed and
decreased memory requirements.
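
A rough sketch of the packed storage only, assuming C99 and <stdint.h>. All names here (packed_lattice, packed_init and so on) are hypothetical. It stores one cell per bit and shows per-cell access and a population count; it does not attempt the harder part, updating all 32 cells of a word with a single bit-wise operation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical packed lattice: cell i lives in bit (i % 32) of words[i / 32]. */
typedef struct {
    uint32_t *words;
    size_t cells;
} packed_lattice;

/* Allocates n cells, all zero. Returns 0 on success, -1 on failure. */
int packed_init(packed_lattice *pl, size_t n)
{
    size_t nwords = (n + 31) / 32;
    pl->words = calloc(nwords ? nwords : 1, sizeof *pl->words);
    pl->cells = n;
    return pl->words ? 0 : -1;
}

/* Returns the value (0 or 1) of cell i. */
int packed_get(const packed_lattice *pl, size_t i)
{
    return (int)((pl->words[i / 32] >> (i % 32)) & 1u);
}

/* Inverts cell i. */
void packed_flip(packed_lattice *pl, size_t i)
{
    pl->words[i / 32] ^= (uint32_t)1 << (i % 32);
}

/* Counts cells equal to 1; relies on the unused high bits of the last
 * word staying zero, which holds as long as only valid indices are flipped. */
size_t packed_count_ones(const packed_lattice *pl)
{
    size_t nwords = (pl->cells + 31) / 32;
    size_t total = 0;
    for (size_t w = 0; w < nwords; w++) {
        uint32_t x = pl->words[w];
        while (x != 0) {
            x &= x - 1;   /* clear the lowest set bit */
            total++;
        }
    }
    return total;
}

void packed_free(packed_lattice *pl)
{
    free(pl->words);
    pl->words = NULL;
    pl->cells = 0;
}
```

Using calloc() here is safe in the sense discussed earlier in the thread, because the storage holds only an unsigned integer type.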
 
glen herrmannsfeldt

(snip)
| It should be (at least as far as that's concerned). The standard
| guarantees that all-bits-zero is a representation of 0 for any
| integer type.
|
| (Interestingly, this was not guaranteed by the C90 or C99 standard.
| It was added to one of the C99 Technical Corrigenda, presumably
| because *all* implementations already work that way.)

Can you give an example of what C90 allowed?

My interpretation of C90 was that it allowed for sign magnitude,
ones complement, and twos complement. (Of the ones that actually
made any sense.)

As well as I knew it, K&R allowed for more than that.

-- glen
 
James Kuyper

(snip)

| Can you give an example of what C90 allowed?

I don't have a copy of C90, so I can't be sure. I do know that a
definition for "trap representation", and wording that made use of that
term, were added in C99. That addition was bitterly debated - it was
felt by many that the wording of C90 already clearly allowed for trap
representations, without explicitly needing to name them as such; I
think there was even a DR whose resolution said so, possibly implicitly.
Personally, I think it's convenient to have a name for them, and
explicit mention of their significance, even if it is in fact
technically redundant.

I believe that one possibility allowed by C90 was an integer type with
one or more padding bits. Such a padding bit could be either a mark
parity bit, or an odd parity bit - either of those options would
prohibit 0 from being represented by all-bits-zero.
 
Udyant Wig

I thought it best to provide the latest source code for perusal. It
incorporates most of the observations and commentary offered thus far.
This should provide an updated point of reference.


/* common.h begins */

#ifndef COMMON_H
#define COMMON_H

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <math.h>
#include <stdbool.h>
#include <time.h>
#include <ctype.h>

#endif

/* common.h ends */


/* utilities.h begins */

#ifndef UTILITIES_H
#define UTILITIES_H

#include "common.h"

bool is_all_zeroes (char *string);
bool is_positive_integer (char *string);
int how_many (const char *s, int c);
char *copy_substring (char *substring, const char *string, size_t start, size_t end);

#endif

/* utilities.h ends */


/* utilities.c begins */

#include "utilities.h"

bool is_all_zeroes (char *string)
{
    return string [strspn (string, "0")] == '\0';
}

bool is_positive_integer (char *string)
{
    char *sp;

    for (sp = string; *sp != '\0'; sp++) {
        if (!isdigit (*sp)) {
            return false;
        }
    }

    return !is_all_zeroes (string);
}

/* From /C: A Reference Manual/, 5th ed., by Harbison and Steele
 * page 352
 */
int how_many (const char *s, int c)
{
    int n = 0;
    if (c == 0) return 0;
    while (s) {
        s = strchr (s, c);
        if (s) n++, s++;
    }
    return n;
}

char *copy_substring (char *substring, const char *string, size_t start, size_t end)
{
    size_t length = end - start + 1;

    errno = 0;
    substring = malloc (1 + length);
    if (substring == NULL) {
        fprintf (stderr, "copy_substring: %s\n", strerror (errno));
        exit (1);
    }

    memset (substring, 0, 1 + length);
    strncpy (substring, string + start, length - 1);
    substring [length] = '\0';

    return substring;
}

/* utilities.c ends */


/* bitsing.h begins */

#ifndef BITSING_H
#define BITSING_H

typedef unsigned char byte;

/* Lattice */
byte *allocate_lattice (size_t size);
void initialize_lattice (byte *lattice);
void print_lattice (byte *lattice);

/* Core */
int check_and_add_neighbor (byte *lattice, byte cell, int col, int row);
void next_state (byte *lattice);
void iterate (byte *lattice);

/* Miscellaneous */
int count_ones (byte *lattice);
void print_iteration (byte *lattice, const char *format_string, int count);

/* Error checking */
void minority_error (void);
void beta_error (void);
void dimension_error (void);

#endif

/* bitsing.h ends */


/* bitsing.c begins */

#include "common.h"
#include "bitsing.h"
#include "utilities.h"

#define ENERGY_MASK 0x1c

static int dimension = 3;
static double beta = 0.5;
static int minority_size = 1;


/* Lattice */
byte *allocate_lattice (size_t size)
{
    byte *lattice;

    errno = 0;
    lattice = malloc (size);
    if (lattice == NULL) {
        fprintf (stderr, "allocate_lattice: %s\n", strerror (errno));
        exit (1);
    }

    return lattice;
}

void initialize_lattice (byte *lattice)
{
    byte *bp = lattice, *end = bp + dimension * dimension;
    while (bp < end) {
        *bp = ((byte) (rand () % 2)) << 1;
        bp++;
    }
}

void print_lattice (byte *lattice)
{
    int row, col;

    for (col = 0; col < dimension; col++) {
        for (row = 0; row < dimension; row++) {
            printf ("%d", lattice [col * dimension + row] & 0x01);
        }
        putchar ('\n');
    }
}


/* Core */
int check_and_add_neighbor (byte *lattice, byte cell, int col, int row)
{
    byte neighbor;
    if (0 <= col && 0 <= row && col < dimension && row < dimension) {
        neighbor = lattice [col * dimension + row];
        return ((cell & 0x02) ^ (neighbor & 0x02));
    }
    return 0;
}

void next_state (byte *lattice)
{
    byte *bp = lattice, *end = bp + dimension * dimension;
    while (bp < end) {
        *bp |= (0x01 & ((*bp & 0x02) >> 1));
        bp++;
    }
}

void iterate (byte *lattice)
{
    int rrow, rcol;
    byte rcell, ccell;
    int old_energy;
    int new_energy;
    int delta;

    rrow = rand () % dimension;
    rcol = rand () % dimension;
    rcell = lattice [rcol * dimension + rrow];
    ccell = (rcell ^ 0x02) & 0x02; /* complemented cell */
    old_energy = (rcell & ENERGY_MASK) >> 2;

    new_energy = check_and_add_neighbor (lattice, ccell, rrow - 1, rcol);
    new_energy = check_and_add_neighbor (lattice, ccell, rrow + 1, rcol);
    new_energy = check_and_add_neighbor (lattice, ccell, rrow, rcol - 1);
    new_energy = check_and_add_neighbor (lattice, ccell, rrow, rcol + 1);

    delta = new_energy - old_energy;
    if (delta < 0) {
        lattice [rcol * dimension + rrow] = ccell;
        /* clear energy */
        ccell &= ~ENERGY_MASK;
        /* set energy */
        ccell |= (ENERGY_MASK & ((byte) new_energy << 2));
    }
    else {
        double ising_window = exp (-(beta * delta));
        double random_value = (double) (rand () % 100) / 100.0;

        if (random_value < ising_window) {
            lattice [rcol * dimension + rrow] = ccell;
            /* clear energy */
            ccell &= ~ENERGY_MASK;
            /* set energy */
            ccell |= (ENERGY_MASK & ((byte) new_energy << 2));
        }
    }

}


/* Miscellaneous */
int count_ones (byte *lattice)
{
    int count = 0;
    byte *bp = lattice, *end = bp + dimension * dimension;
    while (bp < end) {
        if ((*bp & 0x01) == (byte) 1) {
            count++;
        }
        bp++;
    }

    return count;
}

static const char initial_format [] = \
    "Initial configuration %6d\n----------------------------\n";
static const char iteration_format [] = \
    "Iteration %6d\n----------------\n";
static const char final_format [] = \
    "Final configuration %6d\n--------------------------\n";

void print_iteration (byte *lattice, const char *format_string, int count)
{
    putchar ('\n');
    printf (format_string, count);
    print_lattice (lattice);
}


/* Error checking */
void minority_error (void)
{
    fputs ("minority_size <argv [3]> is not a nonnegative integer\n", stderr);
    exit (2);
}

void beta_error (void)
{
    fputs ("beta <argv [2]> is not positive floating-point number.\n", stderr);
    exit (2);
}

void dimension_error (void)
{
    fputs ("dimension <argv [1]> is not a nonnegative integer.\n", stderr);
    exit (2);
}


/* Return codes:
 * 0 -- success
 * 1 -- memory allocation failure
 * 2 -- invalid command line argument
 */
int main (int argc, char *argv [])
{
    if (argc > 3) {
        if (is_positive_integer (argv [3]) || is_all_zeroes (argv [3])) {
            minority_size = atoi (argv [3]);
        }
        else {
            minority_error ();
        }

        if (minority_size < 0) {
            minority_size = 0;
        }
        else if (minority_size > dimension * dimension) {
            minority_size = dimension * dimension;
        }
        else if (minority_size > (dimension * dimension) / 2) {
            minority_size = (dimension * dimension) - minority_size;
        }
    }
    if (argc > 2) {
        char *beta_string = argv [2];
        size_t length = strlen (beta_string);

        if (how_many (beta_string, '.') == 1) {
            size_t position_decimal = strcspn (beta_string, ".");
            char *integral_part = NULL;
            char *fractional_part = NULL;

            if (position_decimal == 0) {
                beta_error ();
            }
            if (position_decimal == strlen (beta_string) - 1) {
                beta_error ();
            }

            integral_part = copy_substring (integral_part, \
                                            beta_string, \
                                            0, \
                                            position_decimal - 1);
            if (is_positive_integer (integral_part) || \
                is_all_zeroes (integral_part)) {
                fractional_part = copy_substring (fractional_part, \
                                                  beta_string, \
                                                  position_decimal + 1, \
                                                  length - 1);
                if (is_positive_integer (fractional_part) || \
                    is_all_zeroes (fractional_part)) {
                    beta = atof (argv [2]);
                }
                else {
                    beta_error ();
                }

                free (fractional_part);
            }
            else {
                beta_error ();
            }

            free (integral_part);
        }
        else {
            beta_error ();
        }
    }
    if (argc > 1) {
        if (is_positive_integer (argv [1]))
            dimension = atoi (argv [1]);
        else {
            dimension_error ();
        }
    }

    srand (time (0));

    byte *lattice;
    lattice = allocate_lattice (dimension * dimension * sizeof *lattice);
    initialize_lattice (lattice);
    next_state (lattice);
    print_iteration (lattice, initial_format, 0);

    long count = 1;
    while (true) {
        iterate (lattice);

        int ones_count = count_ones (lattice);
        if ((ones_count == minority_size) || \
            ((dimension * dimension - ones_count) == minority_size)) {
            break;
        }

        print_iteration (lattice, iteration_format, count++);
        next_state (lattice);
    }

    print_iteration (lattice, final_format, count);
    free (lattice);
    exit (0);
}

/* bitsing.c ends */
 
BartC

| I believe that one possibility allowed by C90 was an integer type with
| one or more padding bits. Such a padding bit could be either a mark
| parity bit, or an odd parity bit - either of those options would
| prohibit 0 from being represented by all-bits-zero.

These are parity bits accessible from software? The ones I've come across
were dealt with in hardware and wouldn't impact on using all-bits-zero for
any values. They would in fact be completely transparent.
 
James Kuyper

| These are parity bits accessible from software? The ones I've come across
| were dealt with in hardware and wouldn't impact on using all-bits-zero for
| any values. They would in fact be completely transparent.

Completely transparent bits don't count as padding bits for the purposes
of the C standard. Padding bits can always be made visible by accessing
the memory as if it were an array of unsigned char. The fact that those
bits are padding bits means that they would be invisible (unless they
give the object a trap representation) when viewing that same memory
using an lvalue of the integer type for which they constitute padding
bits. It would be up to the implementation to make sure a) that the
padding bits are actually invisible and b) that they get set to valid
values when code with defined behavior is used to write the values to
memory.
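
To illustrate what "accessing the memory as if it were an array of unsigned char" looks like, here is a small sketch. The helper name int_bytes_hex is hypothetical, and the two-hex-digits-per-byte output assumes 8-bit bytes. On an implementation with padding bits in int, they would show up in this dump even though they never affect the int value itself.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the object representation of an int as
 * hex digits into out, which must hold 2 * sizeof(int) + 1 bytes.
 * Returns out. Assumes 8-bit bytes. */
char *int_bytes_hex(int value, char *out)
{
    unsigned char bytes[sizeof value];

    /* Copying through unsigned char exposes every bit of the object,
     * including any padding bits. */
    memcpy(bytes, &value, sizeof value);
    for (size_t i = 0; i < sizeof value; i++) {
        sprintf(out + 2 * i, "%02x", (unsigned)(bytes[i] & 0xffu));
    }
    out[2 * sizeof value] = '\0';
    return out;
}
```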

Please do NOT ask me to provide real-world examples. I'm just trying to
give an example of something that was allowed by C90, but not by C99.
I'm fairly certain that the committee would not have changed the
standard to disallow such an implementation, if they had been aware of
any significant number of real-world implementations where something
like that was actually being done. If nobody was doing anything like
that, there's probably no good motivation for doing anything like that -
so for the same reason, please don't ask me to provide such a motivation.
 
Keith Thompson

glen herrmannsfeldt said:
| Can you give an example of what C90 allowed?
|
| My interpretation of C90 was that it allowed for sign magnitude,
| ones complement, and twos complement. (Of the ones that actually
| made any sense.)
|
| As well as I knew it, K&R allowed for more than that.

C90 did not say much about the representation of signed integer types.
It says:

    The representations of integral types shall define values by use of
    a pure binary numeration system.

There's no mention of 2's-complement or any other signed
representation scheme. But it did require any value representable
both as a signed int and as an unsigned int to have the same
representation in both (likewise for other signed/unsigned pairs),
which eliminates some possible schemes.

C99 limited the scope of that statement:

    Values stored in unsigned bit-fields and objects of type unsigned
    char shall be represented using a pure binary notation.

It introduced the distinction between value bits and padding bits (which
C90 didn't mention) and required signed integer types to be represented
using either sign and magnitude, two's complement, or one's complement.

I *think* that C90 implicitly permitted padding bits, even though it
didn't mention them. One could argue that the phrase "pure binary
numeration system" is inconsistent with padding bits, but C99 does use
the phrase "pure binary representation" in reference to just the value
bits. And it would be odd for C99 to loosen the requirements for integer
representation relative to C90 while simultaneously tightening the
requirements for signed representation by listing the three permitted
forms.

So assuming that C90 permitted padding bits, you could imagine an
implementation where int is 32 bits, but one of the bits is a padding
bit that must be set to 1. I'm fairly sure such an implementation is
permitted (perhaps unintentionally) by the original C99 standard. It's
specifically forbidden by the second Technical Corrigendum, and therefore
by N1256 and C11.

The change was introduced in response to Defect Report #263 and
published in TC 2.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_263.htm
 
Keith Thompson

Keith Thompson said:
| I *think* that C90 implicitly permitted padding bits, even though it
| didn't mention them. One could argue that the phrase "pure binary
| numeration system" is inconsistent with padding bits, but C99 does use
| the phrase "pure binary representation" in reference to just the value
| bits. And it would be odd for C99 to loosen the requirements for integer
| representation relative to C90 while simultaneously tightening the
| requirements for signed representation by listing the three permitted
| forms.
| [...]

And I realized that a standard could permit padding bits but not permit
trap representations. I believe that C90 implicitly permitted both
padding bits and trap representations (basically because it didn't
forbid them).
 
glen herrmannsfeldt

| These are parity bits accessible from software? The ones I've come across
| were dealt with in hardware and wouldn't impact on using all-bits-zero for
| any values. They would in fact be completely transparent.

There might have been processors where parity bits weren't
completely transparent. There are stories of WATFOR on the 7090
using parity to detect undefined variables. It would set the wrong
parity on all memory at the beginning, and trap access with parity
errors.

With ECC, it is also not completely transparent. If you write smaller
than the ECC word size, where 64 bits isn't unusual, the memory
system has to fetch (check and correct) the whole word, change
the byte or bytes needed, then write back the new value with new
ECC bits. At power up, the ECC bits are usually not right, so
someone (usually the OS) has to write with the appropriate
instruction some value (usually zero) through all of memory.

(ECC on 64 bits is convenient, as it takes 8 ECC bits, the same
as needed for byte parity.)

-- glen
 
glen herrmannsfeldt

(snip, I wrote)
| C90 did not say much about the representation of signed integer types.
| It says:
|
|     The representations of integral types shall define values by use of
|     a pure binary numeration system.
|
| There's no mention of 2's-complement or any other signed
| representation scheme. But it did require any value representable
| both as a signed int and as an unsigned int to have the same
| representation in both (likewise for other signed/unsigned pairs),
| which eliminates some possible schemes.

Not counting padding bits, and using the same number of bits for
signed and unsigned, three choices that work and make sense are
sign magnitude, ones, and twos, complement.

You could imagine a system where the value bits had the same position
(except the sign bit) as unsigned when the sign bit was zero, and
completely different positions when it was one, but that doesn't
make much sense. It also disallows a biased representation
(twos complement with the sign bit inverted).

Seems to me that there could be hardware without an unsigned
representation (or the ability to do arithmetic on one), such
that C unsigned types had half the range of C signed types.
(That is, INT_MAX equals UINT_MAX.) (For twos complement, you
can fake unsigned with some extra work. Much harder with ones
complement.) For hardware with a biased representation, you
could call the sign bit a padding bit, which should be 1 for
unsigned types.

I believe, though, that the "all bits zero for zero" idea is so
deep in the minds of hardware designers that it isn't likely
to happen.

| C99 limited the scope of that statement:
|
|     Values stored in unsigned bit-fields and objects of type unsigned
|     char shall be represented using a pure binary notation.
|
| It introduced the distinction between value bits and padding bits (which
| C90 didn't mention) and required signed integer types to be represented
| using either sign and magnitude, two's complement, or one's complement.

I am still waiting for a C compiler for the 7090 to try out
sign magnitude. As far as I know, the last (well, maybe the 7094)
of the sign magnitude machines.

| I *think* that C90 implicitly permitted padding bits, even though it
| didn't mention them. One could argue that the phrase "pure binary
| numeration system" is inconsistent with padding bits, but C99 does use
| the phrase "pure binary representation" in reference to just the value
| bits. And it would be odd for C99 to loosen the requirements for integer
| representation relative to C90 while simultaneously tightening the
| requirements for signed representation by listing the three permitted
| forms.
|
| So assuming that C90 permitted padding bits, you could imagine an
| implementation where int is 32 bits, but one of the bits is a padding
| bit that must be set to 1. I'm fairly sure such an implementation is
| permitted (perhaps unintentionally) by the original C99 standard. It's
| specifically forbidden by the second Technical Corrigendum, and therefore
| by N1256 and C11.
|
| The change was introduced in response to Defect Report #263 and
| published in TC 2.

-- glen
 
Udyant Wig

| Here's a question (I don't know the answer in your case because I
| haven't studied your code; perhaps I should).
|
| If you allocate a chunk of memory, it's clear that accessing that
| memory before you've stored anything in it is a bug.
|
| If you allocate a chunk of memory and then use memset() to set it to
| all zeros (let's assume all-bits-zero represents 0 for all relevant
| types), is accessing that memory before you've stored some meaningful
| value still a bug?

It could be considered a bug unless the zeroes in question had meaning.

| If a zero value is meaningful, then using memset is probably a good
| idea. (For example, zeroing an array of char that's intended to store
| a string will give you an empty string.) But there's a risk that
| arbitrarily initializing a chunk of memory could just mask errors.
| Valgrind, for example, doesn't know what all-bits-zero *means*,
| whether it's a meaningful value or just a marker for something you
| haven't yet initialized.
|
| If you remove the memset() and Valgrind starts complaining again, it
| might be pointing to something that you need to fix. A bug with
| consistent behavior is no better than a bug with random behavior.

Point taken.

James Kuyper pointed out the allocated memory could be set directly,
obviating the need to access it when it is uninitialized. This in turn
would obviate the need for malloc()+memset() or calloc().

I have made this change.

| Taking a quick look at some of your source code, I see the following
| in utilities.c:
|
|     bool is_positive_integer (char *string)
|     {
|         char *sp;
|
|         for (sp = string; *sp != '\0'; sp++) {
|             if (!isdigit (*sp)) {
|                 return false;
|             }
|         }
|
|         return (is_all_zeroes (string) ? false : true);
|     }
|
| isdigit() takes an int argument, not a char, and it requires the
| argument to be either within the range of *unsigned* char or EOF;
| otherwise its behavior is undefined. If plain char is signed, you have
| undefined behavior if *sp < 0. You need to write
|
| isdigit((unsigned char)*sp)
|
| (Yes, it's annoying and counterintuitive, but we're stuck with it.)

But when might this situation arise? That is, could there be a string
some of whose elements were characters < 0?
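
One case where this can arise, sketched under the assumption of 8-bit bytes: any byte with the high bit set, such as 0xE9 ('é' in ISO 8859-1) or any byte of a multi-byte UTF-8 sequence, which can easily reach a program through argv. On implementations where plain char is signed, such a byte reads back as a negative value. The function names below are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* True if plain char is a signed type on this implementation. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The value isdigit() would receive if c were passed without a cast:
 * negative when char is signed and the byte is 0x80 or above. */
int uncast_value(char c)
{
    return c;
}

/* The value isdigit() receives with the recommended cast:
 * always in the range 0..UCHAR_MAX. */
int cast_value(char c)
{
    return (unsigned char)c;
}
```

So a command line argument containing an accented letter is enough to hand isdigit() an out-of-range value unless the cast is present.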

| This:
|
| return (is_all_zeroes (string) ? false : true);
|
| is more clearly written as:
|
| return !is_all_zeroes (string);

Yes. That does express the intent more clearly.
 
