cast musings

mathog · Mar 19, 2013

{ Introductory material, skip down to the end bracket if you don't
care how I got onto this subject.

I have recently been taking some heat for using constructs like this:

float some_float;
/* code sets some_float */
int i = (int) round(some_float);

in C++ code. Some programmers claim that if it is not rewritten as

int i = static_cast<int>(round(some_float));

the sky will fall. So far none of the folks claiming this
have actually presented an example where using C style casts on these
sorts of simple data types actually does something unexpected. They
have a point where inheritance is involved, but there is none of that in
the code in question. I like () better here because it doesn't waste 11
characters on every cast, which becomes a factor if more than one cast
must fit on the same line.
}

(For everything below here assume that float and int are the same size.)

This got me thinking about C casts, where

int i = (int) round(some_float); /* 1a */

specifies a type conversion, whereas this

int i = *(int *)(&some_float); /* 2a */

specifies a reinterpretation of the bits in memory. I always thought
the second form was a pretty awkward way to accomplish this, and that
there was room for improvement there. It isn't that the meaning is not
clear, it is just that having to resort to using a pointer to the data
in order to get its bits reinterpreted (still) feels like a kludge.
There is also "const" and "volatile" to consider, which are fine in
declarations but have always seemed too lengthy when placed inside casts.

So bear with me and consider this more generalized form for a C cast
(obviously this is entirely hypothetical, there are no compilers that do
this):

([task] [access] type)

Where type must be specified and task and access are optional,
and:

Task specifies the action of the cast, one of:
    to    Short for "convert to". The target is to be
          converted to the type specified.
          Default when no [task] is present.
    as    Short for "use as". The target's bits in memory
          are to be interpreted as specified by this cast.
Access specifies read/write access to the result of the cast, one of:
    rw    read/write access.
          Default when no [access] is specified.
          (Is there currently a keyword to specify this?)
    ro    read only access (equivalent to "const")
    wo    write only access (equivalent to "volatile")
Type is the "output" type of the cast, like int or double, just
as in current C casts.

The general form allows these alternate casts:

int i = (int) round(some_float); /* 1a */
int i = (to int) round(some_float); /* 1b */
int i = (to rw int) round(some_float); /* 1c */

int i = *(int *)(&some_float); /* 2a */
int i = (as int) some_float; /* 2b */

const int i = *(const int *)(&some_float); /* 3a */
const int i = (as ro int)some_float; /* 3b */

int i = (int) some_float; /* 4a */
int i = (to int) some_float; /* 4b */

function((volatile float *)&some_float); /* 5a */
function((wo float *)&some_float); /* 5b */

Not a big win in clarity for 1b or 1c vs. 1a, since the extra
text is just specifying defaults. 1b might be a bit clearer
for a beginner, but after a few weeks those training wheels
would come off. I think 2b is clearer than 2a, and 3b clearer
than 3a. Incorrect code, with the wrong task employed, might
be easier to spot if the 4b and 2b forms were uniformly employed, but
otherwise not a clean win for 4b over 4a. 5b versus 5a feels like a wash.

Is something along these lines worth incorporating into the C language,
or no? Clearly we can get along without it - I am just wondering if
adding this would be beneficial.

Regards,

David Mathog

Shao Miller · Mar 19, 2013

(For everything below here assume that float and int are the same size.)

This got me thinking about C casts, where

int i = (int) round(some_float); /* 1a */

specifies a type conversion, whereas this

int i = *(int *)(&some_float); /* 2a */

specifies a reinterpretation of the bits in memory. I always thought
the second form was a pretty awkward way to accomplish this, and that
there was room for improvement there. It isn't that the meaning is not
clear, it is just that having to resort to using a pointer to the data
in order to get its bits reinterpreted (still) feels like a kludge.
There is also "const" and "volatile" to consider, which are fine in
declarations but have always seemed too lengthy when placed inside casts.

You don't often want to do this. The result of the cast might not
address properly-aligned memory, which would be undefined behaviour.
Getting past that, the indirection might break effective type rules,
which would be undefined behaviour. Sometimes, when you know what
you're doing, there isn't a risk of undefined behaviour, but better
constructs are available.
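
As a rough illustration of one such construct, here is a minimal sketch that separates the two operations under discussion: a value conversion (case 4a) and a bit reinterpretation (case 2a) done with memcpy() instead of a pointer cast. The function names convert_to_int and float_bits are made up for this example, and the sketch assumes, as the thread does, that float is the same size as a 32-bit integer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Require the thread's "same size" assumption at compile time. */
static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Value conversion: the fractional part is discarded (case 4a). */
int convert_to_int(float f)
{
    return (int)f;
}

/* Bit reinterpretation without *(int *)&f: copy the object
 * representation into an unsigned integer of the same size. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an implementation that uses IEEE 754 single precision (an assumption, not something C guarantees), float_bits(1.0f) yields 0x3F800000, while convert_to_int(2.75f) yields 2 everywhere.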

So bear with me and consider this more generalized form for a C cast
(obviously this is entirely hypothetical, there are no compilers that do
this):

([task] [access] type)

Where type must be specified and task and access are optional,
and:

Task specifies the action of the cast, one of:
to Short for "convert to". The target is to be
converted to the type specified.
Default when no [task] is present.
as Short for "use as". The target's bits in memory
are to be interpreted as specified by this cast.
Access specifies read/write access to the result of the cast, one of:
rw read/write access.
Default when no [access] is specified.
(Is there currently a keyword to specify this?)
ro read only access (equivalent to "const")
wo write only access (equivalent to "volatile")

"wo" would be a misnomer, since a 'volatile' can be read.

Type is the "output" type of the cast, like int or double, just
as in current C casts.

The general form allows these alternate casts:

int i = (int) round(some_float); /* 1a */
int i = (to int) round(some_float); /* 1b */
int i = (to rw int) round(some_float); /* 1c */

int i = *(int *)(&some_float); /* 2a */
int i = (as int) some_float; /* 2b */

const int i = *(const int *)(&some_float); /* 3a */
const int i = (as ro int)some_float; /* 3b */

int i = (int) some_float; /* 4a */
int i = (to int) some_float; /* 4b */

function((volatile float *)&some_float); /* 5a */
function((wo float *)&some_float); /* 5b */

Not a big win in clarity for 1b or 1c vs. 1a, since the extra
text is just specifying defaults. 1b might be a bit clearer
for a beginner, but after a few weeks those training wheels
would come off. I think 2b is clearer than 2a, and 3b clearer
than 3a. Incorrect code, with the wrong task employed, might
be easier to spot if the 4b and 2b forms were uniformly employed, but
otherwise not a clean win for 4b over 4a. 5b versus 5a feels like a wash.

Is something along these lines worth incorporating into the C language,
or no? Clearly we can get along without it - I am just wondering if
adding this would be beneficial.

Interesting stuff, but I don't understand what the "access" is supposed
to specify... We already have 'const' and 'volatile', so...? Also, the
result of a cast is not an lvalue, so it can't be modified, so...?

glen herrmannsfeldt · Mar 19, 2013

mathog said:
{ Introductory material, skip down to the end bracket if you don't
care how I got onto this subject.

(snip on C++, casts, and such)

(For everything below here assume that float and int are the same size.)

This got me thinking about C casts, where

int i = (int) round(some_float); /* 1a */

specifies a type conversion, whereas this

int i = *(int *)(&some_float); /* 2a */

specifies a reinterpretation of the bits in memory.

Well, to me it is that the cast specifies a conversion
for the pointer. While on most systems the bit representations
for pointers to different types are the same, on some machines
they are different. Specifically, on word addressed machines
where char is smaller than a machine word.

I always thought the second form was a pretty awkward way to
accomplish this, and that there was room for improvement there.

I suppose so. In Fortran, we used to do this with EQUIVALENCE,
but now there is TRANSFER.

It isn't that the meaning is not clear, it is just that
having to resort to using a pointer to the data in order
to get its bits reinterpreted (still) feels like a kludge.

I suppose so, but in most cases where you use it, it is
a kludge.

But OK, PL/I has the UNSPEC function and pseudo-variable so
you can write:

UNSPEC(F)=UNSPEC(I);

as you note, assuming that they are the same size. UNSPEC converts
to or from a bit string.

Java has floatToIntBits, doubleToLongBits, and their inverse functions.
Otherwise, the kludges, by definition, don't work in Java.

(snip related to const and volatile)

-- glen

James Kuyper · Mar 19, 2013

{ Introductory material, skip down to the end bracket if you don't
care how I got onto this subject.

I have recently been taking some heat for using constructs like this:

float some_float;
/* code sets some_float */
int i = (int) round(some_float);

in C++ code. Some programmers claim that if it is not rewritten as

int i = static_cast<int>(round(some_float));

the sky will fall. So far none of the folks claiming this
have actually presented an example where using C style casts on these
sorts of simple data types actually does something unexpected. They
have a point where inheritance is involved, but there is none of that in
the code in question. I like () better here because it doesn't waste 11
characters on every cast, which becomes a factor if more than one cast
must fit on the same line.
}

The fundamental problem is that all of the safe conversions occur
implicitly without requiring a cast. Any conversion for which a cast is
actually needed is dangerous. Therefore, they should be made easy to
find and easy to notice, so they can receive the careful attention that
they require.

That is part of the reason why C++'s named casts were made as long as
they are: to make it easier to search for them, and to discourage their
unnecessary use. It is unfortunately too late to do the same for the
C-style casts, which had to be allowed for backwards compatibility with
C (and, at this late date, for backwards compatibility with older
versions of C++ as well).

Each named cast can only do certain kinds of conversions, which is
useful, because if the type of the thing you are converting is
different from what you thought it was when you wrote the code, the
cast might have unexpected consequences. The C style cast can do almost
anything that can be done with a named cast, and a couple of obscure
additional things as well, and that's part of what makes it so
dangerous. Consistently using the named casts rather than a C-style cast
turns much of the otherwise-dangerous code into unexpected constraint
violations requiring a diagnostic, because that particular named cast
cannot be used to perform the specified conversion.

In C++, function overloading and templates make it a commonplace for the
type of an expression to depend in a complex fashion upon far-distant
code, which gives this feature greater value than it would have in C.
However, even in C, typedefs in third-party headers can make it hard to
be sure of the type of an expression.

mathog · Mar 19, 2013

Shao said:
You don't often want to do this.

Agreed, but sometimes you have to.

Access specifies read/write access to the result of the cast, one of:
rw read/write access.
Default when no [access] is specified.
(Is there currently a keyword to specify this?)
ro read only access (equivalent to "const")
wo write only access (equivalent to "volatile")

"wo" would be a misnomer, since a 'volatile' can be read.

Oops, right, this is not exactly "volatile". The case I was thinking of
for "wo" was a store to a write only register mapped into memory. It
might be that a read from that address would return some value, but that
value is meaningless. The "wo" tells the compiler it must not generate
any code that would read from that address. The closest thing I could
think of to that was "volatile", which tells the compiler that the value
at that address may change, which is not quite the same thing.

Interesting stuff, but I don't understand what the "access" is supposed
to specify... We already have 'const' and 'volatile', so...? Also, the
result of a cast is not an lvalue, so it can't be modified, so...?

Hmm, the access concept does seem a little underdeveloped. "to" and
"as" were the starting point. Still, would not the casts below affect
lvalues?

char *base; /* base of stack of 3 registers */
/* something sets base */
*(rw char *)(base+0) = 1; /* first register is RW, */
*(wo char *)(base+1) = 2; /* second register is WO */
*(ro char *)(base+2) = 3; /* third register is RO */

and the compiler throws an error on the last one.

Regards,

David Mathog

Jorgen Grahn · Mar 19, 2013

I have recently been taking some heat for using constructs like this:

float some_float;
/* code sets some_float */
int i = (int) round(some_float);

in C++ code. Some programmers claim that if it is not rewritten as

int i = static_cast<int>(round(some_float));

the sky will fall.

Maybe I'm ignorant, but I fail to see why a cast is needed at all.

So far none of the folks claiming this
have actually presented an example where using C style casts on these
sorts of simple data types actually does something unexpected.

It doesn't of course. It's a style issue, and a question of
maintainability.

/Jorgen

mathog · Mar 19, 2013

James said:
In C++, function overloading and templates make it a commonplace for the
type of an expression to depend in a complex fashion upon far-distant
code, which gives this feature greater value than it would have in C.
However, even in C, typedefs in third-party headers can make it hard to
be sure of the type of an expression.

In general I agree with the preceding - except the last sentence. While
the programmer might have some difficulty figuring out where something
is defined, the compiler will do it successfully - or throw an error and
exit.

In the specific application a data file consisting of about 100 types of
records is read into a char buffer in memory, and that is traversed
record by record using a (char *) pointer. In all instances the
structure corresponding to each record is known when the compiler does
its work. The code is basically just this sort of thing:

char *cptr;
...processing input in a loop ...
    COMMON_PREFIX *pre = (COMMON_PREFIX *)cptr;
    switch (pre->type){
        ... other cases ...
        case RECORDTYPE32:
        {
            RTYPE32 *prec = (RTYPE32 *) cptr;
            // access fields via prec structure
            break;
        }
        ... other cases ...
    }
    cptr += pre->recordsize;

where the structures for COMMON_PREFIX and RTYPE32 and the value for
RECORDTYPE32 will not change at run time. In this context, I just do
not see how the C style casts are in any way more dangerous than the
C++ style casts. The only major complication in this approach concerns
the proper alignment of the structs, and both types of casts will have
the same issues with that.

Regards,

David Mathog

glen herrmannsfeldt · Mar 19, 2013

mathog said:
James Kuyper wrote:

In general I agree with the preceding - except the last sentence. While
the programmer might have some difficulty figuring out where something
is defined, the compiler will do it successfully - or throw an error and
exit.

Somewhere you mentioned assuming that sizeof(int)==sizeof(float).

It might be true on your machine, but not on the machine someone else
wants to run your program on.

It might also be dependent on the exact representation of int and
float on a machine, including endianness.

In the specific application a data file consisting of about 100 types of
records is read into a char buffer in memory, and that is traversed
record by record using a (char *) pointer. In all instances the
structure corresponding to each record is known when the compiler does
its work. The code is basically just this sort of thing:

char *cptr;
...processing input in a loop ...
    COMMON_PREFIX *pre = (COMMON_PREFIX *)cptr;
    switch (pre->type){
        ... other cases ...
        case RECORDTYPE32:
        {
            RTYPE32 *prec = (RTYPE32 *) cptr;
            // access fields via prec structure
            break;
        }
        ... other cases ...
    }
    cptr += pre->recordsize;

Often enough, I am lazy, know it only needs to work on one specific
machine, and just do it.

where the structures for COMMON_PREFIX and RTYPE32 and the value for
RECORDTYPE32 will not change at run time. In this context, I just do
not see how the C style casts are in any way more dangerous than the
C++ style casts. The only major complication in this approach concerns
the proper alignment of the structs, and both types of casts will have
the same issues with that.

The way to get around alignment is to memcpy() to/from appropriately
aligned fields. It takes a little more work, but can sometimes be
automated. There is, for example XDR which is designed specifically
for transferring data between dissimilar machines.
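
A minimal sketch of that memcpy() approach, assuming a hypothetical two-field prefix (struct record_prefix and read_prefix are stand-ins invented here, not the real COMMON_PREFIX from the application). It fixes only the alignment problem; it still assumes the file was written with the same field sizes, padding and byte order as the machine reading it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the thread's COMMON_PREFIX. */
struct record_prefix {
    uint32_t type;
    uint32_t recordsize;
};

/* Copy a prefix out of the raw byte buffer. The source address
 * buf + offset need not be suitably aligned for the struct,
 * because memcpy() works byte by byte. */
struct record_prefix read_prefix(const unsigned char *buf, size_t offset)
{
    struct record_prefix pre;
    assert(buf != NULL);
    memcpy(&pre, buf + offset, sizeof pre);
    return pre;
}
```

The loop would then use pre.type and pre.recordsize from the local copy rather than dereferencing a cast pointer into the buffer.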

-- glen

James Kuyper · Mar 19, 2013

In general I agree with the preceding - except the last sentence. While
the programmer might have some difficulty figuring out where something
is defined, the compiler will do it successfully - or throw an error and
exit.

I was thinking primarily about the human difficulties, not the computer
ones, and not all of the mistakes that can be made by humans due to
being unsure of a data type result in situations where a diagnostic is
required. For instance, a typedef for an unsigned type that might or
might not be smaller than 'int', depending upon which platform it's
compiled on, could leave a programmer uncertain whether the usual
arithmetic conversions will cause a given expression to be evaluated
using signed or unsigned arithmetic. There's no guarantee that making a
mistake about that issue will result in code for which a diagnostic is
required. Using the size-named types that were introduced in C99 is
another potential source for similar uncertainties.
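
To make that concrete, here is a small sketch with two hypothetical typedefs (small_count_t and large_count_t are invented for illustration, standing in for names a third-party header might hide). It assumes a platform where int is wider than 16 bits and uint32_t is unsigned int; the static_assert states that assumption explicitly.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The behaviour shown below depends on these platform properties. */
static_assert(INT_MAX >= 65535 && UINT_MAX == UINT32_MAX,
              "sketch assumes int wider than 16 bits and 32-bit unsigned int");

/* Hypothetical typedefs, as a third-party header might provide. */
typedef uint16_t small_count_t;
typedef uint32_t large_count_t;

/* Both operands are promoted to int, so the subtraction is signed. */
long long small_diff(small_count_t a, small_count_t b)
{
    return a - b;
}

/* No promotion to int happens here, so the subtraction is unsigned
 * and wraps around modulo 2^32. */
long long large_diff(large_count_t a, large_count_t b)
{
    return a - b;
}
```

Under those assumptions small_diff(1, 2) is -1 but large_diff(1, 2) is 4294967295, and neither function requires a diagnostic.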

In the specific application a data file consisting of about 100 types of
records is read into a char buffer in memory, and that is traversed
record by record using a (char *) pointer. In all instances the
structure corresponding to each record is known when the compiler does
its work. The code is basically just this sort of thing:

char *cptr;
...processing input in a loop ...
COMMON_PREFIX *pre = (COMMON_PREFIX *)cptr;

Well-motivated and widely followed coding conventions reserve names in
all caps for macros; but the context suggests that COMMON_PREFIX should
instead be a typedef.

switch (pre->type){
... other cases ...
case RECORDTYPE32:
{
RTYPE32 *prec = (RTYPE32 *) cptr;

A lot of potential problems with such code become less clear when the
relationship between COMMON_PREFIX and RTYPE32 is unknown.

// access fields via prec structure
break;
}
... other cases ...
}
cptr += pre->recordsize;

where the structures for COMMON_PREFIX and RTYPE32 and the value for
RECORDTYPE32 will not change at run time. In this context, I just do
not see how the C style casts are in any way more dangerous than the
C++ style casts. The only major complication in this approach concerns
the proper alignment of the structs, and both types of casts will have
the same issues with that.

I'd wondered in your previous message why you were talking so much about
C++ in a newsgroup devoted to C, but the reason seemed to be that a
discussion in a C++ context got you thinking about C. That explanation
is getting harder to accept - it looks to me like your primary interest
is C-style casts vs. C++ casts, an issue that can only matter in a C++
context. If that's the case, it really belongs on a C++ newsgroup,
where more people are better qualified to discuss the issue.

Without knowing anything about the nature of COMMON_PREFIX and RTYPE32,
I can't be sure, but this looks to me a lot like something that, in C++,
would be better handled by type derivation and virtual member functions,
rather than type codes and switch statements. It's a lot more type safe
that way, and avoiding that cast is a key part of the reason why.

Tim Rentsch · Mar 19, 2013

mathog said:
{ Introductory material, skip down to the end bracket if you don't
care how I got onto this subject.

I have recently been taking some heat for using constructs like this:

float some_float;
/* code sets some_float */
int i = (int) round(some_float);

in C++ code. Some programmers claim that if it is not rewritten as

int i = static_cast<int>(round(some_float));

the sky will fall. So far none of the folks claiming this
have actually presented an example where using C style casts on these
sorts of simple data types actually does something unexpected. They
have a point where inheritance is involved, but there is none of that
in the code in question. I like () better here because it doesn't
waste 11 characters on every cast, which becomes a factor if more than
one cast must fit on the same line.
}

(For everything below here assume that float and int are the same size.)

This got me thinking about C casts, where

int i = (int) round(some_float); /* 1a */

specifies a type conversion, whereas this

int i = *(int *)(&some_float); /* 2a */

specifies a reinterpretation of the bits in memory. I always thought
the second form was a pretty awkward way to accomplish this, and that
there was room for improvement there. It isn't that the meaning is
not clear, it is just that having to resort to using a pointer to the
data in order to get its bits reinterpreted (still) feels like a
kludge. There is also "const" and "volatile" to consider, which are
fine in declarations but have always seemed too lengthy when placed
inside casts.

So bear with me and consider this more generalized form for a C
cast (obviously this is entirely hypothetical, there are no compilers
that do this):

([task] [access] type)

Where type must be specified and task and access are optional,
and:

Task specifies the action of the cast, one of:
to Short for "convert to". The target is to be
converted to the type specified.
Default when no [task] is present.
as Short for "use as". The target's bits in memory
are to be interpreted as specified by this cast.
Access specifies read/write access to the result of the cast, one of:
rw read/write access.
Default when no [access] is specified.
(Is there currently a keyword to specify this?)
ro read only access (equivalent to "const")
wo write only access (equivalent to "volatile")
Type is the "output" type of the cast, like int or double, just
as in current C casts.

The general form allows these alternate casts:

int i = (int) round(some_float); /* 1a */
int i = (to int) round(some_float); /* 1b */
int i = (to rw int) round(some_float); /* 1c */

int i = *(int *)(&some_float); /* 2a */
int i = (as int) some_float; /* 2b */

const int i = *(const int *)(&some_float); /* 3a */
const int i = (as ro int)some_float; /* 3b */

int i = (int) some_float; /* 4a */
int i = (to int) some_float; /* 4b */

function((volatile float *)&some_float); /* 5a */
function((wo float *)&some_float); /* 5b */

Not a big win in clarity for 1b or 1c vs. 1a, since the extra
text is just specifying defaults. 1b might be a bit clearer
for a beginner, but after a few weeks those training wheels
would come off. I think 2b is clearer than 2a, and 3b clearer
than 3a. Incorrect code, with the wrong task employed, might
be easier to spot if the 4b and 2b forms were uniformly employed, but
otherwise not a clean win for 4b over 4a. 5b versus 5a feels like a
wash.

Is something along these lines worth incorporating into the C
language, or no? Clearly we can get along without it - I am just
wondering if adding this would be beneficial.

Good one!!! But you posted it 13 days early...

Jorgen Grahn · Mar 20, 2013

On 03/19/2013 06:02 PM, mathog wrote: ....

Without knowing anything about the nature of COMMON_PREFIX and RTYPE32,
I can't be sure, but this looks to me a lot like something that, in C++,
would be better handled by type derivation and virtual member functions,
rather than type codes and switch statements. It's a lot more type safe
that way, and avoiding that cast is a key part of the reason why.

Well, he said the data came from a file; C++ doesn't help there, at
least not the run-time polymorphism stuff. IME at that point he has
the options to:

(a) Say "I have guarantees that this I/O buffer maps exactly to one of
these structs. Alignment, padding, sizes, endianness and
representations of floats etc are guaranteed not to be problems."

(b) Deserialize a documented binary protocol. More work
short-term, but this can be done as portable C or C++, and without
casts.

I prefer (b) because I have been bitten by (a)-style guarantees
failing in the past. Like when moving to a 64-bit architecture, or
between big/little endian.

(Of course, if the program itself generated the data by writing a
struct to a file, you can be pretty sure the guarantee holds ...)
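
For what option (b) can look like, here is a minimal sketch that assumes a hypothetical on-disk format in which each record starts with two 32-bit little-endian fields. The format, struct record_prefix, load_le32 and decode_prefix are all invented for illustration. The fields are assembled from individual bytes, so host alignment, padding and endianness do not matter, and no casts are needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of a record prefix. */
struct record_prefix {
    uint32_t type;
    uint32_t recordsize;
};

/* Decode a 32-bit unsigned value stored least significant byte first. */
uint32_t load_le32(const unsigned char *p)
{
    uint32_t v;
    assert(p != NULL);
    v = p[3];
    v = (v << 8) | p[2];
    v = (v << 8) | p[1];
    v = (v << 8) | p[0];
    return v;
}

/* Decode the assumed 8-byte prefix field by field. */
struct record_prefix decode_prefix(const unsigned char *p)
{
    struct record_prefix pre;
    pre.type = load_le32(p);
    pre.recordsize = load_le32(p + 4);
    return pre;
}
```

The cost is one such decoder per record type, which is the "more work short-term" part.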

/Jorgen

army1987 · Mar 21, 2013

[stuff about C++]
int i = static_cast<int>(round(some_float));

Meh. I use int(round(some_float)).

[stuff about C]
int i = *(int *)(&some_float); /* 2a */

I *think* the only way to do that that's not technically UB is
assert(sizeof i == sizeof some_float); memcpy(&i, &some_float, sizeof i);

Tim Rentsch · Mar 21, 2013

army1987 said:
[stuff about C++]
int i = static_cast<int>(round(some_float));

Meh. I use int(round(some_float)).

[stuff about C]
int i = *(int *)(&some_float); /* 2a */

I *think* the only way to do that that's not technically UB is
assert(sizeof i == sizeof some_float); memcpy(&i, &some_float, sizeof i);

Another way is to use a union.
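
A minimal sketch of the union approach, assuming float and uint32_t have the same size (union float_or_bits and float_bits_via_union are names invented for this example). Reading a member other than the one last stored is described in a footnote to C99 (as amended) and C11 as reinterpreting the object representation; this is a C-only idiom and should not be carried over to C++, where the rules differ.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Hypothetical union used only for the reinterpretation. */
union float_or_bits {
    float f;
    uint32_t u;
};

/* Store through one member, read back through the other. */
uint32_t float_bits_via_union(float f)
{
    union float_or_bits pun;
    pun.f = f;
    return pun.u;
}
```

As with memcpy(), the numeric result depends on the floating-point representation; 0x3F800000 for 1.0f holds only where float is IEEE 754 single precision.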

Shao Miller · Mar 21, 2013

army1987 said:
army1987 said:

[stuff about C++]
int i = static_cast<int>(round(some_float));

Meh. I use int(round(some_float)).

[stuff about C]
int i = *(int *)(&some_float); /* 2a */

I *think* the only way to do that that's not technically UB is
assert(sizeof i == sizeof some_float); memcpy(&i, &some_float, sizeof i);

Another way is to use a union.

Although Mr. T. Rentsch doesn't appear to read my posts any more, I
think it's still worth pointing out that even by using a union, you have
the same risks of undefined behaviour, with regards to "effective type".

A quirk is that you can get a union value that's no longer tied to a
stored value:

union {
int i;
float f;
} u;
u.i = 42;
(0, u.i).f;

The rvalue here doesn't have a stored value, so I think it can be argued
about whether or not effective type applies, or if this type-punning is
always safe, if no trap representations are involved.

Shao Miller · Mar 21, 2013

Although Mr. T. Rentsch doesn't appear to read my posts any more, I
think it's still worth pointing out that even by using a union, you have
the same risks of undefined behaviour, with regards to "effective type".

A quirk is that you can get a union value that's no longer tied to a
stored value:

union {
int i;
float f;
} u;
u.i = 42;

/* Erm, rather */
(0, u).f;

mathog · Mar 28, 2013

Tim said:
Good one!!! But you posted it 13 days early...

Musings, right?

Coming back to this after a very busy week.

Imagine that one wanted to write a C interface for a set of registers
that are mapped into memory. (This also brings up the memstruct vs.
struct argument of another thread, since strictly speaking and in the
general case there may not be a way to do this with a struct without C
extensions, because of uncontrollable use of padding by the compiler.)

Anyway, C has always used "naked" type specifications in declarations,
and the same type keywords in () for casts. Add another () keyword "is"
and then a () could be used as a declaration and not a cast.

Add "vo" to mean "read volatile".

Assuming in the following that uint8_t is present on the platform
and that a struct can actually be used to map onto the registers (which
might not be OK if the first register is not aligned with a 4 or 8 byte
boundary, but presumably if these registers exist in real hardware that
issue would not occur).

Then a set of registers might be declared like this:

typedef struct {
(is vo uint8_t) read_state; // sent, received, data pending, etc.
(is rw uint8_t) bidirectional_buffer;
(is ro uint8_t) last_buffer_sent;
(is wo uint8_t) set_state; // OK to send, OK to receive, etc.
} *REGISTER_SET;

This tries to make a distinction between two different types of
"volatile". "read_state" is truly volatile, because it can change
values even if nothing in the program accesses these registers. On
the other hand, "last_buffer_sent", which is a copy the hardware makes
of the contents of "bidirectional_buffer" when that is sent, is only
"sort of" or "conditionally" volatile. It can change value, but only
when the program has sent the appropriate signal to "set_state".
Otherwise "last_buffer_sent" may be read repeatedly like any normal
memory location. The same issue with "bidirectional_buffer" - most of
the time it may be re-read safely (when "OK to receive" has not been
set), at other times not (after an "OK to receive" is set).

It isn't clear to me how to express exactly this state of affairs in
current C syntax. This is closest, I guess:

typedef struct {
volatile const uint8_t read_state;
volatile uint8_t bidirectional_buffer;
volatile const uint8_t last_buffer_sent;
volatile uint8_t set_state; // ????
} *REGISTER_SET;

This drops the distinction between vo and ro, but that is probably
not a problem, either way the compiler knows that every time the program
specifies a read a different value might come back. However,
the declaration for set_state is not sufficient. "volatile" is not at
all "write only", it just says that the value read can change
unpredictably, not that reads are forbidden. What if the physical
register is such that a read of "set_state" causes a hardware error,
or at the very least, will return line noise? Is there some way to
express that situation in current C syntax, so that if this snippet was
buried elsewhere in the code the compiler would always flag it as an error:

REGISTER_SET rset=(REGISTER_SET) pointer_to_a_set_of_registers;
// next line would crash the machine
printf("value of set_state is: %d\n",rset->set_state);

Regards,

David Mathog

Les Cargill · Mar 28, 2013

mathog said:
Musings, right?

Coming back to this after a very busy week.

Imagine that one wanted to write a C interface for a set of registers
that are mapped into memory. (This also brings up the memstruct vs.
struct argument of another thread, since strictly speaking and in the
general case there may not be a way to do this with a struct without C
extensions, because of uncontrollable use of padding by the compiler.)

Padding is frequently ( but not always ) controllable. There are
pragmas and such to influence alignment. "But it isn't portable" - well,
it doesn't have to be - it's a memory map, right? It's already
hardware-specific.

Anyway, C has always used "naked" type specifications in declarations,
and the same type keywords in () for casts. Add another () keyword "is"
and then a () could be used as a declaration and not a cast.

Icky.

Add "vo" to mean "read volatile".

Assuming in the following that uint8_t is present on the platform
and that a struct can actually be used to map onto the registers (which
might not be OK if the first register is not aligned with a 4 or 8 byte
boundary, but presumably if these registers exist in real hardware that
issue would not occur).

Right. And if they do, there's no sin in adding spacers so it aligns
correctly.
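
A small sketch of the spacer idea, using a made-up device layout (struct device_regs and its offsets are hypothetical, not taken from any real hardware). Explicit reserved bytes put the 32-bit register where the hardware expects it, and C11 static_assert makes the build fail if the compiler lays the struct out differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a status byte at offset 0 and a 32-bit data
 * register at offset 4, with the gap filled by an explicit spacer. */
struct device_regs {
    volatile uint8_t  status;
    uint8_t           reserved[3];   /* spacer, never accessed */
    volatile uint32_t data;
};

/* Compile-time checks that the layout matches the assumed memory map. */
static_assert(offsetof(struct device_regs, data) == 4,
              "data register must sit at offset 4");
static_assert(sizeof(struct device_regs) == 8,
              "register block must be 8 bytes");
```

If either assertion fails on some compiler, that is the point at which a packing pragma or a different spacer is needed, and it is found at build time instead of on the hardware.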

Then a set of registers might be declared like this:

typedef struct {
(is vo uint8_t) read_state; // sent, received, data pending, etc.
(is rw uint8_t) bidirectional_buffer;
(is ro uint8_t) last_buffer_sent;
(is wo uint8_t) set_state; // OK to send, OK to receive, etc.
} *REGISTER_SET;

This tries to make a distinction between two different types of
"volatile". "read_state" is truly volatile, because it can change
values even if nothing in the program accesses these registers. On
the other hand, "last_buffer_sent", which is a copy the hardware makes
of the contents of "bidirectional_buffer" when that is sent, is only
"sort of" or "conditionally" volatile.

So far as 'C' is concerned, if it's a little bit volatile, then you have
to declare it volatile.

It can change value, but only
when the program has sent the appropriate signal to "set_state".
Otherwise "last_buffer_sent" may be read repeatedly like any normal
memory location. The same issue applies to "bidirectional_buffer": most of
the time it may be re-read safely (when "OK to receive" has not been
set), other times not (after an "OK to receive" is set).

It isn't clear to me how to express exactly this state of affairs in
current C syntax. This is closest, I guess:

typedef struct {
volatile const uint8_t read_state;
volatile uint8_t bidirectional_buffer;
volatile const uint8_t last_buffer_sent;
volatile uint8_t set_state; // ????
} *REGISTER_SET;

This drops the distinction between vo and ro, but that is probably
not a problem: either way the compiler knows that every time the program
specifies a read a different value might come back. However,
the declaration for set_state is not sufficient. "volatile" is not at
all "write only", it just says that the value read can change
unpredictably, not that reads are forbidden.

This isn't a problem. You can enforce the actual semantics otherwise.

What if the physical
register is such that a read of "set_state" causes a hardware error,
or at the very least, will return line noise?

Then doctor, doctor it hurts when I do that. I find this sort of FPGA
behavior nauseating. Yeah, it happens. It completely precludes
memmove() to copy the struct, a great sin IMO.

I'd file a bug report on the FPGA.

Is there some way to
express that situation in current C syntax, so that if this snippet was
buried elsewhere in the code the compiler would always flag it as an error:

I don't think there's a single thing in 'C' that does not have an
r-value. You can fly the W/Rbar pin all day in 'C', but there's no
write-only.

REGISTER_SET rset=(REGISTER_SET) pointer_to_a_set_of_registers;
// next line would crash the machine
printf("value of set_state is: %d\n",rset->set_state);

I'd write getters/setters for it myself, to keep that crash off
the table.
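
A minimal sketch of what such getters/setters might look like, assuming the four-register layout from the example above. The function names are hypothetical. Standard C has no write-only qualifier, so here the restriction is enforced only by which functions exist:

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>

/* Same four registers as the example above. In a real project this
   definition would live in one .c file, so that other code can only
   reach the hardware through the functions below. */
struct register_set {
    volatile uint8_t read_state;
    volatile uint8_t bidirectional_buffer;
    volatile uint8_t last_buffer_sent;
    volatile uint8_t set_state;
};
static_assert(sizeof(struct register_set) == 4, "unexpected padding");

/* Readable registers get a getter. */
uint8_t regs_get_read_state(const struct register_set *r) { return r->read_state; }
uint8_t regs_get_buffer(const struct register_set *r)     { return r->bidirectional_buffer; }
uint8_t regs_get_last_sent(const struct register_set *r)  { return r->last_buffer_sent; }

/* Writable registers get a setter. */
void regs_put_buffer(struct register_set *r, uint8_t v) { r->bidirectional_buffer = v; }
void regs_set_state(struct register_set *r, uint8_t v)  { r->set_state = v; }

/* Deliberately no regs_get_set_state(): with the struct hidden, a
   caller has no way to spell a read of the write-only register. */
```

If the struct definition is kept out of the public header, the printf() line above stops compiling, because rset->set_state would dereference an incomplete type.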

Tim Rentsch · Mar 29, 2013

mathog said:
Tim said:

Good one!!! But you posted it 13 days early...

Musings, right?

Coming back to this after a very busy week.

Imagine that one wanted to write a C interface for a set of
registers that are mapped into memory. [snip example]

Except for giving an example, you don't really say what
you're hoping to accomplish, or what the costs or benefits
are for doing so. It looks like everything you're trying to
do can be done in standard C simply by applying volatile
semantics selectively and wrapping accesses inside functions
(said functions then having the responsibility for using
volatile appropriately). So to me it looks like an awful lot
of cost for almost no benefit, especially since the range of
applicability is so small -- most C code has no need for any
kind of volatile access, let alone those that are selectively
volatile.
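
A minimal sketch of "volatile applied selectively inside functions", using the register names from the example earlier in the thread. The function names are hypothetical. It also assumes the compiler gives volatile treatment to an access made through a volatile-qualified lvalue even when the underlying field is not declared volatile; common compilers do this, but it is worth checking for the toolchain in use:

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>

/* Plain (non-volatile) fields; volatility is applied per access. */
struct plain_regs {
    uint8_t read_state;
    uint8_t bidirectional_buffer;
    uint8_t last_buffer_sent;
    uint8_t set_state;
};
static_assert(sizeof(struct plain_regs) == 4, "unexpected padding");

/* read_state can change at any time, so every read is volatile. */
uint8_t poll_read_state(const struct plain_regs *r)
{
    return *(const volatile uint8_t *)&r->read_state;
}

/* Writing set_state may change last_buffer_sent, so both accesses
   here are volatile and are kept in program order. */
uint8_t send_and_get_last(struct plain_regs *r, uint8_t command)
{
    *(volatile uint8_t *)&r->set_state = command;
    return *(const volatile uint8_t *)&r->last_buffer_sent;
}

/* Ordinary read: the compiler may reuse a previously loaded value.
   Only appropriate while no send has been requested. */
uint8_t peek_last_sent(const struct plain_regs *r)
{
    return r->last_buffer_sent;
}
```

The "conditionally volatile" distinction then lives in the choice of function, not in the type of the field.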

mathog · Apr 11, 2013

Jorgen said:
Maybe I'm ignorant, but I fail to see why a cast is needed at all.

Normally it is not needed and that was a bad example. This actually
comes up in association with printf() statements where the arguments
must match the conversion specifiers or warnings result (and the result
is not usually what was intended). In the next line "xe" is a double
variable

printf("width %d\n",(int) (xe * 64.0));

and gcc generates this warning

warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has
type ‘double’ [-Wformat]

when the (int) cast is not present.

Regards,

David Mathog

glen herrmannsfeldt · Apr 11, 2013

(snip)

Normally it is not needed and that was a bad example. This actually
comes up in association with printf() statements where the arguments
must match the conversion specifiers or warnings result (and the result
is not usually what was intended). In the next line "xe" is a double
variable

printf("width %d\n",(int) (xe * 64.0));

and gcc generates this warning

warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has
type ‘double’ [-Wformat]

when the (int) cast is not present.

Some people have been known to print out the hex representation
of floating point values by using %x with float or double.
(Likely two %8.8x in the latter case.)
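
Passing a float or double where %x expects an unsigned int is the same kind of mismatch that -Wformat warns about, and the C standard leaves the result undefined. A sketch of a well-defined alternative copies the bytes into an unsigned integer of the same size first. It assumes float is 32 bits and double is 64 bits (checked at compile time), and the function names are hypothetical:

```c
#include <assert.h>    /* static_assert (C11) */
#include <inttypes.h>  /* PRIx32, PRIx64 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>    /* memcpy */

static_assert(sizeof(float) == sizeof(uint32_t), "float is not 32 bits");
static_assert(sizeof(double) == sizeof(uint64_t), "double is not 64 bits");

/* Copy the object representation of a float into a uint32_t. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Copy the object representation of a double into a uint64_t. */
uint64_t double_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Print the bit patterns as zero-padded hex, one value per line. */
void print_float_hex(float f)   { printf("%08" PRIx32 "\n", float_bits(f)); }
void print_double_hex(double d) { printf("%016" PRIx64 "\n", double_bits(d)); }
```

On a platform using IEEE 754 formats, float_bits(1.0f) gives 0x3f800000.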

-- glen
