Implicit integer promotion


Andrew Cooper

Hello c.l.c,

I have recently encountered a curiosity with integer promotion and
wonder whether anyone can shed light on the rationale behind the
decisions of the spec.

uint16_t foo;
uint64_t bar;

bar = (foo << 16);

I would expect the above to be well formed, although it appears to be
undefined behaviour under the C specification.

From what I can gather, foo will be promoted from a 16 bit unsigned
integer to a 32bit signed integer. foo will be explicitly 0 extended
from 16 bits to 32 bits, but will end up being undefined if foo has its
15th bit set before the shift, as that would result in the shift
changing the sign bit.

I understand that there is a petition to make the above defined
behaviour, by permitting the sign bit to be changed by a shift.


What I am wondering is why the promotion is from unsigned 16 to signed
32 (other than "because that's what the spec says")?

Given that you have to go out of your way for an unsigned integer in C
(as integers default to signed), I can't see any reason why it would be
better to promote to a signed integer rather than an unsigned one, and
can see several reasons why it might be better to keep the underlying
signed/unsigned-ness when promoting to a wider integer.

Am I missing something obvious?


For those of you wondering, the specific situation involves a uint16_t
being shifted by 16, and being or'd into a 64 bit value, where the
programmer can reason that the 15th bit will never be set. We have had
a security issue because of punitive compilation, and are trying to
avoid similar problems in the future. As a result, I wish to avoid UB
wherever possible.


Thanks,

~Andrew
 

Tim Rentsch

Andrew Cooper said:
I have recently encountered a curiosity with integer promotion
and wonder whether anyone can shed light on the rationale behind
the decisions of the spec.

uint16_t foo;
uint64_t bar;

bar = (foo << 16);

I would expect the above to be well formed, although it appears
to be undefined behaviour under the C specification.

From what I can gather, foo will be promoted from a 16 bit
unsigned integer to a 32bit signed integer. foo will be
explicitly 0 extended from 16 bits to 32 bits, but will end up
being undefined if foo has its 15th bit set before the shift, as
that would result in the shift changing the sign bit.

I understand that there is a petition to make the above defined
behaviour, by permitting the sign bit to be changed by a shift.

What I am wondering is why the promotion is from unsigned 16 to
signed 32 (other than "because that's what the spec says")?

The short (no pun intended) explanation is this. There are
arguments on both sides. If unsigned shorts were promoted to
unsigned int, then expressions like

x < y /* x has type int, y has type unsigned short */

might yield surprising results when x is less than zero (I expect
you can work out the details). Because expressions yielding such
surprising results were thought to be comparatively common and
also likely to trip up beginning programmers, the ANSI committee
decided in favor of promoting smaller unsigned types to signed
int rather than unsigned int. I encourage you to read about this
conclusion yourself in the Rationale document -

http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf
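
For concreteness, a minimal sketch of the kind of surprise the
committee had in mind (assuming a 32-bit int, as elsewhere in this
thread):

#include <stdio.h>

int main(void)
{
    int x = -1;
    unsigned short y = 1;

    /* Value-preserving rules (what ANSI C adopted): y promotes to
       int, so this compares -1 < 1 and prints 1. */
    printf("%d\n", x < y);

    /* Under sign-preserving rules y would have promoted to unsigned
       int, dragging x up to a huge unsigned value. Forcing that
       conversion shows the surprising outcome: this prints 0. */
    printf("%d\n", (unsigned int)x < (unsigned int)y);

    return 0;
}
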
Given that you have to go out of your way for an unsigned integer
in C (as integers default to signed), I can't see any reason why
it would be better to promote to a signed integer rather than an
unsigned one, and can see several reasons why it might be better
to keep the underlying signed/unsigned-ness when promoting to a
wider integer.

Am I missing something obvious?

Personally I think it would have been better if unsigned operands
were always promoted to unsigned int rather than signed int, but
there are arguments on both sides, and it isn't obvious which
factors should be given higher weight. So, even though you
probably are missing something, what you are missing is not
obvious. :)
For those of you wondering, the specific situation involves a
uint16_t being shifted by 16, and being or'd into a 64 bit value,
where the programmer can reason that the 15th bit will never be
set. We have had a security issue because of punitive
compilation, and are trying to avoid similar problems in the
future. As a result, I wish to avoid UB wherever possible.

For this particular example it's easy to avoid UB by adopting
a simple idiom:

bar = foo+0UL << 16;

If foo is of type uint16_t then this right-hand-side is
guaranteed to produce a defined 32-bit (or possibly larger)
non-negative result (ie, what you want), regardless of the
high-order bit of foo. The '+0UL' idiom is a good one to
know if one is shifting unsigned short or unsigned char
operands.
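
For concreteness, here is the idiom applied to the original fragment
(the wrapper function is purely illustrative, and the parentheses only
aid readability - '+' already binds tighter than '<<'):

#include <stdint.h>

uint16_t foo;
uint64_t bar;

void store(void)
{
    /* foo + 0UL converts foo to unsigned long (at least 32 bits)
       before the shift, so the result is defined even when the
       high-order bit of foo is set. */
    bar = (foo + 0UL) << 16;
}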
 

James Kuyper

Hello c.l.c,

I have recently encountered a curiosity with integer promotion and
wonder whether anyone can shed light on the rationale behind the
decisions of the spec.

uint16_t foo;
uint64_t bar;

bar = (foo << 16);

I would expect the above to be well formed, although it appears to be
undefined behaviour under the C specification.

From what I can gather, foo will be promoted from a 16 bit unsigned
integer to a 32bit signed integer.

foo will only be promoted if UINT16_MAX < INT_MAX. You seem to be
assuming that int is 32 bits. While that's true for many modern
implementations, it's not required by the standard. There have been many
systems where int had either 16 or 64 bits, and many of those systems
are quite current, not at all obsolete.
foo will be explicitly 0 extended
from 16 bits to 32 bits, but will end up being undefined if foo has its
15th bit set before the shift, as that would result in the shift
changing the sign bit.

I know of two different conventions for numbering the bits. The bit that
would end up getting shifted into the sign bit in this case is usually
described as either the first bit, or the 16th bit. I don't know of any
reasonable convention that would cause it to be called the 15th bit. I
do know of an unreasonable one that miscalls the first bit as the
"zeroeth bit" - I hope you're not using that convention.
I understand that there is a petition to make the above defined
behaviour, by permitting the sign bit to be changed by a shift.
What I am wondering is why the promotion is from unsigned 16 to signed
32 (other than "because that's what the spec says")?

According to the C Rationale:
 

glen herrmannsfeldt

James Kuyper said:
On 11/28/2013 08:04 PM, Andrew Cooper wrote:
(snip)

foo will only be promoted if UINT16_MAX < INT_MAX. You seem to be
assuming that int is 32 bits. While that's true for many modern
implementations, it's not required by the standard. There have been many
systems where int had either 16 or 64 bits, and many of those systems
are quite current, not at all obsolete.
I know of two different conventions for numbering the bits. The bit that
would end up getting shifted into the sign bit in this case is usually
described as either the first bit, or the 16th bit. I don't know of any
reasonable convention that would cause it to be called the 15th bit. I
do know of an unreasonable one that miscalls the first bit as the
"zeroeth bit" - I hope you're not using that convention.

I suppose it is better to call them "bit 0" and "bit 15" instead of 0th
bit and 15th bit.

Also, there are more conventions. IBM for S/360 through ESA/390 numbered
the bits starting from the MSB, so the sign bit was bit 0.

z/Architecture extended ESA/390 to 64 bits, such that the bits of a
register are numbered from 0 (sign bit) to 63 (LSB). When using 32
bit instructions, the bits of a register are numbered from 32 (MSB)
to 63 (LSB).

Note that numbering starting with the MSB is consistent with big-endian
addressing (even though there are not and bit addressing instructions).

-- glen
 

glen herrmannsfeldt

(snip)
From what I can gather, foo will be promoted from a 16 bit unsigned
integer to a 32bit signed integer. foo will be explicitly 0 extended
from 16 bits to 32 bits, but will end up being undefined if foo has its
15th bit set before the shift, as that would result in the shift
changing the sign bit.
I understand that there is a petition to make the above defined
behaviour, by permitting the sign bit to be changed by a shift.
What I am wondering is why the promotion is from unsigned 16 to signed
32 (other than "because that's what the spec says")?

One reason for it being undefined (or implementation dependent) is that
C still allows for sign-magnitude and ones complement representations.
Many twos complement systems will give the value that you want, though
possibly along with an overflow exception.

-- glen
 

Eric Sosman

Hello c.l.c,

I have recently encountered a curiosity with integer promotion and
wonder whether anyone can shed light on the rationale behind the
decisions of the spec.

uint16_t foo;
uint64_t bar;

bar = (foo << 16);

I would expect the above to be well formed, although it appears to be
undefined behaviour under the C specification.

From what I can gather, foo will be promoted from a 16 bit unsigned
integer to a 32bit signed integer. foo will be explicitly 0 extended
from 16 bits to 32 bits, but will end up being undefined if foo has its
15th bit set before the shift, as that would result in the shift
changing the sign bit.

Mostly right. It is quite common (nowadays) for int to be
32 bits wide, but C requires only "16 or more" bits. There are
two cases of interest:

- If int is 17 or more bits wide, foo will promote to int as
you describe and produce a non-negative promoted value. If
any of the leftmost 17 of its bits are non-zero (more
succinctly, if foo is greater than INT_MAX/65536), shifting
the value 16 bits to the left yields undefined behavior.

- If int is 16 bits wide, foo promotes to unsigned int rather
than to plain int. The behavior is again undefined, but for
a different reason: You can't shift an N-bit quantity by
more than N-1 bits (in one operation).
What I am wondering is why the promotion is from unsigned 16 to signed
32 (other than "because that's what the spec says")?

"The spec says" there's a type called int -- but hey, that's
only what "the spec says" and there's no *real* reason for it. ;-)
Given that you have to go out of your way for an unsigned integer in C
(as integers default to signed), I can't see any reason why it would be
better to promote to a signed integer rather than an unsigned one, and
can see several reasons why it might be better to keep the underlying
signed/unsigned-ness when promoting to a wider integer.

Am I missing something obvious?

This was a point of debate a quarter-century ago when the
original ANSI Standard was being formulated. Some C compilers
were "value-preserving" while others were "sign-preserving," and
the Rationale devotes a substantial amount of text to explaining
how the Committee wrangled with the discrepancy. Eventually they
came down in favor of value-preserving semantics, and That's The
Way It Is.
For those of you wondering, the specific situation involves a uint16_t
being shifted by 16, and being or'd into a 64 bit value, where the
programmer can reason that the 15th bit will never be set. We have had
a security issue because of punitive compilation, and are trying to
avoid similar problems in the future. As a result, I wish to avoid UB
wherever possible.

Well, you can't shift a 16-bit value by 16 bits and get
anything sensible. (You could shift by 11 and then by 5 and
hope for a zero result, but even that wouldn't be certain if
promotion were to plain int.) If what you want is to extend
the 16-bit value to 64 bits, shift it, and OR with another 64-
bit value, then just say so:

target |= (uint64_t)foo << 16;

It is almost always better to write what you mean than to write
something else and sort of hope the compiler -- and future
maintainers -- will catch your drift.
 

James Kuyper

I suppose it is better to call them "bit 0" and "bit 15" instead of 0th
bit and 15th bit.

Also, there are more conventions. IBM for S/360 through ESA/390 numbered
the bits starting from the MSB, so the sign bit was bit 0.

Since bit 0 is the first bit, I've already referred to that convention.
z/Architecture extended ESA/390 to 64 bits, such that the bits of a
register are numbered from 0 (sign bit) to 63 (LSB). When using 32
bit instructions, the bits of a register are numbered from 32 (MSB)
to 63 (LSB).

That one I hadn't heard of, though I'm unsurprised by it.
Note that numbering starting with the MSB is consistent with big-endian
addressing (even though there are not and bit addressing instructions).

Is "not and" a typo for "not any", or more simply, "no" ?
 

James Kuyper

(Other posters have explained the rationale behind this, so I'll skip
that part.)

I am a great fan of using explicit casts when types change - I think it
makes it clearer to the reader, and leaves no doubts on the behaviour.

In my experience (which is extensive, since the project I'm working on
has a coding standard that requires us to write code that way), this is
generally a bad idea. It disables the diagnostic and warning messages
that compilers normally produce to protect against certain kinds of
mistakes. If the type of foo gets changed, for instance, to a floating
point or pointer type (possibly in a header that is nowhere near the
location of the shift), an explicit conversion to an integer type before
the shift will disable the diagnostic messages that the code would
otherwise generate.
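
A minimal sketch of that failure mode, with a hypothetical foo whose
type has quietly changed from uint16_t to double in some header:

#include <stdint.h>

static double foo;   /* was: static uint16_t foo; */

uint64_t widen_and_shift(void)
{
    /* Without a cast, the shift is a constraint violation, so the
       compiler points straight at the changed type:
       return foo << 16; */

    /* With the cast, the code still compiles, silently truncating
       the double instead of reporting the problem. */
    return (uint32_t)foo << 16;
}
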
Part of this is also that on some of the systems I use, "int" is 32-bit,
on others it is 16-bit - so implicit integer promotions have different
sizes.

So I would write:

bar = (uint64_t) ((uint32_t) foo << 16);

Why not just directly convert to uint64_t before the shift?
 

Keith Thompson

Eric Sosman said:
This was a point of debate a quarter-century ago when the
original ANSI Standard was being formulated. Some C compilers
were "value-preserving" while others were "sign-preserving," and
the Rationale devotes a substantial amount of text to explaining
how the Committee wrangled with the discrepancy. Eventually they
came down in favor of value-preserving semantics, and That's The
Way It Is.
[...]

It occurs to me that, even given the advantages of value-preserving
semantics, sign-preserving semantics would make more sense
specifically for the left operand of a shift operator. Sensible
programmers use shifts only on unsigned values, or at least avoid
shifts that threaten the sign bit, but unexpected unsigned-to-signed
promotions are confusing.

But having one rule for shift operators and a different rule for
everything else would probably be worse than what we have now. And
standardizing on one rule or the other is a vast improvement over the
pre-ANSI chaos.

See also http://www.lysator.liu.se/c/rat/c2.html
 

Tim Rentsch

David Brown said:
(Other posters have explained the rationale behind this, so I'll skip
that part.)

I am a great fan of using explicit casts when types change - I think
it makes it clearer to the reader, and leaves no doubts on the
behaviour. Part of this is also that on some of the systems I use,
"int" is 32-bit, on others it is 16-bit - so implicit integer
promotions have different sizes.

So I would write:

bar = (uint64_t) ((uint32_t) foo << 16);

This is horrible advice. It interferes with getting compiler
warnings. It can mask problems. It can cause problems. It's a
maintenance nightmare. The idea that this practice makes things
clearer to those reading the code is, at best, misguided.

A much better rule is that no cast should ever be included unless
it is necessary. They almost never are. If it's important to
convert 'foo' to be a 32-bit type, that can be done without a
cast:

uint32_t foo32 = foo;
bar = foo32 << 16;

or, if one is writing in C99, using a compound literal:

bar = (uint32_t){ foo } << 16;

Even these writings have their problems. They are overly
prescriptive, and platform specific. Generally it's better to
write code that works over a range of implementations and types
that is as broad as possible. Casting in cases like the above
is almost guaranteed to interfere with that goal.
 

Andrew Cooper

[snip - original post quoted in full above]

Thanks for all the replies. I didn't realise I had waded into an area
which has been debated for longer than I have been alive, although in
hindsight it is not surprising.

The answers certainly have been educational, and I can now see that
promoting to a signed integer rather than unsigned offers fewer corner
cases with unexpected behaviour.

With the example above, the code is from x86 64bit pagetable handling,
so there is nothing funky going on with 1s compliments or sign/magnitude
numbers.

In this case, it was decided to fix it up with an explicit cast, even
though there is no possible way for the 15th bit to actually be set (for
legacy ABI reasons).

~Andrew
 

Philip Lantz

David said:
Sometimes that is convenient, yes.


I cannot see how that is any clearer or otherwise "better" than a simple
cast.

Because foo is type-checked for assignment compatibility with uint32_t.
 

James Kuyper

....
However, one of the reasons for using a cast is precisely that the types
are not assignment compatible, ...

That falls in the category of "necessary" casts. It's the unnecessary
casts that should be avoided. Those are the ones that explicitly convert
a value to produce a result that is guaranteed to be the same as would
have been produced by implicit conversion if no cast had been used.
... or that assignment will give an extra
warning (such as if gcc's "-Wconversion" is used, and the assignment is
to a smaller type). ...

Code written in such a way as to avoid triggering -Wconversion warnings
does not seem an improvement to me.
 

glen herrmannsfeldt

David Brown said:
On 02/12/13 12:59, James Kuyper wrote:
(snip)
I think I over-stated my case in my first post in this branch - I don't
add explicit casts in /all/ cases when types change, but I do add them
when I need to be sure and clear exactly how and when they change.
This means I use some casts that are unnecessary, or might be unnecessary
depending on the target, but I don't do it /all/ the time. For example,
I would not cast a uint16_t to uint32_t before assigning to a uint32_t
variable without particularly good reasons. But I /would/ be likely to
cast a uint32_t value to a uint16_t before assigning to a uint16_t
variable, as it makes it clear that I am reducing the range of the
variable. (I might alternatively use something like an "& 0xffff" mask
- again, it is not needed, but it can make the intention clearer.)

Java requires casts on what it calls "narrowing" conversions.

Unlike C, the primitive types in Java have known and fixed bit widths
and twos complement representation. Also, &0xffff won't convince the
compiler. It isn't about "loss of information", as int to float is not
considered narrowing, but float to int is.
Note that this is not just my idea - Misra coding standards have a rule
"Implicit conversions which may result in a loss of information shall
not be used."

-- glen
 

Tim Rentsch

Andrew Cooper said:
Thanks for all the replies. I didn't realise I had waded into an
area which has been debated for longer than I have been alive,
although in hindsight it is not surprising.

The answers certainly have been educational, and I can now see
that promoting to a signed integer rather than unsigned offers
fewer corner cases with unexpected behaviour.

With the example above, the code is from x86 64bit pagetable
handling, so there is nothing funky going on with 1s compliments
or sign/magnitude numbers.

In this case, it was decided to fix it up with an explicit cast,
even though there is no possible way for the 15th bit to actually be
set (for legacy ABI reasons).

If the compiles are being done under C99 (and I hope they are),
at least use a compound literal rather than a cast -

bar = (uint_fast32_t){ foo } << 16;

Using a compound literal is both safer (because only the usual
implicit conversions are allowed) and will not suppress warning
messages the way casting does. Note also the type - if
<stdint.h> types are being used, 'uint_fast32_t' is a better
match to what you want to do than 'uint32_t' is.

(btw, "1s complement", not "1s compliment".)
 

Tim Rentsch

Sorry for not responding earlier, it's a busy time.

I'm responding here to similar comments in several postings,
rather than repetitively responding individually. The order
will be different but I will try to make sure there is enough
context so that doesn't matter.

The benefits are the same as using an extra variable, but without
needing to declare or name a variable.

Also it doesn't mask conversion warnings the way casts often do.
Thanks - I hadn't thought of that.

However, one of the reasons for using a cast is precisely that
the types are not assignment compatible, or that assignment will
give an extra warning (such as if gcc's "-Wconversion" is used,
and the assignment is to a smaller type).

That's not one reason, it's two, and the two situations should
not be lumped together.

If I see a cast that is required, in the sense that it cannot
be avoided through use of implicit conversions (not counting by
using (void *), which is a special case), I know why it's there -
there is some sort of skulduggery afoot, and there _has_ to be a
cast to get the compiler to tolerate the skulduggery. Probably
the person who wrote the cast made a conscious decision to engage
in said skulduggery, but whether that's true or not the presence
of a required cast is a red flag that calls for greater than
usual attention to what's going on there. Competent developers
try to write code that has as few red flags as possible (or at
least as few as reasonably possible); one reason for that is
such cases are generally more disastrous when they are done
wrongly, so naturally they merit a greater degree of scrutiny.
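
To make the distinction concrete, a small sketch with hypothetical
variables:

#include <stdint.h>

void cast_examples(double d)
{
    /* A required cast: double * does not convert implicitly to
       unsigned char *, so the cast is exactly the kind of red flag
       described above. */
    unsigned char *bytes = (unsigned char *)&d;

    /* An unnecessary cast: assignment performs this conversion
       implicitly anyway, so the cast adds nothing except the risk of
       hiding a later change to the type of d. */
    uint32_t n1 = (uint32_t)d;
    uint32_t n2 = d;   /* same effect, and -Wconversion can still warn */

    (void)bytes; (void)n1; (void)n2;
}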

In the other case, if a section of code contains an unnecessary
cast, meaning it is not required in the sense described above,
there are lots of different reasons why it might be there. For
example:

a. a non-default conversion is necessary for the code to work
correctly (ie, a semantic difference is needed);
b. a non-default conversion has no effect on this platform,
but it is necessary on a different platform;
c. a non-default conversion may be necessary for the code to
work correctly, but it isn't obvious whether it is or not,
so a cast was added for safety;
d. a non-default conversion is not necessary for the code to
work correctly (and the author knows this), but a cast was
added to make it obvious that the code will work;
e. the author thinks a non-default conversion is necessary for
the code to work correctly (on this platform or some other
one), even though it isn't;
f. a redundant cast was added to call attention to the presence
of a non-obvious conversion;
g. a cast was added that is redundant on this platform but is
not redundant on a different platform;
h. a cast was put in to conform to a coding standard rule (or
code review, local practice, etc);
i. a cast was put in to make the code easier to understand for
inexperienced developers;
j. a cast was put in to suppress a compiler warning message;
k. a cast was put in to suppress a compiler warning message
not on this platform but a different platform (or compiler
version);
l. a cast was put in to suppress an expected future compiler
warning message (or expected alternate platform);
m. the author believes a cast is needed to suppress a compiler
warning message (on this platform or some other), but in
fact it is not; or
n. the cast was added earlier on for one of the above listed
reasons, but meanwhile the code has changed so the cast
now serves no current purpose.
The cast tells the compiler, and the programmer and readers, that
you know what you are doing here.

There are several things wrong with this statement, especially
for casts in the "unnecessary" category. Even though a cast
specifies what operation is to take place, it doesn't say what it
is there to accomplish, or why. Second, and partly as corollary
to the previous sentence, there is often no way to tell if an
unnecessary cast is really needed for its intended purpose.
Third, the implied assertion that the author knows what he/she is
doing is often wrong (and the more unnecessary casts there are
the more our attention is diluted away from the cases that need
it). Fourth, giving general advice to add casts in places where
casts are not necessary makes things worse - likely adopters of
such advice include inexperienced developers, who as a group
make more mistakes and would most benefit from receiving the
warning messages that unnecessary casts suppress.
Certainly explicit casts can mask some compiler warnings - but
sometimes that is exactly what you want.

That is exactly what I do NOT want. Casts written to suppress a
compiler warning message should _never_ be written in open code.
If there is no other way to suppress a warning other than by
using a cast, at least the cast should be wrapped in a suitable
macro, so that there might be suitable assertion checks, etc,
conditionally included.
This is particularly true
when changing to lower range types, or swapping between signed
and unsigned types. If you write "an_int_16 = an_int_32", a
compiler with appropriate warnings can flag that as risky.
Adding an explicit cast, "an_int_16 = (int16_t) an_int_32" makes
it clear to the reader, the writer, and the compiler that the
assignment is safe.

The problem is it does not make that clear. /Maybe/ it means the
author thinks it's safe, but that doesn't mean it is safe. It
might not mean even that; it might be just a knee-jerk reaction
to previously getting a compiler warning message. Looking at
the cast, there's no way to tell the difference between those
two circumstances.
(Clearly the
programmer must ensure that it /is/ safe in this case.)

Putting in a cast (meaning a plain cast in open code) takes away
one of our best tools to help with that. Automated tools are
more reliable than developers' reasoning.
So yes, it can mask some problems - but I also think it can allow
other problems to be seen. There is a balance to be reached.

Surely you don't expect anyone to be convinced by this statement
until you say something about what those other things might be,
and offer some kind of evidence, even if anecdotal, that they
provide some positive benefit.
Perhaps I see the use of it more in the type of programming I do,
which involves small microcontrollers (often 8-bit or 16-bit),
and quite a lot of conversions between different sizes.

That makes it worse. If narrowing conversions (or similar) are
common and routine, they should be wrapped up in suitably safe and
type-safe macros or inline functions, not bludgeoned with an
open-code cast sledgehammer.
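
For example, a minimal sketch of such a wrapper (the name is
hypothetical):

#include <assert.h>
#include <stdint.h>

/* One place for the narrowing cast, plus a debug-build check that no
   information is actually being lost. */
static inline uint16_t narrow_u16(uint32_t v)
{
    assert(v <= UINT16_MAX);
    return (uint16_t)v;
}
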
I don't see how it is a "maintenance nightmare", unless you
regularly change the types of your variables in "maintenance"
without due consideration of their use.

The problem is not the consideration, but making the attendant
changes at all the use sites. And using straight casts will make
those harder to find.
However, although I would definitely put at least one explicit
conversion in cases like the one mentioned, I don't put them
/everywhere/ - I use them when they make the code clearer and
leave no doubts as to what types are to be used. Too many casts
would make it hard to see the "real" code, and legibility is
vital.

Does this mean you don't have any specific guidelines (ie,
objective rather than subjective) for when/where casting should
be done? Criteria like "make the code clearer" or "too many
casts" might be good as philosophical principles but they are too
subjective to qualify even as guidelines.
The use of size-specific types is part of making code
platform-agnostic. "int", and "integer promotions" are /not/
platform independent - they are dependent on the size of the
target's "int". So when you need calculations with specific
sizes, using explicit conversions and casts rather than implicit
ones is part of avoiding platform-specific code.

Using size-specific types /in declarations/ is part of making code
less platform specific. It is never necessary to use casts to
accomplish this. Also, you missed my point about the particular
types used (which were types like 'uint32_t'). These types are
not available in all implementations. If what is wanted is
arithmetic that is at least 32 bits, and an integer conversion rank
at least as big as 'int', it's easy to get that without resorting
to casting, or even using any type names at all.

I think I over-stated my case in my first post in this branch - I
don't add explicit casts in /all/ cases when types change, but I
do add them when I need to be sure and clear exactly how and when
they change. This means I use some casts that are unnecessary, or
might be unnecessary depending on the target, but I don't do it
/all/ the time. For example, I would not cast a uint16_t to
uint32_t before assigning to a uint32_t variable without
particularly good reasons. But I /would/ be likely to cast a
uint32_t value to a uint16_t before assigning to a uint16_t
variable, as it makes it clear that I am reducing the range of the
variable. (I might alternatively use something like an "& 0xffff"
mask - again, it is not needed, but it can make the intention
clearer.)

I think I mostly agree with what you are trying to do. What I
disagree on is that using a cast is a good way to accomplish those
goals. An unnecessary cast does not reveal information but
conceals it, muddying the water rather than clarifying it.
Note that this is not just my idea - Misra coding standards have
a rule "Implicit conversions which may result in a loss of
information shall not be used."

The Misra coding rules are not best practices. Looking over the
list (the 2004 version), many of its rules are nothing more than
coding standard dogma, and often bad dogma. The rules they have
regarding casting are particularly egregious.
There is a balance to be achieved here - writing the casts
explicitly can make some code clearer, but make other code harder
to read. They can hide some diagnostic messages, but allow other
diagnostics to be enabled (by letting you hide the messages when
you need to).

Yes but it isn't necessary to use casts to avoid such messages.
And the habit of using casts weakens the value of the messages,
because people will start to add them reflexively in response
to getting the warnings.
In the case of the OP, he has:

uint16_t foo;
uint64_t bar;

bar = (foo << 16);

He needs to lengthen foo to at least 32 bits unsigned in order to
work correctly for all values of foo. I think - but I'm not
entirely sure - that for a target with 32-bit ints, and foo
having 0 in its MSB (as the OP said), then "bar = (foo << 16)"
will give the correct behaviour as it stands, with nothing
undefined or implementation-dependent [beyond the fact of int's
being 32 bits, presumably].

"In cases where you are uncertain, look up the rule and remove the
uncertainty." - paraphrased from Keith Thompson. Also, if you
think it will be the same but aren't sure, then write an assert()
to test it

assert( (uint32_t) foo << 16 == foo << 16 );

The assert does a much better job of communicating what is in the
author's mind than a straight cast does. (Of course, another
formulation that avoids both casting and the need for a checking
assertion is another possibility.)
However, I
/know/ that "bar = ((uint32_t) foo << 16)" will work as the user
expects, with no undefined or implementation-dependent behaviour.

Not so, because uint32_t is not present in all implementations,
not even in all of those that are C99 implementations.
It
will also work with 16-bit ints, and it makes it clear to the
reader exactly what is going on.

This is sort of like saying 'goto' makes it clear where execution
will continue. That is true at one level, but in a more important
way it's wrong.
So is that cast strictly necessary? No, I don't think so. Does
it make the code better? Yes, I believe so.

It is arguably an improvement over 'foo << 16', but that doesn't
mean it passes muster. Find a way to write it that you're sure
will work and doesn't need any casts. After doing that, find a
way to write it that you're sure will work and doesn't need any
type names or declarations.
However, adding an extra "(uint64_t)" cast before the assignment,
as I first wrote, is probably excessive. It is a stylistic
choice whether it is included or not.

(I am sure I use unnecessary casts at other times that you will
disagree with more strongly. Style is always open to debate.)

Here I think you are using the word "style" in a way that's
inappropriate. A programming choice falls under the heading of
style when the various choices are directly and obviously
behaviorally equivalent, where "behavior" includes both the
semantics of the program code and how the compiler acts upon
processing that code. A style choice is a choice that makes a
difference /only/ to human readers, not to how the program works
or what the compiler does in the different cases. My objections
to unnecessary casting are that it isn't obvious whether or not
it makes a difference, and often the use of a cast /will/ make a
difference (specifically in terms of warning messages) that is
undesired. The issues here are not style issues -- at least, not
in their high-order bits. So I suspect the word "style" is being
used here to avoid having to defend the practice of unnecessary
casting.
 

David Brown

Tim Rentsch said:
[snip - post quoted in full above]

Hi Tim,

You've given me a lot to think about here - thank you. I've read
through your post - I'll be re-reading it, and considering it in light
of code I have written and code I am writing. I need to look through
some of my code to see what my real practice is, and how often I
/actually/ use unnecessary casts - I don't think it is nearly as common
as I seem to have implied. But in light of the points you have made, I
might aim to use even fewer in the future - or at least, make sure it is
clearer in the code /why/ they are there.

So rather than trying to answer you point-for-point, I will try to
understand you point-for-point, and see if I can learn from it and
improve my code. (The same applies to other posts by you and others in
this thread.)


One point, however - you said a couple of times that "uint32_t" might
not exist on all platforms. This is true, but it is very rare (even
with pre-C99 toolchains, it is common to implement a basic <stdint.h>
equivalent header with these types). Few people need to write code that
will work on such a wide range of systems that it includes those without
uint32_t - so normally it is fine to assume these types exist and let
the program fail to compile on such difficult targets. The same applies
to other types - I occasionally use targets that don't have 8-bit types,
and much of my code would fail to compile on such devices.


Another side issue is Misra. I mentioned it merely to show that I am
not the only one who uses explicit casts even when sometimes
unnecessary. Apart from that, I agree with your opinions on Misra -
there is a lot about it that I don't like. I think there are perhaps
50% good rules, 50% bad rules, and 50% obvious rules - with a smattering
of contradictory rules for good measure.

David
 

Tim Rentsch

David Brown said:
[snip rentsch's rants]

Hi Tim,

You've given me a lot to think about here - thank you.

Great response, and I appreciate you taking it that way. This
was in fact my hope despite the tone being rather strident in
places.
I've read through your post - I'll be re-reading it, and
considering it in light of code I have written and code I am
writing. [snip elaboration]

Outstanding! I'm looking forward to hearing about what you
discover.
One point, however - you said a couple of times that "uint32_t"
might not exist on all platforms. This is true, but it is very
rare (even with pre-C99 toolchains, it is common to implement a
basic <stdint.h> equivalent header with these types). [snip
elaboration]

I take your point. However, something you may not know is that
there are additional conditions that need to be met besides just
having a type of the right width for uint32_t, etc. For example,
an implementation that has 8-bit characters and uses two's
complement for signed char, but which has a range of -127 to 127
(perhaps to provide a "never been assigned" value for signed
characters) is /not allowed/ to define uint8_t - and this despite
there being an 8-bit unsigned type! So if one is intending to
promote platform-agnostic code (which I agree is a good goal)
then as a rule uint32_t, etc, should be used as sparingly as
possible. For example, in expressions that need promotion,
uint_least32_t or uint_fast32_t are better choices, as these
types are required in all C99 implementations (and presumably
will be available in C90 implementations that have <stdint.h>
as well). And there are other ways that offer even broader
support but I will leave that for another time.
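
Applied to the original fragment, a minimal sketch (the temporary's
name is just illustrative):

#include <stdint.h>

uint16_t foo;
uint64_t bar;

void update(void)
{
    /* uint_least32_t must exist in every C99 implementation, unlike
       the exact-width uint32_t, and is wide enough that the shifted
       value is always representable. */
    uint_least32_t wide = foo;
    bar = wide << 16;
}
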
Another side issue is Misra. I mentioned it merely to show that
I am not the only one who uses explicit casts even when sometimes
unnecessary.

Yes, unfortunately this practice is all too common. That's
partly why I think it's important to offer another perspective.
Apart from that, I agree with your opinions on
Misra - there is a lot about it that I don't like. I think there
are perhaps 50% good rules, 50% bad rules, and 50% obvious rules
- with a smattering of contradictory rules for good measure.

To put it in the most favorable light, the people who did Misra
originally probably were confronted with an environment where
there was no discipline whatsoever, and decided that some common
discipline was better than no discipline even if the common rules
were sub-optimal. And that may be right. Still, it would be
nice if more thought had gone into what rules really help rather
than impose somewhat arbitrary restrictions. My sense of Misra is
that these rules address the symptoms of a problem a lot more than
they do the underlying disease.
 

David Brown

David Brown said:
[snip rentsch's rants]

Hi Tim,

You've given me a lot to think about here - thank you.

Great response, and I appreciate you taking it that way. This
was in fact my hope despite the tone being rather strident in
places.
I've read through your post - I'll be re-reading it, and
considering it in light of code I have written and code I am
writing. [snip elaboration]

Outstanding! I'm looking forward to hearing about what you
discover.
One point, however - you said a couple of times that "uint32_t"
might not exist on all platforms. This is true, but it is very
rare (even with pre-C99 toolchains, it is common to implement a
basic <stdint.h> equivalent header with these types). [snip
elaboration]

I take your point. However, something you may not know is that
there are additional conditions that need to be met besides just
having a type of the right width for uint32_t, etc. For example,
an implementation that has 8-bit characters and uses two's
complement for signed char, but which has a range of -127 to 127
(perhaps to provide a "never been assigned" value for signed
characters) is /not allowed/ to define uint8_t - and this despite
there being an 8-bit unsigned type!

Yes, and I actually knew that one. But again, such systems are
incredibly rare. In general, it is a good idea to take advantage of the
capabilities provided by 99% of current platforms (and I would guess 99%
is an underestimate here) when it lets you write better code for these
systems. If you happen to be in a position to use and code for a system
with -127..127 range signed chars, then deal with it at the time.

For most people, and most code, this also applies to the assumption that
int8_t and int16_t exist, even though there are current platforms (some
types of DSP chips) that don't support them.
So if one is intending to
promote platform-agnostic code (which I agree is a good goal)
then as a rule uint32_t, etc, should be used as sparingly as
possible. For example, in expressions that need promotion,
uint_least32_t or uint_fast32_t are better choices, as these
types are required in all C99 implementations (and presumably
will be available in C90 implementations that have <stdint.h>
as well). And there are other ways that offer even broader
support but I will leave that for another time.

I agree that types such as uint_least32_t and int_fast32_t are good
types to use, though I seldom use them myself (since they are a bit
ugly, and much of my code does not need to work efficiently across a
wide range of devices). I don't think they are useful because they
exist even on weird platforms, but because they are very descriptive of
exactly what you want the type to do, and because they can give more
efficient code across a range of platforms. For example, if your code
should work well on very small devices, then you will use 8-bit types a
lot - but these same types could be slower on big processors. So
"int_fast8_t" would be the ideal choice.
Yes, unfortunately this practice is all too common. That's
partly why I think it's important to offer another perspective.

I agree - I have seen "Misra" specified by customers, or given as a
feature of code, without an understanding of what it actually is and means.
To put it in the most favorable light, the people who did Misra
originally probably were confronted with an environment where
there was no discipline whatsoever, and decided that some common
discipline was better than no discipline even if the common rules
were sub-optimal. And that may be right.

Agreed.

They also wanted to make a system that would work even with the tools
available at the time for the devices common in the industry (the
European automotive industry) - and many of these toolchains were
terrible. Still, they could have for example specified that if your
toolchain won't give warnings on "if (a = 1) ...", then you should use
"lint" in your setup - it would have been much better than saying you
should write backwards "if (1 == a)...".
Still, it would be
nice if more thought had gone into what rules really help rather
than impose somewhat arbitrary restrictions. My sense of Misra is
that these rules address the symptoms of a problem a lot more than
they do the underlying disease.

Yes.
 
