how can i generate warnings for implicit casts that lose bits?

  • Thread starter robert bristow-johnson

robert bristow-johnson

jacob navia said:





No, gcc is off-topic here, just like lcc-win32 is off-topic here. The
proper course for respondents would have been to examine the source the
OP provided, to see whether the issue could somehow be resolved using
standard C. If not, they should have referred the OP to a gcc-specific
group.

as the OP, i didn't think this was so off-topic for comp.lang.c (i
cross-posted to comp.dsp because that is where i loiter and i know
that some of those guys think about nasty details like this). it
wasn't until someone pointed me to gnu.gcc.something that i had any
idea of the "proper" newsgroup to plop this onto.

i think jacob was as reasonably on-topic as can be expected. (at least
you should see how conversations drift at comp.dsp. be careful there
because i have been known to rant if the provocation is sufficient.
and i'm not the only one.)

i gotta find that gnu.gcc.whatever group and post the question there.

r b-j


 

robert bristow-johnson

I haven't tested it, but I think -Wconversion will generate the
warnings you require on things like:

long a ;
int b = 3 ;

a = b ;

why would it do that? there is no loss of information in that
assignment. did you mean b = a;? (then it would have to be established
that sizeof(int) < sizeof(long), or else even that would not be a
potentially bad assignment.)
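
for reference, a minimal sketch of both directions of that assignment
(assuming sizeof(int) < sizeof(long)):

long a;
int b = 3;

a = b;    /* widening: every int value fits in a long, so no bits are lost */
b = a;    /* narrowing: high-order bits can be dropped when the long value
             is outside the range of int; this implicit conversion is the
             one that arguably deserves a -Wconversion-style warning */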
The problem with printf is that it uses <stdarg.h> variable
parameter passing and hence can't do the same checks.

the conversion that happens when a function is called is another (but
related) issue, and you're right, printf() has no way to know by itself
what the sizes of the args are (except that we inform it with all of
the %d or %hd or %ld), and my little test program in the original
post demonstrated that with the apparent sign extension done with the
short args. the signed short (a_short_array[26]) was sign extended to
ffff-something when placed on the stack and passed to printf(), but
with the %hx field, only the bottom 16 bits were shown. with the %x
field, it showed the 16 bits of the argument in addition to the 16 bits
of sign extension. either way, when the 32-bit long was assigned to a
16-bit short, that should have generated a warning when -Wconversion
was set, in my opinion. and it didn't.
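
a minimal sketch of that effect (not the original test program; it
assumes a 16-bit short and 32-bit int/long and the usual two's-complement
argument passing):

#include <stdio.h>

int main(void)
{
    long  a_long = -32768L;   /* just fits in a 16-bit short */
    short a_short;

    a_short = a_long;         /* 32-bit long assigned to 16-bit short: the
                                 kind of narrowing one would expect
                                 -Wconversion to flag */

    printf("%hx\n", a_short); /* prints 8000: only the bottom 16 bits */
    printf("%x\n", a_short);  /* prints ffff8000 here: the default argument
                                 promotion to int carries the sign extension
                                 (strictly, %x wants an unsigned int, so this
                                 merely mirrors what the original test showed) */
    return 0;
}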

r b-j
 

Keith Thompson

CBFalconer said:
There is no possible reason for a warning in the second. Assigning
a shorttype to shortthing is a completely normal action. There
never is an imperative warning.

There is a *possible* reason to warn about the cast (not about the
assignment). Converting bigthing to shorttype can lose information.
 

robert bristow-johnson

Keith Thompson wrote: ....

Agreed, which is what I was trying to point out.

in my opinion, when a programmer uses an explicit cast, the compiler
should be able to assume the guy knows what he is doing (and no
warning is necessary). it is the implicit conversions that happen when
one type is assigned to another type, or passed as an argument that was
meant to be (and declared as) another type, that might need warnings.
when we do that without an explicit (destination_type) cast, there
should be a warning, or at least the option of one, if and only if
there is a possible change of value, even if the word size increases.

this code:

unsigned long an_unsigned_long;
short a_short;
...
an_unsigned_long = a_short;

should generate such a warning, even if the bits get bigger.
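
and for why even the widening direction can change the value, a minimal
sketch (assuming a_short happens to be negative):

unsigned long an_unsigned_long;
short a_short = -1;

an_unsigned_long = a_short;   /* -1 is converted to unsigned long by adding
                                 ULONG_MAX + 1, so an_unsigned_long becomes
                                 0xffffffff on a 32-bit long: the value
                                 changes even though the destination is
                                 wider, which is exactly the case that
                                 deserves an optional warning */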

r b-j
 

Randy Yates

robert bristow-johnson said:
in my opinion, when a programmer uses an explicit cast, the compiler
should be able to assume the guy knows what he is doing (and no
warning is necessary). it is the implicit conversions that happen when
one type is assigned to another type, or passed as an argument that was
meant to be (and declared as) another type, that might need warnings.
when we do that without an explicit (destination_type) cast, there
should be a warning, or at least the option of one, if and only if
there is a possible change of value, even if the word size increases.

this code:

unsigned long an_unsigned_long;
short a_short;
...
an_unsigned_long = a_short;

should generate such a warning, even if the bits get bigger.

I agree 100 percent.

And just to establish that I've "been around the C block," this year
marks my 18th year using the language. I've used compilers from multiple
vendors under multiple platforms.
--
% Randy Yates % "She has an IQ of 1001, she has a jumpsuit
%% Fuquay-Varina, NC % on, and she's also a telephone."
%%% 919-577-9882 %
%%%% <[email protected]> % 'Yours Truly, 2095', *Time*, ELO
http://home.earthlink.net/~yatescr
 

glen herrmannsfeldt

robert bristow-johnson wrote:
(snip)
in my opinion, when a programmer uses an explicit cast, the compiler
should be able to assume the guy knows what he is doing (and no
warning is necessary). it is the implicit conversions that happen when
one type is assigned to another type, or passed as an argument that was
meant to be (and declared as) another type, that might need warnings.
when we do that without an explicit (destination_type) cast, there
should be a warning, or at least the option of one, if and only if
there is a possible change of value, even if the word size increases.

C is intended to be a relatively low-level language, and programmers
are assumed to know what they are doing. Traditionally, warnings like
the ones you mention might have been generated by lint, though I don't
know that it ever specifically did that one. Too much code has been
written assuming those conversions work. Note, for example, that

short i;
i=i+2;

would give a warning as i+2 is int, converted to short by assignment.

On the other hand, Java requires a cast for all narrowing assignments
as part of the language definition. That is sometimes inconvenient,
but mostly it reminds the programmer to think before writing. Java is
not intended to be as (relatively) low-level a language as C.
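
a minimal sketch contrasting the two spellings in C, the second being
what a Java-style narrowing rule effectively demands:

short i = 0;

i = i + 2;           /* i + 2 has type int; the assignment quietly narrows
                        it back to short, which is the conversion a
                        -Wconversion-style option would flag */
i = (short)(i + 2);  /* the explicit cast says "yes, I meant to narrow";
                        this is what Java requires for the same assignment */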

-- glen
 

David Thompson

The C standard screwed up when it chose to use the term "byte" to
refer to what is really a storage unit. Everyone who owns a hard drive
knows that a byte is 8 bits. So do most programmers, regardless of
their programming language choice.
Not people who own (or owned) the drives or disks used on PDP-10 or
-6, PDP-8 or -12, GE-635/645/HIS6180, and at least some CDC machines.
And probably quite a few more, although some of the non-8-bit-byte
machines existed before disks became affordable or even possible.
The Ada standard got it right, because it chose to use the term
"storage unit" to refer to what the C standard refers to as a "byte".
Ada83 pre-dates C90 by enough years--go figure.

An IMNVHO underappreciated benefit of Ada is that its terminology, and
particularly its language keywords, were carefully chosen -- admittedly
starting from a clean slate -- not only to be precise _and_ clear, but
to fit together well. IME it's the only serious language other than
COBOL in which one can write reasonably good code that also reads as
tolerable prose. (I consider Knuth's literate programming a
superstructure/methodology rather than a language as such.)

- formerly david.thompson1 || achar(64) || worldnet.att.net
 

Jerry Avins

David Thompson wrote:


An IMNVHO underappreciated benefit of Ada is that its terminology, and
particularly its language keywords, were carefully chosen -- admittedly
starting from a clean slate -- not only to be precise _and_ clear, but
to fit together well. IME it's the only serious language other than
COBOL in which one can write reasonably good code that also reads as
tolerable prose. ...

Forth?

Jerry
 

Eric Jacobsen

Not people who own (or owned) the drives or disks used on PDP-10 or
-6, PDP-8 or -12, GE-635/645/HIS6180, and at least some CDC machines.
And probably quite a few more, although some of the non-8-bit-byte
machines existed before disks became affordable or even possible.

I don't remember any of the non-8-bit entities being called "bytes",
though. On the CDC machines there were 60-bit "words" and six-bit
characters, but I don't remember calling anything "bytes" that weren't
eight bits. We used to call the 3-bit octal sections in the PDPs
either "octal digits" or chunks or something like that, but "bytes"
were always eight bits to match the power-of-two partitioning.

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org
 

Jerry Avins

Eric Jacobsen wrote:

...
I don't remember any of the non-8-bit entities being called "bytes",
though. On the CDC machines there were 60-bit "words" and six-bit
characters, but I don't remember calling anything "bytes" that weren't
eight bits. We used to call the 3-bit octal sections in the PDPs
either "octal digits" or chunks or something like that, but "bytes"
were always eight bits to match the power-of-two partitioning.

I think the issue is what the C standard defined, not what people called
them.

Jerry
 

Harald van Dijk

Jerry said:
Eric Jacobsen wrote:

...


I think the issue is what the C standard defined, not what people called
them.

jaysome claimed the standard screwed up because it did not match existing
practice, and for that, the words that people used do matter.
 

Jean-Marc Bourguet

Harald van Dijk said:
jaysome claimed the standard screwed up because it did not match existing
practice, and for that, the words that people used do matter.

The historical meaning of byte is for sure this one:

"A group of bits sufficient to represent one character is called a _byte_
-- a term coined in 1958 by Werner Buchholz."
Computer Architecture, Concepts and Evolution
Gerrit A. Blaauw and Frederick P. Brooks, Jr.

(They give as reference a paper of 1959 which they co-authored with
Buchholz).

The IBM 7030 (aka Stretch) had instructions to manipulate bytes (the
architecture description uses the term byte) whose width was anything
between 1 and 8 bits. BTW, Buchholz, Brooks and Blaauw were all three
part of the architecture team of Stretch. The character set designed for
Stretch was the first 8-bit one I know of.

The DEC PDP-10 had instructions to manipulate bytes (again, the architecture
description uses the term byte) whose width was anything between 1 and 36
bits, and it was commonly used with 7-bit ASCII characters (yes, there was a
lost bit per word).

Yours,
 

CBFalconer

Jean-Marc Bourguet said:
.... snip ...

The DEC PDP-10 had instructions to manipulate bytes (again, the
architecture description uses the term byte) whose width was
anything between 1 and 36 bits, and it was commonly used with 7-bit
ASCII characters (yes, there was a lost bit per word).

And the janitors made a fortune sweeping up those bits and hawking
them as PDP 11 memory. They were the heart of the bit serial PDP
11 model.
 

glen herrmannsfeldt

Jean-Marc Bourguet wrote:
(snip)
The historical meaning of byte is for sure this one:
"A group of bits sufficient to represent one character is called a _byte_
-- a term coined in 1958 by Werner Buchholz."
Computer Architecture, Concepts and Evolution
Gerrit A. Blaauw and Frederick P. Brooks, Jr.
(They give as reference a paper of 1959 which they co-authored with
Buchholz).
The IBM 7030 (aka Stretch) had instructions to manipulate bytes (the
architecture description uses the term byte) whose width was anything
between 1 and 8 bits. BTW, Buchholz, Brooks and Blaauw were all three
part of the architecture team of Stretch. The character set designed for
Stretch was the first 8-bit one I know of.

I have the book; I will have to look at it. I always thought that
EBCDIC was designed for S/360.
The DEC PDP-10 had instructions to manipulate bytes (again, the architecture
description uses the term byte) whose width was anything between 1 and 36
bits, and it was commonly used with 7-bit ASCII characters (yes, there was a
lost bit per word).

The bit isn't lost if you have files with line numbers.

How much of the PDP-10 has heritage in the IBM 36-bit machines?

-- glen
 

glen herrmannsfeldt

Eric Jacobsen wrote:
(snip)
I don't remember any of the non-8-bit entities being called "bytes",
though. On the CDC machines there were 60-bit "words" and six-bit
characters, but I don't remember calling anything "bytes" that weren't
eight bits. We used to call the 3-bit octal sections in the PDPs
either "octal digits" or chunks or something like that, but "bytes"
were always eight bits to match the power-of-two partitioning.

The PDP-10 hardware calls anything between one and 36 bits a byte,
with the load and store byte instructions.

-- glen
 

Jean-Marc Bourguet

glen herrmannsfeldt said:
Jean-Marc Bourguet wrote:
(snip)





I have the book; I will have to look at it. I always thought that EBCDIC
was designed for S/360.

EBCDIC was designed for S/360. The character set designed for Stretch was
peculiar. For example, it is, along with Baudot, the only character set I
know of for which the digits are not consecutive.
The bit isn't lost if you have files with line numbers.

Thanks for reminding me of that.
How much of the PDP-10 has heritage in the IBM 36-bit machines?

I don't know. I've copied comp.arch and alt.folklore.computers and set the
follow-up there; people there probably know.

Yours,
 
