Serious conformance bug in <some_compiler>


teapot

Consider this code:

/* foo.c */
#include <stdio.h>

int main(void)
{
    long long unsigned int arr[2] = { 1, 1 };

    printf("arr[0]: %llu; arr[1]: %llu\n", arr[0], arr[1]);

    return 0;
}
/* end foo.c */


Compiled with gcc in C99 mode, program output is:

arr[0]: 1; arr[1]: 1


However, <some_compiler> (in its default, C99 mode)
issues the following diagnostics:

Warning foo.c: 8 printf argument mismatch for format u. Expected long long got unsigned int
Warning foo.c: 8 printf argument mismatch for format u. Expected long long got unsigned int
0 errors, 2 warnings


and the resulting program prints

arr[0]: 4294967297; arr[1]: 9222791494719509029


Open questions:
1. Would not "%llu" require ``unsigned long long'', as opposed
to ``long long'' like the compiler's diagnostic claims?
2. Is not the code calling printf with two arguments of type
``unsigned long long''?
3. Do the printed values seem right?
 
Barry Schwarz

Consider this code:

/* foo.c */
#include <stdio.h>

int main(void)
{
    long long unsigned int arr[2] = { 1, 1 };

    printf("arr[0]: %llu; arr[1]: %llu\n", arr[0], arr[1]);

    return 0;
}
/* end foo.c */


Compiled with gcc in C99 mode, program output is:

arr[0]: 1; arr[1]: 1


However, <some_compiler> (in its default, C99 mode)
issues the following diagnostics:

Warning foo.c: 8 printf argument mismatch for format u. Expected long long got unsigned int
Warning foo.c: 8 printf argument mismatch for format u. Expected long long got unsigned int
0 errors, 2 warnings


and the resulting program prints

arr[0]: 4294967297; arr[1]: 9222791494719509029


Open questions:
1. Would not "%llu" require ``unsigned long long'', as opposed
to ``long long'' like the compiler's diagnostic claims?
2. Is not the code calling printf with two arguments of type
``unsigned long long''?
3. Do the printed values seem right?

6.7.2-2 identifies the legal types. "unsigned long long" and
"unsigned long long int" are listed. "long long unsigned" and "long
long unsigned int" never appear in the standard.

The bug in the compiler that your code demonstrates is the failure to
produce a diagnostic for the invalid syntax consisting of the ignored
"long long" that precedes your definition of an array of unsigned int.
 
Kaz Kylheku

However, <some_compiler> (in its default, C99 mode)
issues the following diagnostics:

Warning foo.c: 8 printf argument mismatch for format u. Expected long long got unsigned int

Looks like <some_compiler> doesn't understand all permutations of type
specifiers; it somehow thinks that ``long long unsigned int'' is
just ``unsigned int''.

As a guess, this could be a consequence of trying to match valid type specifier
combinations using phrase structure rules, and missing some combinations.

I.e. perhaps ``unsigned long long'' is understood, but throw an int in there,
or move the unsigned and there is a problem.
Warning foo.c: 8 printf argument mismatch for format u. Expected long long got unsigned int
0 errors, 2 warnings


and the resulting program prints

arr[0]: 4294967297; arr[1]: 9222791494719509029


Open questions:
1. Would not "%llu" require ``unsigned long long'', as opposed
to ``long long'' like the compiler's diagnostic claims?

No. RTFS.

ll (ell-ell) Specifies that a following d, i, o, u, x, or X conversion
specifier applies to a long long int or unsigned long long int
argument; or that a following n conversion specifier applies to a
pointer to a long long int argument.
2. Is not the code calling printf with two arguments of type
``unsigned long long''?
Yes.

3. Do the printed values seem right?

Do trolls ask a lot of rhetorical questions? You bet.

Programmers who write ``long long unsigned int'' instead of ``unsigned long long'' should be shot.

And only then should compilers which do not handle that combination be fixed.

The C grammar is incredibly brain damaged in the first place to allow
specifiers in multiple orders. There is no advantage in allowing all those
permutations.

The standard should identify canonical forms for all of the types, and mark the
other combinations (e.g. ``int unsigned'' and other bullshit) as obsolescent.
 
Keith Thompson

Barry Schwarz said:
6.7.2-2 identifies the legal types. "unsigned long long" and
"unsigned long long int" are listed. "long long unsigned" and "long
long unsigned int" never appear in the standard.

The bug in the compiler that your code demonstrates is the failure to
produce a diagnostic for the invalid syntax consisting of the ignored
"long long" that precedes your definition of an array of unsigned int.

You need to read 6.7.2-2 more closely:

Each list of type specifiers shall be one of the following sets
(delimited by commas, when there is more than one set on a line);
the type specifiers may occur in any order, possibly intermixed
with the other declaration specifiers.

So "long long unsigned" and "long long unsigned int" are perfectly
legal synonyms for "unsigned long long int".

(Since the keyword "long" may appear twice, N1336, the preliminary
C201x draft, uses the word "multiset" rather than "set".)
 
Keith Thompson

teapot said:
Consider this code:

/* foo.c */
#include <stdio.h>

int main(void)
{
    long long unsigned int arr[2] = { 1, 1 };

    printf("arr[0]: %llu; arr[1]: %llu\n", arr[0], arr[1]);

    return 0;
}
/* end foo.c */


Compiled with gcc in C99 mode, program output is:

arr[0]: 1; arr[1]: 1
Good.

However, <some_compiler> (in its default, C99 mode)
issues the following diagnostics:

Warning foo.c: 8 printf argument mismatch for format u. Expected long long got unsigned int
Warning foo.c: 8 printf argument mismatch for format u. Expected long long got unsigned int
0 errors, 2 warnings


and the resulting program prints

arr[0]: 4294967297; arr[1]: 9222791494719509029
[...]

We had a lengthy discussion some time ago about the fact that
<some_compiler> doesn't properly handle all the permutations of type
specifiers required by C99 6.7.2. It's a bug in <some_compiler>;
there's not much of a C language issue.

If you want to complain about a bug in <some_compiler>, contact its
maintainer, or post to comp.compilers.<some_compiler>. Meanwhile, you
can use another compiler or modify your code to use an order that
<some_compiler> can handle.
 
Keith Thompson

Richard Heathfield said:
teapot said: [...]
3. Do the printed values seem right?

No, but that's not gcc's problem. It's the library's problem.

The incorrect output was from the program compiled with
<some_compiler>, not gcc. If my educated guess is correct,
<some_compiler> and its runtime library are from the same source --
but based on the description, the problem does appear to be the
compiler's fault.
 
Barry Schwarz

You need to read 6.7.2-2 more closely:

Each list of type specifiers shall be one of the following sets
(delimited by commas, when there is more than one set on a line);
the type specifiers may occur in any order, possibly intermixed
with the other declaration specifiers.

So "long long unsigned" and "long long unsigned int" are perfectly
legal synonyms for "unsigned long long int".

Yes you are right. So "double long", "unsigned int long long", and
"long int long unsigned" are all legal and now the people who like
1["abc"] can have more fun. And just think of all the options if you
add static, extern, or volatile.

While we are in that paragraph - what does the parenthetical phrase
mean? Since very little C code is affected by line boundaries
(a string literal is the only instance that comes to mind), why the
reference to "on a line"? If the second set happens to be on the next
line, is the comma omitted? Other than the parameter list of a
function declaration, is there another place where there is "more than
one set"?
 
Harald van Dijk

You need to read 6.7.2-2 more closely:

Each list of type specifiers shall be one of the following sets
(delimited by commas, when there is more than one set on a line);
the type specifiers may occur in any order, possibly intermixed with
the other declaration specifiers.

So "long long unsigned" and "long long unsigned int" are perfectly legal
synonyms for "unsigned long long int".
Yes you are right. So "double long", "unsigned int long long", and
"long int long unsigned" are all legal and now the people who like
1["abc"] can have more fun. And just think of all the options if you
add static, extern, or volatile.

While we are in that paragraph - what does the parenthetical phrase
mean?

It means that when you see "int, signed, or signed int" in the list, it
means

int a;
signed b;
signed int c;

are all valid, but

int signed signed int d;

is invalid. :)
 
Flash Gordon

While we are in that paragraph - what does the parenthetical phrase
mean? Since very little C code is affected by line boundaries
(a string literal is the only instance that comes to mind), why the
reference to "on a line"? If the second set happens to be on the next
line, is the comma omitted? Other than the parameter list of a
function declaration, is there another place where there is "more than
one set"?

The parenthetical phrase is referring to the table in the standard, not
to C code.
 
Keith Thompson

I think whoever came up with "long long" should be shot. What the hell was
wrong with making "long" be 64 bits, int 32 bits, etc.? If 128 bits ever
become justified, what are we going to have, really long long int or some
such crap, or are they going to do what they should have done, and (gasp)
change the widths of the types, which were never defined to have a
specific width anyway? Sorry for the tangent...

The problem is that the names of the predefined integer types are not
arbitrary, or at least are not seen to be arbitrary. char must be
exactly 1 byte, int is supposed to be the natural size for the
machine, and there are similar preconceptions about long. This, along
with the fact that it's convenient to have a predefined type for
each supported size (commonly 8, 16, 32, and 64 bits) places some
constraints on the way the types can be defined.

You certainly *could* have a system where char, short, int, and long
are 8, 16, 32, and 64 bits, respectively, but on most systems that
would break existing code that makes assumptions about the sizes of
the predefined types.

With C99's introduction of int64_t et al, plus extended integer types
whose names needn't be one of the language-defined sequences of
keywords, we *might* be able to get away with this; if you need a
128-bit type, you can just use int128_t (or int_least128_t, or
int_fast128_t) and not worry about whether it happens to correspond to
long long or even short. (Unfortunately the macros for the printf and
scanf formats are ugly, but you can always convert to and from
intmax_t.)
 
Barry Schwarz

The parenthetical phrase is referring to the table in the standard, not
to C code.

So the parenthetical expression is specifying how to read the table,
not how to construct code. OK.
 
Dik T. Winter

> I think whoever came up with "long long" should be shot.

Shoot the designers of Algol 68, they came up with the possibility of an
arbitrarily long list of longs.
 
CBFalconer

Dik T. Winter said:
Shoot the designers of Algol 68, they came up with the possibility
of an arbitrarily long list of longs.

Compare the design of Pascal. There there is only one value,
maxint, to worry about. The system can handle anything up to that
(and down to -maxint). If you want smaller ranges, simply define
the range involved, eg:

CONST
  maxsmall = 200;

TYPE
  small = 1 .. maxsmall;

and use it:

VAR
  foo : small;

and now the compiler will ensure that the storage has sufficient
'size', and that the arithmetic operators work on variables of type
small, and that they can interact with other variables. Notice
that the process has also defined limits so any checking has the
information it needs. How much checking is done, and how closely
the storage matches the desired range, is a compiler and system
dependent thing, and does not affect the accuracy of the code,
although it does affect the errors detected.

Unfortunately many so-called Pascal implementations do not follow
this simple arrangement, resulting in complications, and the
equivalent of defining short, int, long etc. in C.
 
Bartc

Dik T. Winter said:
Shoot the designers of Algol 68, they came up with the possibility of an
arbitrarily long list of longs.

That wasn't one of the best features of that otherwise excellent language.

C might at least also have borrowed the type declaration syntax from it
instead of the convoluted mess it uses at the moment that even experts have
trouble with.
 
Stupid Echo

Dik T. Winter said:
Shoot the designers of Algol 68, they came up with the possibility
of an arbitrarily long list of longs.

Compare the design of Pascal. There there is only one value,
maxint, to worry about. The system can handle anything up to that
(and down to -maxint). If you want smaller ranges, simply define
the range involved, eg:

CONST
  maxsmall = 200;

TYPE
  small = 1 .. maxsmall;

and use it:

VAR
  foo : small;

and now the compiler will ensure that the storage has sufficient
'size', and that the arithmetic operators work on variables of type
small, and that they can interact with other variables. Notice
that the process has also defined limits so any checking has the
information it needs. How much checking is done, and how closely
the storage matches the desired range, is a compiler and system
dependent thing, and does not affect the accuracy of the code,
although it does affect the errors detected.

Unfortunately many so-called Pascal implementations do not follow
this simple arrangement, resulting in complications, and the
equivalent of defining short, int, long etc. in C.
 
CBFalconer

blargg said:
Stupid Echo wrote:
[...]
Compare the design of Pascal. There there is only one value,
maxint, to worry about. The system can handle anything up to that
(and down to -maxint). If you want smaller ranges, simply define
the range involved, eg:

CONST
  maxsmall = 200;

TYPE
  small = 1 .. maxsmall;

and use it:

VAR
  foo : small;

and now the compiler will ensure that the storage has sufficient
'size', and that the arithmetic operators work on variables of type
small, and that they can interact with other variables. Notice
that the process has also defined limits so any checking has the
information it needs. How much checking is done, and how closely
the storage matches the desired range, is a compiler and system
dependent thing, and does not affect the accuracy of the code,
although it does affect the errors detected.

We can get the same arrangement in C, with regards to the compiler
determining how much checking is done (in this case, no checking is
done):

#define RANGED_INT( min, max ) int

RANGED_INT( 1, 200 ) small;

You are actually replying to me. Stupid Echo has been copying my
posts verbatim and replacing the originator (from field) with his
own pseudonym.
 
David Thompson

You need to read 6.7.2-2 more closely:

Each list of type specifiers shall be one of the following sets
(delimited by commas, when there is more than one set on a line);
the type specifiers may occur in any order, possibly intermixed
with the other declaration specifiers.

So "long long unsigned" and "long long unsigned int" are perfectly
legal synonyms for "unsigned long long int".

Yes you are right. So "double long", "unsigned int long long", and
"long int long unsigned" are all legal and now the people who like
1["abc"] can have more fun. And just think of all the options if you
add static, extern, or volatile.
or the other qualifiers (const, restrict) or sc-specs (auto, register,
typedef). C90 (6.9.3) did make sc-spec other than first 'obsolescent'.
But C99 didn't follow through and actually delete it.

And s/now// . This has been a feature(?) of C since its beginning.
(Although the particular example types you used were not original.)
 
