Does your C compiler support "//"? (was Re: using structures)

Douglas A. Gwyn

Nick said:
Perhaps someone should write a book on "Arithmetic Types For
Procedural Languages". I'd read it...

Given the limited market, it would pretty much have to be a
self-published book. Maybe after I retire...
 
Ross Ridge

No, it's not news to you. You're on record as having fixed GNU code
that assumed 32-bit ints and as a result wasn't portable to 64-bit CPUs.

http://mail.gnu.org/archive/html/bug-bash/2001-04/msg00096.html

Paul Eggert said:
Yes, I've fixed several bugs in that area. But it's a big leap to
suggest that the bugs were there because of the GNU coding standards.

No, not at all. Lots of GNU code that assumed 32-bit ints wasn't portable to
64-bit CPUs; either the GNU coding standards were being ignored completely
or the "assume at least 32-bit ints" requirement was being misinterpreted.
Since almost no GNU code is portable to 16-bit CPUs, this suggests to
me that the GNU coding standards weren't being completely ignored.
I've also fixed bugs in GNU code where the source code assumed ASCII;
shall we chalk them up to the GNU coding standards too?

Yes, of course. The GNU coding standards let you assume ASCII.

Ross Ridge
 
Keith Thompson


Making "byte" a keyword would probably have broken a lot of existing
code. I'm almost inclined to think it would have been worth it.

Assuming you have a predefined "byte" type, is it signed or unsigned?
Is the signedness implementation-defined? Do you have explicit
"signed byte" and "unsigned byte" types?

Existing code that typedefs "byte" almost always defines it as
unsigned char, which would argue for requiring "byte" to be unsigned
and allowing "signed byte".
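As a minimal sketch of that convention (the names here are purely
illustrative, not a proposal):

typedef unsigned char byte;          /* the de facto meaning in existing code */
typedef signed char   signed_byte;   /* what an explicit "signed byte" would
                                        presumably correspond to */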

Oh well, it's too late now. But forcing the unit for sizeof to be the
same as the size of a character was a poor choice, even if it happened
to work for all supported platforms at the time.
What would be the right way to do it? Have any other languages
managed better? Ada?

Ada has a predefined integer type Integer with a range of at least
-2**15+1 .. 2**15-1, and optionally another predefined integer type
Long_Integer, with a range of at least -2**31+1 .. 2**31-1.
Implementations may also provide any number of other predefined
integer types such as Short_Short_Integer, Short_Integer, and
Long_Long_Integer. Rather than predefined constants, Ada provides
attributes for all integer types, such as Integer'First, Integer'Last,
and Integer'Size (the latter is the size in bits).
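For comparison, a minimal sketch of the closest C counterparts to those
attributes, using only <limits.h> and the usual sizeof expression:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("Integer'First ~ INT_MIN: %d\n", INT_MIN);
    printf("Integer'Last  ~ INT_MAX: %d\n", INT_MAX);
    printf("Integer'Size  ~ %d bits\n", (int)(sizeof(int) * CHAR_BIT));
    return 0;
}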

Well-written Ada code tends to use user-defined integer types such as

type My_Integer is range 0 .. 1_000_000;

or, if you want the greatest possible range:

type Max_Integer is range System.Min_Int .. System.Max_Int;

Unsigned types can be declared as:

type Unsigned_16 is mod 2**16;

There's similar support for fixed-point and floating-point types.

I have some vague thoughts about adding such a facility to a future
version of C (letting a user declare an integer type with a specified
range), but I don't see much demand for it.
 
Keith Thompson

It was a bad idea, anyway: wchar_t is not supposed to be a type on its
own.

Why not? Is there any reason other than compatibility with the
current standard for not making it a distinct type?

If compatibility is the only problem, I agree with your point -- but I
might still argue that creating a distinct "long char" type would have
been cleaner than adding wchar_t as a typedef in C89/C90.

<OT>FWIW, C++ has wchar_t as a distinct type whose name is a keyword.</OT>
 
Paul Eggert

No, not at all. Lots of GNU code that assumed 32-bit ints wasn't portable to
64-bit CPUs; either the GNU coding standards were being ignored completely

There are many bugs like that in non-GNU code as well. It's a very
common error. Blaming this on the GNU coding standards -- when they
don't say at all that you can assume that ints are at most 32 bits --
is a bit weird. You might as well blame all coding errors on the
standard.
Yes, of course. The GNU coding standards let you assume ASCII.

Where? I just searched for "ASCII" in the coding standards, and don't
see where C programmers can assume that the character set is ASCII.
No doubt some ASCII-related bugs remain in GNU code, but most of it
has run in non-ASCII locales for quite some time.
 
Paul Eggert

C99 requires at-least-16-bit int because the standard on which it's
based, C90, requires at-least-16-bit int.

That doesn't significantly affect my point. The same logical argument
also applies against C90 requiring at-least-16-bit int.
 
Douglas A. Gwyn

Keith said:
Oh well, it's too late now. But forcing the unit for sizeof to be the
same as the size of a character was a poor choice, even if it happened
to work for all supported platforms at the time.

Absolutely. Once Dennis chimed in and insisted that it
was an essential property, the battle for a clean
separation became hopeless. My 1986 proposal covered
all aspects (mem*() were byte oriented, str*() were
char oriented; I/O streams opened in text mode were
char oriented, those opened in binary mode were byte
oriented).

You may have encountered code where sizeof(char)*... was
used, even though now "everybody knows" it has to be
precisely 1 and thus the multiplication seems pointless.
But if the units of storage hadn't necessarily matched
type char, then that would have been just as essential
as for any other array element type. (Another unrelated
reason for such a construct was that before size_t and
prototypes were invented, that was the easiest portable
way to make an argument have the right type; unsigned
long on some platforms and unsigned int on others.)
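A minimal sketch (in modern dress, with invented names) of the idiom being
described; the multiplication forces the count into whatever unsigned type
sizeof yields on that platform:

#include <stdlib.h>

char *make_buffer(unsigned n)
{
    /* Pre-ANSI code relied on the sizeof operand to give the argument
       the "right" unsigned type; today this is simply malloc(n). */
    return malloc(sizeof(char) * n);
}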
Ada has...

Java has separate byte and character basic types.
(Although they mandate a universal virtual machine so
there is no flexibility in the widths of these.)
I have some vague thoughts about adding such a facility to a future
version of C (letting a user declare an integer type with a specified
range), but I don't see much demand for it.

There won't be much demand for anything that won't be
widely supported. That's why it makes sense to work
for such improvements within the context of the language
standard.
 
Dan Pop

Keith Thompson said:
Why not? Is there any reason other than compatibility with the
current standard for not making it a distinct type?

If you make it a distinct integer type, you have to specify its place/rank
in the integer type system. This makes it a lot less flexible than having
it as a typedef: currently, there are implementations with 8, 16 or 32-bit
wchar_t's, which wouldn't be possible if wchar_t had a "fixed" position in
the integer type system.

It is also hard to imagine an implementation where wchar_t wouldn't
naturally map on one of the already existing integer types, therefore
a typedef makes a lot more sense than a separate type.

Dan
 
Keith Thompson

Douglas A. Gwyn said:
Keith Thompson wrote: [...]
I have some vague thoughts about adding such a facility to a future
version of C (letting a user declare an integer type with a specified
range), but I don't see much demand for it.

There won't be much demand for anything that won't be
widely supported. That's why it makes sense to work
for such improvements within the context of the language
standard.

I was thinking of something that would have to be in the standard
itself, since it would involve new syntax. (I suppose it could be
implemented as an extension, but such an extension couldn't be used
with any compiler that doesn't support it -- unlike, say, <stdint.h>,
which can be implemented in pre-C99 code.) My thought is that there
isn't enough demand to get it into the standard in the first place.

Here's the idea. Allow a new kind of type specifier, something like

signed(-1000, 1000)
unsigned(0, 255)

I'm not sure that's the best syntax; I'm open to suggestions. The
arguments specify the range that must be supported by the type. It
acts like a typedef for some predefined signed or unsigned type that
includes the requires range. The resulting type is not constrained by
the specified range; the range is used only to select which predefined
type to use.

The lower bound isn't strictly necessary for unsigned types, but I
think it should be required for symmetry with signed types.

The type signed(-32768, 32767) would be at least 16 bits on a
2's-complement system, and more than 16 bits on a 1's-complement or
signed-magnitude system.

I think that unsigned(0, 255) should be equivalent to uint_fast8_t.
There should probably be another syntax for something equivalent to
uint_least8_t. I don't think it makes sense for a type with a range
specification to require an exact size (as uint8_t does), but I
wouldn't mind another syntax for specifying an exact number of bits.
Maybe just signed(16) and unsigned(32).
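Lacking such syntax, a minimal sketch of how the two examples above would
map onto <stdint.h> today (the typedef names are invented for illustration):

#include <stdint.h>

typedef int_fast16_t small_signed;    /* ~ signed(-1000, 1000)            */
typedef uint_fast8_t small_unsigned;  /* ~ unsigned(0, 255), i.e. the
                                         uint_fast8_t equivalence above   */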

The C++ folks might argue that the syntax should be unsigned<0, 255>,
since this is similar in concept to C++ templates -- but if a future
C++ standard were to adopt this for compatibility with C, the dual use
of the <> syntax could cause more confusion. Or not; if this idea
goes anywhere, someone should look into the C/C++ compatibility issues.
(Compatibility with C++ is not required, of course, but if we can get
it easily we should.)

Probably the only way to guarantee that this would be accepted in a
new standard would be to implement it via another overloading of the
"static" keyword. :cool:}
 
Ross Ridge

Paul Eggert said:
Yes, I've fixed several bugs in that area. But it's a big leap to
suggest that the bugs were there because of the GNU coding standards.

Ross said:
No, not at all. Lots of GNU code that assumed 32-bit ints wasn't portable to
64-bit CPUs; either the GNU coding standards were being ignored completely

Paul Eggert said:
There are many bugs like that in non-GNU code as well. It's a very
common error.

For non-GNU code it's not an error at all unless the code was written to
conform to some other coding standard that had a similar requirement.
Any coding standard that says "assume at least 32-bit ints" is, I think,
going to result in people assuming that ints are exactly 32 bits wide,
and I think the GNU coding standards and the code written supposedly to
conform to them demonstrate this in spades.
Blaming this on the GNU coding standards -- when they
don't say at all that you can assume that ints are at most 32 bits --
is a bit weird.

Please read my entire post before responding. It's not at all weird.
I'll repeat myself one more time. Either the GNU coding standard is being
ignored or it's being misinterpreted. These aren't typos or accidental
mistakes but deliberate and explicit assumptions that int is 32 bits wide.
Where? I just searched for "ASCII" in the coding standards, and don't
see where C programmers can assume that the character set is ASCII.

The coding standard sets a minimum standard for portability:

The primary purpose of GNU software is to run on top of the GNU
kernel, compiled with the GNU C compiler, on various types of CPU.

Since the GNU OS is an ASCII-based OS, you can accordingly assume
ASCII. The coding standard even goes as far as to say that portability
to an OS like MVS, one of the few non-ASCII-based operating systems
currently available, is undesirable:

As for systems that are not like Unix, such as MSDOS, Windows,
the Macintosh, VMS, and MVS, supporting them is often a lot of
work. When that is the case, it is better to spend your time
adding features that will be useful on GNU and GNU/Linux, rather
than on supporting other incompatible systems.
No doubt some ASCII-related bugs remain in GNU code, but most of it
has run in non-ASCII locales for quite some time.

Then maybe I'm wrong. Maybe GNU programmers are just ignoring the GNU
coding standards entirely.

Ross Ridge
 
Ross Ridge

I don't see the latest edition of POSIX being terribly
relevant. Who the heck has any reason to implement it?

Paul Eggert said:
The same sort of people who are implementing C99.
Gluttons for punishment, all of them.

So just SGI then?


Ross Ridge
 
Keith Thompson

If you make it a distinct integer type, you have to specify its place/rank
in the integer type system. This makes it a lot less flexible than having
it as a typedef: currently, there are implementations with 8, 16 or 32-bit
wchar_t's, which wouldn't be possible if wchar_t had a "fixed" position in
the integer type system.

It is also hard to imagine an implementation where wchar_t wouldn't
naturally map on one of the already existing integer types, therefore
a typedef makes a lot more sense than a separate type.

The C++ standard says:

Type wchar_t is a distinct type whose values can represent
distinct codes for all members of the largest extended character
set specified among the supported locales (22.1.1). Type wchar_t
shall have the same size, signedness, and alignment requirements
(3.9) as one of the other integral types, called its underlying
type.

Presumably its rank is the same as that of its underlying type. C
could have done the same thing. I'm not arguing that it should be
changed, just that the decision that was made was not the only
reasonable one.

Personally, I'd be happier if typedefs created new distinct types
rather than just aliases for existing ones. One problem with the
existing situation is that this:

#include <stddef.h>
size_t s;
unsigned long *ptr = &s;

is legal or not depending on whether size_t is typedef'ed as unsigned
long. If it's legal on a given platform, it's a bug silently waiting
to show up when the code is compiled on another platform; if the types
happen to match, the compiler isn't likely even to produce a warning.
If a typedef created a new type, the problem would be caught
immediately.
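For contrast, a minimal sketch of writing the same thing portably; only the
pointer pun depends on what size_t happens to be, not the value conversion:

#include <stddef.h>

void example(size_t s)
{
    size_t *ptr = &s;                    /* always well-formed            */

    /* unsigned long *lp = &s;              well-formed only where size_t
                                            happens to be unsigned long   */

    unsigned long copy = (unsigned long)s;  /* a value conversion is fine
                                               whatever size_t is         */
    (void)ptr;
    (void)copy;
}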

But of course it's way too late to change it; such a change would
break too much existing code.
 
Paul Eggert

For non-GNU code its not an error at all unless the code was written to
conform to some other coding standard that had a similar requirement.

You've lost me here. Certainly in many environments it's OK to assume
32-bit int. But there are many non-GNU environments where it's not.

Any coding standard that says "assume at least 32-bit ints" is, I think,
going to result in people assuming that ints are exactly 32 bits wide,

I disagree. But if you really think it leads to confusion, you're
welcome to suggest improvements in the wording in the GNU coding
standards, as well as to the latest POSIX standard.

These aren't typos or accidental mistakes but deliberate and explicit
assumptions that int is 32 bits wide.

In the only specific problem you've mentioned, namely
<http://mail.gnu.org/archive/html/bug-bash/2001-04/msg00096.html>,
the code (since fixed) did not actually make that assumption.
Instead, when it allocated arrays to contain string representations of
integers, it allocated fixed-size buffers. These buffers assumed that
the integers to be printed were in the range (-10**15, 10**16), or in
some cases the range (-10**30,10**31). So the code was attempting to
be portable (and in fact is portable to all but a handful of current
hosts of interest to the GNU project, since you'd need
greater-than-49-bit int or greater-than-102-bit long to violate those
assumptions), but it didn't do the right thing in general.
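A minimal sketch of the general fix for that kind of bug: derive the buffer
size from the width of the type instead of a hard-coded digit count (one
digit per three bits, plus sign, terminator, and slack, is a deliberate
over-estimate):

#include <limits.h>
#include <stdio.h>

void print_long(long value)
{
    char buf[sizeof(long) * CHAR_BIT / 3 + 3];  /* enough for any long */

    sprintf(buf, "%ld", value);
    puts(buf);
}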

It's quite inaccurate to characterize this particular bug as
"explicit assumptions that int is 32-bits wide".

There may well be other bugs in GNU code, where the code is supposed
to be portable, but indeed assumes 32-bit int. Such bugs would be
understandable, but they should be fixed. I'm not aware of any right
now, though.

Since the GNU OS is an ASCII-based OS

No, lots of GNU utilities run in non-ASCII locales every day, on
millions of desktops and servers. You've jumped to incorrect
conclusions that are not at all warranted by what's actually in the
GNU coding standards.
 
Douglas A. Gwyn

Keith said:
I was thinking of something that would have to be in the standard
itself, since it would involve new syntax. ...
Here's the idea. Allow a new kind of type specifier, something like
signed(-1000, 1000)
unsigned(0, 255)
I'm not sure that's the best syntax; I'm open to suggestions. ...

That's one approach. There are other attributes of integer
types that would also be useful to be able to specify, such
as representational requirements (endianness, encoding, etc.).
If we limited the extension to just the range, it wouldn't buy
much that <stdint.h> doesn't already provide, although it would be
packaged more attractively for some purposes.
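As an illustration of the kind of representational control meant here, a
minimal sketch of what programmers currently do by hand to pin down byte
order (no proposed syntax is implied):

#include <stdint.h>

/* Store a 32-bit value in big-endian order, whatever the host's native
   representation is. */
void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)(v);
}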

I think you're right that there won't be sufficient interest
in just adding some syntactic sugar; however, there is some
interest in giving the programmer more control over
representation. Generally we have considered that syntax
can wait until after we have agreement on the desired
semantics.
 
lawrence.jones

In comp.std.c Keith Thompson said:
Probably the only way to guarantee that this would be accepted in a
new standard would be to implement it via another overloading of the
"static" keyword. :cool:}

``It wouldn't be a new C standard if it didn't give a new meaning to the
word 'static'.''
- Peter Seebach in `C Unleashed'

-Larry Jones

He doesn't complain, but his self-righteousness sure gets on my nerves.
-- Calvin
 
James Kuyper

Ross said:
Please read my entire post before responding. It's not all weird.
I'll repeat myself one more time. Either the GNU coding standard is being
ignored or it's being misinterpretted.

You can't blame a standard for errors made as a result of ignoring it.
You can only blame it for misinterpretations if it is so poorly written
that the misinterpretations are its fault. I don't see how you could
say "at least 32 bits" any more clearly than by saying "at least 32
bits". I don't see how the authors can prevent a misinterpretation by
someone whose reading skills are so poor that they misread it as "at
most 32 bits".
 
Douglas A. Gwyn

I don't see how the authors can prevent a misinterpretation by
someone whose reading skills are so poor that they misread it as "at
most 32 bits".

I seem to recall that when the GNU project started,
Stallman went on record as stating that he was only
interested in target platforms with precisely 32-bit
words and further, with flat address spaces whose
addresses would safely fit into a 32-bit word. I.e.
VAX-like. It wouldn't be surprising in that case if
much of the early GNU software had exact-32-bit
dependencies.
 
Paul Eggert

I seem to recall that when the GNU project started,
Stallman went on record as stating that he was only
interested in target platforms with precisely 32-bit
words

I don't recall anything like that, and I've been helping to write GNU
code since the mid-1980s. Can you cite any chapter and verse here?

and further, with flat address spaces whose
addresses would safely fit into a 32-bit word.

You're right about the flat address space, but I don't recall ever
seeing any assumption about 32-bit addresses. I looked on the net for
old versions of the GNU coding standards, and the most relevant quote
I found was this:

You can assume that all pointers have the same format, regardless of
the type they point to, and that this is really an integer. There are
some weird machines where this isn't true, but they aren't important;
don't waste time catering to them.

(This is a quote from a circa-1994 edition of the GNU coding
standards, the oldest edition that I can easily get my hands on.
The current wording is quite different in this area.)

This differs from your recollection. Perhaps you misunderstood
"integer" to mean "int", and you assumed that "int" had to be exactly
32 bits wide? These would be understandable confusions, but they
weren't present in the original document.
 
Dan Pop

Keith Thompson said:
The C++ standard says:

Type wchar_t is a distinct type whose values can represent
distinct codes for all members of the largest extended character
set specified among the supported locales (22.1.1). Type wchar_t
^^^^^^^^^^^^
shall have the same size, signedness, and alignment requirements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(3.9) as one of the other integral types, called its underlying
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
type.
^^^^

In other words, wchar_t is *not* a true type on its own.
Personally, I'd be happier if typedefs created new distinct types
rather than just aliases for existing ones.

There are good uses for both flavours of typedef. Having "typedef" and
"typealias" would have been the right thing.

You really don't want these two declarations of signal() to be
incompatible:

void (*signal(int, void (*)(int)))(int);

typedef void handler(int);
handler *signal(int, handler *);

The former (the "official" one) is simply a nightmare for the human
reader, while the latter is straightforward. I wouldn't mind using
another keyword for this purpose, but I would mind losing this capability.
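
A minimal sketch of the typedef'd form in use (the handler name is invented
for illustration):

#include <signal.h>

typedef void handler(int);

void on_interrupt(int sig)
{
    (void)sig;
    /* ... */
}

int main(void)
{
    handler *previous = signal(SIGINT, on_interrupt);
    (void)previous;
    return 0;
}
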
One problem with the
existing situation is that this:

#include <stddef.h>
size_t s;
unsigned long *ptr = &s;

is legal or not depending on whether size_t is typedef'ed as unsigned
long. If it's legal on a given platform, it's a bug silently waiting
to show up when the code is compiled on another platform; if the types
happen to match, the compiler isn't likely even to produce a warning.

But if they don't match, the compiler *must* generate a diagnostic, so
where is the problem?
If a typedef created a new type, the problem would be caught
immediately.

The problem is caught as soon as it becomes a problem ;-)
But of course it's way too late to change it; such a change would
break too much existing code.

Adding a *new* keyword with your preferred semantics is still doable.
And changing things like size_t from their current status of type alias
to the status of "new type" shouldn't hurt existing *correct* code, as
long as the new type inherits all the properties of the underlying type.
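
In today's C, a minimal sketch of the usual workaround for getting a
genuinely distinct type out of an alias-only typedef (the names are invented
for illustration):

/* Each struct is a distinct type, so mixing them up is a constraint
   violation that the compiler must diagnose. */
typedef struct { unsigned long value; } file_offset;
typedef struct { unsigned long value; } record_count;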

Dan
 
Douglas A. Gwyn

Paul said:
This differs from your recollection. Perhaps you misunderstood
"integer" to mean "int", and you assumed that "int" had to be exactly
32 bits wide? These would be understandable confusions, but they
weren't present in the original document.

Thanks for looking these up; I was working entirely from memory,
and as it was long ago (and noticed only incidentally in the first
place), I could easily have misremembered.

It is, however, noteworthy that "some machines exist but they
aren't important" was an argument even then. It is really easy
to avoid depending on all pointers having the same format, and
it is reasonable that they might not have the same format (e.g.,
on a word addressable machine it might be more efficient in the
generated code if the C implementation allocates the word address
and a byte offset in adjacent words rather than having to pack/
unpack them into a single word). When the programming guidelines
encourage reliance on a property when it isn't necessary to do so,
it is natural for programmers to produce code that, with the same
effort, could have been more widely portable than it turns out to be.
As a long-time proponent of code portability I hate to see that
happen.
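
Purely as an illustration of that point (no real ABI is implied), a sketch
of how pointer formats could differ on such a machine:

/* On a word-addressed machine, an int* might be just a word address,
   while a char* also needs a byte offset within the word, so "all
   pointers have the same format" need not hold. */
struct word_pointer { unsigned long word_address; };
struct byte_pointer { unsigned long word_address; unsigned byte_offset; };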
 
