Types in C


jacob navia

Hi

I am updating my C tutorial, and I have added a section on type
classification. Please tell me if I forgot something or said
something wrong, if possible.

Thanks in advance for your help


A C type can be either a function type, an incomplete type or an object
type. Function types can be either fully specified, i.e. we have a
prototype available, or partially specified with missing arguments but
a known return value. The function type "implicit int function" with
return value and arguments unknown, has been declared obsolete in C99.
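
For instance (the function names here are made up, purely to illustrate the two forms):

    int add(int a, int b);   /* fully specified: a prototype is available        */
    int sub();               /* return type int is known, parameters unspecified */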

Incomplete types are unspecified and it is assumed that they will be
specified elsewhere, except for the void type that is an incomplete
type that can't be further specified.
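
For example, a sketch using a made-up struct tag:

    struct buffer;            /* incomplete: contents not yet specified   */
    struct buffer *head;      /* pointers to an incomplete type are fine  */

    struct buffer {           /* now the type is completed                */
        char *data;
        int   length;
    };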

Object types can be either scalar or composite types. Composite types
are built from the scalar types: structures, unions and arrays. Scalar
types are of two kinds: arithmetic or pointer types. Pointer types can
point to scalar or composite types, to functions or to incomplete types.

Arithmetic types have two kinds: integer types and floating types. The
integer types are either standard integers, bit fields or enumerations.

The standard integer types are boolean, char, short, int, long and long
long, all with signed or unsigned types, except the boolean type
that hasn't any sign but is not an unsigned type.

Floating types are either real or complex, with both of them appearing
in three flavors: float, double and long double.
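
For instance (just an illustration; C99's <complex.h> is assumed to be available):

    #include <complex.h>

    long double    ld;     /* a real floating type    */
    double complex dz;     /* its complex counterpart */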
 

Keith Thompson

jacob navia said:
I am updating my C tutorial, and I have added a section on type
classification. Please tell me if I forgot something or said
something wrong, if possible.

Thanks in advance for your help


A C type can be either a function type, an incomplete type or an object
type. Function types can be either fully specified, i.e. we have a
prototype available, or partially specified with missing arguments but
a known return value. The function type "implicit int function" with
return value and arguments unknown, has been declared obsolete in C99.

In C90, "implicit int function" isn't a distinct type.
foo(void);
and
int foo(void);
are merely two different ways of expressing exactly the same thing.
The return type isn't unknown, it's int.
Incomplete types are unspecified and it is assumed that they will be
specified elsewhere, except for the void type that is an incomplete
type that can't be further specified.

Object types can be either scalar or composite types. Composite types
are built from the scalar types: structures, unions and arrays. Scalar
types are of two kinds: arithmetic or pointer types. Pointer types can
point to scalar or composite types, to functions or to incomplete types.

The phrase "composite type" has a very different meaning; see C99 6.2.7.

The standard refers to array and structure types (but not union types)
as "aggregate types". It doesn't seem to have a term that covers
arrays, structures and unions (and I'd be very hesitant in this kind of
tutorial to invent terms that aren't in the standard).
Arithmetic types have two kinds: integer types and floating types. The
integer types are either standard integers, bit fields or enumerations.

The standard integer types are boolean, char, short, int, long and long
long, all with signed or unsigned types, except the boolean type
that hasn't any sign but is not an unsigned type.

You didn't mention extended integer types. I don't know whether you
want to.
 

Ben Bacarisse

jacob navia said:
I am updating my C tutorial, and I have added a section on type
classification. Please tell me if I forgot something or said
something wrong, if possible.

I'd add a few remarks to what Keith Thompson has said.
A C type can be either a function type, an incomplete type or an object
type. Function types can be either fully specified, i.e. we have a
prototype available, or partially specified with missing arguments but
a known return value. The function type "implicit int function" with
return value and arguments unknown, has been declared obsolete in C99.

Incomplete types are unspecified and it is assumed that they will be
specified elsewhere, except for the void type that is an incomplete
type that can't be further specified.

From a style point of view, this listing of terms followed by
definitions (your paragraphs below all start with some category of type)
is not that helpful. Often the definition of a term is the least
interesting thing about it. That's particularly true of incomplete
types. I'd explain them by stating how they often occur:

"It is sometimes helpful to postpone or omit the full specification of a
type. Such a partial definition of an object type gives rise to an
incomplete type. The type definition will usually be completed later in
the source file or in another source file where the full details are
finally needed. Until a type is complete, the size of the type is
unknown and objects of the type can't be defined. The type void is
rather special. It is an incomplete type that can never be completed."
Object types can be either scalar or composite types. Composite types
are built from the scalar types: structures, unions and arrays. Scalar
types are of two kinds: arithmetic or pointer types. Pointer types can
point to scalar or composite types, to functions or to incomplete types.

Arithmetic types have two kinds: integer types and floating types. The
integer types are either standard integers, bit fields or
enumerations.

I don't think bit fields are really types. They have a type, but you
can't declare something to have a bit-field type. I'd leave them out
altogether -- I think they are better explained as a special kind of
structure members.
The standard integer types are boolean, char, short, int, long and long
long, all with signed or unsigned types, except the boolean type
that hasn't any sign but is not an unsigned type.

boolean is not a type in C. You could say "bool (the Boolean type)" or
you could use the real type name _Bool, ugly though it is. Also, _Bool
*is* an unsigned type.
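
A small sketch of the two spellings, just to show what I mean:

    #include <stdbool.h>   /* makes "bool", "true" and "false" available */

    _Bool flag1 = 1;       /* the real type name, ugly as it is  */
    bool  flag2 = true;    /* the same type, spelt via the macro */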

Technically, char is not listed as one of the "standard integer types".
"signed char" is one and so is the unsigned type that corresponds to it,
but char is a third type different to these two. It is included in the
group called "integer types". The problem is that the term "standard
integer type" is used to distinguish these types form "extended integer
types". If you dropped the word "standard" from that paragraph, there
would be not possibility of confusion.
 

jacob navia

On 23/05/11 00:03, Ben Bacarisse wrote:
From a style point of view, this listing of terms followed by
definitions (your paragraphs below all start with some category of type)
is not that helpful. Often the definition of a term is the least
interesting thing about it. That's particularly true of incomplete
types. I'd explain them by stating how they often occur:

Of course, later on I go into each type in a lot of detail. This is
just the introduction to types, the "big picture". There is actually
a picture that shows all the relationships as I described.
I don't think bit fields are really types. They have a type, but you
can't declare something to have a bit-field type. I'd leave them out
altogether -- I think they are better explained as a special kind of
structure members.

I have them in because they were in the classification by Plauger and
Brody that I used as a base.
boolean is not a type in C. You could say "bool (the Boolean type)" or
you could use the real type name _Bool, ugly though it is. Also, _Bool
*is* an unsigned type.

Yes, I corrected it on both counts; bool was added to the unsigned types.
Technically, char is not listed as one of the "standard integer types".
"signed char" is one and so is the unsigned type that corresponds to it,
but char is a third type different to these two. It is included in the
group called "integer types". The problem is that the term "standard
integer type" is used to distinguish these types form "extended integer
types". If you dropped the word "standard" from that paragraph, there
would be not possibility of confusion.

I added:

<quote>
The char type has not only signed and unsigned flavors but in some more
esoteric classifications has a third flavor "plain char", different as a
type from an unsigned or a signed char. We will not go into that
hair splitting here.
<end quote>

I think this gives the reader a glimpse into the underlying complexities
without really getting bogged down...

True "hair splitting" is maybe exaggerated but it conveys my feeling
about that whole issue :)


Thanks for your help
 

Keith Thompson

jacob navia said:
I added:

<quote>
The char type has not only signed and unsigned flavors but in some more
esoteric classifications has a third flavor "plain char", different as a
type from an unsigned or a signed char. We will not go into that
hair splitting here.
<end quote>

I think this gives the reader a glimpse into the underlying complexities
without really getting bogged down...

True "hair splitting" is maybe exaggerated but it conveys my feeling
about that whole issue :)

Plain char really is a distinct type from both signed char and
unsigned char, just as int and long are distinct types even if they
happen to have exactly the same representation. IMHO you're not
serving your readers well by suggesting that this distinction is
"esoteric" and "hair splitting".

The phrase "in some more esoteric classifications" could lead some
readers to think that they should be using signed char or unsigned
char, and that plain char should be avoided. If I didn't know C,
I might even assume that most compilers don't support plain char.

Yes, it does seem odd that plain char has exactly the same
representation as either signed char or unsigned char, but that
it's a distinct type. But the alternative would be that plain char
is the same type as either signed char or unsigned char -- which
would make some programs either valid or uncompilable, depending
on the implementation. Would you rather be writing a tutorial for
that hypothetical language?

And I think the reason it seems strange is the misperception that
type is *just* about representation. It isn't. C is less strongly
typed than some languages, but it still has the concept that two types
can be distinct even if one happens to be an exact copy of the other.
That might be something to explore in another part of your tutorial
(probably not the introduction).

Teach the language as it is, not as you wish it were.
 

Seebs

The phrase "in some more esoteric classifications" could lead some
readers to think that they should be using signed char or unsigned
char, and that plain char should be avoided. If I didn't know C,
I might even assume that most compilers don't support plain char.

Hmm.

This whole thread reminds me: I wrote an article series on the C type
system long ago for IBM developerWorks, and I bet it's got errors that
some of the clever folks here could catch.

http://www.ibm.com/developerworks/power/library/pa-ctypes1/index.html
http://www.ibm.com/developerworks/power/library/pa-ctypes2/index.html
http://www.ibm.com/developerworks/power/library/pa-ctypes3/index.html
http://www.ibm.com/developerworks/power/library/pa-ctypes4/index.html

-s
 

John Gordon

Seebs said:
This whole thread reminds me: I wrote an article series on the C type
system long ago for IBM developerWorks, and I bet it's got errors that
some of the clever folks here could catch.

Not really an error, but is notation such as /byte/ meant to be italic
text?
 

Keith Thompson

John Gordon said:
Not really an error, but is notation such as /byte/ meant to be italic
text?

Without having looked at the web pages (yet), I would assume so, because
"byte" is defined by the C standard.
 

Keith Thompson

Keith Thompson said:
Without having looked at the web pages (yet), I would assume so, because
"byte" is defined by the C standard.

And now I see that I completely misunderstood your question.

The word "byte" appears on the web page surrounded by '/' characters;
yes, it should be in italics without the '/' characters.

When my newsreader saw "/byte/" in John's article, it helpfully tried to
display it in italics (which my terminal emulator doesn't support) and
dropped the '/' characters. I'll need to figure out how to tell it not
to do that. (Something to do with gnus-emphasis-alist, I think.)
 

jacob navia

On 23/05/11 19:19, Seebs wrote:
Hmm.

This whole thread reminds me: I wrote an article series on the C type
system long ago for IBM developerWorks, and I bet it's got errors that
some of the clever folks here could catch.

http://www.ibm.com/developerworks/power/library/pa-ctypes1/index.html
http://www.ibm.com/developerworks/power/library/pa-ctypes2/index.html
http://www.ibm.com/developerworks/power/library/pa-ctypes3/index.html
http://www.ibm.com/developerworks/power/library/pa-ctypes4/index.html

-s

Interesting. As far as this discussion is concerned, you define

<quote>
The type of an object in C describes the way in which a chunk of memory
is associated with a value. In many cases, a type describes a computer's
native ways of representing values. For instance, on a typical UNIX®
system, declaring a variable as int will reserve enough storage to hold
a single processor register. This variable will be manipulated by
processor-native instructions, and will have the semantics native to the
processor.
<end quote>

I used a more functional definition of type. The equivalent sentence in
my book is:

<quote>
\subsection{What is a type?}
A first, tentative definition of what a type is could be "a type is a
definition of an algorithm for understanding a sequence of storage
bits”. It gives the meaning of the data stored in memory. If we say
that the object a is an int, it means that the bits stored at that
location are to be understood as a natural number that is built by
consecutive additions of powers of two as specified by its bits.
If we say that the type of a is a double, it means that the bits are to
be understood as the IEEE 754 (for instance) standard sequences of bits
representing a double precision floating point value.

A second, more refined definition would encompass the first but add the
notion of "concept" behind a type. For instance in some machines
the type "size_t" has exactly the same bits as an unsigned long, yet,
it is a different type. The difference is that we store sizes in
size_t objects, and not some arbitrary integer. The type is associated
with the concept of size. We use types to convey a concept to the
reader of the program.

At the base of C's type hierarchy are the machine types, i.e. the types
that the integrated circuit understands. C has abstracted from the
myriad of machine types some types like 'int' or 'double' that are
almost universally present in all processors.

There are many machine types that C doesn't natively support. For
instance, some processors support BCD-coded data, but that data is
accessible only through special libraries in C.

C makes an abstraction of the many machine types present in many
processors, selecting only some of them and ignoring others.

One can argue about why one type makes its way into the language and
another doesn't. For instance, the most universal type, always present in
all binary machines, is the boolean type (one or zero). Still, it was
ignored by the language until the C99 standard incorporated it as a
native type.
<end quote>

I think that both paragraphs convey more or less the same thing,
even if I would say you could have expanded the definition part a bit.

Afterwards, most of what you say is similar to what I have written,
but obviously in 4 articles you have much less space than I do in a
book without any hard page limit.

jacob
 

Keith Thompson

jacob navia said:
A second, more refined definition would encompass the first but add the
notion of "concept" behind a type. For instance in some machines
the type "size_t" has exactly the same bits as an unsigned long, yet,
it is a different type. The difference is that we store sizes in
size_t objects, and not some arbitrary integer. The type is associated
with the concept of size. We use types to convey a concept to the
reader of the program.
[...]

Just a quibble: size_t may be exactly the same type as unsigned long,
since a typedef merely creates an alias for an existing type, not
a new and distinct type. plain char and signed char (for example)
are distinct in a way that size_t and unsigned long (for example)
are not.

On the system I'm currently using, size_t happens to be a typedef for
unsigned int. Given the following program:

#include <stddef.h>
int main(void)
{
    size_t *size_t_ptr = NULL;
    unsigned int *unsigned_int_ptr = NULL;
    unsigned long *unsigned_long_ptr = NULL;

    signed char *signed_char_ptr = NULL;
    unsigned char *unsigned_char_ptr = NULL;
    char *plain_char_ptr = NULL;

    size_t_ptr = unsigned_int_ptr;      /* line 12 */
    size_t_ptr = unsigned_long_ptr;     /* line 13 */

    plain_char_ptr = signed_char_ptr;   /* line 15 */
    plain_char_ptr = unsigned_char_ptr; /* line 16 */

    return 0;
}

gcc (with the proper options) warns about lines 13, 15, and 16.
Any conforming compiler must diagnose lines 15 and 16, and will
probably diagnose exactly one of lines 12 and 13 (it could diagnose
both if, for example, size_t is unsigned long long).

Of course size_t should be thought of as something that's quite
distinct from any predefined integer type, but the thing that's
distinct is something other than a C type. (IMHO this is a weakness
of C's type system.)
 

Seebs

Just a quibble: size_t may be exactly the same type as unsigned long,
since a typedef merely creates an alias for an existing type, not
a new and distinct type. plain char and signed char (for example)
are distinct in a way that size_t and unsigned long (for example)
are not.

True. Although, possibly confusingly, there are places where it seems
that the thing a typedef creates is itself then called "a type". The
wording is a bit fuzzy.

We all know what we mean, I think. You could say it several ways:

Typedef does not actually create types; instead, it creates
names which refer to existing types.

Typedef creates types, but those types are identical to existing
types and can be used interchangeably with them.

These appear to me to be essentially equivalent. I tend to think of
the created name as being "a type" because my map has "type" connecting
to "the mapping from a name to a representation and storage thing", but
of course this isn't quite right, because of the plain char and (either
signed or unsigned char) equivalence; it's the same representation and
such as one of them, but it's not the same type as either.

I'm aware this isn't quite how it works, but consider this point you
made:
Of course size_t should be thought of as something that's quite
distinct from any predefined integer type, but the thing that's
distinct is something other than a C type. (IMHO this is a weakness
of C's type system.)

Yes.

I would love to know how much code would ACTUALLY break if typedef
created "types". Such that, say:
typedef int a;
typedef int b;
a foo;
b *bar = &foo;
was a constraint violation.

My guess is it would work well.

And that's why I tend to treat C *as if* that were true, because the only
code which I regard as "broken" which isn't broken according to the C type
system is generally bad code, and it prevents me from doing stupid things
as much.

-s
 

Keith Thompson

Seebs said:
True. Although, possibly confusingly, there are places where it seems
that the thing a typedef creates is itself then called "a type". The
wording is a bit fuzzy.

(Didn't we already have this discussion?)
We all know what we mean, I think. You could say it several ways:

Typedef does not actually create types; instead, it creates
names which refer to existing types.

That's my preference.
Typedef creates types, but those types are identical to existing
types and can be used interchangeably with them.

I dislike that description.

Given:

typedef unsigned int word;

there is only one *type* being considered; it has two *names*, "unsigned
int" and "word". "unsigned int" is not itself a type; it's a sequence
of keywords that is the *name* of a type.

And using the idea of "identical types" makes it difficult to discuss
the difference between the relationship of size_t vs. unsigned int
(same type) and the relationship of char vs. signed char (identical
but distinct types).
These appear to me to be essentially equivalent. I tend to think of
the created name as being "a type" because my map has "type" connecting
to "the mapping from a name to a representation and storage thing", but
of course this isn't quite right, because of the plain char and (either
signed or unsigned char) equivalence; it's the same representation and
such as one of them, but it's not the same type as either.

I'm aware this isn't quite how it works, but consider this point you
made:


Yes.

And the "thing that's distinct", I now realize, is simply the *name*.
I would love to know how much code would ACTUALLY break if typedef
created "types". Such that, say:
typedef int a;
typedef int b;
a foo;
b *bar = &foo;
was a constraint violation.

My guess is it would work well.

My guess is that it would break a lot of code. It would be a good
thing, IMHO, if size_t were distinct from any predefined type (which
would break only code that deserves to be broken).
And that's why I tend to treat C *as if* that were true, because the only
code which I regard as "broken" which isn't broken according to the C type
system is generally bad code, and it prevents me from doing stupid things
as much.

Typedefs are used in different ways. Sometimes a typedef creates a name
for a type, when the type being named may vary from one system to
another, or from one version of the program to another (size_t, word).
And sometimes a typedef just creates a name for a type because you don't
like the existing name, but the type being named will never change.

Some examples of the latter:

typedef unsigned long ulong; /* I don't do this myself */
typedef unsigned char byte;

typedef struct node {
    struct node *next;
    int data;
} node;

Note that if "struct node" and "node" were two distinct types, this
wouldn't work.

I would have liked to see one mechanism that creates a new name for a
type, and another that creates a new distinct type with the same
characteristics as an existing type. (Ada has this, "subtype"
vs. "derived type".)

On the other hand, C does have a mechanism for creating new types, the
"struct" keyword.
 

jacob navia

On 23/05/11 21:45, Seebs wrote:
True. Although, possibly confusingly, there are places where it seems
that the thing a typedef creates is itself then called "a type". The
wording is a bit fuzzy.

We all know what we mean, I think. You could say it several ways:

Typedef does not actually create types; instead, it creates
names which refer to existing types.

Typedef creates types, but those types are identical to existing
types and can be used interchangeably with them.

These appear to me to be essentially equivalent.


I used size_t since structures and other definitions of new types
are described later. size_t has the good quality of being immediately
comprehensible (size type) to a newcomer to the language...

What bothers me is that I never get a discussion about the main
argument of my posts, but always

"Just a quibble", and then we start discussing small details
that are just that: details.

What do you think of the two definitions?
(1) Functional (a type is an algorithm description)
(2) Conceptual (a type embodies a concept in the program)

And I think I am missing one:

(3) A type implies a set of operations allowed on it.

This would be different from (1) since (1) refers to calculating
the "value" stored in a type.

I am sure (3) is missing, but I can't really identify it yet as
definitely distinct from (1).
 

jacob navia

On 23/05/11 22:03, Keith Thompson wrote:
That's my preference.

You just do not understand, Thompson. size_t will be identical to
unsigned int on your machine, but on my Mac OS X it is identical to
unsigned long long. But it is the same type: size_t.

size_t is an *abstraction* of the underlying type; it is NOT
the underlying type, even if you say that size_t is unsigned int
on YOUR machine. The goal of size_t is to HIDE the underlying type,
not just to give an underlying type another name.
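
A small sketch of what I mean (portable code never has to know what size_t maps to):

    #include <stddef.h>
    #include <stdio.h>

    int main(void)
    {
        size_t n = sizeof(long);   /* size_t, whatever it is on this machine */
        printf("%zu\n", n);        /* %zu works on every C99 implementation  */
        return 0;
    }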
 

Keith Thompson

jacob navia said:
What bothers me is that I never get a discussion about the main
argument of my posts, but always

"Just a quibble", and then we start discussing small details
that are just that: details.

That happens to be what I noticed. Should I not have bothered to
mention it? (Perhaps I shouldn't; I can't help noticing that you
rarely respond to me directly.)
 

jacob navia

On 23/05/11 22:27, Keith Thompson wrote:
That happens to be what I noticed. Should I not have bothered to
mention it? (Perhaps I shouldn't; I can't help noticing that you
rarely respond to me directly.)

The idea that you could discuss the main argument
of my posts just doesn't even occur to you...

And you CONFIRM what I say by snipping the MAIN part of my post:

What about the functional definition of a type?

YOU SNIPPED IT AGAIN!!!

Incredible :)
 

Uncle Steve

True. Although, possibly confusingly, there are places where it seems
that the thing a typedef creates is itself then called "a type". The
wording is a bit fuzzy.

We all know what we mean, I think. You could say it several ways:

Typedef does not actually create types; instead, it creates
names which refer to existing types.

Typedef creates types, but those types are identical to existing
types and can be used interchangeably with them.

How about the idea that typedefs create aliases for the basic built-in
types supplied by the language spec.?

[snip]
And that's why I tend to treat C *as if* that were true, because the only
code which I regard as "broken" which isn't broken according to the C type
system is generally bad code, and it prevents me from doing stupid things
as much.

A language that protects you from doing dumb things is good, but
flexibility is sacrificed as a consequence. How much of an issue this
is for you depends on what your expectations are with regard to type
enforcement. I wonder if Perl or Python could do the same integer
pointer cast as I use to account for the free list in the allocator
code I posted previously. My suspicion is that it would be fairly
difficult to do in those languages.

Worse, comp-sci profs (especially Pascal weenies) would hate it, I
suspect, and would give a failing grade if anyone ever did something
like that in their class. IMO, C is properly thought of as portable
assembler, with the caveat that it gives you enough rope to be as HLL
as you want it to be.



Regards,

Uncle Steve
 

Ian Collins

How about the idea that typedefs create aliases for the basic built-in
types supplied by the language spec.?

A typedef may provide an alias for any existing type: built-in,
user-defined, or even incomplete.
 
