usage of size_t


Richard Bos

Malcolm McLean said:
size_t is the only type that is guaranteed to be able to index any
array. So if the number of elements is arbitrary, it's the only
correct type to use.
The problem is that very few people actually do so. So we've got a
very undesirable situation.

The problem, however, is not with size_t. It is with people who do not
want to use size_t.

Or do you also blame the large number of drunk drivers on the law
against drunk driving?

Richard
 

Keith Thompson

...because this is nonsense. There have been _many_ situations in which
sizeof(void *) != sizeof (size_t) != sizeof (int).

I don't think I've ever used a system where sizeof(void*) != sizeof(size_t).

That's not to say that such systems don't exist, of course, but on
most modern systems with a linear monolithic address space, it makes
sense for void* and size_t to be the same size.
 

Seebs

And yet, I would prefer a novel written in English for literate readers
_not_ to eschew idioms. You should compare C to a Shaw play or a book by
Joyce. Do not write C as if you are Dr. Seuss - that's what BASIC is
for.

I have to take some exception to this, because Dr. Seuss was actually an
extremely skilled writer of English, even though many of his books don't
make this obvious to casual observation.

... But the point is still valid. Idiomatic writing is used because it is
clearer and more communicative, and yes, that does impose the cost of learning
the idioms on the reader. It's still worth it.

-s
 

Peter Nilsson

Keith Thompson said:
I don't think I've ever used a system where sizeof(void*)
!= sizeof(size_t).

I think it's more common for size_t to match unsigned long,
rather than being dependent on the size of void *. N1256 has
a 'recommended practice'...

"The types used for size_t and ptrdiff_t should not
have an integer conversion rank greater than that of
signed long int unless the implementation supports
objects large enough to make this necessary."

There must be 64-bit systems where it's possible, but
not _necessary_, to make unsigned long and size_t larger
than 32-bit.
That's not to say that such systems don't exist, of course,

Early 68k based Macs were capable of addressing 16M, but most
applications still had to fit into 32K, or 32K chunks stored
in the resource fork of the application. Many applications
were actually limited to using less than 32K in total. So it
wouldn't surprise me if there were some early Mac C
implementations where pointers were 32-bit, but size_t and
int were only 16-bit due to the relative cost of 32-bit
operations and storage. [The 68k processor had separate
data and address registers. Even though they were all 32-bit,
16-bit operations on data registers were quicker than 32-bit
ones.]
but on most modern systems with a linear monolithic address
space, it makes sense for void* and size_t to be the same
size.

Why? Serious question! It's a very common assumption, but
not one that's guaranteed by the standard. size_t is only
required to be able to store the size of one object (more
precisely, the result of sizeof). It isn't required to be
large enough to store the combined size of all objects.

Recall the calloc kerfuffle and the possibility of creating
objects too big to fit in a size_t! Whilst I think that was
ruled out, the question remains as to whether C allows a
program to allocate more combined space than will fit in
a size_t.
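The overflow behind the calloc kerfuffle is easy to sketch: a calloc-style allocator must refuse any request whose element count multiplied by element size would wrap a size_t. The wrapper below is a hypothetical illustration of that guard, not any particular library's implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper illustrating the check a conforming calloc
 * needs: reject requests whose total byte count cannot be
 * represented in a size_t. */
void *checked_calloc(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;    /* nmemb * size would exceed SIZE_MAX */
    return calloc(nmemb, size);
}
```

The division is done before the multiplication precisely so the test itself cannot overflow.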
 

Keith Thompson

Peter Nilsson said:
I think it's more common for size_t to match unsigned long,
rather than being dependent on the size of void *.

For what it's worth (which isn't a whole lot), every system I've
checked has size_t, unsigned long, and void* all the same size.
N1256 has
a 'recommended practice'...

"The types used for size_t and ptrdiff_t should not
have an integer conversion rank greater than that of
signed long int unless the implementation supports
objects large enough to make this necessary."

I've never seen a system that violates this recommendation.
There must be 64-bit systems where it's possible, but
not _necessary_, to make unsigned long and size_t larger
than 32-bit.

Sure, but all the 64-bit systems I've seen have 64-bit unsigned long.
(I vaguely recall that 64-bit Windows has 32-bit unsigned long; I
don't know what it uses for size_t.)
That's not to say that such systems don't exist, of course,

Early 68k based Macs were capable of addressing 16M, but most
applications still had to fit into 32K, or 32K chunks stored
in the resource fork of the application. Many applications
were actually limited to using less than 32K in total. So it
wouldn't surprise me if there were some early Mac C
implementations where pointers were 32-bit, but size_t and
int were only 16-bit due to the relative cost of 32-bit
operations and storage. [The 68k processor had separate
data and address registers. Even though they were all 32-bit,
16-bit operations on data registers were quicker than 32-bit
ones.]

Sure, if the maximum size of a single object is smaller than the
total addressing space (e.g., because an object must fit into a
single memory segment), it makes sense for size_t to be smaller
than void*.
Why? Serious question! It's a very common assumption, but
not one that's guaranteed by the standard. size_t is only
required to be able to store the size of one object (more
precisely, the result of sizeof). It isn't required to be
large enough to store the combined size of all objects.

Absolutely. But most modern systems (at least the ones I've been
exposed to) have a monolithic linear address space, where the size of
a single object, at least in principle, has the same upper bound as
the size of all of memory. Smaller limits might be imposed by the
operating system, but those limits aren't typically enforced by the
compiler by making size_t smaller than void*.
Recall the calloc kerfuffle and the possibility of creating
objects too big to fit in a size_t! Whilst I think that was
ruled out, the question remains as to whether C allows a
program to allocate more combined space than will fit in
a size_t.

I suspect that's not quite what you meant to say.

There's some question whether a single object can be bigger than
SIZE_MAX bytes, but I'm quite sure that the language doesn't require
the total size of all objects to be no bigger than SIZE_MAX.

For example, on system with 64-bit void* and 32-bit size_t (and
sufficient resources), a single object arguably couldn't be bigger
than 4 gigabytes, but you could have 1000 distinct 1-gigabyte objects.
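Keith's arithmetic can be checked directly. The figures below assume his hypothetical system with a 32-bit size_t and 64-bit pointers; the helper name `combined_size` is ours, just to make the sum explicit.

```c
#include <stdint.h>

/* Combined size in bytes of n objects of obj_bytes each, computed
 * in a 64-bit type so the product cannot wrap a 32-bit size_t. */
uint64_t combined_size(uint64_t obj_bytes, uint64_t n)
{
    return obj_bytes * n;
}
```

One gigabyte fits comfortably in a 32-bit size_t, yet a thousand such objects total roughly 10^12 bytes, far beyond what that size_t can represent.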
 

Nick Keighley

I'm *reasonably* sure he was joking - yanking Richard NoName
MyHammerIsADebuggerAndEveryProblemIsANail Riley's chain a little.


I didn't think Bill was that witty.

I must confess I was expecting a response from /a/ Richard, just not
from you!
 

Nick Keighley

And yet, I would prefer a novel written in English for literate readers
_not_ to eschew idioms. You should compare C to a Shaw play or a book by
Joyce.

‘Sir Tristram, violer d’amores, fr’over the short sea, has passencore
rearrived from North Armorica on this side the scraggy isthmus of
Europe Minor to wielderfight his penisolate war; nor had topsawyer’s
rocks by the stream Oconee exaggerated themselse to Laurens County’s
giorgios while the went doubling their mumper all the time’

Do not write C as if you are Dr. Seuss - that's what BASIC is for.

I would not, could not, in a box.
I could not, would not, with a fox.
I will not eat them with a mouse.
I will not eat them in a house.
I will not eat them here or there.
I will not eat them anywhere.
I do not eat green eggs and ham.
I do not like them, Sam-I-am.

no contest really...
 

Nick Keighley

Again, not "like mine", but "like the vast majority I have seen which was
produced by skilled C programmers".

If someone is violating common conventions, they either have a reason for
doing so, or they're simply so unfamiliar with what the rest of the world
has been doing for the last couple of decades that they may as well be a
neophyte.

We had one particular coder around here for a while who persisted in
using such bizarre macros and the like that his code was virtually
unreadable.  Was it good code?  Was it safe, reliable, usable code?  

Maybe it was, but his choice to make it _appear_ as something else meant
few cared to bother.  If one chooses to make code appear bad or
amateurish or otherwise undesirable, why blame the reader for rejecting it?

you are comparing oranges with orchards.

count = count + 1;

is not obscure code.

io_x style

#define B {
#define P printf

is obscure code
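Nick's contrast can be made concrete. Both functions below behave identically, but the macro version hides C's own syntax from the reader. The macros are the io_x style he quotes, with a hypothetical end-brace macro `E` added to complete the sketch.

```c
#include <stdio.h>

/* io_x-style macros: perfectly legal C, and perfectly opaque */
#define B {
#define E }
#define P printf

int obscure(void)
B
    P("hello\n");
    return 1;
E

/* The same function, written so a C reader can read it */
int plain(void)
{
    printf("hello\n");
    return 1;
}
```

The preprocessor expands both to the same thing; only the human reader pays the cost.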
 

Ed Vogel

Keith Thompson said:
I don't think I've ever used a system where sizeof(void*) !=
sizeof(size_t).

That's not to say that such systems don't exist, of course, but on
most modern systems with a linear monolithic address space, it makes
sense for void* and size_t to be the same size.
I worked on a C compiler for OpenVMS. On that system size_t is
always 32 bits. By default, pointers were 32 bits, but one could
compile
(or use a #pragma) to make the size of pointers 64 bits. In that mode
sizeof(void *) != sizeof(size_t).

Ed Vogel
 

Malcolm McLean

    I worked on a C compiler for OpenVMS.   On that system size_t is
    always 32 bits.  By default, pointers were 32 bits, but one could
compile
    (or use a #pragma) to make the size of pointers 64 bits.  In that mode
    sizeof(void *) != sizeof(size_t).
People aren't going to be using 64 bits for long before they start
asking for objects greater than 4GB.
Your compiler is an intermediate step along the way.
 

Ersek, Laszlo

I worked on a C compiler for OpenVMS. On that system size_t is
always 32 bits. By default, pointers were 32 bits, but one could compile
(or use a #pragma) to make the size of pointers 64 bits. In that mode
sizeof(void *) != sizeof(size_t).


Aah, great!

ludens$ cc /version

HP C V7.1-015 on OpenVMS Alpha V8.3


ludens$ help cc /pointer_size

CC

/POINTER_SIZE

/POINTER_SIZE=option
/NOPOINTER_SIZE (D)

Controls whether or not pointer-size features are enabled, and
whether pointers are 32 bits or 64 bits long.

The default is /NOPOINTER_SIZE, which disables pointer-size
features, such as the ability to use #pragma pointer_size, and
directs the compiler to assume that all pointers are 32-bit
pointers. This default represents no change over previous versions
of the compiler.

You must specify one of the following options:

SHORT The compiler assumes 32-bit pointers.

32 Same as SHORT.

LONG The compiler assumes 64-bit pointers.

64 Same as LONG.

Specifying /POINTER_SIZE=32 directs the compiler to assume that all
pointers are 32-bit pointers. But unlike the default of
/NOPOINTER_SIZE, /POINTER_SIZE=32 enables use of the #pragma
pointer_size long and #pragma pointer_size short preprocessor
directives to control pointer size throughout your program.

Specifying /POINTER_SIZE=64 directs the compiler to assume that all
pointers are 64-bit pointers, and also enables use of the #pragma
pointer_size directives.


ludens$ type siz.c

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    return 0 <= fprintf(stdout, "%u %u\n", (unsigned)sizeof(void *),
                        (unsigned)sizeof(size_t))
            && 0 == fflush(stdout)
        ? EXIT_SUCCESS : EXIT_FAILURE;
}


ludens$ cc /standard=ansi89 /pointer_size=32 siz.c
ludens$ link siz.obj
ludens$ run siz
4 4


ludens$ cc /standard=ansi89 /pointer_size=64 siz.c
ludens$ link siz.obj
ludens$ run siz
8 4


The results are the same with /standard=c99.

Cheers,
lacos
 

spinoza1111

There isn't enough time in the world to give every piece of code the level
of review you'd give to something you knew was written by, say, Nilges,
or Bill Cunningham.

In practice, heuristics are an EXTREMELY effective way to allocate scarce
resources.  The heuristic that certain kinds of quirky writing are a red
flag that the rest of the code will likely contain weirdness, errors, or
things that need careful re-reading to comprehend them, turns out to be
stunningly effective.

Dear little Peter, we know that your brain and your null academic
preparation for your chosen trade comprise scarce resources. However,
we also know that it is in fact a mark of incompetence and unreason to
judge based on a "shibboleth": the pronunciation of the Semitic word
for "ear of corn", used by the Hebrews as a way to quickly tell friend
from foe.

But, I'm well aware that more and more programmers do this, because in
order to secure a cowed and complaisant work force to merely maintain
a cost center, companies will deliberately hire psychology majors such
as you, since you can be trusted to be a company man, and to take your
anger out on people you think constitute safe targets.

You have no independent body of knowledge, where break room saws and
making excuses for bugs constitute a language game, not knowledge...as
does tearing other people down to build yourself up.
It's nearly always beneficial.  However, heuristics aren't dogma; they're just
a first pass to quickly spot cases where it's likely to be necessary to spend
extra time studying some code.

But let's take a look at your "heuristics". For example, you get all
flustered if you see Hungarian notation, despite the fact that REAL
Hungarian notation was stolen and distorted by Charles Simonyi from
IBM praxis invented in 1960; and you make a judgement, in fact,
not about code, but about a person: this isn't good methodology: it is
the ad hominem fallacy.
 

spinoza1111

I have to take some exception to this, because Dr. Seuss was actually an
extremely skilled writer of English, even though many of his books don't
make this obvious to casual observation.

... But the point is still valid.  Idiomatic writing is used because it is
clearer and more communicative, and yes, that does impose the cost of learning
the idioms on the reader.  It's still worth it.

But whose idioms shall we use? Hint: I wouldn't use yours, since you
simply don't appear to be a qualified programmer.
 

spinoza1111

[snips]

   count = count + 1;
is not obscure code.

When the construct used in virtually every piece of C code one runs
across reads "count++", where "count++" is such a common idiom that
avoiding its use suggests there is some reason (either neophyte status,
or something less obvious) for doing so, then yes, lacking comments
explaining precisely _why_ such a screwball construct is being used, the
result _is_ obscure code.

Without additional explanation (e.g., "Imported from MatLab, which uses this
sort of construct") there is no readily apparent reason for using such a
construct.  If we assume the coder is not a neophyte, it then follows he
is using this screwball notation for a specific purpose, which implies
there is some behaviour involved which shows up in "count = count + 1"
but _does not_ show up in "count++".

Which means now we have to scratch our heads, go running for the standard
(and the compiler documentation), check "count" to see if there's some
special magic associated with it, and try to figure out _what_ the
different behaviour is that's being relied upon.

When the search fails (assuming it does, i.e. we find no special magic)
we're left not with confidence the construct works as we'd expect, but
rather the uneasy feeling it is relying on some bizarre behaviour, quite
possibly of an implementation-specific optimizer, or some equivalent,
which we'll never be able to fully understand, let alone rely upon.  The
code, as a result, simply cannot be trusted.

There are languages in which "count = count + 1" is the common idiom.  To
people used to those languages, such a construct may be clear and
concise.  C is not one of those languages.
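For what it's worth, the three spellings really do leave the counter in the same state; they differ only in the value of the increment expression itself. A minimal check:

```c
#include <assert.h>

/* All three spellings leave the counter identical; only the value
 * of the increment expression differs between pre and post forms. */
int demo(void)
{
    int a = 5, b = 5, c = 5;
    a++;            /* the C idiom */
    b = b + 1;      /* the spelling under discussion */
    c += 1;         /* the compound-assignment middle ground */
    assert(a == b && b == c);

    int old = a++;  /* post-increment yields the old value */
    int now = ++b;  /* pre-increment yields the new value */
    return now - old;   /* 1, since a and b were equal going in */
}
```

So there is no "special magic" to find; the only difference a careful reviewer could turn up is which value the expression yields when used inside a larger expression.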

Indeed, the very fact this has engendered a discussion as involved as
this should be sufficient to show that such constructs are _not_ trusted
by C coders, but _are_ treated as flags suggesting extreme review is
warranted.

Actually, there are good reasons for dropping all use of pre and post
increment in favor of normal expressions:

* C "standardization" has considerably messed up the evaluation order
of these constructs: in a striking reversal of the intentions of most
standardization efforts, the standards geeks have gravely pronounced
that their evaluation order is Heisenberg-uncertain

* They only apply to lvalues, which makes them non-orthogonal to most
other operators

* A modern compiler will generate the same code for count++ as for
count = count + 1

* A modern optimizing compiler will override the scheduling that the
programmer attempts in if (a++) foo, compiling this and "if (a) foo; a
= a+1" in the same optimized way.

The problem is what "technological anthropologist" Diane Vaughan calls
"normalized deviance". She developed this anthropological construct to
explain why an all-male team at a NASA subcontractor was pressured to
approve the January 1986 launch of the Challenger Space Shuttle (the
"teacher in space" mission, which exploded shortly after leaving the
launch pad).

Vaughan, the author of "The Challenger Launch Decision" (University of
Chicago Press, 1996), realized that in male workgroups, a "macho"
attitude causes technical men to decide collectively to abandon good
practice and to mistreat dissidents.

This phenomenon seems to me well advanced in C. C was developed in a
deviant fashion: an adolescent prank intended to show the "grownups"
of the Multics (PL/I based) project that bearded hippy weirdos could
program better than grey flannel suits. For this reason, and because
Ritchie and Thompson had no visible way of demonstrating their
superiority except by an easily measured development time, C was from
the start a mishmash of ill-digested notions.

These included the preprocessor, a macro facility developed when the
serious problems of such processing were becoming evident in macro
assemblers: the pre and post increment operators which were (I
believe) merely implemented to use attractive machine instructions:
and above all, the unspeakably amateurish choice of NUL to terminate
strings, which permanently deprived a common and useful character of
membership in strings.

The psychology was adolescent in contrast to the mature and adult
effort to develop Algol on the part of serious scientists, and it was
funded solely by the monopolistic market failure that was the Bell
system of the time.

However, the fact that Ritchie and Thompson got away with this nonsense,
and the even bigger nonsense of unix, normalized deviance into mythos.
The real contributions to the advance of software of a slightly older
generation were stolen by irresponsible adolescents and today, this
has created the sort of antics that occur on this newsgroup, including
Seebach's resentment-based analysis of code, and his substitution of
personal hatred for science.
 

Malcolm McLean

Actually, there are good reasons for dropping all use of pre and post
increment in favor of normal expressions:

*  A modern compiler will generate the same code for count++ as for
count = count + 1
That's a good reason for dropping the special operators.
However count = count + 1 is a construct needed so frequently that it
helps to have a special syntax for it.
In my idiom x = x + 1 and x++ are not equivalent. x++ is incrementing;
x = x + 1 is adding a constant which, coincidentally, is unity.
An example is scores for matches in association football. The old rule
was 2 points for a win, one for a draw, zero for a loss. Now the rule
is 3 points for a win, one for a draw, zero for a loss. However, they
could easily decide that the win premium is too great and make it five
for a win, three for a draw. Using the ++ operator to increment the
total for a draw would be inappropriate.
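Malcolm's football example can be sketched with named constants. The enum values below are the current rules; the function name and shape are ours, just to show the point: changing a premium means editing one line, and the additions never degenerate into bare increments.

```c
/* Association-football scoring with the premiums as named constants.
 * "points = points + DRAW_POINTS" stays correct even if the draw
 * premium is ever changed away from 1, where points++ would not. */
enum { WIN_POINTS = 3, DRAW_POINTS = 1, LOSS_POINTS = 0 };

int season_points(int wins, int draws, int losses)
{
    int points = 0;
    while (wins-- > 0)
        points = points + WIN_POINTS;
    while (draws-- > 0)
        points = points + DRAW_POINTS;
    while (losses-- > 0)
        points = points + LOSS_POINTS;
    return points;
}
```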
 

Richard Bos

Keith Thompson said:
I don't think I've ever used a system where sizeof(void*) != sizeof(size_t).

Yes, but you are on record as never having had to contend with tiny,
small, medium, compact, large and huge memory models. Things could
get... interesting.

Richard
 

Ed Vogel

Malcolm McLean said:
People aren't going to be using 64 bits for long before they start
asking for objects greater than 4GB.

Some have asked for it, most have not.
There are ways to get/access objects > 4GB, but one needs to
be careful.

Ed Vogel
 
