What's the deal with size_t?

Richard Heathfield · Nov 11, 2007

Malcolm McLean said:

The claim has been made that there is "no basis" to my claims.

As I understand it, the basis you advance for your claims is two-fold:

(a) the size_t name is ugly;
(b) the size_t type is unsigned.

I have already pointed out that I agree, pretty much, with (a), but that I
don't consider it to be a particularly persuasive or meritorious argument
for abandoning or deprecating size_t. It might make a reasonable argument
for suggesting a name change, although of course the chance of getting ISO
to agree on a name change is, in reality, zero.

The fact that size_t is unsigned fits naturally with its role (as
demonstrated by the cases in which the standard library uses it) as a way
for storing object sizes and object counts. A negative size for an object
is meaningless, as is a negative count of the number of objects (in the C
sense of the word). So I don't consider this argument to be particularly
persuasive or meritorious either, because the unsignedness of size_t is
perfectly natural and sensible, given the nature of its intended purpose.

To write

size_t i;

for(i=0;i<N;i++)
{
ptr++;
}

is misleading, because it implies that i is a "size type" when it is
nothing of the sort. It is an index.

It's an object count. It measures the distance, expressed in object units,
between the start of the array and the point in that array where can be
found the object that we care about. This is entirely consistent with the
usage of size_t in functions such as fread, fwrite, and calloc.
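As a minimal sketch of that point (the helper name dup_ints is hypothetical, not something from the thread), the same size_t value can serve both as the object count handed to calloc and as the bound for the loop index, which is itself just a count of objects from the start of the array:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not from the thread: copy n ints into fresh storage.
   The caller owns the result and must free() it. */
int *dup_ints(const int *src, size_t n)
{
    int *copy;
    size_t i;                 /* offset from the start, counted in objects */

    assert(src != NULL);

    copy = calloc(n, sizeof *copy);   /* n is an object count here too */
    if (copy == NULL)
        return NULL;

    for (i = 0; i < n; i++)
        copy[i] = src[i];
    return copy;
}
```

Here i and n never need to be negative, so an unsigned type loses nothing in this usage.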

Don't you think that is a justification that has some merit?

No.

Or maybe you
are making the stronger claim that, if you personally disagree with a
proposal, all arguments for it are not just weaker than the arguments
against (which you must believe), but have "no merit".

No. For example, I personally disagree with ISO's decision massively to
extend the standard library to include many types that are intended to
increase portability but whose successful use, as far as I understand it,
depends on the whims of implementors. Nevertheless, I would not say that
the arguments put forward for this change have no merit. I have seen
several defences of the decision, and it seemed to me that these arguments
did indeed have some merit. They did not persuade me, but they did at
least make me think hard about the issues, and that in itself is an
indication that they are not empty arguments.

I suspect this is
the case. "Has no merit" is Heathfieldese for "I disagree".

In discussions in this newsgroup, I have disagreed with a great many
people, including Chris Torek, Steve Summit, Lawrence Kirby, Dan Pop, Dann
Corbit... names that will, I trust, be familiar to you. I think you will
agree that I have, on at the very least /some/ such occasions, managed to
disagree with them /without/ claiming that their arguments have no merit.
Therefore, your argument that "has no merit" is my way of saying "I
disagree" has no merit.

Malcolm McLean · Nov 11, 2007

Keith Thompson said:
I think fair people *are* saying that you make pure assertions, or
nearly so.

Ultimately you have to come to some sort of end, preferably with agreed
facts.

For instance one of my cases against size_t is that it is a major change to
the language, because most integers are ultimately used as indices. When
challenged on that I give a little bit of statistical evidence, but when
challenged on that evidence, I give up. There's no helping some people. The
assertion about indices is fairly easily verified - as long as you know what
you are talking about.

Go through a sample of C code, and count every instance of variables
declared as int, long, short, long long, arguably unsigned or signed char,
and derivatives or aliases of these types. Also count pointers such as int
*, int **, and the like. Then see how many times the variable, or the
variable pointed to, is used to ultimately derive index calculations. If it
is, score it as index / size_t, if not, score it as non-index, non-size_t.
Note that if we have

int add(int *x, int N)
{
  int i;
  int answer = 0;
  for(i=0;i<N;i++)
    answer += x[i];
  return answer;
}

we'd have two index / size_t variables (N and i) and two non-indexes, x and
answer, unless the variables are used elsewhere in index calculations. If we
index into an array on the result, x and answer would need to be scored as
index.
You might say "what happens if add() is called twice, once to derive an
index once not?" You'll also note that it is possible to write code in such
a way as to skew the results. We could do while(N--) to get rid of an index
variable, for example. These aren't worth worrying about.

Assuming that all arrays can grow until the computer runs out of memory,
that tells you how many size_t's you need in the program. Of course that
assumption doesn't necessarily hold. But that's a slightly different
argument.

Now is that "pure assertion?" does it constitute a "near pure assertion?" or
is it actually a testable claim and a coherent argument?

Flash Gordon · Nov 11, 2007

Malcolm McLean wrote, On 11/11/07 21:38:

The "rather slow" ones think that no justification has been offered. As
I said, it is demand for more material, which is rather irritating since
I consider plenty to have been provided.

Many seem to disagree.

Those who lack "sensitivity and insight" are those who firstly don't
simply accept that most integers are ultimately used for memory index
operations, or in intermediate calculations to derive such indices, and

Well, all of my managers and colleagues over the past 20+ years who have
expressed an opinion would disagree with any suggestion that I lack
"sensitivity and insight" in programming.

secondly don't see the force of a statistical proof when it is provided.

Well, you have not provided proof. One study designed for something
completely different with respect to a different language and that
ignores major application areas that C is used for does not prove
anything about how C is used. If you really understood statistics you
would understand why you have not proved your point.

This is no slowness, but it is something that someone with no real feel for
computer programming would think.

Ah, this lack of feel for computer programming must explain me making a
successful career in SW development.

The "arrogance" refers to a philosophical fallacy, which I will call the
"debunking fallacy". This states that if some objection can be advanced
to a piece of evidence, it disappears and may no longer be used.

So what do you call your fallacy of assuming that any evidence against
your position is irrelevant?

Charlie Gordon · Nov 11, 2007

What is wrong with Keith's news server, I don't receive any of his posts
(neither free.fr, nor aioe.org carry them). Is this some sort of agency
mandated embargo, as the name suggests?

Malcolm McLean · Nov 11, 2007

Richard Heathfield said:
Malcolm McLean said:

As I understand it, the basis you advance for your claims is two-fold:

(a) the size_t name is ugly;
(b) the size_t type is unsigned.

(c) is confusing - a bit like

typedef struct
{
double r;
double i;
} imaginary;

(d) is in fact a major change to the language, which wasn't appreciated at
the time it was implemented.

(e) has more theoretical than practical value - an unsigned type is only
needed to represent sizes of memory greater than half the address space,
which is a rather unusual need.

(f) won't in fact be adopted uniformly, leading to a legacy of broken code
when programs are ported from one platform to another.

(g) increases the type matrix by one.

However (a) is the major objection.

Richard Heathfield · Nov 11, 2007

Malcolm McLean said:

For instance one of my cases against size_t is that it is a major change
to the language,

That isn't an argument against size_t. Prototypes were a fairly major
change to the language, but I don't see you arguing against them.

because most integers are ultimately used as indices.

Are they? Let's find out.

Go through a sample of C code, and count every instance of variables
declared as int, long, short, long long, arguably unsigned or signed
char, and derivatives or aliases of these types. Also count pointers such
as int *, int **, and the like. Then see how many times the variable, or
the variable pointed to, is used to ultimately derive index calculations.

Okay, let's see. Time is short (or is it time_t?), so I just picked one C
file on my local disk, and went through it line by line. (My choice wasn't
random, but neither was it 'cooked'. I just listed my projects, and one of
them happened to catch my eye. So it was kinda-sorta random.)

Funnily enough, the code was 255 lines long - just enough to be indexed by
an unsigned char on my system - but I'll put that one down to coincidence.

Const qualifiers have been ignored. So have strings.

int rc: return code
size_t len: string length (i.e. count of char objects)
size_t longest: measure of longest string constructed (i.e. count of char
objects)
size_t maxlinelen: measure of longest line encountered (i.e. count of char
objects)
size_t n: line count
int first: flag

NONE of these objects is used as an index into an array.

int argc: argument count (should be size_t really, but ISO seems to
disagree)
int rc: return code

So - somewhat to my surprise, actually - *none* of the integer objects in
that program were used for array indexing.

Naturally, we must consider the possibility of a statistical blip. So I
guess I'd better do the whole darn exercise again with a different
program.

Okay, this one is 1156 lines long. I'm going to ignore all the struct
definitions (which contain many integer types, few of which are likely to
be used as indexes) because I haven't got all night.

int Status: return code
size_t ThisPattern: used as index
size_t len: line length (i.e. object count)
int Found: flag
size_t SpinnerControl: used as index
int LineCount: line count
size_t len: used as index (this is in a different function to the other
len)
size_t pattern: used as pointer offset, which we'll count as an index
int Status: return code
size_t ThisPattern: used as index
size_t len: used as pointer offset, i.e. index
int Found: flag
size_t wcount: word count
size_t width: keeps track of how much horizontal space an output line takes
up
int Hit: flag
int Status: return code
size_t ThisPattern: used as index
size_t len: current line length
size_t wcount: word count
size_t width: horizontal space tracker
size_t idx: used as index
size_t j: used as index
int done: flag
size_t curr: used as index
size_t i: used as index
size_t Size: tracks current buffer size
size_t BytesRead: tracks number of input bytes
int Status: return code
size_t pos: records position of a letter in the alphabet
size_t ThisEntry: used as index
size_t ThisByte: used as index
size_t Count: counter
size_t pos: records position of a letter in the alphabet
size_t Freq: used as index
size_t ch: used as index
size_t Start: tracks starting position
size_t End: tracks ending position
size_t RangeStart: tracks start of range
size_t RangeEnd: tracks end of range
size_t LineLength: tracks line length
size_t ch: used as index
size_t Start: tracks starting position
size_t End: tracks ending position
size_t RangeStart: tracks start of range
size_t RangeEnd: tracks end of range
size_t LineLength: tracks line length

Well, I'm bored silly, at considerably more than halfway through the source.
Enough data there, I think, so I will look no further. Let me just count
them up:

non-index: 30
index: 16

So your claim that "most integers are ultimately used as indices" doesn't
seem to hold water for this program, either.

Of course, I only checked around a thousand lines of source, maybe even
less, so the statistical significance of this result should not be
overplayed.

Now is that "pure assertion?" does it constitute a "near pure assertion?"
or is it actually a testable claim and a coherent argument?

Well, it's certainly a testable claim, although it doesn't seem to stand up
to much testing - but I'm not so sure that it's a coherent argument. Just
because integer objects are often used for indexing arrays (which is
hardly surprising, given that an array index /must/ be an integer value),
that doesn't mean size_t is a bad idea.

Keith Thompson · Nov 11, 2007

Charlie Gordon said:
What is wrong with Keith's news server, I don't receive any of his posts
(neither free.fr, nor aioe.org carry them). Is this some sort of agency
mandated embargo, as the name suggests?

I don't think there's anything wrong with my news server (which is
"news-server.san.rr.com", provided by my ISP, Time Warner Cable's
Roadrunner service). At least some people are obviously seeing my
messages, and Google Groups shows four of my articles in this thread.

Perhaps someone or something has decided to block messages from
news-server.san.rr.com for some reason. Charlie, perhaps you could
check with the administrators of free.fr and aioe.org? If I'm being
blocked, it's always possible that somebody important is being blocked
too.

It also seemed odd that I didn't show up on the latest "Stats for
comp.lang.c (last 7 days)" post, for Nov 3-10. I don't think I've
been *that* quiet.

I'll e-mail this to Charlie as well as posting here.

Malcolm McLean · Nov 11, 2007

Richard Heathfield said:
Malcolm McLean said:

That isn't an argument against size_t. Prototypes were a fairly major
change to the language, but I don't see you arguing against them.

Are they? Let's find out.

Okay, let's see. Time is short (or is it time_t?), so I just picked one C
file on my local disk, and went through it line by line. (My choice wasn't
random, but neither was it 'cooked'. I just listed my projects, and one of
them happened to catch my eye. So it was kinda-sorta random.)

Funnily enough, the code was 255 lines long - just enough to be indexed by
an unsigned char on my system - but I'll put that one down to coincidence.

Const qualifiers have been ignored. So have strings.

int rc: return code
size_t len: string length (i.e. count of char objects)
size_t longest: measure of longest string constructed (i.e. count of char
objects)
size_t maxlinelen: measure of longest line encountered (i.e. count of char
objects)
size_t n: line count
int first: flag

NONE of these objects is used as an index into an array.

Or used to derive index calculations. Which almost certainly you are doing
with the string lengths. That's what I mean by "ultimately".

int argc: argument count (should be size_t really, but ISO seems to
disagree)
int rc: return code

So - somewhat to my surprise, actually - *none* of the integer objects in
that program were used for array indexing.

argc - used to derive indices for argv. This becomes obvious if you write
for(i=0;i<argc;i++)
  printf("%s\n", argv[i]);

if you say
if(argc == 3)
printf("%s %s\n", argv[1], argv[2]);
else
printf("must have 2 arguments\n");

you might want to argue that it isn't being used to derive the 1 and the 2
indices.

Naturally, we must consider the possibility of a statistical blip. So I
guess I'd better do the whole darn exercise again with a different
program.

Okay, this one is 1156 lines long. I'm going to ignore all the struct
definitions (which contain many integer types, few of which are likely to
be used as indexes) because I haven't got all night.

*int Status: return code
size_t ThisPattern: used as index
size_t len: line length (i.e. object count)
*int Found: flag
size_t SpinnerControl: used as index
int LineCount: line count
size_t len: used as index (this is in a different function to the other
len)
size_t pattern: used as pointer offset, which we'll count as an index
*int Status: return code
size_t ThisPattern: used as index
size_t len: used as pointer offset, i.e. index
*int Found: flag
size_t wcount: word count
size_t width: keeps track of how much horizontal space an output line takes
up
*int Hit: flag
*int Status: return code
size_t ThisPattern: used as index
size_t len: current line length
size_t wcount: word count
? size_t width: horizontal space tracker
size_t idx: used as index
size_t j: used as index
* int done: flag
size_t curr: used as index
size_t i: used as index
size_t Size: tracks current buffer size
size_t BytesRead: tracks number of input bytes
* int Status: return code
size_t pos: records position of a letter in the alphabet
size_t ThisEntry: used as index
size_t ThisByte: used as index
* (?) size_t Count: counter
size_t pos: records position of a letter in the alphabet
size_t Freq: used as index
size_t ch: used as index
size_t Start: tracks starting position
size_t End: tracks ending position
size_t RangeStart: tracks start of range
size_t RangeEnd: tracks end of range
size_t LineLength: tracks line length
size_t ch: used as index
size_t Start: tracks starting position
size_t End: tracks ending position
size_t RangeStart: tracks start of range
size_t RangeEnd: tracks end of range
size_t LineLength: tracks line length

Well, I'm bored silly, at considerably more than halfway through the source.
Enough data there, I think, so I will look no further. Let me just count
them up:

non-index: 30
index: 16

So your claim that "most integers are ultimately used as indices" doesn't
seem to hold water for this program, either.

Counts of things in memory are almost always used ultimately to index
things.

Of course, I only checked around a thousand lines of source, maybe even
less, so the statistical significance of this result should not be
overplayed.

No, that should give you enough to go on. I've marked with an asterisk
everything I would say is not an index. Obviously I can't see your code, so
I've put in a few question marks, and the range trackers may not be used to
derive indices, but I find this very difficult to believe. So that makes
about 10 non-index variables.

Maybe a better way of saying what I am getting at would be "given that every
array in this program could potentially take up all available memory, how
many of my variables need to be size_t's?" The answer, as suggested by your
code, which has been well-written, is the vast majority. Now ask, "how many
are size_t's?" In your case, the vast majority. But what of lesser
programmers?

Keith Thompson · Nov 11, 2007

Malcolm McLean said:
(c) is confusing - a bit like

typedef struct
{
double r;
double i;
} imaginary;

"imaginary" would clearly be a bad name for such a type. "complex"
would clearly be a better name.

Can you suggest a better name for "size_t"? (I've suggested "count_t"
myself.)

(d) is in fact a major change to the language, which wasn't
appreciated at the time it was implemented.

And removing size_t would be *another* major change to the language.

(e) has more theoretical than practical value - an unsigned type is
only needed to represent sizes of memory greater than half the address
space, which is a rather unusual need.

The fact that it's unsigned is a disadvantage only if there's some
clear advantage to making it signed. I don't believe there is. You
do have to be a bit more careful with unsigned computations than with
signed computations, because the lower bound of the type is so close
to the set of reasonable values, but I don't see that as a large
problem. (Yes, you can provide examples where unsigned types can
cause problems; you just have to be aware of such problems when you're
using the type.)
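One common place where that extra care shows up is a loop that counts down. The following is only a sketch (last_index_of is a hypothetical helper, and returning n for "not found" is an assumption made for the example); it shows one idiom that stays correct with an unsigned index:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the last element equal to key,
   or n if no element matches. */
size_t last_index_of(const int *a, size_t n, int key)
{
    size_t i;

    assert(a != NULL || n == 0);

    /* Test, then decrement: inside the body i runs n-1, n-2, ..., 0.
       The obvious-looking "for (i = n - 1; i >= 0; i--)" would never
       terminate, because an unsigned i is always >= 0. */
    for (i = n; i-- > 0; )
        if (a[i] == key)
            return i;

    return n;
}
```

The loop also behaves for n == 0, where computing n - 1 in an unsigned type would wrap to a huge value.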

(f) won't in fact be adopted uniformly, leading to a legacy of broken
code when programs are ported from one platform to another.

Do you imagine that removing or deprecating size_t will lead to more
uniform code? Oh, yes, you want to require int to be 64 bits and
deprecate other integer types. That would probably result in greater
uniformity, but at a far greater cost than most of us are willing to
pay.

(g) increases the type matrix by one.

However (a) is the major objection.

Your major objection is that you don't like the name? Sorry, but I
see that as an entirely trivial concern.

Keith Thompson · Nov 11, 2007

Malcolm McLean said:
For instance one of my cases against size_t is that it is a major
change to the language, because most integers are ultimately used as
indices. When challenged on that I give a little bit of statistical
evidence, but when challenged on that evidence, I give up. There's no
helping some people. The assertion about indices is fairly easily
verified - as long as you know what you are talking about.

If I recall correctly, you have a single example of a study that
involved Java. I'll grant you that your evidence is not nonexistent,
but it certainly seems weak. (I have neither the time nor the
expertise to judge the validity of the study, but others here have
certainly disputed it.)

You continue to assert that most integers are ultimately used as
indices. I don't believe you have sufficient evidence.

[snip]

Charlie Gordon · Nov 11, 2007

Keith Thompson said:
Eric Sosman said:

Tubular Technician wrote On 11/05/07 20:16,: [...]

* size_t is unnecessary (size of object in memory never exceeds
what can be held in an integer).

This claim has been made, and also refuted with actual
examples of real contemporary machines.

[...]

The claim *as stated* is correct. The size of any object in memory
can never exceed what can be held in an integer (ignoring a nitpicking
controversy about how big an object calloc() can create); size_t is,
after all, an integer type.

Blurring the distinction between "int" and "integer" is one of the
worst errors I see here. There are a number of integer types in C,
ranging from char to long long (and perhaps more if the implementation
provides one or more extended integer types). "int" is just one of
those types (it's also a keyword that can be used in the names of
several other integer types).

(The standard might use the term "integral types" rather than "integer
types"; I'm not sure, it doesn't make much difference to the point,
and my copy of the standard isn't handy at the moment.)

No, the Standard only uses "integral" to refer to the integral part of
floating point values as opposed to the fractional part, with one exception
in note 40 of 6.2.6.1p3 where "successive integral powers of 2" is somewhat
redundant anyway.

Also this is a good time to remind our non native speakers that unlike
"integral", "integer" is pronounced with a soft g as in "just" and
"general". Most French speakers make this mistake ;-)

Charlie Gordon · Nov 11, 2007

Richard Heathfield said:
Malcolm McLean said:

That isn't an argument against size_t. Prototypes were a fairly major
change to the language, but I don't see you arguing against them.

Are they? Let's find out.

Okay, let's see. Time is short (or is it time_t?), so I just picked one C
file on my local disk, and went through it line by line. (My choice wasn't
random, but neither was it 'cooked'. I just listed my projects, and one of
them happened to catch my eye. So it was kinda-sorta random.)

Funnily enough, the code was 255 lines long - just enough to be indexed by
an unsigned char on my system - but I'll put that one down to coincidence.

Const qualifiers have been ignored. So have strings.

int rc: return code
size_t len: string length (i.e. count of char objects)
size_t longest: measure of longest string constructed (i.e. count of char
objects)
size_t maxlinelen: measure of longest line encountered (i.e. count of char
objects)
size_t n: line count
int first: flag

NONE of these objects is used as an index into an array.

int argc: argument count (should be size_t really, but ISO seems to
disagree)
int rc: return code

So - somewhat to my surprise, actually - *none* of the integer objects in
that program were used for array indexing.

Naturally, we must consider the possibility of a statistical blip. So I
guess I'd better do the whole darn exercise again with a different
program.

Okay, this one is 1156 lines long. I'm going to ignore all the struct
definitions (which contain many integer types, few of which are likely to
be used as indexes) because I haven't got all night.

int Status: return code
size_t ThisPattern: used as index
size_t len: line length (i.e. object count)
int Found: flag
size_t SpinnerControl: used as index
int LineCount: line count
size_t len: used as index (this is in a different function to the other
len)
size_t pattern: used as pointer offset, which we'll count as an index
int Status: return code
size_t ThisPattern: used as index
size_t len: used as pointer offset, i.e. index
int Found: flag
size_t wcount: word count
size_t width: keeps track of how much horizontal space an output line takes
up
int Hit: flag
int Status: return code
size_t ThisPattern: used as index
size_t len: current line length
size_t wcount: word count
size_t width: horizontal space tracker
size_t idx: used as index
size_t j: used as index
int done: flag
size_t curr: used as index
size_t i: used as index
size_t Size: tracks current buffer size
size_t BytesRead: tracks number of input bytes
int Status: return code
size_t pos: records position of a letter in the alphabet
size_t ThisEntry: used as index
size_t ThisByte: used as index
size_t Count: counter
size_t pos: records position of a letter in the alphabet
size_t Freq: used as index
size_t ch: used as index
size_t Start: tracks starting position
size_t End: tracks ending position
size_t RangeStart: tracks start of range
size_t RangeEnd: tracks end of range
size_t LineLength: tracks line length
size_t ch: used as index
size_t Start: tracks starting position
size_t End: tracks ending position
size_t RangeStart: tracks start of range
size_t RangeEnd: tracks end of range
size_t LineLength: tracks line length

Well, I'm bored silly, at considerably more than halfway through the source.
Enough data there, I think, so I will look no further. Let me just count
them up:

non-index: 30
index: 16

So your claim that "most integers are ultimately used as indices" doesn't
seem to hold water for this program, either.

Of course, I only checked around a thousand lines of source, maybe even
less, so the statistical significance of this result should not be
overplayed.

Well, it's certainly a testable claim, although it doesn't seem to stand up
to much testing - but I'm not so sure that it's a coherent argument. Just
because integer objects are often used for indexing arrays (which is
hardly surprising, given that an array index /must/ be an integer value),
that doesn't mean size_t is a bad idea.

You are playing on words: most integers in your programs are used to index
into arrays or measure array sizes. Your examples side with Malcolm's
point, just not with the exact terms of his assertion. He should rephrase
it as "because most integers are ultimately used as indices or sizes".

I for one do not particularly like to type or see size_t variables. I much
prefer ssize_t as defined in Posix, to have the ability to represent non
size values such as -1. I think C would be much simpler if object sizes
were always less than or equal to INT_MAX. The reality is that a lot of
platforms see it as necessary to support larger ones, if only by one bit
(SIZE_MAX == UINT_MAX as in DOS/WIN16 and most 32 bit architectures), or
even much larger as in most 64 bit systems. It is impossible to reset the
clock and prevent that. We just have to live with it.

Chris Torek · Nov 11, 2007

I for one do not particularly like to type or see size_t variables.
I much prefer ssize_t as defined in Posix, to have the ability to
represent non size values such as -1. ...

Given that signed integers behave badly (trap on overflow, and
have weird values like negative zero), but unsigned integers
always "work right" -- so that (unsigned int)x + (unsigned int)-1
is always the same as x-1 -- I could argue that C would be better
if it had nothing but *un*signed types.
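The wraparound behaviour referred to here can be checked directly. This is a small sketch with a hypothetical helper name (minus_one); it relies only on the rule that unsigned arithmetic is reduced modulo UINT_MAX + 1:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: subtract one by adding the wrapped value of -1. */
unsigned int minus_one(unsigned int x)
{
    unsigned int r;

    /* Converting -1 to unsigned int always yields UINT_MAX. */
    assert((unsigned int)-1 == UINT_MAX);

    /* The sum is reduced modulo UINT_MAX + 1, so it equals x - 1
       for every x, including x == 0. */
    r = x + (unsigned int)-1;
    assert(r == x - 1u);
    return r;
}
```

For x == 0 the result is UINT_MAX, which is well defined for unsigned types; the corresponding signed overflow would not be.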

Seriously, you can use unsigned types throughout, and ssize_t is
generally unnecessary. POSIX probably should not have defined it,
and should instead have said that read() and write() take a size_t
(as they do) and return a size_t, with the value (size_t)-1 returned
on error. Unfortunately, this would have broken the common idiom:

if (write(fd, buf, len) < 0) ... handle error ...

Many programmers like to assume that write() only ever returns
either its third argument or -1, and think that this test is
somehow "more efficient" than:

if (write(fd, buf, len) == -1)

(which would still work if write() returned a size_t, provided that
size_t was not a "narrow" type, and would have worked even with a
"narrow" size_t if the X3J11 committee had used the correct widening
rules back in the 1980s).
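To make the comparison concrete, here is a sketch of what such an interface might look like. Everything in it is hypothetical (write_like, write_failed and WRITE_ERROR are invented for illustration; this is not the POSIX write()), and it assumes size_t is not a "narrow" type in the sense described above:

```c
#include <assert.h>
#include <stddef.h>

#define WRITE_ERROR ((size_t)-1)   /* hypothetical error value */

/* Hypothetical stand-in for a write() that returns size_t.
   It "fails" when given a null buffer and otherwise reports
   that all len bytes were written. */
size_t write_like(const void *buf, size_t len)
{
    /* A length equal to the error value would be ambiguous. */
    assert(len != WRITE_ERROR);
    return buf == NULL ? WRITE_ERROR : len;
}

/* Returns 1 if the result reports an error, 0 otherwise. */
int write_failed(size_t result)
{
    /* Assuming size_t is at least as wide as unsigned int, the int -1
       is converted to size_t before the comparison, so it compares
       equal to WRITE_ERROR. A "result < 0" test could never be true
       for an unsigned result. */
    return result == -1;
}
```

Some compilers warn about the mixed signed/unsigned comparison; writing result == WRITE_ERROR says the same thing without relying on the implicit conversion.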

As always, of course, we just have to work within the flawed systems
we have, most of the time anyway. (This applies to both C, which
has a series of minor flaws, and POSIX, which has a number of large
ugly ones, in my opinion anyway.)

Keith Thompson · Nov 11, 2007

Charlie Gordon said:
Richard Heathfield said:

Malcolm McLean said: [...]

because most integers are ultimately used as indices.

Are they? Let's find out.

[big snip]
[...]

So your claim that "most integers are ultimately used as indices" doesn't
seem to hold water for this program, either.

[...]

You are playing on words: most integers in your programs are used to index
into arrays or measure array sizes. Your examples side with Malcolm's
point, just not with the exact terms of his assertion. He should rephrase
it as "because most integers are ultimately used as indices or sizes".

Then surely it's Malcolm's job to phrase his challenge correctly.

I for one do not particularly like to type or see size_t variables. I much
prefer ssize_t as defined in Posix, to have the ability to represent non
size values such as -1. I think C would be much simpler if object sizes
were always less than or equal to INT_MAX. The reality is that a lot of
platforms see it as necessary to support larger ones, if only by one bit
(SIZE_MAX == UINT_MAX as in DOS/WIN16 and most 32 bit architectures), or
even much larger as in most 64 bit systems. It is impossible to reset the
clock and prevent that. We just have to live with it.

With modern 64-bit systems, it's becoming more and more common for
object sizes to exceed what can be represented in 32 bits. Making int
bigger than 32 bits causes problems; it leaves a gap in the type
system (if char is 8 bits and int is 64 bits, then short can be either
16 or 32 bits; either there's no 32-bit type or there's no 16-bit
type). C99's extended integer types might solve this, but I don't
know of any implementations that actually use them.

Charlie Gordon · Nov 11, 2007

Malcolm McLean said:
No good. Why not.

stringsup.h

typedef size-t index_t;

You *really* don't like underscores, do you?

index_t charat(char *str, char ch);

I assume you meant ``index_t indexof(const char *str, char ch);''
charat is traditionally used in other languages to denote different
semantics.

This function is still a good example of the problem with size_t being
unsigned: how are you going to report that str does not contain ch ? The
way this is done in java and javascript is by returning -1. You could
conceivably do that in C by returning (size_t)-1, but it is not elegant at
all, littering user code with ugly casts.
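For illustration, here is a sketch of the (size_t)-1 approach with the cast confined to one place. The NOT_FOUND macro is a hypothetical name introduced for this example, and index_t is written with an underscore in size_t:

```c
#include <assert.h>
#include <stddef.h>

typedef size_t index_t;

#define NOT_FOUND ((index_t)-1)   /* hypothetical "no such character" value */

/* Index of the first occurrence of ch in str, or NOT_FOUND. */
index_t indexof(const char *str, char ch)
{
    index_t i;

    assert(str != NULL);

    for (i = 0; str[i] != '\0'; i++)
        if (str[i] == ch)
            return i;

    return NOT_FOUND;
}
```

Callers then compare against NOT_FOUND rather than writing the cast themselves, at the price of giving up one value of the index range.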

Another way is to make index_t a signed type (ptrdiff_t or ssize_t as
standardized in Posix). This effectively restricts object sizes to
SIZE_MAX/2, but the language already has this inconsistency pending because
the difference of two pointers is indeed a signed quantity, of type
ptrdiff_t, that may not be representable in that type.

I suspect size_t was introduced in an attempt to maximize the use of memory
in architectures where chunks of memory larger than INT_MAX could be handled,
but doubling the size of int was not practical. Intel 16 bit segmented
architecture is a good example. Most compilers for 16 bit intel will define
ptrdiff_t and size_t as 16 bit integers, extending the size of objects to
64KB, still insufficient for problems that need large object
representations.

Charlie Gordon · Nov 11, 2007

Malcolm McLean said:
Time of writing != compile time.

Let's say we want to calculate a standard deviation. The prototype is

double stdev(double *x, N);

what type should N be? If you don't know how big the maximum array is
going to be, which you don't for this function - except that it fits in
memory - it must be a size_t.

Thus we must write

size_t i;

for(i=0;i<N;i++)
{
}

Of course that is misleading, because i is not in any shape or form a size
type. It is an index counter. The implications of introducing size_t
simply weren't thought through.
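For concreteness, here is a minimal sketch of the kind of function under discussion, with the element count passed as a size_t. It is an illustration only: the name variance is hypothetical, and it computes the population variance rather than the standard deviation so that the sketch needs nothing from the maths library. A stdev() along the lines of the prototype above would return the square root of this value.

```c
#include <stddef.h>

/* Population variance of x[0] .. x[n-1]. Returns 0.0 when n is 0.
   The count and the loop index are both size_t, so any array that
   fits in memory can be processed. */
double variance(const double *x, size_t n)
{
    double mean = 0.0;
    double sumsq = 0.0;
    size_t i;

    if (n == 0)
        return 0.0;

    for (i = 0; i < n; i++)
        mean += x[i];
    mean /= (double)n;

    for (i = 0; i < n; i++)
        sumsq += (x[i] - mean) * (x[i] - mean);

    return sumsq / (double)n;
}
```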

You find that functions like stdev() are by no means uncommon. Very
frequently you will not hard code the size of an array, until maybe in the
very top layer of code.

The worse problem is that, frequently, you don't know the exact size of an
array but you know that it will be small. For instance the number of
children in a class. Should that be a size_t or not? If we sort the class
by grade, qsort() takes two size_ts. However people will naturally jib at
using a size_t when an int, realistically, is going to be enough. So you
get inconsistency.

Most integers are ultimately used as index variables. Not every integer,
of course, for instance if you are dealing with amounts of money you may
choose to represent the sums by integers. But every time you add up a list
of amounts of money, you will have one index integer to iterate through
the array and another to count it. Programs don't spend their time doing
calculations, but on moving data from one place to another. Something like
20% of all mainframe cycles are used in sorting, for instance.

Even if an integer is a type field, typically that is used as an array
index. For instance if we have an enum {MR, MRS, MISS, MS, DR, REV, PROF,
LORD} we will probably have an array of strings we index into to help us
construct letters.
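The enum-as-index pattern described here might look like the sketch below. The enumerators are the ones listed in the post; the TITLE_COUNT member, the title_names table, and the title_name() helper are hypothetical additions for illustration.

```c
#include <stddef.h>

/* Titles from the post, plus a hypothetical TITLE_COUNT member that
   gives the number of entries in the table. */
enum title { MR, MRS, MISS, MS, DR, REV, PROF, LORD, TITLE_COUNT };

static const char *const title_names[TITLE_COUNT] = {
    "Mr", "Mrs", "Miss", "Ms", "Dr", "Rev", "Prof", "Lord"
};

/* Returns the printable form of t, or an empty string if t is out of
   range. Converting to size_t turns any negative value into a very
   large one, so a single comparison covers both ends. */
const char *title_name(enum title t)
{
    if ((size_t)t >= (size_t)TITLE_COUNT)
        return "";
    return title_names[t];
}
```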

That's the problem. Really virtually every integer in the program should
be a size_t, because they will almost all end up being used to derive
index calculations. But that is unlikely to be accepted, partly because of
the unsignedness and efficiency considerations, but mainly because to type
"size_t i;" is so counter-intuitive.

That's why I think size_t will ultimately have to go, and the
introduction of 64-bit types on the desktop will be the catalyst for this,
because it will no longer be true that int can index an arbitrary array.

ITYM: it will no longer be true that int cannot index an arbitrary array.

dj3vande · Nov 11, 2007

I don't think there's anything wrong with my news server (which is
"news-server.san.rr.com", provided by my ISP, Time Warner Cable's
Roadrunner service). At least some people are obviously seeing my
messages, and Google Groups shows four of my articles in this thread.

Perhaps someone or something has decided to block messages from
news-server.san.rr.com for some reason. Charlie, perhaps you could
check with the administrators of free.fr and aioe.org? If I'm being
blocked, it's always possible that somebody important is being blocked
too.

The Path header on the copy of the post to which I'm replying that
arrived at news.uwaterloo.ca was:
Path: news.uwaterloo.ca!meganewsservers.com!feeder2.on.meganewsservers.com!nx01.iad01.newshosting.com!newshosting.com!post01.iad01!roadrunner.com!not-for-mail

Comparing paths from servers where it has arrived may (or may not) give
some clues about which news server along the way is dropping them.

Keith's Message-ID is not broken, which eliminates one possible reason
for losing posts that we've seen in the past.

dave

Richard Heathfield · Nov 12, 2007

Charlie Gordon said:

<test info snipped - see upthread>

Test datum #1: index variables: 0% (sample size: 255 lines)
Test datum #2: index variables < 35% (sample size: around 600 lines)

You are playing on words: most integers in your programs are used to
index into arrays or measure array sizes.

No, they aren't. Colloquially, "most" means "nearly all", which is clearly
not true, and strictly speaking, "most" means "more than half", and not
even /that/'s true. In one of the samples, the number was a big fat zero,
and in the second, it was considerably less than half, indeed only
slightly over one third.

Unless of course *you* are playing with words, and claiming that, if less
is more, then fewest is most?

Your examples side with Malcolm's
point, just not with the exact terms of his assertion.

I don't see how.

<snip>

Malcolm McLean · Nov 12, 2007

Chris Torek said:
Given that signed integers behave badly (trap on overflow, and
have weird values like negative zero), but unsigned integers
always "work right" -- so that (unsigned int)x + (unsigned int)-1
is always the same as x-1 -- I could argue that C would be better
if it had nothing but *un*signed types.

Seriously, you can use unsigned types throughout, and ssize_t is
generally unnecessary.
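The wraparound property referred to above can be checked with a small sketch. The function names are hypothetical. The behaviour itself is guaranteed by the standard, since unsigned arithmetic is reduced modulo UINT_MAX + 1.

```c
#include <limits.h>

/* Decrements x by adding (unsigned int)-1, which is UINT_MAX.
   The sum is reduced modulo UINT_MAX + 1, so the result equals x - 1u
   for every x, including x == 0 (which wraps to UINT_MAX). */
unsigned int dec_by_wrap(unsigned int x)
{
    return x + (unsigned int)-1;
}

/* Returns 1 if decrementing zero wraps to UINT_MAX, as the standard
   requires for unsigned types. */
int zero_wraps_to_max(void)
{
    return dec_by_wrap(0u) == UINT_MAX;
}
```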

All indices must ultimately be positive. The problem is that intermediate
values can be negative, which doesn't happen often, but not so infrequently
as not to be a problem.

For instance my current program takes a window of 13 residues round a target
in a protein. If I'm taking a residue near the N-terminus, conventionally
regarded as the start, the window might overlap into negative values. Whilst
you can code round the problem of not having negatives, it is much better to
be able to say clearly if(residuei < 0).
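A rough sketch of the two styles being contrasted, assuming a window of 13 centred on the target. All names here are hypothetical and this is not the program described in the post. The first function uses a signed type and tests for a negative start directly; the second "codes round" the lack of negatives by comparing before subtracting.

```c
#include <stddef.h>

#define WINDOW 13  /* window width, as in the post */

/* Signed version: computes the part of the window centred on target
   that lies inside a chain of nresidues residues. Stores the first
   index in *first and the number of residues in *count. */
void window_bounds(long target, long nresidues, long *first, long *count)
{
    long start = target - WINDOW / 2;    /* may be negative near the start */
    long end = target + WINDOW / 2 + 1;  /* one past the last residue wanted */

    if (start < 0)
        start = 0;
    if (end > nresidues)
        end = nresidues;

    *first = start;
    *count = end > start ? end - start : 0;
}

/* Unsigned version of the start calculation: the comparison must come
   before the subtraction, because target - WINDOW / 2 would wrap. */
size_t window_start_unsigned(size_t target)
{
    return target >= WINDOW / 2 ? target - WINDOW / 2 : 0;
}
```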

A trap on overflow is not bad behaviour incidentally. It is good behaviour.
I wish all my C programs would exit with an error message whenever their
capabilities are exceeded.
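Standard C does not trap signed overflow for you, but the "exit with an error message" behaviour can be approximated by checking before the operation. The sketch below is one way to do that for int addition. The names add_checked and add_or_die are hypothetical.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 and stores a + b in *sum when the result fits in an int.
   Returns 0 and leaves *sum untouched when the addition would overflow.
   The test is done without ever performing the overflowing addition. */
int add_checked(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}

/* Adds a and b, or prints a message and exits if the sum would not
   fit in an int. */
int add_or_die(int a, int b)
{
    int sum;

    if (!add_checked(a, b, &sum)) {
        fprintf(stderr, "integer overflow adding %d and %d\n", a, b);
        exit(EXIT_FAILURE);
    }
    return sum;
}
```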

Richard Heathfield · Nov 12, 2007

Malcolm McLean said:

All indices must ultimately be positive.

ITYM non-negative.

The problem is that intermediate
values can be negative, which doesn't happen often, but not so
infrequently as not to be a problem.

My experience differs. I can certainly agree that it happens occasionally,
but not so frequently as to be a problem.

A trap on overflow is not bad behaviour incidentally. It is good
behaviour. I wish all my C programs would exit with an error message
whenever their capabilities are exceeded.

I wish all mine did, too. Mostly, however, they do. When they don't, I
consider it to be my fault, not C's fault. In almost all circumstances, C
provides a mechanism for getting any C feature right, so that getting it
wrong is my problem, not C's problem. In the few circumstances where this
is not the case, I will tend to avoid that particular feature (gets() is
an example, as are VLAs, although of course this is not the *only* reason
I avoid VLAs).
