What's the deal with size_t?

  • Thread starter Tubular Technician

Malcolm McLean

Tubular Technician said:
Array sizes need to be known at compile time; furthermore according
to ISO/IEC 9899:1999 5.2.4.1 a single object is not guaranteed to be
able to be larger than 65535 bytes, which according to 5.2.4.2.1 also
happens to be the guaranteed range of an unsigned int (UINT_MAX).
So an index type of unsigned int ("natural" int size?) seems perfectly
fine to me.

Time of writing != compile time.

Let's say we want to calculate a standard deviation. The prototype is

double stdev(double *x, N);

what type should N be? If you don't know how big the maximum array is going
to be, which you don't for this function - except that it fits in memory -
it must be a size_t.

Thus we must write

size_t i;

for(i=0;i<N;i++)
{
}

Of course that is misleading, because i is not in any shape or form a size
type. It is an index counter. The implications of introducing size_t simply
weren't thought through.

You find that functions like stdev() are by no means uncommon. Very
frequently you will not hard code the size of an array, until maybe in the
very top layer of code.

The worse problem is that, frequently, you don't know the exact size of an
array but you know that it will be small. For instance the number of
children in a class. Should that be a size_t or not? If we sort the class by
grade, qsort() takes two size_ts. However, people will naturally jib at using
a size_t when an int, realistically, is going to be enough. So you get
inconsistency.

Most integers are ultimately used as index variables. Not every integer, of
course; for instance, if you are dealing with amounts of money you may choose to
represent the sums by integers. But every time you add up a list of amounts
of money, you will have one index integer to iterate through the array and
another to count it. Programs don't spend their time doing calculations, but
moving data from one place to another. Something like 20% of all
mainframe cycles are used in sorting, for instance.

Even if an integer is a type field, typically that is used as an array
index. For instance if we have an enum {MR, MRS, MISS, MS, DR, REV, PROF,
LORD} we will probably have an array of strings we index into to help us
construct letters.
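A minimal sketch of the kind of lookup table described above. The string spellings and the TITLE_COUNT sentinel are assumptions added for illustration; the sentinel keeps the table size and the enum in step, and the range check guards the array index.

```c
#include <stddef.h>

/* Enum from the post, plus a hypothetical TITLE_COUNT sentinel. */
enum title { MR, MRS, MISS, MS, DR, REV, PROF, LORD, TITLE_COUNT };

/* One salutation string per enum value, indexed by the enum. */
static const char *const title_names[TITLE_COUNT] = {
    "Mr", "Mrs", "Miss", "Ms", "Dr", "Rev", "Prof", "Lord"
};

/* Returns the salutation for t, or NULL if t is out of range. */
const char *title_name(enum title t)
{
    if ((unsigned)t >= TITLE_COUNT)
        return NULL;
    return title_names[t];
}
```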

What I was wondering in my original post is, if an object *does*
represent a size/index, but said value is neither the result of
sizeof nor involves any of the standard library functions taking
or returning a size_t, how common is it to make it a size_t nonetheless?

That's the problem. Really virtually every integer in the program should be
a size_t, because they will almost all end up being used to derive index
calculations. But that is unlikely to be accepted, partly because of the
unsignedness and efficiency considerations, but mainly because typing
"size_t i;" is so counter-intuitive.
That's why I think size_t will ultimately have to go, and the
introduction of 64-bit types on the desktop will be the catalyst for this,
because it will no longer be true that int can index an arbitrary array.
 

Keith Thompson

Tubular Technician said:
Malcolm McLean wrote: [...]
The vast majority of the functions in the book work on arrays whose
dimensions are unknown at the time of writing.

Array sizes need to be known at compile time; furthermore according
to ISO/IEC 9899:1999 5.2.4.1 a single object is not guaranteed to be
able to be larger than 65535 bytes, which according to 5.2.4.2.1 also
happens to be the guaranteed range of an unsigned int (UINT_MAX).
So an index type of unsigned int ("natural" int size?) seems perfectly
fine to me.

Not all array sizes are known at compile time. C99 lets you declare
variable-size arrays (VLAs):

int n = some_func();
double vla[n];

And even in C90, arrays can be created by malloc():

int n = some_func();
double *ptr = malloc(n * sizeof *ptr);
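A sketch of the same allocation wrapped in a helper (the name alloc_doubles is hypothetical). Taking the count as size_t matches the type malloc() expects, and the check refuses requests whose byte count n * sizeof *ptr would wrap around in size_t.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates n doubles, or returns NULL if the byte count would not fit in
   size_t (or if malloc() itself fails). */
double *alloc_doubles(size_t n)
{
    double *ptr;

    /* sizeof *ptr does not evaluate ptr, so using it here is safe. */
    if (n > SIZE_MAX / sizeof *ptr)
        return NULL;                  /* n * sizeof *ptr would wrap around */
    ptr = malloc(n * sizeof *ptr);
    return ptr;
}
```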

Though an implementation isn't required to support objects bigger than
65535 bytes, it certainly can do so. On a modern 64-bit system,
UINT_MAX is typically 2**32-1, but objects bigger than 4 gigabytes
might be supported. You can't necessarily index such an array with
unsigned int; you can index it with size_t.

[...]
The only inconsistency I can see is if using the sizeof operator in
a translation unit that does not include any of the standard
headers providing the definition of size_t.
[why does the compiler know about sizeof but not about size_t? Or
rather, why is sizeof a keyword but size_t is not?]

sizeof is a built-in operator (it can't be defined as a function or
macro), so it pretty much has to be a keyword.

size_t isn't a distinct type, it's a typedef for some other type
(which one varies from implementation to implementation). A typedef
doesn't create a type, it merely creates an alias for an existing
type. So if, in a given implementation, size_t happens to be an alias
for unsigned long, the compiler merely has to have the sizeof operator
yield a value of type unsigned long. The ``typedef unsigned long
size_t;'' declaration merely documents this choice.
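One way to observe that sizeof yields size_t, and that size_t is just an alias, assuming a C11 compiler (the _Generic keyword was added in C11, so it was not available when this thread was written). A size_t association in _Generic matches whatever type the implementation's typedef names. The macro name IS_SIZE_T is hypothetical.

```c
#include <stddef.h>

/* C11: evaluates to 1 if expression e has type size_t (that is, whatever
   type the implementation's size_t typedef aliases), otherwise 0. */
#define IS_SIZE_T(e) _Generic((e), size_t: 1, default: 0)

/* sizeof yields a value of type size_t, so this returns 1. */
int sizeof_yields_size_t(void)
{
    return IS_SIZE_T(sizeof(double));
}
```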

[...]
How does that follow?


Never heard of that theory.

Malcolm is, as far as I know, the only person who holds that theory.

[...]
What I was wondering in my original post is, if an object *does*
represent a size/index, but said value is neither the result of
sizeof nor involves any of the standard library functions taking
or returning a size_t, how common is it to make it a size_t nonetheless?

It's fairly common, and IMHO quite reasonable. Since size_t can hold
the size in bytes of any object, it follows that it can hold the
length in elements of any array object (since each element is at least
one byte).

I believe Malcolm is mistaken on this point.
 

Flash Gordon

Malcolm McLean wrote, On 11/11/07 08:52:
Time of writing != compile time.

Nor is it requirements analysis time, of which more later.

Let's say we want to calculate a standard deviation. The prototype is

double stdev(double *x, N);

what type should N be? If you don't know how big the maximum array is
going to be, which you don't for this function - except that it fits in
memory - it must be a size_t.

However, you have either failed to do requirements analysis (in which
case your project deserves to fail) or you know or have decided on some
limits. If you really have no limits then it is impossible, because
computers are finite, and if you don't know the limits then you don't
know whether it is possible on a 32 bit computer or a 64 bit computer,
or whether you need a supercomputer.

Thus we must write

size_t i;

for(i=0;i<N;i++)
{
}

Of course that is misleading, because i is not in any shape or form a
size type.

If you find it misleading you are easily misled.

It is an index counter.

So? That means it is something that has to be able to span the size of
an object. You can always do
typedef index_t size_t;
if you want.
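Written out, such an alias looks like the following; in a typedef the existing type comes first and the new name last. The index_of_max() function is a hypothetical example of a routine using the alias.

```c
#include <stddef.h>

/* typedef <existing type> <new name>; */
typedef size_t index_t;

/* Hypothetical helper: position of the largest element of a[0..n-1]
   (the first such position on ties; 0 if n == 0). */
index_t index_of_max(const double *a, index_t n)
{
    index_t i, best = 0;

    for (i = 1; i < n; i++)
        if (a[i] > a[best])
            best = i;
    return best;
}
```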
The implications of introducing
size_t simply weren't thought through.

They were thought through a lot more carefully than you seem to have
thought through your ideas.

You find that functions like stdev() are by no means uncommon. Very
frequently you will not hard code the size of an array, until maybe in
the very top layer of code.

See previous comments about requirements analysis. Also see comments in
previous posts about the fact that using an int also imposes limits,
and about using a typedef if there might be reason to change the type.

The worse problem is that, frequently, you don't know the exact size of
an array but you know that it will be small.

See, you do know something about the size before you start writing code.
Now you just have to use that information when you design your program.

For instance the number of
children in a class. Should that be a size_t or not? If we sort the
class by grade, qsort() takes two size_ts. However people will naturally
jib at using a size_t when an int, realistically, is going to be enough.

So? If you choose to use int for efficiency then it will be converted to
size_t, and if your value is within range (which you have just said it
will be) the conversion will do what you expect.
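A sketch of the class-sorting case under these assumptions: the struct child record and the by_grade() comparison are hypothetical, and the class size is kept as an int and converted explicitly when it is passed to qsort().

```c
#include <stdlib.h>

/* Hypothetical record: one child and their grade. */
struct child {
    const char *name;
    int grade;
};

/* qsort() comparison: ascending by grade. The (a > b) - (a < b) form
   avoids the overflow that "a->grade - b->grade" could cause. */
static int by_grade(const void *pa, const void *pb)
{
    const struct child *a = pa, *b = pb;
    return (a->grade > b->grade) - (a->grade < b->grade);
}

/* The class size stays an int; any realistic (non-negative) class size
   converts to size_t unchanged. */
void sort_class(struct child *kids, int count)
{
    if (count > 0)
        qsort(kids, (size_t)count, sizeof kids[0], by_grade);
}
```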

So you get inconsistency.

If you use int because you know from your requirements analysis it is
sufficient and you want the efficiency, then you use it consistently
within your code and it gets converted when passed to a library function
that expects a size_t. If you use size_t consistently it is all the same
type. Either way there is no problem. Nor is there a problem if you use your
own typedef (and #defines) so that you can change the type easily if
there is a risk of needing to change it later. That is the wonderful
thing about most programming languages (including ones I don't like),
they provide mechanisms to solve problems if you bother to learn and
understand them.
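A minimal sketch of the typedef-plus-#define approach, using hypothetical names (pupil_count, PUPIL_COUNT_MAX, to_pupil_count). The conversion helper shows one way to bring a size_t returned by a library function back into the project's own type without silent truncation.

```c
#include <limits.h>
#include <stddef.h>

/* Project-specific count type (hypothetical). If requirements change,
   only these two lines need editing, e.g. to size_t and SIZE_MAX. */
typedef int pupil_count;
#define PUPIL_COUNT_MAX INT_MAX

/* Stores n in *out and returns 1 if it fits in pupil_count;
   returns 0 and leaves *out untouched if it does not. */
int to_pupil_count(size_t n, pupil_count *out)
{
    if (n > (size_t)PUPIL_COUNT_MAX)
        return 0;
    *out = (pupil_count)n;
    return 1;
}
```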

Most integers are ultimately used as index variables. Not every integer,
of course; for instance, if you are dealing with amounts of money you may
choose to represent the sums by integers. But every time you add up a
list of amounts of money, you will have one index integer to iterate
through the array and another to count it. Programs don't spend their
time doing calculations, but moving data from one place to another.

All unsupported rubbish which has been pointed out to you many times by
people who have actually spent a lot of time working in the SW industry.

Something like 20% of all mainframe cycles are used in sorting, for
instance.

Another statistic which I suspect you have pulled out of thin air.

Even if an integer is a type field, typically that is used as an array
index. For instance if we have an enum {MR, MRS, MISS, MS, DR, REV,
PROF, LORD} we will probably have an array of strings we index into to
help us construct letters.

I don't for most of my enums, and for the ones I do that is an
occasional use not the main use of the variables.

It varies. Some people will use size_t if it is a generic library to
ensure it can cope with any size of object, some will use int because
they are more concerned with efficiency (correctly or not) and some will
select a type without thinking it through. Just like any other design
decision, really.

That's the problem. Really virtually every integer in the program should
be a size_t, because they will almost all end up being used to derive
index calculations.

Malcolm is talking rubbish again.

But that is unlikely to be accepted, partly because
of the unsignedness

People have explained why this makes sense and is perfectly acceptable
to people other than you.

and efficiency considerations,

Ah, so a size_t large enough to access any byte of any object introduces
unacceptable efficiency problems but *your* desire for an equally large
int type does not?

but mainly because typing
"size_t i;" is so counter-intuitive.

Only to you. Most people seem not to have a problem with it.
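On the unsignedness point: the usual trap with an unsigned index is counting down, since i >= 0 is always true for a size_t. A minimal sketch of the common test-then-decrement idiom follows; last_index_of() is a hypothetical example function, and returning n for "not found" is an assumed convention.

```c
#include <stddef.h>

/* Returns the index of the last element of a[0..n-1] equal to v, or n if
   there is none. "i-- > 0" tests i before decrementing it, so the body
   sees n-1 down to 0 and the loop stops cleanly even when n == 0. */
size_t last_index_of(const int *a, size_t n, int v)
{
    size_t i;

    for (i = n; i-- > 0; )
        if (a[i] == v)
            return i;
    return n;
}
```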

That's why I think size_t will ultimately have to go, and the
introduction of 64-bit types on the desktop will be the catalyst for
this, because it will no longer be true that int can index an arbitrary
array.

You don't see the contradiction of claiming that a large enough int is
not a problem but an equal size size_t introduces efficiency problems?

To the OP, Malcolm has been spouting this rubbish for a while now, but
in all the years since the ANSI standard was introduced in 1989, despite
Malcolm's dire predictions of what size_t will do to the language, it
has not killed C, including all the years with a 16 bit int type and
objects too large to be addressed by it.
 

Ben Bacarisse

Flash Gordon said:
typedef index_t size_t;

<nit>typedef size_t index_t;</nit>

... and you saved me from having to reply to Malcolm. The anti-size_t
FUD needs to be refuted.
 

Flash Gordon

Ben Bacarisse wrote, On 11/11/07 12:35:
<nit>typedef size_t index_t;</nit>

I'm doing well at that at the moment. I thought it the correct way
around, but typed it backwards.

... and you saved me from having to reply to Malcolm. The anti-size_t
FUD needs to be refuted.

Indeed. At least, for those without the experience and knowledge to spot
it for the stupidity it is or who might assume that there is actually
some basis to his claims beyond his own bias.
 

Malcolm McLean

Ben Bacarisse said:
<nit>typedef size_t index_t;</nit>

... and you saved me from having to reply to Malcolm. The anti-size_t
FUD needs to be refuted.

No good. Why not.

stringsup.h

typedef size_t index_t;

index_t charat(char *str, char ch);


The poor user

#include "stringsup.h" /* I need charat, right? */


void complex_parser(char *stuff)
{
index_t quote;

quote = charat(stuff, '\"');
/* wait a minute, index_t is now running through all my code
just so that I can use Malcolm's charat() function. */
}


You do not define bool, string, or any other fundamental type in user code.
Because you then force everyone to either adopt your convention, or to
create a kludgy mess. You also do not define your own alias for size_t.
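For comparison, a sketch of a charat() that simply returns size_t, so a caller needs only a standard header that defines size_t rather than a library-specific alias. The thread does not say what charat() returns when the character is absent; returning strlen(str), as strcspn() does, is an assumption here.

```c
#include <stddef.h>
#include <string.h>

/* Index of the first ch in str, or strlen(str) if ch does not occur
   (the same convention strcspn() uses). */
size_t charat(const char *str, char ch)
{
    const char *p = strchr(str, ch);
    return p ? (size_t)(p - str) : strlen(str);
}
```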
 

Malcolm McLean

Flash Gordon said:
Indeed. At least, for those without the experience and knowledge to spot
it for the stupidity it is or who might assume that there is actually some
basis to his claims beyond his own bias.

The claims have been justified time after time on this ng. Obviously not at
sufficient length to persuade those who are rather slow.

For instance a statistical analysis of some Java byte code was posted a
while back to convince you that most integers are used for index variables.
Remember that? Actually it is something that any programmer with any
sensitivity knows instinctively. However you didn't have the insight to
appreciate the relevance. Now you've decided that, because the great Flash
has managed to venture some criticism, however inane, there is "no basis" to
my ideas.

What arrogance.
 

Ben Bacarisse

Malcolm McLean said:
No good. Why not.

If people don't learn how and when to use size_t then a real problem
does emerge.

stringsup.h

typedef size_t index_t;

index_t charat(char *str, char ch);


The poor user

#include "stringsup.h" /* I need charat, right? */


void complex_parser(char *stuff)
{
index_t quote;

quote = charat(stuff, '\"');
/* wait a minute, index_t is now running through all my code
just so that I can use Malcolm's charat() function. */
}

No. It is almost as if we are talking about a different language.
You have not persuaded me in the past that the "poisoning" you think
happens is inevitable and I have failed to persuade you that it is not
a problem. We have both stated our cases (in other threads) and the
best thing is simply for others to decide for themselves. I don't see
any point in going over it again.

However, someone needs to cry "foul" (or at least "not agreed") if you
keep telling beginners that size_t will mess up their programs.
 

Ben Bacarisse

Malcolm McLean said:
The claims have been justified time after time on this ng. Obviously
not at sufficient length to persuade those who are rather slow.

I'd prefer to keep the debate as polite as possible. Both sides of
this debate (and as far as I can see you are alone on your side[1])
are having trouble getting through but calling me "slow" won't help.

[1] I don't mean people who find the underscore infelicitous. I mean
people who agree that it needs to be removed from the language to
ensure the survival of C.
 

Malcolm McLean

Ben Bacarisse said:
However, someone needs to cry "foul" (or at least "not agreed") if you
keep telling beginners that size_t will mess up their programs.

I don't jump on every use of size_t in newbie code and say "aha, that will
be deprecated within ten years. Better take it out now".

However if someone starts a thread "what's the deal with size_t" naturally
I'll give my opinion, without claiming that it is the only one that can be
held.
 

Richard

Ben Bacarisse said:
Malcolm McLean said:
The claims have been justified time after time on this ng. Obviously
not at sufficient length to persuade those who are rather slow.

I'd prefer to keep the debate as polite as possible. Both sides of
this debate (and as far as I can see you are alone on your side[1])
are having trouble getting through but calling me "slow" won't help.

He didn't. He was referring to the more aloof arguing tactics of Flash
Gordon.

[1] I don't mean people who find the underscore infelicitous. I mean
people who agree that it needs to be removed from the language to
ensure the survival of C.

I certainly seem to have lost track of this ongoing saga. I thought most
of it was about people being anal in insisting size_t be used as an
index in a controlled program where the programmer knows full well that
a typical int/unsigned int is sufficient[1].

[1] And if he doesn't he should question whether he should be a
programmer of anything likely to perform.
 

Flash Gordon

Malcolm McLean wrote, On 11/11/07 18:10:
I don't jump on every use of size_t in newbie code and say "aha, that
will be deprecated within ten years. Better take it out now".

However if someone starts a thread "what's the deal with size_t"
naturally I'll give my opinion, without claiming that it is the only one
that can be held.

No, you claim it as absolute fact and claim to have proved it, which you
have not.
 

Flash Gordon

Malcolm McLean wrote, On 11/11/07 17:19:
The claims have been justified time after time on this ng. Obviously not
at sufficient length to persuade those who are rather slow.

For instance a statistical analysis of some Java byte code was posted a
while back to convince you that most integers are used for index
variables. Remember that?

Yes, and you should remember the number of people who came up with good
reasons why that did not prove your point.

Actually it is something that any programmer with
any sensitivity knows instinctively.

Ah, so you are claiming that everyone else on this group who has
expressed an opinion has no sensitivity. Interesting and almost
certainly wrong.

However you didn't have the insight
to appreciate the relevance.

You seem not to have the ability to see why it did not prove your point
despite the reasons being pointed out.

Now you've decided that, because the great
Flash has managed to venture some criticism, however inane, there is "no
basis" to my ideas.

Since you came up with that study after forming your opinion you formed
your opinion based on your own bias. Since many very good reasons why
that study did not prove your point were pointed out to you it is still
true that you have no basis other than your own bias for these claims.

What arrogance.

Ah, so someone who disagrees with you, for reasons that have been stated,
is arrogant, whereas you, who disagree with the majority (including such
luminaries as Intel, the C standards committee and those on this group who
know more than me about C) and with all the evidence presented to counter
your claims, are not arrogant.

By the way, people have presented evidence as to why your claims are
wrong and why your study did not prove your point. I am fairly sure that
most was presented as evidence rather than proof. However, you are of
the opinion that any evidence presented against you does not count,
whereas when you present something it is proof.
 

Flash Gordon

Richard wrote, On 11/11/07 18:11:
Ben Bacarisse said:
Malcolm McLean said:
The claims have been justified time after time on this ng. Obviously
not at sufficient length to persuade those who are rather slow.
I'd prefer to keep the debate as polite as possible. Both sides of
this debate (and as far as I can see you are alone on your side[1])
are having trouble getting through but calling me "slow" won't help.

He didn't. He was referring to the more aloof arguing tactics of Flash
Gordon.

I agree that he was probably referring more to me than Ben, although the
way it is phrased says that it applies to everyone who has read the
threads and still disagrees with Malcolm.

[1] I don't mean people who find the underscore infelicitous. I mean
people who agree that it needs to be removed from the language to
ensure the survival of C.

I certainly seem to have lost track of this ongoing saga. I thought most
of it was about people being anal in insisting size_t be used as an
index in a controlled program where the programmer knows full well that
a typical int/unsigned int is sufficient[1].

[1] And if he doesn't he should question whether he should be a
programmer of anything likely to perform.

Most of it from my side is against Malcolm's premiss that most integers
are used for indexing and his conclusion that size_t should be removed
from the language and int should become 64 bit on modern "64 bit"
processors. Oh, and against Malcolm's claims that any evidence presented
against his case does not count because it is only "a few little
exceptions" (or something; I can't remember his exact words), and against
such evidence simply being ignored.

I'm mainly of the opinion that people like Intel, the C standards
committee, the Posix standard committee et al know rather more than me
OR Malcolm on this subject and yet they all seem to have reached the
opposite conclusions to Malcolm (I almost said they disagreed with him,
but of course they have probably never heard of him).
 

Richard Heathfield

Malcolm McLean said:

The claims [about size_t] have been justified time after time on this ng.

I have yet to see a justification that has any merit.

Obviously not
at sufficient length to persuade those who are rather slow.

You don't need lengthy arguments, merely good ones. You haven't presented
any so far that I can recall.
 

Malcolm McLean

Richard Heathfield said:
Malcolm McLean said:

The claims [about size_t] have been justified time after time on this ng.

I have yet to see a justification that has any merit.

Obviously not
at sufficient length to persuade those who are rather slow.

You don't need lengthy arguments, merely good ones. You haven't
presented any so far that I can recall.

The claim has been made that there is "no basis" to my claims. Now I don't
think any fair person would say that I make pure assertions - that size_t is
bad because I say so, and for no other reason. But obviously there is a
demand for more material.

If you disagree and think that size_t ought to be retained that's quite a
different matter from saying that "no justification has been advanced". It
is also quite different from saying that "no justification with any merit
for abolishing size_t has been advanced".

To write

size_t i;

for(i=0;i<N;i++)
{
ptr++;
}

is misleading, because it implies that i is a "size type" when it is nothing
of the sort. It is an index.

Don't you think that is a justification that has some merit? Or maybe you
are making the stronger claim that, if you personally disagree with a
proposal, all arguments for it are not just weaker than the arguments
against (which you must believe), but have "no merit". I suspect this is the
case. "Has no merit" is Heathfieldese for "I disagree".
 

Keith Thompson

Malcolm McLean said:
I don't jump on every use of size_t in newbie code and say "aha, that
will be deprecated within ten years. Better take it out now".

However if someone starts a thread "what's the deal with size_t"
naturally I'll give my opinion, without claiming that it is the only
one that can be held.

The difference between claiming that your opinion is the only one that
can be held and claiming that those who disagree are "rather slow",
lacking any sensitivity or insight, and arrogant is a rather subtle
one, don't you think?

Please take just a moment and consider the possibility that you might
be wrong.
 

Keith Thompson

Malcolm McLean said:
The claim has been made that there is "no basis" to my claims. Now I don't
think any fair person would say that I make pure assertions - that size_t is
bad because I say so, and for no other reason. But obviously there is a
demand for more material.

I think fair people *are* saying that you make pure assertions, or
nearly so.

If you disagree and think that size_t ought to be retained that's quite a
different matter from saying that "no justification has been advanced". It
is also quite different from saying that "no justification with any merit
for abolishing size_t has been advanced".

Yes, it's quite different.

To write

size_t i;

for(i=0;i<N;i++)
{
ptr++;
}

is misleading, because it implies that i is a "size type" when it is nothing
of the sort. It is an index.

Don't you think that is a justification that has some merit?

No.

Or maybe you
are making the stronger claim that, if you personally disagree with a
proposal, all arguments for it are not just weaker than the arguments
against (which you must believe), but have "no merit". I suspect this is the
case. "Has no merit" is Heathfieldese for "I disagree".


Richard Heathfield (who generally writes in English, not
Heathfieldese) is entirely capable of expressing disagreement without
saying that any arguments on the other side have no merit. I don't
presume to speak for him, but I believe he reserves claims that an
argument has no merit for cases where he believes that an argument has
no merit.
 

Malcolm McLean

Keith Thompson said:
The difference between claiming that your opinion is the only one that
can be held and claiming that those who disagree are "rather slow",
lacking any sensitivity or insight, and arrogant is a rather subtle
one, don't you think?

Please take just a moment and consider the possibility that you might
be wrong.

The "rather slow" ones think that no justification has been offered. As I
said, it is a demand for more material, which is rather irritating since I
consider plenty to have been provided.

Those who lack "sensitivity and insight" are those who, firstly, don't simply
accept that most integers are ultimately used for memory index operations,
or in intermediate calculations to derive such indices, and, secondly, don't
see the force of a statistical proof when it is provided. This is not
slowness, but it is something that someone with no real feel for computer
programming would think.

The "arrogance" refers to a philosophical fallacy, which I will call the
"debunking fallacy". This states that if some objection can be advanced to a
piece of evidence, it disappears and may no longer be used.
 

Richard

Keith Thompson said:
Malcolm McLean said:
The claim has been made that there is "no basis" to my claims. Now I don't
think any fair person would say that I make pure assertions - that size_t is
bad because I say so, and for no other reason. But obviously there is a
demand for more material.

I think fair people *are* saying that you make pure assertions, or
nearly so.

If you disagree and think that size_t ought to be retained that's quite a
different matter from saying that "no justification has been advanced". It
is also quite different from saying that "no justification with any merit
for abolishing size_t has been advanced".

Yes, it's quite different.

To write

size_t i;

for(i=0;i<N;i++)
{
ptr++;
}

is misleading, because it implies that i is a "size type" when it is nothing
of the sort. It is an index.

Don't you think that is a justification that has some merit?


No.


I do. A size means something totally different to a number of elements.
number of elements * size of elements = total size.

This is fairly basic nomenclature and difficult to disagree with.
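The relationship above can be checked directly with sizeof. The ARRAY_COUNT macro and the readings array are hypothetical names for illustration; note that the macro only works on a true array, not on a pointer parameter.

```c
#include <stddef.h>

/* Number of elements in a true array (gives the wrong answer for a pointer). */
#define ARRAY_COUNT(a) (sizeof (a) / sizeof (a)[0])

static double readings[12];

/* Element count, as a size_t. */
size_t readings_count(void)
{
    return ARRAY_COUNT(readings);
}

/* number of elements * size of each element == total size in bytes */
size_t readings_bytes(void)
{
    return ARRAY_COUNT(readings) * sizeof readings[0];
}
```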

Richard Heathfield (who generally writes in English, not
Heathfieldese) is entirely capable of expressing disagreement without

Richard Heathfield writes in flowery prose that, from what I can gather,
sometimes appears to be designed to confuse non-native speakers.

saying that any arguments on the other side have no merit. I don't
presume to speak for him, but I believe he reserves claims that an
argument has no merit for cases where he believes that an argument has
no merit.

Isn't that like saying "when he thinks he's right, he thinks he's right"?
Or is my parser now broken?
 
