What's the deal with size_t?

  • Thread starter Tubular Technician

Flash Gordon

Malcolm McLean wrote, On 12/11/07 21:57:
People can't count them right.

That comment and the rest of your post completely ignore the points
that I made. Ignoring points does not invalidate them.

Why should we do your work for you? If you want to convince people, *you*
provide the evidence to convince them.

In the part of the post you snipped (without marking the cut, which is
potentially misleading) I provided a count from one file, and it suggests
that your assertion is wrong for the type of code I am currently most
involved in.

I scanned but did not count another. In the part you left in, I gave my
impression from a quick look at a second file, which also suggests you are wrong.
Maybe because the idea of "ultimately
used to derive indices" is a bit woolly.
If we say


for(i=start;i<=end;i++)
array[i] = 0;

start and end hold index values, though they are not actually used as
the indexing variable themselves. They are intermediates.
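For illustration, the loop above in complete form. The helper name clear_range, the double element type and the size_t bounds are assumptions for this sketch; start and end only bound the counter, while i is the actual subscript.

```c
#include <stddef.h>

/* Hypothetical helper: set array[start] through array[end] (inclusive) to 0.
   The caller must ensure that end is a valid index into array, so i++
   can never wrap around before the loop ends. */
void clear_range(double *array, size_t start, size_t end)
{
    size_t i;

    /* start and end are intermediates; only i is used as the subscript. */
    for (i = start; i <= end; i++)
        array[i] = 0.0;
}
```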


<snip>

I used the most liberal interpretation I could which would have included
start and end in your example above and much more. It still suggests you
are wrong.

Now, since RH's checks and mine both disagree with your assertion,
and, unlike the unrelated study you pointed to, both Richard and I used
real C code developed for real applications, will you accept the
possibility that you could be wrong? I'm not asking you to say you are
definitely wrong, just that you admit that it is not definite that you
are correct.
 

Ben Bacarisse

Chris Torek said:
So you have an implementation in which:

int i;
...
i = INT_MAX - 4;
...
i += 32; /* result should be INT_MAX + 31, ie, "overflow" */

causes a runtime trap? Those are, alas, all too rare. Can you
name your implementation?

I discovered only recently that gcc can make signed integer overflow
trap (on some platforms at least) with the -ftrapv flag. That can be
handy to have as an assurance.
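-ftrapv is specific to gcc (and compatible compilers). A portable alternative is to test before adding, so that the overflowing i += 32 in Chris's example is never evaluated. The helper name add_would_overflow is hypothetical; this is a sketch, not a library facility.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if a + b would overflow int, else 0.
   The test itself never performs an overflowing operation:
   INT_MAX - b is safe for b > 0, and INT_MIN - b is safe for b < 0. */
int add_would_overflow(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return 1;   /* sum would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return 1;   /* sum would fall below INT_MIN */
    return 0;
}
```

Usage: `if (!add_would_overflow(i, 32)) i += 32; else /* handle the error */;`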
 

Tubular Technician

Malcolm said:
Time of writing != compile time.

Let's say we want to calculate a standard deviation. The prototype is

double stdev(double *x, N);

Technically, x is a pointer, not an array. Three likely possibilities...

[1.]
double foo, bar[] = { /* ... */ };
foo = stdev(bar, sizeof bar / sizeof bar[0]);

Here, `N' is the result of sizeof, a size_t.

[2a.]
#define NUMBARS 100
double foo, bar[NUMBARS];
foo = stdev(bar, NUMBARS);

`N' is of type int, implicitly converted to size_t.

[2b.]
#define NUMBARS 100
double foo, bar[NUMBARS];
foo = stdev(bar, sizeof bar / sizeof bar[0]);

`N' is the result of sizeof, a size_t.

[3.]
#define NUMBARS 100
double foo, *bar;
bar = malloc(NUMBARS * sizeof *bar);
foo = stdev(bar, NUMBARS);

`N' is closely related to the allocation size in the malloc() call,
which is a size_t.
What type should N be? If you don't know how big the maximum array is
going to be, which you don't for this function (except that it fits in
memory), it must be a size_t.

You make it sound like that's a bad thing. In the hypothetical (but
possible) case that CHAR_BIT is sufficiently large and
sizeof(double) == 1, yes, N can be as big as the largest allocatable
object, which is exactly the range of values a variable of type size_t
can hold.
Thus we must write

size_t i;

for(i=0;i<N;i++)
{
}

Of course that is misleading, because i is not in any shape or form a size
type. It is an index counter. The implications of introducing size_t
simply weren't thought through.

Think of it this way -- `i' is a variable that has to be able to
represent the same set of values as `N'; the name of its type is
the least important here.
You find that functions like stdev() are by no means uncommon. Very
frequently you will not hard code the size of an array, until maybe in the
very top layer of code.

If stdev() is designed so it can take the maximum possible number
of input values only restricted by continuous addressable memory,
what other type could N be? The calling code, on the other hand,
may pass a narrower type in the second argument which will be
implicitly converted to the wider type. Different layers of code
may hold the same value in different types.
The worse problem is that, frequently, you don't know the exact size of an
array but you know that it will be small. For instance the number of
children in a class. Should that be a size_t or not? If we sort the class
by grade, qsort() takes two size_ts. However people will naturally jib at
using a size_t when an int, realistically, is going to be enough. So you
get inconsistency.

Same thing here. qsort() is a general purpose function that is able
to deal with a wider range of inputs than this example uses. Since
neither of the two size_t arguments can be negative, I'd suggest
unsigned int, but since both types (assuming non-negative values)
convert losslessly to a size_t, what is the exact problem? Different
layers of code can hold the same value in different types.
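To make that concrete, here is a sketch in which the class size is held in an unsigned int and passed to qsort(), whose count parameter is size_t. The struct pupil type, its fields and sort_class are hypothetical names for illustration.

```c
#include <stdlib.h>

/* Hypothetical record for one child in a class. */
struct pupil {
    const char *name;
    int grade;
};

/* qsort() comparison function: ascending by grade, without subtraction
   so it cannot overflow. */
static int by_grade(const void *a, const void *b)
{
    const struct pupil *pa = a, *pb = b;
    return (pa->grade > pb->grade) - (pa->grade < pb->grade);
}

/* The count is an unsigned int; it is converted to size_t when passed to
   qsort(). A class size always fits, since SIZE_MAX is at least 65535. */
void sort_class(struct pupil *class_list, unsigned int count)
{
    qsort(class_list, count, sizeof *class_list, by_grade);
}
```

The narrower type lives only in the calling layer; qsort() itself never sees anything but a size_t.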
Most integers are ultimately used as index variables.

Maybe, maybe not, depends. Does not really matter, either; narrower
integers can be index variables, too, as long as the expected set
of values fits into the smaller type.
[some compilers/assemblers on some CPUs will even produce shorter
opcodes for small index types and/or small constant indices]
Not every integer,
of course, for instance if you are dealing with amounts of money you may
choose to represent the sums by integers.

Presumably this refers to the data type used for array elements and
their sum; unrelated.
But every time you add up a list
of amounts of money, you will have one index integer to iterate through
the array and another to count it.

While both, index and count, should have the same (preferably unsigned)
type, they don't have to be size_t.
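A minimal sketch of that point, assuming amounts are whole cents stored in long and that the list is short enough for an unsigned int count; the function name total_cents is hypothetical.

```c
/* Hypothetical helper: sum n amounts of money held as whole cents.
   The index i and the count n share one unsigned type, which need not
   be size_t. Overflow of the running total is not checked here. */
long total_cents(const long *amounts, unsigned int n)
{
    long total = 0;
    unsigned int i;

    for (i = 0; i < n; i++)
        total += amounts[i];
    return total;
}
```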
Programs don't spend their time doing
calculations, but on moving data from one place to another. Something like
20% of all mainframe cycles are used in sorting, for instance.

I'll pretend I have no opinion.
Even if an integer is a type field, typically that is used as an array
index.

Maybe, maybe not, depends, does not really matter, ...
For instance if we have an enum {MR, MRS, MISS, MS, DR, REV, PROF,
LORD} we will probably have an array of strings we index into to help us
construct letters.

...the number of enumeration constants in said enum cannot portably
be larger than 1023; assuming it starts at 0 and has no gaps in it,
a variable of that type can indeed be used for such purposes.
Where does size_t come into play here?
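Malcolm's enum used as an array index might look like this. The enum tag, the strings and the trailing TITLE_COUNT enumerator are assumptions made for the sketch. The subscript has the enum type, not size_t; the range check relies on the enumerators starting at 0 with no gaps.

```c
#include <stddef.h>

/* Enumerators start at 0 with no gaps; TITLE_COUNT is a hypothetical
   trailing enumerator equal to the number of titles. */
enum title { MR, MRS, MISS, MS, DR, REV, PROF, LORD, TITLE_COUNT };

static const char *const title_names[TITLE_COUNT] = {
    "Mr", "Mrs", "Miss", "Ms", "Dr", "Rev", "Prof", "Lord"
};

/* Returns the salutation for t, or NULL if t is out of range. */
const char *title_name(enum title t)
{
    /* Converting to unsigned lets one comparison also reject
       negative values. */
    if ((unsigned)t >= TITLE_COUNT)
        return NULL;
    return title_names[t];
}
```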
That's the problem. Really virtually every integer in the program should
be a size_t, because they will almost all end up being used to derive
index calculations.

Whether that's true or not (I have no opinion)... why do they have
to be size_t? As far as I can tell, in

foo_expression '[' bar_expression ']'

"One of the expressions shall have type ``pointer to object type'',
the other expression shall have integer type, and the result has
type ``[object] type''." [6.5.2.1p1]

Nowhere can I find it stated that an index must be a size_t,
that an index of a different type is converted to size_t, or that
such an index incurs any other computational overhead. In fact, as
already mentioned above, a smaller index type may produce *shorter*
code on some architectures.
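A small sketch of that reading of 6.5.2.1: each subscript below has a different integer type, and all of them are valid. The table and function names are illustrative only.

```c
#include <stddef.h>

static const int table[4] = { 10, 20, 30, 40 };

/* Each subscript has a different integer type. 6.5.2.1 only requires
   that the index have integer type; none of them has to be a size_t. */
int sum_with_mixed_index_types(void)
{
    unsigned char uc = 0;
    short s = 1;
    int n = 2;
    size_t z = 3;

    return table[uc] + table[s] + table[n] + table[z];
}
```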
But that is unlikely to be accepted, partly because of
the unsignedness and efficiency considerations, but mainly because to type
"size_t i;" is so counter-intuitive.

Personally, I think the set of storable values is more important
than the name of a type; *iff* the full range of values a size_t can
hold is even needed in any particular place.
That's why I think size_t will ultimately have to go, and the
introduction of 64-bit types on the desktop

"Desktops" aren't everything.
will be the catalyst for this,
because it will no longer be true that int can index an arbitrary array.

Just curious, did anyone suggest this change to the standards body?
What did they respond?
 

Stephen Sprunk

Malcolm McLean said:
Only if you count from zero, which computers do but mathematicians
don't.

Ah, but computers do all kinds of funny things when it comes to math, like
having both positive and negative zero on some systems.

For that matter, are negative indices necessarily incorrect?

char a[2], *b, c;
b = &a[1];
c = b[-1];

ISTM that is defined according to C's rules, since the last line is
equivalent to "c = *(b + -1);", which is of course defined.
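Extending Stephen's example slightly: a pointer into the middle of an array can be subscripted with negative values as long as every element accessed stays inside the array. One option for such signed indices is ptrdiff_t from <stddef.h>; the function name window_sum is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: sum mid[-k] through mid[k].
   Valid only if all of those elements lie within one array. */
int window_sum(const int *mid, ptrdiff_t k)
{
    int total = 0;
    ptrdiff_t j;

    for (j = -k; j <= k; j++)
        total += mid[j];   /* same as *(mid + j); j may be negative */
    return total;
}
```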

S
 

pete

Tubular Technician wrote:
Sooo... what's the real deal with size_t? Where should it be
used/avoided (examples?)

size_t is an implementation-defined unsigned integer type.

The biggest problem with size_t
is that it may or may not be lower ranking than int,
and so it might or might not
promote from an unsigned type to a signed type (int),
which matters very much in comparisons.

And so, expressions that have operands of type size_t
may require some casts for portability reasons.

Consider whether the casts are necessary in the case
where each of these expressions evaluates to 1:
(UINT_MAX > (size_t)-1)
(UINT_MAX > INT_MAX)
( INT_MAX > (size_t)-1)
( count == (size_t)-2)
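To see why such casts matter, here is a sketch in which unsigned short stands in for a size_t whose rank is lower than int (an assumption for illustration; on common implementations size_t is at least as wide as unsigned int). The comparison mirrors ( count == (size_t)-2) but uses -1 for simplicity; narrow_size and both function names are hypothetical.

```c
#include <limits.h>

/* unsigned short plays the role of a size_t ranked below int;
   narrow_size is a hypothetical name for this illustration. */
typedef unsigned short narrow_size;

/* With the cast, -1 is converted to n's own type first, so this
   matches the all-ones value on every implementation. */
int is_all_ones(narrow_size n)
{
    return n == (narrow_size)-1;
}

/* Without the cast: when USHRT_MAX <= INT_MAX, n is promoted to int and
   compared with the int value -1, so this returns 0 even for USHRT_MAX. */
int is_all_ones_uncast(narrow_size n)
{
    return n == -1;
}
```

On a typical implementation with 16-bit short and 32-bit int, is_all_ones(USHRT_MAX) returns 1 while is_all_ones_uncast(USHRT_MAX) returns 0; where unsigned short is as wide as unsigned int, both return 1. That implementation dependence is what the casts remove.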

For this code:
int get_line(char **lineptr, size_t *n, FILE *stream)
{
int rc;
void *p;
size_t count;

count = 0;
while ((rc = getc(stream)) != EOF) {
if (count != (size_t)-2) {
++count;
}
if ((size_t)(count + 2u) > *n) {


http://www.mindspring.com/~pfilandr/C/get_line/get_line.c
 
