What's the deal with size_t?

  • Thread starter Tubular Technician

Philip Potter

Flash said:
Malcolm McLean wrote, On 07/11/07 22:10:

However, if in your scenario the programmer of the search function uses
the type int you can have an infinite game? Don't be an idiot. Of
course, if they have decided to leave the decision until later or think
there may be reason to change there is another useful language feature
called typedef.

typedef room_size_t whatever;

ITYM
typedef whatever room_size_t;
 

Richard Bos

Jeffrey Stedfast said:
sorry, I meant to say "at least", as in, "Max size_t is at least 4GB on
32bit" because it has to be able to at least hold values that large.

Not even that. It is quite possible for a system (probably embedded) to
have 32-bit or even 64-bit ints, but very few of them. If your
implementation targets a chip which has only 256 4-octet words, it is
possible to limit size_t to one octet.
Two things need to be noted here, though. One: whether the above
approach makes practical sense is quite another question; all I'm saying
here is that it's allowed. And two: this is for C89; C99 has SIZE_MAX,
which C89 didn't have, and it demands that SIZE_MAX be at least 65535.
Still not 4 GB, though.

Richard
 

Eric Sosman

Charlton Wilbur wrote On 11/07/07 16:25,:
[...]
Yes, and the dialect of C I use most often will implicitly cast an int
to a size_t as necessary. Does your compiler not support that feature?

How can a cast be "implicit?"

In other words, what are you talking about? Can
you give an example?
 

Charlton Wilbur

ES> Charlton Wilbur wrote On 11/07/07 16:25,:
>> [...] Yes, and the dialect of C I use most often will
>> implicitly cast an int to a size_t as necessary. Does your
>> compiler not support that feature?

ES> How can a cast be "implicit?"

Sorry, implicit type conversion, not implicit cast.

ES> In other words, what are you talking about? Can you give
ES> an example?

Given the following declarations:

struct foo **foolist;
int i;
int num_foo;
int compare_foo (void *foo_one, void *foo_two);

and a prototype for qsort in scope, the following code:

qsort (foolist, last_foo, sizeof struct foo *, compare_foo);

may cause a warning at compile time because num_foo (an int) is used
when the prototype of qsort calls for a size_t, but the type
conversion from int to size_t is nonetheless performed.

Charlton
 

Eric Sosman

Charlton Wilbur wrote On 11/08/07 14:48,:
ES> Charlton Wilbur wrote On 11/07/07 16:25,:
[...] Yes, and the dialect of C I use most often will
implicitly cast an int to a size_t as necessary. Does your
compiler not support that feature?
[...]
ES> In other words, what are you talking about? Can you give
ES> an example?

Given the following declarations:

struct foo **foolist;
int i;
int num_foo;
int compare_foo (void *foo_one, void *foo_two);

and a prototype for qsort in scope, the following code:

qsort (foolist, last_foo, sizeof struct foo *, compare_foo);

may cause a warning at compile time because num_foo (an int) is used
when the prototype of qsort calls for a size_t, but the type
conversion from int to size_t is nonetheless performed.

Thanks; that makes sense (assuming last_foo is really
num_foo). The conversion is performed because that's what
C requires. C does not require the diagnostic, but the
compiler is always allowed to emit diagnostics that C
doesn't require.
 

Charlton Wilbur

ES> Charlton Wilbur wrote On 11/08/07 14:48,:
>>>>>>> "ES" == Eric Sosman <[email protected]> writes:
>> ES> Charlton Wilbur wrote On 11/07/07 16:25,:
>> >> [...] Yes, and the dialect of C I use most often will >>
>> implicitly cast an int to a size_t as necessary. Does your >>
>> compiler not support that feature? [...]
ES> In other words, what are you talking about? Can you give an
ES> example?
ES> Thanks; that makes sense (assuming last_foo is really
ES> num_foo).

Yes; though I write code and posts in the same editor, the compiler
only checks my code.

ES> The conversion is performed because that's what C requires. C
ES> does not require the diagnostic, but the compiler is always
ES> allowed to emit diagnostics that C doesn't require.

Right. I was answering one of Malcolm's incorrect objections to
size_t, even though by now I should know better.

Charlton
 

Charlie Gordon

Charlton Wilbur said:
ES> Charlton Wilbur wrote On 11/07/07 16:25,:
[...] Yes, and the dialect of C I use most often will
implicitly cast an int to a size_t as necessary. Does your
compiler not support that feature?

ES> How can a cast be "implicit?"

Sorry, implicit type conversion, not implicit cast.

ES> In other words, what are you talking about? Can you give
ES> an example?

Given the following declarations:

struct foo **foolist;
int i;
int num_foo;
int compare_foo (void *foo_one, void *foo_two);

and a prototype for qsort in scope, the following code:

qsort (foolist, last_foo, sizeof struct foo *, compare_foo);

may cause a warning at compile time because num_foo (an int) is used
when the prototype of qsort calls for a size_t, but the type
conversion from int to size_t is nonetheless performed.

Actually, the code above fails to compile because sizeof requires
parentheses for a type argument. You probably meant to write sizeof
*foolist. Beware also that the comparison function compare_foo is a little
tricky: its arguments will need to be converted to struct foo **, not struct
foo *.
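(Putting those corrections together, a version that should compile cleanly might look like the sketch below. The struct contents and function names are invented for illustration; the point is the comparator receiving pointers *to the elements*, i.e. struct foo **, and the int count being implicitly converted to size_t at the call.)

```c
#include <stdlib.h>

struct foo { int key; };    /* hypothetical payload */

/* qsort comparator: the arguments point at the array elements,
 * which here are struct foo *, so we dereference one level to get
 * at the struct foo objects themselves. */
static int compare_foo(const void *a, const void *b)
{
    const struct foo *fa = *(struct foo * const *)a;
    const struct foo *fb = *(struct foo * const *)b;
    return (fa->key > fb->key) - (fa->key < fb->key);
}

void sort_foolist(struct foo **foolist, int num_foo)
{
    /* num_foo (an int) is implicitly converted to size_t here,
     * which is the conversion under discussion. */
    qsort(foolist, num_foo, sizeof *foolist, compare_foo);
}
```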
 

Malcolm McLean

Tubular Technician said:
IIRC, this claim was made in the big book discussion thread some time
ago and was given as a reason why the example code in said book does
not use size_t.
The vast majority of the functions in the book work on arrays whose
dimensions are unknown at the time of writing. However most will be small -
an image could be of any dimensions, for instance, but in practice it is
most unlikely that any dimension would exceed a few thousand pixels.

The question is whether to go the size_t route. Which means that almost
every integer becomes a size_t. Most integers are ultimately used to index
arrays. Even a type field is often used as an index. There are exceptions,
for instance amounts of money, but not many.

My decision was no. I don't see a language that insists on

size_t N;
size_t i;

for(i=0;i<N;i++)
/* typical loop code */

catching on. People are going to want a more meaningful type name for a
standard, default index variable, which doesn't hold a size. Either the ANSI
committee will take down C, or the rule will be changed.
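(One concrete reason people resist unsigned index variables is the classic reverse-loop trap: with size_t, `i >= 0` is always true, so a naive backwards loop never terminates. A minimal sketch of the safe idiom -- the function and its contents are invented for illustration:)

```c
#include <stddef.h>

/* Sum an array backwards without underflowing an unsigned index:
 * test i before decrementing, because a size_t can never go below
 * zero, so "for (i = n - 1; i >= 0; i--)" would loop forever. */
long sum_backwards(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = n; i-- > 0; )   /* i takes n-1 .. 0, then stops */
        total += a[i];
    return total;
}
```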
 

Malcolm McLean

Charlton Wilbur said:
ES> The conversion is performed because that's what C requires. C
ES> does not require the diagnostic, but the compiler is always
ES> allowed to emit diagnostics that C doesn't require.

Right. I was answering one of Malcolm's incorrect objections to
size_t, even though by now I should know better.
If you convert from an int to a size_t the compiler is quite justified in
warning about a signed to unsigned conversion. If you convert from a size_t
to an int it is quite justified in warning about the reverse, or maybe a
size truncation.

You can get round this problem by littering your code with casts. The
alternative of messing with warning levels isn't a route you can go down to
solve this problem, because you might need a higher warning level for other
purposes.

However it is tolerable.

The real fun starts when you start passing about integers by indirection, or
printing them out, or saving them to files. You can of course make code
work, though it is very easy to write something that will in fact break if a
size is greater than the range of an int, or sizeof(size_t) doesn't equal
sizeof(int). Array operations are so common that by introducing size_t you
have made a fundamental change to the language.

The real answer is to deprecate size_t, make int big enough to address any
array except huge char arrays (we can live with this little inconsistency),
and then introduce smaller types on 64 bit machines to aid the
micro-optimiser, who might want a fast but small integer. 64 bit int will be
reasonably fast on any practical 64 bit architecture; we are talking about
shaving off cache usage and cycles to squeeze out the last drop of
efficiency.
 

Malcolm McLean

Flash Gordon said:
Philip Potter wrote, On 08/11/07 09:58:

I do indeed. Thanks.
You want to declare a special symbol for the type that holds the number of
rooms in an adventure game?
 

Ben Bacarisse

And so should I, but here goes...
If you convert from an int to a size_t the compiler is quite justified
in warning about a signed to unsigned conversion. If you convert from
a size_t to an int it is quite justified in warning about the reverse,
or maybe a size truncation.

The compiler is also quite justified in warning if you use a signed
type to index an array.[1] By convention, they don't, but if they did,
would you be advocating a change to the language to solve the
"problem"?
You can get round this problem by littering your code with casts. The
alternative of messing with warning levels isn't a route you can go
down to solve this problem, because you might need a higher warning
level for other purposes.

Warnings should not be controlled by "level" -- there is no
reasonable total ordering in the severity of warnings. I think you
have had bad luck with your tools.
However it is tolerable.

The real fun starts when you start passing about integers by
indirection, or printing them out, or saving them to files. You can of
course make code work, though it is very easy to write something that
will in fact break if a size is greater than the range of an int, or
sizeof(size_t) doesn't equal sizeof(int). Array operations are so
common that by introducing size_t you have made a fundamental change
to the language.

The real answer is to deprecate size_t, make int big enough to address
any array except huge char arrays (we can live with this little
inconsistency), and then introduce smaller types on 64 bit machines to
aid the micro-optimiser, who might want a fast but small integer. 64
bit int will be reasonably fast on any practical 64 bit architecture,
we are talking about shaving off cache usage and cycles to squeeze out
the last drop of efficiency.

At the time, size_t was a huge relief. Every project had decided how
to represent size_t things in its own way and this was very bad for
portability. Mandating int as the type for sizes would have meant
that millions of lines of code would have to be at least checked to
make sure that it would not break (either because of the performance
or because of pre-standards assumptions about the sizes of types).
size_t was good, and for those of us it helped, it has very few
negative connotations.

Now that we have it, we have to compare the costs and benefits of (a)
continuing to use it; (b) going the Malcolm McLean route. This is
where I get stuck on your argument. Used properly, size_t has almost
no costs. You can re-name it if you don't like the name and you can
reserve a few values for error returns if you like to play such tricks
(I raise these two because ugliness and negative error returns have
been cited as advantages of the MM way). What is the "size_t problem"
that your proposal tries to "solve"?

Sure, if you refuse to use it you get problems from some compilers
warning you and incompatibility with size_t pointers. What else could
you expect?

[1] Obviously this is true in the trivial sense that a compiler can
complain about anything it likes. What I mean is that after declaring
'int i, a[3];', an expression like 'a[i]' is certainly wrong if i is
negative but passing a negative i where a size_t is expected is only
probably wrong. I am not advocating this warning, just pointing out
that it is as justifiable as many others.
 

Malcolm McLean

Ben Bacarisse said:
At the time, size_t was a huge relief. Every project had decided how
to represent size_t things in its own way and this was very bad for
portability. Mandating int as the type for sizes would have meant
that millions of lines of code would have to be at least checked to
make sure that it would not break (either because of the performance
or because of pre-standards assumptions about the sizes of types).
size_t was good, and for those of us it helped, it has very few
negative connotations.

Now that we have it, we have to compare the costs and benefits of (a)
continuing to use it; (b) going the Malcolm McLean route. This is
where I get stuck on your argument. Used properly, size_t has almost
no costs. You can re-name it if you don't like the name and you can
reserve a few values for error returns if you like to play such tricks
(I raise these two because ugliness and negative error returns have
been cited as advantages of the MM way). What is the "size_t problem"
that your proposal tries to "solve"?
If everyone used size_t consistently there would be only two objections, the
ugliness of the name and the fact that it is unsigned. These are relatively
trivial. The fact is that they won't.
For instance one person complains with squeals of indignation when I suggest
making int 64 bit. That would slow down his code. Can you imagine that this
individual uses size_t consistently for an index variable? The real killer,
however, is the force of

size_t i;

when i is not a memory size, it is an index.

Renaming size_t yourself is a bad option. It's the bool problem writ large.
No one can call you unless they either adopt the same convention, or
decorate their functions with ugly hacks. A new fundamental type is a job
for a standards body.

The snag with inconsistent use of size_t is that you don't get the benefit
of it - the code will, in practice, break if arrays overflow the size of an
int. You get the drawbacks - there are plenty of ugly underscores. And you
get an even worse drawback, because you've got more and more types swilling
around. The number of interconversions is N^2; by adding just one type we
have doubled the number of potential problems. They really hit when you start
passing variables about by indirection. That's when you find yourself
calling malloc() and writing a little conversion function, just to get the
list of indices into size_t format from int. Worse, it's probably a no-op.
So someone somewhere will just cast to an int *.
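(The "little conversion function" described above might look like this sketch; the helper name is invented. The irony it illustrates is real: on many ABIs the copy changes nothing but the element width, yet it cannot be skipped whenever sizeof(size_t) != sizeof(int).)

```c
#include <stdlib.h>

/* Copy an array of int indices into a freshly allocated size_t
 * array so it can be passed to code expecting size_t *.
 * Caller frees the result. */
size_t *indices_to_size_t(const int *idx, size_t n)
{
    size_t *out = malloc(n * sizeof *out);
    if (out != NULL)
        for (size_t i = 0; i < n; i++)
            out[i] = (size_t)idx[i];
    return out;
}
```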
 

Richard Heathfield

Malcolm McLean said:
If everyone used size_t consistently there would be only two objections,
the ugliness of the name

No big deal. I never liked "int" either. I'd have preferred "integer". But
c'est la vie, n'est-ce-pas?
and the fact that it is unsigned.

No, that's an advantage.
These are relatively trivial. The fact is that they won't.
For instance one person complains with squeals of indignation when I
suggest making int 64 bit.

I have a better idea. Make int as big as you like, provided only that it's
*at least* 16 bits. Oh, but wait - that's what we have already.

If you want a 64-bit type, by all means make int 64 bits in *your* compiler
- but don't force that choice on the rest of us, please.

The snag with inconsistent use of size_t is that you don't get the
benefit of it

Fine - so use it consistently. End of problem.
 

Philip Potter

Malcolm said:
underscores. And you get an even worse drawback, because you've got more
and more types swilling around. The number of interconversions is N^2; by
adding just one type we have doubled the number of potential problems.

No, by adding one type we increase the number of possible type
conversions by 2N - one from each type to the new type, and one the
other way. If it doubled, the number would be 2^N, not N^2.

Philip
 

Philip Potter

Richard said:
No big deal. I never liked "int" either. I'd have preferred "integer". But
c'est la vie, n'est-ce-pas?

But using 'int' keeps the word 'integer' free to talk about integer
types in general. I think this is an advantage.
 

Malcolm McLean

Philip Potter said:
No, by adding one type we increase the number of possible type conversions
by 2N - one from each type to the new type, and one the other way. If it
doubled, the number would be 2^N, not N^2.
You are right of course.
 

Flash Gordon

Malcolm McLean wrote, On 09/11/07 10:08:
You want to declare a special symbol for the type that holds the number
of rooms in an adventure game?

If its type might change, yes, and you were the one who suggested its
type might change.

I note you failed to answer the other points I raised.
 

Flash Gordon

Malcolm McLean wrote, On 09/11/07 10:06:
If you convert from an int to a size_t the compiler is quite justified in
warning about a signed to unsigned conversion. If you convert from a
size_t to an int it is quite justified in warning about the reverse, or
maybe a size truncation.

It is allowed to, but then it can warn about anything it likes.
You can get round this problem by littering your code with casts. The
alternative of messing with warning levels isn't a route you can go down
to solve this problem, because you might need a higher warning level for
other purposes.

Just disable the one warning. Most compilers support this.
However it is tolerable.

The real fun starts when you start passing about integers by
indirection,

Not a problem if you are consistent.
or printing them out,

Not a problem in C99 since you need to know the type anyway.
or saving them to files.

Saving any integer type in its native binary format means the file won't
be portable anyway, that is nothing to do with size_t itself.
You can of
course make code work,

Yes, very easily.
though it is very easy to write something that
will in fact break if a size is greater than the range of an int, or
sizeof(size_t) doesn't equal sizeof(int).

Yes, and it is just as easy to write code that will break if int is not
32 bits.
Array operations are so common
that by introducing size_t you have made a fundamental change to the
language.

Well, it was introduced back in the late 80s and did not stop the
popularity of C. So now it is REMOVING it that would be a fundamental
change.
The real answer is to deprecate size_t, make int big enough to address
any array except huge char arrays (we can live with this little
inconsistency),

What gives you the right to speak for everybody?
and then introduce smaller types on 64 bit machines to
aid the micro-optimiser, who might want a fast but small integer. 64 bit
int will be reasonably fast on any practical 64 bit architecture, we are
talking about shaving off cache usage and cycles to squeeze out the last
drop of efficiency.

Alternatively the minority of people who object to size_t can move to a
language without it or create their own language. Then the majority of
people won't have to put up with you ignoring the major problems your
proposed change would mean.
 

Tubular Technician

Malcolm said:
The vast majority of the functions in the book work on arrays whose
dimensions are unknown at the time of writing.

Array sizes need to be known at compile time; furthermore according
to ISO/IEC 9899:1999 5.2.4.1 a single object is not guaranteed to be
able to be larger than 65535 bytes, which according to 5.2.4.2.1 also
happens to be the guaranteed range of an unsigned int (UINT_MAX).
So an index type of unsigned int ("natural" int size?) seems perfectly
fine to me.

If you're talking about address (as obtained by malloc()) + offset,
size_t is already known due to inclusion of <stdlib.h>, and since
allocation size is of type size_t and offset may range from 0 to
(allocation_size-1), to me it always seemed natural to use size_t
for offsets as well.
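(That convention can be sketched as follows -- the function and its fill pattern are invented for illustration:)

```c
#include <stdlib.h>

/* Fill a malloc'd buffer using size_t for both the allocation size
 * and every offset into it, so the index type is guaranteed to
 * cover the full range 0 .. n-1.  Caller frees the result. */
unsigned char *make_ramp(size_t n)
{
    unsigned char *buf = malloc(n);
    if (buf != NULL)
        for (size_t off = 0; off < n; off++)
            buf[off] = (unsigned char)(off & 0xFF);
    return buf;
}
```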

The only inconsistency I can see is if using the sizeof operator in
a translation unit that does not include any of the standard
headers providing the definition of size_t.
[why does the compiler know about sizeof but not about size_t? Or
rather, why is sizeof a keyword but size_t is not?]

However most will be small
- an image could be of any dimensions, for instance, but in practice it
is most unlikely that any dimension would exceed a few thousand pixels.

It is quite common in interface functions to allow negative image
dimensions to indicate mirroring along that particular axis, which
means size_t may not be the best choice. When indexing into the
image's actual pixel values, however... see above.
The question is whether to go the size_t route. Which means that almost
every integer becomes a size_t.

How does that follow?
Most integers are ultimately used to index arrays.

Never heard of that theory.
Even a type field is often used as an index. There are exceptions,
for instance amounts of money, but not many.
?

My decision was no. I don't see a language that insists on

size_t N;
size_t i;

What if N is the result of sizeof? Then i needs to be able to represent
all values from 0 .. N-1. If not, why make them size_t in the first place?
for(i=0;i<N;i++)
/* typical loop code */

catching on. People are going to want a more meaningful type name for a
standard, default index variable, which doesn't hold a size.

If it doesn't hold a size, then why should it have some "standard" type?

What I was wondering in my original post is, if an object *does*
represent a size/index, but said value is neither the result of either
sizeof nor does it involve any of the standard library functions taking
or returning a size_t, how common is it to make it a size_t nonetheless?

Either the ANSI committee will take down C, or the rule will be changed.

?
 
