usage of size_t


Francis Moreau

Hello,

I usually use the 'unsigned int' type for variables which hold the length
of a buffer.

However, someone suggested that I use 'size_t'.

So I looked at the C99 spec to see what it says about size_t: it's
the type of the value returned by sizeof (6.5.3.4 p4) and its max
value is 65535 (7.18.3 p2).

size_t doesn't seem to be the right type to use when the variable of
that type describes the number of elements of a buffer whose type is
not 'char' and when the buffer size is less than 65535 bytes.

Is that correct?

Thanks
 

Malcolm McLean

size_t doesn't seem to be the right type to use when the variable of
that type describes the number of elements of a buffer whose type is
not 'char' and when the buffer size is less than 65535 bytes.
size_t is an int designed by committee.

The idea was that you would have a special type to hold amounts of
memory. Since, usually, the address space of a processor is the same
as the pointer width which is the same as an integer data register,
size_t was specified as unsigned.
The problem is that size_t ends up being the default index variable
type, which causes all sorts of problems. Mostly it's psychological:
people would much rather write int i; than size_t i; when declaring a
counter. However, there are also many situations where unsigned indices
are inconvenient, e.g. for(i=N-1;i>=0;i--), which never terminates
when i is unsigned, because i>=0 is always true.
The worst problem is that, because C is strictly typed, an int * is
not compatible with a size_t *. So you can end up writing little
adaptor functions to convert a vector of size_ts to a vector of ints,
and vice versa, even though the underlying bit patterns may be
identical.
 

santosh

Francis Moreau said:
Hello,

I usually use the 'unsigned int' type for variables which hold the
length of a buffer.

However, someone suggested that I use 'size_t'.

So I looked at the C99 spec to see what it says about size_t: it's
the type of the value returned by sizeof (6.5.3.4 p4) and its max
value is 65535 (7.18.3 p2).

That's not correct, at least not the last part. The type size_t must
be large enough to represent the size in bytes of the largest single
object that the implementation supports. This is not restricted to
65535 bytes.

Incidentally, standard C does require a hosted implementation to
support at least one object of 65535 bytes (C99 5.2.4.1), but it can,
and commonly does, support more and bigger objects than that.

Under 32-bit systems, the usual theoretical upper limit for object
size is roughly 4 GB, which a 32-bit size_t can just represent. Under
some 32-bit and 64-bit systems, this is a much higher limit, about 18
Tb, if I'm not wrong.

To be more concrete, to find the upper limit of size_t's range under
a particular implementation, look up the value of the SIZE_MAX macro
in the <stdint.h> header.
size_t doesn't seem to be the right type to use when the variable of
that type describes the number of elements of a buffer whose type
is not 'char' and when the buffer size is less than 65535 bytes.

Is that correct?

Why do you say it's not a good type to represent the size of non-char
objects? What's your reasoning for this?

And for arrays of less than 65536 bytes, you can safely store their
sizes in an unsigned int or unsigned long, but storing them in a size_t
works just as well.

If you're tracking the sizes of a large number of relatively small
objects and size_t on your system is wastefully big, you could
conceivably use unsigned int, or even unsigned short or unsigned char,
to store the sizes. But consider what happens if you later modify your
program, the sizes of one or more of these objects grow, and your
unsigned int/short/char wraps around. size_t is the only type
guaranteed by the standard to hold the size of any object the
implementation supports. unsigned long should work in most situations,
but again, what's the big gain in using it instead of size_t?
 

Malcolm McLean

Richard Heathfield said:
I don't recall ever having to do that, in 20+ years of using C.
I suppose it depends how you work. For instance some people never use
complex numbers in their entire programming career.
I not infrequently find myself having to call a routine that takes an
int * or a size_t * as input. It's not always possible to make them
match with the data in the rest of the program. Most of the time, I'll
grant you, people expect lists of integers as int *s, which is also my
preference.
 

Ike Naar

Under 32-bit systems, the usual theoretical upper limit for object
size is roughly 4 GB, which a 32-bit size_t can just represent. Under
some 32-bit and 64-bit systems, this is a much higher limit, about 18
Tb, if I'm not wrong.

Small nit: 18 exabytes (EB). That's about 18 million TB.
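Ike's figure checks out. Assuming size_t is exactly 32 or 64 bits wide (so SIZE_MAX is one less than the totals below) and decimal prefixes (1 GB = 10^9 bytes, 1 TB = 10^12, 1 EB = 10^18), the byte counts are:

```latex
\begin{aligned}
2^{32}\ \text{bytes} &= 4\,294\,967\,296\ \text{bytes} \approx 4.3\ \text{GB}\\
2^{64}\ \text{bytes} &= 18\,446\,744\,073\,709\,551\,616\ \text{bytes} \approx 18.4\ \text{EB} \approx 18.4 \times 10^{6}\ \text{TB}
\end{aligned}
```

In binary units these are exactly 4 GiB and 16 EiB.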
 

Seebs

So I looked at the C99 spec to see what it says about size_t: it's
the type of the value returned by sizeof (6.5.3.4 p4) and its max
value is 65535 (7.18.3 p2).

You are incorrect.

Its max value is *AT LEAST* 65535.

It may be much, much, much, larger.
size_t doesn't seem to be the right type to use when the variable of
that type describes the number of elements of a buffer whose type is
not 'char' and when the buffer size is less than 65535 bytes.
Is that correct?

I don't understand what you are trying to do.

Use size_t for sizes. If you are recording the number of items in a thing,
and it's zero or more, use size_t; that's what size_t is for. It doesn't
matter what the element type is, or whether the buffer is 65535 bytes,
more, or less. If you can have a buffer of over 65535 bytes, then size_t
will be able to represent sizes over 65535.

-s
 

gwowen

Idioms is there for those as wants to count down:

size_t i = N;
while(i--)

is simpler, shorter, and more correcterer.

Sadly, it's also less clear. It requires the reader to remember the
difference between --i and i--, and it requires them to be aware of
the implicit int-to-bool conversion. It's idiomatic precisely
because, until you've seen it many times, it requires more thought
than should be necessary.

size_t i=N;    // implicitly assume N!=0
do {
  foo(i);      // or more likely foo(bar[i-1])
  i = i - 1;   // or --i or i--, as you prefer.
} while(i != 0);

seems to me the one that most clearly expresses intent (though
obviously, I'm aware this is purely a personal preference).
Alternatively

size_t i=N;    // No need for assumption N!=0 this time...
while(i != 0) {
  foo(i-1);    // ibid...
  --i;         // ibid...
}

and trust your compiler to do the right thing, optimisation-wise...
 

Keith Thompson

gwowen said:
Sadly, it's also less clear. It requires the reader to remember the
difference between --i and i--, and it requires them to be aware of
the implicit int-to-bool conversion.

Any C programmer needs to know the difference between --i and i--, and
there is no implicit int-to-bool conversion here. The condition in a
while statement is a scalar that's tested for inequality to 0.

[snip]
 

gwowen

Any C programmer needs to know the difference between --i and i--,

I know the difference. I've known the difference for years. But I
still have to think about it (if you see what I mean: I know which
one's Ant and which one's Dec, but I have to think about that too).
If I'm reading some code, I don't want my concentration unnecessarily
broken by having to recall some syntactic nicety, even a relatively
simple one. The next guy to read my code may have to think harder
than me.
and there is no implicit int-to-bool conversion here.  
The condition in a while statement is a scalar that's tested for inequality to 0.

That test for inequality is implicit: an explicit one would look like
while(--i != 0). I defer to your knowledge on whether this counts as
a conversion to bool, but whatever such an implicit test is called, I
don't care for it with --i or i--. That's writing for the compiler,
not the human reader.

Personally, I almost never use --i as anything but a stand-alone
expression, and I avoid i-- whenever I can. Is there a
compiler anywhere for which

z = i--;

produces different code than

z = i;
i = i-1;

And, if not, which one is clearer to a neophyte C coder who's been
given my code to maintain (poor bastard), or a Fortran programmer
trying to see how my C code works, or a mathematician checking my
implementation of his algorithm? Yes, it's a minor stylistic point,
and such points are inherently subjective, but that's my opinion. I
don't doubt yours is at least as valid, and probably more widely held.
 

Nick Keighley

which, arguably, makes it less simple

I disagree. I would argue that it's a well-known idiom.

I don't remember seeing it before (which, of course, isn't a good
definition of "not well known").
Still, I accept that there are arguments on both sides.

Yes, I'd classify it as "slightly obscure". I'd wonder why they didn't
use a for-loop. I'd probably comment it if I decided to use it.
I would expect any serious C programmer to be aware of both of these
without having to think too strenuously about it,

I wasn't aware of it, but I didn't have to think too strenuously.

but the second at least is easily dealt with:

while(i-- > 0)

Yes, I prefer the test to be explicit. I think it is clearer. For
similar reasons I usually test for NULL explicitly.
It's idiomatic precisely
because, until you've seen it many times, it requires more thought
than should be necessary.
size_t i=N;    // implicitly assume N!=0
do {
  foo(i);      // or more likely foo(bar[i-1])
  i = i - 1;   // or --i or i--, as you prefer.
} while(i != 0);

I find my version much easier to read. But then I would, wouldn't I? :)

I'm never happy with do-while. Are we certain the loop body should
always be executed at least once? I use do-while, but I think rather
carefully first. This was a bug in the original Fortran: it always
executed a loop at least once.
 

Phil Carmody

Richard Heathfield said:
I don't recall ever having to do that, in 20+ years of using C.

I don't remember C being a strictly typed language, on the
presumption that implies something similar to being strongly
typed.

Phil
 

Malcolm McLean

I'm never happy with do-while. Are we certain the loop body should
always be executed at least once? I use do-while, but I think rather
carefully first. This was a bug in the original Fortran: it always
executed a loop at least once.
I almost never use a do-while.

Usually you need an empty case where the loop never executes. Other
times it's easier to code the logic into a while loop.
In older Fortran (before Fortran 77 made zero-trip DO loops standard),
the fact that a loop body always executes at least once can be a real
nuisance.
 

Phil Carmody

Richard Heathfield said:
gwowen said:
Sadly, it's also less clear.

I disagree. I would argue that it's a well-known idiom. Still, I
accept that there are arguments on both sides.
It requires the reader to remember the
difference between --i and i--, and it requires them to be aware of
the implicit int-to-bool conversion.

I would expect any serious C programmer to be aware of both of these
without having to think too strenuously about it, but the second at
least is easily dealt with:

while(i-- > 0)

It's idiomatic precisely
because, until you've seen it many times, it requires more thought
than should be necessary.

size_t i=N;    // implicitly assume N!=0
do {
  foo(i);      // or more likely foo(bar[i-1])
  i = i - 1;   // or --i or i--, as you prefer.
} while(i != 0);

I find my version much easier to read. But then I would, wouldn't I? :)

I see yours coping with N==0, and the others not coping with it,
to be the black and white distinguisher. Almost every time I
have a varying number of something, 0 is a valid count. To even
think of simply dismissing such a case out of hand seems sloppy.
Which is why, if the order I do things doesn't matter, I also
use your construct. (But if I'm accessing things by index and
need to count forwards, I clearly won't.)

Phil
 
G

gwowen

I disagree. I would argue that it's a well-known idiom. Still, I
accept that there are arguments on both sides.
I would expect any serious C programmer to be aware of both of these
without having to think too strenuously about it, but the second at
least is easily dealt with:
while(i-- > 0)
It's idiomatic precisely
because, until you've seen it many times, it requires more thought
than should be necessary.
size_t i=N;    // implicitly assume N!=0
do {
  foo(i);      // or more likely foo(bar[i-1])
  i = i - 1;   // or --i or i--, as you prefer.
} while(i != 0);
I find my version much easier to read. But then I would, wouldn't I? :)

I see yours coping with N==0, and the others not coping with it,
to be the black and white distinguisher.

Au contraire, Blackadder. As I posted earlier...

size_t i=N;
while(i != 0) {
  foo(i-1);
  --i;
}

Having given it some thought, I now prefer this...

while(i != 0) { // Here i tells us how many loop iterations remain
  --i;          // i now indexes the i'th element of an array...
  foo(bar[i]);
}
Almost every time I have a varying number of something, 0 is a valid count.

Absolutely. But with zero-based indexing and unsigned types that wrap,
somewhere there's going to be some ugliness: people are going to
want to use the loop variable as an index, as well as a count of how
many unprocessed elements remain. The least ugly way to do this is a
matter of taste.
To even think of simply dismissing such a case out of hand seems sloppy.

I didn't dismiss it out of hand, I noted my assumption, and then
provided an alternative which dealt with it. Dismissing solutions
having not read them seems sloppy ;)
 

Phil Carmody

Kelsey Bjarnason said:
[snips]

That test for inequality is implicit: an explicit one would look like
while(--i != 0). I defer to your knowledge on whether this counts as a
conversion to bool, but whatever such an implicit test is called, I
don't care for it with --i or i--. That's writing for the compiler, not
the human reader.

Well, in much C code, constructs such as while(x) are fairly common, with
or without increment or decrement, e.g. while ( *s++ ) n++;

This is hardly a novel usage.
Personally, I almost never use --i as anything but a stand-alone
expression, and I avoid i-- whenever I can. Is there a
compiler anywhere for which

z = i--;

produces different code than

z = i;
i = i-1;

Probably, somewhere, there is some pathological implementation which
does, but who cares?

In the current absence of a declaration for i, I'll suggest a volatile one.
Pointless complexity or code density accomplishes nothing, to be sure,
but I know a lot of C coders, myself included, who would look at
"i = i - 1" and worry that whoever wrote it did not understand C, and
thus that the code needs to undergo serious (and total) review. It's
just not how C programmers write C.

Yes, but I also see macho code (MS - you know who you are, _please_
stop doing that!), and when I encounter just one mistake in that I
know I have to review thousands of lines of macho code, which is
often way worse than thousands of lines of naive code.
And while you're at it, are you writing code for C programmers, or for
mathematicians? They (presumably) wouldn't be competent to do anything
with the code anyhow, unless they were _also_ C programmers, in which
case they'd know the common usages and idioms.

Yup.
Just a thunk.

No! If we have thunks, the lisp contingent will emerge, and we'll soon
have recursive function calls, and exploding stacks!

Phil
 

gwowen

And while you're at it, are you writing code for C programmers, or for
mathematicians?  

Yes. If a reasonably computer-savvy mathematician, with some
programming experience, not necessarily in C, cannot understand my C
code, sufficiently well to detect whether I've correctly implemented
an algorithm with which they're familiar, then it probably needs
refactoring for clarity.

Yes, I know this is a fairly extreme position.
No, I don't expect anyone who is not working on my codebase to code in
this manner.
It's just not how C programmers write C.

It's how some C programmers write C. I block out a lot of algorithms
in Matlab, then port the timing-critical bits to C (mainly to avoid
the copy-in-copy-out that plagues matrix operations in Matlab). ++i
does nothing in Matlab, and i++ is a syntax error, so you see a lot
of

i = i+1;

When porting, I'm not going to change that to ++i just for idiomatic
reasons.
 

gwowen

The -- in the loop is a well-known C usage, and if it's not clear to you
then your C is hazy, to say the least. Putting the decrement on some
line in the body is much more confusing and harder to read.

Your code is clear for people with good C skills.
My code is clear for people without good C skills, and clear (but non-
idiomatic) for those with good skills, and clear-but-hideous for C
mavens. I'm OK with that.

All programmers and many non-programmers can read well-written
pseudocode. If I can make my code look like pseudocode, by omitting
unnecessary idioms, why shouldn't I? If I'm writing for Usenet, where
many posters are not native English speakers, I'm not going to use
strongly idiomatic English, even though this is an English language
newsgroup.

So, if you want your code understood as widely as possible, don't be a
vicar of Bray; grasp the nettle, and do Yeoman's service and all
things being equal, Bob's your uncle and you'll come up smelling of
roses... Otherwise you'll do a Devon Loch, be hoist by your own
petard, be gone for a right royal Burton, or otherwise come a
cropper. I wouldn't touch idiomatic English with a bargepole. It's
just not cricket.
 

