usage of size_t

Francis Moreau

That's not correct, at least not the last part. The type size_t must
be large enough to represent the size in bytes of the largest single
object that the implementation supports. This is not restricted to
65535 bytes.

Incidentally, standard C (C99) does require a hosted implementation to
support at least one object of 65535 bytes, but it can, and commonly
does, support more and bigger objects than that.

Under 32-bit systems, the usual theoretical upper limit for object
size is roughly 4 GB, which a 32-bit size_t can just represent. Under
systems with a 64-bit size_t, the limit is much higher, about 18 EB
(2^64 bytes), if I'm not wrong.

To be more concrete, to find the upper limit of size_t's range under
a particular implementation, look up the value of the SIZE_MAX macro
in the stdint.h header (C99).

Ah yes, sorry, I misread the spec.
Why do you say it's not a good type to represent the size of non-char
objects? What's your reasoning for this?

Well, size_t is the type of the value returned by sizeof(). And
sizeof() returns the number of bytes (ie char) of its operand. So I
assumed that size_t was introduced to represent a number of char.
 
Francis Moreau

You are incorrect.

Its max value is *AT LEAST* 65535.

It may be much, much, much, larger.


I don't understand what you are trying to do.

I'm just trying to understand what (expert) people can deduce when
they're seeing an object whose type is size_t.

For example, if you see the following declaration:

int do_something_on_an_array(struct foo array[], size_t len);

Does the 'len' parameter imply a size in bytes of 'array' (the one that
the sizeof operator would return, assuming the length of the array is
known) or does it mean the number of objects of type 'struct foo' in
'array'?

To sum up, I was wondering whether there are any assumptions that can
be made about size_t.

Thanks
 
Francis Moreau

size_t is an int designed by committee.

The idea was that you would have a special type to hold amounts of
memory. Since, usually, the address space of a processor is the same
as the pointer width which is the same as an integer data register,
size_t was specified as unsigned.
The problem is that size_t ends up being the default index variable
type, which causes all sorts of problems. Mostly it's psychological -
people would much rather write int i; than size_t i when declaring a
counter.

That's true, I did read quite a lot of C code and I've never seen an
index variable whose type was 'size_t'.
 
gwowen

But your code isn't any clearer at all.

So you keep asserting. "Clear" is not an on-off concept, or even a
one-dimensional one. Programming idioms are like jargon. Speaking in
jargon and acronyms, I can be clear and concise to someone familiar
with my areas of expertise, but baffling to an outsider. By dropping
that jargon I'll probably be less concise, but understood by a less
exclusive group.

The question is, to whom am I trying to be clear -- to people with my
exact skillset or people in general? To any programmer who is not
familiar with idiomatic C, and is used to writing in a language that
does not have the --i idiom[0], it is not clear. To someone familiar
with the idiom, yes, of course it is, but to anyone else it is unclear.
Anyone NOT understanding "while(i--)" has no business modifying the code in
the first place.

Modifying is not the same as understanding. My code is comprehensible
to people who are not, by training or desire, C programmers, and they
can (and do) spot errors in it, because it's deliberately written so
that that is possible.

[0] Say Fortran. Or Matlab. Or Lisp, Haskell or pretty much any
functional language. Or Basic, Pascal, Logo, or Python. Or anyone who
can read pseudo-code, or anyone who can follow a flow chart. But
other than that, almost no-one.
 
santosh

Richard said:
But your code isn't any clearer at all.

Anyone modifying or reading that code is far likelier to understand
the common C idiom of the decrement in the loop than in some random
body line.

This can be solved to some extent by placing the x = x + 1 in a for
loop, instead of somewhere within a while.
Anyone NOT understanding "while(i--)" has no business modifying the
code in the first place.

From what I understand, it seems he is aiming to write code which is
easier to understand (and not necessarily modify. The two don't need
to go together), for those with some knowledge of algorithms and
pseudo-code, but little or no knowledge of C.
 
Nick Keighley

I almost never use a do-while.
ditto

Usually you need an empty case where the loop never executes. Other
times it's easier to code the logic into a while loop.

I sometimes end up with a do-while when it's "get an object and if it's
no good try again". This seems to code quite naturally as a do-while,
though C's ability to do assignment in the while test means you don't
have to use it.

do
{
    read_an_item (&item);
} while (!is_valid (item));

for some reason I like this with user input

of course

while (!is_valid (read_an_item (&item)))
    ;

and reserve for(;;) for break-out-of-the-middle cases

for (;;)
{
    msg = get_msg();
    if (msg == STOP)
        break; /* <-- break here! */
    process_msg(msg);
}
In Fortran 77 the fact that a loop body always executes at least once
can be a real nuisance..

I remember.

People could accidentally create this problem with Algol-60 and its
descendants (Algol-60 didn't have a proper while loop)

FOR i := 0, i + 1 WHILE i <= last_used_entry DO
    process (item[i]);

it always processes item[0] (assuming I have the syntax right!)
 
Malcolm McLean

For example, if you see the following declaration:

   int do_something_on_an_array(struct foo array[], size_t len);

Does the 'len' parameter imply a size in bytes of 'array' (the one that
the sizeof operator would return, assuming the length of the array is
known) or does it mean the number of objects of type 'struct foo' in
'array'?

In qsort, no. The function takes two size_ts, one giving element width
in bytes, which is where you'd expect a size_t, the other giving the
number of elements, which we would expect to be an int.
The justification is that int may not be big enough to index an entire
array. This could happen a) if int is the address size of the
processor, and the array is an array of chars taking up more than half
of memory, or b) if int is smaller than the address space of the
machine.
a) is so unlikely that we can ignore it. b) can happen if int is not
64 bits on a machine with a 64 bit address space.
 
gwowen

From what I understand, it seems he is aiming to write code which is
easier to understand (and not necessarily modify. The two don't need
to go together), for those with some knowledge of algorithms and
pseudo-code, but little or no knowledge of C.

Yes, that's exactly right.
 
Francis Moreau

Idioms is there for those as wants to count down:

size_t i = N;
while(i--)

is simpler, shorter, and more correcterer.

FWIW, I prefer to just keep using 'int' for indexes, because

a) a variable whose name is 'i' always has type 'int'
for me;

b) I feel more confident writing this:

int i = N;
while (i-- > 0) { .... };

because this is more robust and you can have in
the body of the while construct something like
this: "i -= X" where X > 1 without worrying whether 'i'
is greater than X.
 
santosh

Francis Moreau said:
Well, size_t is the type of the value returned by sizeof(). And
sizeof() returns the number of bytes (ie char) of its operand. So I
assumed that size_t was introduced to represent a number of char.

As far as I know, the purpose of size_t seems to be to serve as a
portable type to hold the sizes of objects. However it's also the
only type guaranteed to hold the number of elements of an object, in
a strict sense. However I agree it seems unnatural to use it for
indexing arrays. It's probably the ugly name. As I said, unsigned
long should work in nearly all cases, if you'd prefer that.

But for maximum portability, I guess size_t is the way to go, both
for holding sizes and index values for arrays.
 
Richard Tobin

It requires the reader to remember the
difference between --i and i--, and it requires them to be aware of
the implicit int-to-bool conversion.

I would expect any serious C programmer to be aware of both of these
without having to think too strenuously about it.

I also think it's unclear, but not because the reader is likely to
be unaware of the difference. The trouble is that it seems natural
for a test at the top of a loop to be testing the value that will
be used in the loop, but here it is testing a different value.

I suppose you could use

for(i=N-1; i != (size_t)-1; i--)

but it's not pretty.

-- Richard
 
Richard Tobin

gwowen said:
size_t i=N-1; // implicitly assume N!=0
do {
    foo(i); // or more likely foo(bar[i-1])
    i = i - 1; // or --i or i--, as you prefer.
} while(i != 0);

This runs the loop with values N-1 ... 1. If you're going to use i-1
as the array index in the loop, you should have set it to N, not N-1,
at the start.

And if you're going to use i-1, you might as well write

for(i=N; i>0; i--)
    ... i-1 ...;

-- Richard
 
Richard Tobin

Richard Heathfield said:
It's a trivially small error to make, if you compare it to a similar
error once made by Isaac Asimov. He once managed to mislay a factor of
10^23, which rather knocks 10^6 into the shade.

I once saw someone on Usenet mistakenly use 2^70 as the number of
atoms in the universe, instead of 10^70, which is out by a factor of
about 10^49.

-- Richard
 
Francis Moreau

For example, if you see the following declaration:
   int do_something_on_an_array(struct foo array[], size_t len);
Does the 'len' parameter imply a size in bytes of 'array' (the one that
the sizeof operator would return, assuming the length of the array is
known) or does it mean the number of objects of type 'struct foo' in
'array'?

In qsort, no. The function takes two size_ts, one giving element width
in bytes, which is where you'd expect a size_t, the other giving the
number of elements, which we would expect to be an int.
The justification is that int may not be big enough to index an entire
array. This could happen a) if int is the address size of the
processor, and the array is an array of chars taking up more than half
of memory, or b) if int is smaller than the address space of the
machine.
a) is so unlikely that we can ignore it. b) can happen if int is not
64 bits on a machine with a 64 bit address space.

But the 'unsigned long' type could have been used, couldn't it?
 
gwowen

Putting the decrement in the body makes it less clear.

So you keep asserting. If you say it again, does it become magically
true?
If a post decrement is too clever for the reader then so is using C.

Using is not understanding. Understanding is not modifying.
Modifying is not bug-spotting. Which do you mean?
 
Francis Moreau

FWIW, I prefer to just keep using 'int' for indexes, because

   a) a variable whose name is 'i' always has type 'int'
      for me;

   b) I feel more confident writing this:

        int i = N;
        while (i-- > 0) { .... };

      because this is more robust and you can have in
      the body of the while construct something like
      this: "i -= X" where X > 1 without worrying whether 'i'
      is greater than X.

and

c) size_t is just a very misleading name for
something that doesn't hold a size (ie index)
 
santosh

Francis Moreau said:
For example, if you see the following declaration:
int do_something_on_an_array(struct foo array[], size_t len);
Does the 'len' parameter imply a size in bytes of 'array' (the one
that the sizeof operator would return, assuming the length of the
array is known) or does it mean the number of objects of type
'struct foo' in 'array'?

In qsort, no. The function takes two size_ts, one giving element
width in bytes, which is where you'd expect a size_t, the other
giving the number of elements, which we would expect to be an int.

They are both integer types. Why expect one instead of another? By
your reasoning then one would expect an int at any place where an
integer type is warranted, but C hasn't evolved that way. Instead we
have a multiplicity of integer types.
But the 'unsigned long' type could have been used, couldn't it?

Right, but size_t is maximally portable (whatever that means) while
unsigned long is not. It's possible to have a 32 bit unsigned long on
a machine with a 64 bit address space, though I don't know of any
actual implementation that does that. size_t is guaranteed to "just
work" across all standard implementations for holding sizes, and
serving as indexes.

Consider an architecture with 64 bit or higher integers and a 16 bit
or lower address space. In this case using unsigned long to hold
sizes and indexes would waste storage, while size_t would presumably
be more economical, being a typedef for an unsigned short or an
unsigned int.
 
Malcolm McLean

Right, but size_t is maximally portable (whatever that means) while
unsigned long is not.
size_t is the only type that is guaranteed to be able to index any
array. So if the number of elements is arbitrary, it's the only
correct type to use.
The problem is that very few people actually do so. So we've got a
very undesirable situation.
 
santosh

Malcolm McLean said:
size_t is the only type that is guaranteed to be able to index any
array. So if the number of elements is arbitrary, it's the only
correct type to use. The problem is that very few people actually
do so. So we've got a very undesirable situation.

You make a good point. IMHO there is not much practical difference
between unsigned long and size_t, at least on most architectures. So
you might as well use size_t where you'd otherwise use unsigned long.
But int or long should be perfectly fine for objects which you know
won't exceed their limits, by design and intent. Using size_t to index
into a 1 KB array, say holding a line from a config file, seems a bit
paranoid to me:)
 
blmblm

I almost never use a do-while.

Usually you need an empty case where the loop never executes. Other
times it's easier to code the logic into a while loop.
In Fortran 77 the fact that a loop body always executes at least once
can be a real nuisance..

Not that it matters in this group, really, but I was under the
impression that one of the things that made FORTRAN 77 different
from its predecessors was that loops could execute zero times
if the range of indices was empty. Possibly the old behavior
(at least one trip through the loop no matter what) was supposed
to be made available via some compiler option?
 
