Using size_t clearly (appropriately?)

Michael Mair

Richard said:
Mark Odell said:
It's a great type for an index, too. Someone said it's harder to use size_t
to count backwards, but it's not.

for(i = n; i-- > 0; )
{
    foo(bar + i);
}

True enough. I love it when that clashes with <insert
adjective here> company coding guidelines which prohibit
expressions with side effects for tests (exception:
function calls to functions returning a success status).
That makes for, while, do-while, if, and switch a little bit
safer but does not help in the above case.
In addition to "only the init part of a for loop may be
omitted", you get
    i = n;
    while (i != 0) {
        --i;
        ....
    }
which is not the intuitive thing to write.
Worse yet, if people worked with a signed index type before,
they just may not be aware of that one.
If "cleverly" hidden in a series of filters, even the wrong
    for (i = n-1; i >= 0; --i) {
        ....
        /* break/return for some condition */
        ....
    }
may "work" for a while...
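
A minimal sketch of the two count-down forms discussed above, for readers who want to compare them side by side. The function names and the `dst`/`src` parameters are hypothetical, chosen only for illustration. With an unsigned index the test `i >= 0` is always true, which is why the third form in the post only appears to work when the loop exits early for some other reason.

```c
#include <stddef.h>

/* Hypothetical helper: copies src[0..n-1] into dst in reverse order,
   walking the index downwards with an unsigned size_t. */
void reverse_copy(int *dst, const int *src, size_t n)
{
    size_t i;
    size_t out = 0;

    /* Test, then decrement: when i is 0 the test fails before the body
       runs, so the body only ever sees n-1, n-2, ..., 0. */
    for (i = n; i-- > 0; ) {
        dst[out++] = src[i];
    }
}

/* Same traversal without a side effect in the loop test, for coding
   guidelines that forbid the form above. */
void reverse_copy_no_side_effect(int *dst, const int *src, size_t n)
{
    size_t i = n;
    size_t out = 0;

    while (i != 0) {
        --i;
        dst[out++] = src[i];
    }
}
```

Both functions do nothing when `n` is 0, which is the case the `n-1` form gets wrong even before the loop test is evaluated.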


Cheers
Michael
 
Marc Thrun

Andrey Tarasevich schrieb:
[...]
Now, once we start using arrays, that 'quantity' type immediately
springs to mind as the best choice for index type. Note, that we indeed
"use what we have", as you said in your message. I just want to say that
by the time we get to arrays, we will already "have" the index type, and
it is not 'size_t'. 'size_t' is a bad choice to represent generic
'quantities' for obvious reasons (it might simply not have the range,
think of segmented 16-bit platform with 16-bit 'size_t').

size_t will always have the needed range by definition, as it has to be
able to represent the size of the largest possible object. So even on a
16-bit platform with a 16-bit size_t you will not be able to create an
array, which is an object, with a total size of more than (size_t)-1.
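
As a rough illustration of that bound: `(size_t)-1` is the largest value of the type, which C99 also publishes as `SIZE_MAX` in `<stdint.h>`. The helper below is hypothetical (it is not from the thread); it checks whether an element count times an element size still fits in a `size_t`, which is a precondition for such an array to exist as a single object.

```c
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX (C99) */

/* Hypothetical helper: returns 1 if an array of `count` elements of
   `elem_size` bytes each has a total size representable in size_t,
   0 otherwise. The division avoids overflowing the multiplication. */
int array_size_fits(size_t count, size_t elem_size)
{
    if (elem_size == 0) {
        return 1;
    }
    return count <= SIZE_MAX / elem_size;
}
```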
 
Keith Thompson

Marc Thrun said:
Andrey Tarasevich schrieb:
[...]
Now, once we start using arrays, that 'quantity' type immediately
springs to mind as the best choice for index type. Note, that we indeed
"use what we have", as you said in your message. I just want to say that
by the time we get to arrays, we will already "have" the index type, and
it is not 'size_t'. 'size_t' is a bad choice to represent generic
'quantities' for obvious reasons (it might simply not have the range,
think of segmented 16-bit platform with 16-bit 'size_t').

size_t will always have the needed range by definition, as it has to
be able to represent the size of the largest possible object. So even
on a 16-bit platform with a 16-bit size_t you will not be able to
create an array, which is an object, with a total size of more than
(size_t)-1.

In some (fairly odd) circumstances, the maximum size of a single
object might be 65535, but you might be able to allocate a greater
number of individual objects.

On the other hand, I don't know of any such systems, and it's easy
enough for the implementation to make size_t 32 bits. (Perhaps Andrey
knows more about the practical aspects of this.)
 
Stephen Sprunk

Michael Mair said:
I would think "here is someone who thought about what an index is"...
:)
If ssize_t were standard C, I'd accept that as well, for the reason
that you can more easily deal with loops that count downwards.

Typedefs used to define certain roles, say
typedef .... Index;
inspire the same confidence.

int, long, size_t, and maybe unsigned long are perfectly fine
choices for array indices.

int could be too small to hold a valid array index, and the same is true for
long, though less likely.

Unfortunately, if one is counting downwards in a loop, one may rely on being
able to get to -1, which makes size_t a worse choice than long in most
cases. ssize_t, where available, would be better.
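
A small sketch of the signed alternative, using the standard `ptrdiff_t` in place of the POSIX-only `ssize_t`. The function name `find_last` is hypothetical, and the code assumes the element count does not exceed `PTRDIFF_MAX`, which is exactly the range trade-off a signed index makes.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the last element equal to
   `key`, or -1 if there is none.
   Assumption: n <= PTRDIFF_MAX, so the conversion below is lossless. */
ptrdiff_t find_last(const int *a, size_t n, int key)
{
    ptrdiff_t i;

    /* With a signed index the loop can run down to -1 and stop. */
    for (i = (ptrdiff_t)n - 1; i >= 0; --i) {
        if (a[i] == key) {
            return i;
        }
    }
    return -1;
}
```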

S
 
Al Balmer

Andrey Tarasevich said:

All these functions are excellent examples of generic array processing
functions, with which 'size_t' is perfectly appropriate. I explicitly
mentioned it in my message. I actually mentioned some of these functions
as well.

True enough, but I fail to see why you consider them exceptions.

Okay, let's take a different tack. The canonical way to determine the number
of elements in an array (cf C89 3.3.3.4) is: sizeof array / sizeof array[0]

Now, sizeof yields size_t. What is the natural type to use for storing the
result of a division of size_t by size_t? I would argue that it's size_t.
Certainly the division will yield an unsigned type as its result. So it
makes perfect sense to do this:

size_t i;

for(i = 0; i < sizeof array / sizeof array[0]; i++)

Yes? Well, I doubt whether I've convinced you, but maybe some others here
will be swayed by this argument. :)

From the rationale:

"The type of sizeof, whatever it is, is published (in the library
header <stddef.h>) as size_t, since it is useful for the programmer to
be able to refer to this type. This requirement implicitly restricts
size_t to be a synonym for an existing unsigned integer type. Note
also that, although size_t is an unsigned type, sizeof does not
involve any arithmetic operations or conversions that would result in
modulus behavior if the size is too large to represent as a size_t,
thus quashing any notion that the largest declarable object might be
too big to span even with an unsigned long in C89 or uintmax_t in C9X.
This also restricts the maximum number of elements that may be
declared in an array, since for any array a of N elements,

N == sizeof(a)/sizeof(a[0])

Thus size_t is also a convenient type for array sizes, and is so used
in several library functions."
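
The identity quoted from the rationale can be sketched as follows. The macro name `ARRAY_LEN`, the function `sum_table`, and the table contents are assumptions made for illustration; the macro is only valid for true arrays, not for pointers.

```c
#include <stddef.h>

/* Hypothetical macro: element count of a true array (not a pointer).
   The result has type size_t because sizeof yields size_t. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Sums a fixed table, using size_t for the index so that it has the
   same type as the bound it is compared against. */
long sum_table(void)
{
    static const int table[] = { 3, 1, 4, 1, 5, 9 };
    long total = 0;
    size_t i;

    for (i = 0; i < ARRAY_LEN(table); i++) {
        total += table[i];
    }
    return total;
}
```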
 
Andrey Tarasevich

Marc said:
Andrey Tarasevich schrieb:
[...]
Now, once we start using arrays, that 'quantity' type immediately
springs to mind as the best choice for index type. Note, that we indeed
"use what we have", as you said in your message. I just want to say that
by the time we get to arrays, we will already "have" the index type, and
it is not 'size_t'. 'size_t' is a bad choice to represent generic
'quantities' for obvious reasons (it might simply not have the range,
think of segmented 16-bit platform with 16-bit 'size_t').

size_t will always have the needed range by definition, as it has to be
able to represent the size of the largest possible object. So even on a
16-bit platform with a 16-bit size_t you will not be able to create an
array, which is an object, with a total size of more than (size_t)-1.

What I'm trying to say in the quoted paragraph is that choosing 'size_t'
to represent _generic_ quantities (any quantities, not just 'number of
elements in an array') is a bad idea. And the fact that some quantity
might be somehow related to some array somewhere in the code is not an
argument for choosing 'size_t'. 'Quantities' predate 'containers'.
Deriving the 'quantity' type from the 'container' type is no different from
putting the cart before the horse.

The above applies to specific code. In generic code 'size_t' is an
excellent choice of 'quantity' and 'index' type, no argument here.
 
Andrey Tarasevich

Richard said:
...
All these functions are excellent examples of generic array processing
functions, with which 'size_t' is perfectly appropriate. I explicitly
mentioned it in my message. I actually mentioned some of these functions
as well.

True enough, but I fail to see why you consider them exceptions.

Okay, let's take a different tack. The canonical way to determine the number
of elements in an array (cf C89 3.3.3.4) is: sizeof array / sizeof array[0]

Now, sizeof yields size_t. What is the natural type to use for storing the
result of a division of size_t by size_t? I would argue that it's size_t.
Certainly the division will yield an unsigned type as its result. So it
makes perfect sense to do this:

size_t i;

for(i = 0; i < sizeof array / sizeof array[0]; i++)

Yes? Well, I doubt whether I've convinced you, but maybe some others here
will be swayed by this argument. :)

What you are saying here applies to abstract, generic arrays. And I have
absolutely no problem with using 'size_t' for representing the number of
elements in an array as well as array index in _generic_ context, i.e.
when we are working with arrays that are just... well, abstract arrays
and nothing more.

Whatever I said against using 'size_t' applies to application-specific
(or should we say "application domain-specific") context, when an array
is not just an array, but one particular implementation of a linear
container, whose maximum size is dictated by the requirements of
application domain and designed limitations of the code, not by some
internal rules of C language. Today it is an array, tomorrow it might be
replaced with a linked list, then it's suddenly a tree, and then it
might be changed back to an array again. A hardcoded 'size_t' has no
place in such a context.
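
One way to read this argument, sketched under assumptions: the application defines its own count type and the container hides how it is stored. Everything named here (`item_count`, `struct inventory`, the capacity of 16, the choice of `unsigned long`) is hypothetical and not taken from the thread.

```c
#include <stddef.h>

/* Hypothetical application-level count type. The choice of unsigned
   long is an assumption; the point is that callers never spell out
   size_t, so the storage behind the container can change. */
typedef unsigned long item_count;

struct inventory {
    int items[16];      /* an array today; could become a list or tree */
    item_count used;    /* number of slots currently filled */
};

/* Number of items currently stored. */
item_count inventory_count(const struct inventory *inv)
{
    return inv->used;
}

/* Appends a value; returns 1 on success, 0 if the container is full. */
int inventory_add(struct inventory *inv, int value)
{
    if (inv->used >= sizeof inv->items / sizeof inv->items[0]) {
        return 0;
    }
    inv->items[inv->used++] = value;
    return 1;
}
```

Inside the implementation the array bound still comes from `sizeof`, so `size_t` appears only where the code deals with the array as an array.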
 
Ben Pfaff

Al Balmer said:
Posix puts their ssize_t (signed size_t) to use for functions that
return either a count or -1. I don't know of anything in standard C
that could use that feature.

Do the types "ptrdiff_t" and "ssize_t" ever differ in practice?
 
Mark F. Haigh

Andrey said:
It might be. Not so much "confusing" as conceptually incorrect. The 'size_t' type
is intended to represent the concept of 'size of an object'. The number of
elements in an array is described by a completely different concept: 'number
of elements in a container'. Note that in the case of a generic container these two
concepts are completely unrelated. In the particular case of an _array_ there is
a certain "parasitic" relationship between the two: the latter cannot be greater
than the former. This is often used as a justification for using 'size_t' to
represent array indices. This is false reasoning. In the general case, once again,
using 'size_t' for this purpose is a conceptual error.

In certain particular cases though 'size_t' could be appropriate as an array
index type. For example, when one needs to iterate through an array of raw
memory bytes (i.e. array of 'unsigned char'). Another example would be generic
purpose functions that work with "generic" arrays, i.e. functions that are not
tied to a concrete application-specific area. String processing functions and
functions of 'memset'/'memcpy'/etc group, 'bsearch' and 'qsort' functions belong
to that category.

<snip>

When indexing C arrays with the subscript operator ([]), you can't go
wrong with size_t, regardless of what you claim about its conceptual
status. On the other hand, when you're dealing with custom data
structures (judy arrays, search trees, etc), size_t may be too
restrictive, and you may want to use a wider type.

Custom data structures may be "arrays" in an abstract sense, but they
are not "C arrays". Any standard C array object can be exhaustively
indexed with a size_t. An array that size_t cannot index is not a
standard "C array" (ie it can't be sorted with qsort, exhaustively
indexed with [], etc).
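
For reference, a short sketch of the standard-library side of this point: `qsort` takes both the element count and the element size as `size_t`, so an index of the same type covers every element it can sort. The helper names (`compare_ints`, `sort_ints`, `is_sorted`) are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort: orders ints ascending without the
   overflow risk of returning a - b. */
static int compare_ints(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;
    return (a > b) - (a < b);
}

/* Sorts `count` ints in place; count and element size are both size_t,
   matching the qsort prototype. */
void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}

/* Returns 1 if values[0..count-1] is in ascending order. */
int is_sorted(const int *values, size_t count)
{
    size_t i;

    for (i = 1; i < count; i++) {
        if (values[i - 1] > values[i]) {
            return 0;
        }
    }
    return 1;
}
```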


Mark F. Haigh
 
Mark F. Haigh

Keith said:
Marc Thrun said:
Andrey Tarasevich schrieb:
[...]
Now, once we start using arrays, that 'quantity' type immediately
springs to mind as the best choice for index type. Note, that we indeed
"use what we have", as you said in your message. I just want to say that
by the time we get to arrays, we will already "have" the index type, and
it is not 'size_t'. 'size_t' is a bad choice to represent generic
'quantities' for obvious reasons (it might simply not have the range,
think of segmented 16-bit platform with 16-bit 'size_t').

size_t will always have the needed range by definition, as it has to
be able to represent the size of the largest possible object. So even
on a 16-bit platform with a 16-bit size_t you will not be able to
create an array, which is an object, with a total size of more than
(size_t)-1.

In some (fairly odd) circumstances, the maximum size of a single
object might be 65535, but you might be able to allocate a greater
number of individual objects.

IIRC the compact and large memory models with the Microsoft C compiler
for MSDOS. SIZE_MAX is 65535, but all data pointers are "far" and the
total memory one can allocate is somewhere under 640K.

When a far pointer like DEAD:FFFF is incremented (or indexed with the
subscript operator), only the offset portion wraps around (ie
DEAD:0000), which is why SIZE_MAX is 65535 on those particular
platforms. It's the theoretical maximum size of a C object.
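
The wrap-around described here can be modelled in portable C. This is only a simulation under stated assumptions: `struct far_ptr`, `far_add`, and `far_linear` are hypothetical names, and the linear-address formula is the usual 8086 real-mode rule of segment times 16 plus offset. It is not how any particular compiler represents far pointers internally.

```c
#include <stdint.h>

/* Hypothetical model of a real-mode "far" pointer. */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Far-pointer arithmetic as described in the post: only the offset
   changes, and it wraps modulo 65536; the segment is left untouched. */
struct far_ptr far_add(struct far_ptr p, uint16_t n)
{
    p.offset = (uint16_t)(p.offset + n);
    return p;
}

/* Linear address in 8086 real mode: segment * 16 + offset. */
uint32_t far_linear(struct far_ptr p)
{
    return ((uint32_t)p.segment << 4) + p.offset;
}
```

Incrementing DEAD:FFFF in this model gives DEAD:0000, so the linear address moves backwards by 65535 instead of forwards by one, which is why no object addressed this way can span more than 65536 bytes.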

To even get your hands on a chunk of memory larger than the maximum C
object size, you need to call a platform-specific huge allocator. It
may return an "array", but in this context, it's not a "C array".

On the other hand, I don't know of any such systems, and it's easy
enough for the implementation to make size_t 32 bits. (Perhaps Andrey
knows more about the practical aspects of this.)

You can compile it targeting a huge memory model (ie one where
segment-offset pairs are internally normalized by the compiler), or
litter the code with platform-specific magic to make it happen. The
former has the benefit of having bsearch, qsort, etc, work on your
large arrays.


Mark F. Haigh
 
Dietmar Schindler

Stephen said:
int could be too small to hold a valid array index, and the same is true for
long, though less likely.

You can't possibly mean what you wrote. int can hold 0; 0 is a valid
array index; therefore int is not too small to hold a valid array index.
 
Guest

Dietmar said:
You can't possibly mean what you wrote. int can hold 0; 0 is a valid
array index; therefore int is not too small to hold a valid array index.

int could be too small to hold *a* valid array index. In other words,
there may exist a valid array index which is outside of int's range
(regardless of other array indices which aren't). You read "a" as
"any", but that changes the meaning.
 
Stephen Sprunk

Dietmar Schindler said:
You can't possibly mean what you wrote. int can hold 0; 0 is a valid
array index; therefore int is not too small to hold a valid array index.

Depending on the implementation, it is possible for INT_MAX+1 to be a valid
array index. On such systems, my statement holds true.

In at least two common 64-bit systems, int is 32-bit yet malloc() can return
objects larger than 2^32 bytes. One of those also has a 32-bit long, which
means size_t is the only type you can safely use as an array index in
portable code (since long long isn't yet available on many implementations,
and even that may be one bit too small).
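
A hedged sketch of the check this implies: before storing an index in an `int`, compare it against `INT_MAX`. The function name is hypothetical, and the code assumes `SIZE_MAX >= INT_MAX`, which holds on the mainstream hosted implementations discussed here but is not guaranteed by the standard.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `index` can be stored in an int
   without loss, 0 otherwise.
   Assumption: SIZE_MAX >= INT_MAX, so the cast below is lossless. */
int index_fits_in_int(size_t index)
{
    return index <= (size_t)INT_MAX;
}
```

On an implementation with a 32-bit int and a 64-bit size_t, `(size_t)INT_MAX + 1` is a representable index for which this function returns 0.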

S
 
Dietmar Schindler

Harald van Dĳk said:
int could be too small to hold *a* valid array index. In other words,
there may exist a valid array index which is outside of int's range
(regardless of other array indices which aren't). You read "a" as
"any", but that changes the meaning.

You read "a" as "every", but are you sure that "a" means "every" rather
than "any"?
(I'm asking what "a" truly means, not what it has to mean to make
the sentence "int could be too small to hold *a* valid array index"
meaningful.)
 
Mark L Pappin

Dietmar Schindler said:
You read "a" as "every", but are you sure that "a" means "every" rather
than "any"?
(I'm asking what "a" truly means, not what it has to mean to make
the sentence "int could be too small to hold *a* valid array index"
meaningful.)

English is strange.

"a" has many possible meanings, and which one is appropriate is
determined by context.

In the original statement (which I don't recall having seen other than
as the subject of dissection in this sub-thread), the writer might
have meant either "every" or "any". "any" makes the statement
trivially false (as you pointed out, 0 is a valid index and 'int' can
certainly hold that value); "every" makes the statement true.

You may choose to interpret it either way, depending on whether you
think the writer intended to make a true statement or a trivially
false one.

(If spoken you may have heard emphasis on "could", if "every" was
intended; here we have no such clues.)


In next week's lesson, we move on to "b". In a year we'll reach "zz".

mlp
 
Dave Thompson

On Thu, 29 Jun 2006 01:19:35 +0000, Richard Heathfield said:
Okay, let's take a different tack. The canonical way to determine the number
of elements in an array (cf C89 3.3.3.4) is: sizeof array / sizeof array[0]

Now, sizeof yields size_t. What is the natural type to use for storing the
result of a division of size_t by size_t? I would argue that it's size_t.
Certainly the division will yield an unsigned type as its result. So it

<pedantic> Not if size_t('s actual type) is lower in rank (= narrower)
than signed int. Then both operands promote to int and the division is
done in int. The result value is in range for size_t however.

makes perfect sense to do this:

size_t i;

for(i = 0; i < sizeof array / sizeof array[0]; i++)

Yes? Well, I doubt whether I've convinced you, but maybe some others here
will be swayed by this argument. :)

Personally I usually use size_t for bound or subscript in routines
(esp libraries) that are intended to be fully generic. But in code
where I know* what sorts of bounds or (ranges of) subscripts will be
used I often just use unsigned int or long, and sometimes even signed.

* For values of know that depending on mood may include have some
evidence for, strongly suspect, guess, and have a vague hunch. <G>
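
The promotion point raised earlier in this post can be observed with a narrower unsigned type standing in for a hypothetical narrow size_t. This is a sketch under that assumption; `unsigned short` is used only because it is narrower than int on most implementations, and the function names are made up for the example.

```c
#include <limits.h>

/* Returns 1 if unsigned short operands are promoted to (signed) int on
   this implementation, i.e. if int can hold every unsigned short value. */
int ushort_promotes_to_int(void)
{
    return USHRT_MAX <= INT_MAX;
}

/* Returns 1 if arithmetic on two unsigned short operands is carried out
   in signed int: 0 - 1 is then -1, not a large wrapped unsigned value. */
int ushort_arithmetic_is_signed(void)
{
    unsigned short zero = 0;
    unsigned short one = 1;

    return (zero - one) < 0;
}

/* Division of two such operands: computed in the promoted type, but the
   quotient always fits back into the narrow unsigned type. */
unsigned short ushort_quotient(unsigned short total, unsigned short each)
{
    return (unsigned short)(total / each);
}
```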

- David.Thompson1 at worldnet.att.net
 
