size_t, when to use it? (learning)

G G · Apr 10, 2014

typedef unsigned int size_t

..............

size_t

when to declare something size_t? when the variable is associated with memory usage?

at any other time should the variable be declared an unsigned int?

it's not a question of style, right?

James Kuyper · Apr 10, 2014

typedef unsigned int size_t

.............

size_t

when to declare something size_t? when the variable is associated with memory usage?

at any other time should the variable be declared an unsigned int?

it's not a question of style, right?

There are a few general rules that apply:
* If you're using a function that uses size_t in its interface, any
corresponding variables in your code should have the type size_t, unless
you have some better reason for giving them some other type.

* If you're using a function that takes, as an argument, a pointer to
size_t, the object pointed at by that pointer MUST have the type size_t.

* Because malloc() and realloc() take size_t arguments, size_t is
guaranteed to be large enough to index any array you can build in memory
allocated using those functions. Because sizeof() has a value that is of
type size_t, it's also nearly certain (though technically not required)
that size_t will be big enough to index any array that is allocated by
any other means. There's no smaller type for which any comparable
guarantees apply, so you should choose size_t whenever indexing an
array, if you have no other information about the size of that array
which would allow you to use some other, smaller type. Even if you do
have such information, I wouldn't recommend using the smaller type
unless you also know that it's faster.
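
As a minimal sketch of the first two rules above, the following C fragment keeps every variable that touches a size_t interface as a size_t. The helpers count_char, find_char and first_index_or_len are hypothetical names made up for this illustration; only strlen and strchr come from the standard library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts occurrences of ch in the string s.
 * strlen() returns size_t, so the length, the index and the count
 * are all declared as size_t. */
size_t count_char(const char *s, char ch)
{
    size_t len = strlen(s);
    size_t count = 0;

    for (size_t i = 0; i < len; i++) {
        if (s[i] == ch)
            count++;
    }
    return count;
}

/* Hypothetical helper with a size_t * out-parameter. The object the
 * caller passes the address of must really have type size_t. */
int find_char(const char *s, char ch, size_t *pos)
{
    const char *p = strchr(s, ch);

    if (p == NULL)
        return 0;
    *pos = (size_t)(p - s);   /* p >= s, so the difference is non-negative */
    return 1;
}

/* Returns the index of the first ch in s, or strlen(s) if absent. */
size_t first_index_or_len(const char *s, char ch)
{
    size_t pos;   /* must be size_t: its address is passed to find_char */

    return find_char(s, ch, &pos) ? pos : strlen(s);
}
```

Declaring pos as unsigned int and passing its address would only happen to work on implementations where size_t is the same type as unsigned int.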

Other than that, you should keep in mind that size_t is an unsigned
type. Expressions involving values of both unsigned and signed types
often end up being evaluated by converting the signed value to an
unsigned type, and producing a result with an unsigned type. The
conversion necessarily changes the value if it was negative before the
conversion. This can be quite annoying if you didn't anticipate that
possibility. This is called "unsigned poisoning", and because of it, you
should in general use unsigned types only when you have a good reason to
do so.
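
A small sketch of the effect described above, assuming a typical implementation where size_t has at least the conversion rank of int (true on mainstream 32-bit and 64-bit platforms). The function names are made up for the illustration.

```c
#include <stddef.h>

/* Mixed comparison: under the usual arithmetic conversions, a is
 * converted to size_t before the comparison (on the assumed
 * platforms), so a negative a turns into a very large value. */
int less_mixed(int a, size_t b)
{
    return a < b;
}

/* Testing the sign first keeps the comparison meaningful; the cast
 * is only reached when a is non-negative. */
int less_checked(int a, size_t b)
{
    return a < 0 || (size_t)a < b;
}
```

Under that assumption less_mixed(-1, 5) yields 0, although -1 is mathematically less than 5, while less_checked(-1, 5) yields 1. Many compilers can warn about the first form (for example GCC's -Wsign-compare).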

G G · Apr 10, 2014

On Thursday, April 10, 2014 11:42:33 AM UTC-4, James Kuyper,

thanks,

g.

Kaz Kylheku · Apr 10, 2014

typedef unsigned int size_t

.............

size_t

when to declare something size_t? when the variable is associated with memory usage?
at any other time should the variable be declared an unsigned int?

Unsigned types are best used in certain kinds of calculations involving binary
numbers which must be machine-independent, and at the same time produce values
that are defined exactly at the bit level. Also, unsigned types are suitable
for representing bit fields and masks: they have no troublesome sign bit which
causes nonportable behaviors.

Used as arithmetic types, the unsigned types are inherently dangerous because
of their cliff behavior around zero: where a signed calculation would produce a
negative value, the unsigned type produces some large value.

I would say, avoid using size_t in user-defined code, even for representing the
sizes of objects.

It's okay for capturing the result of a standard library function which returns
size_t, as long as no complicated arithmetic is being done with it.

A good rule of thumb is that when you start subtracting sizes, you probably
want to switch to signed integers.
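
To illustrate the rule of thumb about subtraction, here is a sketch with two made-up functions. It assumes the sizes involved fit in long long, which holds for realistic object sizes on mainstream platforms but is not guaranteed by the standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned subtraction: when a < b the result wraps around modulo
 * SIZE_MAX + 1 instead of becoming negative. */
size_t diff_unsigned(size_t a, size_t b)
{
    return a - b;
}

/* Signed difference. Assumption: both values fit in long long. */
long long diff_signed(size_t a, size_t b)
{
    return (long long)a - (long long)b;
}
```

With a = 3 and b = 5, diff_unsigned returns SIZE_MAX - 1, whereas diff_signed returns -2.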

Signed types like "long" and "long long" are usually good enough to represent
the sizes of ordinary objects in a C program. If size_t is "unsigned int", and
unsigned int is 32 bits wide, then you need a two gigabyte array before its
size doesn't fit into int, and requires size_t.

If size_t is "unsigned int" and only 16 bits wide, then it can represent
object sizes in the range 32768 to 65535 which "int" cannot; but in that
case, the "long" type can cover the range.

Malcolm McLean · Apr 10, 2014

typedef unsigned int size_t

size_t

when to declare something size_t? when the variable is associated with memory usage?
at any other time should the variable be declared an unsigned int?

it's not a question of style, right?

It's partly a question of style.

My own view is that size_t should never have been introduced. It causes far more problems than it
solves.
The original idea was that it would hold the size in bytes of an object in memory. Typically,
machines have an address space of 4GB. So if you want an object of over 2GB in size, you can't pass
an int to malloc(), as was the interface in old C.
But unfortunately the ANSI committee also used size_t for counts of objects in memory. If you have
a string of over 2GB, an int won't hold the length. qsort also takes two size_ts.
But if your count of objects in memory is a size_t, then your index variable which goes from 0 to
N-1 must also be a size_t. That's where the problems start.
Firstly, if sizes in bytes, counts of objects, index variables, and intermediate variables used in calculating
indices are all size_t, then that's practically all the integers in a typical C program. So plain int fades
away, it's no longer useful. Except that it's intuitive to say "int i" when you want an integer, not
size_t i, when i doesn't hold a size. So in fact code that uses size_t is littered with conversions from
int to size_t. The other problem is that size_t is unsigned. So you have to be careful with code like

for(i=0;i<N-1;i++)

If N is 0 and we use ints, N-1 is -1 and the loop body won't execute, which is probably the intention. If we use size_t, N-1 wraps around to a huge value, and we'll get either a crash or a very long delay, depending on whether i indexes into memory or not.
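
The loop above can be sketched both ways. The two functions below are hypothetical and only count iterations; the size_t version avoids the wraparound by writing the condition as i + 1 < n, which is one common workaround and not something proposed in this thread.

```c
#include <stddef.h>

/* The loop with int: for n == 0 the bound n - 1 is -1, so the body
 * never runs. */
int iterations_int(int n)
{
    int count = 0;

    for (int i = 0; i < n - 1; i++)
        count++;
    return count;
}

/* The loop with size_t. Writing i < n - 1 here would wrap to
 * SIZE_MAX when n == 0; moving the 1 to the other side of the
 * comparison avoids the subtraction altogether. */
size_t iterations_size(size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i + 1 < n; i++)
        count++;
    return count;
}
```

Both versions run zero times for n equal to 0 or 1 and n - 1 times otherwise.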

My own view is, don't use size_t at all. Just pass ints to the standard library functions and pretend it
was never invented. You're much more likely to get a size_t bug than to have to deal with N > 2G.
But of course I'm advocating writing code which, strictly, is incorrect. So it's hardly the ideal answer.
There isn't an ideal answer. The committee has imposed on us something that makes sense maybe
in the small embedded world, and certainly makes sense in a non-human way of thinking, but is
just a danger to actual programmers writing scalable algorithms.

Kaz Kylheku · Apr 10, 2014

It's partly a question of style.

My own view is that size_t should never have been introduced. It causes far
more problems than it solves.

Here ye, here ye.

The original idea was that it would hold the size in bytes of an object in
memory. Typically, machines have an address space of 4GB. So if you want an
object of over 2GB in size, you can't pass
an int to malloc(), as was the interface in old C.

There is also a concern for small systems, such as 8086 based systems,
at least when targeted using certain memory models.

You need 16 bits to be able to represent the size of an object up to an almost
full 64K "paragraph". A signed 16 bit type only goes to 32767.

One fix would be to use long as the argument of malloc and return value of
sizeof, strlen and so on. But that leads to awful inefficiencies on a 16 bit
processor.

glen herrmannsfeldt · Apr 10, 2014

(snip)

My own view is that size_t should never have been introduced.
It causes far more problems than it solves.

The original idea was that it would hold the size in bytes of
an object in memory. Typically, machines have an address
space of 4GB. So if you want an object of over 2GB in size,
you can't pass an int to malloc(), as was the interface in old C.

But unfortunately the ANSI committee also used size_t for counts
of objects in memory. If you have a string of over 2GB, an int
won't hold the length. qsort also takes two size_ts.

Since qsort doesn't know the size of things you might want to sort,
it sort of has to do that.

But if your count of objects in memory is a size_t, then your
index variable which goes from 0 to N-1 must also be a size_t.
That's where the problems start.

Seems to me that in a large fraction of the cases, int is fine.
The fact that malloc() takes a size_t isn't a problem, as it will
be converted.
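
One caveat about that conversion, as a sketch: a negative int converted to size_t becomes a very large request. The helper alloc_shorts below is a hypothetical wrapper, not part of any library, that checks the int before it reaches malloc().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates room for n shorts, where n is a
 * plain int. A negative n passed straight to malloc() would be
 * converted to a huge size_t, so it is rejected first. */
short *alloc_shorts(int n)
{
    if (n < 0)
        return NULL;
    if ((size_t)n > SIZE_MAX / sizeof(short))
        return NULL;   /* n * sizeof(short) would wrap around */
    return malloc((size_t)n * sizeof(short));
}
```

The caller still has to check the result for NULL and free it, as with malloc() itself.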

Firstly, if sizes in bytes, counts of objects, index variables,
and intermediate variables used in calculating indices are all
size_t, then that's practically all the integers in a typical C
program.

The standard has to allow for all possible programs, even if 99.9%
of them int is fine. If you are declaring an array of pointers in
place of a 2D matrix, you can be pretty sure that int will be enough.
(Are there any where INT_MAX*INT_MAX is too big for size_t?)

So plain int fades away, it's no longer useful. Except that
it's intuitive to say "int i" when you want an integer, not
size_t i, when i doesn't hold a size. So in fact code that
uses size_t is littered with conversions from int to size_t.

Well, int is supposed to be the convenient size for the processor.

Some years ago (about 10) when I had over 2GB of swap space on
a Win2K machine, I had some programs that wouldn't run claiming not
enough memory. They did the calculation in signed int (my guess),
found out that available memory was less than it needed, and quit.

I now have a 3TB disk that I can NFS mount on different systems,
even ones that don't have native file systems that large.

The other problem is that size_t is unsigned.
(snip)

My own view is, don't use size_t at all. Just pass ints to
the standard library functions and pretend it was never invented.

When you know that int will always be big enough, that seems right
to me.

-- glen

James Kuyper · Apr 10, 2014

Most of the integers in my programs contain either unsigned 12-bit
photon counts (the relevant photons could be considered objects in some
sense, but they are not C objects), or signed 16-bit scaled integers,
neither of which fits into any of the categories you've listed. These
often occur in fairly large (multi-million element) arrays, so using a
32-bit int to store them would be pretty wasteful.

....

When you know that int will always be big enough, that seems right
to me.

Having frequently programmed on systems where int had 16 bits, I learned
pretty quickly not to make such assumptions.

G G · Apr 10, 2014

Malcolm,

your post has made me curious about the name size_t. i won't ask why that's the name, but does it have a kind of meaning, like int, integer, char, character...

so, kinda ... sort of ... like ..., size_t, size of object, size_t is like the word "size" and the "t" in object or is it size of int, where "size" and the last letter in int, the "t", are put together.

i know it's, this, a little off subject, please forgive me, but thanks.

Malcolm McLean · Apr 10, 2014

Most of the integers in my programs contain either unsigned 12-bit
photon counts (the relevant photons could be considered objects in some
sense, but they are not C objects), or signed 16-bit scaled integers,
neither of which fits into any of the categories you've listed. These
often occur in fairly large (multi-million element) arrays, so using a
32-bit int to store them would be pretty wasteful.

Fixed-point numbers are integers to C, but that's just a reflection of the
fact that C doesn't have any native syntactical sugar for fixed-point
arithmetic.
The other question is whether

int i;
int N = 100000;
short *photons = malloc(N * sizeof(short));

for(i=0;i<N;i++)
  photons[i] = detectphoton();

declares 100002 integers, two of which are either counts of objects in memory
or array indices and 100000 of which are data, or three integers, two of which
are counts or indices and one of which is data.

Data tends to be real valued. Not 100% of the time, of course, and maybe
less often when you're doing quantum physics. But usually data points are
real.

James Kuyper · Apr 10, 2014

On 04/10/2014 04:42 PM, G G wrote:
....

Malcolm,

your post has made me curious about the name size_t. i won't ask why that's the name, but does it have a kind of meaning, like int, integer, char, character...

so, kinda ... sort of ... like ..., size_t, size of object, size_t is like the word "size" and the "t" in object or is it size of int, where "size" and the last letter in int, the "t", are put together.

The formal definition of size_t is

"the unsigned integer type of the result of the sizeof operator;" (7.19p2).

That explains the "size" part of the name. Using "_t" for type names is
a common convention. POSIX even reserves such identifiers for use as
POSIX types. That C uses the same convention reflects the fact that C and
Unix were both first developed in roughly the same place at roughly the
same time.
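
The definition quoted above can be checked directly on a C11 compiler. This is a small sketch with made-up function names; _Generic requires C11 and the %zu conversion requires C99.

```c
#include <stddef.h>
#include <stdio.h>

/* Yields 1 if the type of a sizeof expression is size_t, which the
 * definition of size_t guarantees. */
int sizeof_yields_size_t(void)
{
    return _Generic(sizeof(int), size_t: 1, default: 0);
}

/* Prints a size using %zu, the printf conversion for size_t. */
void print_size(size_t n)
{
    printf("%zu\n", n);
}
```

For example, print_size(sizeof(double)) prints the size of a double in bytes on the implementation at hand.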

Malcolm McLean · Apr 10, 2014

your post has made me curious about the name size_t. i won't ask why
that's the name, but does it have a kind of meaning, like int, integer,
char, character...

so, kinda ... sort of ... like ..., size_t, size of object, size_t is
like the word "size" and the "t" in object or is it size of int, where
"size" and the last letter in int, the "t", are put together.

i know it's, this, a little off subject, please forgive me, but thanks.

The t stands for "type".

The name is a big part of the problem with size_t. The underscore looks
ugly and clashes with the convention that underscores represent either
namespace prefixes or subscripts. size strongly implies that the variable
holds a size in bytes, which was the original intention. Also, there's
no "size" type in most other programming languages.

Tim Prince · Apr 11, 2014

Here ye, here ye.

Hear... ?
Typical? Windows 64-bit came in at least 14 years ago, with int and
long too small to contain a pointer, which a certain customer I was
assigned to work with turned out to demand as a condition for continued
engagement. Of course, it was C++, only incidentally requiring
acceptance of some features shared with C. I don't think they cared
about any distinction between signed and unsigned storage requirement.
I don't see your solution which would have saved that job.

Remains to be seen on my next engagement what pitfalls the customer
needs to be extricated from in their transition from Fortran to C and C++.

There is also a concern for small systems, such as 8086 based systems,
at least when targeted using certain memory models.

You need 16 bits to be able to represent the size of an object up to an almost
full 64K "paragraph". A signed 16 bit type only goes to 32767.

One fix would be to use long as the argument of malloc and return value of
sizeof, strlen and so on. But that leads to awful inefficiencies on a 16 bit
processor.

The 8/16 bit cpus I worked with back in the day probably would have
worked with the typedef quoted above, not that I understand why anyone
would do that. I guess I'm not particularly interested in why C89 won't
work with some current CPU.

G G · Apr 11, 2014

On 04/10/2014 04:42 PM, G G wrote:

"the unsigned integer type of the result of the sizeof operator;" (7.19p2).

That explains the "size" part of the name. Using "_t" for type names is
a common convention. POSIX even reserves such identifiers for use as
POSIX types. That C uses the same convention reflects the fact that C and
Unix were both first developed in roughly the same place at roughly the
same time.

thanks James,

g.

Keith Thompson · Apr 11, 2014

Malcolm McLean said:
Fixed-point numbers are integers to C, but that's just a reflection of the
fact that C doesn't have any native syntactical sugar for fixed-point
arithmetic.
The other question is whether

int i;
int N = 100000;
short *photons = malloc(N * sizeof(short));

for(i=0;i<N;i++)
  photons[i] = detectphoton();

declares 100002 integers, two of which are either counts of objects in memory
or array indices and 100000 of which are data, or three integers, two of which
are counts or indices and one of which is data.

It declares (and also defines) two integer objects. Via the malloc
call, if it succeeds, it also creates another 100000 integer objects. I
don't know where the "three integers" come from (unless you're
suggesting that a short* is an integer, which it isn't).

[...]

Keith Thompson · Apr 11, 2014

Malcolm McLean said:
your post has made me curious about the name size_t. i won't ask why
that's the name, but does it have a kind of meaning, like int, integer,
char, character... [...]

The t stands for "type".

The name is a big part of the problem with size_t.

You are, as far as I can tell, alone in that opinion.

The underscore looks
ugly and clashes with the convention that underscores represent either
namespace prefixes or subscripts. size strongly implies that the variable
holds a size in bytes, which was the original intention. Also, there's
no "size" type in most other programming languages.

A size_t can hold the size in bytes of any object [*]. That implies
that, for example, it can also hold the number of elements in any array
object, regardless of the element size.

[*] Well, almost certainly; it's not 100% clear that objects bigger than
SIZE_MAX bytes are forbidden, but most sane implementations would not
support them.
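
A short sketch of that implication: dividing one sizeof result by another gives the element count of an array, again as a size_t. ARRAY_LEN and sum_shorts are illustrative names, not standard ones.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). The result
 * has type size_t because both operands of the division do. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Sums n shorts; the element count and the index are both size_t. */
long sum_shorts(const short *data, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

A call such as sum_shorts(v, ARRAY_LEN(v)) works for any array v of short, whatever its length.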

Kenny McCormack · Apr 11, 2014

Malcolm McLean said:
Malcolm McLean said:

your post has made me curious about the name size_t. i won't ask why
that's the name, but does it have a kind of meaning, like int, integer,
char, character... [...]

The t stands for "type".

The name is a big part of the problem with size_t.

You are, as far as I can tell, alone in that opinion.

Kiki is wrong, as usual.

See also:
http://flamewarriorsguide.com/warriorshtm/android.htm

--
One of the best lines I've heard lately:

Obama could cure cancer tomorrow, and the Republicans would be
complaining that he had ruined the pharmaceutical business.

(Heard on Stephanie Miller = but the sad thing is that there is an awful lot
of direct truth in it. We've constructed an economy in which eliminating
cancer would be a horrible disaster. There are many other such examples.)

Stefan Ram · Apr 11, 2014

Malcolm McLean said:
int N = 100000;

N1570 5.2.4.2.1 Sizes of integer types <limits.h>

....

Their implementation-defined values shall be equal or greater
in magnitude (absolute value) to those shown, with the same sign.

INT_MAX +32767
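
The point of the citation, sketched in code: int is only guaranteed to reach 32767, so a count of 100000 is safer in a long, which is guaranteed to reach at least 2147483647. The function name below is made up for the illustration.

```c
#include <limits.h>

/* Returns 1 if n can be stored in a plain int on this
 * implementation, 0 otherwise. */
int count_fits_in_int(long n)
{
    return n >= INT_MIN && n <= INT_MAX;
}
```

count_fits_in_int(32767) is 1 on every conforming implementation, while count_fits_in_int(100000L) is 1 only where int is wider than 16 bits.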

Ian Collins · Apr 11, 2014

Malcolm said:
The t stands for "type".

The name is a big part of the problem with size_t. The underscore looks
ugly and clashes with the convention that underscores represent either
namespace prefixes or subscripts.

What convention?

To anyone from a Unix background, _t as type suffix is the convention...

It is also used for all the stdint types.

Malcolm McLean · Apr 11, 2014

N1570 5.2.4.2.1 Sizes of integer types <limits.h>

Their implementation-defined values shall be equal or greater
in magnitude (absolute value) to those shown, with the same sign.

INT_MAX +32767

My view is that

* size_t should be used consistently throughout a program for all index
variables and all counts of objects in memory.
* It should be a fast integer type (normally easy to achieve).
* It should be signed. If that means demanding special code for objects
of over half the address space, it's a price worth paying.
* It should have a better name.

So call size_t "int" and make int 64 bits on a 64 bit machine,
