size_t, when to use it? (learning)

Discussion in 'C Programming' started by G G, Apr 10, 2014.

  1. G G

    G G Guest

    typedef unsigned int size_t

    ..............

    size_t

    when to declare something size_t? when the variable is associated with memory usage?

    at any other time should the variable be declared an unsigned int?

    it's not a question of style, right?
     
    G G, Apr 10, 2014
    #1

  2. G G

    James Kuyper Guest

    There are a few general rules that apply:
    * If you're using a function that uses size_t in its interface, any
    corresponding variables in your code should have the type size_t, unless
    you have some better reason for giving them some other type.

    * If you're using a function that takes, as an argument, a pointer to
    size_t, the object pointed at by that pointer MUST have the type size_t.

    * Because malloc() and realloc() take size_t arguments, size_t is
    guaranteed to be large enough to index any array you can build in memory
    allocated using those functions. Because the result of the sizeof operator has
    type size_t, it's also nearly certain (though technically not required)
    that size_t will be big enough to index any array that is allocated by
    any other means. There's no smaller type for which any comparable
    guarantees apply, so you should choose size_t whenever indexing an
    array, if you have no other information about the size of that array
    which would allow you to use some other, smaller type. Even if you do
    have such information, I wouldn't recommend using the smaller type
    unless you also know that it's faster.

    Other than that, you should keep in mind that size_t is an unsigned
    type. Expressions involving values of both unsigned and signed types
    often end up being evaluated by converting the signed value to an
    unsigned type, and producing a result with an unsigned type. The
    conversion necessarily changes the value if it was negative before the
    conversion. This can be quite annoying if you didn't anticipate that
    possibility. This is called "unsigned poisoning", and because of it, you
    should in general use unsigned types only when you have a good reason to
    do so.
     
    James Kuyper, Apr 10, 2014
    #2

  3. G G

    G G Guest

    On Thursday, April 10, 2014 11:42:33 AM UTC-4, James Kuyper,

    thanks,

    g.
     
    G G, Apr 10, 2014
    #3
  4. G G

    Kaz Kylheku Guest

    Unsigned types are best used in certain kinds of calculations involving binary
    numbers which must be machine-independent, and at the same time produce values
    that are defined exactly at the bit level. Also, unsigned types are suitable
    for representing bit fields and masks: they have no troublesome sign bit which
    causes nonportable behaviors.

    Used as arithmetic types, the unsigned types are inherently dangerous because
    of their cliff behavior around zero: where a signed calculation would produce a
    negative value, the unsigned type produces some large value.

    I would say, avoid using size_t in user-defined code, even for representing the
    sizes of objects.

    It's okay for capturing the result of a standard library function which returns
    size_t, as long as no complicated arithmetic is being done with it.

    A good rule of thumb is that when you start subtracting sizes, you probably
    want to switch to signed integers.

    Signed types like "long" and "long long" are usually good enough to represent
    the sizes of ordinary objects in a C program. If size_t is "unsigned int", and
    unsigned int is 32 bits wide, then you need a two gigabyte array before its
    size doesn't fit into int, and requires size_t.

    If size_t is "unsigned int" and only 16 bits wide, then it can represent
    object sizes in the range 32768 to 65535 which "int" cannot; but in that
    case, the "long" type can cover the range.
     
    Kaz Kylheku, Apr 10, 2014
    #4
  5. It's partly a question of style.

    My own view is that size_t should never have been introduced. It causes far more problems than it
    solves.
    The original idea was that it would hold the size in bytes of an object in memory. Typically,
    machines have an address space of 4GB. So if you want an object of over 2GB in size, you can't pass
    an int to malloc(), as was the interface in old C.
    But unfortunately the ANSI committee also used size_t for counts of objects in memory. If you have
    a string of over 2GB, an int won't hold the length. qsort also takes two size_t arguments.
    But if your count of objects in memory is a size_t, then your index variable which goes from 0 to
    N-1 must also be a size_t. That's where the problems start.
    Firstly, if sizes in bytes, counts of objects, index variables, and intermediate variables used in calculating
    indices are all size_t, then that's practically all the integers in a typical C program. So plain int fades
    away, it's no longer useful. Except that it's intuitive to say "int i" when you want an integer, not
    size_t i, when i doesn't hold a size. So in fact code that uses size_t is littered with conversions from
    int to size_t. The other problem is that size_t is unsigned. So you have to be careful with code like

    for(i=0;i<N-1;i++)

    If N is 0 and we use ints, the loop body won't execute, which is probably the intention. If we use size_t, N-1 wraps around to a huge value, and we'll get either a crash or a very long delay, depending on whether i indexes into memory or not.

    My own view is, don't use size_t at all. Just pass ints to the standard library functions and pretend it
    was never invented. You're much more likely to get a size_t bug than to have to deal with N > 2G.
    But of course I'm advocating writing code which, strictly, is incorrect. So it's hardly the ideal answer.
    There isn't an ideal answer. The committee has imposed on us something that makes sense maybe
    in the small embedded world, and certainly makes sense in a non-human way of thinking, but is
    just a danger to actual programmers writing scalable algorithms.
     
    Malcolm McLean, Apr 10, 2014
    #5
  6. G G

    Kaz Kylheku Guest

    Here ye, here ye.
    There is also a concern for small systems, such as 8086 based systems,
    at least when targeted using certain memory models.

    You need 16 bits to be able to represent the size of an object up to an almost
    full 64K segment. A signed 16 bit type only goes to 32767.

    One fix would be to use long as the argument of malloc and return value of
    sizeof, strlen and so on. But that leads to awful inefficiencies on a 16 bit
    processor.
     
    Kaz Kylheku, Apr 10, 2014
    #6
  7. (snip)
    Since qsort doesn't know the size of things you might want to sort,
    it sort of has to do that.
    Seems to me that in a large fraction of the cases, int is fine.
    The fact that malloc() takes a size_t isn't a problem, as it will
    be converted.
    The standard has to allow for all possible programs, even if for 99.9%
    of them int is fine. If you are declaring an array of pointers in
    place of a 2D matrix, you can be pretty sure that int will be enough.
    (Are there any where INT_MAX*INT_MAX is too big for size_t?)
    Well, int is supposed to be the convenient size for the processor.

    Some years ago (about 10) when I had over 2GB of swap space on
    a Win2K machine, I had some programs that wouldn't run claiming not
    enough memory. They did the calculation in signed int (my guess),
    found out that available memory was less than it needed, and quit.

    I now have a 3TB disk that I can NFS mount on different systems,
    even ones that don't have native file systems that large.
    When you know that int will always be big enough, that seems right
    to me.

    -- glen
     
    glen herrmannsfeldt, Apr 10, 2014
    #7
  8. G G

    James Kuyper Guest

    Most of the integers in my programs contain either unsigned 12-bit
    photon counts (the relevant photons could be considered objects in some
    sense, but they are not C objects), or signed 16-bit scaled integers,
    neither of which fits into any of the categories you've listed. These
    often occur in fairly large (multi-million element) arrays, so using a
    32-bit int to store them would be pretty wasteful.

    ....
    Having frequently programmed on systems where int had 16 bits, I learned
    pretty quickly not to make such assumptions.
     
    James Kuyper, Apr 10, 2014
    #8
  9. G G

    G G Guest

    Malcolm,

    your post has made me curious about the name size_t. i won't ask why that's the name, but does it have a kind of meaning, like int, integer, char, character...

    so, kinda ... sort of ... like ..., size_t, size of object, size_t is like the word "size" and the "t" in object or is it size of int, where "size" and the last letter in int, the "t", are put together.

    i know it's, this, a little off subject, please forgive me, but thanks.
     
    G G, Apr 10, 2014
    #9
  10. Fixed-point numbers are integers to C, but that's just a reflection of the
    fact that C doesn't have any native syntactical sugar for fixed-point
    arithmetic.
    The other question is whether

    int i;
    int N = 100000;
    short *photons = malloc(N * sizeof(short));

    for(i=0;i<N;i++)
        photons[i] = detectphoton();

    declares 100002 integers, two of which are either counts of objects in memory
    or array indices and 100000 of which are data, or three integers, two of which
    are counts or indices and one of which is data.

    Data tends to be real valued. Not 100% of the time, of course, and maybe
    less often when you're doing quantum physics. But usually data points are
    real.
     
    Malcolm McLean, Apr 10, 2014
    #10
  11. G G

    James Kuyper Guest

    On 04/10/2014 04:42 PM, G G wrote:
    ....
    The formal definition of size_t is

    "the unsigned integer type of the result of the sizeof operator;" (7.19p2).

    That explains the "size" part of the name. Using "_t" for type names is
    a common convention. POSIX even reserves such identifiers for use as
    POSIX types. That C uses the same convention reflects the fact that C and
    Unix were both first developed in roughly the same place at roughly the
    same time.
     
    James Kuyper, Apr 10, 2014
    #11
  12. The t stands for "type".

    The name is a big part of the problem with size_t. The underscore looks
    ugly and clashes with the convention that underscores represent either
    namespace prefixes or subscripts. size strongly implies that the variable
    holds a size in bytes, which was the original intention. Also, there's
    no "size" type in most other programming languages.
     
    Malcolm McLean, Apr 10, 2014
    #12
  13. G G

    Tim Prince Guest

    Hear... ?
    Typical? Windows 64-bit came in at least 14 years ago, with int and
    long too small to contain a pointer, which a certain customer I was
    assigned to work with turned out to demand as a condition for continued
    engagement. Of course, it was C++, only incidentally requiring
    acceptance of some features shared with C. I don't think they cared
    about any distinction between signed and unsigned storage requirement.
    I don't see your solution which would have saved that job.

    Remains to be seen on my next engagement what pitfalls the customer
    needs to be extricated from in their transition from Fortran to C and C++.
    The 8/16 bit cpus I worked with back in the day probably would have
    worked with the typedef quoted above, not that I understand why anyone
    would do that. I guess I'm not particularly interested in why C89 won't
    work with some current CPU.
     
    Tim Prince, Apr 11, 2014
    #13
  14. G G

    G G Guest

    thanks James,

    g.
     
    G G, Apr 11, 2014
    #14


  15. It declares (and also defines) two integer objects. Via the malloc
    call, if it succeeds, it also creates another 100000 integer objects. I
    don't know where the "three integers" come from (unless you're
    suggesting that a short* is an integer, which it isn't).

    [...]
     
    Keith Thompson, Apr 11, 2014
    #15
  16. You are, as far as I can tell, alone in that opinion.
    A size_t can hold the size in bytes of any object [*]. That implies
    that, for example, it can also hold the number of elements in any array
    object, regardless of the element size.

    [*] Well, almost certainly; it's not 100% clear that objects bigger than
    SIZE_MAX bytes are forbidden, but most sane implementations would not
    support them.
     
    Keith Thompson, Apr 11, 2014
    #16
  17. Kiki is wrong, as usual.

    See also:
    http://flamewarriorsguide.com/warriorshtm/android.htm

    --
    One of the best lines I've heard lately:

    Obama could cure cancer tomorrow, and the Republicans would be
    complaining that he had ruined the pharmaceutical business.

    (Heard on Stephanie Miller = but the sad thing is that there is an awful lot
    of direct truth in it. We've constructed an economy in which eliminating
    cancer would be a horrible disaster. There are many other such examples.)
     
    Kenny McCormack, Apr 11, 2014
    #17
  18. G G

    Stefan Ram Guest

    N1570 5.2.4.2.1 Sizes of integer types <limits.h>

    ....

    Their implementation-defined values shall be equal or greater
    in magnitude (absolute value) to those shown, with the same sign.

    INT_MAX +32767
     
    Stefan Ram, Apr 11, 2014
    #18
  19. G G

    Ian Collins Guest

    What convention?

    To anyone from a Unix background, _t as type suffix is the convention...

    It is also used for all the stdint types.
     
    Ian Collins, Apr 11, 2014
    #19
  20. My view is that

    size_t should be used consistently throughout a program for all index
    variables and all counts of objects in memory.
    It should be a fast integer type (normally easy to achieve).
    It should be signed. If that means demanding special code for objects
    of over half the address space, it's a price worth paying.
    It should have a better name.

    So call size_t "int" and make int 64 bits on a 64 bit machine.
     
    Malcolm McLean, Apr 11, 2014
    #20
