Mixing size_t and other types

Ian Collins · Aug 1, 2010

95% of the time though, one knows when they are likely to be dealing with a
large file, and for the rest of the time, they can simply regard files
larger than a certain "reasonable" size to be invalid...

this is like, for example, assuming that the size of a JPEG or COFF object
is < 16MB.
yes, there *could* be a 16MB JPEG or COFF object passed in, but for sake of
convenience one can assume that it doesn't exist, or check the size and
refuse to accept it if it exceeds a certain limit.

Wasn't checking the size the original problem?

it is still a fairly safe bet at the moment that most files are less than
2GB or 4GB, and so it is a reasonable enough working assumption for most
things.

That all depends on the context. If you are dealing with media files or
iso images, it is a poor assumption.

Nobody · Aug 1, 2010

in the cases where the long/size_t issue will actually matter, malloc and
realloc will likely fail as well...

... if you do it right, i.e. check that the value fits into a size_t
before casting. Otherwise, the value passed to malloc() will end up
containing the size of the file modulo 2-to-the-something, and you'll
either get a buffer overrun (if you use the actual size of the file) or
truncated data (if you use the size of the buffer).
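
A minimal sketch of the check described above, assuming C99's SIZE_MAX from <stdint.h> is available. The helper name long_to_size is hypothetical, not something from the thread; it rejects negative values (such as ftell's -1L) and values too large for size_t before any cast happens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX (C99) */

/*
 * Hypothetical helper: store `value` in *out only if it is non-negative
 * and representable as a size_t. Returns 1 on success, 0 otherwise.
 */
static int long_to_size(long value, size_t *out)
{
    assert(out != NULL);

    if (value < 0)
        return 0;                       /* e.g. ftell() reported -1L */
    if ((unsigned long)value > SIZE_MAX)
        return 0;                       /* would wrap modulo SIZE_MAX + 1 */

    *out = (size_t)value;
    return 1;
}
```

A caller would pass the ftell() result to long_to_size() and call malloc() only when it returns 1, so neither the overrun nor the truncation case can arise from a wrapped size.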

Barry Schwarz · Aug 2, 2010


Strangely enough, size_t is commonly used for sizes, so just make size a
size_t and save the aggro.

size_t will be an unsigned type.

Thanks. But this variable holds a value that comes from the standard
function "ftell". It'd be something like,

fseek (fp , 0 , SEEK_END);
size = ftell (fp); /* ftell gives me long int. */
rewind (fp);

So I think I won't be able to make it a size_t. Not sure how to go
about this. Any help?

You mustn't assign the return value of 'ftell' to a variable of type
'size_t' because it may return -1L on error. In that case you'd end up
with a nonsensical file position value, [snip]

Right in the sense that the type 'size_t' can be too narrow,
but 'unsigned long' should work just fine, since 'ftell()'
returns a plain 'long'. The error indicator value of -1
can be tested for in an 'unsigned long size' using a regular
equality comparison (ie, == or !=).

How does one distinguish between the error return of -1L (which is
converted to ULONG_MAX) and the actual return of ULONG_MAX?

Barry Schwarz · Aug 2, 2010

malloc expects a size_t argument so there is no reason to cast it.

The reason for the cast is to suppress the (annoying?) warning that
started this thread.

size_t will be able to hold any required value that ftell can return.

Not necessarily. size_t is allowed to be 16 bits. ftell will return
a long which is at least 32 bits and can easily exceed 65,535.

Nobody · Aug 2, 2010

You mustn't assign the return value of 'ftell' to a variable of type
'size_t' because it may return -1L on error. In that case you'd end up
with a nonsensical file position value, [snip]

Right in the sense that the type 'size_t' can be too narrow,
but 'unsigned long' should work just fine, since 'ftell()'
returns a plain 'long'. The error indicator value of -1
can be tested for in an 'unsigned long size' using a regular
equality comparison (ie, == or !=).

How does one distinguish between the error return of -1L (which is
converted to ULONG_MAX) and the actual return of ULONG_MAX?

ftell() returns a signed long, so it can't return ULONG_MAX (or any other
value greater than LONG_MAX).
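
A small sketch of why the equality test is unambiguous, assuming (as on mainstream implementations) that ULONG_MAX is strictly greater than LONG_MAX. The names FTELL_FAILED, ftell_failed and is_valid_position are hypothetical and used only for illustration.

```c
#include <limits.h>

/* Hypothetical name: what an 'unsigned long' holds after ftell() failed.
   Converting -1L to unsigned long yields ULONG_MAX. */
#define FTELL_FAILED ((unsigned long)-1L)

/* Returns 1 if `size`, converted from ftell()'s result, signals an error. */
static int ftell_failed(unsigned long size)
{
    return size == FTELL_FAILED;
}

/* Returns 1 if `size` could have come from a successful ftell() call,
   which can only produce values in the range 0 .. LONG_MAX. */
static int is_valid_position(unsigned long size)
{
    return size <= (unsigned long)LONG_MAX;
}
```

Since a successful call never yields more than LONG_MAX, the converted error value cannot collide with a real file position under the stated assumption.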

ImpalerCore · Aug 2, 2010

Thanks. But this variable holds a value that comes from the standard
function "ftell". It'd be something like,

fseek (fp , 0 , SEEK_END);
size = ftell (fp); /* ftell gives me long int. */
rewind (fp);

So I think I won't be able to make it a size_t. Not sure how to go
about this. Any help?

Well, you're in the land of platform specific behavior. In my
opinion, the best way to handle this is to make an assumption about
the conversion between size_t and long int and stick to it. If your
assumption doesn't hold, you fail the compile so that if a person
compiles your code on a different platform with different assumptions,
the compiler flags the person down before getting on the bus to the
city of Undefined Behavior.

If you decide to go down this path, this is where I'd consider using a
static assertion mechanism. The C construct is found in the archives,
and C++ has a static assertion as well. The one that I use is the
following (not sure who the first one was in the archive to come up
with it though):

\code
#define STATIC_ASSERT(name, expr) extern char (name)[(expr) ? 1 : -1]
\endcode

If 'expr' evaluates to false, you attempt to declare an array with a
negative size, causing a compiler error. In the text of the
compiler error message, you find the 'name', which gives a hint at the
failure point. You can use this in this kind of context.

\code
#include <stdlib.h>
#include <limits.h>

#define STATIC_ASSERT(name, expr) extern char (name)[(expr) ? 1 : -1]

STATIC_ASSERT( CHAR_BIT_is_8_bits, CHAR_BIT == 8 );
STATIC_ASSERT( sizeof_int_at_least_32_bits, sizeof(int) >= 4 );

int main(void)
{
  /* ... your stuff here ... */

  return EXIT_SUCCESS;
}
\endcode

If I require CHAR_BIT to be 32 bits for code specific to a particular
DSP, I can use the following

\code
#include <stdlib.h>
#include <limits.h>

STATIC_ASSERT( CHAR_BIT_is_32_bits, CHAR_BIT == 32 );

int main(void)
{
return EXIT_SUCCESS;
}
\endcode

And when I compile this on my PC, I get an error message that reads

\error
STATIC_ASSERT_example.c:4: error: size of array 'CHAR_BIT_is_32_bits'
is negative
\enderror

So with a static assertion, you can define the sandbox where your code
will do what you intend, and anyone outside the sandbox will be forced
to deal with the compiler error. I'm not completely positive what
'expr' to use in the static assert, but my first guess would be:

STATIC_ASSERT( size_t_compatible_with_ftell, sizeof( size_t ) >=
sizeof( long int ) );

Just an idea to throw around.

Best regards,
John D.
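
One possible refinement of the static assertion above is to compare value ranges instead of storage sizes, since sizeof only says how many bytes a type occupies. This is a sketch that assumes C99's SIZE_MAX from <stdint.h>; the assertion name size_t_can_hold_any_long is made up for the example.

```c
#include <limits.h>   /* LONG_MAX */
#include <stdint.h>   /* SIZE_MAX (C99) */

#define STATIC_ASSERT(name, expr) extern char (name)[(expr) ? 1 : -1]

/* Compares value ranges, not storage sizes: every non-negative long
   (and so every successful ftell() result) must fit in a size_t. */
STATIC_ASSERT( size_t_can_hold_any_long, SIZE_MAX >= LONG_MAX );
```

On a platform with a 16-bit size_t and a 32-bit long, the expression is false and the build stops with a negative array size error. Compilers that support C11 also offer the built-in _Static_assert(expr, "message") for the same purpose.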

ImpalerCore · Aug 2, 2010

Thanks. But this variable holds a value that comes from the standard
function "ftell". It'd be something like,

fseek (fp , 0 , SEEK_END);
size = ftell (fp); /* ftell gives me long int. */
rewind (fp);

So I think I won't be able to make it a size_t. Not sure how to go
about this. Any help?

I'll also add that static asserts alone may not be sufficient to
verify that converting the 'long int' return value of ftell to
'size_t' is valid. Some runtime checking will need to be added to
handle the -1 error return from ftell (for example, if you don't have
read permissions on a file, you may not be able to seek it to
determine the file size).

So static assertions are not the end all to solving this problem, but
they can help enforce the assumptions that you make in your code.
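
The run-time half could look roughly like the sketch below, which checks every fseek()/ftell() call instead of assuming success. The helper name stream_size is hypothetical, and the code assumes a seekable stream opened in binary mode; the C standard does not promise that SEEK_END is meaningful on every binary stream.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: report the size of an open binary stream through
 * *size. On success it returns 1 and puts the stream back at its
 * original position; it returns 0 if any fseek()/ftell() call fails
 * (e.g. the stream is not seekable).
 */
static int stream_size(FILE *fp, long *size)
{
    long start, end;

    assert(fp != NULL && size != NULL);

    start = ftell(fp);
    if (start < 0)
        return 0;
    if (fseek(fp, 0L, SEEK_END) != 0)
        return 0;
    end = ftell(fp);
    if (end < 0)
        return 0;
    if (fseek(fp, start, SEEK_SET) != 0)
        return 0;

    *size = end;
    return 1;
}
```

The result stays a long here; converting it to size_t for malloc() is a separate step that still needs its own range check.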

ImpalerCore · Aug 2, 2010

It looks like precalculating the filesize is not a good idea. A better
approach would be to read each character and allocate necessary memory
using "realloc". So I don't have all these size_t and other type
mixing issues.
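
For reference, the realloc approach from the quote above might be sketched as follows. Reading in chunks with fread() and doubling the buffer is a substitution for the character-at-a-time loop the quote describes, chosen to keep the number of realloc() calls small. The helper name read_all and the 4096-byte starting size are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read everything left in `fp` into a malloc'd
 * buffer that grows by doubling. On success returns the buffer (the
 * caller frees it) and stores the byte count in *len; returns NULL on
 * read error, allocation failure, or size overflow.
 */
static char *read_all(FILE *fp, size_t *len)
{
    size_t cap = 4096, used = 0, got;
    char *buf, *tmp;

    assert(fp != NULL && len != NULL);

    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((got = fread(buf + used, 1, cap - used, fp)) > 0) {
        used += got;
        if (used == cap) {
            if (cap > (size_t)-1 / 2) {     /* doubling would overflow */
                free(buf);
                return NULL;
            }
            cap *= 2;
            tmp = realloc(buf, cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }

    *len = used;
    return buf;
}
```

All sizes here are size_t from start to finish, so no long is ever converted, but the whole file still has to fit in memory.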

This depends on whether it's required to store the entire file in
memory. It can be okay for configuration files and some data files,
but in general, it's not something that you can rely on. In my
scenario, I have 2+ GB files that I have to process on a 32-bit system
that maxes process memory at 2GB. I am not able to load the entire
flight in memory at once, and I cannot force people to upgrade their
hardware to allow process memory footprints of > 2GB.

If your file consists of records that can be processed one at a time,
you can have a source->sink model where you read the file into a
buffer, parse the buffer into records, put the records in a container
(queue or list), perform your calculations/filtering, save any output
records to file, remove them from the container, and start reading the
buffer again to get more records. Rinse and repeat until the file is
complete.

I don't know the limits of the file sizes that you work with, but
there are real life examples when loading the entire file into memory
will not work. It can be very convenient when it does though.

Best regards,
John D.
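
A stripped-down sketch of that read/parse/process loop, under the assumption that a record is simply a newline-terminated line and that "processing" means counting records and payload bytes. The function name process_records and the 4096-byte buffer are illustrative choices, not part of the post.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical example of the read/parse/process loop: treats each
 * newline-terminated line as one record and counts the non-newline
 * bytes, using a fixed-size buffer regardless of the file size.
 * Returns the number of records, or -1 on a read error.
 */
static long process_records(FILE *fp, unsigned long *total_bytes)
{
    char buf[4096];
    size_t got, i;
    long records = 0;

    assert(fp != NULL && total_bytes != NULL);
    *total_bytes = 0;

    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (i = 0; i < got; i++) {
            if (buf[i] == '\n')
                records++;          /* one record finished */
            else
                (*total_bytes)++;   /* payload byte of the current record */
        }
    }
    return ferror(fp) ? -1 : records;
}
```

Memory use stays at one buffer no matter how large the file is, which is the property that matters for the 2+ GB case. A real parser would also have to carry a partial record over from one buffer to the next.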

BGB / cr88192 · Aug 2, 2010

Ian Collins said:
Wasn't checking the size the original problem?

this is about size range...

it is unlikely that most files are actually large enough that one will need
to care about the difference between long and size_t.

That all depends on the context. If you are dealing with media files or
iso images, it is a poor assumption.

yes, but one typically KNOWS when they are dealing with a media file or ISO
image, so it still holds...
anyway, large media files or ISO images are still likely the vast minority
when it comes to the types of files one is likely to be working with.

this is like criticising the design of a car because a cow, horse, or
elephant can't fit in the driver's seat...
one typically knows when something may be expected to hold a large object,
and so can design it as needed, and if they know it won't need to deal with a
large object, there is no problem designing it for smaller objects.

or, like, condemning that a person should use FAT32 due to its file-size
limits, and hence, inability to hold such large media files or ISO's in the
first place...

but, one may still use FAT32, even for larger drives, noting that in some
ways, it does have advantages over NTFS and similar...

or such...

BGB / cr88192 · Aug 2, 2010

Nobody said:
... if you do it right, i.e. check that the value fits into a size_t
before casting. Otherwise, the value passed to malloc() will end up
containing the size of the file modulo 2-to-the-something, and you'll
either get a buffer overrun (if you use the actual size of the file) or
truncated data (if you use the the size of the buffer).

doesn't matter as much if one can't allocate a buffer that large anyways,
which in turn doesn't matter if one is not going to be dealing with files
that large.

anyways, if one really needs large files, they can use the OS-specific IO
functions (fseeko/ftello, _fseeki64/_ftelli64, GetFileSizeEx, or similar...).
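
On POSIX systems the large-file variants are fseeko() and ftello(), which use off_t instead of long. The sketch below assumes a POSIX environment such as glibc, where defining _FILE_OFFSET_BITS as 64 before any include requests a 64-bit off_t; the helper name stream_size_large is hypothetical. Windows would need its own calls instead.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fseeko()/ftello() */
#define _FILE_OFFSET_BITS 64     /* glibc: request a 64-bit off_t */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>           /* off_t */

/* Hypothetical helper: size of an open stream as an off_t, or -1 on error.
   On success the stream is left at its original position. */
static off_t stream_size_large(FILE *fp)
{
    off_t start, end;

    assert(fp != NULL);

    start = ftello(fp);
    if (start < 0 || fseeko(fp, 0, SEEK_END) != 0)
        return -1;
    end = ftello(fp);
    if (end < 0 || fseeko(fp, start, SEEK_SET) != 0)
        return -1;
    return end;
}
```

The returned off_t still has to be range-checked before it is used as a size_t, since a file can be larger than any object the process can allocate.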

Ian Collins · Aug 2, 2010

this is about size range...

it is unlikely that most files are actually large enough that one will need
to care about the difference between long and size_t.

or, like, condemning that a person should use FAT32 due to its file-size
limits, and hence, inability to hold such large media files or ISO's in the
first place...

but, one may still use FAT32, even for larger drives, noting that in some
ways, it does have advantages over NTFS and similar...

Ah, but the file size limit on FAT32 is 4GB, so anyone using a 32 bit
system will have to care about the difference between long and size_t!

chrisbazley · Aug 3, 2010

yes, but one typically KNOWS when they are dealing with a media file or ISO
image, so it still holds...
anyway, large media files or ISO images are still likely the vast minority
when it comes to the types of files one is likely to be working with.

this is like criticising the design of a car because a cow, horse, or
elephant can't fit in the driver's seat...
one typically knows when something may be expected to hold a large object,
and so can design it as needed, and if they know it won't need to deal with a
large object, there is no problem designing it for smaller objects.

I agree with you up to a point, but it is still prudent when designing
a small car to document in the driver's handbook that it wasn't
intended for elephants and design the doors to debar elephants -
rather than unexpectedly turning itself inside out or destroying the
universe when the first elephant tries to climb aboard.

Eric Sosman · Aug 3, 2010

It looks like precalculating the filesize is not a good idea. A better
approach would be to read each character and allocate necessary memory
using "realloc". So I don't have all these size_t and other type
mixing issues.

This depends on whether it's required to store the entire file in
memory. It can be okay for configuration files and some data files,
but in general, it's not something that you can rely on. In my
scenario, I have 2+ GB files that I have to process on a 32-bit system
that maxes process memory at 2GB. [...]

Shouldn't you call it a 31-bit system?

BGB / cr88192 · Aug 4, 2010

Ian Collins said:
Ah, but the file size limit on FAT32 is 4GB, so anyone using a 32 bit
system will have to care about the difference between long and size_t!

the point was about in general: using a mechanism with a size limit.
rather than the issue of where exactly this limit is...

BGB / cr88192 · Aug 4, 2010

yes, but one typically KNOWS when they are dealing with a media file or ISO
image, so it still holds...
anyway, large media files or ISO images are still likely the vast minority
when it comes to the types of files one is likely to be working with.

this is like criticising the design of a car because a cow, horse, or
elephant can't fit in the driver's seat...
one typically knows when something may be expected to hold a large object,
and so can design it as needed, and if they know it won't need to deal with a
large object, there is no problem designing it for smaller objects.

<--
I agree with you up to a point, but it is still prudent when designing
a small car to document in the driver's handbook that it wasn't
intended for elephants and design the doors to debar elephants -
rather than unexpectedly turning itself inside out or destroying the
universe when the first elephant tries to climb aboard.
-->

by this reasoning, no one could use pointers without first checking they
weren't NULL either...

faster and easier in most cases is to assume that the pointers are not NULL,
and only really bother with checking in cases where a NULL is a reasonably
expected result...

then one can also, say, use certain compiler extensions, such as a "__try
{ ... } __except(...) { ... }" block to catch those rare cases when these
assumptions turn out to be wrong, or maybe go and fix it or similar...

BGB / cr88192 · Aug 4, 2010

Eric Sosman said:
It looks like precalculating the filesize is not a good idea. A better
approach would be to read each character and allocate necessary memory
using "realloc". So I don't have all these size_t and other type
mixing issues.

This depends on whether it's required to store the entire file in
memory. It can be okay for configuration files and some data files,
but in general, it's not something that you can rely on. In my
scenario, I have 2+ GB files that I have to process on a 32-bit system
that maxes process memory at 2GB. [...]

Shouldn't you call it a 31-bit system?

it is typical for 32-bit OS's to reserve the upper 1-2GB of the address
space for themselves.
this is actually useful in some ways, as it gives this nice non-addressable
space in which to use for encoding abstract values into pointers (or, on
x86-64, that large space between the positive and negative part of the
address space...).

Ian Collins · Aug 4, 2010

Eric Sosman said:

It looks like precalculating the filesize is not a good idea. A better
approach would be to read each character and allocate necessary memory
using "realloc". So I don't have all these size_t and other type
mixing issues.

This depends on whether it's required to store the entire file in
memory. It can be okay for configuration files and some data files,
but in general, it's not something that you can rely on. In my
scenario, I have 2+ GB files that I have to process on a 32-bit system
that maxes process memory at 2GB. [...]

Shouldn't you call it a 31-bit system?

it is typical for 32-bit OS's to reserve the upper 1-2GB of the address
space for themselves.

Typical for windows maybe, but not for others.

Ben Pfaff · Aug 4, 2010

Ian Collins said:
Typical for windows maybe, but not for others.

It is also typical for Linux, at least on x86. It is impractical
to build an x86 "protected mode" operating system
without reserving some address space for the OS.

Ian Collins · Aug 4, 2010

It is also typical for Linux, at least on x86. It is impractical
to build an x86 "protected mode" operating system
without reserving some address space for the OS.

Some maybe, but not half!

BGB / cr88192 · Aug 4, 2010

Ian Collins said:
Some maybe, but not half!

AFAIK, Linux reserves 1GB (could be wrong, but I think it is 1GB).

I guess it is possible though to build Linux in such a way that it doesn't
reserve any space (it does an address-space switch whenever going into the
kernel), although it is not free (slows down OS), and gives more space at
the cost of not leaving any unused space for sticking abstract handles or
similar...

Windows reserves 1GB or 2GB depending on settings, and an app flag.
32-bit Windows reserves 2GB by default, and 1GB when booted with the /3GB
option, with the "large address aware" flag telling what the app expects.
