My GetDirectoryTreeSize() function, couple of questions.

Poster Matt

First of all if the code below is a mess on your screen, or if you'd prefer to
read it on a web page, then it's up on pastie - here: http://pastie.org/816917

I'm back doing some C coding on Linux, having not coded in C for 15 years...

My project needs the total file size of all the files in a directory tree, so I
wrote the function below. It calls itself recursively (wow, haven't needed to
use recursion in a while!). The function works fine, but I have a couple of
questions.

1) Am I freeing the memory correctly?

2) It seems like a lot of code to do something so simple. I had a look at
basically the same thing I wrote in C# a couple of years ago (which uses the
same recursive logic but uses only 10 lines of code). Is the code below more
lengthy than need be? In other words, have I missed an easier way of doing it,
by my choice of C standard library functions?

3) Any glaring mistakes in the code? I know I'm not checking malloc's return
value (I'll use my own mallocOrDie() function later), and ditch the printf()s.

Thanks all.


// Returns the size in bytes of a dir tree.
// Make sure char* dirName ends in a '/'.
// (Needs <stdio.h>, <stdlib.h>, <string.h>, <dirent.h>, <sys/types.h>, <sys/stat.h>.)
long GetDirectoryTreeSize(char* dirName)
{
    // Total in bytes
    long total = 0;

    DIR *directory;
    struct dirent *dirItem;
    struct stat statbuf;

    // Open dir for reading or exit()
    directory = opendir(dirName);
    if (directory == NULL)
    {
        printf("\nCan't open dir: %s", dirName);
        exit(1);
    }

    // Loop through files and subdirs.
    while ((dirItem = readdir(directory)) != NULL)
    {
        // If item is a file, make full path and add its file size.
        if (dirItem->d_type != DT_DIR)
        {
            // Make full path to the file.
            int fullPathLen = strlen(dirName) + strlen(dirItem->d_name) + 1;
            char* fullPath = (char*) malloc(fullPathLen);
            sprintf(fullPath, "%s%s", dirName, dirItem->d_name);

            // Call stat(), put file stats in statbuf.
            int ret = stat(fullPath, &statbuf);

            // On success stat() returns 0.
            if (ret == 0)
                total += statbuf.st_size;

            printf("\n%s - %ld", fullPath, statbuf.st_size);

            free(fullPath);
            fullPath = NULL;
        }
        // If item is a dir, but not '.' or '..', make recursive call.
        else
        {
            if ( (strcmp(dirItem->d_name, ".") != 0) &&
                 (strcmp(dirItem->d_name, "..") != 0) )
            {
                // Make full path to subdir, add a final '/'.
                int fullPathToSubDirLen = strlen(dirName) +
                                          strlen(dirItem->d_name) + 2;
                char* fullPathToSubDir = (char*) malloc(fullPathToSubDirLen);
                sprintf(fullPathToSubDir, "%s%s/", dirName, dirItem->d_name);
                printf("\nNew Dir: %s", fullPathToSubDir);

                // Make recursive call.
                total += GetDirectoryTreeSize(fullPathToSubDir);

                free(fullPathToSubDir);
                fullPathToSubDir = NULL;
            }
        }
    }
    closedir(directory);

    return total;
}
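
For reference, here's how it gets called (the path is just an example - note the trailing '/'):

long totalBytes = GetDirectoryTreeSize("/home/matt/somedir/");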
 
Ben Bacarisse

Poster Matt said:
My project needs the total file size of all the files in a directory
tree, so I wrote the function below. It calls itself recursively (wow,
haven't needed to use recursion in a while!). The function works fine,
but I have a couple of questions.

1) Am I freeing the memory correctly?

Yup. At least it looks fine to me.

2) It seems like a lot of code to do something so simple. I had a look
at basically the same thing I wrote in C# a couple of years ago (which
uses the same recursive logic but uses only 10 lines of code). Is the
code below more lengthy than need be? In other words, have I missed an
easier way of doing it, by my choice of C standard library
functions?

It looks about right from a C point of view. You might want to ask in
comp.unix.programmer about the system stuff, especially about
alternative ways to tackle this.

3) Any glaring mistakes in the code? I know I'm not checking malloc's
return value (I'll use my own mallocOrDie() function later), and ditch
the printf()s.

Nothing stands out. Looks tickety-boo to me.

<snip>
 
Alan Curry

|My project needs the total file size of all the files in a directory tree, so I
|wrote the function below. It calls itself recursively (wow, haven't needed to
|use recursion in a while!). The function works fine, but I have a couple of
|questions.
|
|1) Am I freeing the memory correctly?

There's a free for every allocation, and no way to do one without the other,
so yes.

|
|2) It seems like a lot of code to do something so simple. I had a look at
|basically the same thing I wrote in C# a couple of years ago (which uses the
|same recursive logic but uses only 10 lines of code). Is the code below more
|lengthy than need be? In other words, have I missed an easier way of doing it,
|by my choice of C standard library functions?

News flash: programs in high-level languages are shorter. If you want it in
one line, system("du -ab").

|
|3) Any glaring mistakes in the code? I know I'm not checking malloc's return
|value (I'll use my own mallocOrDie() function later), and ditch the printf()s.

Here are the bits that could be improved.

1. Putting a newline at the start of every printf, and none at the end, is a
weird habit that you should lose as soon as possible. Who wants a useless
blank line at the start of the output, and an unterminated line at the end?

2. long isn't always big enough to hold a file size (2 gigs isn't as big as
it used to be). long long would be better, off_t would be best since it's
specifically meant for file sizes.

3. d_type isn't present in all filesystems. To find out whether an entry is a
directory, you need to also check for DT_UNKNOWN, and if you get DT_UNKNOWN,
do a stat and check S_ISDIR(st_mode). st_mode is the regular way to find out
what type of file you have; d_type is a shortcut that works sometimes. So I'd
start with just using stat, and then add the d_type optimization later if
speed is important.

4. Directories take up space too, so you might want to stat every directory
and add them in. (If you've fixed the previous item, then you already have
the stat() in there for checking st_mode)

5. exit() seems a bit harsh when an unreadable directory is encountered,
maybe keep going and just maintain a counter of how many unreadable
directories were skipped.

6. On the other hand, when stat() fails, silently skipping the offending file
is probably not harsh enough. Print a warning, or add that to the error
count, or something.

7. If you don't already know the difference between st_blocks and st_size,
read up on it and make sure you've got the right one.
 
Eric Sosman

First of all if the code below is a mess on your screen, or if you'd
prefer to read it on a web page, then it's up on pastie - here:
http://pastie.org/816917

I'm back doing some C coding on Linux, having not coded in C for 15
years...

My project needs the total file size of all the files in a directory
tree, so I wrote the function below. It calls itself recursively (wow,
haven't needed to use recursion in a while!). The function works fine,
but I have a couple of questions.

1) Am I freeing the memory correctly?

Looks like it. An incredibly picky person might point out
that if opendir() fails you'll exit() with memory still un-freed,
but anybody that picky would be better employed picking his nose.
(A more practical picky person might argue that exit() is an abrupt
way to respond to opendir() failure, and suggest that you think
about less drastic ways to report trouble.)

2) It seems like a lot of code to do something so simple. I had a look
at basically the same thing I wrote in C# a couple of years ago (which
uses the same recursive logic but uses only 10 lines of code). Is the
code below more lengthy than need be? In other words, have I missed an
easier way of doing it, by my choice of C standard library functions?

No; you're already well beyond what the Standard C library
can do. There's no stat(), no opendir() -- there's not even a
concept of "directory" at all.

Poster Matt said:
3) Any glaring mistakes in the code? I know I'm not checking malloc's
return value (I'll use my own mallocOrDie() function later), and ditch
the printf()s.

A `long' variable can handle values up to 2147483647, but
(depending on the implementation) might not be able to deal with
anything larger. Thus, if you've got 2GB or more of total files,
you may get strange results.

It's peculiar that if stat() fails you try to print the file
size anyhow. (The contrast between the treatment of stat() and
opendir() failures is rather stark.)

<off-topic> Something that isn't a DT_DIR is not necessarily
a file with a meaningful size: It might be a FIFO, or a character
special file, or a symlink, or ... </off-topic>

Summary: From a C perspective (and taking the "beyond C"
stuff on faith) the code looks all right, with a caveat about
the range of `long' and a raised eyebrow at the error handling.
From a "get this task done on a Linux system" perspective, there
are several things that need examination and might be improved,
but you should try a Linux or Unix forum for those issues.
 
Ersek, Laszlo

I'm back doing some C coding on Linux, having not coded in C for 15 years...
My project needs the total file size of all the files in a directory
tree,

This is not exact enough. Since you've mentioned Linux: will you follow
symbolic links, for example? Will you veer off into different
filesystems? Are you interested in the sum of the apparent file sizes,
or in the sum derived from the number of allocated blocks?

2) It seems like a lot of code to do something so simple. I had a look at
basically the same thing I wrote in C# a couple of years ago (which uses the
same recursive logic but uses only 10 lines of code). Is the code below more
lengthy than need be? In other words, have I missed an easier way of doing it,
by my choice of C standard library functions?

The Single Unix Specification offers nftw(), for example.

http://www.opengroup.org/onlinepubs/000095399/functions/nftw.html

I've only looked at where you've put closedir(). Now it is my impression
that you may run out of file descriptors (opened for directory streams)
on very deep directory hierarchies.

http://www.opengroup.org/onlinepubs/000095399/functions/opendir.html

nftw() handles this. (If you're curious how, I suggest you look at the
glibc source, or I can dig up my implementation for you, which I
originally wrote for Cygwin when it still lacked it (I was too late in
the end so it wasn't even reviewed, alas)).

Furthermore, the "d_type" member of struct dirent is not even SUS (let
alone C90/C99).

http://www.opengroup.org/onlinepubs/000095399/basedefs/dirent.h.html

On some Linux filesystems, for example ext2 derivatives with the
"filetype" filesystem flag set, directory entries store file type
information, and "d_type" is populated. Normally, for the retrieval of
such information one would have to read the inode referred to by the
directory entry, by way of an explicit stat() call. From the Linux
manual of readdir():

----v----
Other than Linux, the d_type field is available mainly only on BSD
systems. This field makes it possible to avoid the expense of calling
stat(2) if further actions depend on the type of the file. [...] If the
file type could not be determined, the value DT_UNKNOWN is returned in
d_type.
----^----

Thus a usable d_type is not guaranteed even on Linux.

Finally, I wouldn't use a signed type for representing the sum.
Unfortunately, this opens a can of worms on SUSv2, since off_t (type of
st_size) is an extended signed integral type there, and summing it in
any other type is risky (the presence of uint64_t is mandatory, but
still there is no built-in printf() support for either uint64_t or
off_t).

Starting with SUSv3 (which is based on C99), we can calculate the sum in
an uintmax_t variable, and we can print it too. Here's an example.

----v----
#define _XOPEN_SOURCE 600 /* state SUSv3 XSI conformance */

#include <stdint.h>       /* uintmax_t */
#include <ftw.h>          /* struct FTW */
#include <stdio.h>        /* fprintf() */
#include <string.h>       /* strrchr() */
#include <limits.h>       /* _POSIX_OPEN_MAX */
#include <unistd.h>       /* STDERR_FILENO */
#include <stdlib.h>       /* EXIT_SUCCESS */
#include <errno.h>        /* errno */

static uintmax_t apparent_sum;
static const char *pname;

static int
callback(const char *pathname, const struct stat *sbuf, int info,
    struct FTW *pos)
{
  const char *problem;

  switch (info) {
    case FTW_DNR: problem = "directory not readable";       break;
    case FTW_NS:  problem = "permission denied for stat()"; break;
    case FTW_F:   apparent_sum += (uintmax_t)sbuf->st_size;
    default:      return 0;
  }

  (void)fprintf(stderr, "%s: %s: \"%s\"\n", pname, problem, pathname);
  return 1;
}


int
main(int argc, char **argv)
{

  pname = strrchr(argv[0], '/');
  pname = pname ? pname + 1 : argv[0];

  switch (nftw(argv[1] ? argv[1] : ".", &callback,
      _POSIX_OPEN_MAX - (STDERR_FILENO + 1), FTW_PHYS)) {
    case 0:
      if (0 <= fprintf(stdout, "%ju\n", apparent_sum)
          && 0 == fflush(stdout)) {
        return EXIT_SUCCESS;
      }
      break;

    case -1:
      (void)fprintf(stderr, "%s: nftw(): %s\n", pname, strerror(errno));
  }

  return EXIT_FAILURE;
}
----^----

(argc is at least 1, see arg0 / argv[0] in
http://www.opengroup.org/onlinepubs/000095399/functions/exec.html)

Theoretically, you should compile this with the standard c99 compiler:

http://www.opengroup.org/onlinepubs/000095399/utilities/c99.html

and if you wanted to select "Programming Environments" (ILP32_OFF32,
ILP32_OFFBIG, LP64_OFF64, LPBIG_OFFBIG) different from the default, you
should do that via getconf.

http://www.opengroup.org/onlinepubs/000095399/utilities/getconf.html

Nonetheless, before submitting this, I've compiled the above with

$ gcc99 -o walk walk.c

And the contents of my gcc99 script is

#!/bin/sh
exec gcc                                                               \
    -pipe -std=c99 -pedantic -fhosted -fno-builtin -Wall -Wextra       \
    -Wfloat-equal -Wundef -Wshadow -Wlarger-than-32767 -Wpointer-arith \
    -Wbad-function-cast  -Wcast-qual -Wcast-align -Wwrite-strings      \
    -Wstrict-prototypes -Wformat=2  -Wmissing-prototypes               \
    -Wmissing-declarations -Wredundant-decls -Wnested-externs          \
    -Wunreachable-code -Winline "$@"

Cheers,
lacos
 
Nobody

3) Any glaring mistakes in the code? I know I'm not checking malloc's return
value (I'll use my own mallocOrDie() function later), and ditch the printf()s.
// Make sure char* dirName ends in a '/'.

I'd do it the other way around, i.e. assume that it doesn't end in a '/'
and use:

sprintf(fullPath, "%s/%s", dirName, dirItem->d_name);

In the event that dirName does already end in a slash, adding another one
will be harmless.

long GetDirectoryTreeSize(char* dirName)

I'd recommend using off_t rather than long. It's fairly
straightforward to use 64-bit off_t on a 32-bit system
(-D_FILE_OFFSET_BITS=64), and 2GiB really isn't a lot these days.

Also, see the ftw(3) and fts(3) manpages for built-in functions for
traversing a directory tree. ftw() and nftw() are specified by POSIX
(although POSIX 2008 lists ftw() as obsolescent), while fts_* are from BSD.
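
With fts(3), for instance, the whole walk collapses to something like this (an
untested sketch - the function name is mine, it only counts regular files, and
it stays on one filesystem):

#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>

static off_t tree_size(char *dirName)
{
    char *paths[] = { dirName, NULL };
    off_t total = 0;
    FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_XDEV, NULL);
    FTSENT *ent;

    if (fts == NULL)
        return -1;

    while ((ent = fts_read(fts)) != NULL)
        if (ent->fts_info == FTS_F)             /* regular file */
            total += ent->fts_statp->st_size;

    fts_close(fts);
    return total;
}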
 
Poster Matt

Ben said:
<snip>

It looks about right from a C point of view. You might want to ask in
comp.unix.programmer about the system stuff, especially about
alternative ways to tackle this.

Ok. Thanks for the pointer.

Nothing stands out. Looks tickety-boo to me.

Cheers Ben.
 
Poster Matt

Alan said:
|2) It seems like a lot of code to do something so simple. I had a look at
|basically the same thing I wrote in C# a couple of years ago (which uses the
|same recursive logic but uses only 10 lines of code). Is the code below more
|lengthy than need be? In other words, have I missed an easier way of doing it,
|by my choice of C standard library functions?

News flash: programs in high-level languages are shorter. If you want it in
one line, system("du -ab").

It's a bit of a shock to the system having not programmed in C for 15 years;
things that can be accomplished in a high-level language in a single low-effort
line suddenly need a whole function to themselves. :) I'm enjoying it a lot though.

Here are the bits that could be improved.

1. Putting a newline at the start of every printf, and none at the end, is a
weird habit that you should lose as soon as possible. Who wants a useless
blank line at the start of the output, and an unterminated line at the end?

The printf calls were just to see what was happening while writing the function,
before it went into my project's code, at which point the printfs will be no
more. I don't think it's that weird though, just a preference, and I always add
an extra \n before the end - not exactly living life on the edge. ;)

2. long isn't always big enough to hold a file size (2 gigs isn't as big as
it used to be). long long would be better, off_t would be best since it's
specifically meant for file sizes.

Thanks, it's changed to off_t. [Though on my system long was happily handling
sizes > 8GB.]

3. d_type isn't present in all filesystems. To find out whether an entry is a
directory, you need to also check for DT_UNKNOWN, and if you get DT_UNKNOWN,
do a stat and check S_ISDIR(st_mode). st_mode is the regular way to find out
what type of file you have; d_type is a shortcut that works sometimes. So I'd
start with just using stat, and then add the d_type optimization later if
speed is important.

Another important point, cheers. My code now just uses stat() and st_mode to
determine whether a file is a regular file or a dir.
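Roughly, the check is now along these lines (just a sketch - error handling,
the '.'/'..' skip and the trailing '/' for the recursive call are left out):

struct stat statbuf;

if (stat(fullPath, &statbuf) == 0)
{
    // Regular file: add its size.
    if (S_ISREG(statbuf.st_mode))
        total += statbuf.st_size;
    // Directory: recurse (after appending the trailing '/').
    else if (S_ISDIR(statbuf.st_mode))
        total += GetDirectoryTreeSize(fullPath);
}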

4. Directories take up space too, so you might want to stat every directory
and add them in. (If you've fixed the previous item, then you already have
the stat() in there for checking st_mode)

Yes indeedy - however I just want the total file sizes, not interested in how
much space the dir itself takes.

5. exit() seems a bit harsh when an unreadable directory is encountered,
maybe keep going and just maintain a counter of how many unreadable
directories were skipped.

Work in progress; it was the program logic I was querying, and the exit() call
would never have made it into my real-world code.

6. On the other hand, when stat() fails, silently skipping the offending file
is probably not harsh enough. Print a warning, or add that to the error
count, or something.

As above, it'll be handled differently in my project.

7. If you don't already know the difference between st_blocks and st_size,
read up on it and make sure you've got the right one.

Understood - and I'm using the right one.

Many thanks.
 
Poster Matt

Eric said:
Looks like it. An incredibly picky person might point out
that if opendir() fails you'll exit() with memory still un-freed,
but anybody that picky would be better employed picking his nose.
(A more practical picky person might argue that exit() is an abrupt
way to respond to opendir() failure, and suggest that you think
about less drastic ways to report trouble.)

As I wrote to Alan: work in progress; it was the program logic I was querying,
and the exit() call would never have made it into my real-world code.

No; you're already well beyond what the Standard C library
can do. There's no stat(), no opendir() -- there's not even a
concept of "directory" at all.

My wrong terminology. :) Remember I'm coming back to C after 15 years, SUS
didn't even exist when I stopped circa 1995.

<off-topic> Try "man -s3 ftw". </off-topic>

Thanks for the pointer.

A `long' variable can handle values up to 2147483647, but
(depending on the implementation) might not be able to deal with
anything larger. Thus, if you've got 2GB or more of total files,
you may get strange results.

Thanks, it's changed to off_t. [Though on my system long was happily handling
sizes > 8GB.]

It's peculiar that if stat() fails you try to print the file
size anyhow. (The contrast between the treatment of stat() and
opendir() failures is rather stark.)

As I wrote to Alan, the printfs would not enter my real world code.

<off-topic> Something that isn't a DT_DIR is not necessarily
a file with a meaningful size: It might be a FIFO, or a character
special file, or a symlink, or ... </off-topic>

DT_DIR is ditched, just using stat() and st_mode to determine whether I've got a
regular file or a dir.

Many thanks.
 
Poster Matt

This is not exact enough. Since you've mentioned Linux: will you follow
symbolic links, for example? Will you veer off into different
filesystems? Are you interested in the sum of the apparent file sizes,
or in the sum derived from the number of allocated blocks?

I just want the accumulated file sizes (in bytes) of all the files. Not
interested in following symbolic links, dir sizes, or other filesystems.

The Single Unix Specification offers nftw(), for example.

http://www.opengroup.org/onlinepubs/000095399/functions/nftw.html

Cool, thanks for bringing the SUS to my attention. It didn't exist when I stopped
coding C in about 1995. That's gonna be really useful.

I've only looked at where you've put closedir(). Now it is my impression
that you may run out of file descriptors (opened for directory streams)
on very deep directory hierarchies.

Yes, I think you're right, I could. In reality my project will only ever deal with
very small trees, and I've tested with no problems on trees massively larger than
any my project will see.

http://www.opengroup.org/onlinepubs/000095399/functions/opendir.html

nftw() handles this. (If you're curious how, I suggest you look at the
glibc source, or I can dig up my implementation for you, which I
originally wrote for Cygwin when it still lacked it (I was too late in
the end so it wasn't even reviewed, alas)).

Ok. I may switch to using nftw().

Furthermore, the "d_type" member of struct dirent is not even SUS (let
alone C90/C99).

http://www.opengroup.org/onlinepubs/000095399/basedefs/dirent.h.html

Ok thanks, as my replies to others said - DT_DIR is ditched, I'm now just using
stat() and st_mode to determine whether I've got a regular file or a dir.

Finally, I wouldn't use a signed type for representing the sum.
Unfortunately, this opens a can of worms on SUSv2, since off_t (type of
st_size) is an extended signed integral type there, and summing it in
any other type is risky (the presence of uint64_t is mandatory, but
still there is no built-in printf() support for either uint64_t or
off_t).

Again thanks, and as my replies to others said, the long is long gone :) - off_t
is working fine and I don't need printf support for it.

Starting with SUSv3 (which is based on C99), we can calculate the sum in
an uintmax_t variable, and we can print it too. Here's an example.

Thanks for the example code.

I'm a little confused. You seem to be saying that summing in anything other than
off_t is 'risky' - I'm not sure why? Nor what the advantages of using uintmax_t
would be?

Many thanks.
 
Poster Matt

William said:
Your traversal and stat routines aren't "standard C", per se. They're POSIX
routines, and POSIX indeed offers a better alternative: nftw(3). There may
be other suitable alternatives; inquire over at comp.unix.programmer.

The equivalent logic here is less than 10 lines, I think, depending on how
you count. But here's the entire program, main() and all, including any
bugs.

Many thanks. Your code is a lot more concise and a nftw solution may be the way
to go. Thanks for the code and the pointer to nftw.
 
Poster Matt

Nobody said:
I'd do it the other way around, i.e. assume that it doesn't end in a '/'
and use:

sprintf(fullPath, "%s/%s", dirName, dirItem->d_name);

In the event that dirName does contain a slash, adding another one
will be harmless.

Yes, that is better, and I've changed my code. I didn't know that adding extra
'/'s is harmless. Cheers.

I'd recommend using off_t rather than long. It's fairly
straightforward to use 64-bit off_t on a 32-bit system
(-D_FILE_OFFSET_BITS=64), and 2GiB really isn't a lot these days.

Also, see the ftw(3) and fts(3) manpages for built-in functions for
traversing a directory tree. ftw() and nftw() are specified by POSIX
(although POSIX 2008 lists ftw() as obsolescent), while fts_* are from BSD.

As I've written to the other guys, off_t is now used and I may be changing to a
nftw solution.

Many thanks.
 
Ersek, Laszlo

SUS didn't even exist when I stopped circa 1995.

That might be wrong, depending on how permissive your "circa" is.
Following the links under

http://www.unix.org/what_is_unix/single_unix_specification.html

to the five X/Open CAE (Common Applications Environment) documents that
make up the SUSv1 -- and (partially) the certification for the UNIX 95
brand --

http://www.unix.org/public/pubs/catalog/c434.htm
http://www.unix.org/public/pubs/catalog/c435.htm
http://www.unix.org/public/pubs/catalog/c436.htm
http://www.unix.org/public/pubs/catalog/c438.htm
http://www.unix.org/public/pubs/catalog/c610.htm

then looking at the "Bibliographic Details" on each individual page
(except the last one, because that document is no longer available on
their site), we can see "Sep 1994".

Cheers,
lacos
(ever-tangential poster)
 
Ersek, Laszlo

Poster Matt wrote:
I'm a little confused. You seem to be saying that summing in anything other than
off_t is 'risky' - I'm not sure why? Nor what the advantages of using uintmax_t
would be?

I'm sorry, I wasn't expressing myself clearly. This is what I meant:

Under SUSv2, you have off_t. It is an extended signed integral type.
"st_size" returned by stat() is of this type. It is permitted to have 63
value bits, for example. If you tried to sum such values in a "long"
(with possibly 31 value bits), you might introduce
implementation-defined behavior (by demoting a suitably big off_t to
"long", see C89 6.2.1.2p3). This is "risky". With "long unsigned", the
result of such a demotion is defined, but you might lose information.
With uint64_t instead of "long unsigned", this is less probable, but
then you can't easily print an uint64_t variable.

Summing in off_t is equally "dangerous" under both SUSv2 and SUSv3,
because
- it is signed,
- it is probably at least as wide (high-ranking) as "int", so that it
is not affected by integral (integer) promotions,
- and if the result of the addition cannot be represented in it, the
behavior is undefined.

(Detour: suppose SCHAR_MAX == 127 and INT_MAX == 2147483647. Then
(2147483647 + 2147483647) is undefined behavior, but
(char signed)( (char signed)127 + (char signed)127 ) yields an
implementation-defined value. Am I right?)

Under SUSv3, you have uintmax_t. It can represent any non-negative value
that is representable by any integer type. It can be printed with
fprintf(). It is unsigned. So any single non-negative off_t can be
converted to it; it offers the greatest range for summation; if it still
does overflow, that's well defined behavior; and in the end, you can
print it.

Some examples:

- SUSv2, program is compiled for the XBS5_ILP32_OFF32 programming
environment. You find two files of size 2G-1, the addition done in off_t
overflows: undefined behavior. This holds theoretically (with greater
and more files) for all other standard PE's too (XBS5_ILP32_OFFBIG,
XBS5_LPBIG_OFFBIG, XBS5_LP64_OFF64), because off_t is signed.

- SUSv2, program is compiled for the XBS5_ILP32_OFFBIG programming
environment. You try to convert an off_t (either sum or individual)
value of 2G to "long": implementation-defined result.

In short, under SUSv2 I'd represent the sum in an uint64_t variable and
print it "manually" if necessary, while under SUSv[34] I'd choose
uintmax_t.
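
By printing "manually" I mean something along these lines (only a sketch; the
helper name is made up):

#include <inttypes.h>   /* uint64_t */
#include <stdio.h>

/* Print a uint64_t in decimal without relying on printf() length
   modifiers for 64-bit types. */
static void print_u64(uint64_t v)
{
  char buf[21];                 /* 2^64-1 has 20 digits, plus NUL */
  char *p = buf + sizeof buf - 1;

  *p = '\0';
  do {
    *--p = (char)('0' + (unsigned)(v % 10));
    v /= 10;
  } while (v > 0);
  (void)fputs(p, stdout);
}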

As a final note, the Open Group's "64-bit and Data Size Neutrality"
whitepaper [0], written presumably on the occasion of SUSv2, says this
(quote completely out of context, please look it up):

"size_t must represent the largest unsigned type supported by an
implementation"

But I don't consider this normative, and the SUSv2 doesn't seem to
support this claim. (See <sys/types.h> [1] and <stddef.h> [2].) Thus I'd
still choose uint64_t on SUSv2 for the sum.

(I think this was even more confusing -- sorry.)

Cheers,
lacos

[0] http://www.unix.org/whitepapers/64bit.html
[1] http://www.opengroup.org/onlinepubs/007908775/xsh/systypes.h.html
[2] http://www.opengroup.org/onlinepubs/007908775/xsh/stddef.h.html
 
Poster Matt

I'm sorry, I wasn't expressing myself clearly. This is what I meant:

-SNIP-

In short, under SUSv2 I'd represent the sum in an uint64_t variable and
print it "manually" if necessary, while under SUSv[34] I'd choose
uintmax_t.

-SNIP-

(I think this was even more confusing -- sorry.)

No, not more confusing, once I'd read it a few times. :)

Thanks very much for taking the time to explain in such detail; I really
appreciate it. Thanks also for the 'in short' bit above (not snipped), which made
it crystal clear what I should use - since that's SUSv2, uint64_t. I had a couple
of painful moments until I worked out that it's defined in <inttypes.h>.

The solution (below) is nice and concise, not to mention infinitely more elegant
than the mess I started off with.

Thanks Lacos.

static uint64_t nftwTotalByteCount = 0;

static int nftwCallBackAddFileSize(const char *path, const struct stat *st,
                                   int flag, struct FTW *pos)
{
    // Normal file, add size.
    if (flag == FTW_F)
        nftwTotalByteCount += (uint64_t) st->st_size;

    // nftw couldn't read dir.
    else if (flag == FTW_DNR)
        return -1;

    // nftw's call to stat() failed.
    else if (flag == FTW_NS)
        return -1;

    // On success return 0.
    return 0;
}

int rv = nftw(dirName, &nftwCallBackAddFileSize, _POSIX_OPEN_MAX,
              FTW_PHYS | FTW_MOUNT);
 
