Strange Behaviour in finding Size of a File

Discussion in 'C Programming' started by felix, Nov 9, 2012.

  1. felix

    felix Guest

    This method was written to create a new log file when the size of the log file reaches a maximum size defined by the user [10MB in our case]. Here is the code snippet that does this check:

    //-- Code starts here : --

    static size_t LogSize = 1048576;
    bool CreateNewLogs = false;

    if ( stat ( logFile, &results ) == 0 )
    {
        if ( results.st_size > LogSize )
        {
            CreateNewLogs = true;
        }
    }
    //-- Code ends here : --

    It is strange that the condition got satisfied when results.st_size = 2589116.
    And we are sure that the size of the data that is written is between 50 to 100 bytes in one operation. And this check is done before writing into the LogFile.

    I am not sure if I am missing anything in our understanding of the stat function. Any input or pointers in this regard would be really helpful.


    Thanks in advance, and please let me know if any other information is required.


    --
    Felix
    felix, Nov 9, 2012
    #1

  2. felix

    James Kuyper Guest

    On 11/09/2012 02:09 AM, felix wrote:
    > This method was written to create new Log File, when the size of the Log File reaches a max size defined by user [10MB in our case]. Here is the code snippet that does this check:
    >
    > //-- Code starts here : --
    >
    > static size_t LogSize = 1048576;
    > bool CreateNewLogs = false;
    >
    >
    > if ( stat ( logFile, &results ) == 0 )


    That is presumably the POSIX stat() function, or something similar? If
    so, its behavior is defined by the POSIX standard, not the C standard,
    and you'll get better answers to your questions in comp.unix.programmer
    than in this newsgroup.

    > {
    > if ( results.st_size > LogSize )
    > {
    > CreateNewLogs = true;
    > }
    > }
    > //-- Code ends here : --
    >
    > It is strange that the condition got satisfied when results.st_size = 2589116.
    > And we are sure that the size of the data that is written is between 50 to 100 bytes in one operation. And this check is done before writing into the LogFile.


    Keep in mind that file I/O is normally buffered, so the buffer size is
    more relevant than the size of your individual writes. Still, that seems
    to be a rather large jump to explain by buffering.

    > I am not sure if I am missing anything in our understanding of the stat function. Any inputs or pointers on this regard will be really Helpful.


    The people in comp.unix.programmer may need to know more details about
    how data is written to the file, and whether or not you've used any
    POSIX functions to change the file mode.
    Just to get a better idea of what's going on, I'd recommend reporting
    the file size somewhere (probably in a separate log file) every time you
    call stat().
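
As a rough illustration of that suggestion, the sketch below wraps stat() in a helper that also records the observed size on a separate diagnostic stream. The helper name report_log_size and the diag parameter are made up for this example; they are not part of the original code.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: stat() the log file and write the observed size
 * to a separate diagnostic stream. Returns the size in bytes, or -1 if
 * stat() fails. */
static long long report_log_size(const char *logFile, FILE *diag)
{
    struct stat results;

    if (stat(logFile, &results) != 0) {
        fprintf(diag, "stat(%s) failed\n", logFile);
        return -1;
    }
    /* st_size is an off_t; cast to long long for a portable printf format. */
    fprintf(diag, "%s: st_size = %lld\n", logFile, (long long)results.st_size);
    fflush(diag);   /* push the diagnostic line out immediately */
    return (long long)results.st_size;
}
```

Calling it before every write, for example as report_log_size(logFile, stderr), gives a trace that should show whether st_size grows in steps of 50 to 100 bytes or in much larger jumps.
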
    --
    James Kuyper
    James Kuyper, Nov 9, 2012
    #2

  3. felix

    Eric Sosman Guest

    On 11/9/2012 2:09 AM, felix wrote:
    > This method was written to create new Log File, when the size of the Log File reaches a max size defined by user [10MB in our case]. Here is the code snippet that does this check:
    >
    > //-- Code starts here : --
    >
    > static size_t LogSize = 1048576;


    Ah. This is obviously some strange usage of "10 MB" that
    I hadn't previously been aware of.

    > It is strange that the condition got satisfied when results.st_size = 2589116.


    Not all *that* strange ...

    --
    Eric Sosman
    Eric Sosman, Nov 9, 2012
    #3
  4. felix

    John Gordon Guest

    In <> felix <> writes:

    > static size_t LogSize = 1048576;


    > if ( results.st_size > LogSize )


    > It is strange that the condition got satisfied when
    > results.st_size = 2589116.


    You're surprised that 2589116 is greater than 1048576?

    --
    John Gordon A is for Amy, who fell down the stairs
    B is for Basil, assaulted by bears
    -- Edward Gorey, "The Gashlycrumb Tinies"
    John Gordon, Nov 9, 2012
    #4
  5. felix

    Mark Bluemel Guest

    On 09/11/2012 15:16, John Gordon wrote:
    > In <> felix <> writes:
    >
    >> static size_t LogSize = 1048576;

    >
    >> if ( results.st_size > LogSize )

    >
    >> It is strange that the condition got satisfied when
    >> results.st_size = 2589116.

    >
    > You're surprised that 2589116 is greater than 1048576?


    No. Given that "we are sure that the size of the data that is written is
    between 50 to 100 bytes in one operation. And this check is done before
    writing into the LogFile." I think the OP is surprised that the
    condition wasn't satisfied earlier.

    I think James Kuyper has given some good advice.
    Mark Bluemel, Nov 9, 2012
    #5
  6. felix

    James Kuyper Guest

    On 11/09/2012 10:16 AM, John Gordon wrote:
    > In <> felix <> writes:
    >
    >> static size_t LogSize = 1048576;

    >
    >> if ( results.st_size > LogSize )

    >
    >> It is strange that the condition got satisfied when
    >> results.st_size = 2589116.

    >
    > You're surprised that 2589116 is greater than 1048576?


    No, he's surprised that, when checking this condition periodically,
    separated by writes of no more than 100 bytes, that it doesn't trigger
    until 2589116. Naively, it could be expected to trigger with
    results.st_size no more than 100 bytes larger than LogSize. Buffering is
    the simplest of the many reasons invalidating that conclusion; there's
    several others, and many people who are better equipped to explain those
    issues than I am, so I'll leave that explanation to them, rather than
    embarrassing myself by getting it wrong.
    James Kuyper, Nov 9, 2012
    #6
  7. felix

    Greg Martin Guest

    On 12-11-08 11:09 PM, felix wrote:
    > This method was written to create new Log File, when the size of the Log File reaches a max size defined by user [10MB in our case]. Here is the code snippet that does this check:
    >
    > //-- Code starts here : --
    >
    > static size_t LogSize = 1048576;
    > bool CreateNewLogs = false;
    >
    >
    > if ( stat ( logFile, &results ) == 0 )
    > {
    > if ( results.st_size > LogSize )
    > {
    > CreateNewLogs = true;
    > }
    > }
    > //-- Code ends here : --
    >
    > It is strange that the condition got satisfied when results.st_size = 2589116.
    > And we are sure that the size of the data that is written is between 50 to 100 bytes in one operation. And this check is done before writing into the LogFile.
    >
    > I am not sure if I am missing anything in our understanding of the stat function. Any inputs or pointers on this regard will be really Helpful.
    >
    >
    > Thanks in advance, and please let me know if any other information is required.
    >
    >
    > --
    > Felix
    >


    I would think it strange if I knew for sure that the function got called
    before the write in every place that the file was possibly being written
    to and that there weren't multiple process/threads that could write to it.
    Greg Martin, Nov 9, 2012
    #7
  8. felix <> writes:

    > if ( stat ( logFile, &results ) == 0 )


    Use ftell() instead (or lseek() at level 2).

    -- Alain.
    Alain Ketterlin, Nov 9, 2012
    #8
  9. Alain Ketterlin <-strasbg.fr> writes:
    > felix <> writes:
    >> if ( stat ( logFile, &results ) == 0 )

    >
    > Use ftell() instead (or lseek() at level 2).


    What makes you think that will yield better results, and what is "level 2"?

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Nov 9, 2012
    #9
  10. felix <> writes:
    > This method was written to create new Log File, when the size of the
    > Log File reaches a max size defined by user [10MB in our case]. Here
    > is the code snippet that does this check:
    >
    > //-- Code starts here : --
    >
    > static size_t LogSize = 1048576;
    > bool CreateNewLogs = false;
    >
    >
    > if ( stat ( logFile, &results ) == 0 )
    > {
    > if ( results.st_size > LogSize )
    > {
    > CreateNewLogs = true;
    > }
    > }
    > //-- Code ends here : --
    >
    > It is strange that the condition got satisfied when results.st_size =
    > 2589116. And we are sure that the size of the data that is written is
    > between 50 to 100 bytes in one operation. And this check is done
    > before writing into the LogFile.


    It's been pointed out that 1048576 is the wrong value if you want 10 MB
    (more pedantically, 10 MiB). But LogSize is also the wrong type.
    It should be the same type as the st_size member.

    You probably want to use "const" rather than "static" in the definition
    of LogSize, unless the value can change.
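
Putting those two points together, a corrected declaration might look like the sketch below. It assumes a POSIX system where st_size has type off_t; the helper name log_needs_rotation is hypothetical. Note that 1048576 is 1024 * 1024, i.e. 1 MiB, whereas 10 MiB is 10485760.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* 10 MiB spelled out so the magnitude is visible at a glance.
 * off_t matches the type of st_size, which avoids a mixed
 * signed/unsigned comparison. */
static const off_t LogSize = (off_t)10 * 1024 * 1024;   /* 10485760 */

/* Hypothetical helper: true if the file exists and is larger than LogSize. */
static bool log_needs_rotation(const char *logFile)
{
    struct stat results;

    if (stat(logFile, &results) != 0) {
        return false;   /* cannot stat the file: treat as "no rotation" */
    }
    return results.st_size > LogSize;
}
```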

    Neither of these is likely to be the cause of the problem you're seeing.
    Since stat() is defined by POSIX, not by C, you'll likely get better
    answers in comp.unix.programmer.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Nov 9, 2012
    #10
  11. Keith Thompson <> writes:

    > Alain Ketterlin <-strasbg.fr> writes:
    >> felix <> writes:
    >>> if ( stat ( logFile, &results ) == 0 )

    >>
    >> Use ftell() instead (or lseek() at level 2).

    >
    > What makes you think that will yield better results,


    What makes you think it will not. At least ftell() is C.

    > and what is "level 2"?


    Unixism. Just ignore it if it hurts you.

    -- Alain.
    Alain Ketterlin, Nov 9, 2012
    #11
  12. Alain Ketterlin <-strasbg.fr> writes:
    > Keith Thompson <> writes:
    >> Alain Ketterlin <-strasbg.fr> writes:
    >>> felix <> writes:
    >>>> if ( stat ( logFile, &results ) == 0 )
    >>>
    >>> Use ftell() instead (or lseek() at level 2).

    >>
    >> What makes you think that will yield better results,

    >
    > What makes you think it will not. At least ftell() is C.


    We know that stat() isn't working as the OP expects. I'd expect
    ftell() to yield exactly the same results -- and it requires opening
    the file and seeking to the end of it. Furthermore, there's no
    guarantee in C that the fseek()/ftell() trick will accurately yield
    the size of a file. Binary streams may legally be padded with an
    implementation-defined number of null characters (N1570 7.21.2p3),
    and "A binary stream need not meaningfully support fseek calls
    with a whence value of SEEK_END" (7.21.9.2p3). For text streams,
    the value returned by ftell() isn't necessarily meaningful except
    as an argument to fseek() (7.21.9.4p2).
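
For reference, the fseek()/ftell() trick usually looks something like the sketch below. The function name file_size_by_seek is made up for this example, and, given the caveats just listed, ISO C does not promise that the result is the file's true size; it merely tends to work on POSIX systems.

```c
#include <stdio.h>

/* Hypothetical helper: size of a file obtained by seeking to its end.
 * Returns -1 on error. ISO C does not guarantee SEEK_END is meaningful
 * for binary streams, so treat the result as implementation-dependent. */
static long file_size_by_seek(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long size = -1;

    if (fp == NULL) {
        return -1;
    }
    if (fseek(fp, 0L, SEEK_END) == 0) {
        size = ftell(fp);   /* ftell() itself returns -1L on failure */
    }
    fclose(fp);
    return size;
}
```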

    A POSIX environment makes more guarantees -- but as long as you're
    depending on POSIX, there's no good reason not to use stat()
    (or fstat() or lstat()).
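
One way to keep using stat() on a log file that the program still has open is to flush the stream first, so that data still sitting in the stdio buffer is counted. This is only a sketch under the assumption that the checking code has access to the FILE* used for writing; flushed_log_size is a hypothetical helper name.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: flush pending output on fp, then ask the file
 * system for the size of the file at path. Returns -1 on error.
 * The caller is assumed to pass the same file in both arguments. */
static long long flushed_log_size(FILE *fp, const char *path)
{
    struct stat results;

    if (fflush(fp) != 0) {
        return -1;      /* buffered data could not be written out */
    }
    if (stat(path, &results) != 0) {
        return -1;
    }
    return (long long)results.st_size;
}
```

Flushing on every check costs some of the benefit of buffering, so whether this is acceptable depends on how often the log is written.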

    My point is that you suggested ftell() as a solution to the OP's
    problem. It isn't.

    >> and what is "level 2"?

    >
    > Unixism. Just ignore it if it hurts you.


    If you're referring to section 2 of the manual (lseek(2),
    documentation available via "man 2 lseek"), I've never heard of
    that being referred to as "level 2". Try being a little less
    condescending and actually answering the question.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Nov 9, 2012
    #12
  13. Keith Thompson <> writes:

    > Alain Ketterlin <-strasbg.fr> writes:
    >> Keith Thompson <> writes:
    >>> Alain Ketterlin <-strasbg.fr> writes:
    >>>> felix <> writes:
    >>>>> if ( stat ( logFile, &results ) == 0 )
    >>>>
    >>>> Use ftell() instead (or lseek() at level 2).
    >>>
    >>> What makes you think that will yield better results,

    >>
    >> What makes you think it will not. At least ftell() is C.

    >
    > We know that stat() isn't working as the OP expects. I'd expect
    > ftell() to yield exactly the same results -- and it requires opening
    > the file and seeking to the end of it.


    If stat() doesn't give the correct result in the OP's use case (writing
    chunks, and testing whether the size has reached a limit), it's probably
    because the file is still open. And several people have suggested that
    the problem was probably with the buffering. The doc of ftell() says (on
    my system):

    | The ftell() function obtains the current value of the file position
    | indicator for the stream pointed to by stream.

    In my opinion, that's a pretty good hint given the problem description.

    > Furthermore, there's no guarantee in C that the fseek()/ftell() trick
    > will accurately yield the size of a file. Binary streams may legally
    > be padded with an implementation-defined number of null characters
    > (N1570 7.21.2p3), and "A binary stream need not meaningfully support
    > fseek calls with a whence value of SEEK_END" (7.21.9.2p3). For text
    > streams, the value returned by ftell() isn't necessarily meaningful
    > except as an argument to fseek() (7.21.9.4p2).


    I know all this, thank you, but I have no indication that the OP is in
    any of these cases. I just gave the OP a track to follow. He/she would
    surely come back with another (and maybe more precise) question if
    things turn out to be more difficult.

    > A POSIX environment makes more guarantees -- but as long as you're
    > depending on POSIX, there's no good reason not to use stat()
    > (or fstat() or lstat()).


    Keeping the file open is a good reason to not use stat() (see code
    above).

    > My point is that you suggested ftell() as a solution to the OP's
    > problem. It isn't.


    OK, call it a hint if you want...

    >>> and what is "level 2"?

    >>
    >> Unixism. Just ignore it if it hurts you.

    >
    > If you're referring to section 2 of the manual (lseek(2),
    > documentation available via "man 2 lseek"), I've never heard of
    > that being referred to as "level 2".


    OK now you have. I didn't think it was that hard to understand, given
    that lseek was two words away.

    > Try being a little less condescending and actually answering the
    > question.


    Try being a little less condescending and actually helping people asking
    for help instead of being pedantic (your words) on size units and
    everything but the OP's problem.

    -- Alain.
    Alain Ketterlin, Nov 10, 2012
    #13
  14. felix

    Philip Lantz Guest

    Alain Ketterlin wrote:
    > Keith Thompson <> writes:
    > > Alain Ketterlin <-strasbg.fr> writes:
    > >> Keith Thompson <> writes:
    > >>> Alain Ketterlin <-strasbg.fr> writes:
    > >>>> felix <> writes:
    > >>>>> if ( stat ( logFile, &results ) == 0 )
    > >>>>
    > >>>> Use ftell() instead (or lseek() at level 2).
    > >>> and what is "level 2"?
    > >>
    > >> Unixism. Just ignore it if it hurts you.

    > >
    > > If you're referring to section 2 of the manual (lseek(2),
    > > documentation available via "man 2 lseek"), I've never heard of
    > > that being referred to as "level 2".

    >
    > OK now you have. I didn't think it was that hard to understand, given
    > that lseek was two words away.


    I didn't understand it either--I thought you meant to use a value of 2
    for whence, and I thought it was a strange way to express that. It never
    occurred to me you were referring to section 2 of the Unix Programmer's
    Manual.
    Philip Lantz, Nov 10, 2012
    #14
  15. In article <-september.org>,
    Philip Lantz <> wrote:
    ...
    >I didn't understand it either--I thought you meant to use a value of 2
    >for whence, and I thought it was a strange way to express that. It never
    >occurred to me you were referring to section 2 of the Unix Programmer's
    >Manual.


    Well, live & learn!

    See, the Usenet can be a helpful thing after all.

    --
    Just for a change of pace, this sig is *not* an obscure reference to
    comp.lang.c...
    Kenny McCormack, Nov 10, 2012
    #15
  16. felix

    James Kuyper Guest

    On 11/09/2012 03:33 PM, Keith Thompson wrote:
    > Alain Ketterlin <-strasbg.fr> writes:
    >> Keith Thompson <> writes:
    >>> Alain Ketterlin <-strasbg.fr> writes:
    >>>> felix <> writes:
    >>>>> if ( stat ( logFile, &results ) == 0 )
    >>>>
    >>>> Use ftell() instead (or lseek() at level 2).
    >>>
    >>> What makes you think that will yield better results,

    >>
    >> What makes you think it will not. At least ftell() is C.

    >
    > We know that stat() isn't working as the OP expects. I'd expect
    > ftell() to yield exactly the same results -- and it requires opening
    > the file and seeking to the end of it. ...


    Opening the file and seeking to the end is not a problem; it's already
    open, or at least it was at the time of the last write, and the current
    write position was presumably at the end of the file, otherwise it
    wouldn't be growing. I'd expect ftell() to give a better indication of
    the bytes written than stat()=>st_size, since I wouldn't expect the
    value returned by ftell() to be affected by issues such as buffering.
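
A small experiment along these lines can show the difference. The sketch below is hypothetical (the function name, the 64 KiB buffer and the 'x' filler bytes are all assumptions made for the demo): it writes n bytes to a fully buffered stream, then records both ftell() and st_size before anything has been flushed.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical demo: write n bytes to a fully buffered stream and report
 * what ftell() and stat() each see before the stream is flushed.
 * Returns 0 on success, -1 on error. */
static int compare_stat_and_ftell(const char *path, size_t n,
                                  long long *stat_size, long *tell_pos)
{
    static char buf[1 << 16];   /* 64 KiB buffer handed to stdio */
    struct stat results;
    FILE *fp = fopen(path, "wb");
    size_t i;

    if (fp == NULL) {
        return -1;
    }
    /* Full buffering: data reaches the file only when the buffer fills
     * or the stream is flushed or closed. */
    if (setvbuf(fp, buf, _IOFBF, sizeof buf) != 0) {
        fclose(fp);
        return -1;
    }
    for (i = 0; i < n; i++) {
        if (fputc('x', fp) == EOF) {
            fclose(fp);
            return -1;
        }
    }

    *tell_pos = ftell(fp);              /* position as the stream sees it */
    if (stat(path, &results) != 0) {    /* size as the file system sees it */
        fclose(fp);
        return -1;
    }
    *stat_size = (long long)results.st_size;

    fclose(fp);
    return 0;
}
```

With n = 100, ftell() should report 100, while st_size on typical POSIX systems is likely still 0 because the data has not yet left the stdio buffer; the exact st_size observed depends on the C library.
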
    --
    James Kuyper
    James Kuyper, Nov 10, 2012
    #16
  17. James Kuyper <> writes:
    > On 11/09/2012 03:33 PM, Keith Thompson wrote:
    >> Alain Ketterlin <-strasbg.fr> writes:
    >>> Keith Thompson <> writes:
    >>>> Alain Ketterlin <-strasbg.fr> writes:
    >>>>> felix <> writes:
    >>>>>> if ( stat ( logFile, &results ) == 0 )
    >>>>>
    >>>>> Use ftell() instead (or lseek() at level 2).
    >>>>
    >>>> What makes you think that will yield better results,
    >>>
    >>> What makes you think it will not. At least ftell() is C.

    >>
    >> We know that stat() isn't working as the OP expects. I'd expect
    >> ftell() to yield exactly the same results -- and it requires opening
    >> the file and seeking to the end of it. ...

    >
    > Opening the file and seeking to the end is not a problem; it's already
    > open, or at least it was at the time of the last write, and the current
    > write position was presumably at the end of the file, otherwise it
    > wouldn't be growing. I'd expect ftell() to give a better indication of
    > the bytes written than stat()=>st_size, since I wouldn't expect the
    > value returned by ftell() to be affected by issues such as buffering.


    Ok, good point.

    That's assuming that the program that's querying the size of the file
    is the same one that's writing to the file, and that the querying
    code has access to the relevant FILE*. That's not entirely obvious
    from the original post, but it seems likely.

    (As I said earlier, C doesn't guarantee that the value returned by
    ftell() is meaningful for text streams, but that's unlikely to be
    an issue for the OP.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Nov 10, 2012
    #17
  18. felix

    James Kuyper Guest

    On 11/09/2012 10:24 PM, Keith Thompson wrote:
    > James Kuyper <> writes:

    ...
    >> Opening the file and seeking to the end is not a problem; it's already
    >> open, or at least it was at the time of the last write, and the current
    >> write position was presumably at the end of the file, otherwise it
    >> wouldn't be growing. I'd expect ftell() to give a better indication of
    >> the bytes written than stat()=>st_size, since I wouldn't expect the
    >> value returned by ftell() to be affected by issues such as buffering.

    >
    > Ok, good point.
    >
    > That's assuming that the program that's querying the size of the file
    > is the same one that's writing to the file, and that the querying
    > code has access to the relevant FILE*. That's not entirely obvious
    > from the original post, but it seems likely.


    The behavior that's controlled by the results of this query is the
    creation of a new log file, so it never even occurred to me to consider
    that the log file might be being written by some other program.
    --
    James Kuyper
    James Kuyper, Nov 10, 2012
    #18
  19. felix

    Philip Lantz Guest

    Kenny McCormack wrote:
    >
    > In article <-september.org>,
    > Philip Lantz <> wrote:
    > ...
    > >I didn't understand it either--I thought you meant to use a value of 2
    > >for whence, and I thought it was a strange way to express that. It never
    > >occurred to me you were referring to section 2 of the Unix Programmer's
    > >Manual.

    >
    > Well, live & learn!
    >
    > See, the Usenet can be a helpful thing after all.


    What am I supposed to have learned? Is the term "level" commonly used to
    identify a section of the Unix Programmer's Manual?
    Philip Lantz, Nov 10, 2012
    #19
  20. felix

    Eric Sosman Guest

    On 11/9/2012 10:24 PM, Keith Thompson wrote:
    >[...]
    > (As I said earlier, C doesn't guarantee that the value returned by
    > ftell() is meaningful for text streams, but that's unlikely to be
    > an issue for the OP.)


    If all the logging happens in one place (or a small
    number of nearby places), and if it all happens during one
    execution of the program, the O.P. can do the entire job in
    purely portable C. Something like

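    #include <stdarg.h>
    #include <stdio.h>

    static FILE *logStream;
    static size_t logLength;

    void writeLog(const char *format, ...) {
        if (logStream == NULL) {
            logStream = openLog(...);
            logLength = 0;
        }

        va_list ap;
        va_start(ap, format);
        /* vfprintf() takes the stream as its first argument and returns
           a negative value on error, so only successful writes are counted. */
        int n = vfprintf(logStream, format, ap);
        va_end(ap);
        if (n > 0) {
            logLength += (size_t)n;
        }

        /* Close the log once it passes LIMIT; the next call reopens it. */
        if (logLength > LIMIT) {
            closeLog(logStream);
            logStream = NULL;
        }
    }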

    ... should do it, along with a little error-checking and such.

    (Okay, okay: The number of characters written to a stream is
    not necessarily the same thing as the number of bytes stored on
    a disk. Nonetheless, for "start a new log when the old one gets
    too big" purposes it should be close enough.)

    --
    Eric Sosman
    Eric Sosman, Nov 10, 2012
    #20