How to get file count under a directory?

Discussion in 'C++' started by rockdale, Sep 28, 2009.

  1. rockdale

    rockdale Guest

    Hi,

    I have an application that writes log files. If the log file size
    is greater than, let's say, 1M, the application creates a new log
    file with a sequence number. The log file names look like
    mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt, ... with no upper
    limit.

    Now the problem is that when my application gets restarted, I need to
    know the largest sequence number among my log files. I am thinking of
    looping from 1 to something like 100000 and checking whether each file
    exists; the first one that does not exist gives me the max sequence
    number I need. But this method looks very awkward. Is there another
    way to do this (get the max number for a series of similar files)?

    My application runs on the Windows platform but does not use MFC
    functions very much.

    Thanks in advance
    -Rockdale
    rockdale, Sep 28, 2009
    #1

  2. rockdale wrote:
    > I have an application which writes log files out. If then log file
    > size is great than let's say 1M, the application will create a new log
    > file with sequence number. the log file format likes
    > mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt. ....without upper
    > limit.
    >
    > Now the problem is if my application get restarted, I need to know
    > what is the largest sequence number of my log file. I am thinking in
    > a loop from 1 to like 100000, check if the file exist, if it does
    > not , then I get the max sequence number I need. But this method looks
    > very awkward. Is there another way to do this(get the max number for a
    > series of similar files)?


    Yes, and it's platform-specific. You can probably obtain a list of (or
    enumerate) the files whose name fits a certain pattern, like "log_*.*",
    and then find your largest number (behind the '*')...

    > My applicaiton is running on windows platform but did not using MFC
    > function very much.


    Try posting to a relevant newsgroup from 'microsoft.public.*' hierarchy
    where Windows platform-specific stuff is discussed.

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, Sep 28, 2009
    #2

  3. Sjouke Burry Guest

    rockdale wrote:
    > Hi,
    >
    > I have an application which writes log files out. If then log file
    > size is great than let's say 1M, the application will create a new log
    > file with sequence number. the log file format likes
    > mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt. ....without upper
    > limit.
    >
    > Now the problem is if my application get restarted, I need to know
    > what is the largest sequence number of my log file. I am thinking in
    > a loop from 1 to like 100000, check if the file exist, if it does
    > not , then I get the max sequence number I need. But this method looks
    > very awkward. Is there another way to do this(get the max number for a
    > series of similar files)?
    >
    > My applicaiton is running on windows platform but did not using MFC
    > function very much.
    >
    > Thanks in advance
    > -Rockdale

    Step 100 at a time until you go past the last one,
    then step 1 at a time through the last partial block.
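
    A rough sketch of the two passes, assuming the sequence has no gaps
    (every number up to the maximum exists). The name
    last_sequence_stepped and the exists callback are illustrative; in
    the application the callback would test whether the file with that
    number is present.

```cpp
#include <functional>

// Returns the largest n >= 1 for which exists(n) is true, or 0 if even
// exists(1) is false. Assumes exists(k) holds for every k up to the maximum.
unsigned long last_sequence_stepped(
    const std::function<bool(unsigned long)>& exists,
    unsigned long step = 100)
{
    if (!exists(1)) return 0;

    unsigned long known = 1;                      // highest number known to exist
    while (exists(known + step)) known += step;   // coarse pass, `step` at a time
    while (exists(known + 1)) ++known;            // fine pass through the last block
    return known;
}
```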
    Sjouke Burry, Sep 28, 2009
    #3
  4. mzdude Guest

    On Sep 28, 12:50 pm, rockdale <> wrote:
    > Hi,
    >
    > I have an application which writes log files out. If then log file
    > size is great than let's say 1M, the application will create a new log
    > file with sequence number. the log file format likes
    > mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt. ....without upper
    > limit.
    >
    > Now the problem is if my application get restarted, I need to know
    > what is the largest sequence number  of my log file. I am thinking in
    > a loop from 1 to like 100000, check if the file exist, if it does
    > not , then I get the max sequence number I need. But this method looks
    > very awkward. Is there another way to do this(get the max number for a
    > series of similar files)?
    >
    > My applicaiton is running on windows platform but did not using MFC
    > function very much.
    >


    Well, for starters, you can create a simple text file to contain the
    next number in your log sequence. Every time you increment
    your log file number, update the text file.

    Then it's simply a matter of opening the file and reading the number.
    Which operating system (Windows, Linux, ...) or library (MFC,
    Boost, ...) you are using is irrelevant.


    NextNumber.txt
    1234
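
    A small sketch of reading and updating such a counter file with
    plain iostreams. The function names are made up for illustration,
    and starting at 1 when the file is missing is an assumption.

```cpp
#include <fstream>
#include <string>

// Reads the next sequence number from the counter file.
// Falls back to 1 if the file is missing, empty, or holds 0.
unsigned long read_next_number(const std::string& counter_path)
{
    std::ifstream in(counter_path);
    unsigned long n = 0;
    if (in >> n && n > 0) return n;
    return 1;
}

// Overwrites the counter file with `n`; returns false if the write failed.
bool write_next_number(const std::string& counter_path, unsigned long n)
{
    std::ofstream out(counter_path, std::ios::trunc);
    out << n << '\n';
    out.flush();
    return static_cast<bool>(out);
}
```

    If the application dies between creating a new log file and updating
    the counter, the two get out of step, so a directory scan may still
    be worth keeping as a fallback.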
    mzdude, Sep 28, 2009
    #4
  5. Juha Nieminen, Sep 28, 2009
    #5
  6. Hi,

    rockdale wrote:
    > I have an application which writes log files out. If then log file
    > size is great than let's say 1M, the application will create a new log
    > file with sequence number. the log file format likes
    > mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt. ....without upper
    > limit.


    Don't do that.

    Use a time stamp and a naming convention that follows a canonical
    sort order, e.g. mylogfile_yyyy-mm-dd_hh-mm-ss.txt. The people who must
    service your application will appreciate it greatly. Furthermore, you
    should prefer UTC time stamps for logging to avoid confusion with
    daylight saving time.
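
    As an illustration, such a name can be produced with the standard
    C time functions. The function name log_file_name is an assumption;
    the "mylogfile_" prefix and the field layout follow the example
    above.

```cpp
#include <ctime>
#include <string>

// Builds a name like "mylogfile_2009-09-28_16-50-14.txt" from a UTC time.
// Returns an empty string if the time cannot be converted.
std::string log_file_name(std::time_t when)
{
    // Note: std::gmtime returns a pointer to shared static storage,
    // so this sketch is not safe to call from several threads at once.
    const std::tm* utc = std::gmtime(&when);
    if (!utc) return {};

    char stamp[32];
    if (std::strftime(stamp, sizeof stamp, "%Y-%m-%d_%H-%M-%S", utc) == 0)
        return {};
    return std::string("mylogfile_") + stamp + ".txt";
}
```

    Because every field has a fixed width and the most significant field
    comes first, a plain alphabetical sort of the names is also a
    chronological sort.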

    > Now the problem is if my application get restarted, I need to know
    > what is the largest sequence number of my log file.


    Either always create a new log when the application is restarted, or
    drop the size limit and use a time limit instead. I would
    recommend the latter. If your application is under heavy load, the files
    grow larger. What's wrong with that?

    From the service point of view it is a big advantage to have a
    deterministic relation between the file name (in fact something like a
    primary key) and the content. And it is even better if the canonical
    file name ordering corresponds to their logical order.


    > I am thinking in
    > a loop from 1 to like 100000, check if the file exist, if it does
    > not , then I get the max sequence number I need.


    From that you can see how bad the idea is. Everyone who searches for a
    certain entry has to do the same loop, whether program or human.
    In fact, in this case you have absolutely no advantage over putting all
    logs of a day into a single file.

    > But this method looks
    > very awkward. Is there another way to do this(get the max number for a
    > series of similar files)?


    No. And since most file systems do not maintain a defined sort order,
    there is no cheaper solution in general. You could scan the entire
    directory content, but that has the same order of complexity.

    > My applicaiton is running on windows platform but did not using MFC
    > function very much.


    That makes no difference here.

    Rotating logs with a fixed time slice are straightforward to
    implement, even in the case of application restarts. You could use a
    simple and fast hash function on the time stamp that controls log file
    switches. Every time the hash changes, a virtual method that switches
    the log could be invoked. Only this method implements the full rendering
    of the file name scheme.
    This makes it very easy to implement different cycle times with good
    performance, e.g. once per week, once per day and once per hour.
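
    One possible reading of that "hash" is simply the time stamp divided
    by the slice length. The sketch below assumes times are given as
    non-negative whole seconds since the epoch; the class name
    SliceTracker and its members are invented for illustration.

```cpp
// Tracks which fixed-length time slice the log currently belongs to.
// Times are non-negative seconds since the epoch (UTC).
class SliceTracker {
public:
    explicit SliceTracker(long long slice_seconds)
        : slice_seconds_(slice_seconds) {}

    // Returns true when `now` falls into a different slice than on the
    // previous call (or on the first call), i.e. when the caller should
    // switch to a new log file.
    bool needs_switch(long long now)
    {
        const long long slice = now / slice_seconds_;
        if (has_slice_ && slice == current_slice_) return false;
        current_slice_ = slice;
        has_slice_ = true;
        return true;
    }

    // Start of the slice containing `now`; usable as the time stamp in
    // the file name, also after a restart in the middle of a slice.
    long long slice_start(long long now) const
    {
        return now - now % slice_seconds_;
    }

private:
    long long slice_seconds_;
    long long current_slice_ = 0;
    bool has_slice_ = false;
};
```

    The per-message cost is one division and one comparison; the file
    name is rendered only when needs_switch returns true. Slices computed
    this way are aligned to the epoch, so a weekly slice does not start
    on a Monday.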

    And if you are even smarter, you could add functionality that cleans up
    old logs automatically once they exceed a configured age. This prevents
    the common problem of full volumes.
    Again, a fixed relation between the file name and the content is helpful.
    All you have to do is calculate the file name that corresponds to now
    minus a configured period and delete all files in the folder whose names
    compare less than this name and match the pattern of your log files,
    e.g. mylogfile_*.txt. You have to neither touch their content nor
    parse their names.
    Unfortunately, this will always be O(n), so it should not be invoked too
    often (e.g. once a day).
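
    The selection step could look roughly like this. It works on a list
    of names that has already been read from the folder; the function
    name expired_logs and its parameters are assumptions for the sketch.

```cpp
#include <string>
#include <vector>

// Returns the names that match prefix*suffix and sort strictly before
// `cutoff_name`, the name that corresponds to "now minus the period".
std::vector<std::string> expired_logs(const std::vector<std::string>& names,
                                      const std::string& prefix,
                                      const std::string& suffix,
                                      const std::string& cutoff_name)
{
    std::vector<std::string> expired;
    for (const std::string& name : names) {
        const bool long_enough = name.size() >= prefix.size() + suffix.size();
        if (long_enough &&
            name.compare(0, prefix.size(), prefix) == 0 &&
            name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0 &&
            name < cutoff_name) {
            expired.push_back(name);  // old enough and one of our log files
        }
    }
    return expired;
}
```

    The plain string comparison is only correct because the time stamp
    fields are fixed-width and ordered from year down to second.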


    Marcel
    Marcel Müller, Sep 28, 2009
    #6
  7. Suraj Guest

    On Sep 28, 11:27 pm, mzdude <> wrote:
    > On Sep 28, 12:50 pm, rockdale <> wrote:
    >
    >
    >
    > > Hi,

    >
    > > I have an application which writes log files out. If then log file
    > > size is great than let's say 1M, the application will create a new log
    > > file with sequence number. the log file format likes
    > > mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt. ....without upper
    > > limit.

    >
    > > Now the problem is if my application get restarted, I need to know
    > > what is the largest sequence number  of my log file. I am thinking in
    > > a loop from 1 to like 100000, check if the file exist, if it does
    > > not , then I get the max sequence number I need. But this method looks
    > > very awkward. Is there another way to do this(get the max number for a
    > > series of similar files)?

    >
    > > My applicaiton is running on windows platform but did not using MFC
    > > function very much.

    >
    > Well for starters you can create simple text file to contain the
    > next numeric number in your log sequence. Every time you increment
    > your log file number, update the text file.
    >
    > Then it's simply a matter of opening and reading the number. The
    > which Operating System (windows, linux, ..) or library (mfc,
    > boost, ...)
    > you are using is irrelevant.
    >
    > NextNumber.txt
    >   1234


    It may be "for starters", but we have been using a similar
    technique for years in the product I work on. Maintaining a file
    that contains the current sequence number is what we do.

    The log files have names of the form LogFile_SeqNo.txt (LogFile_1.txt
    and so on), and we maintain a file called CurrentSeqNo.txt which
    contains the current sequence number.
    The log is written to the file with the current sequence number.

    If the application restarts, or even Windows for that matter, the
    application tries to write to the file with the current sequence number.
    If the file exceeds a particular size, a new file is created with a
    new sequence number, and the new sequence number is written to
    CurrentSeqNo.txt.

    Best Regards,
    Suraj
    Suraj, Sep 28, 2009
    #7
  8. rockdale

    Guest

    On Sep 28, 3:18 pm, Marcel Müller <>
    wrote:
    > Hi,
    >
    > rockdale wrote:
    > > I have an application which writes log files out. If then log file
    > > size is great than let's say 1M, the application will create a new log
    > > file with sequence number. the log file format likes
    > > mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt. ....without upper
    > > limit.

    >
    > don't do that.
    >
    > Use a time stamp and use a naming convention that follows a canonical
    > sort order. E.g. mylogfile_yyyy-mm-dd_hh-mm-ss.txt. The guys that must
    > service your application will appreciate greatly. Furthermore you should
    > prefer UTC time stamps for logging to avoid confusion with daylight saving.



    Depending on what the log file is logging, a useful alternative is to
    generate log file names with the application's startup time, *plus* a
    unique identifier (like a sequence number). Especially if your
    application handles something along the lines of sessions, which may
    show up logged in other places, a name like
    "yyyymmdd-hhmmss-TypeOfLog-nnn.log" may make associating the various
    bits back together easier.
    , Sep 28, 2009
    #8
  9. wrote:
    > Depending on what the log file is logging, a useful alternative is to
    > generate log file names with the application's startup time, *plus* a
    > unique identifier (lie a sequence number). Especially if your
    > applications handles something along the lines of sessions, which may
    > show up logged in other places, then a name like "yyyymmdd-hhmmss-
    > TypeOfLog-nnn.log" may make associating the various bits back together
    > easier.


    Usually I use dedicated columns in the log for session identification.
    This preserves the strict event sequence, so potential concurrency
    issues can be tracked even if the time stamps are not accurate enough.
    A viewer can filter on that column, in the simplest case with grep.
    Merging different logs is more complicated.
    However, a set of different files can be useful too. E.g. Samba uses
    this kind of session-specific log file.


    Marcel
    Marcel Müller, Sep 28, 2009
    #9
  10. James Kanze Guest

    On Sep 28, 9:18 pm, Marcel Müller <>
    wrote:
    > rockdale wrote:
    > > I have an application which writes log files out. If then
    > > log file size is great than let's say 1M, the application
    > > will create a new log file with sequence number. the log
    > > file format likes mylogfile_mmddyy_1.txt,
    > > mylogfile_mmddyy_2.txt. ....without upper limit.


    > don't do that.


    > Use a time stamp and use a naming convention that follows a
    > canonical sort order. E.g. mylogfile_yyyy-mm-dd_hh-mm-ss.txt.
    > The guys that must service your application will appreciate
    > greatly. Furthermore you should prefer UTC time stamps for
    > logging to avoid confusion with daylight saving.


    That sounds like a good idea. I'm used to putting the date in
    the logfile name, and using a sequential number (with a fixed
    number of digits, so a straight sort will put them in order),
    but using the time does sound better.

    > > Now the problem is if my application get restarted, I need
    > > to know what is the largest sequence number of my log file.


    > Either create always a new log if the application gets
    > restarted or forbear from the size limit and use a time limit
    > instead. I would recommend the latter. If your application is
    > under heavy load the files grow larger. What's bad with that?


    Files that are too large are hard to read and to manipulate.
    Depending on the application, a time limit might either result
    in an occasional file which is awkwardly large, or a lot of very
    small files.

    That doesn't mean that you should forego using time completely.
    If there are particular moments when the application is largely
    quiescent, those are good times to rotate the log; it reduces
    the probability of a sequence which interests someone spanning
    two different files. (Ideally, of course, the files should be
    small enough so that the reader can easily concatenate two of
    them, in cases where what interests him spans a rotation.)

    > From the service point of view it is a big advantage to have a
    > deterministic relation between the file name (in fact
    > something like a primary key) and the content. And it is even
    > better if the canonical file name ordering corresponds to
    > their logical order.


    > > I am thinking in a loop from 1 to like 100000, check if the
    > > file exist, if it does not , then I get the max sequence
    > > number I need.


    > From that you see how bad the idea is. Everyone who searches
    > for a certain entry has to do the same loop, regardless if
    > program or human. In fact you have absolutely no advantage
    > over putting all logs of a day into a single file in this
    > case.


    The readers can do a binary search. For that matter, so could
    the program. (But again depending on the application, there may
    be so few files that it isn't worth it.)
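
    For the program side, a sketch of such a search: double the probe
    until a number is missing, then bisect between the last hit and the
    first miss. It assumes the sequence has no gaps; the name
    last_sequence_bisect and the exists callback are illustrative only.

```cpp
#include <functional>

// Returns the largest n >= 1 for which exists(n) is true, or 0 if even
// exists(1) is false. Assumes exists(k) holds for every k up to the maximum.
unsigned long last_sequence_bisect(
    const std::function<bool(unsigned long)>& exists)
{
    if (!exists(1)) return 0;

    unsigned long lo = 1;  // known to exist
    unsigned long hi = 2;  // candidate upper bound
    while (exists(hi)) {   // double until we overshoot the last file
        lo = hi;
        hi *= 2;
    }

    // Invariant: exists(lo) is true and exists(hi) is false.
    while (hi - lo > 1) {
        const unsigned long mid = lo + (hi - lo) / 2;
        if (exists(mid)) lo = mid;
        else hi = mid;
    }
    return lo;
}
```

    The number of existence checks grows with the logarithm of the
    highest sequence number rather than with the number itself.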

    > > But this method looks very awkward. Is there another way to
    > > do this(get the max number for a series of similar files)?


    > No. And since most file systems do not maintain a defined sort
    > ordering, there is no cheaper solution in general. You could
    > scan the entire directory content, but this is in the same
    > order.


    > > My applicaiton is running on windows platform but did not
    > > using MFC function very much.


    > That makes no difference here.


    > Using rotating logs with a fixed time slice is straight
    > forward to implement, although in case of application
    > restarts. You could use a simple and fast hash function on the
    > time stamp, that controls log file switches.


    You don't even need that. On program start-up, it's easy to
    calculate the last rotation time from current time; just open
    that file for append. There is some argument, however, for
    always opening a new log file on program start-up.

    > Every time the hash changes a virtual method that switches the
    > log could be invoked. Only his method implements the full
    > rendering of the file name scheme.
    > This makes it very easy and with good performance to implement
    > different cycle times, e.g once per week, once per day and
    > once per hour.


    > And if you are even smarter you could add a functionality that
    > cleans up old log automatically once they exceed a configured
    > age. This prevents from the common issue of full volumes.


    This is usually done by means of a cronjob (or whatever it is
    called under Windows---it surely exists), using a fairly simple
    script. Typically, the log files will go through a stage where
    they are compressed, before being completely deleted. (E.g.
    compress anything older than a day, and delete anything older
    than a week.)

    --
    James Kanze
    James Kanze, Sep 29, 2009
    #10
  11. Jorgen Grahn Guest

    On Mon, 28 Sep 2009 09:50:14 -0700 (PDT), rockdale <> wrote:
    > Hi,
    >
    > I have an application which writes log files out. If then log file
    > size is great than let's say 1M, the application will create a new log
    > file with sequence number. the log file format likes
    > mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt. ....without upper
    > limit.

    ....
    > My applicaiton is running on windows platform but did not using MFC
    > function very much.


    If you hadn't been on Windows, I would have suggested you use the
    standard mechanism of your OS -- on the Linuxes I am aware of, drop a
    suitable configuration file in /etc/logrotate.d/ and stop worrying.

    (I recently had to handle a 3GB log file because someone thought it
    would be fun to reinvent that wheel, badly.)

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Sep 30, 2009
    #11
  12. Rune Allnor Guest

    On 29 Sep, 09:45, James Kanze <> wrote:
    > On Sep 28, 9:18 pm, Marcel Müller <>
    > wrote:


    > > Use a time stamp and use a naming convention that follows a
    > > canonical sort order. E.g. mylogfile_yyyy-mm-dd_hh-mm-ss.txt.
    > > The guys that must service your application will appreciate
    > > greatly. Furthermore you should prefer UTC time stamps for
    > > logging to avoid confusion with daylight saving.

    >
    > That sounds like a good idea.  I'm used to putting the date in
    > the logfile name, and using a sequential number (with a fixed
    > number of digits, so a straight sort will put them in order),
    > but using the time does sound better.


    Simple, effective, and still perfectly possible to screw up.

    Once upon a time the company I worked for requested that some logs
    produced by a software system be tagged by time instead of
    running index. The patch we got wrote the timestamps in
    a format more or less like this (I never got around to actually
    decoding it):

    printf("log-file-%d%d%d%d%d%d",
           year, month, day, hour, minute, second);

    Which was useless to us (why?).

    Rune
    Rune Allnor, Sep 30, 2009
    #12
  13. On 30 Sep, 21:04, Rune Allnor <> wrote:
    > On 29 Sep, 09:45, James Kanze <> wrote:
    >
    > > On Sep 28, 9:18 pm, Marcel Müller <>
    > > wrote:
    > > > Use a time stamp and use a naming convention that follows a
    > > > canonical sort order. E.g. mylogfile_yyyy-mm-dd_hh-mm-ss.txt.
    > > > The guys that must service your application will appreciate
    > > > greatly. Furthermore you should prefer UTC time stamps for
    > > > logging to avoid confusion with daylight saving.

    >
    > > That sounds like a good idea. I'm used to putting the date in
    > > the logfile name, and using a sequential number (with a fixed
    > > number of digits, so a straight sort will put them in order),
    > > but using the time does sound better.

    >
    > Simple, effective, and still perfectly possible to screw up.
    >
    > Once upon a time the company I worked for requested some logs
    > produced by a software system to be tagged by time instead of
    > running index. The patch we got wrote the timestamps on
    > a format more or less like (I never got around to actually
    > decode it)
    >
    > printf("log-file-%d%d%d%d%d%d",
    > year,month,day,hour,minute,second);
    >
    > Which was useless to us (why?).


    Is that a rhetorical question? That format is impossible to decode
    unambiguously!
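
    To make the ambiguity concrete, here is a small sketch comparing the
    unpadded format with a zero-padded one. Both function names are made
    up for the comparison, and the padded layout is only one possible
    fix.

```cpp
#include <cstdio>
#include <string>

// The format from the patch: no separators and no padding.
std::string unpadded_name(int y, int mo, int d, int h, int mi, int s)
{
    char buf[96];
    std::snprintf(buf, sizeof buf, "log-file-%d%d%d%d%d%d", y, mo, d, h, mi, s);
    return buf;
}

// Fixed-width fields: every name has the same length, can be decoded
// field by field, and sorts chronologically.
std::string padded_name(int y, int mo, int d, int h, int mi, int s)
{
    char buf[96];
    std::snprintf(buf, sizeof buf, "log-file-%04d%02d%02d%02d%02d%02d",
                  y, mo, d, h, mi, s);
    return buf;
}
```

    With the unpadded format, midnight on 12 January 2009 and midnight on
    2 November 2009 both come out as log-file-2009112000, so the later
    file would collide with the earlier one. The padded names are
    log-file-20090112000000 and log-file-20091102000000.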

    Heck, I normally lose a bit of hope of making a living from coding
    with every single day that passes, but reading code like the above
    gives my optimism a decisive boost.

    If the coder that wrote that patch was getting paid, somehow, that
    means that I still have a chance ;-)

    Your memory made my day, thanks a lot.

    Have a good time,
    Francesco
    --
    Francesco S. Carta, hobbyist
    http://fscode.altervista.org
    Francesco S. Carta, Oct 1, 2009
    #13