How to get file count under a directory?

rockdale

Hi,

I have an application that writes log files. If the log file
size is greater than, let's say, 1 MB, the application creates a new log
file with a sequence number. The log file names look like
mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt, ... without an upper
limit.

Now the problem is that if my application gets restarted, I need to know
the largest sequence number of my log files. I am thinking of
a loop from 1 to something like 100000 that checks whether each file
exists; the first number without a file gives me the max sequence number
I need. But this method looks
very awkward. Is there another way to do this (get the max number for a
series of similar files)?

My application runs on Windows but does not use MFC
functions very much.

Thanks in advance
-Rockdale
 
Victor Bazarov

rockdale said:
I have an application that writes log files. If the log file
size is greater than, let's say, 1 MB, the application creates a new log
file with a sequence number. The log file names look like
mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt, ... without an upper
limit.

Now the problem is that if my application gets restarted, I need to know
the largest sequence number of my log files. I am thinking of
a loop from 1 to something like 100000 that checks whether each file
exists; the first number without a file gives me the max sequence number
I need. But this method looks
very awkward. Is there another way to do this (get the max number for a
series of similar files)?

Yes, and it's platform-specific. You can probably obtain a list of (or
enumerate) the files whose names fit a certain pattern, like "log_*.*",
and then find your largest number (the part the '*' stands for)...

rockdale said:
My application runs on Windows but does not use MFC
functions very much.

Try posting to a relevant newsgroup from 'microsoft.public.*' hierarchy
where Windows platform-specific stuff is discussed.

V
 
Sjouke Burry

Step 100 at a time until you go past the last one,
then step 1 at a time through the last partial block.
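
A minimal sketch of this two-pass search follows. It assumes the existing files are numbered 1..n without gaps; the function name last_in_sequence and the idea of passing the existence check as a predicate are choices made for illustration, not something from the post.

```cpp
#include <functional>

// Hypothetical helper: returns the largest n >= 1 for which exists(n) is true,
// or 0 if exists(1) is false. Only correct if the existing numbers form a
// gap-free run 1..n.
unsigned long last_in_sequence(const std::function<bool(unsigned long)>& exists,
                               unsigned long stride = 100)
{
    if (stride == 0) stride = 1;  // a zero stride would never advance

    unsigned long known = 0;  // highest number known to exist (0 = none yet)

    // Coarse pass: jump one stride at a time while the landing spot exists.
    while (exists(known + stride)) known += stride;

    // Fine pass: walk one by one through the last partial block.
    while (exists(known + 1)) ++known;

    return known;
}
```

The predicate would typically build the candidate file name from the number and test it with something like std::filesystem::exists. With a stride of 100, finding number 250 takes 3 coarse probes and 51 fine probes instead of 251.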
 
mzdude

Well, for starters you can create a simple text file that contains the
next number in your log sequence. Every time you increment
your log file number, update the text file.

Then it's simply a matter of opening the file and reading the number.
Which operating system (Windows, Linux, ...) or library (MFC,
Boost, ...) you are using is irrelevant.


NextNumber.txt
1234
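
The counter-file idea can be sketched with nothing but standard streams. The function names read_next_number and write_next_number are made up for this illustration, and treating a missing or unreadable file as "start at 1" is an assumption; an application might prefer to fall back to scanning the directory in that case.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads the next sequence number from the counter file.
// Returns 1 if the file is missing, empty, or does not start with a positive number.
unsigned long read_next_number(const std::string& counter_path)
{
    std::ifstream in(counter_path);
    unsigned long n = 0;
    if (in >> n && n > 0) return n;
    return 1;
}

// Hypothetical helper: overwrites the counter file with `n`.
// Returns false if the file could not be written.
bool write_next_number(const std::string& counter_path, unsigned long n)
{
    std::ofstream out(counter_path, std::ios::trunc);
    out << n << '\n';
    out.flush();
    return static_cast<bool>(out);
}
```

On start-up the application would call read_next_number("NextNumber.txt"), and after every switch to a new log file it would call write_next_number with the incremented value. The counter can drift out of sync if someone deletes or edits it by hand, which is the price of not looking at the directory.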
 
Marcel Müller

rockdale said:
I have an application that writes log files. If the log file
size is greater than, let's say, 1 MB, the application creates a new log
file with a sequence number. The log file names look like
mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt, ... without an upper
limit.

Don't do that.

Use a time stamp and a naming convention that follows a canonical
sort order, e.g. mylogfile_yyyy-mm-dd_hh-mm-ss.txt. The people who must
service your application will greatly appreciate it. Furthermore, you should
prefer UTC time stamps for logging to avoid confusion with daylight saving time.
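
As a rough illustration of the naming scheme suggested above, the following sketch formats a UTC time stamp with the standard C library. The function name log_name_utc is an assumption; the pattern itself is the one from the post.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: builds "mylogfile_yyyy-mm-dd_hh-mm-ss.txt" for the
// given time, expressed in UTC. Returns an empty string on failure.
std::string log_name_utc(std::time_t when)
{
    // Note: std::gmtime returns a pointer to shared static storage,
    // so this is not safe to call from several threads at once.
    const std::tm* utc = std::gmtime(&when);
    if (!utc) return std::string();

    char buf[64];
    if (std::strftime(buf, sizeof buf, "mylogfile_%Y-%m-%d_%H-%M-%S.txt", utc) == 0)
        return std::string();
    return buf;
}
```

Calling log_name_utc(std::time(nullptr)) gives the name for "now". Because every field has a fixed width and the fields run from most to least significant, a plain string sort of the names is also a chronological sort.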

rockdale said:
Now the problem is that if my application gets restarted, I need to know
the largest sequence number of my log files.

Either always create a new log file when the application is restarted, or
drop the size limit and use a time limit instead. I would
recommend the latter. If your application is under heavy load, the files
grow larger. What's bad about that?

From the service point of view, it is a big advantage to have a
deterministic relation between the file name (in effect something like a
primary key) and the content. And it is even better if the canonical
ordering of the file names corresponds to their logical order.

rockdale said:
I am thinking of
a loop from 1 to something like 100000 that checks whether each file
exists; the first number without a file gives me the max sequence number
I need.

From that you can see how bad the idea is. Everyone who searches for a
certain entry has to do the same loop, whether program or human.
In fact, in this case you have absolutely no advantage over putting all
logs of a day into a single file.

rockdale said:
But this method looks
very awkward. Is there another way to do this (get the max number for a
series of similar files)?

No. And since most file systems do not maintain a defined sort order,
there is no cheaper solution in general. You could scan the entire
directory content, but that is of the same order of cost.

rockdale said:
My application runs on Windows but does not use MFC
functions very much.

That makes no difference here.

Rotating logs with a fixed time slice are straightforward to
implement, even in case of application restarts. You could use a
simple and fast hash function on the time stamp to control log file
switches. Every time the hash changes, a virtual method that switches the
log could be invoked. Only this method implements the full rendering of
the file name scheme.
This makes it very easy, and cheap at run time, to implement different
cycle times, e.g. once per week, once per day, or once per hour.

And if you are even smarter, you could add functionality that cleans up
old logs automatically once they exceed a configured age. This prevents
the common problem of full volumes.
Again, a fixed relation between the file name and the content is helpful.
All you have to do is calculate the file name that corresponds to now
minus a configured period and delete all files in the folder whose names
compare less than this name and match the pattern of your log files,
e.g. mylogfile_*.txt. You have to neither touch their content nor
parse their names.
Unfortunately, this will always be O(n), so it should not be invoked too
often (e.g. once a day).


Marcel
 
Suraj

Well for starters you can create simple text file to contain the
next numeric number in your log sequence. Every time you increment
your log file number, update the text file.

Then it's simply a matter of opening the file and reading the number.
Which operating system (Windows, Linux, ...) or library (MFC,
Boost, ...) you are using is irrelevant.

NextNumber.txt
  1234

It may be "for starters", but we have been using a similar
technique for years in the product I work on. Maintaining a file
that contains the current sequence number is what we do.

The log files are named LogFile_SeqNo.txt (LogFile_1.txt and so
on), and we maintain a file called CurrentSeqNo.txt that contains the
current sequence number.
The log is written to the file with the current sequence number.

If the application restarts (or even Windows, for that matter), the
application tries to write to the file with the current sequence number.
If the file exceeds a particular size, a new file is created with a
new sequence number, and the new sequence number is written to
CurrentSeqNo.txt.

Best Regards,
Suraj
 
robertwessel2

Marcel Müller said:
Don't do that.

Use a time stamp and a naming convention that follows a canonical
sort order, e.g. mylogfile_yyyy-mm-dd_hh-mm-ss.txt. The people who must
service your application will greatly appreciate it. Furthermore, you should
prefer UTC time stamps for logging to avoid confusion with daylight saving time.


Depending on what the log file is logging, a useful alternative is to
generate log file names from the application's startup time *plus* a
unique identifier (like a sequence number). Especially if your
application handles something along the lines of sessions, which may
show up logged in other places, a name like
"yyyymmdd-hhmmss-TypeOfLog-nnn.log" may make associating the various bits
back together easier.
 
Marcel Müller

robertwessel2 said:
Depending on what the log file is logging, a useful alternative is to
generate log file names from the application's startup time *plus* a
unique identifier (like a sequence number). Especially if your
application handles something along the lines of sessions, which may
show up logged in other places, a name like
"yyyymmdd-hhmmss-TypeOfLog-nnn.log" may make associating the various bits
back together easier.

Usually I use dedicated columns in the log for session identification.
This preserves the strict event sequence, so potential concurrency
issues can be tracked even if the time stamps are not accurate enough.
A viewer can filter on the session column; in the simplest case, grep
will do. Merging different logs is more complicated.
However, a set of different files can be useful too. E.g. Samba uses
this kind of session-specific log file.


Marcel
 
James Kanze

Marcel Müller said:
Don't do that.
Use a time stamp and a naming convention that follows a
canonical sort order, e.g. mylogfile_yyyy-mm-dd_hh-mm-ss.txt.
The people who must service your application will greatly
appreciate it. Furthermore, you should prefer UTC time stamps for
logging to avoid confusion with daylight saving time.

That sounds like a good idea. I'm used to putting the date in
the logfile name, and using a sequential number (with a fixed
number of digits, so a straight sort will put them in order),
but using the time does sound better.

Marcel Müller said:
Either always create a new log file when the application is
restarted, or drop the size limit and use a time limit
instead. I would recommend the latter. If your application is
under heavy load, the files grow larger. What's bad about that?

Files that are too large are hard to read and to manipulate.
Depending on the application, a time limit might either result
in an occasional file which is awkwardly large, or a lot of very
small files.

That doesn't mean that you should forgo using time completely.
If there are particular moments when the application is largely
quiescent, those are good times to rotate the log; it reduces
the probability of a sequence which interests someone spanning
two different files. (Ideally, of course, the files should be
small enough so that the reader can easily concatenate two of
them, in cases where what interests him spans a rotation.)

Marcel Müller said:
From the service point of view, it is a big advantage to have a
deterministic relation between the file name (in effect
something like a primary key) and the content. And it is even
better if the canonical ordering of the file names corresponds to
their logical order.

From that you can see how bad the idea is. Everyone who searches
for a certain entry has to do the same loop, whether
program or human. In fact, in this case you have absolutely no
advantage over putting all logs of a day into a single file.

The readers can do a binary search. For that matter, so could
the program. (But again depending on the application, there may
be so few files that it isn't worth it.)
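
One way a program could do the binary search mentioned above is to double an upper bound until it overshoots and then bisect. This is only a sketch: it assumes the existing files are numbered 1..n without gaps, and the name last_by_bisection and the predicate parameter are made up for illustration.

```cpp
#include <functional>

// Hypothetical helper: returns the largest n >= 1 for which exists(n) is true,
// or 0 if exists(1) is false. Only correct for a gap-free run 1..n.
unsigned long last_by_bisection(const std::function<bool(unsigned long)>& exists)
{
    if (!exists(1)) return 0;

    unsigned long lo = 1;  // known to exist
    unsigned long hi = 2;  // candidate for "does not exist"

    // Doubling phase: grow hi until it is past the end.
    // (Overflow of hi is ignored; it is not reachable with realistic file counts.)
    while (exists(hi)) {
        lo = hi;
        hi *= 2;
    }

    // Bisection phase: invariant is exists(lo) && !exists(hi).
    while (hi - lo > 1) {
        const unsigned long mid = lo + (hi - lo) / 2;
        if (exists(mid)) lo = mid; else hi = mid;
    }
    return lo;
}
```

For n existing files this needs on the order of 2*log2(n) existence checks, so about 20 probes for a thousand files. As noted above, with only a handful of files the simple loop is just as good.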

Marcel Müller said:
No. And since most file systems do not maintain a defined sort
order, there is no cheaper solution in general. You could
scan the entire directory content, but that is of the same
order of cost.

That makes no difference here.

Rotating logs with a fixed time slice are straightforward
to implement, even in case of application
restarts. You could use a simple and fast hash function on the
time stamp to control log file switches.

You don't even need that. On program start-up, it's easy to
calculate the last rotation time from current time; just open
that file for append. There is some argument, however, for
always opening a new log file on program start-up.
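
Calculating the last rotation time from the current time can be as simple as rounding down to a multiple of the rotation period. The sketch below assumes that std::time_t counts seconds since the epoch (true on POSIX systems and Windows, but not guaranteed by the C++ standard) and that the period is positive; the function names and the file name pattern are illustrative choices.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: rounds `now` down to the start of its rotation period.
// Periods are aligned to the epoch, so a weekly period starts on Thursdays
// (1970-01-01 was a Thursday).
std::time_t last_rotation_time(std::time_t now, long period_seconds)
{
    return now - (now % period_seconds);
}

// Hypothetical helper: name of the log file that covers `now`, in UTC.
// Returns an empty string on failure.
std::string current_log_name(std::time_t now, long period_seconds)
{
    const std::time_t start = last_rotation_time(now, period_seconds);
    const std::tm* utc = std::gmtime(&start);  // shared static storage, not thread-safe
    if (!utc) return std::string();

    char buf[64];
    if (std::strftime(buf, sizeof buf, "mylogfile_%Y-%m-%d_%H-%M-%S.txt", utc) == 0)
        return std::string();
    return buf;
}
```

On start-up the application would open current_log_name(std::time(nullptr), 3600) in append mode (std::ios::app) for an hourly rotation, and no directory scan or counter file is needed.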

Marcel Müller said:
Every time the hash changes, a virtual method that switches the
log could be invoked. Only this method implements the full
rendering of the file name scheme.
This makes it very easy, and cheap at run time, to implement
different cycle times, e.g. once per week, once per day, or
once per hour.
And if you are even smarter, you could add functionality that
cleans up old logs automatically once they exceed a configured
age. This prevents the common problem of full volumes.

This is usually done by means of a cronjob (or whatever it is
called under Windows---it surely exists), using a fairly simple
script. Typically, the log files will go through a stage where
they are compressed, before being completely deleted. (E.g.
compress anything older than a day, and delete anything older
than a week.)
 
Jorgen Grahn

rockdale said:
I have an application that writes log files. If the log file
size is greater than, let's say, 1 MB, the application creates a new log
file with a sequence number. The log file names look like
mylogfile_mmddyy_1.txt, mylogfile_mmddyy_2.txt, ... without an upper
limit. ....
My application runs on Windows but does not use MFC
functions very much.

If you hadn't been on Windows, I would have suggested that you use the
standard mechanism of your OS: on the Linuxes I am aware of, drop a
suitable configuration file in /etc/logrotate.d/ and stop worrying.

(I recently had to handle a 3GB log file because someone thought it
would be fun to reinvent that wheel, badly.)

/Jorgen
 
Rune Allnor

On Sep 28, 9:18 pm, Marcel Müller wrote:

That sounds like a good idea.  I'm used to putting the date in
the logfile name, and using a sequential number (with a fixed
number of digits, so a straight sort will put them in order),
but using the time does sound better.

Simple, effective, and still perfectly possible to screw up.

Once upon a time, the company I worked for requested that some logs
produced by a software system be tagged by time instead of by a
running index. The patch we got wrote the timestamps in
a format more or less like this (I never got around to actually
decoding it):

printf("log-file-%d%d%d%d%d%d",
year,month,day,hour,minute,second);

Which was useless to us (why?).

Rune
 
Francesco S. Carta

Rune Allnor said:
Simple, effective, and still perfectly possible to screw up.

Once upon a time, the company I worked for requested that some logs
produced by a software system be tagged by time instead of by a
running index. The patch we got wrote the timestamps in
a format more or less like this (I never got around to actually
decoding it):

printf("log-file-%d%d%d%d%d%d",
year,month,day,hour,minute,second);

Which was useless to us (why?).

Is that a rhetorical question? That format is impossible to decode
unambiguously!
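
To make the ambiguity concrete, the sketch below puts the quoted format next to a fixed-width variant. The function names are invented for illustration, and the '-' between date and time in the fixed-width version is an added choice, not something from the patch being discussed.

```cpp
#include <cstdio>
#include <string>

// The format from the anecdote: fields are printed without padding,
// so the field boundaries cannot be recovered from the name.
std::string ambiguous_name(int year, int month, int day,
                           int hour, int minute, int second)
{
    char buf[128];
    std::snprintf(buf, sizeof buf, "log-file-%d%d%d%d%d%d",
                  year, month, day, hour, minute, second);
    return buf;
}

// Fixed-width, zero-padded fields: every name has the same layout,
// and a plain string sort is also a chronological sort.
std::string fixed_width_name(int year, int month, int day,
                             int hour, int minute, int second)
{
    char buf[128];
    std::snprintf(buf, sizeof buf, "log-file-%04d%02d%02d-%02d%02d%02d",
                  year, month, day, hour, minute, second);
    return buf;
}
```

With the unpadded format, 12 January 2009 03:04:05 and 2 November 2009 03:04:05 both come out as log-file-2009112345. With padding they become log-file-20090112-030405 and log-file-20091102-030405.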

Heck, I normally lose a bit of hope of making a living from coding with
every single day that passes, but reading code like the above gives
my optimism a decisive boost.

If the coder who wrote that patch was getting paid, somehow, that
means that I still have a chance ;-)

Your recollection made my day, thanks a lot.

Have a good time,
Francesco
 
