Analysis of Java application logs


Nigel Wade

I tend to use at least two logging streams: debugging and operational. I
leave debugging statements in production code: it's normally off (of
course) but can be turned on if needed.

There is one caveat to leaving debug logging in production code; it may
affect performance. Even with output disabled the string arguments are
still constructed, unless they are constant. If logging is located in a
tight loop, or critical section, it might become significant.
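
To illustrate, here is a minimal sketch using java.util.logging (the class and the data are invented for the example):

import java.util.logging.Level;
import java.util.logging.Logger;

public class DebugCostDemo {
    private static final Logger LOG = Logger.getLogger(DebugCostDemo.class.getName());

    void process(int[] samples) {
        for (int i = 0; i < samples.length; i++) {
            // Even when FINE output is disabled, this line still builds the
            // message string (and does the int-to-String conversions) before
            // the call is made and the result thrown away.
            LOG.fine("sample[" + i + "] = " + samples[i]);

            // Guarding the call skips the argument construction entirely
            // when the level is off.
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("sample[" + i + "] = " + samples[i]);
            }
        }
    }
}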

In one particular application I profiled I found this to be eating up
80% of the CPU time. Admittedly, this was a rather special case, but
it's still something which ought to be borne in mind if you leave debug
logging code in production software.
 

Martin Gregorie

There is one caveat to leaving debug logging in production code; it may
affect performance. Even with output disabled the string arguments are
still constructed, unless they are constant. If logging is located in a
tight loop, or critical section, it might become significant.
Fair point. However, my usual debugging statement takes the form:

if (debug > 0)
    debugger.trace(result + " = method(" + arg + ")");

Ugly, I know, but quite efficient, since when debugging is off even the
cost of the method call is avoided. I use an integer to control debugging
rather than a boolean so I can control its volume: "java Application -dd"
would be expected to provide more detailed debugging output than "java
Application -d".
In one particular application I profiled I found this to be eating up
80% of the CPU time. Admittedly, this was a rather special case, but
it's still something which ought to be borne in mind if you leave debug
logging code in production software.
Agreed.

On occasion I've used a circular buffer to accumulate tracing information
which only gets dumped if an exception occurs. This is great if you're
chasing rarely occurring problems in a high-volume, long-running process,
since it obviates searching through megabytes of tracing info to find the
20-30 lines of relevant tracing. You'd expect this too to be quite
expensive to run, so I'd use the same mechanism outlined earlier to
ensure that switching tracing off means the circular buffer is never
filled, rather than merely suppressing the buffer dump operation.
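
Something along these lines (a sketch, not actual code from any of the applications mentioned - class and method names are invented, and a real version would need thread-safety):

import java.util.ArrayDeque;
import java.util.Deque;

/** Keeps only the last N trace lines in memory; writes nothing unless asked. */
public class TraceRing {
    private final int capacity;
    private final boolean enabled;
    private final Deque<String> ring;

    public TraceRing(int capacity, boolean enabled) {
        this.capacity = capacity;
        this.enabled = enabled;
        this.ring = new ArrayDeque<>(capacity);
    }

    public void trace(String line) {
        if (!enabled) {
            return;                 // tracing off: the buffer is never even filled
        }
        if (ring.size() == capacity) {
            ring.removeFirst();     // drop the oldest entry
        }
        ring.addLast(line);
    }

    /** Called from a catch block when the rare failure finally happens. */
    public void dump(java.io.PrintStream out) {
        for (String line : ring) {
            out.println(line);
        }
        ring.clear();
    }
}

A catch block around the main processing loop would then call ring.dump(System.err), so you only ever see the 20-30 lines leading up to the failure.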

That said, I used to administer an OS (ICL's George 3) that continuously
traced its internal operations to a fine-grained and a coarse-grained
circular buffer which could be dumped after a crash and still managed to
run fast enough to be a very usable OS.
 

Lew

Martin said:
Fair point. However, my usual debugging statement takes the form:

if (debug > 0)
    debugger.trace(result + " = method(" + arg + ")");

Ugly, I know, but quite efficient, since when debugging is off even the
cost of the method call is avoided. I use an integer to control debugging
rather than a boolean so I can control its volume: "java Application -dd"
would be expected to provide more detailed debugging output than "java
Application -d".

What's wrong with

if ( logger.isDebugEnabled() ) ...

or

if ( logger.getLevel().isGreaterOrEqual( Logger.DEBUG )) ...
?
 

Tom Anderson

I'm looking for an approach to the problem of analyzing application
log files.

I need to analyse Java log files from applications (i.e., not logs of
web servers). These logs contain Java exceptions, thread dumps, and
free-form log4j messages issued by log statements inserted by
programmers during development. Right now, these man-made log entries
do not have any specific format.

What I'm looking for is a tool and/or strategy that supports lexing/parsing,
tagging, and analysing the log entries. Because there is little defined
syntax and grammar - and because you might not know what you are looking
for - the task requires quickly issuing queries against the log database.
Some sort of visualization would be nice, too.

Pointers to existing tools and approaches, as well as appropriate
tools/algorithms to develop the required system, would be welcome.

grep.

For particularly thorny bits of analysis, sed.

There are those who advocate awk for this sort of thing, but frankly that
seems like overkill.

tom
 

Martin Gregorie

What's wrong with

if ( logger.isDebugEnabled() ) ...

or

if ( logger.getLevel().isGreaterOrEqual( Logger.DEBUG )) ...
?
Not a lot, though that does involve method call overheads that you may
not want in a tight loop.

I initially created my simple ReportError logging class for Java 1.3,
before the Logger class was introduced, and, as it does what I need, have
seen no need to replace it. It was written to provide similar Java
functionality to a set of C functions that I'd been using for many years.

Its trace() methods are roughly equivalent to Logger.log(Level, String)
and Logger.log(Level, String, Object[]) with all output sent to stderr.
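
Something of that shape might look like this (a sketch only - the real ReportError class isn't posted in this thread, so the level handling and formatting are guesses; only the trace-to-stderr behaviour and the two method shapes come from the description above):

import java.text.MessageFormat;

public class ReportError {
    private final int debugLevel;   // 0 = silent, higher values = more verbose

    public ReportError(int debugLevel) {
        this.debugLevel = debugLevel;
    }

    /** Roughly Logger.log(Level, String): a plain message, gated by an int level. */
    public void trace(int level, String message) {
        if (debugLevel >= level) {
            System.err.println(message);
        }
    }

    /** Roughly Logger.log(Level, String, Object[]): a message pattern plus arguments. */
    public void trace(int level, String pattern, Object[] args) {
        if (debugLevel >= level) {
            System.err.println(MessageFormat.format(pattern, args));
        }
    }
}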
 

Martin Gregorie

There are those who advocate awk for this sort of thing, but frankly
that seems like overkill.
Depends on your background, I suppose.

I was using FIND 2, NCC Filetab and (shudder) RPG3 long before I met awk,
so I was familiar with the sort of scriptable file processing utilities
that have a built-in record reading loop and a means of triggering
actions that match particular record types. As a result I felt right at
home with awk, which is about the easiest to use of all the ones I've
tried.

Besides, I very often find I need to calculate maxima, minima, averages,
etc as part of the log analysis and find that very easy to do with awk.
 

Lew

Not a lot, though that does involve method call overheads that you may
not want in a tight loop.

Seems like it's a simple equality test that would be HotSpotted away.
I initially created my simple ReportError logging class for Java 1.3,
before the Logger class was introduced, and, as it does what I need, have
seen no need to replace it. It was written to provide similar Java
functionality to a set of C functions that I'd been using for many years.

Its trace() methods are roughly equivalent to Logger.log(Level, String)
and Logger.log(Level, String, Object[]) with all output sent to stderr.
 

Daniele Futtorovic

There is one caveat to leaving debug logging in production code; it
may affect performance. Even with output disabled the string
arguments are still constructed, unless they are constant. If logging
is located in a tight loop, or critical section, it might become
significant.

In one particular application I profiled I found this to be eating
up 80% of the CPU time. Admittedly, this was a rather special case,
but it's still something which ought to be borne in mind if you leave
debug logging code in production software.

Use a print-eff'ing logger and you'll have next to no overhead. If
evaluating the arguments is too heavy, use the ugly
if( logger.isXXXEnabled() ){}
jump.
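
Assuming "print-eff'ing" means a printf-style, parameterized-message API, a minimal sketch using SLF4J as one example of such a logger (the class and the expensive helper are invented for illustration):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ParamLoggingDemo {
    private static final Logger LOG = LoggerFactory.getLogger(ParamLoggingDemo.class);

    int method(int arg) {
        int result = arg * 2;

        // The {} placeholders are only substituted if DEBUG output is enabled,
        // so the message string is never built when the level is off.
        LOG.debug("{} = method({})", result, arg);

        // If even computing an argument is expensive, keep the explicit guard.
        if (LOG.isDebugEnabled()) {
            LOG.debug("state = {}", expensiveStateDump());
        }
        return result;
    }

    private String expensiveStateDump() {
        return "...";   // stands in for something genuinely costly to compute
    }
}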
 

Jim Gibson

Not a lot, though that does involve method call overheads that you may
not want in a tight loop.

Seems like it's a simple equality test that would be HotSpotted away.

My understanding of the log4j and java.util.logging.Logger classes is that
each logger exists within a hierarchy of loggers, and that to determine
whether or not a specific logger call will produce output, the library
must test each level, starting at the specific logger and traversing
upwards in the hierarchy.

So using a static variable as below to enable and disable logging
output can save some execution time:

static final boolean debug = false;

if (debug) {
    logger.debug(...);
}

Whether or not that is significant depends upon the situation, of
course.
 

Lew

Jim Gibson said:

My understanding of the log4j and java.util.logging.Logger classes is that
each logger exists within a hierarchy of loggers, and that to determine
whether or not a specific logger call will produce output, the library
must test each level, starting at the specific logger and traversing
upwards in the hierarchy.

So using a static variable as below to enable and disable logging
output can save some execution time:

static final boolean debug = false;

if (debug) {
    logger.debug(...);
}

Whether or not that is significant depends upon the situation, of
course.

All this theory indicates what to measure in any real situation, but to echo
your "of course", you won't know what to conclude prior to that measurement.

Now why you'd *ever* want to put a log statement inside a tight,
performance-critical loop in the first place is a whole 'nother question.
 

Nigel Wade

All this theory indicates what to measure in any real situation, but to
echo your "of course", you won't know what to conclude prior to that
measurement.

In my specific example above I resolved the "problem" by using my own
class which did pretty much exactly as Lew suggested in his follow-up
post. When I subsequently profiled the code, the execution time and CPU
time required by the remaining method invocations were negligible.

Now why you'd *ever* want to put a log statement inside a tight,
performance-critical loop in the first place is a whole 'nother question.

In my case the loop was reading messages from a network and generating
responses. The logging code was to log the message received, and the
response generated, during development. At this stage it wasn't deemed
to be performance critical, but was essential to verify correct receipt
of the network message.

It wasn't until much later that I was told that the response had to be
returned in less than 40 µs (a fact that wasn't in the original
requirements). So I profiled the code to find out where optimisation
ought to be targeted. I still wanted to keep the logging in the code for
debugging purposes, since it was still under active development, but be
able to disable it (and its overhead) for testing.
 

Ulrich Scholz

The solution was pretty simple, and it's dynamic. [...] We had to [...]
wrap log4j calls with a plethora of methods that would result in messages
formatted to our liking, but after that the heavy lifting was and is
done: it's now up to the clients - *not* to the developers - to request
what gets logged and in what manner.

Thanks for your answer. Your approach sounds reasonable.

But what exactly do you mean by "wrap log4j calls"?
Couldn't you just use different log4j appenders, one for each client?
The appenders would then (i) decide whether to log a given message for a
particular client and (ii) format it accordingly.

Ulrich
 

Arved Sandstrom

The solution was pretty simple, and it's dynamic. [...] We had to [...]
wrap log4j calls with a plethora of methods that would result in messages
formatted to our liking, but after that the heavy lifting was and is
done: it's now up to the clients - *not* to the developers - to request
what gets logged and in what manner.

Thanks for your answer. Your approach sounds reasonable.

But what exactly do you mean by "wrap log4j calls"?
Couldn't you just use different log4j appenders, one for each client?
The appenders would then (i) decide whether to log a given message for a
particular client and (ii) format it accordingly.

Ulrich

Using log4j as an example, if you look at the API for PatternLayout,
you've got a bunch of conversion specifiers, including %m for the
application-supplied message. The wrapping that we did, including those
functional categories I mentioned, is all about formatting the
application-supplied message. This included appropriate formatting of
exception stack traces and so forth.

We certainly also had different log4j conversion specifier combinations
in use for different appenders; the "wrapping" was purely for formatting
the application message, because that is opaque to log4j.
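
One possible shape for such a wrapper (a sketch, not the actual project code - the category prefix, class name, and formatting choices are invented; only the idea of formatting the %m content, stack traces included, before handing it to log4j comes from the description above):

import java.io.PrintWriter;
import java.io.StringWriter;
import org.apache.log4j.Logger;

public final class AppLog {
    private AppLog() {}

    public static void error(Logger logger, String category, String msg, Throwable t) {
        StringBuilder sb = new StringBuilder();
        sb.append('[').append(category).append("] ").append(msg);
        if (t != null) {
            StringWriter sw = new StringWriter();
            t.printStackTrace(new PrintWriter(sw));
            // The formatted stack trace becomes part of the application message,
            // i.e. part of %m as far as the PatternLayout is concerned.
            sb.append(System.lineSeparator()).append(sw.toString());
        }
        logger.error(sb.toString());
    }
}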

AHS
 

Michael Wojcik

Daniele said:
I'd call it (what I described): audit logging. I don't know if the
meaning of that term normally extends beyond databases, but I don't see
why it shouldn't.

It's also used widely in security contexts.

Most of the products I work on have multiple logging targets, for
problem determination (which includes debugging), system history,
auditing, and other operational uses. Sometimes these use different
mechanisms, because they have different requirements - for example,
security audit logs need to be protected, insofar as that's possible,
from tampering by unauthorized users.
 

Michael Wojcik

Martin said:
Besides, I very often find I need to calculate maxima, minima, averages,
etc as part of the log analysis and find that very easy to do with awk.

Yes, sometimes it's useful to maintain state while analyzing logs, and
that's more convenient with awk (or any of a thousand other scripting
languages) than it is with sed.

A couple of months back I tracked down some tricky locking issues
(multiple multi-threaded processes, with multiple components written
by multiple developers in multiple languages) by analyzing lock files
with an awk script; it made it easy to keep track of contention,
reentrant locking, number of waiters, etc on a per-lock basis.

Back in '91 I demonstrated a bug in AIX 3.1 libc's implementation of
free by using awk to turn a malloc/free log into a series of cut-down
C programs that duplicated the allocation access pattern, until I had
a minimal test program that demonstrated the problem.

Come to think of it, just a few weeks ago I was using a little awk
script to churn through the output from my various
sentiment-determination UIMA annotators[1] and compute precision,
recall, and F1 values.

Obviously, I could do any of those tasks in Perl, or Python, or Java,
or COBOL, or what have you. I only use awk because I've been using it
for more than 20 years now and so it has the advantage of familiarity.


[1] ObCLJP: The annotators are written in Java.
 

Lawrence D'Oliveiro

Besides, I very often find I need to calculate maxima, minima, averages,
etc as part of the log analysis and find that very easy to do with awk.

perl can do everything awk can do, and more, and do it just as concisely.
 

Lew

perl can do everything awk can do, and more, and do it just as concisely.

Assembler can do anything perl or awk can do, and do it much more efficiently,
Lawrence.

Just out of curiosity, Lawrence, what's the percentage for you to continually
try to trigger these Language Wars? What do you get out of it, Lawrence?

Especially since you know as well as everyone here that perl and awk are toy
languages compared to the One True Language, right, Lawrence?

Come on, Lawrence, the gauntlet has been thrown, Lawrence. Let's see what
you've got, Lawrence. The One True Language vs. perl and awk combined in the
tag-team Match of the Century(r) (tm) (k). Are you man enough to take it,
Lawrence?

I advise you to just give up now, Lawrence, and face the fact that the One
True Language is so far and away superior to awk, perl, and yes, I'm sorry,
Lawrence, but even Python - despite your Parseltongue fluency, the One True
Language rules, Lawrence, and your favorite languages drool, Lawrence.

So, Lawrence, man up and respond, Lawrence, as the world awaits your wisdom.

Lawrence.
 
Hi,

I'm looking for an approach to the problem of analyzing application
log files.

I need to analyse Java log files from applications (i.e., not logs of
web servers). These logs contain Java exceptions, thread dumps, and
free-form log4j messages issued by log statements inserted by
programmers during development. Right now, these man-made log entries
do not have any specific format.
I'm using OtrosLogViewer (http://code.google.com/p/otroslogviewer/).
You can mark log events with different colors (http://code.google.com/p/otroslogviewer/wiki/MarkingEvents) and add notes to them (http://code.google.com/p/otroslogviewer/wiki/Notes).
What I'm looking for is a tool and/or strategy that supports lexing/parsing,
tagging, and analysing the log entries. Because there is little defined
syntax and grammar - and because you might not know what you are looking
for - the task requires quickly issuing queries against the log database.
Some sort of visualization would be nice, too.
You can also write rules for automatically marking log events (based on regex or Java code): http://code.google.com/p/otroslogviewer/wiki/CreatingStringOrRegularExpressionMarkers
 

Krzysztof

On Monday, 23 May 2011 at 09:50:54 UTC+2, Ulrich Scholz wrote:
Hi,

I'm looking for an approach to the problem of analyzing application
log files.

I need to analyse Java log files from applications (i.e., not logs of
web servers). These logs contain Java exceptions, thread dumps, and
free-form log4j messages issued by log statements inserted by
programmers during development. Right now, these man-made log entries
do not have any specific format.

What I'm looking for is a tool and/or strategy that supports lexing/parsing,
tagging, and analysing the log entries. Because there is little defined
syntax and grammar - and because you might not know what you are looking
for - the task requires quickly issuing queries against the log database.
Some sort of visualization would be nice, too.

Pointers to existing tools and approaches, as well as appropriate
tools/algorithms to develop the required system, would be welcome.

Ulrich

I'm using OtrosLogViewer (http://code.google.com/p/otroslogviewer) with batch processing (https://code.google.com/p/otroslogviewer/wiki/BatchProcessing). The following method of your log-processing logic is called when a log event is parsed:
logDataParsed(LogData data, BatchProcessingContext context);
Additionally, you are notified when processing is started and finished.

Krzysztof.
 

Arne Vajhøj

On Monday, 23 May 2011 at 09:50:54 UTC+2, Ulrich Scholz wrote:

I'm using OtrosLogViewer (http://code.google.com/p/otroslogviewer) with batch processing (https://code.google.com/p/otroslogviewer/wiki/BatchProcessing). The following method of your log-processing logic is called when a log event is parsed:
logDataParsed(LogData data, BatchProcessingContext context);
Additionally, you are notified when processing is started and finished.

If another posting of yours is to be believed then you are not
a happy user but the author of that software.

Honesty please!

Arne
 
