Analysis of Java application logs

Discussion in 'Java' started by Ulrich Scholz, May 23, 2011.

  1. Hi,

    I'm looking for an approach to the problem of analyzing application
    log files.

    I need to analyse Java log files from applications (i.e., not logs of
    web servers). These logs contain Java exceptions, thread dumps, and
    free-form log4j messages issued by log statements inserted by
    programmers during development. Right now, these man-made log entries
    do not have any specific format.

    What I'm looking for is a tool and/or strategy that supports lexing/
    parsing, tagging, and analysing the log entries. Because there is
    little defined syntax or grammar - and because you might not know
    what you are looking for - the task requires the quick issuing of
    queries against the log database. Some sort of visualization would be
    nice, too.

    Pointers to existing tools and approaches as well as appropriate tools/
    algorithms to develop the required system would be welcome.

    Ulrich
    Ulrich Scholz, May 23, 2011
    #1

  2. On 23 May, 09:50, Ulrich Scholz <> wrote:
    > I'm looking for an approach to the problem of analyzing application
    > log files.
    >
    > I need to analyse Java log files from applications (i.e., not logs of
    > web servers). These logs contain Java exceptions, thread dumps, and
    > free-form log4j messages issued by log statements inserted by
    > programmers during development. Right now, these man-made log entries
    > do not have any specific format.
    >
    > What I'm looking for is a tool and/or strategy that supports in lexing/
    > parsing, tagging, and analysing the log entries. Because there is only
    > little defined syntax and grammar - and because you might not know
    > what you are looking for - the task requires the quick issuing of
    > queries against the log data base. Some sort of visualization would be
    > nice, too.
    >
    > Pointers to existing tools and approaches as well as appropriate tools/
    > algorithms to develop the required system would be welcome.


    I once did a project for our Ruby Best Practices blog. The code is
    over on GitHub:
    https://github.com/rklemme/muppet-laboratories

    Explanations can be found in the blog. This is the first posting of
    the series:
    http://blog.rubybestpractices.com/posts/rklemme/005_Enter_the_Muppet_Laboratories.html

    This works differently from what you want: log files are read and
    written out to small log files according to particular criteria. But
    you could reuse the parsing part (including detection of multi-line
    log statements) and write what you found into a relational database.
    Once it is in the DB you can query at least by timestamp, log level
    and message content, and probably also by thread id and class. If you
    want to do custom tagging you could do that once the data is in the
    database.
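    A minimal Java sketch of the "parsing part" described above, assuming each entry starts with a log4j-style timestamp such as 2011-05-23 09:50:12,345 (ENTRY_START is a hypothetical layout; adjust it to yours). Any line that does not start with a timestamp, such as a stack-trace line, is appended to the previous entry, so each resulting string is one candidate row for the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Groups raw log lines into entries; continuation lines (e.g. stack traces) join the previous entry. */
class EntryGrouper {
    // Hypothetical entry-start layout: "yyyy-MM-dd HH:mm:ss,SSS " at the beginning of a line.
    static final Pattern ENTRY_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3} ");

    static List<String> group(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (ENTRY_START.matcher(line).find()) {
                if (current != null) {
                    entries.add(current.toString()); // previous entry is complete
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                // Continuation line: keep it with the entry it belongs to.
                current.append('\n').append(line);
            }
            // Lines before the first recognised entry are dropped in this sketch.
        }
        if (current != null) {
            entries.add(current.toString());
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "2011-05-23 09:50:12,345 INFO  [main] App - starting",
                "2011-05-23 09:50:13,001 ERROR [main] App - failed",
                "java.lang.IllegalStateException: boom",
                "\tat App.run(App.java:42)");
        for (String entry : group(lines)) {
            System.out.println("---\n" + entry);
        }
    }
}
```

    Lines that appear before the first timestamped line are discarded here; a real importer would probably want to log or count them instead.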

    Since we do not know what goal your analysis has and how many
    different questions you want to ask of the data, it's not entirely
    clear whether that would be the optimal approach for your problem. One
    variant of the above would be to give the parsing process a number
    of regular expressions, each with a label attached, and label all log
    entries during insertion into the database. But since modern relational
    databases usually also support full-text indexing and regular-expression
    matching, that might also be solved with a view. If your data volume
    is large you additionally need to make sure this remains
    efficient.
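    The labelling variant could look roughly like this: a table of (label, regular expression) pairs applied to each entry before insertion. The class name and the example rules are illustrative assumptions, not taken from the muppet-laboratories code.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Attaches labels to a log entry based on a table of (label, regex) rules. */
class EntryTagger {
    private final Map<String, Pattern> rules = new LinkedHashMap<>();

    EntryTagger addRule(String label, String regex) {
        rules.put(label, Pattern.compile(regex));
        return this;
    }

    /** Returns every label whose pattern occurs somewhere in the entry, in rule order. */
    Set<String> tag(String entry) {
        Set<String> labels = new LinkedHashSet<>();
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            if (rule.getValue().matcher(entry).find()) {
                labels.add(rule.getKey());
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Illustrative rules only.
        EntryTagger tagger = new EntryTagger()
                .addRule("exception", "(?m)^[\\w.$]+(Exception|Error)\\b")
                .addRule("timeout", "(?i)timed? ?out")
                .addRule("db", "(?i)\\b(jdbc|sql)\\b");
        // prints [exception, timeout, db]
        System.out.println(tagger.tag("ERROR query failed\njava.sql.SQLException: timeout"));
    }
}
```

    Storing the returned labels in a separate table keyed by entry id would keep them queryable alongside timestamp and level.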

    Kind regards

    robert
    Robert Klemme, May 23, 2011
    #2

  3. On 23/05/2011 09:50, Ulrich Scholz wrote:
    > Hi,
    >
    > I'm looking for an approach to the problem of analyzing application
    > log files.
    >
    > I need to analyse Java log files from applications (i.e., not logs of
    > web servers). These logs contain Java exceptions, thread dumps, and
    > free-form log4j messages issued by log statements inserted by
    > programmers during development. Right now, these man-made log entries
    > do not have any specific format.
    >
    > What I'm looking for is a tool and/or strategy that supports in lexing/
    > parsing, tagging, and analysing the log entries. Because there is only
    > little defined syntax and grammar - and because you might not know
    > what you are looking for - the task requires the quick issuing of
    > queries against the log data base. Some sort of visualization would be
    > nice, too.
    >
    > Pointers to existing tools and approaches as well as appropriate tools/
    > algorithms to develop the required system would be welcome.
    >
    > Ulrich

    At work, a colleague and I have developed such a tool (so it is not free).

    My colleague developed the viewer for the CSV files, using the
    JFreeChart library. The CSV files are time series (dates are, for
    example, in the format YYYY/MM/DD:HH:mm:ss).
    I developed my own parser that translates native logs => CSV files.
    For this I used Java regexp patterns.
    In a file, we have to find the beginning and the end of each
    record (a record can span multiple lines). I can
    exclude/include records with Java regexp patterns.

    We have to match the pattern of the date (a regexp plus a Java
    DateFormat pattern).
    For every record, we can extract useful values by pattern
    matching (I use two-pass matching to simplify the patterns); the
    values can be bound to a filter (an HTTP URL, for example).
    All this is embedded in Swing components.
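    jlp's tool is not public, so this is only a sketch of the two-pass idea under assumptions: the input is an Apache-style access-log line, pass one splits out the timestamp, request and status, pass two pulls the URL out of the request, and the output is one comma-separated row with the timestamp normalised to the YYYY/MM/DD:HH:mm:ss layout mentioned above. The class name is hypothetical, and field values are not CSV-quoted.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Converts one access-log record into a CSV time-series row: timestamp,url,status. */
class AccessLogToCsv {
    // Pass 1: coarse split into timestamp (offset ignored), quoted request and status code.
    private static final Pattern RECORD =
            Pattern.compile("\\[([^\\] ]+)[^\\]]*\\] \"([^\"]*)\" (\\d{3})");
    // Pass 2: pull the URL out of the request line ("GET /path HTTP/1.1").
    private static final Pattern URL = Pattern.compile("^\\S+ (\\S+)");

    private static final DateTimeFormatter IN =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss", Locale.ENGLISH);
    // Lower-case yyyy and dd on purpose: "YYYY" is week-based year, "DD" is day-of-year.
    private static final DateTimeFormatter OUT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd:HH:mm:ss");

    static Optional<String> toCsv(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.find()) {
            return Optional.empty(); // not a record this sketch understands
        }
        LocalDateTime ts = LocalDateTime.parse(m.group(1), IN);
        Matcher u = URL.matcher(m.group(2));
        String url = u.find() ? u.group(1) : "";
        return Optional.of(OUT.format(ts) + "," + url + "," + m.group(3));
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [23/May/2011:09:50:12 +0200] "
                + "\"GET /app/index.jsp HTTP/1.1\" 200 1234";
        // prints 2011/05/23:09:50:12,/app/index.jsp,200
        System.out.println(toCsv(line).orElse("(skipped)"));
    }
}
```

    When translating a layout like YYYY/MM/DD into java.time (or SimpleDateFormat) patterns, the lower-case letters matter: upper-case YYYY and DD mean week-based year and day of year.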

    I can parse access logs (Apache, Tomcat, WebLogic), log4j logs, verbose
    GC logs of the JVM (IBM JVM, OpenJDK 7, ...), Java thread dumps, Hibernate
    SQL logs, Tuxedo logs and, more generally, any records with implicit or
    explicit dates.
    Those are the main ideas...
    It took me a long time, and it is still in development... but we have not
    found any other tool.
    jlp, May 23, 2011
    #3
  4. Ulrich Scholz wrote:
    > I'm looking for an approach to the problem of analyzing application
    > log files.
    >
    > I need to analyse Java log files from applications (i.e., not logs of
    > web servers). These logs contain Java exceptions, thread dumps, and
    > free-form log4j messages issued by log statements inserted by
    > programmers during development. Right now, these man-made log entries
    > do not have any specific format.
    >
    > What I'm looking for is a tool and/or strategy that supports in lexing/
    > parsing, tagging, and analysing the log entries. Because there is only
    > little defined syntax and grammar - and because you might not know
    > what you are looking for - the task requires the quick issuing of
    > queries against the log data base. Some sort of visualization would be
    > nice, too.
    >
    > Pointers to existing tools and approaches as well as appropriate tools/
    > algorithms to develop the required system would be welcome.


    It helps if you have a logging strategy that mandates a consistent logging
    format, specific information in particular positions or marked by particular
    markup, logging levels and other such so that your analysis tool isn't faced
    with a completely open-ended input. What you describe requires a general
    text-analysis approach, as you indicate that you can make no guarantees about
    the format. Based on that, your best tool is "less" or an equivalent
    text-file reader.

    What is a tool supposed to do, read your mind?

    It's really hard to extract information from a garbage can where people just
    randomly dumped whatever they individually felt like dumping without regard
    for operational needs. You can't build a skyscraper on a bad foundation, and
    you can't build a good log analysis off a crappy log.

    Fix the logging system, then the analysis problem will be tractable.
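    One way to get a parser-friendly format of the kind Lew asks for, sketched with plain java.util.logging only so it needs no dependencies (log4j's PatternLayout addresses the same need): a formatter that always writes exactly one tab-separated line per record, escaping embedded tabs and newlines. The class name and the choice and order of fields are assumptions; a thrown exception is reduced to its toString(), so full stack traces would have to go to another target.

```java
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

/** Emits exactly one tab-separated line per record: time, level, logger, message. */
class OneLineFormatter extends Formatter {
    @Override
    public String format(LogRecord r) {
        String msg = formatMessage(r); // applies {0}-style parameters if present
        if (r.getThrown() != null) {
            msg += " | " + r.getThrown(); // exception class and message only, no stack trace
        }
        // Escape characters that would break the one-record-per-line guarantee.
        msg = msg.replace("\\", "\\\\").replace("\t", "\\t")
                 .replace("\n", "\\n").replace("\r", "\\r");
        return r.getInstant() + "\t" + r.getLevel().getName() + "\t"
                + r.getLoggerName() + "\t" + msg + System.lineSeparator();
    }

    public static void main(String[] args) {
        LogRecord r = new LogRecord(Level.WARNING, "payment {0} failed\nretrying");
        r.setLoggerName("com.example.billing"); // hypothetical logger name
        r.setParameters(new Object[] {"42"});
        System.out.print(new OneLineFormatter().format(r));
    }
}
```

    It would be installed with handler.setFormatter(new OneLineFormatter()) on whichever Handler writes the log meant for analysis.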

    --
    Lew
    Honi soit qui mal y pense.
    http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
    Lew, May 23, 2011
    #4
  5. CncShipper wrote:
    > I wrote one of these and thought about Open Sourcing it, but lost


    "open sourcing"

    > interest. I parsed the logs into a db, and assigned id's to the


    "DB", "IDs" (no greengrocer's apostrophe, and "id" is a different word from
    "ID", although the meaning you imputed by the substitution is poetic and
    interesting)

    > various fields.
    >
    > You could then search by Type, ( WARNING, SEVERE, etc... )


    "type"

    > You could search a range of times
    > It could handle multiple log files into one run
    > could Sync on an event and stop analyzing on another trigger


    "synch"

    > Graphs to count trends, events, exceptions


    "graphs"

    > Used Reg-Ex a heck of a lot of work.. Sorted all the transactions in


    "used" "regex"

    > the logs, so you could also display by package name, really helped
    > me solve a lot of problems when I was working .. took me nearly two
    > years to complete everything to where it is today..


    Double-dot, or two consecutive periods, is not legitimate punctuation in lieu
    of a comma or full stop.

    > I never found a package that even came close to it.. which is why I
    > wrote it


    You have made an important and useful point. Covering for a bad log format is
    a freeform-text parsing problem, inherently difficult and heuristic and
    probably never perfect. I wonder if your effort would have been better spent
    converting to a log format that is parser-friendly, as the OP should do.

    --
    Lew
    Honi soit qui mal y pense.
    http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
    Lew, May 23, 2011
    #5
  6. On 23/05/2011 17:43, Lew wrote:
    [SNIP]
    > You have made an important and useful point. Covering for a bad log
    > format is a freeform-text parsing problem, inherently difficult and
    > heuristic and probably never perfect. I wonder if your effort would have
    > been better spent converting to a log format that is parser-friendly, as
    > the OP should do.
    >

    I agree with you, Lee, it is what i did with my own tool. Native logs
    are converted into CSV files. But some logs are not simple to convert:
    - Java exceptions
    - Java thread dumps (different for every JVM: Sun/Oracle, JRockit,
    IBM ...)
    - Java heap dump summaries (same remark)
    - verbose GC logs (same remark)
    - multi-line log records (XML logs ...)

    Others are simpler:
    - access logs that are Common Log Format (CLF) or extended-CLF compliant
    (Apache, Tomcat, IIS, WebLogic, WebSphere ...)
    - Log4J
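    For the Java exception case, a rough sketch of pulling the exception chain out of a multi-line record, assuming the standard Throwable.printStackTrace() layout ("Caused by: ..." for nested causes). Lines with the "Exception in thread ..." prefix and JVM-specific thread-dump formats are not recognised; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts the exception chain (outermost first) from a multi-line Java stack trace. */
class StackTraceSummary {
    // Matches "com.foo.BarException: msg" or "Caused by: com.foo.BarException: msg" at line start.
    private static final Pattern EXCEPTION_LINE = Pattern.compile(
            "^(?:Caused by: )?((?:[\\w$]+\\.)+[\\w$]*(?:Exception|Error))(?::.*)?$",
            Pattern.MULTILINE);

    static List<String> exceptionChain(String entry) {
        List<String> chain = new ArrayList<>();
        Matcher m = EXCEPTION_LINE.matcher(entry);
        while (m.find()) {
            chain.add(m.group(1)); // fully qualified exception class name
        }
        return chain;
    }

    public static void main(String[] args) {
        String entry = String.join("\n",
                "java.lang.IllegalStateException: boom",
                "\tat App.run(App.java:42)",
                "Caused by: java.io.IOException: disk full",
                "\tat App.write(App.java:99)",
                "\t... 1 more");
        List<String> chain = exceptionChain(entry);
        // prints outer=java.lang.IllegalStateException root=java.io.IOException
        System.out.println("outer=" + chain.get(0) + " root=" + chain.get(chain.size() - 1));
    }
}
```

    The last element is the root cause, which is often the more useful value to store or group by.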
    jlp, May 23, 2011
    #6
  7. jlp wrote:
    > I agree with you, Lee, it is what i [sic] did with my own tool. Native logs are


    Who's Lee?

    --
    Lew
    The first-person singular pronoun in English is spelled "I", not "i". It's
    only one letter long, so it should be possible to spell it correctly. This is
    one of the first lessons in an EFL course, so it should come as no surprise.
    Lew, May 23, 2011
    #7
  8. On 23/05/2011 18:20, Lew allegedly wrote:
    > jlp wrote:
    >> I agree with you, Lee, it is what i [sic] did with my own tool. Native
    >> logs are
    >
    > Who's Lee?
    >


    You're Lee now, Lee.
    Daniele Futtorovic, May 23, 2011
    #8
  9. On 23/05/2011 15:11, Lew allegedly wrote:
    > Ulrich Scholz wrote:
    >> I'm looking for an approach to the problem of analyzing application
    >> log files.
    >>
    >> I need to analyse Java log files from applications (i.e., not logs of
    >> web servers). These logs contain Java exceptions, thread dumps, and
    >> free-form log4j messages issued by log statements inserted by
    >> programmers during development. Right now, these man-made log entries
    >> do not have any specific format.
    >>
    >> What I'm looking for is a tool and/or strategy that supports in lexing/
    >> parsing, tagging, and analysing the log entries. Because there is only
    >> little defined syntax and grammar - and because you might not know
    >> what you are looking for - the task requires the quick issuing of
    >> queries against the log data base. Some sort of visualization would be
    >> nice, too.
    >>
    >> Pointers to existing tools and approaches as well as appropriate tools/
    >> algorithms to develop the required system would be welcome.
    >
    > It helps if you have a logging strategy that mandates a consistent
    > logging format, specific information in particular positions or marked
    > by particular markup, logging levels and other such so that your
    > analysis tool isn't faced with a completely open-ended input. What you
    > describe requires a general text-analysis approach, as you indicate that
    > you can make no guarantees about the format. Based on that, your best
    > tool is "less" or equivalent text-file reader.
    >
    > What is a tool supposed to do, read your mind?
    >
    > It's really hard to extract information from a garbage can where people
    > just randomly dumped whatever they individually felt like dumping
    > without regard for operational needs. You can't build a skyscraper on a
    > bad foundation, and you can't build a good log analysis off a crappy log.
    >
    > Fix the logging system, then the analysis problem will be tractable.
    >


    I would argue along the same lines.

    A while ago I was faced with a situation where some orthogonal
    organisational unit wanted to exploit my logs. I told them to GTFO.

    My logs are my logs. I put in them what I consider necessary. I often
    improve them as I step through the code. I might change the message, fix
    the level, &c. I don't want to have them set in stone. Neither do I
    generally have enough confidence in them to allow them to be used for
    analysis.

    "The solution, then, is simple," I told them: "spec out the exact
    messages and arguments you want, and the exact situations you want them
    logged in, and I'll add them for you. But leave me my precious debugging
    logs."

    Let me emphasize: IMHO debugging logs and logs for analysis are two
    different things and should be kept strictly separated -- possibly
    each logged to a different target.
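    A minimal sketch of that split, using java.util.logging only to stay dependency-free (log4j appenders and its additivity flag serve the same purpose). The logger names com.example.app and audit.orders are hypothetical, and in-memory streams stand in for the separate files a real setup would use.

```java
import java.io.ByteArrayOutputStream;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

/** Keeps analysis ("audit") records and debugging records on separate targets. */
class SeparateTargets {
    /** Logs one debug and one audit message; returns {debugOutput, auditOutput}. */
    static String[] demo() {
        ByteArrayOutputStream debugOut = new ByteArrayOutputStream();
        ByteArrayOutputStream auditOut = new ByteArrayOutputStream();

        Logger debug = Logger.getLogger("com.example.app"); // developer's own logs
        Logger audit = Logger.getLogger("audit.orders");     // specified, stable messages
        Handler debugHandler = new StreamHandler(debugOut, new SimpleFormatter());
        Handler auditHandler = new StreamHandler(auditOut, new SimpleFormatter());
        debug.setLevel(Level.FINE);
        debugHandler.setLevel(Level.FINE);
        debug.addHandler(debugHandler);
        audit.addHandler(auditHandler);
        // Neither logger forwards to the root logger's console handler.
        debug.setUseParentHandlers(false);
        audit.setUseParentHandlers(false);

        debug.fine("cart state: 3 items");
        audit.info("order 42 placed by customer 7");

        debugHandler.flush(); // StreamHandler buffers until flushed
        auditHandler.flush();
        debug.removeHandler(debugHandler);
        audit.removeHandler(auditHandler);
        return new String[] {debugOut.toString(), auditOut.toString()};
    }

    public static void main(String[] args) {
        String[] out = demo();
        System.out.println("debug target:\n" + out[0]);
        System.out.println("audit target:\n" + out[1]);
    }
}
```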

    --
    DF.
    An escaped convict once said to me:
    "Alcatraz is the place to be"
    Daniele Futtorovic, May 23, 2011
    #9
  10. On 23.05.2011 19:06, Daniele Futtorovic wrote:
    > On 23/05/2011 18:20, Lew allegedly wrote:
    >> jlp wrote:
    >>> I agree with you, Lee, it is what i [sic] did with my own tool. Native
    >>> logs are
    >>
    >> Who's Lee?
    >>
    >
    > You're Lee now, Lee.


    Did you mean to say "Bruce"?
    ;-)

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, May 23, 2011
    #10
  11. On 23.05.2011 15:17, Patricia Shanahan wrote:
    > On 5/23/2011 12:50 AM, Ulrich Scholz wrote:
    >> I'm looking for an approach to the problem of analyzing application
    >> log files.
    >>
    >> I need to analyse Java log files from applications (i.e., not logs of
    >> web servers). These logs contain Java exceptions, thread dumps, and
    >> free-form log4j messages issued by log statements inserted by
    >> programmers during development. Right now, these man-made log entries
    >> do not have any specific format.
    >>
    >> What I'm looking for is a tool and/or strategy that supports in lexing/
    >> parsing, tagging, and analysing the log entries. Because there is only
    >> little defined syntax and grammar - and because you might not know
    >> what you are looking for - the task requires the quick issuing of
    >> queries against the log data base. Some sort of visualization would be
    >> nice, too.
    >>
    >> Pointers to existing tools and approaches as well as appropriate tools/
    >> algorithms to develop the required system would be welcome.
    >
    > I would use Perl, and begin by recognizing some of the more important
    > formats, such as thread dumps. I agree with the desirability of
    > introducing some organized formatting into the log messages, but an
    > ad-hoc Perl program can often get useful data out of a disorganized log.


    Except that Perl is so awful - YMMV of course. But for these kinds of
    tasks (more precisely: for *any* task) I very much prefer to use Ruby
    because of its cleaner OO and cleaner syntax. In cases where the
    basic format is fixed I put the general parsing code in a library (a
    single file really) and then I can write ad hoc scripts which do
    arbitrary processing of the data. That's very productive.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, May 23, 2011
    #11
  12. On 23/05/2011 20:27, Robert Klemme allegedly wrote:
    > On 23.05.2011 19:06, Daniele Futtorovic wrote:
    >> On 23/05/2011 18:20, Lew allegedly wrote:
    >>> jlp wrote:
    >>>> I agree with you, Lee, it is what i [sic] did with my own tool. Native
    >>>> logs are
    >>>
    >>> Who's Lee?
    >>>
    >>
    >> You're Lee now, Lee.
    >
    > Did you mean to say "Bruce"?
    > ;-)


    Wait. I thought Lee was a Yank, not an Aussie? :)
    Daniele Futtorovic, May 23, 2011
    #12
  13. On 05/23/2011 01:16 PM, Daniele Futtorovic wrote:
    > On 23/05/2011 15:11, Lew allegedly wrote:
    >> Ulrich Scholz wrote:
    >>> I'm looking for an approach to the problem of analyzing application
    >>> log files.
    >>>
    >>> I need to analyse Java log files from applications (i.e., not logs of
    >>> web servers). These logs contain Java exceptions, thread dumps, and
    >>> free-form log4j messages issued by log statements inserted by
    >>> programmers during development. Right now, these man-made log entries
    >>> do not have any specific format.
    >>>
    >>> What I'm looking for is a tool and/or strategy that supports in lexing/
    >>> parsing, tagging, and analysing the log entries. Because there is only
    >>> little defined syntax and grammar - and because you might not know
    >>> what you are looking for - the task requires the quick issuing of
    >>> queries against the log data base. Some sort of visualization would be
    >>> nice, too.
    >>>
    >>> Pointers to existing tools and approaches as well as appropriate tools/
    >>> algorithms to develop the required system would be welcome.
    >>
    >> It helps if you have a logging strategy that mandates a consistent
    >> logging format, specific information in particular positions or marked
    >> by particular markup, logging levels and other such so that your
    >> analysis tool isn't faced with a completely open-ended input. What you
    >> describe requires a general text-analysis approach, as you indicate that
    >> you can make no guarantees about the format. Based on that, your best
    >> tool is "less" or equivalent text-file reader.
    >>
    >> What is a tool supposed to do, read your mind?
    >>
    >> It's really hard to extract information from a garbage can where people
    >> just randomly dumped whatever they individually felt like dumping
    >> without regard for operational needs. You can't build a skyscraper on a
    >> bad foundation, and you can't build a good log analysis off a crappy log.
    >>
    >> Fix the logging system, then the analysis problem will be tractable.
    >>
    >
    > I would argue around the same lines.
    >
    > I've been faced a while ago with a situation where some orthogonal
    > organisational unit wanted to exploit my logs. I told them to GTFO.
    >
    > My logs are my logs. I put in it what I consider necessary. I often
    > improve them as I step through the code. I might change the message, fix
    > the level, &c. I don't want to have them set in stone. Neither do I
    > generally have enough confidence in them to allow them to be used for
    > analysis.
    >
    > "The solution, then, is simple", I told them, "spec out the exact
    > messages and arguments you want, and the exact situations you want them
    > logged in, and I'll add them for you. But leave me my precious debugging
    > logs."
    >
    > Let me emphasize: IMHO debugging logs and logs for analysis are two
    > different things and should be kept strictly separated -- possibly
    > logged to a different target respectively.


    That last is rather a brilliant idea, to use different targets. Heretofore
    I've espoused that logs are primarily an operations tool, not a debugging
    tool, although in service of the former they inevitably and inherently must
    support the latter. The problem I've always seen is that logging statements
    are left up to the programmer, and not specified for the project.

    --
    Lew
    Honi soit qui mal y pense.
    http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
    Lew, May 23, 2011
    #13
  14. On 05/23/2011 02:27 PM, Robert Klemme wrote:
    > On 23.05.2011 19:06, Daniele Futtorovic wrote:
    >> On 23/05/2011 18:20, Lew allegedly wrote:
    >>> jlp wrote:
    >>>> I agree with you, Lee, it is what i [sic] did with my own tool. Native
    >>>> logs are
    >>>
    >>> Who's Lee?
    >>>
    >>
    >> You're Lee now, Lee.
    >
    > Did you mean to say "Bruce"?
    > ;-)


    Just call me Lew Lee. Then you can sing, "Lew Lee. Lew, lay, thou little
    tiny child. Bye-bye, Lew Lee. Lew, lay!"

    --
    Lew
    Honi soit qui mal y pense.
    http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
    Lew, May 23, 2011
    #14
  15. On Mon, 23 May 2011 20:33:07 +0200, Robert Klemme wrote:

    > On 23.05.2011 15:17, Patricia Shanahan wrote:
    >> On 5/23/2011 12:50 AM, Ulrich Scholz wrote:
    >>> I'm looking for an approach to the problem of analyzing application
    >>> log files.
    >>>
    >>> I need to analyse Java log files from applications (i.e., not logs of
    >>> web servers). These logs contain Java exceptions, thread dumps, and
    >>> free-form log4j messages issued by log statements inserted by
    >>> programmers during development. Right now, these man-made log entries
    >>> do not have any specific format.
    >>>
    >>> What I'm looking for is a tool and/or strategy that supports in
    >>> lexing/ parsing, tagging, and analysing the log entries. Because there
    >>> is only little defined syntax and grammar - and because you might not
    >>> know what you are looking for - the task requires the quick issuing of
    >>> queries against the log data base. Some sort of visualization would be
    >>> nice, too.
    >>>
    >>> Pointers to existing tools and approaches as well as appropriate
    >>> tools/ algorithms to develop the required system would be welcome.
    >>
    >> I would use Perl, and begin by recognizing some of the more important
    >> formats, such as thread dumps. I agree with the desirability of
    >> introducing some organized formatting into the log messages, but an
    >> ad-hoc Perl program can often get useful data out of a disorganized
    >> log.
    >
    > Only that Perl is so awful - YMMV of course. But for these kinds of
    > tasks (more correctly: for *any* task) I very much prefer to use Ruby
    > because of its cleaner OO and cleaner syntax.
    >

    I do the same, but use gawk rather than Perl: I have the same objections
    to Perl as you, while gawk is pretty straightforward if you understand
    regexes and can write C.

    So far, using gawk to extract the information I've needed from Linux
    system logs has been rather straightforward. Besides, I generally find
    gawk to be more concise and readable than Perl, for this type of job,
    anyway.
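    For readers who would rather stay in Java, the kind of one-pass tally a short gawk script does (here, counting records per log level) might look like this. It assumes the level is the first standalone TRACE/DEBUG/INFO/WARN/ERROR/FATAL token on a line, which is an assumption about the log layout; LevelTally is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** Counts log lines per level, awk-style, in a single pass. */
class LevelTally {
    // Assumed layout: the first standalone level token on the line is the record's level.
    private static final Pattern LEVEL =
            Pattern.compile("\\b(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\\b");

    static Map<String, Integer> tally(Stream<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by level name
        lines.forEach(line -> {
            Matcher m = LEVEL.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        });
        return counts;
    }

    public static void main(String[] args) throws java.io.IOException {
        // args[0]: path of the log file to scan
        try (Stream<String> lines = java.nio.file.Files.lines(java.nio.file.Path.of(args[0]))) {
            tally(lines).forEach((level, n) -> System.out.println(level + "\t" + n));
        }
    }
}
```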


    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
    Martin Gregorie, May 23, 2011
    #15
  16. On 23/05/2011 21:02, Lew allegedly wrote:
    > On 05/23/2011 01:16 PM, Daniele Futtorovic wrote:
    >> I've been faced a while ago with a situation where some orthogonal
    >> organisational unit wanted to exploit my logs. I told them to GTFO.
    >>
    >> My logs are my logs. I put in it what I consider necessary. I often
    >> improve them as I step through the code. I might change the message, fix
    >> the level, &c. I don't want to have them set in stone. Neither do I
    >> generally have enough confidence in them to allow them to be used for
    >> analysis.
    >>
    >> "The solution, then, is simple", I told them, "spec out the exact
    >> messages and arguments you want, and the exact situations you want them
    >> logged in, and I'll add them for you. But leave me my precious debugging
    >> logs."
    >>
    >> Let me emphasize: IMHO debugging logs and logs for analysis are two
    >> different things and should be kept strictly separated -- possibly
    >> logged to a different target respectively.
    >
    > That last is rather a brilliant idea, to use different targets.
    > Heretofore I've espoused that logs are primarily an operations tool, not
    > a debugging tool, although in service of the former they inevitably and
    > inherently must support the former. The problem I've always seen is that
    > logging statements are left up to the programmer, and not specified for
    > the project.
    >


    I'd call what I described "audit logging". I don't know whether the
    meaning of that term normally extends beyond databases, but I don't see
    why it shouldn't.

    --
    DF.
    An escaped convict once said to me:
    "Alcatraz is the place to be"
    Daniele Futtorovic, May 23, 2011
    #16
  17. On 23.05.2011 21:06, Lew wrote:
    > On 05/23/2011 02:27 PM, Robert Klemme wrote:
    >> On 23.05.2011 19:06, Daniele Futtorovic wrote:
    >>> On 23/05/2011 18:20, Lew allegedly wrote:
    >>>> jlp wrote:
    >>>>> I agree with you, Lee, it is what i [sic] did with my own tool. Native
    >>>>> logs are
    >>>>
    >>>> Who's Lee?
    >>>>
    >>>
    >>> You're Lee now, Lee.
    >>
    >> Did you mean to say "Bruce"?
    >> ;-)
    >
    > Just call me Lew Lee. Then you can sing, "Lew Lee. Lew, lay, thou little
    > tiny child. Bye-bye, Lew Lee. Lew, lay!"


    Aye!

    Which reminds me of a guy who called himself "LL Cool J". ;-)

    Associative memory...

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, May 23, 2011
    #17
  18. On 05/23/2011 04:10 PM, Robert Klemme wrote:
    > On 23.05.2011 21:06, Lew wrote:
    >> On 05/23/2011 02:27 PM, Robert Klemme wrote:
    >>> On 23.05.2011 19:06, Daniele Futtorovic wrote:
    >>>> On 23/05/2011 18:20, Lew allegedly wrote:
    >>>>> jlp wrote:
    >>>>>> I agree with you, Lee, it is what i [sic] did with my own tool. Native
    >>>>>> logs are
    >>>>>
    >>>>> Who's Lee?
    >>>>>
    >>>>
    >>>> You're Lee now, Lee.
    >>>
    >>> Did you mean to say "Bruce"?
    >>> ;-)
    >>
    >> Just call me Lew Lee. Then you can sing, "Lew Lee. Lew, lay, thou little
    >> tiny child. Bye-bye, Lew Lee. Lew, lay!"
    >
    > Aye!
    >
    > Which reminds me of a guy who called himself "LL Cool J". ;-)
    >
    > Associative memory...


    Wow, I feel lucky. I was so afraid I'd be put in Coventry for that pun.

    --
    Lew
    Honi soit qui mal y pense.
    http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
    Lew, May 23, 2011
    #18
  19. On 11-05-23 04:02 PM, Lew wrote:
    > On 05/23/2011 01:16 PM, Daniele Futtorovic wrote:
    >> On 23/05/2011 15:11, Lew allegedly wrote:
    >>> Ulrich Scholz wrote:
    >>>> I'm looking for an approach to the problem of analyzing application
    >>>> log files.
    >>>>
    >>>> I need to analyse Java log files from applications (i.e., not logs of
    >>>> web servers). These logs contain Java exceptions, thread dumps, and
    >>>> free-form log4j messages issued by log statements inserted by
    >>>> programmers during development. Right now, these man-made log entries
    >>>> do not have any specific format.
    >>>>
    >>>> What I'm looking for is a tool and/or strategy that supports in lexing/
    >>>> parsing, tagging, and analysing the log entries. Because there is only
    >>>> little defined syntax and grammar - and because you might not know
    >>>> what you are looking for - the task requires the quick issuing of
    >>>> queries against the log data base. Some sort of visualization would be
    >>>> nice, too.
    >>>>
    >>>> Pointers to existing tools and approaches as well as appropriate tools/
    >>>> algorithms to develop the required system would be welcome.
    >>>
    >>> It helps if you have a logging strategy that mandates a consistent
    >>> logging format, specific information in particular positions or marked
    >>> by particular markup, logging levels and other such so that your
    >>> analysis tool isn't faced with a completely open-ended input. What you
    >>> describe requires a general text-analysis approach, as you indicate that
    >>> you can make no guarantees about the format. Based on that, your best
    >>> tool is "less" or equivalent text-file reader.
    >>>
    >>> What is a tool supposed to do, read your mind?
    >>>
    >>> It's really hard to extract information from a garbage can where people
    >>> just randomly dumped whatever they individually felt like dumping
    >>> without regard for operational needs. You can't build a skyscraper on a
    >>> bad foundation, and you can't build a good log analysis off a crappy
    >>> log.
    >>>
    >>> Fix the logging system, then the analysis problem will be tractable.
    >>
    >> I would argue around the same lines.
    >>
    >> I've been faced a while ago with a situation where some orthogonal
    >> organisational unit wanted to exploit my logs. I told them to GTFO.
    >>
    >> My logs are my logs. I put in it what I consider necessary. I often
    >> improve them as I step through the code. I might change the message, fix
    >> the level, &c. I don't want to have them set in stone. Neither do I
    >> generally have enough confidence in them to allow them to be used for
    >> analysis.
    >>
    >> "The solution, then, is simple", I told them, "spec out the exact
    >> messages and arguments you want, and the exact situations you want them
    >> logged in, and I'll add them for you. But leave me my precious debugging
    >> logs."
    >>
    >> Let me emphasize: IMHO debugging logs and logs for analysis are two
    >> different things and should be kept strictly separated -- possibly
    >> logged to a different target respectively.
    >
    > That last is rather a brilliant idea, to use different targets.
    > Heretofore I've espoused that logs are primarily an operations tool, not
    > a debugging tool, although in service of the former they inevitably and
    > inherently must support the former. The problem I've always seen is
    > that logging statements are left up to the programmer, and not specified
    > for the project.
    >

    General agreement with all. I am also just coming off a project
    where part of the work - not a major part, but an important part - was
    to improve logging. One of the first things we did was officially
    recognize that we had many different clients of logging output. They
    wanted different things at different levels at different times with
    different storage stipulations.

    The solution was pretty simple, and it's dynamic. I don't propose to get
    into a logging framework war, but in this case we saw that JUL wouldn't
    cut it, but log4j would do the trick. We had to do some arcane app
    server-related stuff for JMX and log4j.xml, also integrate exception
    handling with various "global" handlers that could also log, and wrap
    log4j calls with a plethora of methods that would result in messages
    formatted to our liking, but after that the heavy lifting was and is
    done: it's now up to the clients - *not* to the developers - to request
    what gets logged and in what manner.

    Developers of course are clients themselves.

    Again, not to get into a logging framework war, but for these purposes
    log4j brings a lot to the table. It's common to need logging on specific
    Java packages to be at a certain level, for output of that specific
    logging to go to a specific target (like its own file) and have its own
    storage policy, and for that logging not to be (or to be, as the case
    demands) additive to parent logging. Being able to do this is a
    minimum for supporting different clients.
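    In java.util.logging terms, the per-package level, dedicated target and
    additivity described above map onto Logger.setLevel, Logger.addHandler
    and Logger.setUseParentHandlers (the rough counterpart of log4j's
    additivity flag). A minimal sketch, assuming hypothetical logger names
    and an in-memory handler standing in for a per-package FileHandler:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class PackageRoutingDemo {
    // In-memory stand-in for a dedicated target such as a per-package FileHandler.
    static class CollectingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord r) {
            if (isLoggable(r)) {
                messages.add(r.getLoggerName() + ": " + r.getMessage());
            }
        }

        @Override public void flush() {}
        @Override public void close() {}
    }

    // Logs one FINE event from <base>.billing and returns what each target received:
    // index 0 = the parent logger's target, index 1 = the package's own target.
    static List<List<String>> run(String base, boolean additive) {
        Logger parent = Logger.getLogger(base);
        Logger billing = Logger.getLogger(base + ".billing");
        CollectingHandler parentTarget = new CollectingHandler();
        CollectingHandler billingTarget = new CollectingHandler();

        parent.setUseParentHandlers(false);     // keep the demo off the root console handler
        parent.addHandler(parentTarget);
        billing.setLevel(Level.FINE);           // only this package logs in detail
        billing.addHandler(billingTarget);      // its own target, hence its own storage policy
        billing.setUseParentHandlers(additive); // false: records stop here, like additivity="false"

        billing.fine("rate table reloaded");
        return List.of(parentTarget.messages, billingTarget.messages);
    }

    public static void main(String[] args) {
        System.out.println(run("demo.isolated", false));
        System.out.println(run("demo.additive", true));
    }
}
```

    With additive set to false the parent's target stays empty; with true
    the same record reaches both targets.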

    We also added, as part of our log4j method wrappers, an extra field for
    all log messages that characterizes a "functional category". This allows
    decorating all messages with information as to the identity of a
    functional subsystem, and is helpful to post-processing tools like Splunk.
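    One way to carry such a field, sketched with java.util.logging and a
    hypothetical FunctionalCategory enum: the wrapper attaches the category
    to each LogRecord, and a formatter writes it out as key=value pairs,
    which tools such as Splunk can typically pick up as search fields
    without custom parsing. In log4j 1.x, putting the category in the MDC
    and printing it with %X{category} in a PatternLayout is a comparable
    approach. This is an illustration, not the wrapper described above.

```java
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

public class CategoryFormatDemo {
    // Hypothetical functional subsystems; a real project would define its own list.
    enum FunctionalCategory { BILLING, AUTHENTICATION, REPORTING }

    // Wrapper-side helper: every record is created with its category attached
    // as the first parameter.
    static LogRecord record(Level level, FunctionalCategory category, String msg) {
        LogRecord r = new LogRecord(level, msg);
        r.setParameters(new Object[] {category});
        return r;
    }

    // Writes one key=value line per record. It uses the raw message, so the
    // category parameter is never substituted into {0}-style placeholders.
    static class KeyValueFormatter extends Formatter {
        @Override
        public String format(LogRecord r) {
            Object[] p = r.getParameters();
            Object category = (p != null && p.length > 0) ? p[0] : "UNCATEGORIZED";
            String msg = String.valueOf(r.getMessage()).replace('"', '\'');
            return String.format("ts=%s level=%s category=%s msg=\"%s\"%n",
                    r.getInstant(), r.getLevel().getName(), category, msg);
        }
    }

    public static void main(String[] args) {
        LogRecord r = record(Level.INFO, FunctionalCategory.BILLING, "invoice batch sent");
        System.out.print(new KeyValueFormatter().format(r));
    }
}
```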

    This system has been in production now for about 4 months, and
    operational support staff and other clients are very pleased with it.
    It's not perfect, because not all the log statements exist in the code
    to support every informational requirement (known or unknown), but the
    framework is not a problem.

    One sidenote: despite doing everything I describe above, you can still
    end up with logs that are difficult to interpret, and more log
    statements aren't necessarily the answer. This typically happens when
    your code itself is a spaghetti tangle. Sometimes to fix a logging
    problem you really do need to refactor your logged code.

    AHS
    Arved Sandstrom, May 23, 2011
    #19
  20. On Mon, 23 May 2011 15:02:23 -0400, Lew wrote:

    > On 05/23/2011 01:16 PM, Daniele Futtorovic wrote:
    >> Let me emphasize: IMHO debugging logs and logs for analysis are two
    >> different things and should be kept strictly separated -- possibly
    >> logged to a different target respectively.

    >
    > That last is rather a brilliant idea, to use different targets.
    > Heretofore I've espoused that logs are primarily an operations tool, not
    > a debugging tool, although in service of the former they inevitably and
    > inherently must support the latter. The problem I've always seen is
    > that logging statements are left up to the programmer, and not specified
    > for the project.
    >

    I tend to use at least two logging streams: debugging and operational. I
    leave debugging statements in production code: the debugging stream is
    normally off (of course) but can be turned on if needed. Operational
    logging includes informational and error messages for sysadmins, which
    are always enabled and should be fairly infrequent, as well as
    performance measurement messages. The latter can be configured on or
    off. As others have said, the messages need to be designed with both log
    stream selection and ease of parsing for later analysis in mind.
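    A minimal java.util.logging sketch of the stream split described above,
    assuming hypothetical logger names myapp.debug, myapp.ops and
    myapp.ops.perf: each stream is its own logger, so its level (and, in a
    real setup, its handler and target) can be configured independently.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogStreamsDemo {
    // Hypothetical stream names. Separate loggers let each stream get its own
    // level here and, in a real configuration, its own handler and target.
    static final Logger DEBUG = Logger.getLogger("myapp.debug");
    static final Logger OPS = Logger.getLogger("myapp.ops");
    static final Logger PERF = Logger.getLogger("myapp.ops.perf");

    static {
        DEBUG.setLevel(Level.OFF); // kept in production code, normally switched off
        OPS.setLevel(Level.INFO);  // sysadmin messages: always on, should be infrequent
        PERF.setLevel(Level.OFF);  // performance measurements: off until configured on
    }

    static void setDebugLogging(boolean on) {
        DEBUG.setLevel(on ? Level.FINE : Level.OFF);
    }

    static void setPerformanceLogging(boolean on) {
        PERF.setLevel(on ? Level.INFO : Level.OFF);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        DEBUG.fine("cache miss for key 17"); // dropped: debug stream is off
        OPS.info("service started");          // emitted (via the root console handler here)
        // Dropped unless performance logging has been switched on.
        PERF.info(() -> "startup took " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```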

    In a C application for a *NIX OS it's easiest to send all these messages
    to the system logger and let it deal with creating separate logs for the
    various message streams: it's then trivial to use 'tail' to present the
    operational stream to sysadmins.

    If the application is written in a language that doesn't provide easy
    access to the system logger or is run on an OS that doesn't have one, I'd
    include a custom logging process as part of the application.
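    For a Java application, the nearest in-process analogue of a custom
    logging process is probably a dedicated logging thread that drains a
    queue and routes each message to its stream's target. The sketch below
    is an illustration under stated assumptions (stream names are
    hypothetical and in-memory lists stand in for files), not a production
    logger: a real one would also need a bounded queue and error handling
    for its sinks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class LogRouter {
    private static final class Entry {
        final String stream;
        final String text;

        Entry(String stream, String text) {
            this.stream = stream;
            this.text = text;
        }
    }

    // Sentinel telling the worker to stop; compared by identity.
    private static final Entry STOP = new Entry("", "");

    private final BlockingQueue<Entry> queue = new LinkedBlockingQueue<>();
    // Written only by the worker thread; read by others only after join().
    private final Map<String, List<String>> sinks = new HashMap<>();
    private final Thread worker = new Thread(this::drain, "log-router");

    void start() {
        worker.setDaemon(true);
        worker.start();
    }

    // Called from application threads: enqueues and returns without doing any I/O.
    void log(String stream, String text) {
        queue.add(new Entry(stream, text));
    }

    private void drain() {
        try {
            Entry e;
            while ((e = queue.take()) != STOP) {
                // A real implementation would write to the stream's file or socket here.
                sinks.computeIfAbsent(e.stream, k -> new ArrayList<>()).add(e.text);
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    // Processes everything queued before this call, stops the worker and returns the sinks.
    Map<String, List<String>> shutdown() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // sinks may be incomplete in this case
        }
        return sinks;
    }

    public static void main(String[] args) {
        LogRouter router = new LogRouter();
        router.start();
        router.log("ops", "service started");
        router.log("debug", "cache miss for key 17");
        router.log("ops", "service stopping");
        System.out.println(router.shutdown());
    }
}
```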


    --
    martin@gregorie.org | Martin Gregorie
                        | Essex, UK
    Martin Gregorie, May 23, 2011
    #20