Parsing a log file

Prabh · Nov 20, 2003

Hello all,
I need to parse a log file and generate a formatted output.
I do have a solution in PERL, but now need to transform it to Java.
Could anyone please direct how do I go about it.

I have a log file of following format, which contains info. on a
series of files after a process.

========================================================
File1: Info. on File1
File2: Info. on File2
File1: Info. on File1
File3: Info. on File3
File1: Info. on File1
and so on...
========================================================

I want to display the output as...

============================
n1 lines of info on File1
n2 lines of info on File2
n3 lines of info on File3
============================

Is this the way to do it in Java:
1) Process log file one line at a time.
2) First find all files mentioned in the log, store the files list in
an array(?).
3) Just consider the unique files in this array.
4) Loop through this unique files list and for each file, find
relevant matches in the original log. Keep track of the count or print
count.

Questions:
1) How does one find "unique" elements in an array in Java?

2) What's the difference between an array, Vector, HashMap etc.?
Is there any rule of thumb about when one should use which collection?
Why can't one have an array or hash (as in PERL) and leave the rest
to the user to make whatever of it? Why these gazillion choices?

3) I understand my solution is more of a shell-scripting or Perl-ish way
of doing things. What would be the Java way to do this?

Thanks,
Prab

===============================================================================
For what it's worth, this'd be my solution in PERL.

#!/usr/local/bin/perl
#============================

#=====================
# Log file is Foo.txt
#---------------------
open(FDL, "Foo.txt") or die "Cannot open Foo.txt: $!";
chomp(@arr = <FDL>);
close(FDL);

#=================================
# First, get the files in the log
#---------------------------------
undef @files;
foreach $line (@arr) {
    push(@files, (split(/:/, $line))[0]);
}

#==========================================
# Sort the files, find the uniq files
# Foreach such file, grep the original log
# for all occurrences and count.
#------------------------------------------
foreach $file (&uniq(sort @files)) {
    # grep in scalar context returns the number of matching lines;
    # \Q...\E keeps regex metacharacters in the file name literal.
    $info = grep { /^\Q$file\E:/ } @arr;
    print "$info lines of info on $file\n";
}

#=============================
# subroutine to do Unixy-uniq
#-----------------------------
sub uniq {
    @uniq = @_;
    #=======================================================
    # Foreach array element, compare with its predecessor.
    # If equal, it's already present, so splice it from array.
    #=======================================================
    for ($i = 1; $i < @uniq; $i++) {
        if ($uniq[$i] eq $uniq[$i-1]) {
            splice(@uniq, $i-1, 1);
            $i--;
        }
    }

    return @uniq;
}

John C. Bollinger · Nov 20, 2003

Prabh said:
Hello all,
I need to parse a log file and generate a formatted output.
I do have a solution in PERL, but now need to transform it to Java.
Could anyone please direct how do I go about it.

Sure. See below.

I have a log file of following format, which contains info. on a
series of files after a process.

========================================================
File1: Info. on File1
File2: Info. on File2
File1: Info. on File1
File3: Info. on File3
File1: Info. on File1
and so on...
========================================================

I want to display the output as...

============================
n1 lines of info on File1
n2 lines of info on File2
n3 lines of info on File3
============================

Is this the way to do it in Java:
1) Process log file one line at a time.
2) First find all files mentioned in the log, store the files list in
an array(?).
3) Just consider the unique files in this array.
4) Loop through this unique files list and for each file, find
relevant matches in the original log. Keep track of the count or print
count.

Questions:
1) How does one find "unique" elements in an array in Java?

The same way you did it in Perl can work -- sort the array, then step
through it looking for elements different from the previous ones. You
could also dump the contents into a Set of one flavor or another, which
will take care of it for you. There are variations, of course.
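
Both variations mentioned above can be sketched roughly as follows. This is only an illustration: the class and method names (UniqueNames, uniqueBySorting, uniqueBySet) are made up for the example, the sample file names come from the log excerpt in the question, and the code assumes a Java version with generics, which postdates this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class; the names are illustrative only.
class UniqueNames {

    // Variation 1: sort a copy of the array, then keep every element
    // that differs from its predecessor.
    static List<String> uniqueBySorting(String[] names) {
        String[] sorted = names.clone();
        Arrays.sort(sorted);
        List<String> unique = new ArrayList<String>();
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0 || !sorted[i].equals(sorted[i - 1])) {
                unique.add(sorted[i]);
            }
        }
        return unique;
    }

    // Variation 2: dump the contents into a Set, which discards duplicates.
    // A TreeSet also happens to keep the elements sorted.
    static Set<String> uniqueBySet(String[] names) {
        return new TreeSet<String>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        String[] names = { "File1", "File2", "File1", "File3", "File1" };
        System.out.println(uniqueBySorting(names)); // [File1, File2, File3]
        System.out.println(uniqueBySet(names));     // [File1, File2, File3]
    }
}
```

If the original order of first appearance matters more than sorted order, a LinkedHashSet could be substituted for the TreeSet in the second variation.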

2) What's the difference between an array, Vector, HashMap etc.?

An array is a low-level data construct with fixed length and fixed
element type. Elements may be of primitive types.

A Vector, ArrayList, or other type of List is similar to a Perl array.
It is dynamically sized, and can hold objects of any type, but not
primitives. It maintains its contents in sequence, and (generally)
permits duplicate entries.

A HashMap or other type of Map is a data structure that associates two
objects, a "key" and a "value". This is similar to a Perl "hash",
especially in the particular case of a HashMap, but both key and value
must be objects (not primitives).

There are also various kinds of Sets, which model mathematical sets in
that they contain only one "copy" of each element, and do not
necessarily imply any particular order to their contents. Contents
must be objects, not primitives.
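
As a rough side-by-side of the four kinds of container described above, here is a small sketch. The class and method names (CollectionKinds, newCounters, toList, toSet, toMap) are invented for the illustration, and the code assumes a Java version with generics, which postdates this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; the names are illustrative only.
class CollectionKinds {

    // Array: fixed length, fixed element type, primitives allowed.
    static int[] newCounters(int size) {
        return new int[size]; // every slot starts at 0; the length never changes
    }

    // List: grows on demand, keeps insertion order, permits duplicates.
    static List<String> toList(String[] names) {
        List<String> list = new ArrayList<String>();
        for (String name : names) {
            list.add(name);
        }
        return list;
    }

    // Set: at most one copy of each element; a TreeSet also keeps them sorted.
    static Set<String> toSet(String[] names) {
        Set<String> set = new TreeSet<String>();
        for (String name : names) {
            set.add(name); // adding a duplicate leaves the set unchanged
        }
        return set;
    }

    // Map: associates a key with a value, much like a Perl hash.
    static Map<String, String> toMap(String[] names) {
        Map<String, String> map = new HashMap<String, String>();
        for (String name : names) {
            map.put(name, "Info. on " + name); // a repeated key overwrites its value
        }
        return map;
    }

    public static void main(String[] args) {
        String[] names = { "File1", "File2", "File1" };
        System.out.println(newCounters(3).length);  // 3
        System.out.println(toList(names).size());   // 3 (duplicate kept)
        System.out.println(toSet(names).size());    // 2 (duplicate dropped)
        System.out.println(toMap(names).size());    // 2 (one entry per key)
    }
}
```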

Is there any rule of thumb about when one should use which collection?

Typically, one should use the one that best models the collection of
objects in question, and is best suited to the problem. Different
Collections have different performance trade-offs and features, as well
as tradeoffs with respect to arrays.

Why can't one have an array or hash (as in PERL) and leave the rest
to the user to make whatever of it? Why these gazillion choices?

One _can_ have the equivalents of the Perl constructs. One can also
have alternatives that may be more suitable for some particular problem,
without having to build them oneself. Note also that except for arrays,
none of these things are built into the language -- instead they are
part of the standard class libraries. A fine point perhaps, but it's
akin to the difference between Perl builtins and CPAN modules.

3) I understand my solution is more of a shell-scripting or Perl-ish way
of doing things. What would be the Java way to do this?

It would be possible to translate your Perl version to a thoroughly
equivalent Java version. It seems that that is what you are suggesting
in your proposed solution above, except you neglect the fact that your
Perl version slurps the whole log into memory before processing the
lines. It is probably wiser (in both Java _and_ Perl) to avoid that.

I was going to try to describe an approach in prose, but for a Java
novice it may just be easier if I write a partial (almost complete, it
turned out) implementation. It's not clean, wonderful, or tested, but
hopefully it gets the idea across:

package analyzer;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeSet;

public class LogAnalyzer {

    /**
     * Reads a file in the expected format from the file named by
     * fileName, and returns a Map from String file names mentioned
     * in the log to counts of lines ascribed to those files (as
     * int[1]s)
     */
    public Map analyze(String fileName) throws IOException {

        // Uses the default charset:
        FileReader logFileReader = new FileReader(fileName);

        BufferedReader logReader = new BufferedReader(logFileReader);
        HashMap counts = new HashMap();

        try {
            String logLine = logReader.readLine();

            // Read the lines one at a time, and keep track of the number
            // of lines ascribed to each file:
            while (logLine != null) {
                // Named loggedFile so that it does not clash with the
                // fileName parameter (redeclaring it would not compile)
                String loggedFile = extractFileName(logLine);
                int[] count = (int[]) counts.get(loggedFile);

                if (count == null) { // First time for this file name
                    count = new int[1];
                    count[0] = 1;
                    counts.put(loggedFile, count);
                } else { // Seen this file name before
                    count[0]++;
                }

                // read the next line, if any
                logLine = logReader.readLine();
            }
        } finally {
            // Release the file whether or not reading succeeded
            logReader.close();
        }

        return counts;
    }

    /**
     * Processes a line read from the log file to extract the file name
     * it references. Returns the file name as a String.
     */
    private String extractFileName(String line) {
        int colonIndex = line.indexOf(':');

        return (colonIndex < 0) ? line : line.substring(0, colonIndex);
    }

    /**
     * Creates a report on the provided PrintWriter, based on the
     * provided Map of analysis results (of the form created by
     * analyze(String)); file names will be listed on the report in
     * lexicographical order
     */
    public void writeReport(PrintWriter out, Map results) {

        // Extract all the file names and (automatically) sort them
        TreeSet names = new TreeSet(results.keySet());

        // Step through the file names in order, writing a summary line
        // for each one
        for (Iterator it = names.iterator(); it.hasNext(); ) {
            String fileName = (String) it.next();
            int[] count = (int[]) results.get(fileName);

            out.println(count[0] + " lines of info on " + fileName);
        }
    }

}

You need to provide a main method or separate main class that drives
that thing by calling its analyze method and passing the result to its
writeReport method. It would be awfully easy to do it much the same way
in Perl, BTW.

John Bollinger
(e-mail address removed)

Karl von Laudermann · Nov 21, 2003

Hello all,
I need to parse a log file and generate a formatted output.
I do have a solution in PERL, but now need to transform it to Java.
Could anyone please direct how do I go about it.

I have a log file of following format, which contains info. on a
series of files after a process.

========================================================
File1: Info. on File1
File2: Info. on File2
File1: Info. on File1
File3: Info. on File3
File1: Info. on File1
and so on...
========================================================

I want to display the output as...

============================
n1 lines of info on File1
n2 lines of info on File2
n3 lines of info on File3
============================

2) What's the difference between an array, Vector, HashMap etc.?
Is there any rule of thumb about when one should use which collection?

An array's size is fixed at the time of creation. A Vector is
dynamic, so you can simply append items to the end of it by using the
add method, and it will grow automatically. A HashMap or Hashtable
allows you to store and retrieve value objects using key objects
rather than simple numeric indices, and its entries are kept in no
particular order.

Offhand, I would implement your program the following way: Use a
Hashtable, where the filenames in the log are used as keys, and an
Integer object containing the line count for a file is used as its
value. As you read each line of the log, parse out the file name and
retrieve the line count from the Hashtable, using the file name as the
key. If a line count doesn't exist for that file name yet, store one
with the value of 1. If it does exist, replace it with an Integer
whose value is 1 higher than the old one. Repeat until you've read the
entire file.

At this point, you can use the keys method of Hashtable to get the
filenames. For each file you can get the line count and print it.
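
A minimal sketch of the approach just described might look like the following. Several things here are assumptions made for the sake of a self-contained example: the class and method names (HashtableLineCounter, fileNameOf, countLines) are invented, the log lines are taken from an in-memory array rather than read from a file, and the code uses generics, which postdate this thread.

```java
import java.util.Enumeration;
import java.util.Hashtable;

// Hypothetical class; the names are illustrative only.
class HashtableLineCounter {

    // Everything before the first ':' is taken to be the file name
    // (the whole line if there is no colon).
    static String fileNameOf(String line) {
        int colon = line.indexOf(':');
        return (colon < 0) ? line : line.substring(0, colon);
    }

    // Maps each file name to an Integer holding its line count.
    static Hashtable<String, Integer> countLines(String[] logLines) {
        Hashtable<String, Integer> counts = new Hashtable<String, Integer>();
        for (String line : logLines) {
            String name = fileNameOf(line);
            Integer old = counts.get(name);
            if (old == null) {
                // No count stored yet for this file name: start at 1
                counts.put(name, Integer.valueOf(1));
            } else {
                // Integer objects are immutable, so store a new one
                counts.put(name, Integer.valueOf(old.intValue() + 1));
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample input, copied from the log excerpt in the question
        String[] sample = {
            "File1: Info. on File1",
            "File2: Info. on File2",
            "File1: Info. on File1",
            "File3: Info. on File3",
            "File1: Info. on File1"
        };
        Hashtable<String, Integer> counts = countLines(sample);

        // keys() yields the file names in no particular order
        for (Enumeration<String> names = counts.keys(); names.hasMoreElements(); ) {
            String name = names.nextElement();
            System.out.println(counts.get(name) + " lines of info on " + name);
        }
    }
}
```

Since a Hashtable does not order its keys, the report lines may come out in any order; copying the keys into a sorted collection first would give a stable listing.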

Of course, this all assumes that you don't want to store the lines and
do other useful things with them. If you do, then use a Hashtable to
map each filename to a Vector containing the lines for that file
instead.
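
That variant, keeping the lines themselves instead of just a count, could be sketched like this. As before, the names (LinesByFile, group) are invented for the example, the input is an in-memory array rather than a real log file, and the code assumes a Java version with generics.

```java
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical class; the names are illustrative only.
class LinesByFile {

    // Maps each file name to a Vector of the complete log lines for that file.
    static Hashtable<String, Vector<String>> group(String[] logLines) {
        Hashtable<String, Vector<String>> byFile =
                new Hashtable<String, Vector<String>>();
        for (String line : logLines) {
            int colon = line.indexOf(':');
            String name = (colon < 0) ? line : line.substring(0, colon);

            Vector<String> lines = byFile.get(name);
            if (lines == null) {
                // First line seen for this file name: create its Vector
                lines = new Vector<String>();
                byFile.put(name, lines);
            }
            lines.add(line);
        }
        return byFile;
    }

    public static void main(String[] args) {
        // Assumed sample input, copied from the log excerpt in the question
        String[] sample = {
            "File1: Info. on File1",
            "File2: Info. on File2",
            "File1: Info. on File1"
        };
        Hashtable<String, Vector<String>> byFile = group(sample);

        // The line count is still available as the size of each Vector
        System.out.println(byFile.get("File1").size() + " lines of info on File1");
        System.out.println(byFile.get("File2").size() + " lines of info on File2");
    }
}
```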

Hope this helps.
