Smitty
I need to parse a very large log file and extract a variety of data
from it.
One of the things I need to do is build a cross-reference map from one
symbolic name to another, and for this I assume I should use a hash. The
data for each element of this 'map' can be found within a single line of
the log file that might look something like this:
....key....is known as value....
The reason for this hash is that I then need to check later lines in
the file for two different things.
First I need to look for a line something like this:
.......Created object 'key' ........
Then later I need to look for a line something like this:
.......Processed object 'value'......
Here the 'key' and 'value' are the same as what would be in the map
above, but sometimes the processed lines contain objects which were not
created 'locally', so I need to ignore them. Also, not all created
objects get processed.
The main requirement is to discover at what time I have processed XXX
of the 'created' key objects. So I imagine I need a second hash that
goes the other way, from value back to key, filled in while reading the
map lines, like this:
## single pass: the MAP lines appear before the lines that use them
## (the patterns below are placeholders for the real log format)
my %map = ();
my %xref = ();
my %created_map = ();
my $counter = 0;
my $XXX = 100;   ## placeholder: the count I care about
while(<>)
{
    ## test the match itself; $1 keeps its old value when a match fails
    if( /...(key)...is known as (value).../ )
    {
        ## process the MAP entries
        $map{$1} = $2;
        $xref{$2} = $1;
    }
    elsif( /...Created object '(key)'.../ )
    {
        $created_map{$1} = $map{$1};
    }
    elsif( /...Processed object '(value)'.../ )
    {
        ## get the key from the value
        my $key = $xref{$1};
        if( defined $key && exists $created_map{$key} )
        {
            ## if we created it, count it
            $counter++;
            if( $counter == $XXX )
            {
                ## do the work regarding the XXXth object
            }
        }
    }
}
Forgive me if the code above isn't compilable. Consider it akin to
pseudocode; compiling isn't really a requirement for the purpose of
this question.
Now, the question is:
Am I going to pay a performance penalty for all those hash lookups, and
can anyone suggest a better 'perlish' way which could help me achieve
the same results with better performance?
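To make the question concrete, here is a minimal single-pass sketch of one variant that drops the reverse hash. When a 'Created' line is seen it stores the expected value (not the key), so each 'Processed' line costs one hash lookup. Everything specific here is an assumption for illustration: the sub name find_nth_processed, the three patterns, and the idea that every map line appears before the 'Created' line that needs it. Unlike the code above, it counts each value at most once, even if it shows up in several 'Processed' lines.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($line_number, $line) for the 'Processed'
# line that brings the count of locally created objects up to $target,
# or an empty list if the log ends first.
sub find_nth_processed {
    my ($fh, $target) = @_;
    my (%map, %pending);
    my $count = 0;

    while (my $line = <$fh>) {
        # Assumed line formats; adjust the three patterns to the real log.
        if ($line =~ /(\S+) is known as (\S+)/) {
            $map{$1} = $2;                        # key => value
        }
        elsif ($line =~ /Created object '([^']+)'/) {
            # Remember the value we expect to see processed later.
            $pending{ $map{$1} } = 1 if exists $map{$1};
        }
        elsif ($line =~ /Processed object '([^']+)'/) {
            # delete returns the stored flag, or undef for objects that
            # were not created locally or were already counted.
            next unless delete $pending{$1};
            return ($., $line) if ++$count == $target;
        }
    }
    return;
}
```

It would be called with an open filehandle and the threshold, for example find_nth_processed($fh, $XXX); the returned line can then be searched for its timestamp in whatever format the log uses.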