Q: Concatenate lines which have the same key value


Magnus

Hi!

How do I concatenate lines that have the same key value? For example:

file1
date;starttime;key-value1
date;starttime;key-value2
date;stoptime;key-value1

file2
date;stoptime;key-value2
date;starttime;key-value3
date;stoptime;key-value3

I would like to have a new file that looks like this:
date;starttime;stoptime;key-value1
date;starttime;stoptime;key-value2
date;starttime;stoptime;key-value3

Thanks in advance!
Magnus
 

Andy Baxter

At earth time Mon, 12 Jan 2004 01:47:24 -0800, the following transmission
was received from the entity known as Magnus:
[snip]

Here's a hint: you can split fields like this into separate variables
with:

my ($date,$time,$key)=split /;/, $line; # may need a \ in front of
# the first ';' to say it's not
# an end of line.

and you can store all the times and dates for a given key using a hash of
array references like this:

my %hash=();

$hash{$key}=[]; # to initialise the arrayref

push @{$hash{$key}},($date,$time); # to add a single date and time to
# the array in $hash{$key}

Spend a while figuring out how this works (see 'hashes of arrays' in the
perldsc manual page, and look up push and pop in perlfunc), and you should
be able to write the rest of the program. If not, you need to learn some
Perl.
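
A minimal sketch of where that hint leads, assuming the three-field date;time;key input shown in the question. The sub name group_by_key and the sample lines are illustrative only. It relies on push creating the array reference on first use, so there is no separate $hash{$key}=[] step (if that assignment ran once per input line, it would throw away the entries already stored for the key):

```perl
use strict;
use warnings;

# Hypothetical helper: group "date;time;key" lines by key in a hash of arrays.
# Assumes every line has exactly three ';'-separated fields.
sub group_by_key {
    my %hash;
    for my $line (@_) {
        my $copy = $line;
        chomp $copy;                                  # drop a trailing newline, if any
        my ($date, $time, $key) = split /;/, $copy;
        push @{ $hash{$key} }, ($date, $time);        # arrayref is created on first push
    }
    return \%hash;
}

# Sample lines from the question (file1 only).
my $grouped = group_by_key(
    "date;starttime;key-value1",
    "date;starttime;key-value2",
    "date;stoptime;key-value1",
);

for my $key (sort keys %$grouped) {
    print join(';', @{ $grouped->{$key} }, $key), "\n";
}
```

Each key keeps its date/time pairs in input order, so key-value1 comes out as date;starttime;date;stoptime;key-value1. Deciding which pair is the start and which is the stop is still up to you.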
 

gnari

Magnus said:
[snip]
Use two hashes:
$start{$key}=$start;
$stop{$key}=$stop;

or use a hash of arrays:
$range{$key}[0]=$start;
$range{$key}[1]=$stop;
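
A sketch of the two-hash version, assuming the original three-field format where nothing marks a line as start or stop, so the first line seen for a key is treated as the start and the second as the stop. The name merge_start_stop and the extra %date hash (to keep the start date) are additions for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: merge "date;time;key" lines with one hash per field.
# Assumption: the first line for a key is the start, the second the stop.
sub merge_start_stop {
    my (%date, %start, %stop, @order);
    for my $line (@_) {
        my $copy = $line;
        chomp $copy;
        my ($d, $time, $key) = split /;/, $copy;
        if (!exists $start{$key}) {
            $date{$key}  = $d;          # keep the start date only
            $start{$key} = $time;
            push @order, $key;          # remember first-seen order for output
        }
        else {
            $stop{$key} = $time;
        }
    }
    my @out;
    for my $key (@order) {
        next unless exists $stop{$key};     # skip keys with no stop yet
        push @out, "$date{$key};$start{$key};$stop{$key};$key";
    }
    return @out;
}

# Sample lines from the question (file1 followed by file2).
print "$_\n" for merge_start_stop(
    "date;starttime;key-value1",
    "date;starttime;key-value2",
    "date;stoptime;key-value1",
    "date;stoptime;key-value2",
    "date;starttime;key-value3",
    "date;stoptime;key-value3",
);
```

Only the start date is kept here, so a stop on a later date would be silently reported against the start date.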

You do not mention whether the same key can appear more than two times.

> date;starttime;stoptime;key-value3

What happens if start and stop are not on the same date?

gnari
 

Anno Siegel

Andy Baxter said:
> At earth time Mon, 12 Jan 2004 01:47:24 -0800, the following transmission
> was received from the entity known as Magnus:

> Here's a hint: you can split fields like this into separate variables
> with:
>
> my ($date,$time,$key)=split /;/, $line; # may need a \ in front of
> # the first ';' to say it's not
> # an end of line.

Basically a good approach, but you messed up some of the details.

There are four fields in the data, so this should be

my ($date, $starttime, $stoptime, $key) = split /;/, $line;

The comment is entirely off the mark. ";" doesn't need to be escaped
in a regex, and it never indicates an end-of-line.
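
A quick check of both points, using the first sample line from the question: /;/ and /\;/ split the same way, and a line in the question's input format yields three fields:

```perl
use strict;
use warnings;

# First sample line from the question.
my $line = "date;starttime;key-value1";

# ';' is not a regex metacharacter, so no backslash is needed;
# escaping it anyway (\;) is harmless and gives the same result.
my @fields  = split /;/,  $line;
my @escaped = split /\;/, $line;

print scalar(@fields), " fields: @fields\n";
print "same result with /\\;/\n" if "@fields" eq "@escaped";
```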

> and you can store all the times and dates for a given key using a hash of
> array references like this:
>
> my %hash=();

No need to initialize the hash.

my %hash;

does exactly the same thing.

> $hash{$key}=[]; # to initialise the arrayref

No need to do that either. Perl does autovivification of references.
That statement can go altogether.
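
A small demonstration of autovivification, with key names borrowed from the question: the first push through an undefined hash element creates the array reference on the spot. One extra caution: if $hash{$key}=[] ran for every input line, it would also discard whatever had already been stored for that key.

```perl
use strict;
use warnings;

my %hash;    # same as: my %hash = ();

# No "$hash{'key-value1'} = []" beforehand: push creates the arrayref.
push @{ $hash{'key-value1'} }, ('date', 'starttime');
push @{ $hash{'key-value1'} }, ('date', 'stoptime');

# prints: ARRAY: date starttime date stoptime
print ref($hash{'key-value1'}), ": @{ $hash{'key-value1'} }\n";
```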

> push @{$hash{$key}},($date,$time); # to add a single date and time to
> # the array in $hash{$key}
>
> Spend a while figuring out how this works (see 'hashes of arrays' in the
> perldsc manual page, and look up push and pop in perlfunc), and you should
> be able to write the rest of the program. If not, you need to learn some
> Perl.

You too.

Anno
 

Tore Aursand

> How do I concatenate lines that have the same key value? For example:
>
> file1
> date;starttime;key-value1
> date;starttime;key-value2
> date;stoptime;key-value1
>
> file2
> date;stoptime;key-value2
> date;starttime;key-value3
> date;stoptime;key-value3
>
> I would like to have a new file that looks like this:
> date;starttime;stoptime;key-value1
> date;starttime;stoptime;key-value2
> date;starttime;stoptime;key-value3

You may have a logical error here: what if the date from the second file
differs from the corresponding date in the first file? Personally, I
would have stored both of these dates:

date;starttime;date;stoptime;key-value1

Anyway: What have you tried so far? What didn't work?
 

Andy Baxter

At earth time Mon, 12 Jan 2004 12:10:47 +0000, the following transmission
was received from the entity known as Anno Siegel:
> Basically a good approach, but you messed up some of the details.
>
> There are four fields in the data, so this should be
>
> my ($date, $starttime, $stoptime, $key) = split /;/, $line;

That's not right: there are three fields in the input data, which need to be
stored as four fields in memory against the right key and then output
again as four fields, so my version was correct apart from the comment.

> The comment is entirely off the mark. ";" doesn't need to be escaped
> in a regex, and it never indicates an end-of-line.

OK, not end of line but end of statement. I do know the difference, but I
used the wrong word.

>> and you can store all the times and dates for a given key using a hash of
>> array references like this:
>>
>> my %hash=();
>
> No need to initialize the hash.
>
> my %hash;
>
> does exactly the same thing.

OK.

>> $hash{$key}=[]; # to initialise the arrayref
>
> No need to do that either. Perl does autovivification of references.
> That statement can go altogether.

Didn't know that. Thanks.

Fairy snuff.

andy.
 

Andy Baxter

At earth time Mon, 12 Jan 2004 13:30:14 +0100, the following transmission
was received from the entity known as Tore Aursand:
> You may have a logical error here: what if the date from the second file
> differs from the corresponding date in the first file? Personally, I
> would have stored both of these dates:
>
> date;starttime;date;stoptime;key-value1

Or else output it as:

date;starttime;period;key-value
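
If the period output is wanted, it can be computed from both dates and times, which also covers runs that cross midnight. This is only a sketch: the thread never shows the real date and time formats, so YYYY-MM-DD and HH:MM:SS are assumed, and to_epoch/period_seconds are hypothetical names. It uses timegm from the core Time::Local module, which treats the values as UTC, so a daylight-saving switch during the period is not accounted for:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Assumed formats (not shown in the thread): YYYY-MM-DD and HH:MM:SS.
sub to_epoch {
    my ($date, $time) = @_;
    my ($y, $mon, $d)  = split /-/, $date;
    my ($h, $min, $s)  = split /:/, $time;
    return timegm($s, $min, $h, $d, $mon - 1, $y);   # month is 0-based
}

# Length of the period in seconds; both dates are used, so it
# works when the stop falls on a later day than the start.
sub period_seconds {
    my ($start_date, $start_time, $stop_date, $stop_time) = @_;
    return to_epoch($stop_date, $stop_time) - to_epoch($start_date, $start_time);
}

my $secs = period_seconds('2004-01-12', '23:30:00', '2004-01-13', '00:15:00');
print "2004-01-12;23:30:00;$secs;key-value1\n";   # period is 2700 seconds
```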
 

Magnus

Thank you all for your comments and suggestions!

I have a script now that works. I am a newbie to Perl, so I guess it is
a really ugly script, but anyway, this is how I did it. There is a
string in the files that tells me whether a line is a Start or a Stop. If a
Stop comes first, I have to ignore it. The key value is 100% unique to each
Start/Stop pair. If I have done something really stupid, please feel
free to comment.

Cheers!
Magnus


Example of input file.txt:
Start;date;starttime1;key-value1
Start;date;starttime2;key-value2
Stop;date;stoptime1;key-value1
Stop;date;stoptime4;key-value4
Stop;date;stoptime2;key-value2
Start;date;starttime3;key-value3
Stop;date;stoptime3;key-value3

Example of result:
cat file.txt | ./script.pl
Start;starttime1;Stop;stoptime1
Start;starttime2;Stop;stoptime2
Start;starttime3;Stop;stoptime3

script.pl
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my %HoL;    # hash of lists: key value => (Start, starttime, Stop, stoptime)

while (<>)
{
    chomp;                          # strip the newline so the key field is clean
    my @line = split(/;/);
    my $hashkey = $line[3];         # the key value is the fourth field
    if ($line[0] =~ /Start/)        # Start record: remember type and time
    {
        push @{ $HoL{$hashkey} }, ($line[0], $line[2]);
    }
    if ($line[0] =~ /Stop/)         # Stop record
    {
        if (exists $HoL{$hashkey})  # only if a Start was seen; ignore a Stop that comes first
        {
            push @{ $HoL{$hashkey} }, ($line[0], $line[2]);
            my @catline = @{ $HoL{$hashkey} };   # copy the list for this key
            print "$catline[0];$catline[1];$catline[2];$catline[3]\n";
            # print Dumper(\@catline);
        }
    }
}
 

Andy Baxter

At earth time Tue, 13 Jan 2004 06:21:17 -0800, the following transmission
was received from the entity known as Magnus:
[snip]

Looks pretty much OK. There's no need to use =~ when checking for Start or
Stop: you've already split off this field, so you can just do a string
compare with eq. In theory, you don't need the Start and Stop fields,
because you could tell which is which by looking for the later of the two
times, but that's up to you, and your way is probably simpler and more robust.
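
To illustrate the eq suggestion: /Start/ matches anywhere in the field, so a hypothetical type such as 'Started' would also count as a start, whereas eq only accepts the exact string. The is_start helper is illustrative:

```perl
use strict;
use warnings;

# Exact comparison on the already-split first field.
sub is_start { return $_[0] eq 'Start' }

for my $type ('Start', 'Stop', 'Started') {
    my $regex = ($type =~ /Start/) ? 'yes' : 'no';
    my $exact = is_start($type)    ? 'yes' : 'no';
    print "$type: =~ /Start/ $regex, eq 'Start' $exact\n";
}
```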

Putting 'Start' and 'Stop' in the array is redundant in a way, because you
know which is which from the position. Alternatively, make the array another
hash and use the first field of the input line as the key for the
sub-hash, so you could just do:

${$HoL{$hashkey}}{$line[0]}=$line[2];
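
A sketch of that hash-of-hashes variant, written as a sub over a list of lines so it can be tried without a file. The name pair_events is hypothetical and the sample lines are Magnus's example input. $HoH{$key}{$type} is the shorter spelling of ${$HoH{$key}}{$type}, and a Stop with no earlier Start is skipped, as in Magnus's script:

```perl
use strict;
use warnings;

# Hypothetical helper: pair Start/Stop lines using a hash of hashes.
# Input format from the thread: Start|Stop;date;time;key
sub pair_events {
    my (%HoH, @out);
    for my $line (@_) {
        my $copy = $line;
        chomp $copy;
        my ($type, $date, $time, $key) = split /;/, $copy;
        next unless $type eq 'Start' or $type eq 'Stop';
        next if $type eq 'Stop' and not $HoH{$key};   # Stop before any Start: ignore
        $HoH{$key}{$type} = $time;                   # same as ${$HoH{$key}}{$type} = $time
        if ($type eq 'Stop') {
            push @out, join(';', 'Start', $HoH{$key}{Start}, 'Stop', $HoH{$key}{Stop});
        }
    }
    return @out;
}

print "$_\n" for pair_events(
    "Start;date;starttime1;key-value1",
    "Start;date;starttime2;key-value2",
    "Stop;date;stoptime1;key-value1",
    "Stop;date;stoptime4;key-value4",
    "Stop;date;stoptime2;key-value2",
    "Start;date;starttime3;key-value3",
    "Stop;date;stoptime3;key-value3",
);
```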

What happens if the process you're logging runs over midnight or for more
than one day?

Personally, I would have read the whole thing in, then printed it out using
a loop based around:

foreach my $key (keys %HoL) {
    ...
}

But that's a matter of taste, really. It would be useful if you want to pass
the data to another routine for post-processing before printing it.
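
A sketch of that two-pass layout, again with hypothetical sub names (collect, report) and sample lines in place of real input. Because everything is read before printing, a Stop that appears before its Start is still paired, keys missing either half are skipped, and sorting the keys gives a stable output order. Appending the key to each output line is an addition of this sketch:

```perl
use strict;
use warnings;

# Pass 1: read every line into a hash of hashes keyed by key value.
# Input format from the thread: Start|Stop;date;time;key
sub collect {
    my %HoH;
    for my $line (@_) {
        my $copy = $line;
        chomp $copy;
        my ($type, $date, $time, $key) = split /;/, $copy;
        $HoH{$key}{$type} = $time;
    }
    return \%HoH;
}

# Pass 2: walk the keys and build output lines for complete pairs only.
sub report {
    my ($HoH) = @_;
    my @out;
    foreach my $key (sort keys %$HoH) {
        my $rec = $HoH->{$key};
        next unless exists $rec->{Start} and exists $rec->{Stop};
        push @out, join(';', 'Start', $rec->{Start}, 'Stop', $rec->{Stop}, $key);
    }
    return @out;
}

print "$_\n" for report(collect(
    "Start;date;starttime1;key-value1",
    "Start;date;starttime2;key-value2",
    "Stop;date;stoptime1;key-value1",
    "Stop;date;stoptime4;key-value4",
    "Stop;date;stoptime2;key-value2",
));
```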
 
