Stefan said:
I have a text data file with some thousands of rows like these:
foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
...
This is a pretty poor data file to use as an example, as the fifth
column of each row is an identical value.
The records are semicolon-separated and each field can contain
letters, digits, dots, #, / and so on. All records have the same
number of fields.
I need to order the data file by the value of the fifth column (a real
number). How can I do this? Sorry, but I'm a complete beginner in Perl :-(
I tried this:
(for simplicity I removed declarations etc)
Don't. Declarations are important.
open(DATAFILE, "data.csv");
Always (yes, *always*) check the return value of open() and all other
system calls.
while (<DATAFILE>) {
push @not_ordered_list, $_;
}
If you're going to read the entire file into memory anyway, you may as
well do it in one step:
my @not_ordered_list = <DATAFILE>;
sub by_value {
# this function is used by sort
local @a, @b;
# put each line into a list
@a = split /;/, $a;
@b = split /;/, $b;
You're splitting each line of the array EVERY time sort calls this
function, and sort calls it many times for each line. That's a lot of
wasted work.
#then compare the values
$a = $a[5];
$b = $b[5];
You originally said you wanted the data sorted by the fifth column.
Perl arrays are indexed from 0, so $a[5] is the sixth column; the fifth
is $a[4].
Also, don't change the values of $a and $b within the sort function.
They are aliases to the actual elements in the array, so assigning to
them overwrites your data.
$a <=> $b;
}
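One way to avoid the repeated splitting noted above is to compute each
line's sort key once, sort on the stored keys, and then recover the
original lines (often called a Schwartzian transform). This is only a
minimal sketch, assuming every line has at least five fields; the helper
name sort_by_fifth_field and the sample rows (with varying fifth fields)
are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort lines numerically by their fifth
# semicolon-separated field, splitting each line only once.
sub sort_by_fifth_field {
    my @lines = @_;
    return map  { $_->[1] }                   # 3. recover the original line
           sort { $a->[0] <=> $b->[0] }       # 2. compare the stored keys
           map  { [ (split /;/)[4], $_ ] }    # 1. pair each line with its key
           @lines;
}

# Sample rows invented for illustration (the fifth field varies).
my @sample = (
    "foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;\n",
    "table;mouse;3c32.34c;0_164425;3.75;corner;;99;ddaw2; ;/#;\n",
    "see;lock;33ec3erwc;5_1121;7.1;bold;;99;ddaw2; ;//;\n",
);

print sort_by_fifth_field(@sample);
```

Because the original lines are carried along untouched, nothing has to
be re-joined before printing, and $a and $b are only read, never
assigned to.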
open (DATA_OUT, ">ordered_data.csv");
my @ordered_list = sort by_value @not_ordered_list;
seek (DATAFILE, 0,0);
while (<DATAFILE>) {
Why are you re-reading from the DATAFILE? You just put the sorted list
of lines into @ordered_list. The actual file never changed.
$,=";";
@data_record = split /;/, $_;
print DATA_OUT @data_record;
So... you read each line, create an array out of the elements in the
line (removing the ; in the process), and then print those elements to
the new file, separating each by a semicolon?
Wouldn't it make more sense to just print the original value of $_ ?
}
close (DATAFILE);
close (DATA_OUT);
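If the sorted lines are already sitting in an array, the output step
can simply print them as they are. A small sketch of that idea, assuming
the lines still carry their separators and newlines; the sample data is
invented, and an in-memory filehandle stands in for ordered_data.csv so
the snippet runs without touching the disk:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lines already in sorted order (invented sample data).
my @ordered_list = ("a;b;c;d;1.5;x;\n", "a;b;c;d;2.5;y;\n");

# In-memory filehandle used only for this sketch; a real script
# would open 'ordered_data.csv' for writing instead.
open my $out, '>', \my $buffer
    or die "Cannot open in-memory file: $!";

# No split and no join: each line is printed exactly as it was read.
print {$out} @ordered_list;

close $out or die "Cannot close output: $!";
```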
My suggestion would be to read each line, split it into its fields, store
those fields in an array reference, store each reference in a larger
array, sort that array by the fifth element of each 'inner' array, and
then print the results to a new file:
#!/usr/bin/perl
use strict;
use warnings;
open my $DATAFILE, '<', 'data.csv' or die "Cannot open file: $!";
my @lines; #holds all the lines of the file.
while (<$DATAFILE>) {
push @lines, [split /;/]; #add this line's elements to the array
}
open my $DATA_OUT, '>', 'ordered_data.csv'
or die "Cannot open file for writing: $!";
sub by_fifth {
$a->[4] <=> $b->[4];
}
foreach (sort by_fifth @lines){
print $DATA_OUT join (';', @{$_});
}
__END__
Paul Lalli