order a semicolon-separated data file by a value of a column

S

Stefan H.

Hi,

I have a text data file with some thousands of rows like those:


foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
....

the records are semicolon separated and each field can contain letters,
digits, dots, #, / and so on. All records have the same number of
fields.

I need to order the data file by the value of the fifth column (a real
number). How can I do? Sorry but I'm a very beginner in perl :-(

Thanks
Stefan
 
B

Brian McCauley

Stefan said:
I have a text data file with some thousands of rows like those:


foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
...

the records are semicolon separated and each field can contain letters,
digits, dots, #, / and so on. All records have the same number of
fields.

I need to order the data file by the value of the fifth column (a real
number). How can I do?

What have you tried so far?

Has it invoved...

perldoc -q sort
perldoc -f sort
 
S

Stefan H.

What have you tried so far?

Has it invoved...


yes, you're right

I tried this:
(for simplicity I removed declarations etc)




open(DATAFILE, "data.csv");

while (<DATAFILE>) {
push @not_ordered_list, $_;
}

sub by_value {
# this function is used by sort
local @a, @b;

# put each line into a list
@a = split /;/, $a;
@b = split /;/, $b;

#then compare the values
$a = $a[5];
$b = $b[5];

$a <=> $b;
}

open (DATA_OUT, ">ordered_data.csv");

my @ordered_list = sort by_value @not_ordered_list;

seek (DATAFILE, 0,0);
while (<DATAFILE>) {
$,=";";
@data_record = split /;/, $_;

print DATA_OUT @data_record;
}

close (DATAFILE);
close (DATA_OUT);


the output file is not ordered.

Sorry for the bad code!

Thanks
 
P

Paul Lalli

Stefan said:
I have a text data file with some thousands of rows like those:
foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
...

This is a pretty poor data file to use as an example, as the fifth
column of each row is an identical value.
the records are semicolon separated and each field can contain
letters, digits, dots, #, / and so on. All records have the same
number of fields.
I need to order the data file by the value of the fifth column (a real
number). How can I do? Sorry but I'm a very beginner in perl :-(
I tried this:
(for simplicity I removed declarations etc)

Don't. Declarations are important.
open(DATAFILE, "data.csv");

Always, ('yes *always*') check the return value of open() and all other
system calls.
while (<DATAFILE>) {
push @not_ordered_list, $_;
}

If you're going to read the entire file into memory anyway, you may as
well do it it one step:

my @not_ordered_list = said:
sub by_value {
# this function is used by sort
local @a, @b;

# put each line into a list
@a = split /;/, $a;
@b = split /;/, $b;

You're splitting each line of the array EVERY time sort calls this
function. That's bad.
#then compare the values
$a = $a[5];
$b = $b[5];

You originally said you wanted the data sorted by the fifth column.
$a[5] is the sixth column.

Also, don't change the values of $a and $b within the sort function.
They are aliases to the actual elements in the array.
$a <=> $b;
}

open (DATA_OUT, ">ordered_data.csv");

my @ordered_list = sort by_value @not_ordered_list;

seek (DATAFILE, 0,0);
while (<DATAFILE>) {

Why are you re-reading from the DATAFILE? You just put the sorted list
of lines into @ordered_list. The actual file never changed.
$,=";";
@data_record = split /;/, $_;

print DATA_OUT @data_record;

so... you read each line, create an array out of the elements in the
line (removing the ; in the process), and then print those elements to
the new file, separating each by a semicolon?

Wouldn't it make more sense to just print the original value of $_ ?
}

close (DATAFILE);
close (DATA_OUT);

My suggestion would be to read each element, store each line in an array
reference, store each reference in a larger array, sort the array by the
fifth element of each 'inner' array, and then print the results to a new
file:

#!/usr/bin/perl
use strict;
use warnings;
open my $DATAFILE, 'data.csv' or die "Cannot open file: $!";

my @lines; #holds all the lines of the file.

while (<$DATAFILE>) {
push @lines, [split /;/]; #add this line's elements to the array
}

open my $DATA_OUT, '>', 'ordered_data.csv'
or die "Cannot open file for writing: $!";

sub by_fifth {
$a->[4] <=> $b->[4];
}

foreach (sort by_fifth @lines){
print $DATA_OUT join (';', @{$_});
}
__END__

Paul Lalli
 
J

John Bokma

I tried this:
(for simplicity I removed declarations etc)
don't.

open(DATAFILE, "data.csv");

*check* if this works, ie: or die ".... $!";

And ;-separated is not csv
while (<DATAFILE>) {
push @not_ordered_list, $_;
}

You can read an array in one go:

@not_ordered_list = said:
sub by_value {
# this function is used by sort
local @a, @b;

# put each line into a list
@a = split /;/, $a;
@b = split /;/, $b;

#then compare the values
$a = $a[5];
$b = $b[5];

$a <=> $b;
}

AFAIK no need for local. Also you do the splitting for *every*
comparison, so O( n log n ). Better: do the splitting first, and then
sort.
open (DATA_OUT, ">ordered_data.csv");
check!

my @ordered_list = sort by_value @not_ordered_list;

seek (DATAFILE, 0,0);
Huh?

while (<DATAFILE>) {
$,=";";
@data_record = split /;/, $_;

print DATA_OUT @data_record;

Aargh, you read the original unsorted file in again, then split, and
then glue the result (in a bad way IIRC)... Of course it's not sorted.
}

close (DATAFILE);
close (DATA_OUT);

check and check.
the output file is not ordered.

No, since you don't use the result, you read the unsorted version, and
write that to your output file...
 
J

John W. Krahn

Stefan said:
I have a text data file with some thousands of rows like those:

foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
...

the records are semicolon separated and each field can contain letters,
digits, dots, #, / and so on. All records have the same number of
fields.

I need to order the data file by the value of the fifth column (a real
number). How can I do? Sorry but I'm a very beginner in perl :-(

#!/usr/bin/perl
use warnings;
use strict;

my $file_in = 'file';
my $file_out = 'sorted';

open IN, '<', $file_in or die "Cannot open $file_in: $!";
open OUT, '>', $file_out or die "Cannot open $file_out: $!";

print OUT map join( ';', @$_ ),
sort { $a->[ 5 ] <=> $b->[ 5 ] }
map [ split /;/ ],
<IN>;

__END__



John
 
G

Gerhard M

Stefan H. said:
Hi,

I have a text data file with some thousands of rows like those:


foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
...

the records are semicolon separated and each field can contain letters,
digits, dots, #, / and so on. All records have the same number of
fields.

I need to order the data file by the value of the fifth column (a real
number). How can I do? Sorry but I'm a very beginner in perl :-(

Thanks
Stefan

what's the reason to write an perl-script.... simply use sort:
# sort -t\; -k5n <infile >outfile

Gerhard
 
S

Stefan H.

Hi Paul, thank you very much for the suggestions.

I removed declarations and open file checking to have a lighter code to
post here.

Mi idea was:

- create a list containing each row
- apply to that list the sort function
- the sort function need to split each row and compare the fifth element

now I know that I don't know how sort function works :)

Your code works very well, thanks. I'm learning perl since last week, I
need it for my work and I'm still a beginner. I didn't know perl had
references, I read the lama book and it does't mention references. I've
just ordered the second volume by swartz (perl objects...) I think I
need it.

Thanks again you and others answered my question.

For Graham: you script is very interesting. I'm learning unix shell
scripting just now. Great.

Bye
Stefan
 
P

Paul Lalli

Hi Paul, thank you very much for the suggestions.

You're welcome.
I removed declarations and open file checking to have a lighter code to
post here.

Posting complete, correct code is far more important than saving a few
keystrokes.
Mi idea was:

- create a list containing each row
- apply to that list the sort function
- the sort function need to split each row and compare the fifth element

now I know that I don't know how sort function works :)

Correct. The sort function returns a list of the elements passed in, in
sorted order. It does not actually touch the list at all. HOWEVER,
even if sort() did directly modify the list you pass it, your code would
still not have worked. You built an array from the file, sorted that
array, and then read from the file again.
Your code works very well, thanks. I'm learning perl since last week, I
need it for my work and I'm still a beginner.

Without meaning any disrespect, that was pretty obvious. :) Allow me
then to give you some friendly advice, both about Perl and about this
newsgroup:
1) Do not 'top-post'. That means do not post your reply above what you
are replying to. I have corrected your reply into the proper order in
this post.
2) Comment your code. Explaining in your comments what you are
expecting the code to do will help people correct your faulty beliefs.
3) Read the available documentation. All installations of Perl come
with the 'perldoc' command. For instructions on using it, type 'perldoc
perldoc' at your command prompt. For example, if 'sort' isn't working
the way you think it is, read up on why:
perldoc -f sort
4) Read the Perl FAQ to find if perhaps your question has been asked so
frequently that the answer is documented:
perldoc perlfaq
or
perldoc -q said:
I didn't know perl had
references, I read the lama book and it does't mention references. I've
just ordered the second volume by swartz (perl objects...) I think I
need it.

Schwartz. Randal L. Schwartz. He posts to this newsgroup from time to
time, so it's best to spell his name correctly.

In the meantime before your copy of the Alpaca arrives, consider reading
the available tutorials and references dealing with references:
perldoc perlreftut
perldoc perllol
perldoc perldsc
perldoc perlref
Thanks again you and others answered my question.

You're welcome. We wish you the best of luck in your Perl experiences.
Feel free to post questions when you get stuck... but please try to
follow my advice. :)

Paul Lalli
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,534
Members
45,008
Latest member
Rahul737

Latest Threads

Top