order a semicolon-separated data file by a value of a column

Stefan H. · Sep 22, 2004

Hi,

I have a text data file with some thousands of rows like those:

foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
....

the records are semicolon separated and each field can contain letters,
digits, dots, #, / and so on. All records have the same number of
fields.

I need to order the data file by the value of the fifth column (a real
number). How can I do? Sorry but I'm a very beginner in perl :-(

Thanks
Stefan

Brian McCauley · Sep 22, 2004

Stefan said:
I have a text data file with some thousands of rows like those:

foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
...

the records are semicolon separated and each field can contain letters,
digits, dots, #, / and so on. All records have the same number of
fields.

I need to order the data file by the value of the fifth column (a real
number). How can I do?

What have you tried so far?

Has it invoved...

perldoc -q sort
perldoc -f sort

Stefan H. · Sep 22, 2004

What have you tried so far?

Has it invoved...

yes, you're right

I tried this:
(for simplicity I removed declarations etc)

open(DATAFILE, "data.csv");

while (<DATAFILE>) {
push @not_ordered_list, $_;
}

sub by_value {
# this function is used by sort
local @a, @b;

# put each line into a list
@a = split /;/, $a;
@b = split /;/, $b;

#then compare the values
$a = $a[5];
$b = $b[5];

$a <=> $b;
}

open (DATA_OUT, ">ordered_data.csv");

my @ordered_list = sort by_value @not_ordered_list;

seek (DATAFILE, 0,0);
while (<DATAFILE>) {
$,=";";
@data_record = split /;/, $_;

print DATA_OUT @data_record;
}

close (DATAFILE);
close (DATA_OUT);

the output file is not ordered.

Sorry for the bad code!

Thanks

Paul Lalli · Sep 22, 2004

Stefan said:
I have a text data file with some thousands of rows like those:

foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
...

This is a pretty poor data file to use as an example, as the fifth
column of each row is an identical value.

the records are semicolon separated and each field can contain
letters, digits, dots, #, / and so on. All records have the same
number of fields.

I need to order the data file by the value of the fifth column (a real
number). How can I do? Sorry but I'm a very beginner in perl :-(

I tried this:
(for simplicity I removed declarations etc)

Don't. Declarations are important.

open(DATAFILE, "data.csv");

Always, ('yes *always*') check the return value of open() and all other
system calls.

while (<DATAFILE>) {
push @not_ordered_list, $_;
}

If you're going to read the entire file into memory anyway, you may as
well do it it one step:

my @not_ordered_list = said:
sub by_value {
# this function is used by sort
local @a, @b;

# put each line into a list
@a = split /;/, $a;
@b = split /;/, $b;

You're splitting each line of the array EVERY time sort calls this
function. That's bad.

#then compare the values
$a = $a[5];
$b = $b[5];

You originally said you wanted the data sorted by the fifth column.
$a[5] is the sixth column.

Also, don't change the values of $a and $b within the sort function.
They are aliases to the actual elements in the array.

$a <=> $b;
}

open (DATA_OUT, ">ordered_data.csv");

my @ordered_list = sort by_value @not_ordered_list;

seek (DATAFILE, 0,0);
while (<DATAFILE>) {

Why are you re-reading from the DATAFILE? You just put the sorted list
of lines into @ordered_list. The actual file never changed.

$,=";";
@data_record = split /;/, $_;

print DATA_OUT @data_record;

so... you read each line, create an array out of the elements in the
line (removing the ; in the process), and then print those elements to
the new file, separating each by a semicolon?

Wouldn't it make more sense to just print the original value of $_ ?

}

close (DATAFILE);
close (DATA_OUT);

My suggestion would be to read each element, store each line in an array
reference, store each reference in a larger array, sort the array by the
fifth element of each 'inner' array, and then print the results to a new
file:

#!/usr/bin/perl
use strict;
use warnings;
open my $DATAFILE, 'data.csv' or die "Cannot open file: $!";

my @lines; #holds all the lines of the file.

while (<$DATAFILE>) {
push @lines, [split /;/]; #add this line's elements to the array
}

open my $DATA_OUT, '>', 'ordered_data.csv'
or die "Cannot open file for writing: $!";

sub by_fifth {
$a->[4] <=> $b->[4];
}

foreach (sort by_fifth @lines){
print $DATA_OUT join (';', @{$_});
}
__END__

Paul Lalli

John Bokma · Sep 22, 2004

I tried this:
(for simplicity I removed declarations etc)
don't.

open(DATAFILE, "data.csv");

*check* if this works, ie: or die ".... $!";

And ;-separated is not csv

while (<DATAFILE>) {
push @not_ordered_list, $_;
}

You can read an array in one go:

@not_ordered_list = said:
sub by_value {
# this function is used by sort
local @a, @b;

# put each line into a list
@a = split /;/, $a;
@b = split /;/, $b;

#then compare the values
$a = $a[5];
$b = $b[5];

$a <=> $b;
}

AFAIK no need for local. Also you do the splitting for *every*
comparison, so O( n log n ). Better: do the splitting first, and then
sort.

open (DATA_OUT, ">ordered_data.csv");
check!

my @ordered_list = sort by_value @not_ordered_list;

seek (DATAFILE, 0,0);
Huh?

while (<DATAFILE>) {
$,=";";
@data_record = split /;/, $_;

print DATA_OUT @data_record;

Aargh, you read the original unsorted file in again, then split, and
then glue the result (in a bad way IIRC)... Of course it's not sorted.

}

close (DATAFILE);
close (DATA_OUT);

check and check.

the output file is not ordered.

No, since you don't use the result, you read the unsorted version, and
write that to your output file...

John W. Krahn · Sep 23, 2004

Stefan said:
I have a text data file with some thousands of rows like those:

foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
...

the records are semicolon separated and each field can contain letters,
digits, dots, #, / and so on. All records have the same number of
fields.

I need to order the data file by the value of the fifth column (a real
number). How can I do? Sorry but I'm a very beginner in perl :-(

#!/usr/bin/perl
use warnings;
use strict;

my $file_in = 'file';
my $file_out = 'sorted';

open IN, '<', $file_in or die "Cannot open $file_in: $!";
open OUT, '>', $file_out or die "Cannot open $file_out: $!";

print OUT map join( ';', @$_ ),
sort { $a->[ 5 ] <=> $b->[ 5 ] }
map [ split /;/ ],
<IN>;

__END__

John

Gerhard M · Sep 23, 2004

Stefan H. said:
Hi,

I have a text data file with some thousands of rows like those:

foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
...

the records are semicolon separated and each field can contain letters,
digits, dots, #, / and so on. All records have the same number of
fields.

I need to order the data file by the value of the fifth column (a real
number). How can I do? Sorry but I'm a very beginner in perl :-(

Thanks
Stefan

what's the reason to write an perl-script.... simply use sort:
# sort -t\; -k5n <infile >outfile

Gerhard

Stefan H. · Sep 23, 2004

Hi Paul, thank you very much for the suggestions.

I removed declarations and open file checking to have a lighter code to
post here.

Mi idea was:

- create a list containing each row
- apply to that list the sort function
- the sort function need to split each row and compare the fifth element

now I know that I don't know how sort function works

Your code works very well, thanks. I'm learning perl since last week, I
need it for my work and I'm still a beginner. I didn't know perl had
references, I read the lama book and it does't mention references. I've
just ordered the second volume by swartz (perl objects...) I think I
need it.

Thanks again you and others answered my question.

For Graham: you script is very interesting. I'm learning unix shell
scripting just now. Great.

Bye
Stefan

Paul Lalli · Sep 23, 2004

Hi Paul, thank you very much for the suggestions.

You're welcome.

I removed declarations and open file checking to have a lighter code to
post here.

Posting complete, correct code is far more important than saving a few
keystrokes.

Mi idea was:

- create a list containing each row
- apply to that list the sort function
- the sort function need to split each row and compare the fifth element

now I know that I don't know how sort function works

Correct. The sort function returns a list of the elements passed in, in
sorted order. It does not actually touch the list at all. HOWEVER,
even if sort() did directly modify the list you pass it, your code would
still not have worked. You built an array from the file, sorted that
array, and then read from the file again.

Your code works very well, thanks. I'm learning perl since last week, I
need it for my work and I'm still a beginner.

Without meaning any disrespect, that was pretty obvious.

Allow me
then to give you some friendly advice, both about Perl and about this
newsgroup:
1) Do not 'top-post'. That means do not post your reply above what you
are replying to. I have corrected your reply into the proper order in
this post.
2) Comment your code. Explaining in your comments what you are
expecting the code to do will help people correct your faulty beliefs.
3) Read the available documentation. All installations of Perl come
with the 'perldoc' command. For instructions on using it, type 'perldoc
perldoc' at your command prompt. For example, if 'sort' isn't working
the way you think it is, read up on why:
perldoc -f sort
4) Read the Perl FAQ to find if perhaps your question has been asked so
frequently that the answer is documented:
perldoc perlfaq
or

perldoc -q said:
I didn't know perl had
references, I read the lama book and it does't mention references. I've
just ordered the second volume by swartz (perl objects...) I think I
need it.

Schwartz. Randal L. Schwartz. He posts to this newsgroup from time to
time, so it's best to spell his name correctly.

In the meantime before your copy of the Alpaca arrives, consider reading
the available tutorials and references dealing with references:
perldoc perlreftut
perldoc perllol
perldoc perldsc
perldoc perlref

Thanks again you and others answered my question.

You're welcome. We wish you the best of luck in your Perl experiences.
Feel free to post questions when you get stuck... but please try to
follow my advice.

Paul Lalli

Sort by number of characters	1	Nov 2, 2023
suggestions for printing out a few records of a lengthy file	16	Jun 7, 2010
maximum value in a column of file	1	Jul 23, 2008
FAQ 5.2 How do I change, delete, or insert a line in a file, or append to the beginning of a file?	0	Feb 24, 2011
logic question for text file updates	11	Mar 26, 2010
Dynamic insert of array data in mysql using DBI	2	Jul 20, 2010
How to read/write this data.	5	Sep 4, 2008
Order of Events with JTable	0	Apr 16, 2006

order a semicolon-separated data file by a value of a column

Stefan H.

Brian McCauley

Stefan H.

Paul Lalli

John Bokma

John W. Krahn

Gerhard M

Stefan H.

Paul Lalli

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads