order a semicolon-separated data file by a value of a column

Discussion in 'Perl Misc' started by Stefan H., Sep 22, 2004.

  1. Stefan H.

    Stefan H. Guest

    Hi,

    I have a text data file with some thousands of rows like those:


    foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
    table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
    see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
    ....

    the records are semicolon separated and each field can contain letters,
    digits, dots, #, / and so on. All records have the same number of
    fields.

    I need to order the data file by the value of the fifth column (a real
    number). How can I do? Sorry but I'm a very beginner in perl :-(

    Thanks
    Stefan
     
    Stefan H., Sep 22, 2004
    #1
    1. Advertising

  2. Stefan H. wrote:

    > I have a text data file with some thousands of rows like those:
    >
    >
    > foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
    > table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
    > see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
    > ...
    >
    > the records are semicolon separated and each field can contain letters,
    > digits, dots, #, / and so on. All records have the same number of
    > fields.
    >
    > I need to order the data file by the value of the fifth column (a real
    > number). How can I do?


    What have you tried so far?

    Has it invoved...

    perldoc -q sort
    perldoc -f sort
     
    Brian McCauley, Sep 22, 2004
    #2
    1. Advertising

  3. Stefan H.

    Stefan H. Guest

    On Wed, 22 Sep 2004 22:11:52 +0100, Brian McCauley <>
    wrote:

    >What have you tried so far?
    >
    >Has it invoved...



    yes, you're right

    I tried this:
    (for simplicity I removed declarations etc)




    open(DATAFILE, "data.csv");

    while (<DATAFILE>) {
    push @not_ordered_list, $_;
    }

    sub by_value {
    # this function is used by sort
    local @a, @b;

    # put each line into a list
    @a = split /;/, $a;
    @b = split /;/, $b;

    #then compare the values
    $a = $a[5];
    $b = $b[5];

    $a <=> $b;
    }

    open (DATA_OUT, ">ordered_data.csv");

    my @ordered_list = sort by_value @not_ordered_list;

    seek (DATAFILE, 0,0);
    while (<DATAFILE>) {
    $,=";";
    @data_record = split /;/, $_;

    print DATA_OUT @data_record;
    }

    close (DATAFILE);
    close (DATA_OUT);


    the output file is not ordered.

    Sorry for the bad code!

    Thanks
     
    Stefan H., Sep 22, 2004
    #3
  4. Stefan H.

    Paul Lalli Guest

    Stefan H. originally wrote:

    > I have a text data file with some thousands of rows like those:


    > foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
    > table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
    > see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
    > ...


    This is a pretty poor data file to use as an example, as the fifth
    column of each row is an identical value.

    > the records are semicolon separated and each field can contain
    > letters, digits, dots, #, / and so on. All records have the same
    > number of fields.


    > I need to order the data file by the value of the fifth column (a real
    > number). How can I do? Sorry but I'm a very beginner in perl :-(


    Stefan H. later wrote:

    > I tried this:
    > (for simplicity I removed declarations etc)


    Don't. Declarations are important.

    > open(DATAFILE, "data.csv");


    Always, ('yes *always*') check the return value of open() and all other
    system calls.

    >
    > while (<DATAFILE>) {
    > push @not_ordered_list, $_;
    > }


    If you're going to read the entire file into memory anyway, you may as
    well do it it one step:

    my @not_ordered_list = <DATAFILE>;

    >
    > sub by_value {
    > # this function is used by sort
    > local @a, @b;
    >
    > # put each line into a list
    > @a = split /;/, $a;
    > @b = split /;/, $b;


    You're splitting each line of the array EVERY time sort calls this
    function. That's bad.
    >
    > #then compare the values
    > $a = $a[5];
    > $b = $b[5];


    You originally said you wanted the data sorted by the fifth column.
    $a[5] is the sixth column.

    Also, don't change the values of $a and $b within the sort function.
    They are aliases to the actual elements in the array.

    >
    > $a <=> $b;
    > }
    >
    > open (DATA_OUT, ">ordered_data.csv");
    >
    > my @ordered_list = sort by_value @not_ordered_list;
    >
    > seek (DATAFILE, 0,0);
    > while (<DATAFILE>) {


    Why are you re-reading from the DATAFILE? You just put the sorted list
    of lines into @ordered_list. The actual file never changed.
    > $,=";";
    > @data_record = split /;/, $_;
    >
    > print DATA_OUT @data_record;


    so... you read each line, create an array out of the elements in the
    line (removing the ; in the process), and then print those elements to
    the new file, separating each by a semicolon?

    Wouldn't it make more sense to just print the original value of $_ ?

    > }
    >
    > close (DATAFILE);
    > close (DATA_OUT);
    >


    My suggestion would be to read each element, store each line in an array
    reference, store each reference in a larger array, sort the array by the
    fifth element of each 'inner' array, and then print the results to a new
    file:

    #!/usr/bin/perl
    use strict;
    use warnings;
    open my $DATAFILE, 'data.csv' or die "Cannot open file: $!";

    my @lines; #holds all the lines of the file.

    while (<$DATAFILE>) {
    push @lines, [split /;/]; #add this line's elements to the array
    }

    open my $DATA_OUT, '>', 'ordered_data.csv'
    or die "Cannot open file for writing: $!";

    sub by_fifth {
    $a->[4] <=> $b->[4];
    }

    foreach (sort by_fifth @lines){
    print $DATA_OUT join (';', @{$_});
    }
    __END__

    Paul Lalli
     
    Paul Lalli, Sep 22, 2004
    #4
  5. Stefan H.

    John Bokma Guest

    Stefan H. <> wrote in
    news::

    > I tried this:
    > (for simplicity I removed declarations etc)


    don't.

    > open(DATAFILE, "data.csv");


    *check* if this works, ie: or die ".... $!";

    And ;-separated is not csv

    > while (<DATAFILE>) {
    > push @not_ordered_list, $_;
    > }


    You can read an array in one go:

    @not_ordered_list = <DATAFILE>;

    > sub by_value {
    > # this function is used by sort
    > local @a, @b;
    >
    > # put each line into a list
    > @a = split /;/, $a;
    > @b = split /;/, $b;
    >
    > #then compare the values
    > $a = $a[5];
    > $b = $b[5];
    >
    > $a <=> $b;
    > }


    AFAIK no need for local. Also you do the splitting for *every*
    comparison, so O( n log n ). Better: do the splitting first, and then
    sort.

    > open (DATA_OUT, ">ordered_data.csv");


    check!

    > my @ordered_list = sort by_value @not_ordered_list;
    >
    > seek (DATAFILE, 0,0);


    Huh?

    > while (<DATAFILE>) {
    > $,=";";
    > @data_record = split /;/, $_;
    >
    > print DATA_OUT @data_record;


    Aargh, you read the original unsorted file in again, then split, and
    then glue the result (in a bad way IIRC)... Of course it's not sorted.

    > }
    >
    > close (DATAFILE);
    > close (DATA_OUT);


    check and check.

    > the output file is not ordered.


    No, since you don't use the result, you read the unsorted version, and
    write that to your output file...

    --
    John MexIT: http://johnbokma.com/mexit/
    personal page: http://johnbokma.com/
    Experienced programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Sep 22, 2004
    #5
  6. Stefan H. wrote:
    >
    > I have a text data file with some thousands of rows like those:
    >
    > foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
    > table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
    > see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
    > ...
    >
    > the records are semicolon separated and each field can contain letters,
    > digits, dots, #, / and so on. All records have the same number of
    > fields.
    >
    > I need to order the data file by the value of the fifth column (a real
    > number). How can I do? Sorry but I'm a very beginner in perl :-(


    #!/usr/bin/perl
    use warnings;
    use strict;

    my $file_in = 'file';
    my $file_out = 'sorted';

    open IN, '<', $file_in or die "Cannot open $file_in: $!";
    open OUT, '>', $file_out or die "Cannot open $file_out: $!";

    print OUT map join( ';', @$_ ),
    sort { $a->[ 5 ] <=> $b->[ 5 ] }
    map [ split /;/ ],
    <IN>;

    __END__



    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Sep 23, 2004
    #6
  7. Stefan H.

    Gerhard M Guest

    Stefan H. <> wrote in message news:<>...
    > Hi,
    >
    > I have a text data file with some thousands of rows like those:
    >
    >
    > foo;bar;33ec32.34c;0_164425;12.2;old;;99;dg; ;#asa;
    > table;mouse;3c32.34c;0_164425;12.2;corner;;99;ddaw2; ;/#;
    > see;lock;33ec3erwc;5_1121;12.2;bold;;99;ddaw2; ;//;
    > ...
    >
    > the records are semicolon separated and each field can contain letters,
    > digits, dots, #, / and so on. All records have the same number of
    > fields.
    >
    > I need to order the data file by the value of the fifth column (a real
    > number). How can I do? Sorry but I'm a very beginner in perl :-(
    >
    > Thanks
    > Stefan


    what's the reason to write an perl-script.... simply use sort:
    # sort -t\; -k5n <infile >outfile

    Gerhard
     
    Gerhard M, Sep 23, 2004
    #7
  8. Stefan H.

    Stefan H. Guest

    On Wed, 22 Sep 2004 18:13:47 -0400, Paul Lalli <> wrote:

    Hi Paul, thank you very much for the suggestions.

    I removed declarations and open file checking to have a lighter code to
    post here.

    Mi idea was:

    - create a list containing each row
    - apply to that list the sort function
    - the sort function need to split each row and compare the fifth element

    now I know that I don't know how sort function works :)

    Your code works very well, thanks. I'm learning perl since last week, I
    need it for my work and I'm still a beginner. I didn't know perl had
    references, I read the lama book and it does't mention references. I've
    just ordered the second volume by swartz (perl objects...) I think I
    need it.

    Thanks again you and others answered my question.

    For Graham: you script is very interesting. I'm learning unix shell
    scripting just now. Great.

    Bye
    Stefan

    >My suggestion would be to read each element, store each line in an array
    >reference, store each reference in a larger array, sort the array by the
    >fifth element of each 'inner' array, and then print the results to a new
    >file:
    >
    >#!/usr/bin/perl
    >use strict;
    >use warnings;
    >open my $DATAFILE, 'data.csv' or die "Cannot open file: $!";
    >
    >my @lines; #holds all the lines of the file.
    >
    >while (<$DATAFILE>) {
    > push @lines, [split /;/]; #add this line's elements to the array
    >}
    >
    >open my $DATA_OUT, '>', 'ordered_data.csv'
    > or die "Cannot open file for writing: $!";
    >
    >sub by_fifth {
    > $a->[4] <=> $b->[4];
    >}
    >
    >foreach (sort by_fifth @lines){
    > print $DATA_OUT join (';', @{$_});
    >}
    >__END__
    >
    >Paul Lalli
     
    Stefan H., Sep 23, 2004
    #8
  9. Stefan H.

    Paul Lalli Guest

    Stefan H. wrote:

    > On Wed, 22 Sep 2004 18:13:47 -0400, Paul Lalli <> wrote:
    >
    >>My suggestion would be to read each element, store each line in an array
    >>reference, store each reference in a larger array, sort the array by the
    >>fifth element of each 'inner' array, and then print the results to a new
    >>file:
    >>

    <snipped the code I posted>
    >
    > Hi Paul, thank you very much for the suggestions.


    You're welcome.

    >
    > I removed declarations and open file checking to have a lighter code to
    > post here.


    Posting complete, correct code is far more important than saving a few
    keystrokes.

    > Mi idea was:
    >
    > - create a list containing each row
    > - apply to that list the sort function
    > - the sort function need to split each row and compare the fifth element
    >
    > now I know that I don't know how sort function works :)


    Correct. The sort function returns a list of the elements passed in, in
    sorted order. It does not actually touch the list at all. HOWEVER,
    even if sort() did directly modify the list you pass it, your code would
    still not have worked. You built an array from the file, sorted that
    array, and then read from the file again.

    > Your code works very well, thanks. I'm learning perl since last week, I
    > need it for my work and I'm still a beginner.


    Without meaning any disrespect, that was pretty obvious. :) Allow me
    then to give you some friendly advice, both about Perl and about this
    newsgroup:
    1) Do not 'top-post'. That means do not post your reply above what you
    are replying to. I have corrected your reply into the proper order in
    this post.
    2) Comment your code. Explaining in your comments what you are
    expecting the code to do will help people correct your faulty beliefs.
    3) Read the available documentation. All installations of Perl come
    with the 'perldoc' command. For instructions on using it, type 'perldoc
    perldoc' at your command prompt. For example, if 'sort' isn't working
    the way you think it is, read up on why:
    perldoc -f sort
    4) Read the Perl FAQ to find if perhaps your question has been asked so
    frequently that the answer is documented:
    perldoc perlfaq
    or
    perldoc -q <keyword>

    > I didn't know perl had
    > references, I read the lama book and it does't mention references. I've
    > just ordered the second volume by swartz (perl objects...) I think I
    > need it.


    Schwartz. Randal L. Schwartz. He posts to this newsgroup from time to
    time, so it's best to spell his name correctly.

    In the meantime before your copy of the Alpaca arrives, consider reading
    the available tutorials and references dealing with references:
    perldoc perlreftut
    perldoc perllol
    perldoc perldsc
    perldoc perlref

    > Thanks again you and others answered my question.


    You're welcome. We wish you the best of luck in your Perl experiences.
    Feel free to post questions when you get stuck... but please try to
    follow my advice. :)

    Paul Lalli
     
    Paul Lalli, Sep 23, 2004
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. jjouett
    Replies:
    2
    Views:
    1,912
    jjouett
    May 1, 2006
  2. Nishanth
    Replies:
    10
    Views:
    847
    David Harmon
    Jul 4, 2006
  3. per
    Replies:
    2
    Views:
    621
  4. Adam

    Parsing delimiter-separated data.

    Adam, Nov 14, 2003, in forum: Perl Misc
    Replies:
    4
    Views:
    130
  5. Peter Michaux

    Cookies: semicolon vs. semicolon-space

    Peter Michaux, Dec 10, 2006, in forum: Javascript
    Replies:
    3
    Views:
    220
    Michael Winter
    Dec 11, 2006
Loading...

Share This Page