Sorting and Ordering by date etc

ThePotPlants

I have a nasty set of text data that I want to sort into order, and remove
duplicates from.
This would only have taken 5 seconds in SQL, but I have no idea how to do it
in Perl.

Data looks like so...

-33.580333 162.601833 01/12/2003 00:01:09
-33.579833 162.601667 01/12/2003 00:01:51
-33.579167 162.601500 01/12/2003 00:03:09
-33.578667 162.601333 01/12/2003 00:04:51
-33.578667 162.601333 01/12/2003 00:05:09

What would be the most common approach to achieve this? hash array?
Can you insert each line of data into an array, and perform operations on
specific parts of it?

Any suggestions would be greatly appreciated.

P
 
Eric Bohlman

> I have a nasty set of text data that I want to sort into order, and
> remove duplicates from.
> This would only have taken 5 seconds in SQL, but I have no idea how to
> do it in Perl.

Others have already pointed you to Perl's reference material concerning
duplicates and sorting. However, since you're obviously familiar with
doing such things in SQL, you ought to know that there are several Perl
modules that will allow you to use SQL on arbitrary data, including text
files and in-memory structures. Check out DBI (the engine-independent
interface for working with SQL) and DBD::RAM, DBD::AnyData, and DBD::CSV
("drivers" for DBI that allow it to query arbitrary data).
 
John W. Krahn

ThePotPlants said:
I have a nasty set of text data that I want to sort into order, and remove
duplicates from.
This would only have taken 5 seconds in SQL, but I have no idea how to do it
in Perl.

Data looks like so...

-33.580333 162.601833 01/12/2003 00:01:09
-33.579833 162.601667 01/12/2003 00:01:51
-33.579167 162.601500 01/12/2003 00:03:09
-33.578667 162.601333 01/12/2003 00:04:51
-33.578667 162.601333 01/12/2003 00:05:09

What would be the most common approach to achieve this? hash array?
Can you insert each line of data into an array, and perform operations on
specific parts of it?

Any suggestions would be greatly appreciated.

my @data = (
"-33.580333 162.601833 01/12/2003 00:01:09\n",
"-33.579833 162.601667 01/12/2003 00:01:51\n",
"-33.579167 162.601500 01/12/2003 00:03:09\n",
"-33.578667 162.601333 01/12/2003 00:04:51\n",
"-33.578667 162.601333 01/12/2003 00:05:09\n",
);

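my @sorted =
    map substr( $_, 17 ),                  # strip the 17-character sort key
    sort
    # key = year, first two date fields as written, time; then the line itself
    map sprintf( '%s%s%s%s',
        ( m<(\d\d/\d\d)/(\d{4}) (\d\d:\d\d:\d\d)> )[ 1, 0, 2 ], $_ ),
    keys %{ +{ map { $_ => 1 } @data } };  # hash keys drop identical lines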

print for @sorted;



John
 
ThePotPlants

ThePotPlants said:
I have a nasty set of text data that I want to sort into order, and remove
duplicates from.
This would only have taken 5 seconds in SQL, but I have no idea how to do it
in Perl.

Data looks like so...

-33.580333 162.601833 01/12/2003 00:01:09
-33.579833 162.601667 01/12/2003 00:01:51
-33.579167 162.601500 01/12/2003 00:03:09
-33.578667 162.601333 01/12/2003 00:04:51
-33.578667 162.601333 01/12/2003 00:05:09

I have cheated.
Imported into Access.
Write SQL. Group by, and order output.
Export to text file.

I feel dirty...
 
Uri Guttman

T> I have cheated.
T> Imported into Access.
T> Write SQL. Group by, and order output.
T> Export to text file.

T> I feel dirty...

please go wash your hands and branes. now!

uri
 
Tad McClellan

ThePotPlants said:
I have a nasty set of text data that I want to sort into order, and remove
duplicates from.


Please check the Perl FAQ *before* posting to the Perl newsgroup!

perldoc -q sort

perldoc -q duplicate

> This would only have taken 5 seconds in SQL, but I have no idea how to do it
                                                                            ^^
> in Perl.


Me either because you have left the "it" unspecified AFAICT.

Maybe you want to sort by date?

The Subject says sort by date, but your sample data are all
the same date, so that isn't it.

Maybe you want to sort by time?

Nope, your sample data is already sorted by time.

If you tell us what you want done, it will be much easier for us
to help you get it done...

> Data looks like so...
>
> -33.580333 162.601833 01/12/2003 00:01:09
> -33.579833 162.601667 01/12/2003 00:01:51
> -33.579167 162.601500 01/12/2003 00:03:09
> -33.578667 162.601333 01/12/2003 00:04:51
> -33.578667 162.601333 01/12/2003 00:05:09
>
> What would be the most common approach to achieve this?


That depends on what your "this" is and you haven't shared that.

What columns do you want to sort by?

What columns do you want to uniqify?
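
To show why the choice of columns matters, here is a minimal sketch of the %seen idiom from the FAQ. It assumes (and this is only a guess at the intent) that two records count as duplicates when latitude and longitude both match; the names @lines and @unique are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines from the original post.
my @lines = (
    "-33.580333 162.601833 01/12/2003 00:01:09\n",
    "-33.579833 162.601667 01/12/2003 00:01:51\n",
    "-33.579167 162.601500 01/12/2003 00:03:09\n",
    "-33.578667 162.601333 01/12/2003 00:04:51\n",
    "-33.578667 162.601333 01/12/2003 00:05:09\n",
);

# ASSUMPTION: "duplicate" means same latitude and same longitude.
my %seen;
my @unique = grep {
    my ( $lat, $lon ) = split;    # first two whitespace-separated fields
    !$seen{"$lat $lon"}++;        # true only the first time a key is seen
} @lines;

print for @unique;
```

With that key the 00:05:09 record is dropped, because its position repeats the 00:04:51 record. Keying on the whole line instead would keep all five.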

> Can you insert each line of data into an array, and perform operations on
> specific parts of it?


Sure. By using an "array slice"; see the "Slices" section of perldata.pod.
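
A short sketch of what that can look like for one record, assuming the four fields are always separated by whitespace (the variable names are only illustrative):

```perl
use strict;
use warnings;

my $line = "-33.580333 162.601833 01/12/2003 00:01:09\n";

# split ' ' breaks the line on whitespace and discards the trailing newline.
my @fields = split ' ', $line;                      # lat, lon, date, time

my ( $date, $time ) = @fields[ 2, 3 ];              # array slice
my ( $lat,  $lon  ) = ( split ' ', $line )[ 0, 1 ]; # list slice, no temp array

print "date=$date time=$time lat=$lat lon=$lon\n";
```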

> Any suggestions would be greatly appreciated.


Since you haven't specified a problem that we can solve, I'll make
one up. :)

Sort by time column, forget about finding duplicates.

Adapting the Schwartzian Transform code given in the answer
to one of your Frequently Asked Questions:

-----------------------------------
#!/usr/bin/perl
use warnings;
use strict;

my @records = <DATA>;

my @sorted = map  { $_->[0] }
             sort { $a->[1] cmp $b->[1] }
             map  { [ $_, (split)[3] ] } @records;

print for @sorted;


__DATA__
-33.579167 162.601500 01/12/2003 00:03:09
-33.579833 162.601667 01/12/2003 00:01:51
-33.578667 162.601333 01/12/2003 00:04:51
-33.578667 162.601333 01/12/2003 00:05:09
-33.580333 162.601833 01/12/2003 00:01:09
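
The script above compares only the time column, which is enough while every record shares one date. If the real file spans several days, the same transform can carry a combined key. The sketch below ASSUMES the dates are dd/mm/yyyy (the sample does not settle that), and the dates and times in it are altered from the sample purely to give the sort something to do.

```perl
use strict;
use warnings;

# Hypothetical records: positions from the post, dates/times changed for illustration.
my @records = (
    "-33.579167 162.601500 02/12/2003 00:03:09\n",
    "-33.579833 162.601667 01/12/2003 00:01:51\n",
    "-33.580333 162.601833 30/11/2003 23:59:09\n",
);

my @sorted =
    map  { $_->[0] }                      # 3. recover the original line
    sort { $a->[1] cmp $b->[1] }          # 2. compare the precomputed keys
    map  {                                # 1. pair each line with its key
        my ( $date, $time ) = ( split ' ', $_ )[ 2, 3 ];
        my ( $day, $month, $year ) = split m{/}, $date;   # ASSUMED dd/mm/yyyy
        [ $_, "$year$month$day $time" ];  # e.g. "20031201 00:01:51"
    } @records;

print for @sorted;
```

Because the key is year, month, day, then time, all zero-padded, a plain string comparison with cmp gives chronological order. For mm/dd/yyyy data, swap $day and $month in the split.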
 
Tad McClellan

ThePotPlants said:
Write SQL.


Maybe someone could translate the SQL sorting into Perl sorting for you.

They would need to _see_ the SQL for that of course... (hint)
 
