How to sort very large arrays?

kj

I'm downloading some very large tables from a remote site. I want
to sort these tables in a particular way before saving them to
disk. In the past I found that the most efficient way to do this
was to piggy-back on Unix's highly optimized sort command. So,
from within a Perl script, I'd create a pipe handle through sort
and then just print the data through that handle:

open my $out, "|$sort -t '\t' -k1,1 -k2,2 -u > $out_file" or die $!;
print $out $_ for @data;

But that's distinctly Perlish, and I'm wondering what's the "Python
Way" to do this.

TIA!

kynn
 
Dan Stromberg

os.system and os.popen are much like what you'd find in C.

The subprocess module is more specific to Python; it is a little more
complicated, but more powerful.
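A minimal sketch of the subprocess approach, assuming a Unix `sort` on the PATH and that each item in `lines` is a str without a trailing newline. The helper name `sort_to_file` is hypothetical. Passing an argument list instead of a shell string means the tab separator and the output path need no quoting.

```python
import subprocess


def sort_to_file(lines, out_path):
    """Hypothetical helper: pipe lines through Unix sort into out_path.

    Equivalent to the Perl pipe in the question: tab-separated,
    keys on fields 1 and 2, duplicates on those keys removed (-u).
    """
    with open(out_path, "wb") as out:
        # No shell is involved: "\t" reaches sort as a real tab character.
        proc = subprocess.Popen(
            ["sort", "-t", "\t", "-k1,1", "-k2,2", "-u"],
            stdin=subprocess.PIPE,
            stdout=out,
            text=True,
        )
        for line in lines:
            proc.stdin.write(line + "\n")
        proc.stdin.close()  # EOF tells sort to finish and write its output
        # Raise if sort exited with a nonzero status.
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, proc.args)
```

Because the lines are written to sort's stdin as they arrive, the script never needs to hold the whole table at once; GNU sort spills to temporary files by itself when its input is large.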
 
Terry Reedy

| I'm downloading some very large tables from a remote site. I want
| to sort these tables in a particular way before saving them to
| disk. In the past I found that the most efficient way to do this
| was to piggy-back on Unix's highly optimized sort command. So,

If the tables fit in memory as a list of (key, text) tuples, and if they
have some of the non-random structure that Python's current list.sort
exploits (documented, as far as I know, only in the source or test code;
I'm not sure), then you might consider that. Otherwise, use the system
sort.
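For the in-memory route, here is a sketch that roughly mirrors the command in the question, assuming each line is a tab-separated str and that "keep the first line of each run of equal (field1, field2) keys" is the behaviour wanted from -u. The name `sort_unique` is hypothetical. Python compares strings by code point, which is closer to sort under LC_ALL=C than to a locale-aware sort.

```python
from itertools import groupby
from operator import itemgetter


def sort_unique(lines):
    """Hypothetical helper: sort tab-separated lines on fields 1 and 2,
    keeping only the first line for each (field1, field2) key."""
    keyed = []
    for line in lines:
        fields = line.split("\t")
        # A missing second field is treated as an empty string.
        key = (fields[0], fields[1] if len(fields) > 1 else "")
        keyed.append((key, line))
    # list.sort is stable: lines with equal keys keep their input order.
    keyed.sort(key=itemgetter(0))
    # groupby yields each run of equal keys; keep only the first line of each.
    return [next(group)[1] for _, group in groupby(keyed, key=itemgetter(0))]
```

Building the (key, text) tuples once up front is what lets list.sort compare plain tuples; the whole table still lives in memory, so this only applies when it fits.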
 
rent

| I'm downloading some very large tables from a remote site. I want
| to sort these tables in a particular way before saving them to
| disk. In the past I found that the most efficient way to do this
| was to piggy-back on Unix's highly optimized sort command. So,
| from within a Perl script, I'd create a pipe handle through sort
| and then just print the data through that handle:

This is a Python clone of your code from a Python rookie :)

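from os import popen

# "w" is required: popen opens the pipe for reading by default
p = popen("sort -t '\t' -k1,1 -k2,2 -u > %s" % out_file, "w")
for line in data:
    print(line, file=p)
p.close()  # wait for sort to finish writing out_file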

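There is no equivalent of "die $!" here; I think it is fine to let Python
raise the exception to your console. Bear in mind, though, that popen
won't raise if sort itself fails: check the value returned by p.close(),
which is None on success.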
 
