Sorted Hash

Discussion in 'Perl Misc' started by palexvs@gmail.com, Nov 29, 2007.

  1. Guest

    I filled hash and then printed it (sorted by key):
    my %hs = (
    key10 => 5,
    key5 => b,
    aey9 => 7,
    )
    foreach my $k (sort keys %hs) { print "$k $hs{$k}\n"; }

    key - string ([0-9A-F]{72}), 50K records.
    How do it more effective?
    , Nov 29, 2007
    #1
    1. Advertising

  2. Ben Morrow Guest

    Quoth "" <>:
    > I filled hash and then printed it (sorted by key):
    > my %hs = (
    > key10 => 5,
    > key5 => b,
    > aey9 => 7,
    > )
    > foreach my $k (sort keys %hs) { print "$k $hs{$k}\n"; }
    >
    > key - string ([0-9A-F]{72}), 50K records.
    > How do it more effective?


    Tie::IxHash, or maintain an array of keys yourself. See perldoc -q
    sorted.

    Ben
    Ben Morrow, Nov 30, 2007
    #2
    1. Advertising

  3. "" <> wrote in news:a1ddaad2-89c6-4b01-
    :

    > I filled hash and then printed it (sorted by key):
    > my %hs = (
    > key10 => 5,
    > key5 => b,
    > aey9 => 7,
    > )
    > foreach my $k (sort keys %hs) { print "$k $hs{$k}\n"; }
    >
    > key - string ([0-9A-F]{72}), 50K records.
    > How do it more effective?


    Well, depends on what you mean by 'more effective'.

    If you are talking about speed, note that the only place where you can
    really get any meaningful improvement is IO.

    I am using Windows XP SP2 with ActiveState Perl 5.8.8.822 on an Intel Core
    Duo 1.66 Mhz with 1 GB physical memory with a whole bunch of other apps
    open.

    Let's start with:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my %hash;

    for my $k ( 1 .. 50_000 ) {
    $hash{ make_key() } = $k;
    }

    ### printing code here

    sub make_key {
    join('', map { sprintf( '%.16f', $_ ) } (rand, rand, rand, rand) );
    }

    __END__
    C:\DOCUME~1\asu1\LOCALS~1\Temp\t> timethis t

    TimeThis : Command Line : t
    TimeThis : Start Time : Thu Nov 29 19:32:35 2007
    TimeThis : End Time : Thu Nov 29 19:32:36 2007
    TimeThis : Elapsed Time : 00:00:01.109

    So, let's add printing to this script:

    my @keys = sort keys %hash;
    print "$_ $hash{$_}\n" for @keys;

    TimeThis : Command Line : t
    TimeThis : Start Time : Thu Nov 29 19:37:11 2007
    TimeThis : End Time : Thu Nov 29 19:37:14 2007
    TimeThis : Elapsed Time : 00:00:03.156

    So, we are talking about 2.047 seconds added by printing the hash.

    Assuming 80 bytes per print statement and 50,000 print statements, that's
    3906.25 K of output in about two seconds.

    That's not bad and does not leave a lot of room for improvement.

    Following the printing late strategy and trying to trade-off memory for
    speed, let's accumulate all output before printing:

    my $out;
    for ( @keys ) {
    $out .= "$_ $hash{$_}\n";
    }

    {
    local $| = 0;
    print $out;
    }

    TimeThis : Command Line : t
    TimeThis : Start Time : Thu Nov 29 19:56:33 2007
    TimeThis : End Time : Thu Nov 29 19:56:39 2007
    TimeThis : Elapsed Time : 00:00:05.859

    Well, that wasn't good (and I really did not expect it to be). But we don't
    have to accumulate all output. We can just print after every 9K or so. (I
    came up with that magic number on my system after some trial and error ...
    Time program, go do something in Word, load cnn.com on Firefox, go back and
    re-time this script ... poor man's cache clearing).

    my $out;
    for ( @keys ) {
    $out .= "$_ $hash{$_}\n";
    if ( length $out > 9_000 ) {
    print $out;
    $out = q{};
    }
    }

    TimeThis : Command Line : t
    TimeThis : Start Time : Thu Nov 29 20:04:57 2007
    TimeThis : End Time : Thu Nov 29 20:05:00 2007
    TimeThis : Elapsed Time : 00:00:03.062

    So, after all that tinkering, I got an improvement of 0.094 seconds, that
    is about 3%.

    Now, my against my gut feeling, I did try:

    print "$_ $hash{$_}\n" for sort keys %hash;

    TimeThis : Command Line : t
    TimeThis : Start Time : Thu Nov 29 20:11:01 2007
    TimeThis : End Time : Thu Nov 29 20:11:04 2007
    TimeThis : Elapsed Time : 00:00:02.953

    Hmmm ... It does not look like I can beat the simplest alternative by
    trying to be clever.

    That took 1.844 seconds to output about 3906.25K.

    of course, I probably wasted my time because I misunderstood your vaguely
    phrased question.

    Sinan

    --
    A. Sinan Unur <>
    (remove .invalid and reverse each component for email address)
    clpmisc guidelines: <URL:http://www.augustmail.com/~tadmc/clpmisc.shtml>
    A. Sinan Unur, Nov 30, 2007
    #3
  4. wrote:
    > I filled hash and then printed it (sorted by key):
    > my %hs = (
    > key10 => 5,
    > key5 => b,
    > aey9 => 7,
    > )
    > foreach my $k (sort keys %hs) { print "$k $hs{$k}\n"; }
    >
    > key - string ([0-9A-F]{72}), 50K records.
    > How do it more effective?
    Jürgen Exner, Nov 30, 2007
    #4
  5. wrote:
    > I filled hash and then printed it (sorted by key):

    [...]
    > foreach my $k (sort keys %hs) { print "$k $hs{$k}\n"; }
    >
    > key - string ([0-9A-F]{72}), 50K records.
    > How do it more effective?


    Your algorithm as posted should be 100% effective or are you observing any
    unsorted keys?

    jue
    Jürgen Exner, Nov 30, 2007
    #5
  6. Guest

    "" <> wrote:
    > I filled hash and then printed it (sorted by key):
    > my %hs = (
    > key10 => 5,
    > key5 => b,
    > aey9 => 7,
    > )
    > foreach my $k (sort keys %hs) { print "$k $hs{$k}\n"; }
    >
    > key - string ([0-9A-F]{72}), 50K records.
    > How do it more effective?


    What is ineffective about the current way?

    You could keep the structure sorted throughout its lifetime by using
    a tied hash or a non-hash structure, but the overhead of doing so
    is almost certainly going to be greater than a one-time sort.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
    , Nov 30, 2007
    #6
  7. wrote:
    > I filled hash and then printed it (sorted by key):
    > my %hs = (
    > key10 => 5,
    > key5 => b,
    > aey9 => 7,
    > )
    > foreach my $k (sort keys %hs) { print "$k $hs{$k}\n"; }
    >
    > key - string ([0-9A-F]{72}), 50K records.
    > How do it more effective?
    >


    yu can use the radix sort implementation available from Sort::Key::Radix
    that is usually faster for this kind of data that the merge sort used
    internally by perl:


    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark qw(cmpthese);
    use Sort::Key::Radix qw(ssort);

    my @l = ('a'..'z','A'..'Z','1'..'9');

    sub genkey { join '', map $l[rand @l], 0..71 }

    my @keys = map genkey, 0..50_000;


    sub psort { my @sorted = sort @keys }
    sub rsort { my @sorted = ssort @keys }

    cmpthese (-1, { psort => \&psort,
    rsort => \&rsort } );


    -----------

    psort 4.39/s -- -41%
    rsort 7.41/s 69% --
    Salvador Fandino, Nov 30, 2007
    #7
  8. Salvador Fandino <> wrote in news:fiorj9$gto$1
    @hefestos.uned.es:

    > wrote:
    >> I filled hash and then printed it (sorted by key):
    >> my %hs = (
    >> key10 => 5,
    >> key5 => b,
    >> aey9 => 7,
    >> )
    >> foreach my $k (sort keys %hs) { print "$k $hs{$k}\n"; }
    >>
    >> key - string ([0-9A-F]{72}), 50K records.
    >> How do it more effective?
    >>

    >
    > yu can use the radix sort implementation available from
    > Sort::Key::Radix
    > that is usually faster for this kind of data that the merge sort used
    > internally by perl:


    Does sorting speed matter when most of the time is spent printing?

    Sinan


    --
    A. Sinan Unur <>
    (remove .invalid and reverse each component for email address)
    clpmisc guidelines: <URL:http://www.augustmail.com/~tadmc/clpmisc.shtml>
    A. Sinan Unur, Nov 30, 2007
    #8
  9. A. Sinan Unur schrieb:

    > Does sorting speed matter when most of the time is spent printing?


    You never know whether he's not redirecting stdout to a file on a
    ramdisk. :)
    Damian Lukowski, Nov 30, 2007
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. rp
    Replies:
    1
    Views:
    517
    red floyd
    Nov 10, 2011
  2. Chris
    Replies:
    12
    Views:
    165
    Brian Candler
    Sep 18, 2004
  3. robertj

    ordered/sorted hash

    robertj, Dec 8, 2005, in forum: Ruby
    Replies:
    19
    Views:
    204
    William James
    Dec 11, 2005
  4. banker123

    Sorted Hash and or Array

    banker123, Nov 3, 2006, in forum: Perl Misc
    Replies:
    4
    Views:
    101
    banker123
    Nov 5, 2006
  5. PerlFAQ Server

    FAQ 4.61 How can I always keep my hash sorted?

    PerlFAQ Server, Jan 21, 2011, in forum: Perl Misc
    Replies:
    0
    Views:
    100
    PerlFAQ Server
    Jan 21, 2011
Loading...

Share This Page