Sorting hash of hashes

Discussion in 'Perl Misc' started by Justin C, Feb 19, 2009.

  1. Justin C

    Justin C Guest

    Here's how my hash looks:

    $hashes{$report}{$line}{$colNumber}

    So %hashes holds data from several reports, grouped by report, then
    line (well, stock code, which is the first field on the line), then
    column number. Here is a little of the output of Data::Dumper:

    $VAR1 = 'types';
    $VAR2 = {
    'BE' => 1
    };
    $VAR3 = 'stk1324r2';
    $VAR4 = {
    'BE/COF/DRAGON' => {
    '8' => '0',
    '6' => '0',
    '4' => 'E',
    '3' => '1',
    '7' => '2',
    '9' => '2',
    '2' => 'CRADLE OF FILTH dragon WEB BELT',
    '5' => '2'
    },
    'BE/MOT/WARPIG' => {
    '8' => '0',
    '6' => '0',
    '4' => 'E',
    '3' => '1',
    '7' => '125',
    '9' => '125',
    '2' => 'MOTORHEAD warpig BADGEBUCKLE BELT',
    '5' => '125'
    },

    I want to output the lines sorted asciibetically based on column 2, in
    the above examples that'd be CRADLE OF FILTH..., and MOTORHEAD...

    I'd like to provide an example of what I've tried, but the only idea
    I've had is to extract the column 2 entries into an array, and then sort
    the array and use it in a foreach loop, but then I've got to find a way
    of accessing the hash from a value, and not a key (I can't use the
    description as my key because there will be some sized items which will
    have a different code, but identical description, so the code has to
    stay as the key).

    Thank you for any help you can give with this.

    Justin.

    --
    Justin C, by the sea.
     
    Justin C, Feb 19, 2009
    #1

  2. Justin C

    Justin C Guest

    On 2009-02-19, Justin C <> wrote:
    > Here's how my hash looks:
    >
    > $hashes{$report}{$line}{$colNumber}
    >
    > So, in %hashes, is data from several reports, grouped by report, then
    > line (well, stock code, which is the first field on the line), the
    > column number. Here is a little of the output of Data::Dumper:
    >
    > $VAR1 = 'types';
    > $VAR2 = {
    > 'BE' => 1
    > };
    > $VAR3 = 'stk1324r2';
    > $VAR4 = {
    > 'BE/COF/DRAGON' => {
    > '8' => '0',
    > '6' => '0',
    > '4' => 'E',
    > '3' => '1',
    > '7' => '2',
    > '9' => '2',
    > '2' => 'CRADLE OF FILTH dragon WEB BELT',
    > '5' => '2'
    > },
    > 'BE/MOT/WARPIG' => {
    > '8' => '0',
    > '6' => '0',
    > '4' => 'E',
    > '3' => '1',
    > '7' => '125',
    > '9' => '125',
    > '2' => 'MOTORHEAD warpig BADGEBUCKLE BELT',
    > '5' => '125'
    > },
    >
    > I want to output the lines sorted asciibetically based on column 2, in
    > the above examples that'd be CRADLE OF FILTH..., and MOTORHEAD...
    >
    > I'd like to provided an example of what I've tried, but the only idea
    > I've had is to extract the column 2 entries into an array, and then sort
    > the array and use it in a foreach loop, but then I've got to find a way
    > of accessing the hash from a value, and not a key (I can't use the
    > description as my key because there will be some sized items which will
    > have a different code, but identical description, so the code has to
    > stay as the key).


    I've looked further at this and found a "sort hash by value" example.
    Looking at it, I remember it from the Llama. I can see how to apply that
    to a simple hash, "@sorted = sort { $hash{$a} cmp $hash{$b} } keys
    %hash". But am not sure how to make $hashes{$report}{$item}{2} fit.

    What I've got is:

    my @sorted = sort byAlpha ( keys ( %{$hashes{$report}} ) );

    sub byAlpha {
    $hashes{$report}{$item}{2}{a} cmp $hashes{$report}{$item}{2}{a};
    }

    And, of course, $report is not passed to the sub and is therefore undef,
    and perl, rightly, complains. And at this point I'm stumped... again.

    Justin.

    --
    Justin C, by the sea.
     
    Justin C, Feb 19, 2009
    #2

  3. Justin C

    J. Gleixner Guest

    Justin C wrote:
    > On 2009-02-19, Justin C <> wrote:
    >> Here's how my hash looks:
    >>
    >> $hashes{$report}{$line}{$colNumber}
    >>
    >> So, in %hashes, is data from several reports, grouped by report, then
    >> line (well, stock code, which is the first field on the line), the
    >> column number. Here is a little of the output of Data::Dumper:
    >>
    >> $VAR1 = 'types';
    >> $VAR2 = {
    >> 'BE' => 1
    >> };
    >> $VAR3 = 'stk1324r2';
    >> $VAR4 = {
    >> 'BE/COF/DRAGON' => {
    >> '8' => '0',
    >> '6' => '0',
    >> '4' => 'E',
    >> '3' => '1',
    >> '7' => '2',
    >> '9' => '2',
    >> '2' => 'CRADLE OF FILTH dragon WEB BELT',
    >> '5' => '2'
    >> },
    >> 'BE/MOT/WARPIG' => {
    >> '8' => '0',
    >> '6' => '0',
    >> '4' => 'E',
    >> '3' => '1',
    >> '7' => '125',
    >> '9' => '125',
    >> '2' => 'MOTORHEAD warpig BADGEBUCKLE BELT',
    >> '5' => '125'
    >> },


    print Dumper( \%hashes );

    Will give you nicer output.

    >>
    >> I want to output the lines sorted asciibetically based on column 2, in
    >> the above examples that'd be CRADLE OF FILTH..., and MOTORHEAD...
    >>
    >> I'd like to provided an example of what I've tried, but the only idea
    >> I've had is to extract the column 2 entries into an array, and then sort
    >> the array and use it in a foreach loop, but then I've got to find a way
    >> of accessing the hash from a value, and not a key (I can't use the
    >> description as my key because there will be some sized items which will
    >> have a different code, but identical description, so the code has to
    >> stay as the key).

    >
    > I've looked further at this and found a "sort hash by value" example.
    > Looking at it, I remember it from the Llama. I can see how to apply that
    > to a simple hash, "@sorted = sort { $hash{$a} cmp $hash{$b} } keys
    > %hash". But am not sure how to make $hashes{$report}{$item}{2} fit.
    >
    > What I've got is:
    >
    > my @sorted = sort byAlpha ( keys ( %{$hashes{$report}} ) );
    >
    > sub byAlpha {
    > $hashes{$report}{$item}{2}{a} cmp $hashes{$report}{$item}{2}{a};
    > }
    >
    > And, of course, $report is not passed to the sub and is therefore undef,
    > and perl, rightly, complains. And at this point I'm stumped... again.


    One way would be to go through all of your data, storing the field
    you're after as a key, provided they're all unique, and maybe the
    hash of data as the value.

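    my %by_fld2;
    for my $report ( keys %hashes )
    {
        for my $item ( keys %{ $hashes{ $report } } )
        {
            # Skip entries such as $hashes{types}{BE}, which hold a
            # plain value rather than a hash of columns.
            next unless ref $hashes{ $report }{ $item } eq 'HASH';
            next unless exists $hashes{ $report }{ $item }{ '2' };
            # Index each line's hash by its column 2 value.
            $by_fld2{ $hashes{ $report }{ $item }{ '2' } } =
                $hashes{ $report }{ $item };
        }
    }

    for my $field2 ( sort keys %by_fld2 )
    {
        # print whatever...
    }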

    Having keys like 1,2,3, etc. is not very descriptive and after a few
    months, when you look at your code again, you'll wish you used
    something that made sense.
     
    J. Gleixner, Feb 19, 2009
    #3
  4. Justin C

    Guest

    wrote:
    > On 2009-02-19, Justin C <> wrote:
    > > Here's how my hash looks:
    > >
    > > $hashes{$report}{$line}{$colNumber}
    > >
    > > So, in %hashes, is data from several reports, grouped by report, then
    > > line (well, stock code, which is the first field on the line), the
    > > column number. Here is a little of the output of Data::Dumper:
    > >
    > > $VAR1 = 'types';
    > > $VAR2 = {
    > > 'BE' => 1
    > > };
    > > $VAR3 = 'stk1324r2';
    > > $VAR4 = {


    ....

    You should give Dumper a reference to your hash, not the flattened list
    of your hash. Then there would be only one $VARx and it would reflect
    the true structure you have.
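
A minimal sketch of the difference, assuming a cut-down two-entry copy of the data from the first post (the sample %hashes below is for illustration only, not the real report data):

```perl
use strict;
use warnings;
use Data::Dumper;

# Cut-down, hypothetical copy of the structure from the original post.
my %hashes = (
    types     => { BE => 1 },
    stk1324r2 => {
        'BE/COF/DRAGON' => { 2 => 'CRADLE OF FILTH dragon WEB BELT' },
    },
);

$Data::Dumper::Sortkeys = 1;    # stable key order inside each hash

# Passing the hash itself flattens it into a key/value list,
# so Dumper numbers every element separately ($VAR1 .. $VAR4 here).
my $flattened = Dumper(%hashes);

# Passing a reference keeps the nesting and yields a single $VAR1.
my $nested = Dumper(\%hashes);

print $nested;
```

With the reference, the dump can also be read back as one structure, which is what makes it the more useful form to post.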


    > I've looked further at this and found a "sort hash by value" example.
    > Looking at it, I remember it from the Llama. I can see how to apply that
    > to a simple hash, "@sorted = sort { $hash{$a} cmp $hash{$b} } keys
    > %hash". But am not sure how to make $hashes{$report}{$item}{2} fit.
    >
    > What I've got is:
    >
    > my @sorted = sort byAlpha ( keys ( %{$hashes{$report}} ) );
    >
    > sub byAlpha {
    > $hashes{$report}{$item}{2}{a} cmp $hashes{$report}{$item}{2}{a};
    > }


    What is the last {a} for? You don't have such a critter in your data dump,
    the value associated with key 2 is a string, not a hashref with 'a' as a
    key.

    It seems like it should be
    $hashes{$report}{$a}{2} cmp $hashes{$report}{$b}{2};

    > And, of course, $report is not passed to the sub and is therefore undef,
    > and perl, rightly, complains. And at this point I'm stumped... again.


    Don't use a named subroutine as the sort's code block, just put the code
    directly into the sort block. That way it executes in the context of
    the sort, where $report is defined.
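
A small runnable sketch of that advice, assuming a lexical $report and sample data shaped like the dump in the first post (the two entries are copied from it, the rest of the real data is omitted):

```perl
use strict;
use warnings;

# Hypothetical sample data shaped like the dump in the first post.
my %hashes = (
    stk1324r2 => {
        'BE/MOT/WARPIG' => { 2 => 'MOTORHEAD warpig BADGEBUCKLE BELT' },
        'BE/COF/DRAGON' => { 2 => 'CRADLE OF FILTH dragon WEB BELT' },
    },
);

my $report = 'stk1324r2';

# The block is compiled where $report is in scope, so it can use it directly.
# $a and $b are the two stock codes being compared.
my @sorted = sort {
    $hashes{$report}{$a}{2} cmp $hashes{$report}{$b}{2}
} keys %{ $hashes{$report} };

print "$_: $hashes{$report}{$_}{2}\n" for @sorted;
```

The keys stay the stock codes, so lines with identical descriptions are not a problem: they simply end up next to each other.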

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
     
    , Feb 19, 2009
    #4
  5. Justin C

    Justin C Guest

    On 2009-02-19, Greg Bacon <> wrote:
    > It looks like you must have called Dumper(%hashes). What you really
    > want is Dumper(\%hashes) -- note the backslash -- so you can later
    > evaluate the dump to create an equivalent hash.


    I copied that output from a web-browser, it doesn't appear much
    different using what you suggest, only the $VAR2, $VAR3 etc are
    missing.


    > Note that %hashes is a really bad name.


    Oh, I know! I didn't spend enough time thinking this all through before
    I got coding.


    > You described your structure with
    >
    > $hashes{$report}{$line}{$colNumber}
    >
    > so %report is probably a better name. (I assume stk1324r2 is a
    > report code.)
    >
    > You can sort and print the descriptions with
    >
    > print map "$_\n",
    > sort
    > map { my $r = $report{$_};
    > map ref $r->{$_} && $r->{$_}{2} ? $r->{$_}{2} : (),
    > keys %$r;
    > }
    > keys %report;
    >
    > I don't recommend this approach, but it's instructive. Read it
    > bottom-to-top:
    >
    > 1. For each possible report code (keys %report)
    > 2. Make note of where we are in the hash ($r = $report{$_})
    > 3. Make sure it's a report code (ref $r->{$_} && $r->{$_}{2})
    > 4. Extract the description from column 2.


    I've seen 'map' used a few times before, it's not a command I'm familiar
    with. I'll read the documentation before I do any more work on this.
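
For anyone else new to map: it evaluates a block once per list element and returns the list of results. A minimal sketch, assuming a single report held in a hypothetical %lines hash shaped like the data in this thread:

```perl
use strict;
use warnings;

# Hypothetical one-report sample, shaped like the data in this thread.
my %lines = (
    'BE/MOT/WARPIG' => { 2 => 'MOTORHEAD warpig BADGEBUCKLE BELT' },
    'BE/COF/DRAGON' => { 2 => 'CRADLE OF FILTH dragon WEB BELT' },
);

# map runs the block once per key, with $_ set to that key,
# and returns the list of results: here, the column 2 descriptions.
my @descriptions = sort map { $lines{$_}{2} } keys %lines;

print "$_\n" for @descriptions;
```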


    > You can make it a little nicer if you can infer from the key that
    > its value is a report (say, if it starts with "stk"):
    >
    > print map "$_\n",
    > sort
    > map { my $r = $report{$_}; map $r->{$_}{2}, keys %$r; }
    > grep /^stk/,
    > keys %report;
    >
    > I take it you want more than the descriptions, though. Do you also
    > need the "paths" to get to them?
    >
    > Hope this helps,
    > Greg


    It's given me stuff to think about, and some help, yes. Thank you.


    Justin.

    --
    Justin C, by the sea.
     
    Justin C, Feb 20, 2009
    #5
  6. Justin C

    Justin C Guest

    On 2009-02-19, Greg Bacon <> wrote:
    >: I take it you want more than the descriptions, though. Do you also
    >: need the "paths" to get to them?
    >
    > If that's the approach you're after, use code similar to the following:
    >
    > my $DESC_COLUMN = 2;
    >
    > sub item_paths {
    > my($report) = @_;
    >
    > my @descs;
    >
    > foreach my $code (keys %$report) {
    > next unless $code =~ /^stk/ && ref $report->{$code};
    >
    > foreach my $line (keys %{ $report->{$code} }) {
    > push @descs => [ $code, $line ]
    > if exists $report->{$code}{$line}{$DESC_COLUMN};
    > }
    > }
    >
    > @descs;
    > }
    >
    > print map "$_\n",
    > sort
    > map $report{ $_->[0] }{ $_->[1] }{ $DESC_COLUMN },
    > item_paths \%report;


    More to think about, thank you. I'm considering an alteration to my data
    structure (%hashes), with each report getting its own hash, named at
    run-time from the report code. Seems a better way to me. It'll reduce
    the 'depth' of the hash for a start. I'm just going through the replies
    I've received to my post before I read perldoc -f map, so I'll keep this
    to re-read after I've read the docs. Thanks for your help.

    Justin.

    --
    Justin C, by the sea.
     
    Justin C, Feb 20, 2009
    #6
  7. Justin C

    Justin C Guest

    On 2009-02-19, J. Gleixner <> wrote:
    > Justin C wrote:
    >> On 2009-02-19, Justin C <> wrote:
    >>> Here's how my hash looks:
    >>>
    >>> $hashes{$report}{$line}{$colNumber}
    >>>
    >>> So, in %hashes, is data from several reports, grouped by report, then
    >>> line (well, stock code, which is the first field on the line), the
    >>> column number. Here is a little of the output of Data::Dumper:
    >>>

    [snip]
    >
    > One way would be to go through all of your data, storing the field
    > you're after as a key, provided they're all unique, and maybe the
    > hash of data as the value.


    I can't guarantee that the descriptions will be unique. But thanks
    anyway.
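
If the index-by-description idea is still wanted despite duplicates, one possible variation is to store a list of stock codes per description. This is only a sketch: the %lines hash and the sized T-shirt codes are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: two sized items share one description.
my %lines = (
    'TS/MOT/ACE/L'  => { 2 => 'MOTORHEAD ace of spades T-SHIRT' },
    'TS/MOT/ACE/M'  => { 2 => 'MOTORHEAD ace of spades T-SHIRT' },
    'BE/COF/DRAGON' => { 2 => 'CRADLE OF FILTH dragon WEB BELT' },
);

# Index description => list of stock codes, so duplicates are kept.
my %codes_by_desc;
for my $code ( keys %lines ) {
    push @{ $codes_by_desc{ $lines{$code}{2} } }, $code;
}

for my $desc ( sort keys %codes_by_desc ) {
    # Sort the codes too, so the output order is repeatable.
    print "$desc ($_)\n" for sort @{ $codes_by_desc{$desc} };
}
```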


    > Having keys like 1,2,3, etc. is not very descriptive and after a few
    > months, when you look at your code again, you'll wish you used
    > something that made sense.


    I have, within the code, described the construction, and the numbers are
    the column numbers within the report, so it's not overly confusing.

    Justin.

    --
    Justin C, by the sea.
     
    Justin C, Feb 20, 2009
    #7
  8. Justin C

    Justin C Guest

    On 2009-02-19, <> wrote:
    > wrote:
    >> On 2009-02-19, Justin C <> wrote:
    >> > Here's how my hash looks:
    >> >
    >> > $hashes{$report}{$line}{$colNumber}
    >> >
    >> > So, in %hashes, is data from several reports, grouped by report, then
    >> > line (well, stock code, which is the first field on the line), the
    >> > column number. Here is a little of the output of Data::Dumper:
    >> >
    >> > $VAR1 = 'types';
    >> > $VAR2 = {
    >> > 'BE' => 1
    >> > };
    >> > $VAR3 = 'stk1324r2';
    >> > $VAR4 = {

    >
    > ...
    >
    > You should give Dumper a reference to your hash, not the flattened list
    > of your hash. Then there would be only one $VARx and it reflect the true
    > structure you have.


    Yes, others have said similar. It does look clearer that way. I have
    made a note for next time.


    >> I've looked further at this and found a "sort hash by value" example.
    >> Looking at it, I remember it from the Llama. I can see how to apply that
    >> to a simple hash, "@sorted = sort { $hash{$a} cmp $hash{$b} } keys
    >> %hash". But am not sure how to make $hashes{$report}{$item}{2} fit.
    >>
    >> What I've got is:
    >>
    >> my @sorted = sort byAlpha ( keys ( %{$hashes{$report}} ) );
    >>
    >> sub byAlpha {
    >> $hashes{$report}{$item}{2}{a} cmp $hashes{$report}{$item}{2}{a};
    >> }

    >
    > What is the last {a} for? You don't have such a critter in your data dump,
    > the value associated with key 2 is a string, not a hashref with 'a' as a
    > key.


    Typo, I was going for, more or less, { $a cmp $b } but managed to type
    $a both times.



    > It seems like it should be
    > $hashes{$report}{$a}{2} cmp $hashes{$report}{$b}{2};
    >
    >> And, of course, $report is not passed to the sub and is therefore undef,
    >> and perl, rightly, complains. And at this point I'm stumped... again.

    >
    > Don't use a named subroutine as the sort's code block, just put the code
    > directly into the sort block. That way it executes in the context of
    > the sort, where $report is defined.


    That all makes perfect sense, but I'm having difficulty today making
    things work.

    ..
    ..
    ..
    Nope, just a hard of understanding day. Got it now with:

    my @sorted = sort { $hashes{$report}{$a}{2} cmp $hashes{$report}{$b}{2} }
        keys %{$hashes{$report}};

    Great stuff. Thank you for your help.


    Justin.

    --
    Justin C, by the sea.
     
    Justin C, Feb 20, 2009
    #8
  9. Justin C

    Justin C Guest

    On 2009-02-20, Greg Bacon <> wrote:
    > Justin C wrote:
    >
    >: J. Gleixner wrote:
    >:
    >: > Having keys like 1,2,3, etc. is not very descriptive and after a few
    >: > months, when you look at your code again, you'll wish you used
    >: > something that made sense.
    >:
    >: I have, within the code, described the construction, and the numbers
    >: are the column numbers within the report, so it's not overly
    >: confusing.
    >
    > Using sequence numbers as keys in a hash is often a strong sign
    > that you should use an array instead. However...


    Yes, you've made me think about that, and I think you're right, an array
    would have done the job.
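
A rough sketch of what an array-based layout with a named column could look like. The constant name DESCRIPTION and the %report hash are assumptions for illustration; the column values are taken from the dump in the first post.

```perl
use strict;
use warnings;
use constant DESCRIPTION => 2;    # hypothetical name for column 2

# Each line is an array of columns; index 0 is left unused so that the
# indices still match the report's column numbers.
my %report = (
    'BE/MOT/WARPIG' =>
        [ undef, 'BE/MOT/WARPIG', 'MOTORHEAD warpig BADGEBUCKLE BELT', 1, 'E' ],
    'BE/COF/DRAGON' =>
        [ undef, 'BE/COF/DRAGON', 'CRADLE OF FILTH dragon WEB BELT', 1, 'E' ],
);

# Same sort as before, but the column is named rather than a bare 2.
my @sorted = sort {
    $report{$a}[DESCRIPTION] cmp $report{$b}[DESCRIPTION]
} keys %report;

print $report{$_}[DESCRIPTION], "\n" for @sorted;
```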

    Following the comments here I can see a bit of a re-write is needed. All
    comments have been noted, and I shall review the code in a while. I
    actually need the code to be working this week due to company year-end,
    and it is in working order just now. It is only of use at stock-take
    time anyway, I'll review it as soon as we've finished with it so it's
    better for next time.



    > I assume you used comments to describe the construction. Consider
    > Rob Pike's advice[*]:
    >
    > If your code needs a comment to be understood, it
    > would be better to rewrite it so it's easier to
    > understand.


    That's something I *really* should think about... actually, I think I'm
    going to print it and tape it to the bottom of my monitor.


    >
    > [*] http://www.lysator.liu.se/c/pikestyle.html


    Interesting read.


    > *The Practice of Programming* (a book Pike co-authored with
    > Brian Kernighan -- the K in K&R) succinctly advises: "Give names
    > to magic numbers." That is, use meaningful symbolic names instead
    > of bare numeric literals.


    I'm working through a K&R book at the moment (well, when I'm not
    studying for YachtMaster sailing qualification), it's a second attempt,
    and I have to say, it's much easier this time round. I'm understanding
    many concepts through becoming familiar with [P|p]erl.

    Justin.

    --
    Justin C, by the sea.
     
    Justin C, Feb 23, 2009
    #9
  10. [DGBI] It depends (was: Sorting hash of hashes)

    On 2009-02-23, Greg Bacon <> wrote:
    *SKIP*
    > Note that /[P|p]erl/ matches "Perl", "perl", and "|erl". You've
    > conflated /[Pp]erl/ and /(P|p)erl/. Prefer the former pattern:
    > the latter, although equivalent, looks odd. (There's also a small
    > performance penalty.)


    In *this* case, that matters only when the string matches. Otherwise
    neither one is reliably faster (the string C<erl> is anchored at
    offset 1, so if the target contains no such string the whole match
    fails immediately). OTOH, I wouldn't call B<TRIE-EXACT> being *2* times
    slower a 'small performance penalty'.

    perl -wle '
    use Benchmark qw|cmpthese timethese|;
    my $x = q|Perl|;
    my $y = q|PERL|;
    cmpthese timethese -10, {
    alt_p => sub { $x =~ m{(?:P|p)erl} },
    alt_n => sub { $y =~ m{(?:P|p)erl} },
    cls_p => sub { $x =~ m{[Pp]erl} },
    cls_n => sub { $y =~ m{[Pp]erl} },
    };
    '
    Benchmark: running alt_n, alt_p, cls_n, cls_p for at least 10 CPU seconds...
         alt_n: 12 wallclock secs (11.27 usr + -0.19 sys = 11.08 CPU) @ 1174967.24/s (n=13018637)
         alt_p: 11 wallclock secs (10.02 usr +  0.01 sys = 10.03 CPU) @ 252231.51/s (n=2529882)
         cls_n: 12 wallclock secs (11.83 usr +  0.16 sys = 11.99 CPU) @ 1114404.34/s (n=13361708)
         cls_p: 12 wallclock secs (10.46 usr +  0.05 sys = 10.51 CPU) @ 503643.10/s (n=5293289)

               Rate alt_p cls_p cls_n alt_n
    alt_p  252232/s    --  -50%  -77%  -79%
    cls_p  503643/s  100%    --  -55%  -57%
    cls_n 1114404/s  342%  121%    --   -5%
    alt_n 1174967/s  366%  133%    5%    --
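
The character-class point quoted at the top of this post is easy to check. A small sketch (the ^ and $ anchors are added here only to make the comparison strict):

```perl
use strict;
use warnings;

# [P|p] is a character class containing three characters: P, | and p.
for my $word ( 'Perl', 'perl', '|erl' ) {
    my $class_with_bar = $word =~ /^[P|p]erl$/ ? 'match' : 'no match';
    my $plain_class    = $word =~ /^[Pp]erl$/  ? 'match' : 'no match';
    print "$word: [P|p] $class_with_bar, [Pp] $plain_class\n";
}
```

Only the last word should differ between the two patterns, since the vertical bar has no special meaning inside square brackets.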


    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
     
    Eric Pozharski, Feb 24, 2009
    #10
