Sorting hash of hashes

Justin C · Feb 19, 2009

Here's how my hash looks:

$hashes{$report}{$line}{$colNumber}

So %hashes holds data from several reports, grouped by report, then
line (well, stock code, which is the first field on the line), then
column number. Here is a little of the output of Data::Dumper:

$VAR1 = 'types';
$VAR2 = {
'BE' => 1
};
$VAR3 = 'stk1324r2';
$VAR4 = {
'BE/COF/DRAGON' => {
'8' => '0',
'6' => '0',
'4' => 'E',
'3' => '1',
'7' => '2',
'9' => '2',
'2' => 'CRADLE OF FILTH dragon WEB BELT',
'5' => '2'
},
'BE/MOT/WARPIG' => {
'8' => '0',
'6' => '0',
'4' => 'E',
'3' => '1',
'7' => '125',
'9' => '125',
'2' => 'MOTORHEAD warpig BADGEBUCKLE BELT',
'5' => '125'
},

I want to output the lines sorted asciibetically based on column 2, in
the above examples that'd be CRADLE OF FILTH..., and MOTORHEAD...

I'd like to provide an example of what I've tried, but the only idea
I've had is to extract the column 2 entries into an array, and then sort
the array and use it in a foreach loop, but then I've got to find a way
of accessing the hash from a value, and not a key (I can't use the
description as my key because there will be some sized items which will
have a different code, but identical description, so the code has to
stay as the key).

Thank you for any help you can give with this.

Justin.

Justin C · Feb 19, 2009

I've looked further at this and found a "sort hash by value" example.
Looking at it, I remember it from the Llama. I can see how to apply that
to a simple hash, "@sorted = sort { $hash{$a} cmp $hash{$b} } keys
%hash". But am not sure how to make $hashes{$report}{$item}{2} fit.

What I've got is:

my @sorted = sort byAlpha ( keys ( %{$hashes{$report}} ) );

sub byAlpha {
$hashes{$report}{$item}{2}{a} cmp $hashes{$report}{$item}{2}{a};
}

And, of course, $report is not passed to the sub and is therefore undef,
and perl, rightly, complains. And at this point I'm stumped... again.

Justin.

J. Gleixner · Feb 19, 2009

> $VAR3 = 'stk1324r2';

print Dumper( \%hashes );

Will give you nicer output.

> I've looked further at this and found a "sort hash by value" example.
> Looking at it, I remember it from the Llama. I can see how to apply that
> to a simple hash, "@sorted = sort { $hash{$a} cmp $hash{$b} } keys
> %hash". But am not sure how to make $hashes{$report}{$item}{2} fit.
>
> What I've got is:
>
> my @sorted = sort byAlpha ( keys ( %{$hashes{$report}} ) );
>
> sub byAlpha {
> $hashes{$report}{$item}{2}{a} cmp $hashes{$report}{$item}{2}{a};
> }
>
> And, of course, $report is not passed to the sub and is therefore undef,
> and perl, rightly, complains. And at this point I'm stumped... again.

One way would be to go through all of your data, storing the field
you're after as a key, provided they're all unique, and maybe the
hash of data as the value.

my %by_fld2;
for my $report ( keys %hashes )
{
    for my $item ( keys %{ $hashes{ $report } } )
    {
        next unless exists $hashes{ $report }{ $item }{ '2' };
        $by_fld2{ $hashes{ $report }{ $item }{ '2' } } =
            $hashes{ $report }{ $item };
    }
}

for my $field2 ( sort keys %by_fld2 )
{
    # print whatever...
}

Having keys like 1,2,3, etc. is not very descriptive and after a few
months, when you look at your code again, you'll wish you used
something that made sense.

xhoster · Feb 19, 2009

....

You should give Dumper a reference to your hash, not the flattened list
of your hash. Then there would be only one $VARx and it would reflect the
true structure you have.

> I've looked further at this and found a "sort hash by value" example.
> Looking at it, I remember it from the Llama. I can see how to apply that
> to a simple hash, "@sorted = sort { $hash{$a} cmp $hash{$b} } keys
> %hash". But am not sure how to make $hashes{$report}{$item}{2} fit.
>
> What I've got is:
>
> my @sorted = sort byAlpha ( keys ( %{$hashes{$report}} ) );
>
> sub byAlpha {
> $hashes{$report}{$item}{2}{a} cmp $hashes{$report}{$item}{2}{a};
> }

What is the last {a} for? You don't have such a critter in your data dump,
the value associated with key 2 is a string, not a hashref with 'a' as a
key.

It seems like it should be
$hashes{$report}{$a}{2} cmp $hashes{$report}{$b}{2};

> And, of course, $report is not passed to the sub and is therefore undef,
> and perl, rightly, complains. And at this point I'm stumped... again.

Don't use a named subroutine as the sort's code block, just put the code
directly into the sort block. That way it executes in the context of
the sort, where $report is defined.
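
A minimal sketch of that inline form, assuming a lexical $report and sample data cut down from the dump in the original post (only column 2 is kept here):

```perl
use strict;
use warnings;

# Sample data shaped like the dump in the original post, column 2 only.
my %hashes = (
    stk1324r2 => {
        'BE/MOT/WARPIG' => { 2 => 'MOTORHEAD warpig BADGEBUCKLE BELT' },
        'BE/COF/DRAGON' => { 2 => 'CRADLE OF FILTH dragon WEB BELT' },
    },
);

my $report = 'stk1324r2';    # lexical, visible inside the sort block below

# The block is compiled in the enclosing scope, so it can see $report;
# sort supplies the two keys being compared in $a and $b.
my @sorted = sort { $hashes{$report}{$a}{2} cmp $hashes{$report}{$b}{2} }
             keys %{ $hashes{$report} };

# Print each stock code followed by its description, in sorted order.
print "$_: $hashes{$report}{$_}{2}\n" for @sorted;
```

The keys stay as stock codes, so the rest of each line is still reachable through $hashes{$report}{$_} inside the loop.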

Xho


Justin C · Feb 20, 2009

> It looks like you must have called Dumper(%hashes). What you really
> want is Dumper(\%hashes) -- note the backslash -- so you can later
> evaluate the dump to create an equivalent hash.

I copied that output from a web-browser, it doesn't appear much
different using what you suggest, only the $VAR2, $VAR3 etc are
missing.

> Note that %hashes is a really bad name.

Oh, I know! I didn't spend enough time thinking this all through before
I got coding.

> You described your structure with
>
> $hashes{$report}{$line}{$colNumber}
>
> so %report is probably a better name. (I assume stk1324r2 is a
> report code.)
>
> You can sort and print the descriptions with
>
> print map "$_\n",
>       sort
>       map { my $r = $report{$_};
>             map ref $r->{$_} && $r->{$_}{2} ? $r->{$_}{2} : (),
>                 keys %$r;
>           }
>       keys %report;
>
> I don't recommend this approach, but it's instructive. Read it
> bottom-to-top:
>
> 1. For each possible report code (keys %report)
> 2. Make note of where we are in the hash ($r = $report{$_})
> 3. Make sure it's a report code (ref $r->{$_} && $r->{$_}{2})
> 4. Extract the description from column 2.
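
The four steps above can also be written as explicit loops, which may be easier to follow before reading up on map. This is only a sketch: the sample %report is cut down from the dump in the original post, and @descriptions is a name made up for the example.

```perl
use strict;
use warnings;

my %report = (
    types     => { BE => 1 },    # not a report: its values are plain scalars
    stk1324r2 => {
        'BE/MOT/WARPIG' => { 2 => 'MOTORHEAD warpig BADGEBUCKLE BELT' },
        'BE/COF/DRAGON' => { 2 => 'CRADLE OF FILTH dragon WEB BELT' },
    },
);

my @descriptions;
for my $code ( keys %report ) {              # step 1: every top-level key
    my $r = $report{$code};                  # step 2: remember where we are
    for my $line ( keys %$r ) {
        next unless ref $r->{$line};         # step 3: skip non-hashref values
        next unless $r->{$line}{2};          # ... and lines with no column 2
        push @descriptions, $r->{$line}{2};  # step 4: keep the description
    }
}

print "$_\n" for sort @descriptions;
```

The ref test is what lets the 'types' entry fall through without tripping strict refs.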

I've seen 'map' used a few times before, it's not a command I'm familiar
with. I'll read the documentation before I do any more work on this.

> You can make it a little nicer if you can infer from the key that
> its value is a report (say, if it starts with "stk"):
>
> print map "$_\n",
>       sort
>       map { my $r = $report{$_}; map $r->{$_}{2}, keys %$r; }
>       grep /^stk/,
>       keys %report;
>
> I take it you want more than the descriptions, though. Do you also
> need the "paths" to get to them?
>
> Hope this helps,
> Greg

It's given me stuff to think about, and some help, yes. Thank you.

Justin.

Justin C · Feb 20, 2009

> : I take it you want more than the descriptions, though. Do you also
> : need the "paths" to get to them?
>
> If that's the approach you're after, use code similar to the following:
>
> my $DESC_COLUMN = 2;
>
> sub item_paths {
>     my($report) = @_;
>
>     my @descs;
>
>     foreach my $code (keys %$report) {
>         next unless $code =~ /^stk/ && ref $report->{$code};
>
>         foreach my $line (keys %{ $report->{$code} }) {
>             push @descs => [ $code, $line ]
>                 if exists $report->{$code}{$line}{$DESC_COLUMN};
>         }
>     }
>
>     @descs;
> }
>
> print map "$_\n",
>       sort
>       map $report{ $_->[0] }{ $_->[1] }{ $DESC_COLUMN },
>       item_paths \%report;

More to think about, thank you. I'm considering an alteration to my data
structure (%hashes), with each report getting its own hash, named at
run-time from the report code. Seems a better way to me. It'll reduce
the 'depth' of the hash for a start. I'm just going through the replies
I've received to my post before I read perldoc -f map, so I'll keep this
to re-read after I've read the docs. Thanks for your help.

Justin.

Justin C · Feb 20, 2009

[snip]

> One way would be to go through all of your data, storing the field
> you're after as a key, provided they're all unique, and maybe the
> hash of data as the value.

I can't guarantee that the descriptions will be unique. But thanks
anyway.
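
For what it's worth, the inverted-hash idea can probably be adapted to non-unique descriptions by storing a list of [report, stock code] pairs under each description instead of a single value. A rough sketch follows; the T-shirt entries and the name %codes_by_desc are invented for illustration and are not from the real data.

```perl
use strict;
use warnings;

# Hypothetical sample: two sized items share one description.
my %hashes = (
    stk1324r2 => {
        'TS/MOT/LOGO/L' => { 2 => 'MOTORHEAD logo T-SHIRT' },
        'TS/MOT/LOGO/M' => { 2 => 'MOTORHEAD logo T-SHIRT' },
        'BE/COF/DRAGON' => { 2 => 'CRADLE OF FILTH dragon WEB BELT' },
    },
);

my %codes_by_desc;    # description => [ [report, stock code], ... ]
for my $report ( keys %hashes ) {
    for my $item ( keys %{ $hashes{$report} } ) {
        # Skip entries that are not line hashes (e.g. plain scalar values).
        next unless ref( $hashes{$report}{$item} ) eq 'HASH';
        next unless exists $hashes{$report}{$item}{2};
        push @{ $codes_by_desc{ $hashes{$report}{$item}{2} } },
            [ $report, $item ];
    }
}

for my $desc ( sort keys %codes_by_desc ) {
    # Sort the codes too, so identical descriptions come out in a fixed order.
    for my $path ( sort { $a->[1] cmp $b->[1] } @{ $codes_by_desc{$desc} } ) {
        print "$desc ($path->[0] $path->[1])\n";
    }
}
```

Each pair leads back to the full line via $hashes{ $path->[0] }{ $path->[1] }, so the stock code can stay as the key in the original structure.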

> Having keys like 1,2,3, etc. is not very descriptive and after a few
> months, when you look at your code again, you'll wish you used
> something that made sense.

I have, within the code, described the construction, and the numbers are
the column numbers within the report, so it's not overly confusing.

Justin.

Justin C · Feb 20, 2009

...

> You should give Dumper a reference to your hash, not the flattened list
> of your hash. Then there would be only one $VARx and it would reflect the
> true structure you have.

Yes, others have said similar. It does look clearer that way. I have
made a note for next time.

> What is the last {a} for? You don't have such a critter in your data dump,
> the value associated with key 2 is a string, not a hashref with 'a' as a
> key.

Typo, I was going for, more or less, { $a cmp $b } but managed to type
$a both times.

> It seems like it should be
> $hashes{$report}{$a}{2} cmp $hashes{$report}{$b}{2};
>
> Don't use a named subroutine as the sort's code block, just put the code
> directly into the sort block. That way it executes in the context of
> the sort, where $report is defined.

That all makes perfect sense, but I'm having difficulty today making
things work.

..
..
..
Nope, just a hard of understanding day. Got it now with:

my @sorted = sort { $hashes{$report}{$a}{2} cmp $hashes{$report}{$b}{2} }
             keys %{$hashes{$report}};

Great stuff. Thank you for your help.

Justin.

Justin C · Feb 23, 2009

Justin C wrote:

: J. Gleixner wrote:
:
: > Having keys like 1,2,3, etc. is not very descriptive and after a few
: > months, when you look at your code again, you'll wish you used
: > something that made sense.
:
: I have, within the code, described the construction, and the numbers
: are the column numbers within the report, so it's not overly
: confusing.

> Using sequence numbers as keys in a hash is often a strong sign
> that you should use an array instead. However...

Yes, you've made me think about that, and I think you're right, an array
would have done the job.

Following the comments here I can see a bit of a re-write is needed. All
comments have been noted, and I shall review the code in a while. I
actually need the code to be working this week due to company year-end,
and it is in working order just now. It is only of use at stock-take
time anyway, I'll review it as soon as we've finished with it so it's
better for next time.

> I assume you used comments to describe the construction. Consider
> Rob Pike's advice[*]:
>
>     If your code needs a comment to be understood, it
>     would be better to rewrite it so it's easier to
>     understand.

That's something I *really* should think about... actually, I think I'm
going to print it and tape it to the bottom of my monitor.

> [*] http://www.lysator.liu.se/c/pikestyle.html

Interesting read.

> *The Practice of Programming* (a book Pike co-authored with
> Brian Kernighan -- the K in K&R) succinctly advises: "Give names
> to magic numbers." That is, use meaningful symbolic names instead
> of bare numeric literals.
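
One way the two suggestions (arrays for sequence-numbered fields, names for magic numbers) might look together is sketched below. COL_DESCRIPTION and %lines are hypothetical names, and the rows are cut down from the dump in the original post.

```perl
use strict;
use warnings;
use constant COL_DESCRIPTION => 2;    # hypothetical name for "column 2"

# Each line is an array ref indexed by column number (slots 0 and 1 are
# unused here); stock codes stay as the hash keys, as in the original.
my %lines = (
    'BE/MOT/WARPIG' => [ undef, undef, 'MOTORHEAD warpig BADGEBUCKLE BELT', 1, 'E' ],
    'BE/COF/DRAGON' => [ undef, undef, 'CRADLE OF FILTH dragon WEB BELT',   1, 'E' ],
);

# Sort stock codes by the description held in the named column.
my @sorted = sort {
    $lines{$a}[COL_DESCRIPTION] cmp $lines{$b}[COL_DESCRIPTION]
} keys %lines;

print $_, "\t", $lines{$_}[COL_DESCRIPTION], "\n" for @sorted;
```

A caveat if the inner structure stays a hash: a bareword inside hash braces is quoted, so $row{COL_DESCRIPTION} looks up the string 'COL_DESCRIPTION'; $row{ +COL_DESCRIPTION } or $row{ COL_DESCRIPTION() } is needed there.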

I'm working through a K&R book at the moment (well, when I'm not
studying for YachtMaster sailing qualification), it's a second attempt,
and I have to say, it's much easier this time round. I'm understanding
many concepts through becoming familiar with [P|p]erl.

Justin.

Eric Pozharski · Feb 24, 2009

On 2009-02-23 said:
> Note that /[P|p]erl/ matches "Perl", "perl", and "|erl". You've
> conflated /[Pp]erl/ and /(P|p)erl/. Prefer the former pattern:
> the latter, although equivalent, looks odd. (There's also a small
> performance penalty.)
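
The difference between the two character classes is easy to check. This small sketch anchors both patterns so that only whole-string matches count; the test strings are made up for the example.

```perl
use strict;
use warnings;

# Inside a character class, | is a literal character, not alternation.
for my $s ( 'Perl', 'perl', '|erl', 'PERL' ) {
    my $with_bar = $s =~ /^[P|p]erl$/ ? 'match' : 'no match';
    my $plain    = $s =~ /^[Pp]erl$/  ? 'match' : 'no match';
    printf "%-5s  [P|p]: %-8s  [Pp]: %s\n", $s, $with_bar, $plain;
}
```

Only the '|erl' row should differ between the two columns.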

In *this* case, that matters only when the string matches. Otherwise
neither pattern is clearly faster (the string C<erl> is anchored at
point 1, so if it is absent the whole match fails). OTOH, I wouldn't
call B<TRIE-EXACT> being *2* times slower a 'small performance penalty'.

perl -wle '
use Benchmark qw|cmpthese timethese|;
my $x = q|Perl|;
my $y = q|PERL|;
cmpthese timethese -10, {
    alt_p => sub { $x =~ m{(?:P|p)erl} },
    alt_n => sub { $y =~ m{(?:P|p)erl} },
    cls_p => sub { $x =~ m{[Pp]erl} },
    cls_n => sub { $y =~ m{[Pp]erl} },
};
'
Benchmark: running alt_n, alt_p, cls_n, cls_p for at least 10 CPU seconds...
     alt_n: 12 wallclock secs (11.27 usr + -0.19 sys = 11.08 CPU) @ 1174967.24/s (n=13018637)
     alt_p: 11 wallclock secs (10.02 usr +  0.01 sys = 10.03 CPU) @ 252231.51/s (n=2529882)
     cls_n: 12 wallclock secs (11.83 usr +  0.16 sys = 11.99 CPU) @ 1114404.34/s (n=13361708)
     cls_p: 12 wallclock secs (10.46 usr +  0.05 sys = 10.51 CPU) @ 503643.10/s (n=5293289)

           Rate alt_p cls_p cls_n alt_n
alt_p  252232/s    --  -50%  -77%  -79%
cls_p  503643/s  100%    --  -55%  -57%
cls_n 1114404/s  342%  121%    --   -5%
alt_n 1174967/s  366%  133%    5%    --
