Sorting hash of hashes



Justin C

Here's how my hash looks:

$hashes{$report}{$line}{$colNumber}

So %hashes holds data from several reports, grouped by report, then by
line (well, stock code, which is the first field on the line), then by
column number. Here is a little of the output of Data::Dumper:

$VAR1 = 'types';
$VAR2 = {
  'BE' => 1
};
$VAR3 = 'stk1324r2';
$VAR4 = {
  'BE/COF/DRAGON' => {
    '8' => '0',
    '6' => '0',
    '4' => 'E',
    '3' => '1',
    '7' => '2',
    '9' => '2',
    '2' => 'CRADLE OF FILTH dragon WEB BELT',
    '5' => '2'
  },
  'BE/MOT/WARPIG' => {
    '8' => '0',
    '6' => '0',
    '4' => 'E',
    '3' => '1',
    '7' => '125',
    '9' => '125',
    '2' => 'MOTORHEAD warpig BADGEBUCKLE BELT',
    '5' => '125'
  },

I want to output the lines sorted asciibetically on column 2. In the
above examples that'd be CRADLE OF FILTH... and MOTORHEAD...

I'd like to provide an example of what I've tried, but the only idea
I've had is to extract the column 2 entries into an array, and then sort
the array and use it in a foreach loop, but then I've got to find a way
of accessing the hash from a value, and not a key (I can't use the
description as my key because there will be some sized items which will
have a different code, but identical description, so the code has to
stay as the key).

Thank you for any help you can give with this.

Justin.
 

Justin C


I've looked further at this and found a "sort hash by value" example.
Looking at it, I remember it from the Llama. I can see how to apply that
to a simple hash, "@sorted = sort { $hash{$a} cmp $hash{$b} } keys
%hash", but I am not sure how to make $hashes{$report}{$item}{2} fit.

What I've got is:

my @sorted = sort byAlpha ( keys ( %{$hashes{$report}} ) );

sub byAlpha {
$hashes{$report}{$item}{2}{a} cmp $hashes{$report}{$item}{2}{a};
}

And, of course, $report is not passed to the sub and is therefore undef,
and perl, rightly, complains. And at this point I'm stumped... again.

Justin.
 

J. Gleixner

> $VAR3 = 'stk1324r2';

print Dumper( \%hashes );

Will give you nicer output.

One way would be to go through all of your data, storing the field
you're after as a key, provided they're all unique, and maybe the
hash of data as the value.

my %by_fld2;
for my $report ( keys %hashes )
{
    for my $item ( keys %{ $hashes{ $report } } )
    {
        # skip entries that have no column 2 (e.g. under 'types')
        next unless ref $hashes{ $report }{ $item }
                and exists $hashes{ $report }{ $item }{ '2' };
        $by_fld2{ $hashes{ $report }{ $item }{ '2' } } =
            $hashes{ $report }{ $item };
    }
}

for my $field2 ( sort keys %by_fld2 )
{
    # print whatever...
}

Having keys like 1,2,3, etc. is not very descriptive and after a few
months, when you look at your code again, you'll wish you used
something that made sense.
 

xhoster

....

You should give Dumper a reference to your hash, not the flattened list
of your hash. Then there would be only one $VARx and it would reflect the
true structure you have.
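
To make the difference concrete, here is a small sketch using a cut-down copy of the data from the original post. The counting at the end is only there for illustration; it relies on Data::Dumper's default output, where each top-level value starts a line with `$VARn = `.

```perl
use strict;
use warnings;
use Data::Dumper;

# Cut-down copy of the structure shown in the first post.
my %hashes = (
    types     => { BE => 1 },
    stk1324r2 => {
        'BE/COF/DRAGON' => { 2 => 'CRADLE OF FILTH dragon WEB BELT' },
    },
);

my $flat   = Dumper(%hashes);     # hash flattened to a key/value list
my $nested = Dumper(\%hashes);    # one reference, one dumped value

# Count the top-level "$VARn = " lines in each dump.
my $flat_count   = () = $flat   =~ /^\$VAR\d+ = /mg;
my $nested_count = () = $nested =~ /^\$VAR\d+ = /mg;

print "flattened: $flat_count, reference: $nested_count\n";
print $nested;
```

With two top-level keys the flattened call yields four `$VARn` values (key, value, key, value), while the reference yields a single nested `$VAR1`.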

> What I've got is:
>
> my @sorted = sort byAlpha ( keys ( %{$hashes{$report}} ) );
>
> sub byAlpha {
> $hashes{$report}{$item}{2}{a} cmp $hashes{$report}{$item}{2}{a};
> }

What is the last {a} for? You don't have such a critter in your data dump,
the value associated with key 2 is a string, not a hashref with 'a' as a
key.

It seems like it should be

$hashes{$report}{$a}{2} cmp $hashes{$report}{$b}{2};

> And, of course, $report is not passed to the sub and is therefore undef,
> and perl, rightly, complains. And at this point I'm stumped... again.

Don't use a named subroutine as the sort's comparison routine; just put
the code directly into the sort block. That way it executes in the scope
of the sort call, where $report is defined.
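
A minimal sketch of that advice, using two lines from the dump in the original post (columns trimmed, and listed out of order on purpose). It assumes $report is a lexical declared before the sort:

```perl
use strict;
use warnings;

# Two lines from the dump in the first post, other columns omitted.
my %hashes = (
    stk1324r2 => {
        'BE/MOT/WARPIG' => { 2 => 'MOTORHEAD warpig BADGEBUCKLE BELT' },
        'BE/COF/DRAGON' => { 2 => 'CRADLE OF FILTH dragon WEB BELT' },
    },
);

my $report = 'stk1324r2';    # lexical, visible inside the sort block below

# $a and $b are the stock codes being compared; the block looks up
# column 2 for each and compares those strings.
my @sorted = sort {
    $hashes{$report}{$a}{2} cmp $hashes{$report}{$b}{2}
} keys %{ $hashes{$report} };

print "$_: $hashes{$report}{$_}{2}\n" for @sorted;
```

The sorted list holds stock codes, not descriptions, so each description is still reached through its key and nothing has to be looked up by value.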

Xho

 

Justin C

> It looks like you must have called Dumper(%hashes). What you really
> want is Dumper(\%hashes) -- note the backslash -- so you can later
> evaluate the dump to create an equivalent hash.

I copied that output from a web browser. It doesn't look much
different using what you suggest, only the $VAR2, $VAR3 etc. are
missing.

> Note that %hashes is a really bad name.

Oh, I know! I didn't spend enough time thinking this all through before
I got coding.

> You described your structure with
>
> $hashes{$report}{$line}{$colNumber}
>
> so %report is probably a better name. (I assume stk1324r2 is a
> report code.)
>
> You can sort and print the descriptions with
>
> print map "$_\n",
>       sort
>       map { my $r = $report{$_};
>             map ref $r->{$_} && $r->{$_}{2} ? $r->{$_}{2} : (),
>                 keys %$r;
>           }
>       keys %report;
>
> I don't recommend this approach, but it's instructive. Read it
> bottom-to-top:
>
> 1. For each possible report code (keys %report)
> 2. Make note of where we are in the hash ($r = $report{$_})
> 3. Make sure it's a report code (ref $r->{$_} && $r->{$_}{2})
> 4. Extract the description from column 2.
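
For anyone not yet comfortable with map, the four steps above can be sketched with plain foreach loops. This is an illustrative equivalent, not the poster's code; the sample data is a cut-down copy of the dump from the first post, and @descriptions is a name made up for the sketch.

```perl
use strict;
use warnings;

# Cut-down copy of the structure from the first post.
my %report = (
    types     => { BE => 1 },
    stk1324r2 => {
        'BE/MOT/WARPIG' => { 2 => 'MOTORHEAD warpig BADGEBUCKLE BELT' },
        'BE/COF/DRAGON' => { 2 => 'CRADLE OF FILTH dragon WEB BELT' },
    },
);

my @descriptions;
for my $code (keys %report) {                 # step 1: every top-level key
    my $r = $report{$code};                   # step 2: remember where we are
    for my $line (keys %$r) {
        next unless ref $r->{$line};          # step 3: skip plain values such as types => { BE => 1 }
        next unless $r->{$line}{2};           #         and lines with nothing in column 2
        push @descriptions, $r->{$line}{2};   # step 4: collect column 2
    }
}

print "$_\n" for sort @descriptions;
```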

I've seen 'map' used a few times before, it's not a command I'm familiar
with. I'll read the documentation before I do any more work on this.

> You can make it a little nicer if you can infer from the key that
> its value is a report (say, if it starts with "stk"):
>
> print map "$_\n",
>       sort
>       map { my $r = $report{$_}; map $r->{$_}{2}, keys %$r; }
>       grep /^stk/,
>       keys %report;
>
> I take it you want more than the descriptions, though. Do you also
> need the "paths" to get to them?
>
> Hope this helps,
> Greg

It's given me stuff to think about, and some help, yes. Thank you.


Justin.
 

Justin C

> : I take it you want more than the descriptions, though. Do you also
> : need the "paths" to get to them?
>
> If that's the approach you're after, use code similar to the following:
>
> my $DESC_COLUMN = 2;
>
> sub item_paths {
>     my($report) = @_;
>
>     my @descs;
>
>     foreach my $code (keys %$report) {
>         next unless $code =~ /^stk/ && ref $report->{$code};
>
>         foreach my $line (keys %{ $report->{$code} }) {
>             push @descs => [ $code, $line ]
>                 if exists $report->{$code}{$line}{$DESC_COLUMN};
>         }
>     }
>
>     @descs;
> }
>
> print map "$_\n",
>       sort
>       map $report{ $_->[0] }{ $_->[1] }{ $DESC_COLUMN },
>       item_paths \%report;

More to think about, thank you. I'm considering an alteration to my data
structure (%hashes), with each report getting its own hash, named at
run-time from the report code. Seems a better way to me. It'll reduce
the 'depth' of the hash for a start. I'm just going through the replies
I've received to my post before I read perldoc -f map, so I'll keep this
to re-read after I've read the docs. Thanks for your help.

Justin.
 

Justin C

[snip]

> One way would be to go through all of your data, storing the field
> you're after as a key, provided they're all unique, and maybe the
> hash of data as the value.

I can't guarantee that the descriptions will be unique. But thanks
anyway.

> Having keys like 1,2,3, etc. is not very descriptive and after a few
> months, when you look at your code again, you'll wish you used
> something that made sense.

I have, within the code, described the construction, and the numbers are
the column numbers within the report, so it's not overly confusing.

Justin.
 

Justin C

...

> You should give Dumper a reference to your hash, not the flattened list
> of your hash. Then there would be only one $VARx and it would reflect the
> true structure you have.

Yes, others have said similar. It does look clearer that way. I have
made a note for next time.

> What is the last {a} for? You don't have such a critter in your data dump,
> the value associated with key 2 is a string, not a hashref with 'a' as a
> key.

Typo, I was going for, more or less, { $a cmp $b } but managed to type
$a both times.


> It seems like it should be
> $hashes{$report}{$a}{2} cmp $hashes{$report}{$b}{2};
>
>
> Don't use a named subroutine as the sort's code block, just put the code
> directly into the sort block. That way it executes in the context of
> the sort, where $report is defined.

That all makes perfect sense, but I'm having difficulty today making
things work.

..
..
..
Nope, just a hard of understanding day. Got it now with:

my @sorted = sort { $hashes{$report}{$a}{2} cmp $hashes{$report}{$b}{2} }
             keys %{ $hashes{$report} };

Great stuff. Thank you for your help.
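
Since some sized items share a description, the comparison above leaves their relative order unspecified. One possible refinement, sketched here with made-up stock codes and descriptions (they are not from the real report), is to fall back to the stock code when the descriptions are equal:

```perl
use strict;
use warnings;

# Hypothetical data: two sizes of one item share a description.
my %hashes = (
    stk1324r2 => {
        'TS/MOT/ACE/XL' => { 2 => 'MOTORHEAD ace of spades T-SHIRT' },
        'TS/MOT/ACE/L'  => { 2 => 'MOTORHEAD ace of spades T-SHIRT' },
        'BE/COF/DRAGON' => { 2 => 'CRADLE OF FILTH dragon WEB BELT' },
    },
);

my $report = 'stk1324r2';
my $lines  = $hashes{$report};    # shorthand for the report being printed

my @sorted = sort {
    $lines->{$a}{2} cmp $lines->{$b}{2}    # description first
        or $a cmp $b                       # stock code breaks ties
} keys %$lines;

print "$_: $lines->{$_}{2}\n" for @sorted;
```

`cmp` returns 0 for equal strings, so `or` only evaluates the second comparison when the descriptions match.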


Justin.
 

Justin C

> Justin C wrote:
>
> : J. Gleixner wrote:
> :
> : > Having keys like 1,2,3, etc. is not very descriptive and after a few
> : > months, when you look at your code again, you'll wish you used
> : > something that made sense.
> :
> : I have, within the code, described the construction, and the numbers
> : are the column numbers within the report, so it's not overly
> : confusing.
>
> Using sequence numbers as keys in a hash is often a strong sign
> that you should use an array instead. However...

Yes, you've made me think about that, and I think you're right, an array
would have done the job.

Following the comments here I can see a bit of a re-write is needed. All
comments have been noted, and I shall review the code in a while. I
actually need the code to be working this week due to company year-end,
and it is in working order just now. It is only of use at stock-take
time anyway, I'll review it as soon as we've finished with it so it's
better for next time.


> I assume you used comments to describe the construction. Consider
> Rob Pike's advice[*]:
>
>     If your code needs a comment to be understood, it
>     would be better to rewrite it so it's easier to
>     understand.

That's something I *really* should think about... actually, I think I'm
going to print it and tape it to the bottom of my monitor.


Interesting read.

> *The Practice of Programming* (a book Pike co-authored with
> Brian Kernighan -- the K in K&R) succinctly advises: "Give names
> to magic numbers." That is, use meaningful symbolic names instead
> of bare numeric literals.
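
Applied to the column numbers in this thread, that advice might look like the sketch below. COL_DESCRIPTION is a hypothetical name; the thread only establishes that column 2 holds the description.

```perl
use strict;
use warnings;

# Hypothetical name for the one column whose meaning the thread states.
use constant COL_DESCRIPTION => 2;

# One line of the report, keyed by column number as in the first post.
my %line = (
    2 => 'CRADLE OF FILTH dragon WEB BELT',
    7 => '2',
);

# The parentheses matter: a bare COL_DESCRIPTION inside { } would be
# taken as the literal string key 'COL_DESCRIPTION'.
my $description = $line{ COL_DESCRIPTION() };

print "$description\n";
```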

I'm working through a K&R book at the moment (well, when I'm not
studying for YachtMaster sailing qualification), it's a second attempt,
and I have to say, it's much easier this time round. I'm understanding
many concepts through becoming familiar with [P|p]erl.

Justin.
 

Eric Pozharski

On 2009-02-23 said:
> Note that /[P|p]erl/ matches "Perl", "perl", and "|erl". You've
> conflated /[Pp]erl/ and /(P|p)erl/. Prefer the former pattern:
> the latter, although equivalent, looks odd. (There's also a small
> performance penalty.)
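
A quick check of that claim, as a sketch: inside a character class `|` is an ordinary character, so the first pattern accepts "|erl" and the second does not. The anchors are added here only so that each test string must match in full.

```perl
use strict;
use warnings;

for my $word ('Perl', 'perl', '|erl') {
    # Character class containing P, a literal |, and p.
    my $with_bar = $word =~ /^[P|p]erl$/ ? 'yes' : 'no';
    # Character class containing only P and p.
    my $without  = $word =~ /^[Pp]erl$/  ? 'yes' : 'no';
    printf "%-5s [P|p]erl: %-3s  [Pp]erl: %s\n", $word, $with_bar, $without;
}
```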

In *this* case, that matters only when the string matches. Otherwise
neither is clearly faster (the literal string C<erl> is anchored at
offset 1, so if the target doesn't contain it the whole match fails
at once). OTOH, I wouldn't call B<TRIE-EXACT> being *2* times slower a
'small performance penalty'.

perl -wle '
    use Benchmark qw|cmpthese timethese|;
    my $x = q|Perl|;    # matches both patterns
    my $y = q|PERL|;    # matches neither
    cmpthese timethese -10, {
        alt_p => sub { $x =~ m{(?:P|p)erl} },
        alt_n => sub { $y =~ m{(?:P|p)erl} },
        cls_p => sub { $x =~ m{[Pp]erl} },
        cls_n => sub { $y =~ m{[Pp]erl} },
    };
'
Benchmark: running alt_n, alt_p, cls_n, cls_p for at least 10 CPU seconds....
     alt_n: 12 wallclock secs (11.27 usr + -0.19 sys = 11.08 CPU) @ 1174967.24/s (n=13018637)
     alt_p: 11 wallclock secs (10.02 usr +  0.01 sys = 10.03 CPU) @ 252231.51/s (n=2529882)
     cls_n: 12 wallclock secs (11.83 usr +  0.16 sys = 11.99 CPU) @ 1114404.34/s (n=13361708)
     cls_p: 12 wallclock secs (10.46 usr +  0.05 sys = 10.51 CPU) @ 503643.10/s (n=5293289)

           Rate alt_p cls_p cls_n alt_n
alt_p  252232/s    --  -50%  -77%  -79%
cls_p  503643/s  100%    --  -55%  -57%
cls_n 1114404/s  342%  121%    --   -5%
alt_n 1174967/s  366%  133%    5%    --
 
