rearrange "columns" of a multi-level hash?

hymie!

Greetings. I don't have that special knack for properly forming a
Google search that will give me the answer I seek, so I apologize if
I'm asking an old question.

I'm taking over a project from a co-worker.

We are processing a file that has information in it:
customer vendor transType productCode appNumber resultCode

We have to prepare 2 reports from this data.

Without too much detail, the first report is sorted by Customer, then by
TransactionType, then by ProductCode, and then by resultCode, with a count
of the number of lines that match each configuration.

The second report is similar, but is sorted by Customer, then by Vendor,
then by TransType, ProdCode, and resultCode.

Co-worker wrote the script with (I don't know the correct term) a multi-
level hash:

unless( exists $list{ $customer } )
{
    $list{ $customer } = () ;
}
unless( exists $list{ $customer }{ $type } )
{
    $list{ $customer }{ $type } = () ;
}
and so on, down to
unless( exists $list{$customer}{$type}{$productCode}{$appNo}{$vendor} )
{
    $list{ $customer }{ $type }{ $productCode }{ $appNo }{ $vendor } =
        { 130 => 0,
          150 => 0,
          385 => 0 } ;
}
if( exists $list{$customer}{$type}{$productCode}{$appNo}
                 {$vendor}{$returnCode} )
{
    $list{$customer}{$type}{$productCode}{$appNo}{$vendor}{$returnCode} = 1
}

Then he reads the data thusly:

foreach $customer ( keys(%list) )
{
    print "Customer: $customer\n\n" ;
    foreach $type ( keys(%{$list{ $customer }}) )
    {
        printf "\n Type: %s", $type ;
        foreach $productCode ( keys(%{$list{ $customer }{ $type }}) )
        {
            printf "\n Product: %s\n", $productCode ;
            foreach $appNo (keys(%{$list{ $customer }{ $type }{ $productCode }}))
            {
                foreach $vendor (keys(%{$list{$customer}{$type}
                                        {$productCode}{$appNo}}))
                {
                    if( $list{ $customer }{ $type }{ $productCode }
                             { $appNo }{ $vendor }{385})
                    {
                        $no385++ ;
                    }
                    if( $list{ $customer }{ $type }{ $productCode }
                             { $appNo }{ $vendor }{150})
                    {
                        $no150++ ;
                    }
                }
            }
        }
    }
}

This generates the first report that is sorted by Customer and Type.
He wrote a second, almost identical script to re-parse all of the original
data into a new hash with the variables in customer-vendor-type order.

What I want to know is if there is
(*) an easier way to re-arrange the first hash (visualize taking a column
in a spreadsheet and moving it over, then resorting with the new
column order)?
(*) a better/easier way to start from scratch?

Thanks.

hymie! http://www.smart.net/~hymowitz (e-mail address removed)
===============================================================================
 
Ben Morrow

Quoth (e-mail address removed) (hymie!):
We are processing a file that has information in it:
customer vendor transType productCode appNumber resultCode

We have to prepare 2 reports from this data.

Without too much detail, the first report is sorted by Customer, then by
TransactionType, then by ProductCode, and then by resultCode, with a count
of the number of lines that match each configuration.

The second report is similar, but is sorted by Customer, then by Vendor,
then by TransType, ProdCode, and resultCode.

Co-worker wrote the script with (I don't know the correct term) a multi-
level hash:

That term'll do fine... some people here use HoH, for Hash-of-Hash.
unless( exists $list{ $customer } )
{
$list{ $customer } = () ;
}
unless( exists $list{ $customer }{ $type } )
{
$list{ $customer }{ $type } = () ;
}
and so on, down to
unless( exists $list{$customer}{$type}{$productCode}{$appNo}{$vendor} )
{
$list{ $customer }{ $type }{ $productCode }{ $appNo }{ $vendor } =
{ 130 => 0,
150 => 0,
385 => 0 } ;
}

None of this is necessary. What was *meant* here, I think, is to create
a new anon hash for each level; in that case it should have been '= {}'
not '= ()'. In any case, if you treat an undefined variable as though
it's got a hashref in it, Perl will create a new anon hash and put a ref
to it in there for you. Also, a hash key that doesn't exist will return
a value of undef, which is zero in numeric context (with a warning you
can turn off)...
if( exists $list{$customer}{$type}{$productCode}{$appNo}
{$vendor}{$returnCode} )
{
$list{$customer}{$type}{$productCode}{$appNo}{$vendor}{$returnCode} = 1
}

...so this whole lot can be replaced with the one line

$list{$customer}{$type}{$productCode}{$appNo}{$vendor}{$returnCode} = 1
if grep $_ == $returnCode, qw/130 150 385/;
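
The point about Perl creating the intermediate hashes by itself (autovivification) can be checked with a small sketch. The customer, type, product, and vendor values below are made up for illustration; only the hash layout comes from the post.

```perl
use strict;
use warnings;

my %list;

# No "unless exists" checks: every missing level is created on assignment.
# All key values here are invented sample data.
$list{acme}{sale}{P1}{1001}{vendorA}{150} = 1;

# A key that was never set reads as undef, which is false in a test.
my $leaf     = $list{acme}{sale}{P1}{1001}{vendorA};
my $seen_150 = $leaf->{150} ? 1 : 0;
my $seen_385 = $leaf->{385} ? 1 : 0;

print "150 seen: $seen_150, 385 seen: $seen_385\n";
```

One caveat worth keeping in mind: reading through a chain of keys also creates the intermediate levels, so an `exists` test on a deep key can leave empty hashes behind for the levels above it.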

If you need the keys of the last hash to be right, or you specifically
need the zeros, you could do

my @validCodes = qw/130 150 385/;

for ($list{$customer}...{$vendor}) {
@{$_}{@validCodes} = (0) x @validCodes;

grep $_ == $returnCode, @validCodes
and $_->{$returnCode} = 1;
}

The expression @{$_}{@validCodes} is perhaps a little confusing: the
first {} are for disambiguation, the second are a hash slice. Compare
with @hash{@validCodes}.
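
As a sketch of the slice syntax described above, here is the same assignment done once on a named hash and once through a hash reference. The variable `$ref` stands in for the elided `$list{$customer}...{$vendor}` element and is only an illustration.

```perl
use strict;
use warnings;

my @validCodes = qw/130 150 385/;

# Slice of a named hash: set several keys in one assignment.
my %hash;
@hash{@validCodes} = (0) x @validCodes;

# The same slice through a hash reference. The inner braces only
# delimit the expression that yields the reference.
my $ref = {};
@{$ref}{@validCodes} = (0) x @validCodes;
$ref->{150} = 1;

my @pairs;
push @pairs, "$_=$ref->{$_}" for sort keys %$ref;
print join(', ', @pairs), "\n";
```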

Why is the data stored like this at all? Surely it would be better to
store the return code straight in the hash, rather than have another
level with only one (significant) value?

$list{$customer}...{$vendor} = $returnCode
if grep $_ == $returnCode, qw/130 150 385/;

Also, what happens if the return code *isn't* in the list? Is the list
supposed to be exhaustive (in which case you can simply strip the greps
out of the above)?
Then he reads the data thusly:

(unnecessary use of -ly: 'thus' means 'like this' all by itself)
foreach $customer ( keys(%list) )

This is not sorted. Did you simply mean 'grouped by', or should it be

for $customer (sort keys %list)
?
{
print "Customer: $customer\n\n" ;
foreach $type ( keys(%{$list{ $customer }}) )
{
printf "\n Type: %s", $type ;

Don't use printf when interpolation will do.
foreach $productCode ( keys(%{$list{ $customer }{ $type }}) )
{
printf "\n Product: %s\n", $productCode ;
foreach $appNo (keys(%{$list{ $customer }{ $type }{ $productCode }}))
{
foreach $vendor (keys(%{$list{$customer}{$type}
{$productCode}{$appNo}}))
{
if( $list{ $customer }{ $type }{ $productCode }
{ $appNo }{ $vendor }{385})
{
$no385++ ;

This should be a hash. Variables with systematically similar names
nearly always should be.

$no{385}++;

I would use a hash rather than an array even though the keys are
numeric because they are sparse.
}
if( $list{ $customer }{ $type }{ $productCode }
{ $appNo }{ $vendor }{150})
{
$no150++ ;
}
}
}
}
}
}

This is a mess :). I would recast it with a dispatch table (untested):

my %no;

sub do_keys {
my $ivalue = shift;
my $action = shift;

if ('HASH' eq ref $value) {
for (keys %$value) {
$action and $action->($_);
do_keys $value->{$_}, @_;
}
}
else {
$action and $action->($value);
}
}

do_keys \%list,
sub { print "Customer: $_[0]" },
sub { print " Type: $_[0]" },
sub { print " Product: $_[0]" },
undef,
undef,
sub { $no{$_[0]}++ };
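
For anyone who wants to try the idea, here is a self-contained sketch of the same walker with the `$value` typo fixed, an explicit recursive call, and sorted keys so the output is stable. The sample data is invented, and output is collected in `@out` (a name added here) rather than printed from inside the callbacks.

```perl
use strict;
use warnings;

# Walk a nested hash. At each level, call the next action (if any)
# with the key, then descend with the remaining actions.
sub do_keys {
    my $value  = shift;
    my $action = shift;

    if (ref($value) eq 'HASH') {
        for my $key (sort keys %$value) {
            $action->($key) if $action;
            do_keys($value->{$key}, @_);
        }
    }
    elsif ($action) {
        $action->($value);    # leaf: pass the stored value itself
    }
}

# Made-up sample data: customer, type, product, appNo, vendor, code.
my %list;
$list{acme}{sale}{P1}{1001}{vendorA}{150}   = 1;
$list{acme}{sale}{P1}{1002}{vendorB}{385}   = 1;
$list{acme}{refund}{P2}{1003}{vendorA}{150} = 1;

my (%no, @out);
do_keys(\%list,
    sub { push @out, "Customer: $_[0]" },
    sub { push @out, "  Type: $_[0]" },
    sub { push @out, "    Product: $_[0]" },
    undef,                    # appNo: nothing to do at this level
    undef,                    # vendor: nothing to do at this level
    sub { $no{$_[0]}++ },     # count each return-code key
);

print "$_\n" for @out;
print "$_: $no{$_}\n" for sort keys %no;
```

Note that the last callback counts every return-code key it meets, so this only gives useful totals if the zero-valued placeholder keys are not stored.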

This generates the first report that is sorted by Customer and Type.
He wrote a second, almost identical script to re-parse all of the original
data into a new hash with the variables in customer-vendor-type order.

What I want to know is if there is
(*) an easier way to re-arrange the first hash (visualize taking a column
in a spreadsheet and moving it over, then resorting with the new
column order)?

The obvious, though maybe not the most efficient, way is to unwrap it
into a big list of records and wrap it up again:

# old order: customer type product appno vendor
# new order: customer vendor type product appno

my %new_list;

# customer is still first, so we can leave that

for my $cust (keys %list) {

my @records;
my %me;

do_keys $list{$cust},
sub { $me{type} = $_[0] },
sub { $me{prod} = $_[0] },
sub { $me{appn} = $_[0] },
sub { $me{vend} = $_[0] },
sub { push @records, { %me, code => $_[0] } }

for (@records) {
$new_list{ $cust }
{ $_->{vend} }
{ $_->{type} }
{ $_->{prod} }
{ $_->{appn} } = $_->{code};
}
}

Ben
 
Ben Morrow

Quoth Ben Morrow said:
This is a mess :). I would recast it with a dispatch table (untested)...:

my %no;

sub do_keys {
my $ivalue = shift;
^
...but without a typo (yes, I use vim) :).
my $action = shift;

if ('HASH' eq ref $value) { [...]

Quoth (e-mail address removed) (hymie!):
What I want to know is if there is
(*) an easier way to re-arrange the first hash (visualize taking a column
in a spreadsheet and moving it over, then resorting with the new
column order)?

The obvious, though maybe not the most efficient, way is to unwrap it
into a big list of records and wrap it up again:

...except there's no need to actually build the list.
# old order: customer type product appno vendor
# new order: customer vendor type product appno

my %new_list;

# customer is still first, so we can leave that

for my $cust (keys %list) {

my @records;

We don't need this temp array.
my %me;

do_keys $list{$cust},
sub { $me{type} = $_[0] },
sub { $me{prod} = $_[0] },
sub { $me{appn} = $_[0] },
sub { $me{vend} = $_[0] },
sub { push @records, { %me, code => $_[0] } }

Gah! another typo... clearly not concentrating :(.
Anyway, this last should have been:

sub {
$new_list{ $cust }
{ $me{vend} }
{ $me{type} }
{ $me{prod} }
{ $me{appn} } = $_[0];
};

for (@records) {

...and then we don't need to loop over the data a second time.
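
Putting the corrected pieces together, a runnable sketch of the re-keying might look like this. It assumes the simpler layout suggested earlier in the thread, where the return code is stored directly under the vendor key, and it uses invented sample data.

```perl
use strict;
use warnings;

# Same walker as in the thread, with the typo fixed.
sub do_keys {
    my $value  = shift;
    my $action = shift;

    if (ref($value) eq 'HASH') {
        for my $key (keys %$value) {
            $action->($key) if $action;
            do_keys($value->{$key}, @_);
        }
    }
    elsif ($action) {
        $action->($value);
    }
}

# old order: customer type product appno vendor => code (sample data)
my %list;
$list{acme}{sale}{P1}{1001}{vendorA}     = 150;
$list{acme}{sale}{P1}{1002}{vendorB}     = 385;
$list{zenith}{refund}{P2}{1003}{vendorA} = 130;

# new order: customer vendor type product appno => code
my %new_list;
for my $cust (keys %list) {
    my %me;    # keys seen on the way down to the current leaf
    do_keys($list{$cust},
        sub { $me{type} = $_[0] },
        sub { $me{prod} = $_[0] },
        sub { $me{appn} = $_[0] },
        sub { $me{vend} = $_[0] },
        sub {
            $new_list{$cust}{ $me{vend} }{ $me{type} }
                     { $me{prod} }{ $me{appn} } = $_[0];
        },
    );
}

print "$new_list{acme}{vendorB}{sale}{P1}{1002}\n";
```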

Ben
 
hymie!

In our last episode, the evil Dr. Lacto had captured our hero,
Quoth (e-mail address removed) (hymie!):

That term'll do fine... some people here use HoH, for Hash-of-Hash.
Also, a hash key that doesn't exist will return
a value of undef, which is zero in numeric context (with a warning you
can turn off)...

Excellent. That was one of my fears. :)
...so this whole lot can be replaced with the one line

$list{$customer}{$type}{$productCode}{$appNo}{$vendor}{$returnCode} = 1
if grep $_ == $returnCode, qw/130 150 385/;

If you need the keys of the last hash to be right, or you specifically
need the zeros, you could do

I don't specifically need the zeros as long as (like you said) an
undefined hash will return 0.
Why is the data stored like this at all? Surely it would be better to
store the return code straight in the hash, rather than have another
level with only one (significant) value?

Because a single set of customer-type-productcode-appno-vendor may have
more than one return code, and I need to track all of them that appear.

But I'll probably switch it to something like
$list{$customer}...{$vendor} .= "$returnCode:"
if grep $_ == $returnCode, qw/130 150 385/;
and then I can m// through the values later.
Also, what happens if the return code *isn't* in the list?

Then I can ignore that entire line of data.
This is not sorted. Did you simply mean 'grouped by', or should it be

Sorting isn't required.
Don't use printf when interpolation will do.

Oops. Oversight.
This should be a hash. Variables with systematically similar names
nearly always should be.

$no{385}++;

It's actually a little more complicated than that, but the hash is still
a good idea.
This is a mess :). I would recast it with a dispatch table (untested):

I appreciate the table, but this is probably beyond my ability to
maintain and troubleshoot. But it's a great idea and, if nothing else,
a learning exercise.
The obvious, though maybe not the most efficient, way is to unwrap it
into a big list of records and wrap it up again:

No, but somewhere along the line, you gave me an idea -- in short, when
I process
$list{$cust}{$type}{$prod}{$appno}{$vend}
I can create
$list2{$cust}{$vend}{$type}{$prod}{$appno}
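
A minimal sketch of that idea, filling both hashes in one pass over the input. It assumes whitespace-separated fields in the order given in the original post; the sample lines and the `%ok` lookup are made up for illustration.

```perl
use strict;
use warnings;

# Made-up input lines, field order as in the original post:
# customer vendor transType productCode appNumber resultCode
my @lines = (
    'acme vendorA sale P1 1001 150',
    'acme vendorB sale P1 1002 385',
    'acme vendorA refund P2 1003 999',    # code not of interest: skipped
);

my %ok = map { $_ => 1 } (130, 150, 385);
my (%list, %list2);

for my $line (@lines) {
    # Assumption: fields are separated by whitespace.
    my ($cust, $vend, $type, $prod, $appno, $code) = split ' ', $line;
    next unless $ok{$code};

    # Same record, stored once per report ordering.
    $list{$cust}{$type}{$prod}{$appno}{$vend}{$code}  = 1;
    $list2{$cust}{$vend}{$type}{$prod}{$appno}{$code} = 1;
}

print join(' ', sort keys %{ $list2{acme} }), "\n";
```

This keeps a single parsing loop, at the cost of holding the data twice in memory.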

Razors pain you / Rivers are damp
Acids stain you / And drugs cause cramp.
Guns aren't lawful / Nooses give
Gas smells awful / You might as well live.

Ooh! I'd been looking for that poem. Thank you.

hymie! http://www.smart.net/~hymowitz (e-mail address removed)
===============================================================================
 
Ben Morrow

Quoth (e-mail address removed) (hymie!):
In our last episode, the evil Dr. Lacto had captured our hero,


Because a single set of customer-type-productcode-appno-vendor may have
more than one return code, and I need to track all of them that appear.

But I'll probably switch it to something like
$list{$customer}...{$vendor} .= "$returnCode:"
if grep $_ == $returnCode, qw/130 150 385/;
and then I can m// through the values later.

Oooh, no, that's very shell :).
Use an array:

push @{ $list{...} }, $returnCode if ...;
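
Spelled out with the full key chain, the array version could look like the sketch below. The rows are invented sample data, and the filter is the `grep` condition used earlier in the thread.

```perl
use strict;
use warnings;

my %list;

# Made-up rows: customer, type, product, appNo, vendor, return code.
my @rows = (
    [qw/acme sale P1 1001 vendorA 150/],
    [qw/acme sale P1 1001 vendorA 385/],
    [qw/acme sale P1 1001 vendorA 999/],    # not a code of interest
);

for my $row (@rows) {
    my ($customer, $type, $productCode, $appNo, $vendor, $returnCode) = @$row;

    # The array reference at the leaf is created on the first push.
    push @{ $list{$customer}{$type}{$productCode}{$appNo}{$vendor} }, $returnCode
        if grep { $_ == $returnCode } qw/130 150 385/;
}

my @codes = @{ $list{acme}{sale}{P1}{1001}{vendorA} };
print "codes: @codes\n";
```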

Ben
 
Peter Scott

Quoth (e-mail address removed) (hymie!):

Oooh, no, that's very shell :).
Use an array:

push @{ $list{...} }, $returnCode if ...;

The condition is somewhat shell too :)

... if $returnCode =~ /^(130|150|385)\Z/;
... if {map {$_=>1} (130,150,385)}->{$returnCode};

Although it would be more maintainable to put something like

my %Ok_Return_Code = map { $_ => 1 } (130, 150, 385);

in a configuration section and then just do

... if $Ok_Return_Code{$returnCode};
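
As a quick check that the conditions discussed in this thread agree, the sketch below runs the `grep`, regex, and lookup-hash tests over a few sample codes. The test values and the `%result` hash are additions for illustration.

```perl
use strict;
use warnings;

my %Ok_Return_Code = map { $_ => 1 } (130, 150, 385);

my %result;
for my $returnCode (qw/130 150 385 151 1300/) {
    my $by_grep  = (grep { $_ == $returnCode } qw/130 150 385/) ? 1 : 0;
    my $by_regex = $returnCode =~ /^(130|150|385)\Z/            ? 1 : 0;
    my $by_hash  = $Ok_Return_Code{$returnCode}                 ? 1 : 0;

    $result{$returnCode} = [ $by_grep, $by_regex, $by_hash ];
    print "$returnCode: grep=$by_grep regex=$by_regex hash=$by_hash\n";
}
```

The hash lookup does not scan the list on every test, and the list of accepted codes lives in one place.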
 
