rearrange "columns" of a multi-level hash?

hymie! · Jun 14, 2004

Greetings. I don't have that special knack for properly forming a
Google search that will give me the answer I seek, so I apologize if
I'm asking an old question.

I'm taking over a project from a co-worker.

We are processing a file that has information in it:
customer vendor transType productCode appNumber resultCode

We have to prepare 2 reports from this data.

Without too much detail, the first report is sorted by Customer, then by
TransactionType, then by ProductCode, and then by resultCode, with a count
of the number of lines that match each configuration.

The second report is similar, but is sorted by Customer, then by Vendor,
then by TransType, ProdCode, and resultCode.

Co-worker wrote the script with (I don't know the correct term) a multi-
level hash:

unless( exists $list{ $customer } )
{
    $list{ $customer } = () ;
}
unless( exists $list{ $customer }{ $type } )
{
    $list{ $customer }{ $type } = () ;
}
and so on, down to
unless( exists $list{$customer}{$type}{$productCode}{$appNo}{$vendor} )
{
    $list{ $customer }{ $type }{ $productCode }{ $appNo }{ $vendor } =
        { 130 => 0,
          150 => 0,
          385 => 0 } ;
}
if( exists $list{$customer}{$type}{$productCode}{$appNo}
                {$vendor}{$returnCode} )
{
    $list{$customer}{$type}{$productCode}{$appNo}{$vendor}{$returnCode} = 1
}

Then he reads the data thusly:

foreach $customer ( keys(%list) )
{
    print "Customer: $customer\n\n" ;
    foreach $type ( keys(%{$list{ $customer }}) )
    {
        printf "\n Type: %s", $type ;
        foreach $productCode ( keys(%{$list{ $customer }{ $type }}) )
        {
            printf "\n Product: %s\n", $productCode ;
            foreach $appNo (keys(%{$list{ $customer }{ $type }{ $productCode }}))
            {
                foreach $vendor (keys(%{$list{$customer}{$type}
                                        {$productCode}{$appNo}}))
                {
                    if( $list{ $customer }{ $type }{ $productCode }
                        { $appNo }{ $vendor }{385})
                    {
                        $no385++ ;
                    }
                    if( $list{ $customer }{ $type }{ $productCode }
                        { $appNo }{ $vendor }{150})
                    {
                        $no150++ ;
                    }
                }
            }
        }
    }
}

This generates the first report that is sorted by Customer and Type.
He wrote a second, almost identical script to re-parse all of the original
data into a new hash with the variables in customer-vendor-type order.

What I want to know is if there is
(*) an easier way to re-arrange the first hash (visualize taking a column
in a spreadsheet and moving it over, then resorting with the new
column order)?
(*) a better/easier way to start from scratch?

Thanks.

hymie! http://www.smart.net/~hymowitz (e-mail address removed)
===============================================================================

Ben Morrow · Jun 14, 2004

Quoth (e-mail address removed) (hymie!):

We are processing a file that has information in it:
customer vendor transType productCode appNumber resultCode

We have to prepare 2 reports from this data.

Without too much detail, the first report is sorted by Customer, then by
TransactionType, then by ProductCode, and then by resultCode, with a count
of the number of lines that match each configuration.

The second report is similar, but is sorted by Customer, then by Vendor,
then by TransType, ProdCode, and resultCode.

Co-worker wrote the script with (I don't know the correct term) a multi-
level hash:

That term'll do fine... some people here use HoH, for Hash-of-Hash.

unless( exists $list{ $customer } )
{
$list{ $customer } = () ;
}
unless( exists $list{ $customer }{ $type } )
{
$list{ $customer }{ $type } = () ;
}
and so on, down to
unless( exists $list{$customer}{$type}{$productCode}{$appNo}{$vendor} )
{
$list{ $customer }{ $type }{ $productCode }{ $appNo }{ $vendor } =
{ 130 => 0,
150 => 0,
385 => 0 } ;
}

None of this is necessary. What was *meant* here, I think, is to create
a new anon hash for each level; in that case it should have been '= {}'
not '= ()'. In any case, if you treat an undefined variable as though
it's got a hashref in it, Perl will create a new anon hash and put a ref
to it in there for you. Also, a hash key that doesn't exist will return
a value of undef, which is zero in numeric context (with a warning you
can turn off)...

if( exists $list{$customer}{$type}{$productCode}{$appNo}
{$vendor}{$returnCode} )
{
$list{$customer}{$type}{$productCode}{$appNo}{$vendor}{$returnCode} = 1
}

....so this whole lot can be replaced with the one line

$list{$customer}{$type}{$productCode}{$appNo}{$vendor}{$returnCode} = 1
if grep $_ == $returnCode, qw/130 150 385/;
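
To illustrate the two points above (intermediate hashes being created on demand, and a missing key reading as undef), here is a minimal sketch. The sample record values are made up for illustration; only the variable names come from the thread.

```perl
use strict;
use warnings;

my %list;
my @valid = (130, 150, 385);

# Hypothetical sample record; field names follow the thread.
my ($customer, $type, $productCode, $appNo, $vendor, $returnCode)
    = ('acme', 'sale', 'P1', 'A100', 'v1', 150);

# No "unless exists" ladder: the intermediate hashes spring into
# existence (autovivify) when the innermost element is assigned.
$list{$customer}{$type}{$productCode}{$appNo}{$vendor}{$returnCode} = 1
    if grep { $_ == $returnCode } @valid;

my $leaf = $list{$customer}{$type}{$productCode}{$appNo}{$vendor};

# A key that was never set yields undef; "|| 0" (or
# "no warnings 'uninitialized'") keeps numeric use quiet.
my $seen150 = $leaf->{150} || 0;
my $seen385 = $leaf->{385} || 0;

print "150 seen: $seen150, 385 seen: $seen385\n";
```

With this sample record the script prints "150 seen: 1, 385 seen: 0", and the key 385 is never created in the innermost hash.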

If you need the keys of the last hash to be right, or you specifically
need the zeros, you could do

my @validCodes = qw/130 150 385/;

for ($list{$customer}...{$vendor}) {
    @{$_}{@validCodes} = (0) x @validCodes;

    grep $_ == $returnCode, @validCodes
        and $_->{$returnCode} = 1;
}

The expression @{$_}{@validCodes} is perhaps a little confusing: the
first {} are for disambiguation, the second are a hash slice. Compare
with @hash{@validCodes}.
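
A small side-by-side sketch of the two slice forms may help. The variable `$ref` and the choice of 150 as the "seen" code are illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;

my @validCodes = (130, 150, 385);

# Slice of a named hash: set several keys at once.
my %hash;
@hash{@validCodes} = (0) x @validCodes;

# The same slice through a hash reference: the inner braces only
# delimit the reference expression, the outer ones are the slice.
my $ref = {};
@{$ref}{@validCodes} = (0) x @validCodes;

# Mark one code as seen (150 is an arbitrary sample value).
$ref->{150} = 1;

print join(', ', map { "$_ => $ref->{$_}" } sort { $a <=> $b } keys %$ref), "\n";
```

This prints "130 => 0, 150 => 1, 385 => 0".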

Why is the data stored like this at all? Surely it would be better to
store the return code straight in the hash, rather than have another
level with only one (significant) value?

$list{$customer}...{$vendor} = $returnCode
if grep $_ == $returnCode, qw/130 150 385/;

Also, what happens if the return code *isn't* in the list? Is the list
supposed to be exhaustive (in which case you can simply strip the greps
out of the above)?

Then he reads the data thusly:

(unnecessary use of -ly: 'thus' means 'like this' all by itself)

foreach $customer ( keys(%list) )

This is not sorted. Did you simply mean 'grouped by', or should it be

for $customer (sort keys %list)
?

{
print "Customer: $customer\n\n" ;
foreach $type ( keys(%{$list{ $customer }}) )
{
printf "\n Type: %s", $type ;

Don't use printf when interpolation will do.

foreach $productCode ( keys(%{$list{ $customer }{ $type }}) )
{
printf "\n Product: %s\n", $productCode ;
foreach $appNo (keys(%{$list{ $customer }{ $type }{ $productCode }}))
{
foreach $vendor (keys(%{$list{$customer}{$type}
{$productCode}{$appNo}}))
{
if( $list{ $customer }{ $type }{ $productCode }
{ $appNo }{ $vendor }{385})
{
$no385++ ;

This should be a hash. Variables with systematically similar names
nearly always should be.

$no{385}++;

I would use a hash rather than an array even though the keys are
numeric because they are sparse.

}
if( $list{ $customer }{ $type }{ $productCode }
{ $appNo }{ $vendor }{150})
{
$no150++ ;
}
}
}
}
}
}

This is a mess. I would recast it with a dispatch table (untested):

my %no;

sub do_keys {
    my $ivalue = shift;
    my $action = shift;

    if ('HASH' eq ref $value) {
        for (keys %$value) {
            $action and $action->($_);
            do_keys $value->{$_}, @_;
        }
    }
    else {
        $action and $action->($value);
    }
}

do_keys \%list,
    sub { print "Customer: $_[0]" },
    sub { print " Type: $_[0]" },
    sub { print " Product: $_[0]" },
    undef,
    undef,
    sub { $no{$_[0]}++ };
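
For readers who want to try the idea, here is a runnable sketch of the same walker. It assumes the first parameter is named `$value`, that `%list` is a hash (so `\%list` is passed), and that only return codes that were actually seen are stored as keys at the innermost level. The sample data, the added newlines and the sorted key order are illustrative choices, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical sample data in customer/type/product/appNo/vendor/code order.
my %list = (
    acme => { sale => { P1 => {
        A100 => { v1 => { 150 => 1 } },
        A101 => { v2 => { 150 => 1, 385 => 1 } },
    } } },
);

my %no;   # counts per return code

# Walk a nested hash; the Nth callback receives each key at depth N.
# An undef callback means "nothing to do at this level".
sub do_keys {
    my ($value, $action, @rest) = @_;

    if (ref($value) eq 'HASH') {
        for my $key (sort keys %$value) {
            $action->($key) if $action;
            do_keys($value->{$key}, @rest);
        }
    }
    elsif ($action) {
        $action->($value);   # leaf value, if a callback is left over
    }
    return;
}

do_keys(\%list,
    sub { print "Customer: $_[0]\n" },
    sub { print "  Type: $_[0]\n" },
    sub { print "    Product: $_[0]\n" },
    undef,                      # appNo: no output
    undef,                      # vendor: no output
    sub { $no{ $_[0] }++ },     # return code key: count it
);

print "$_: $no{$_}\n" for sort { $a <=> $b } keys %no;
```

Note that the last callback counts keys, not values, so a layout that pre-fills every code with 0 would count all three codes for every vendor.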

This generates the first report that is sorted by Customer and Type.
He wrote a second, almost identical script to re-parse all of the original
data into a new hash with the variables in customer-vendor-type order.

What I want to know is if there is
(*) an easier way to re-arrange the first hash (visualize taking a column
in a spreadsheet and moving it over, then resorting with the new
column order)?

The obvious, though maybe not the most efficient, way is to unwrap it
into a big list of records and wrap it up again:

# old order: customer type product appno vendor
# new order: customer vendor type product appno

my %new_list;

# customer is still first, so we can leave that

for my $cust (keys %list) {

    my @records;
    my %me;

    do_keys $list{$cust},
        sub { $me{type} = $_[0] },
        sub { $me{prod} = $_[0] },
        sub { $me{appn} = $_[0] },
        sub { $me{vend} = $_[0] },
        sub { push @records, { %me, code => $_[0] } }

    for (@records) {
        $new_list{ $cust }
                 { $_->{vend} }
                 { $_->{type} }
                 { $_->{prod} }
                 { $_->{appn} } = $_->{code};
    }
}

Ben

Ben Morrow · Jun 14, 2004

Quoth Ben Morrow said:
This is a mess. I would recast it with a dispatch table (untested)...:

my %no;

sub do_keys {
my $ivalue = shift;

^
....but without a typo (yes, I use vim).

my $action = shift;

if ('HASH' eq ref $value) { [...]

Quoth (e-mail address removed) (hymie!):

What I want to know is if there is
(*) an easier way to re-arrange the first hash (visualize taking a column
in a spreadsheet and moving it over, then resorting with the new
column order)?


The obvious, though maybe not the most efficient, way is to unwrap it
into a big list of records and wrap it up again:

....except there's no need to actually build the list.

# old order: customer type product appno vendor
# new order: customer vendor type product appno

my %new_list;

# customer is still first, so we can leave that

for my $cust (keys %list) {

my @records;

We don't need this temp array.

my %me;

do_keys $list{$cust},
sub { $me{type} = $_[0] },
sub { $me{prod} = $_[0] },
sub { $me{appn} = $_[0] },
sub { $me{vend} = $_[0] },
sub { push @records, { %me, code => $_[0] } }

Gah! another typo... clearly not concentrating.
Anyway, this last should have been:

sub {
    $new_list{ $cust }
             { $me{vend} }
             { $me{type} }
             { $me{prod} }
             { $me{appn} } = $_[0];
};

for (@records) {

....and then we don't need to loop over the data a second time.
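
Putting the corrected pieces together, a complete sketch of the single-pass rearrangement might look like this. It assumes the simplified layout in which each vendor entry holds a single return code as a scalar; the sample data is invented, and `do_keys` is restated here only so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical data: customer/type/product/appNo/vendor => return code.
my %list = (
    acme => {
        sale   => { P1 => { A100 => { v1 => 150 }, A101 => { v2 => 385 } } },
        refund => { P2 => { A200 => { v1 => 130 } } },
    },
);

# Same walker idea as in the post: the Nth callback sees each key at
# depth N, and a leftover callback sees the leaf value.
sub do_keys {
    my ($value, $action, @rest) = @_;
    if (ref($value) eq 'HASH') {
        for my $key (keys %$value) {
            $action->($key) if $action;
            do_keys($value->{$key}, @rest);
        }
    }
    elsif ($action) {
        $action->($value);
    }
    return;
}

my %new_list;

# Customer stays the first key, so only the levels below it are walked.
for my $cust (keys %list) {
    my %me;   # keys of the path currently being walked
    do_keys($list{$cust},
        sub { $me{type} = $_[0] },
        sub { $me{prod} = $_[0] },
        sub { $me{appn} = $_[0] },
        sub { $me{vend} = $_[0] },
        # Leaf: store the value under the reordered path.
        sub {
            $new_list{$cust}{ $me{vend} }{ $me{type} }{ $me{prod} }{ $me{appn} }
                = $_[0];
        },
    );
}

print "$new_list{acme}{v1}{sale}{P1}{A100}\n";
```

With the sample data this prints 150. If the innermost level were itself a hash of codes, one more callback would be needed before the leaf handler.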

Ben

hymie! · Jun 14, 2004

In our last episode, the evil Dr. Lacto had captured our hero,

Quoth (e-mail address removed) (hymie!):

That term'll do fine... some people here use HoH, for Hash-of-Hash.

Also, a hash key that doesn't exist will return
a value of undef, which is zero in numeric context (with a warning you
can turn off)...

Excellent. That was one of my fears.

...so this whole lot can be replaced with the one line

$list{$customer}{$type}{$productCode}{$appNo}{$vendor}{$returnCode} = 1
if grep $_ == $returnCode, qw/130 150 385/;

If you need the keys of the last hash to be right, or you specifically
need the zeros, you could do

I don't specifically need the zeros as long as (like you said) an
undefined hash will return 0.

Why is the data stored like this at all? Surely it would be better to
store the return code straight in the hash, rather than have another
level with only one (significant) value?

Because a single set of customer-type-productcode-appno-vendor may have
more than one return code, and I need to track all of them that appear.

But I'll probably switch it to something like
$list{$customer}...{$vendor} .= "$returnCode:"
if grep $_ == $returnCode, qw/130 150 385/;
and then I can m// through the values later.

Also, what happens if the return code *isn't* in the list?

Then I can ignore that entire line of data.

This is not sorted. Did you simply mean 'grouped by', or should it be

Sorting isn't required.

Don't use printf when interpolation will do.

Oops. Oversight.

This should be a hash. Variables with systematically similar names
nearly always should be.

$no{385}++;

It's actually a little more complicated than that, but the hash is still
a good idea.

This is a mess. I would recast it with a dispatch table (untested):

I appreciate the table, but this is probably beyond my ability to
maintain and troubleshoot. But it's a great idea and, if nothing else,
a learning exercise.

The obvious, though maybe not the most efficient, way is to unwrap it
into a big list of records and wrap it up again:

No, but somewhere along the line, you gave me an idea -- in short, when
I process
$list{$cust}{$type}{$prod}{$appno}{$vend}
I can create
$list2{$cust}{$vend}{$type}{$prod}{$appno}
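
A minimal sketch of that idea, filling both hashes while the input is read once. The whitespace-separated sample lines and the `%valid` lookup are assumptions made for illustration; the field order is the one given in the question.

```perl
use strict;
use warnings;

my %valid = map { $_ => 1 } (130, 150, 385);
my (%list, %list2);

# Hypothetical input lines in the field order given in the question:
# customer vendor transType productCode appNumber resultCode
my @lines = (
    'acme v1 sale P1 A100 150',
    'acme v2 sale P1 A101 385',
    'acme v1 sale P1 A100 999',   # unknown code: line is ignored
);

for my $line (@lines) {
    my ($cust, $vend, $type, $prod, $appno, $code) = split ' ', $line;
    next unless $valid{$code};

    # The same record is filed under both key orders.
    $list{$cust}{$type}{$prod}{$appno}{$vend}{$code}  = 1;
    $list2{$cust}{$vend}{$type}{$prod}{$appno}{$code} = 1;
}

print join(' ', sort keys %{ $list2{acme} }), "\n";
```

This prints "v1 v2", the vendors now sitting directly under the customer in the second hash. The cost is holding the data twice in memory, which is usually acceptable for report-sized input.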

Razors pain you / Rivers are damp
Acids stain you / And drugs cause cramp.
Guns aren't lawful / Nooses give
Gas smells awful / You might as well live.

Ooh! I'd been looking for that poem. Thank you.

hymie! http://www.smart.net/~hymowitz (e-mail address removed)
===============================================================================

Ben Morrow · Jun 14, 2004

Quoth (e-mail address removed) (hymie!):

In our last episode, the evil Dr. Lacto had captured our hero,

Because a single set of customer-type-productcode-appno-vendor may have
more than one return code, and I need to track all of them that appear.

But I'll probably switch it to something like
$list{$customer}...{$vendor} .= "$returnCode:"
if grep $_ == $returnCode, qw/130 150 385/;
and then I can m// through the values later.

Oooh, no, that's very shell.
Use an array:

push @{ $list{...} }, $returnCode if ...;
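
Spelled out with the elided parts filled by made-up sample values, the array approach could be sketched as follows. The record layout in `@records` is hypothetical; only the key order and the three codes come from the thread.

```perl
use strict;
use warnings;

my %list;
my @valid = (130, 150, 385);

# Hypothetical records: [customer, type, product, appNo, vendor, returnCode]
my @records = (
    [qw(acme sale P1 A100 v1 150)],
    [qw(acme sale P1 A100 v1 385)],
    [qw(acme sale P1 A100 v1 999)],   # not a tracked code
);

for my $rec (@records) {
    my ($customer, $type, $productCode, $appNo, $vendor, $returnCode) = @$rec;

    # The array is autovivified on the first push.
    push @{ $list{$customer}{$type}{$productCode}{$appNo}{$vendor} }, $returnCode
        if grep { $_ == $returnCode } @valid;
}

my $codes = $list{acme}{sale}{P1}{A100}{v1};
print "codes: @$codes\n";
```

This prints "codes: 150 385". The codes stay separate list elements, so no pattern matching is needed to pull them apart later.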

Ben

Peter Scott · Jun 15, 2004

Quoth (e-mail address removed) (hymie!):

Oooh, no, that's very shell.
Use an array:

push @{ $list{...} }, $returnCode if ...;

The condition is somewhat shell too. Either of these would do:

... if $returnCode =~ /^(130|150|385)\Z/;
... if {map {$_=>1} (130,150,385)}->{$returnCode};

Although it would be more maintainable to put something like

my %Ok_Return_Code = map { $_ => 1 } (130, 150, 385);

in a configuration section and then just do

... if $Ok_Return_Code{$returnCode};
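
As a self-contained sketch of that last form (the list of incoming codes is invented for illustration):

```perl
use strict;
use warnings;

# Configuration section: the return codes worth tracking.
my %Ok_Return_Code = map { $_ => 1 } (130, 150, 385);

# Hypothetical sample of incoming return codes.
my @incoming = (150, 200, 385, 130, 999);

# One hash lookup per code instead of a grep over the list or a regex.
my @kept = grep { $Ok_Return_Code{$_} } @incoming;

print "kept: @kept\n";
```

This prints "kept: 150 385 130". Adding or retiring a code then means editing one line in the configuration section.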
