hash of arrays


removeps groups

My Perl script reads lines of a file, or rows of a database. The first column is the key (the id, an integer), and for each distinct key an entry should be added to the hash. The second column is a string (the name). The third and fourth columns go into the first and second arrays of that key's value, respectively. So if the file is (the 4 columns are tab-separated):

1 First Id 2 3
1 First Id 2 4
2 Second Id 3 4

the hash will contain two items. The first item has key 1. Its value has one string, "First Id", and two arrays: the first array is (2,2) and the second is (3,4). The second item has key 2. Its value has one string, "Second Id", and two arrays, (3) and (4). In Java the structure would be Map<Integer, Triplet<String, List<Integer>, List<Integer>>>.

To model this data structure in perl I had to do stuff like the following, which works on Strawberry Perl and Linux Perl:

my %data; # map of id to [ Name, ArrayOfInt, ArrayOfInt ]
my $value = $data{$id};
my @tmpvalue = ($name, [], []);
$value = \@tmpvalue;
$data{$id} = $value;
push(@{$$value[1]}, $firstInt);

I don't know why the particular combination of $ @ {} \ works, but it does. Questions are:

(1) Is this the most efficient way?
(2) Why does it work?

The full script is below


#!/usr/bin/perl

use strict;
use warnings;

my %data; # map of id to [ Name, ArrayOfInt, ArrayOfInt ]

open(sqlData, "hash-rows.txt") || die $!;

while (<sqlData>)
{
    chomp $_;
    my @row = split '\t', $_;

    my $id = $row[0];
    my $name = $row[1];
    my $firstInt = $row[2];
    my $secondInt = $row[3];

    my $value = $data{$id};
    if (not defined $value)
    {
        my @tmpvalue = ($name, [], []);
        $value = \@tmpvalue;
        $data{$id} = $value;
    }
    push(@{$$value[1]}, $firstInt);
    push(@{$$value[2]}, $secondInt);
}

foreach my $id (keys %data)
{
    my $value = $data{$id};
    my $name = $$value[0];
    my @firstInts = @{$$value[1]};
    my @secondInts = @{$$value[2]};
    print "ENTRY\n id=$id\n name=$name\n firstInts=(@firstInts)\n secondInts=(@secondInts)\n";
}


ENTRY
id=1
name=First
firstInts=(Id Id)
secondInts=(2 2)
ENTRY
id=2
name=Second
firstInts=(Id)
secondInts=(3)
 

Jim Gibson

removeps groups said:
My perl script reads lines of a file, or rows of a database...
To model this data structure in perl I had to do stuff like the following,
which works on Strawberry Perl and Linux Perl:

my %data; # map of id to [ Name, ArrayOfInt, ArrayOfInt ]
my $value = $data{$id};
my @tmpvalue = ($name, [], []);
$value = \@tmpvalue;
$data{$id} = $value;
push(@{$$value[1]}, $firstInt);

I don't know why the particular combination of $ @ {} \ works, but it does.
Questions are:

(1) Is this the most efficient way?

The data structure is efficient, but you have some unnecessary
assignment operations in the code above. For example, you assign to the
variable $value twice, without using the first value.
(2) Why does it work?

Because of Perl's dereferencing and precedence rules. If you want a
more detailed explanation, you need to specify more exactly what "it"
is.

I find the expression @{$$value[1]} somewhat ambiguous and difficult to
parse.

$value is a reference to an array. $value->[0] would be the first
element of that array, as would ${$value}[0].

$$value[1] could be interpreted as either ${$value[1]} or ${$value}[1],
depending upon Perl's precedence rules. The former is dereferencing the
second element of the @value array. The latter is the second element of
the anonymous array referenced by the scalar $value. Since
dereferencing a scalar has highest precedence, Perl will do the latter.
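
As a small illustration of the point above, the following sketch checks that the three spellings refer to the same element. The sample values are taken from the example rows in the question; the variable names other than $value are made up for this illustration.

```perl
use strict;
use warnings;

# $value holds a reference to an array shaped like [ name, [ints], [ints] ].
my $value = [ 'First Id', [ 2, 2 ], [ 3, 4 ] ];

# All three expressions name index 1 of the array that $value refers to.
# The $$value[1] form works because $value is dereferenced first and the
# [1] subscript is applied afterwards.
my $short  = $$value[1];
my $braced = ${$value}[1];
my $arrow  = $value->[1];

# Comparing references numerically compares their addresses.
my $same_ref = ( $short == $braced && $braced == $arrow ) ? 1 : 0;

# Dereference the inner array reference to get at its elements.
my @first_ints = @{ $value->[1] };

print "same=$same_ref first_ints=(@first_ints)\n";
```

With these values the script should print same=1 first_ints=(2 2).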

If you don't want to learn Perl's precedence rules, then just use the
arrow notation and explicit braces ({}). That is what I do.

push( @{ $data{$id}->[1] }, $firstInt );

From the inside out:

1. %data is a hash
2. $id is a key for that hash
3. $data{$id} is the value associated with that key, a reference to an
anonymous array.
4. $data{$id}->[1] is the second element of that array, a reference to
another anonymous array
5. @{$data{$id}->[1]} is that anonymous array
6. push( @{$data{$id}->[1]}, $firstInt ) is pushing the value from
$firstInt onto the end of that array.
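
The steps above can be traced one at a time in a short sketch. The id 1 and the pushed value 2 come from the sample rows in the question; $inner_ref, @before and $count are names invented here for illustration only.

```perl
use strict;
use warnings;

my %data;                                 # 1. %data is a hash
my $id = 1;                               # 2. $id is a key for that hash
$data{$id} = [ 'First Id', [], [] ];      # 3. the value is a reference to an anonymous array
my $inner_ref = $data{$id}->[1];          # 4. its second element is another array reference
my @before    = @{ $data{$id}->[1] };     # 5. dereferencing yields the (still empty) inner array
push( @{ $data{$id}->[1] }, 2 );          # 6. push onto the end of that inner array

# $inner_ref points at the same inner array, so this push lands there too.
push( @{$inner_ref}, 2 );

my $count = scalar @{ $data{$id}->[1] };
print "before=", scalar(@before), " after=$count\n";
```

Note that @before is a copy made in step 5, so it stays empty while the array inside %data grows to two elements.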

The full script is below


#!/usr/bin/perl

use strict;
use warnings;

my %data; # map of id to [ Name, ArrayOfInt, ArrayOfInt ]

open(sqlData, "hash-rows.txt") || die $!;

while (<sqlData>)
{
chomp $_;
my @row = split '\t', $_;

my $id = $row[0];
my $name = $row[1];
my $firstInt = $row[2];
my $secondInt = $row[3];

You can also use:

my( $id, $name, $firstInt, $secondInt ) = split '\t', $_;

or, if no field contains whitespace, just

my( $id, $name, $firstInt, $secondInt ) = split;
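
One caveat with the bare split form is worth a quick sketch: with no arguments, split breaks $_ on any run of whitespace, so a name such as "First Id" is cut in two. The example below assumes a tab-separated line like the ones described in the question; $line, @by_tab and @by_ws are illustrative names.

```perl
use strict;
use warnings;

# One input row, assuming tabs between the four columns.
my $line = "1\tFirst Id\t2\t3";

# Splitting on the tab character keeps "First Id" together.
my @by_tab = split /\t/, $line;

# split ' ' is what a bare "split;" does to $_: it splits on runs of
# whitespace, so the space inside the name also acts as a separator.
my @by_ws = split ' ', $line;

print scalar(@by_tab), " fields by tab, name='$by_tab[1]'\n";
print scalar(@by_ws),  " fields by whitespace, name='$by_ws[1]'\n";
```

The tab split yields 4 fields with the name intact, while the whitespace split yields 5 fields and only 'First' as the name.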
my $value = $data{$id};
if (not defined $value)

It is better to use the exists function for testing hashes for the
presence of a key/value pair.
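
A minimal sketch of the difference between the two tests follows. The hash contents are made up for illustration; in the original script every stored value is an array reference, so both tests give the same answer there, and exists mainly states the intent more directly.

```perl
use strict;
use warnings;

# Key 2 is present but its value is undef; key 3 is absent altogether.
my %data = ( 1 => [ 'First Id', [], [] ], 2 => undef );

my $exists_1  = exists $data{1}  ? 1 : 0;    # key present, value defined
my $exists_2  = exists $data{2}  ? 1 : 0;    # key present, value undef
my $defined_2 = defined $data{2} ? 1 : 0;    # defined() cannot see that the key is there
my $exists_3  = exists $data{3}  ? 1 : 0;    # key absent

# Neither test adds the missing key to the hash.
my $key_count = scalar keys %data;

print "exists_1=$exists_1 exists_2=$exists_2 defined_2=$defined_2 ",
      "exists_3=$exists_3 keys=$key_count\n";
```

This should print exists_1=1 exists_2=1 defined_2=0 exists_3=0 keys=2.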
{
my @tmpvalue = ($name, [], []);
$value = \@tmpvalue;
$data{$id} = $value;

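You can replace the 3 lines above with a single statement (it still
assigns to $value, because the push calls that follow use it):

$value = $data{$id} = [ $name, [], [] ];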
push(@{$$value[1]}, $firstInt);

push( @{$value->[1]}, $firstInt );
push(@{$$value[2]}, $secondInt);

push( @{$value->[2]}, $secondInt );
}

foreach my $id (keys %data)
{
my $value = $data{$id};
my $name = $$value[0];

my $name = $value->[0];

etc.
my @firstInts = @{$$value[1]};
my @secondInts = @{$$value[2]};
print "ENTRY\n id=$id\n name=$name\n firstInts=(@firstInts)\n secondInts=(@secondInts)\n";
}


ENTRY
id=1
name=First
firstInts=(Id Id)
secondInts=(2 2)
ENTRY
id=2
name=Second
firstInts=(Id)
secondInts=(3)
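
Pulling the suggestions in this reply together, a consolidated version might look like the sketch below. It is not the original poster's script: it reads from an in-memory string that stands in for hash-rows.txt so that it runs on its own, it assumes the columns really are separated by tabs, it sorts the ids to make the output order predictable, and the names $input, $fh, $line and @report are introduced here for illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Stand-in for hash-rows.txt (assumption: four tab-separated columns).
my $input = "1\tFirst Id\t2\t3\n1\tFirst Id\t2\t4\n2\tSecond Id\t3\t4\n";

my %data;    # map of id to [ Name, ArrayOfInt, ArrayOfInt ]

# Open the string as a read-only in-memory file.
open( my $fh, '<', \$input ) or die "Cannot open input: $!";

while ( my $line = <$fh> ) {
    chomp $line;
    my ( $id, $name, $first_int, $second_int ) = split /\t/, $line;

    # Create the entry the first time an id is seen.
    $data{$id} = [ $name, [], [] ] unless exists $data{$id};

    push @{ $data{$id}->[1] }, $first_int;
    push @{ $data{$id}->[2] }, $second_int;
}
close $fh;

my @report;
foreach my $id ( sort { $a <=> $b } keys %data ) {
    my $value       = $data{$id};
    my $name        = $value->[0];
    my @first_ints  = @{ $value->[1] };
    my @second_ints = @{ $value->[2] };
    push @report,
        "id=$id name=$name firstInts=(@first_ints) secondInts=(@second_ints)";
}
print "$_\n" for @report;
```

With the three sample rows this should report id 1 with name "First Id", firstInts (2 2) and secondInts (3 4), and id 2 with name "Second Id", firstInts (3) and secondInts (4), which matches the structure described in the question.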
 
