Indirection in a hash

D

dn.perl

Here is a piece of code whose output is : 'miami21 and miami21' :

#!/usr/bin/perl

use strict ;

my %hash = () ;

push @{$hash{florida}}, 'miami', 'miami01', 'miami02' ;
push @{$hash{florida}}, 'miami', 'miami21', 'miami22' ;

my $name = ${$hash{florida}}[4] . " and " . $hash{florida}[4] ;
print "name = $name \n" ;


How are $hash{florida}[4] and ${$hash{florida}}[4] the same thing?
I would have expected that indirection (I hope that is the correct
word for it) to produce something unintelligible or something
meaningless.

TIA.
 
P

Paul Lalli

How are $hash{florida}[4]  and  ${$hash{florida}}[4]  the same thing?
I would have expected that indirection (I hope that is the correct
word for it) to produce something unintelligible or something
meaningless.

%hash is a hash.

$hash{florida} is a reference to an array.

@{$hash{florida}} is the array that $hash{$forida} references

${$hash{florida}}[4] is the 5th element of the array @{$hash{florida}}

$hash{forida}->[4] is the alternative "arrow syntax" notation for the
5th element of the array @{$hash{florida}}

$hash{florida}[4] is the same syntax, taking advantage of the face
that when an arrow separates two bracket/braces, it can be dropped.


perldoc perlreftut's Use Rule #1 and Use Rule #2 would be very good
things for you to read.

Paul Lalli
 
J

Jens Thoms Toerring

Here is a piece of code whose output is : 'miami21 and miami21' :
#!/usr/bin/perl
use strict ;
my %hash = () ;
push @{$hash{florida}}, 'miami', 'miami01', 'miami02' ;
push @{$hash{florida}}, 'miami', 'miami21', 'miami22' ;
my $name = ${$hash{florida}}[4] . " and " . $hash{florida}[4] ;
print "name = $name \n" ;
How are $hash{florida}[4] and ${$hash{florida}}[4] the same thing?
I would have expected that indirection (I hope that is the correct
word for it) to produce something unintelligible or something
meaningless.

There's a special rule that '$hash{florida}[4]' is understood
to mean '$hash{florida}->[4]' and that's why you get the same
value as from '${hash{florida}}[4]'. The same holds if you have
a hash reference instead of an array reference, e.g. with

$hash{x} = { age => 7, name => 'Marvin };
print $hash{x}{age};

This will give you 7 since it's treated as if you had written
'$hash{x}->[age}' which is the same as ${$hash{x}}{age}'.

Or another example where there's a shortcut

$a = [ [ 1, 2, 3 ], [ 4, 6, 7 ] ];
print $b->[1][1];

will print 5 since it's taken to mean '$b->[1]->[1] or
${$b->[1]}[1]}' or '${$$b[1]}[1]' or '${${$b}[1]}[1]';-)

All this follows the same rule: the arrow operator is
optional between brackets and braces (or between a closing
bracket or brace and a paranthesis for an indirect function
call).

Most often it's probably used for "multidimensional arrays"
(I put that in parantheses since there aren't real multi-
dimensional arrays in Perl but instead you use arrays of
array references to emulate them). So when you have e.g.

@a = ( [ 1, 2, 3 ], [ 4, 5, 6 ] );

without that rule you would have to use one of

$a[1]->[1] or ${$a[1]}[1]

to get at the second element of the second array. But know
you can it write it instead as

$a[1][1]

which will make it look more like a normal multi-dimensional
array for people used to e.g. C.

Regards, Jens
 
C

C.DeRykus

Here is a piece of code whose output is : 'miami21 and miami21' :

#!/usr/bin/perl

use strict ;

my %hash = () ;

push @{$hash{florida}}, 'miami', 'miami01', 'miami02' ;
push @{$hash{florida}}, 'miami', 'miami21', 'miami22' ;

my $name = ${$hash{florida}}[4] . " and " . $hash{florida}[4] ;
print "name = $name \n" ;

How are $hash{florida}[4]  and  ${$hash{florida}}[4]  the same thing?
I would have expected that indirection (I hope that is the correct
word for it) to produce something unintelligible or something
meaningless.

If this had been a named array: you could just say:

push @array, ....
push @array, 'miami', 'miami21', 'miami22'

and later reference one of the array's members:

$array[4]


But with an anonymous array, you just need to replace the
named array with an anonymous ref you create on the fly:

push @{$hash{florida}}, 'miami', 'miami21', 'miami22'

And then later refer to the anonymous array member with:

${$hash{florida}}[4]


For details: perldoc perlref, perlreftut, and/or perldsc .

Data::Dumper can help with a peek under the covers too:

use Data::Dumper; ... etc; print Dumper \%hash;
 
M

Mart van de Wege

"multidimensional arrays" (I put that in parantheses since there
aren't real multi- dimensional arrays in Perl but instead you use
arrays of array references to emulate them).

(ITYM quotes, not parentheses).

What is the difference? Surely this is the most obvious way to implement
multidimensional arrays on a lower level?

And if Perl's "array of references" is functionally equivalent to a
multidimensional array (and it is), then isn't it a multidimensional
array?

Mart
 
U

Uri Guttman

MvdW> (ITYM quotes, not parentheses).

MvdW> What is the difference? Surely this is the most obvious way to implement
MvdW> multidimensional arrays on a lower level?

MvdW> And if Perl's "array of references" is functionally equivalent to a
MvdW> multidimensional array (and it is), then isn't it a multidimensional
MvdW> array?

by the ways classic compiled langs like c does them, then no, perl
doesn't support multidimensional arrays. in those langs you usually
predeclare all the array's dimensions and you can directly access
elements with a fast calculation of the correct offset given the
indices. in perl's version you have to loop over each dimension then
index to get the next level down. it is linear growth with the number of
dimensions so it is slower. the advantage of perl's design is that you
can mix and match arrays and hashes in such a data tree and create it
dynamically without knowing any dimensions in advance. this is worth the
extra cost in speed IMO as real world data is rarely fixed in dimensions
at compile time.

uri
 
J

Jürgen Exner

Mart van de Wege said:
(ITYM quotes, not parentheses).

What is the difference? Surely this is the most obvious way to implement
multidimensional arrays on a lower level?

Actually no because individual elements cannot be accessed directly with
O(1) by pre-calculating the memory location. Instead in Perl the
low-level implementation has to follow the reference chain for each and
every access.
And if Perl's "array of references" is functionally equivalent to a
multidimensional array (and it is), then isn't it a multidimensional
array?

That is debateable. Depending upon which level of abstraction you are
talking about a yes is as justifiable as a no.

jue
 
M

Mart van de Wege

Uri Guttman said:
MvdW> (ITYM quotes, not parentheses).

MvdW> What is the difference? Surely this is the most obvious way to implement
MvdW> multidimensional arrays on a lower level?

MvdW> And if Perl's "array of references" is functionally equivalent to a
MvdW> multidimensional array (and it is), then isn't it a multidimensional
MvdW> array?

by the ways classic compiled langs like c does them, then no, perl
doesn't support multidimensional arrays. in those langs you usually
predeclare all the array's dimensions and you can directly access
elements with a fast calculation of the correct offset given the
indices. in perl's version you have to loop over each dimension then
index to get the next level down.

Surely this is only true if the dimensions are of unknown size. If your data
model is such that you expect to have fixed size arrays, you can still
directly index into it, even with Perl's model.

I agree it is not common, but functionally multidimensional arrays are
possible to implement with Perl's "array of references" model.

Mart
 
M

Mart van de Wege

Jürgen Exner said:
Actually no because individual elements cannot be accessed directly with
O(1) by pre-calculating the memory location.

Erm.

'Most obvious' != 'Most efficient'.

Certainly pre-calculating is faster, and can't be done in Perl because
the size of the dimensions is unknown, but that's a compiler issue. From
the point of view of the programmer it is quite possible to do
multidimensional arrays, even if the syntax is a bit awkward at times.
That is debateable. Depending upon which level of abstraction you are
talking about a yes is as justifiable as a no.
Obviously. And I agree that from the Perl implementer's view it is not
the same.

I am however not a Perl internals hacker. I am a sysadmin who does a bit
of Perl programming to automate tasks and to support business. For me,
if my data model consists of matrix-like structures, an array of
references to arrays looks just like a multidimensional array.

Mart
 
U

Uri Guttman

MvdW> What is the difference? Surely this is the most obvious way to implement
MvdW> multidimensional arrays on a lower level?MvdW> And if Perl's "array of references" is functionally equivalent to a
MvdW> multidimensional array (and it is), then isn't it a multidimensional
MvdW> array?
MvdW> Surely this is only true if the dimensions are of unknown
MvdW> size. If your data model is such that you expect to have fixed
MvdW> size arrays, you can still directly index into it, even with
MvdW> Perl's model.

no you can't. with perl all array accesses require looping over each
dimension. there is no way to directly access a lower level. the code
looks like it does direct access but internally it does a loop over the
indices.

MvdW> I agree it is not common, but functionally multidimensional arrays are
MvdW> possible to implement with Perl's "array of references" model.

that isn't the issue. you can get something which looks and behaves like
a multidim array but it really isn't. this is why i always review code
that repeats deep access code and tell people to factor out the
duplicate code. in fortran and other langs, deep repeated accesses get
the common code optimized out of the loop by common subexpression
removal. perl can't do that due to its dynamic nature so the coder has
to do it. this is where perl's dynamic arrays show that their internal
design is different than fortran and c's.

uri
 
M

Mart van de Wege

Uri Guttman said:
MvdW> What is the difference? Surely this is the most obvious way to implement
MvdW> multidimensional arrays on a lower level?
MvdW> And if Perl's "array of references" is functionally equivalent to a
MvdW> multidimensional array (and it is), then isn't it a multidimensional
MvdW> array?

MvdW> Surely this is only true if the dimensions are of unknown
MvdW> size. If your data model is such that you expect to have fixed
MvdW> size arrays, you can still directly index into it, even with
MvdW> Perl's model.

no you can't. with perl all array accesses require looping over each
dimension. there is no way to directly access a lower level. the code
looks like it does direct access but internally it does a loop over the
indices.

I think we're talking past each other. That the compiler can't directly
access a value does not mean that the language syntax doesn't allow
it. It merely makes this an inefficient operation.

This also means that if your data model heavily depends on two- or
more-dimensional arrays, Perl may not be the right tool for the job,
or your data model needs rethinking.

Mart
 
J

Jürgen Exner

Mart van de Wege said:
Erm.

'Most obvious' != 'Most efficient'.

Certainly pre-calculating is faster, and can't be done in Perl because
the size of the dimensions is unknown, but that's a compiler issue.

True. But when creating a compiler you thrive for efficieny of the
generated code. Therefore the non-efficient way is usually not the
obvious choice for a compiler architect.
I am however not a Perl internals hacker. I am a sysadmin who does a bit
of Perl programming to automate tasks and to support business. For me,
if my data model consists of matrix-like structures, an array of
references to arrays looks just like a multidimensional array.

Then the design goal for Perl arrays works very well for you.

However there are situations where the wrong mental model can bite you.
One example:
@fourthRow = @multiArray[4];
In a true multi-dimensional model @multiArray[4] would yield the whole
fourth (actually fifth but let's not go there) row of the
multi-dimensional array.
In Perl however you get a result array with a single element: the
reference to the sub-array.

Same vice-versa:
@multiArray[4] = @fourthRow;
does not assign the fourth (fifth) row of multiArray but puts the length
of @fourthRow into the fourth element of multiArray.

To me that is a very relevant difference and actually questions like
this pop up here every now and then when newbies to Perl who have never
dealt with Perl references before are lost as to what is happening.

jue
 
U

Uri Guttman

MvdW> What is the difference? Surely this is the most obvious way to implement
MvdW> multidimensional arrays on a lower level?MvdW> And if Perl's "array of references" is functionally equivalent to a
MvdW> multidimensional array (and it is), then isn't it a multidimensional
MvdW> array?MvdW> Surely this is only true if the dimensions are of unknown
MvdW> size. If your data model is such that you expect to have fixed
MvdW> size arrays, you can still directly index into it, even with
MvdW> Perl's model.
MvdW> I think we're talking past each other. That the compiler can't directly
MvdW> access a value does not mean that the language syntax doesn't allow
MvdW> it. It merely makes this an inefficient operation.

no, i get both sides very well. i worked in a compiler company way back
and know perl well too. i was just emphasizing the difference between
the abstraction of multidim arrays which perl supports and the
implementation which is slower in perl. it does matter sometimes and you
can't just say perl supports multidim arrays without that context.

MvdW> This also means that if your data model heavily depends on two- or
MvdW> more-dimensional arrays, Perl may not be the right tool for the job,
MvdW> or your data model needs rethinking.

not really. speed is only an issue if it becomes an issue. i have dealt
with largish data trees (my term for mixed hash/array structures)
without any issues. the flexibility overrides any speed issues i may
have seen. but i still do my own optimization by factoring out common
deep accesses. i see this way too often:

$x = $foo{bar}[1]{baz}[0] ;
$y = $foo{bar}[1]{baz}[1] ;

and so on. the higher level accesses would be factored out by a good
optimizing compiler in fortran but not in perl. and it is also harder to
read and edit. and when you remove the common stuff, you end up with
access to a simpler shallower tree which is easy to code and
handle. just one of the things i try to teach and the reasons are both
speed and code clarity. just because you can write code like the above
and deal directly with multidim structures doesn't mean you should do it
that way. the data can still be in that form but you code it to make
accessing it better and faster.

uri
 
M

Mart van de Wege

Uri Guttman said:
MvdW> I think we're talking past each other. That the compiler can't directly
MvdW> access a value does not mean that the language syntax doesn't allow
MvdW> it. It merely makes this an inefficient operation.

no, i get both sides very well. i worked in a compiler company way back
and know perl well too.

I have written code in low-level languages like Forth and 6502 assembly,
where I had to implement common data structures myself, so I know a bit
about this subject too. You obviously know a lot more, but I agree in
principle with your points.
i was just emphasizing the difference between the abstraction of
multidim arrays which perl supports and the implementation which is
slower in perl. it does matter sometimes and you can't just say perl
supports multidim arrays without that context.
I tend to take that context for granted, as I am fairly comfortable in
Perl. But you are right, it is not exactly 'walks like a duck, quacks
like a duck'. It is damn close though.

Mart
 
P

Peter J. Holzer

Actually no because individual elements cannot be accessed directly with
O(1) by pre-calculating the memory location. Instead in Perl the
low-level implementation has to follow the reference chain for each and
every access.

Unless you are talking about constant indexes, the complexity is the
same: O(n) for n dimensions in both cases.

But the operations are different: With a multidimensional array, it's a
multiplication and an addition for each dimension; with a nested array
of references (that's called an Iliffe vector, btw) it's a memory
reference and an addition for each dimension. So which one is faster
depends on the relative speed of a memory access and a multiplication.

On modern CPUs, a multiplication is a lot faster than a memory access,
so multidimensional arrays are faster. However, 20+ years ago that wasn't
so clear: In one of our introductory programming courses we had to
implement some algorithm with both multidimensional arrays and Iliffe
vectors and compare speed and memory consumption. The example was to
show that it's a tradeoff (the rows were of different lengths, so the
Iliffe vector saved space), but we didn't implement in on a VAX (as the
assistand had) but on an 8086, which had slow multiplication (something
like 100-150 cycles, IIRC) and fast memory accesses: So the Iliffe
implementation was both smaller and faster ;-).

That is debateable. Depending upon which level of abstraction you are
talking about a yes is as justifiable as a no.

Consider this example:

@a = ([1, 2], [3, 4]);
@b = $a[1];

If @a was a two-dimensional array (or an array of arrays), then I would
expect @b to contain (3, 4) after this snippet. But it doesn't. Instead
@b contains only one element which is a reference to an array. I have to
explicitely derefence $a[1]:

@b = @{ $a[1] };

For me that is sufficiently different from the semantics of a
multidimensional array that I cannot view Perl Iliffe arrays of
references as multidimensional arrays.

Note that this is a difference at the language level - it has nothing to
do with the implementation.

hp
 
P

Peter J. Holzer

MvdW> Surely this is only true if the dimensions are of unknown
MvdW> size. If your data model is such that you expect to have fixed
MvdW> size arrays, you can still directly index into it, even with
MvdW> Perl's model.

no you can't. with perl all array accesses require looping over each
dimension.

I think "looping over each dimension" may be ambiguous. To clarify,
consider a 3-dimensional array with n_x * n_y * n_z elements.

With a multidimensional array, to access a[x][y][z], a C compiler would
generate code like this (pseudo assembly):

r1 <- x * n_y
r1 <- r1 + y
r1 <- r1 * n_z
r1 <- r1 + z
r1 <- a[r1]

With an Iliffe vector, the code would look like this:

r1 <- a[x]
r1 <- r1[y]
r1 <- r1[z]

hp
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,535
Members
45,008
Latest member
obedient dusk

Latest Threads

Top