Deep structure access and temp variable

Adrien BARREAU

Hi all.


A simple question, but I have no clue about it:

# Version 1
my $A = $stuff->{a}{b}{c}{d}{e};
my $B = $stuff->{a}{b}{c}{d}{f};
my $C = $stuff->{a}{b}{c}{d}{g};

# Version 2

my $tmp = $stuff->{a}{b}{c}{d};
my $A = $tmp->{e};
my $B = $tmp->{f};
my $C = $tmp->{g};


What is the most efficient?
What appears to be the most idiomatic?


I don't have the knowledge about Perl guts to properly guess here.
Of course, the structure does not have to contain only hash refs.


Adrien.
 
Rainer Weikusat

Adrien BARREAU said:
A simple question, but I have no clue about it:

# Version 1
my $A = $stuff->{a}{b}{c}{d}{e};
my $B = $stuff->{a}{b}{c}{d}{f};
my $C = $stuff->{a}{b}{c}{d}{g};

# Version 2

my $tmp = $stuff->{a}{b}{c}{d};
my $A = $tmp->{e};
my $B = $tmp->{f};
my $C = $tmp->{g};


What is the most efficient?

This depends on the relative cost of the additional assignment versus
that of the dereferencing chain. I usually follow the rule of thumb
that using an intermediate variable only makes sense if it is going to
be used for at least three times.

But there's a more important concern: readability of the code and the
ease with which it can be modified. Both suffer in case of 'obnoxious
repetition', as in your first example. Also, the intermediate variable
could have a more sensible name than $tmp, one that communicates its
nature better than $stuff->{a}{b}{c}{d}.

What appears to be the most idiomatic?

If you're asking about "what everyone else does", you can rest assured
that no problem which can be solved with copy'n'paste will be solved
with anything but copy'n'paste, because this minimizes the time of
each individual 'trivial work unit' (also, people greatly prefer
moving over thinking).
 
Jürgen Exner

Adrien BARREAU said:
A simple question, but I have no clue about it:

# Version 1
my $A = $stuff->{a}{b}{c}{d}{e};
my $B = $stuff->{a}{b}{c}{d}{f};
my $C = $stuff->{a}{b}{c}{d}{g};

# Version 2

my $tmp = $stuff->{a}{b}{c}{d};
my $A = $tmp->{e};
my $B = $tmp->{f};
my $C = $tmp->{g};

What is the most efficient?

Most efficient in regard to what? No, I'm not joking; I really mean
it. I'm guessing you are referring to runtime, but that is by no means
the rule-it-all. Actually, if your program is so slow that you have to
optimize on this micro-level, then you should look for a better
algorithm or a different programming language.

Having said that, I would guess the runtime depends upon how large $tmp
is. In version 2 you are creating a copy of the data. This can be 'slow'
for large data. If you want a trustworthy answer you will have to
benchmark both versions using your(!) live data.

But that really doesn't matter that much because ....

What appears to be the most idiomatic?

.... much more important than runtime micro-optimization is
maintainability of your code. And assuming that you can find a more
meaningful name than $tmp, then the second version wins hands-down in
that department.
And that kind of efficiency wins in the long run.

jue
 
Adrien BARREAU

Quoth Adrien BARREAU said:
A simple question, but I have no clue about it:
[cut]

What is the most efficient?

Who cares? (Seriously, unless you've determined that you have a
performance problem and that this bit of code is responsible, it's not
worth compromising readability.)

Well, I do. I would not ask if I did not, would I?
I'm working on a piece of code which must run as fast as possible (within
Perl's limits), hence the question.
And a bit of curiosity, too.

The second. One of the more important rules of programming is DRY:
'Don't Repeat Yourself'. The first repeats a whole lot of stuff
unnecessarily.

I agree.
That's what I do for now. I asked just in case.


Thanks :).


Adrien.
 
Adrien BARREAU

This depends on the relative cost of the additional assignment versus
that of the dereferencing chain. I usually follow the rule of thumb
that using an intermediate variable only makes sense if it is going to
be used for at least three times.

But there's a more important concern: readability of the code and the
ease with which it can be modified. Both suffer in case of 'obnoxious
repetition', as in your first example. Also, the intermediate variable
could have a more sensible name than $tmp, one that communicates its
nature better than $stuff->{a}{b}{c}{d}.


If you're asking about "what everyone else does", you can rest assured
that no problem which can be solved with copy'n'paste will be solved
with anything but copy'n'paste, because this minimizes the time of
each individual 'trivial work unit' (also, people greatly prefer
moving over thinking).


ACK.

Thanks :).

Adrien.
 
Adrien BARREAU

[cut]
Most efficient in regard to what? No, I'm not joking; I really mean
it. I'm guessing you are referring to runtime, but that is by no means
the rule-it-all. Actually, if your program is so slow that you have to
optimize on this micro-level, then you should look for a better
algorithm or a different programming language.

Having said that, I would guess the runtime depends upon how large $tmp
is. In version 2 you are creating a copy of the data. This can be 'slow'
for large data. If you want a trustworthy answer you will have to
benchmark both versions using your(!) live data.

But that really doesn't matter that much because ....


... much more important than runtime micro-optimization is
maintainability of your code. And assuming that you can find a more
meaningful name than $tmp, then the second version wins hands-down in
that department.
And that kind of efficiency wins in the long run.

ACK.

Thanks :).

Adrien.
 
Jim Gibson

Adrien BARREAU said:
A simple question, but I have no clue about it:

# Version 1
my $A = $stuff->{a}{b}{c}{d}{e};
my $B = $stuff->{a}{b}{c}{d}{f};
my $C = $stuff->{a}{b}{c}{d}{g};

# Version 2

my $tmp = $stuff->{a}{b}{c}{d};
my $A = $tmp->{e};
my $B = $tmp->{f};
my $C = $tmp->{g};

What is the most efficient?
What appears to be the most idiomatic?

I don't have the knowledge about Perl guts to properly guess here.
Of course, the structure does not have to contain only hash refs.

There is no need to guess. Perl has the Benchmark module, which makes
this kind of comparison relatively easy:

#!/usr/bin/perl
use warnings;
use strict;
use Benchmark qw(cmpthese);

my $stuff = { a => { b => { c => { d => { e => 1, f => 2, g => 3 }}}}};

# Run each sub 1_000_000 times and print a comparison table.
cmpthese( 1_000_000, {
    'Deep' => sub {
        # Full dereference chain on every access.
        my $x = $stuff->{a}{b}{c}{d}{e};
        my $y = $stuff->{a}{b}{c}{d}{f};
        my $z = $stuff->{a}{b}{c}{d}{g};
    },
    'Temp' => sub {
        # Shared prefix resolved once, then reused.
        my $tmp = $stuff->{a}{b}{c}{d};
        my $x = $tmp->{e};
        my $y = $tmp->{f};
        my $z = $tmp->{g};
    }});
__END__

Results from five runs:

41% perl barreau.pl
          Rate Deep Temp
Deep  591716/s   -- -63%
Temp 1612903/s 173%   --
42% perl barreau.pl
         Rate Deep Temp
Deep 621118/s   -- -32%
Temp 917431/s  48%   --
43% perl barreau.pl
          Rate Deep Temp
Deep  854701/s   -- -20%
Temp 1063830/s  24%   --
44% perl barreau.pl
         Rate Deep Temp
Deep 740741/s   -- -24%
Temp 970874/s  31%   --
45% perl barreau.pl
         Rate Deep Temp
Deep 793651/s   -- -17%
Temp 961538/s  21%   --

That is a lot of variation from run to run, but the Temp approach is
clearly faster, which I would expect. It requires seven hash lookups
instead of fifteen.
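
The run-to-run variation might be reduced by letting Benchmark run each sub for a fixed amount of CPU time rather than a fixed iteration count. A minimal sketch, assuming the same toy structure as above (which is not representative of anyone's real data): a negative first argument to cmpthese means "run each sub for at least that many CPU seconds". The variable name $rows is only for illustration, and the comments restate the lookup arithmetic.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Toy data with the same shape as in the post; real data may behave differently.
my $stuff = { a => { b => { c => { d => { e => 1, f => 2, g => 3 } } } } };

# A negative count asks Benchmark to run each sub for at least that many
# CPU seconds. cmpthese prints the table and returns its rows as an array ref.
my $rows = cmpthese( -1, {
    Deep => sub {    # 3 statements x 5 lookups each = 15 hash lookups
        my $x = $stuff->{a}{b}{c}{d}{e};
        my $y = $stuff->{a}{b}{c}{d}{f};
        my $z = $stuff->{a}{b}{c}{d}{g};
    },
    Temp => sub {    # 4 lookups to reach d, then 3 more = 7 hash lookups
        my $tmp = $stuff->{a}{b}{c}{d};
        my $x = $tmp->{e};
        my $y = $tmp->{f};
        my $z = $tmp->{g};
    },
} );
```

The numbers will still differ between machines and Perl versions, so only the relative ordering is worth reading from the table.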

As far as "idiomatic", I wouldn't worry about it. One person's
idiomatic is another person's idiotic.
 
Jürgen Exner

Jim Gibson said:
There is no need to guess. Perl has the Benchmark module, which makes
this kind of comparison relatively easy:
[...]

A very good example for "Wer mißt mißt Mist" (literal: "who measures,
measures manure").
That is a lot of variation from run to run, but the Temp approach is
clearly faster, which I would expect. It requires seven hash lookups
instead of fifteen.

Completely irrelevant, because your sample code is not using the OP's
actual live data or at the very least a representative sample of his
live data.
And depending on the size of $tmp this will make a big difference.

jue
 
Peter J. Holzer

Most efficient in regard to what? No, I'm not joking but I really mean
it. I'm guessing you are referring to runtime but that is by no means
the rule-it-all. Actually, if your program is so slow that you have to
optimize on this micro-level, then you should look for a better
algorithm or a different programming language.

Agreed (mostly).

Having said that I would guess the runtime depends upon how large $tmp
is. In version 2 you are creating a copy of the data.

$tmp is a reference, so it's always the same size.
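
A small sketch that makes this visible, assuming the same toy structure used elsewhere in the thread. It uses refaddr from the core Scalar::Util module to compare addresses; the names $same, $seen and %copy are only for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $stuff = { a => { b => { c => { d => { e => 1, f => 2, g => 3 } } } } };

# This copies only the reference (one scalar), not the hash it points to.
my $tmp = $stuff->{a}{b}{c}{d};

# Both expressions refer to the same underlying hash.
my $same = refaddr($tmp) == refaddr( $stuff->{a}{b}{c}{d} );

# A write through $tmp is therefore visible through the long path.
$tmp->{e} = 42;
my $seen = $stuff->{a}{b}{c}{d}{e};

# An actual (shallow) copy only happens when dereferencing explicitly.
my %copy = %$tmp;
$copy{e} = 0;    # does not touch the original hash

print "same hash: ", ( $same ? "yes" : "no" ), "\n";
print "e via the long path: $seen\n";
print "e after changing the copy: $tmp->{e}\n";
```

Only the `%copy = %$tmp` line has a cost that grows with the size of the hash; the assignment to $tmp does not.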

hp
 
Ivan Shmakov

[...]
# Version 2
my $tmp = $stuff->{a}{b}{c}{d};
my $A = $tmp->{e};
my $B = $tmp->{f};
my $C = $tmp->{g};
What is the most efficient? What appears to be the most idiomatic?

FWIW (and now that the experts have said their word), my
personal preference would be to write it as follows:

my $tmp = $stuff->{a}{b}{c}{d};
my ($A, $B, $C) = @$tmp{qw (e f g)};

[...]
 
Jürgen Exner

Ben Morrow said:
Quoth Jürgen Exner said:
Adrien BARREAU said:
my $tmp = $stuff->{a}{b}{c}{d};
my $A = $tmp->{e};
my $B = $tmp->{f};
my $C = $tmp->{g};

What is the most efficient?
[...]

Having said that I would guess the runtime depends upon how large $tmp
is.

$tmp is a reference, so it's as small as a Perl value can be (SV
overhead plus one 4- or 8-byte pointer).

Ahmm, well, aehhh, it goes against all my intuition, but you are right.
A reference by any other name is still a humble reference, even if the
referenced item is a gigantic data set.

I stand corrected.

jue
 
Tim McDaniel

FWIW (and now that the experts have said their word), my
personal preference would be to write it as follows:

my $tmp = $stuff->{a}{b}{c}{d};
my ($A, $B, $C) = @$tmp{qw (e f g)};

I had to write a test program and check "man perlref" to make sure it
parses OK. (I had been expecting $tmp{...} to be interpreted as a
hashref before @ was applied.) I would have written, and still
prefer, what I think is clearer:

my ($A, $B, $C) = @{$tmp}{qw (e f g)};

Pity that it does not permit $tmp->{qw (e f g)}, but man perlref says
that -> is for single-element access only.
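
For comparison, a sketch of the slice spellings side by side, assuming a perl of 5.24 or later. Releases newer than this discussion added a postfix slice syntax that comes close to the wished-for arrow form. The variable names are illustrative only.

```perl
use v5.24;    # postfix dereference is a stable feature from 5.24 on
use warnings;

my $stuff = { a => { b => { c => { d => { e => 1, f => 2, g => 3 } } } } };
my $node  = $stuff->{a}{b}{c}{d};

my @plain   = @$node{qw(e f g)};      # bare form: only for a simple scalar
my @braced  = @{$node}{qw(e f g)};    # block form
my @postfix = $node->@{qw(e f g)};    # postfix slice

# The block form also accepts an arbitrary expression, so the
# intermediate variable is optional.
my @direct = @{ $stuff->{a}{b}{c}{d} }{qw(e f g)};

say "@plain | @braced | @postfix | @direct";
```

All four lines produce the same list; which one reads best is a matter of taste, as the rest of the thread shows.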
 
Charles DeRykus

I had to write a test program and check "man perlref" to make sure it
parses OK. (I had been expecting $tmp{...} to be interpreted as a
hashref before @ was applied.) I would have written, and still
prefer, what I think is clearer:

my ($A, $B, $C) = @{$tmp}{qw (e f g)};

I suspect most would agree. And {} would be required anyway if the hash
ref is anything but a simple scalar.

A nice reference cheat sheet by Tye McQueen:
http://www.perlmonks.org/?node=References quick reference
Pity that it does not permit $tmp->{qw (e f g)}, but man perlref says
that -> is for single-element access only.

Lots agree including the cheat sheet author.
 
Dr.Ruud

my ($A, $B, $C) = @$tmp{qw (e f g)};

[..] I would have written, and still
prefer, what I think is clearer:

my ($A, $B, $C) = @{$tmp}{qw (e f g)};

Many styles possible. I like to write it as:

@{ $tmp }{qw/ e  f  g /};

One space before the first item and after the last, 2 spaces in between.

With qw, I often use the slash as separator, but () as well:

use POSIX qw(
    nice
    strftime
);

(and the imports in alphabetical order please :)
 
Rainer Weikusat

Charles DeRykus said:
I suspect most would agree. And {} would be required anyway if the
hash ref is anything but a simple scalar.

A nice reference cheat sheet by Tye McQueen:
http://www.perlmonks.org/?node=References quick reference

This is incomplete. It doesn't mention code references,

perl -e '$s = sub { print("What about me?\n") }; $s->();'

doesn't mention an important special-case about glob references/ <>,

perl -e 'my $fh; open($fh, "<", "/etc/services"); print <$fh>, "\n";'

versus

perl -e 'my $fh; open($fh, "<", "/etc/services"); print <{$fh}>, "\n";'

and the

If the reference is in a hash or an array (and you are getting
back a scalar), then you can drop the -> between the adjacent
[0] and/or {KEY} parts:

is inaccurate: Everything stored in a 'Perl container object' is a
scalar.
 
Tim McDaniel

doesn't mention an important special-case about glob references/ <>,

perl -e 'my $fh; open($fh, "<", "/etc/services"); print <$fh>, "\n";'

versus

perl -e 'my $fh; open($fh, "<", "/etc/services"); print <{$fh}>, "\n";'

Or print <${fh}>

It would have been a kindness to explain the problem for those of us
who were unfamiliar with it. It took me a little while to find the
explanation.

"man perlop", towards the end of the I/O Operators section (in the
Perl version I checked):

If what the angle brackets contain is a simple scalar variable
(e.g., <$foo>), then that variable contains the name of the
filehandle to input from, or its typeglob, or a reference to
the same. For example:

$fh = \*STDIN;
$line = <$fh>;

If what's within the angle brackets is neither a filehandle nor
a simple scalar variable containing a filehandle name,
typeglob, or typeglob reference, it is interpreted as a
filename pattern to be globbed, and either a list of filenames
or the next filename in the list is returned, depending on
context. This distinction is determined on syntactic grounds
alone. That means "<$x>" is always a readline() from an
indirect handle, but "<$hash{key}>" is always a glob(). That's
because $x is a simple scalar variable, but $hash{key} is
not--it's a hash element. Even "<$x >" (note the extra space)
is treated as "glob("$x ")", not "readline($x)".

One level of double-quote interpretation is done first, but you
can't say "<$foo>" because that's an indirect filehandle as
explained in the previous paragraph. (In older versions of
Perl, programmers would insert curly brackets to force
interpretation as a filename glob: "<${foo}>". These days,
it's considered cleaner to call the internal function directly
as "glob($foo)", which is probably the right way to have done
it in the first place.)

So on this system,
perl -e 'my $fh; open($fh, "<", "/etc/services"); print <$fh>, "\n";'
outputs the contents of the file /etc/services, but
perl -e 'my $fh; open($fh, "<", "/etc/services"); print <{$fh}>, "\n";'
outputs one line,
GLOB(0xbb5061d0)

I didn't know that. Thank you.
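
A self-contained sketch of the same distinction, using a temporary file instead of /etc/services so that it does not depend on the system. The hash %handles and the variable names are hypothetical. Following the perlop text quoted above, the handle stored in a hash element is read with an explicit readline(), because the angle-bracket form would be parsed as a glob.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file to read from.
my ( $out, $path ) = tempfile( UNLINK => 1 );
print {$out} "first\nsecond\n";
close $out or die "Cannot close $path: $!";

open( my $fh, '<', $path ) or die "Cannot open $path: $!";
my %handles = ( in => $fh );

my $line1 = <$fh>;                       # simple scalar: a readline
my $line2 = readline( $handles{in} );    # hash element: spell out readline

# Not a simple scalar, so this is parsed as glob("$handles{in}"), which
# typically yields the stringified reference, as in the output quoted above.
my @globbed = <$handles{in}>;

print "line 1: $line1";
print "line 2: $line2";
print "glob result: @globbed\n";
close $fh;
```

Calling readline() and glob() by name everywhere avoids having to remember which form the parser picks.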
 
Charles DeRykus

This is incomplete. It doesn't mention code references,

perl -e '$s = sub { print("What about me?\n") }; $s->();'

doesn't mention an important special-case about glob references/ <>,

perl -e 'my $fh; open($fh, "<", "/etc/services"); print <$fh>, "\n";'

versus

perl -e 'my $fh; open($fh, "<", "/etc/services"); print <{$fh}>, "\n";'


The author reminds a commenter that the scope was 4 simple rules to
dereference "data structure references". He also clarifies:

I also didn't talk about creating references, symbolic references,
blessed references, closures, ref, UNIVERSAL::isa(), references to
globs, to the IO chunks of globs, to compiled regular expressions, nor
to any other types of things that Perl lets you have a reference to.

and the

If the reference is in a hash or an array (and you are getting
back a scalar), then you can drop the -> between the adjacent
[0] and/or {KEY} parts:

is inaccurate: Everything stored in a 'Perl container object' is a
scalar.

Yeah, I don't get the "getting back a scalar" part.

An example of the shortcut:

perl -e '@a=(["a","b"]);$r=\@a;print $r->[0][1]' ## vs. $r->[0]->[1]


But, it'd work in other cases too:

perl -le '@a=(["a",{b=>"c",d=>"e"}]);$r=\@a;
print $r->[0][1]{d}'
 
Rainer Weikusat

Charles DeRykus said:
The author reminds a commenter that

.... all perceived shortcomings in his text are really intentional
omissions, at least in hindsight, lest he would have to admit that he
made some mistakes.
 
Rainer Weikusat

Rainer Weikusat said:
... all perceived shortcomings in his text are really intentional
omissions, at least in hindsight, lest he would have to admit that he
made some mistakes.

In case this seems overly cryptic: Perl doesn't have 'data structures';
it has various kinds of objects and references to objects. This means
that unless the author provides a definition of 'data structures' in the
context of Perl, a reader has to guess what was meant by that, and
guessing that it was supposed to mean 'objects' isn't very
far-fetched.

Leaving this aside, this 'cheat sheet' is a partially erroneous
paraphrase of the 'Using references' section of the perlref manpage:

1. Anywhere you'd put an identifier (or chain of identifiers)
as part of a variable or subroutine name, you can replace
the identifier with a simple scalar variable containing a
reference of the correct type:

[...]


2. Anywhere you'd put an identifier (or chain of identifiers)
as part of a variable or subroutine name, you can replace
the identifier with a BLOCK returning a reference of the
correct type.

[...]

3. Subroutine calls and lookups of individual array elements
arise often enough that it gets cumbersome to use
method 2. As a form of syntactic sugar, the examples
for method 2 may be written:

$arrayref->[0] = "January"; # Array element

[...]

The arrow is optional between brackets subscripts

This is not only complete and correct but also (IMHO) no more
difficult to understand than the other text.
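
The three methods can be sketched side by side as follows. This is an illustration under assumed sample data (the names @months, $nested and $code are made up here), not a quotation from perlref.

```perl
use strict;
use warnings;

my @months   = ( "January", "February" );
my $arrayref = \@months;
my $nested   = [ [ "a", { b => "c", d => "e" } ] ];
my $code     = sub { return "called with @_" };

# Method 1: a simple scalar variable in place of the identifier.
my $m1 = $$arrayref[0];
my $c1 = &$code("x");

# Method 2: a BLOCK returning the reference in place of the identifier.
my $m2 = ${$arrayref}[0];
my $c2 = &{$code}("x");

# Method 3: the arrow as syntactic sugar for method 2.
my $m3 = $arrayref->[0];
my $c3 = $code->("x");

# The arrow is optional between adjacent subscripts, whatever their type.
my $long  = $nested->[0]->[1]->{d};
my $short = $nested->[0][1]{d};

print "$m1 $m2 $m3\n";
print "$c1 / $c2 / $c3\n";
print "$long $short\n";
```

Code references follow the same three rules as array and hash references, which is why they are included here.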
 
