Why are arrays and hashes this way?

Xavier Noria

When I introduce references the first thing I mention is that they
allow us to build nested structures. However, the importance of that
feature is a consequence of the fact that structures cannot be nested
themselves.

Does anybody know why structures were designed so that they could just
hold scalars?
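
To make the question concrete, here is a minimal sketch of the behavior I mean (the variable names are only illustrative): an array placed inside a list is flattened, while a reference, being a scalar, takes up a single slot.

```perl
use strict;
use warnings;

my @inner = (2, 3);

# An array placed inside a list is flattened into its elements.
my @flat = (1, @inner, 4);
print scalar(@flat), "\n";      # 4 elements: 1, 2, 3, 4

# A reference is a scalar, so it occupies exactly one slot.
my @nested = (1, \@inner, 4);
print scalar(@nested), "\n";    # 3 elements
print $nested[1][0], "\n";      # 2, reached through the reference
```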

-- fxn
 
Ala Qumsieh

Xavier said:
Does anybody know why structures were designed so that they could just
hold scalars?

AFAIK, that was how Larry Wall originally designed it, and it stayed
this way for backward compatibility. The introduction of references with
Perl5 was specifically targeted at adding the ability to build nested
data structures.

--Ala
 
Uri Guttman

AQ> AFAIK, that was how Larry Wall originally designed it, and it stayed
AQ> this way for backward compatibility. The introduction of reference
AQ> with Perl5 was specifically targeted at adding the ability to build
AQ> nested data structures.

and even without that, it makes very good sense. the problem with
storing a real hash where a scalar is, is how do you store it? the slot
in an SV can hold a single item (a scalar) so what would you put there
to represent a hash? and if any of those hash elements was a hash, all
memory hell breaks out. in c, you can only do multidim arrays of known
element size. with perl you can have each thing at any level be any
thing of any size. so the win is major flexibility at a cost of
understanding and dealing with refs. not a bad tradeoff IMO.

uri
 
Jürgen Exner

Uri said:

and even without that, it makes very good sense. the problem with
storing a real hash where a scalar is, is how do you store it? the
slot in an SV can hold a single item (a scalar) so what would you put
there to represent a hash? and if any of those hash elements was a
hash, all memory hell breaks out. in c, you can only do multidim
arrays of known element size. with perl you can have each thing at
any level be any thing of any size. so the win is major flexibility
at a cost of understanding and dealing with refs. not a bad tradeoff
IMO.

First I was thinking the same but on second thought that is really an
implementation detail of the compiler that can and should be hidden from the
user of the language.
While I agree that sometimes a language designer has to compromise because
of implementation considerations, in general that should not be the guiding
principle for designing a computer language.
Otherwise we would still be stuck with C, a language whose features can
easily be translated one-to-one into assembler code. And which I refuse to
call "higher level" because it is too close to the computer architecture,
thus leaving the programmer with the burden of translating his problem into
too low a detail and thinking in computer architecture terms instead of in
problem area terms.
This may be fine for implementing an operating system, but it is an
unnecessary burden for application programming.

In some areas Perl went beyond C, e.g. wrt. memory management for arrays and
hashes. They are just there, the programmer doesn't have to worry about
pre-allocating memory, enlarging them, or releasing them when not needed any
longer. It is a pity that Larry didn't go all the way and eliminate the
concept of pointers (aka references) altogether. Technically there is no
need for them, the compiler could manage them automatically and hide them
from the user.

The one exception would be how to realize a call-by-reference if there are
no references in the language. However even that can be solved on the
language level by an additional keyword like e.g. "VAR" like in Pascal which
would indicate to the compiler that this parameter is call-by-reference
rather than call-by-value. Again, no need for an explicit notion ("&foobar")
of pointers/references in the technical sense.
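
For comparison, a small sketch of what Perl 5 offers for this today (the sub names are made up for illustration): the elements of @_ are aliases to the caller's scalars, and an explicit reference lets a sub change a whole array in place.

```perl
use strict;
use warnings;

# @_ aliases the caller's arguments, so changing $_[0]
# changes the caller's variable (hypothetical helper name).
sub increment_in_place { $_[0]++ }

# An explicit array reference lets the sub change the caller's array.
sub append_item {
    my ($array_ref, $item) = @_;
    push @$array_ref, $item;
    return;
}

my $count = 1;
increment_in_place($count);
print "$count\n";               # 2

my @list = (1, 2);
append_item(\@list, 3);
print "@list\n";                # 1 2 3
```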

jue
 
Michele Dondi

AQ> AFAIK, that was how Larry Wall originally designed it, and it stayed
AQ> this way for backward compatibility. The introduction of reference
AQ> with Perl5 was specifically targeted at adding the ability to build
AQ> nested data structures.

and even without that, it makes very good sense. the problem with

Indeed!

As a side note, even if I have not had any exposure to Perl4 but in
terms of corrections to old(-fashioned)/obsolete scripts, I'm
fascinated by the way these extremely powerful features were added in
a manner not only backwards-compatible with previous releases of the
language, but even consistent with them, that is with Perl's basic
syntax that we all, presumably, appreciate so much!

storing a real hash where a scalar is, is how do you store it? the slot
in an SV can hold a single item (a scalar) so what would you put there
to represent a hash? and if any of those hash elements was a hash, all
memory hell breaks out. in c, you can only do multidim arrays of known

Well, as far as the UI is concerned the "look and feel" of references
is exactly this, i.e., loosely speaking (about a subset of the meanings
that can be given to refs), of "arrays and hashes that can be stored
into a single scalar variable".

IMHO the only situation when the fake nature of refs as nested
structures becomes evident is with "copy": it's not so bad in the end,
and we all do such things routinely either with our own handmade
solutions or by means of a cloning module, but that's it!
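
To make the point concrete, a small sketch, assuming the core Storable module plays the role of the cloning module (the variable names are arbitrary): plain assignment copies only the top-level references, while dclone builds an independent copy.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my @AoA = ([1, 0], [0, 1]);

# Plain assignment copies only the top-level references.
my @shallow = @AoA;
$shallow[0][1] = 1;
print $AoA[0][1], "\n";         # 1: the inner array is shared

# dclone returns a reference to an independent deep copy.
my @deep = @{ dclone(\@AoA) };
$deep[1][0] = 1;
print $AoA[1][0], "\n";         # 0: the original is untouched
```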

Now that I come to think of this, IMHO it would be fine if an
assignment operator existed (what about ':=') that does an automatic
recursive copy/cloning of its RHS. Or even better, an operator to
return such a clone... what about:

<-

Hmmm, no, it would be hell for a parser to tell from "less than,
minus",

<--

no, same cmt,

<=

no, already taken! (and context wouldn't help much here, I guess),

<==

Hmmm, well, what about this?

my @AoA = ([1,0], [0,1]);
my @new_AoA = <== @AoA;
$new_AoA[0][1]=1; # it's now ([1,1], [0,1]), @AoA unchanged

element size. with perl you can have each thing at any level be any
thing of any size. so the win is major flexibility at a cost of
understanding and dealing with refs. not a bad tradeoff IMO.
^^^^^^^^^^^^^^^^^^^^^^

Definitely, IMO too!

Also, it seems that in Perl6 dealing with references will be made much
more transparent, won't it?


Michele,
whose judgement capabilities may be strongly biased/injured by our
traditional Easter lunch...
 
pkent

When I introduce references the first thing I mention is that they
allow us to build nested structures. However, the importance of that
feature is a consequence of the fact that structures cannot be nested
themselves.

They can, indeed. I haven't yet read the other 4 posts showing up in
this thread but I'll guess that they will point out that:

a) references are scalars
b) the values in hashes and arrays are scalars
therefore, using some Ancient Greek logic :)
c) hashes and arrays may contain other hashes or arrays by holding
references to them

See the perlreftut page for more.

E.g.

my %nest = (
    foo => [
        1, 2, { x => 'y' }, [ 'r' ]
    ],
    bar => {
        baz  => 'qux',
        quux => [
            'l'
        ],
    }
);
print $nest{bar}{quux}[0] . "\n";
print $nest{foo}[2]{x} . "\n";
__END__

P
 
pkent

When I introduce references the first thing I mention is that they
allow us to build nested structures. However, the importance of that
feature is a consequence of the fact that structures cannot be nested
themselves.

Does anybody know why structures were designed so that they could just
hold scalars?

Oh, I see now. I misunderstood what you were getting at. Don't mind me,
nothing to see here, move along now... :)

P
 
Xavier Noria

Uri Guttman said:
and even without that, it makes very good sense. the problem with
storing a real hash where a scalar is, is how do you store it? the slot
in an SV can hold a single item (a scalar) so what would you put there
to represent a hash? and if any of those hash elements was a hash, all
memory hell breaks out. in c, you can only do multidim arrays of known
element size. with perl you can have each thing at any level be any
thing of any size. so the win is major flexibility at a cost of
understanding and dealing with refs. not a bad tradeoff IMO.

In my opinion the reason cannot be only "because the slot is an SV".

Why then are arrays and hashes data types that cannot be stored in
SVs? I guess there was some choice made when those data types were
defined that matters here. My question is why that initial choice was
made like that. Efficiency? No particular reason but historical
accident? Different goals than today for which those types were better
suited?

I think this is important to know. Since arrays and hashes are first-class
citizens, it is kind of surprising to newcomers (like me when I learned
Perl5) that they cannot be nested. The class about structures should
have some comment like "You see we have all these high-level
structures so easily handled at the tips of our fingers, but because
of <<historical reasons>> they cannot be nested. We'll learn how to do
that when we see references."

-- fxn
 
Uri Guttman

XN> In my opinion the reason cannot be only "because the slot is an SV".

XN> I think this is important to know. Being arrays and hashes first-class
XN> citizens it kind of surprises to newcomers (like me when I learned
XN> Perl5) that they cannot be nested. The class about structures should
XN> have some comment like "You see we have all these high-level
XN> structures so easily handled at the tips of our fingers, but because
XN> of <<historical reasons>> they cannot be nested. We'll learn how to do
XN> that when we see references."

then you need to learn some c. there is no easy way to truly nest stuff
without pointers in c. any tree of mixed structures must use
pointers. so now it comes to translating that to perl. how would you
assign a hash to a scalar slot? currently a hash or an array in a scalar
context (and this is mostly true in perl4) returns its size. do you make
a full copy during the assignment? how do you handle looped data? with
full copies and no references/pointers you can't have data loops. how
would you pass things around to subs, again with full copies? what does
it mean to assign an array which has arrays to another array? does it do
a flattening or a deep copy? you have many more questions like this to
answer and most of the answers suck for either efficiency reasons or
behavioral ones. trust me, larry knows what he is doing and by making
trees require refs he chose a good path. it meant very clean
compatibility, it made semantics clean and easy to explain. the only issue
is that it is a little harder for newbies to pick up the concepts of
refs in trees. but as with much of perl, it may be harder the first time
to learn it, but the payoff is massive time savings later for
experienced hackers. perl is meant to save development time, not be a
sop to newbies who want nested structures without having to think about
things and all the ugliness they have.
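
a minimal sketch of the looped data case, with made-up hash names: two hashes point at each other through refs, which full copies could never express.

```perl
use strict;
use warnings;

# two hashes that refer to each other form a data loop
my %parent = (name => 'parent');
my %child  = (name => 'child', parent => \%parent);
$parent{child} = \%child;

# walking around the loop works without any copying
print $parent{child}{parent}{name}, "\n";   # parent

# following the loop leads back to the very same hash
print $parent{child}{parent} == \%parent ? "same\n" : "different\n";   # same
```

such loops are not freed by reference counting on their own; weakening one link with Scalar::Util's weaken is the usual remedy.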

uri
 
Michele Dondi

c) hashes and arrays may contain other hashes or arrays by holding
references to them

Definitely correct IMHO (see e.g. my other post in this thread) once
s/may contain/may (fake very nicely to) contain/;


Michele
 
ctcgag

In my opinion the reason cannot be only "because the slot is an SV".

You are right, they could have changed that if they wanted to. Or just
papered over it in the parser and left it the same behind the scenes.

Why then arrays and hashes are data types that cannot be stored in
SVs? I guess there was some choice made when those data types were
defined that matters here. My question is why that initial choice was
done like that.

How would you assign a hash or an array to a scalar?

$x=@a is already used to mean something different.
($x)=@a is already used to mean something different.

What notation would you use?
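
For reference, a short sketch of what those two assignments already mean, next to the explicit reference notation that exists today (the variable names are arbitrary):

```perl
use strict;
use warnings;

my @a = (10, 20, 30);

my $x   = @a;     # scalar context: number of elements
my ($y) = @a;     # list context: first element
my $r   = \@a;    # explicit reference to the array itself

print "$x\n";         # 3
print "$y\n";         # 10
print "$r->[2]\n";    # 30
```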

When it comes to dereferencing, what notation would you use?
And you have to use some notation, because sometimes I want
a shallow copy and sometimes I want a deep copy, so you have to give
me the power to declare which one I want. I guess they could have
made dereferencing the default behavior (and deny that that is what they
are doing, by decreeing that they weren't references in the first place),
and you would instead need a special notation for a non-dereferencing
access. But regardless of which one is the default, the other one is still
necessary as an option. Or would you forbid the whole concept of multiple
handles into the same piece of data?

Efficiency?

I doubt it. Behind the scenes it would probably be pretty much the same.
(Unless you did forbid the concept of multiple handles into the same
piece of data.)

No particular reason but historical
accident? Different goals than today for which those types were better
suited?

Well, until I see you give a grammar/syntax which allows us to accomplish
everything we can currently accomplish, I'll stick with the notion that
they didn't do it because it is a bad idea.

I think this is important to know. Being arrays and hashes first-class
citizens it kind of surprises to newcomers (like me when I learned
Perl5) that they cannot be nested. The class about structures should
have some comment like "You see we have all these high-level
structures so easily handled at the tips of our fingers, but because
of <<historical reasons>> they cannot be nested. We'll learn how to do
that when we see references."

"Nested structures have a certain irreducible complexity, and you ignore
this complexity at your own peril. We need to thoroughly understand the
data structures themselves before we delve into nesting them. We will
learn how to appropriately deal with this complexity when we learn about
references."

Xho
 
Xavier Noria

Thank you very much for your response Xho!

Nevertheless both your post and Uri's seem to answer "why structures
cannot be nested in Perl 5". That's helpful, but it is not the real
question. The argument more or less goes: "Those semantics in Perl 5
are the most reasonable choice because otherwise how could you achieve
backwards compatibility?".

But since that's a consequence of history, to answer the question "why
structures cannot be nested today" we have in turn to answer why they
didn't nest in previous versions of Perl: why they didn't nest from
the start, given that they are first-class citizens and containers. I
don't mean to read Larry's mind, but maybe somebody around just knows.

-- fxn
 
Malcolm Dew-Jones

Xavier Noria wrote:
: When I introduce references the first thing I mention is that they
: allow us to build nested structures. However, the importance of that
: feature is a consequence of the fact that structures cannot be nested
: themselves.

: Does anybody know why structures were designed so that they could just
: hold scalars?

It's conceptually simple, it's consistent, it allows for all the necessary
functionality of nested data structures, and the references themselves are
a general purpose mechanism that provides a lot more than just nested
structures.

Anything else would be more complicated in virtually every situation other
than doing deep copies of nested structures.
 
Juha Laiho

Xavier Noria said:
When I introduce references the first thing I mention is that they
allow us to build nested structures. However, the importance of that
feature is a consequence of the fact that structures cannot be nested
themselves.

Does anybody know why structures were designed so that they could just
hold scalars?

Making guesses:
- space management for structures becomes easier
- allows for more complex data structures - f.ex. to have structures
A, B and C so that both B and C refer to a single instance of
structure A (so, if you change something in the 'A' referred to in
structure 'B', the same change is seen through 'C')
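
A small sketch of that second guess, with made-up hash names standing in for A, B and C: both B and C hold a reference to the one instance of A, so a change made through B shows up through C.

```perl
use strict;
use warnings;

my %A = (color => 'red');
my %B = (shared => \%A);
my %C = (shared => \%A);

# Change A through B ...
$B{shared}{color} = 'blue';

# ... and the change is visible through C.
print $C{shared}{color}, "\n";   # blue
```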
 
