hv_iterinit has side effects - who cares about PL theory

O

Ozgun Erdogan

As a newbie to Perl programming, I've realized that hv_iterinit has
side effects. That is, if you want to walk over a hash structure,
there is no way you can walk over it without changing the internal
values of the hash itself. And hashes in Perl are primitive data
types.

So, if you want to read the values in a hash, there is no way for you
to do it without modifying other values in the hash itself. From my
perspective (and from my perspective only), this is some what
equivalent to:

$x = 4;
Print $x;
(now $x is 10).

I know it sounds exaggerated, and probably people discussed for a
while and thought it'd bite no one to implement it like this, but
here's where this unexpected behavior hurts:

Take CPAN module Data::Structure::Util, which detects circular
references. This is something which I really need, but goes into
infinite recursion when we have:
hash1 has a weak reference weak1 to hash2. hash2 has a hard reference
hard1 pointing back to hash1. The code goes into infinite recursion
because hv_iterinit would set xhv_riter to 0 for hash1 every time. I
think this is not a bug in Data::Structure::Util, it's a bug in Perl!

You should be able to walk over a primitive type without changing its
value, especially in a widely used language as Perl.

Are there any other Perl 5 API functions available that'd let me walk
over a hash without modifying the hash itself?

Ozgun.
 
T

Tassilo v. Parseval

Also sprach Ozgun Erdogan:
As a newbie to Perl programming, I've realized that hv_iterinit has
side effects. That is, if you want to walk over a hash structure,
there is no way you can walk over it without changing the internal
values of the hash itself. And hashes in Perl are primitive data
types.

So, if you want to read the values in a hash, there is no way for you
to do it without modifying other values in the hash itself. From my
perspective (and from my perspective only), this is some what
equivalent to:

$x = 4;
Print $x;
(now $x is 10).

I know it sounds exaggerated, and probably people discussed for a
while and thought it'd bite no one to implement it like this, but
here's where this unexpected behavior hurts:

Take CPAN module Data::Structure::Util, which detects circular
references. This is something which I really need, but goes into
infinite recursion when we have:
hash1 has a weak reference weak1 to hash2. hash2 has a hard reference
hard1 pointing back to hash1. The code goes into infinite recursion
because hv_iterinit would set xhv_riter to 0 for hash1 every time. I
think this is not a bug in Data::Structure::Util, it's a bug in Perl!

I don't understand this. What would you expect the below code to do?

my %hash;
$hash{ key } = \%hash;
walk(\%hash);

sub walk {
my $ref = shift;
while (my ($k, $v) = each %$ref) {
print "$k => ";
if (not ref $v) {
print "$v\n";
}
else {
walk($v);
}
}
}

Clearly, this will end in an infinite recursion. But is this a bug in
perl? One has to keep track of the references to make sure that circular
ones are detected correctly.
You should be able to walk over a primitive type without changing its
value, especially in a widely used language as Perl.

You aren't modifying a hash's values when iterating over it. You are
however modifying the internal state of the iterator. There is no other
way to do it in any language.

So the bug is in the module and not in perl. You should report this to
its author.

Tassilo
 
J

Joe Smith

Ozgun said:
As a newbie to Perl programming, I've realized that hv_iterinit has
side effects.

Correct. Don't mess with the iterator or it will cause bad effects.
there is no way you can walk over it without changing the internal
values of the hash itself.

Of course you can. If you gather up all the keys before looking at
the hash values, there is no problem. If you use the each() function,
you must not start another each() on the same hash while still inside
the first one.

Here's an example of how to shoot yourself in the foot:

while(($key,$val) = each %ENV) {
print "$key = $val\n";
if ($key eq "PATH") {
print "Attempting to screw up the each() function\n";
while(($key2,$val2) = each %ENV) {
$count++;
}
print "count=$count\nDone mucking around with each() inside of each()\n";
}
}

It goes into an infinite loop because an improper call to each(%HASH)
while in the middle of an each(%HASH).
So, if you want to read the values in a hash, there is no way for you
to do it without modifying other values in the hash itself.

I don't believe you. Show us some real code, and I bet that we can
show you how it can be done correctly. Perl programmers read the
values of a hash all the time without modifying other values in the hash.
this is some what equivalent to:

$x = 4;
Print $x;
(now $x is 10).

Yes, that is expected behavior if Print() is explicitly defined to
alter its arguments. Like this:

sub Print(\$) {
my $sref = shift;
print "Subroutine Print() called with $sref => $$sref\n";
$$sref = 10;
}

$x = 4;
Print $x;
print "Afterwards, \$x is $x\n";

Take CPAN module Data::Structure::Util, which detects circular
references, but goes into infinite recursion

That is a problem with one particular module, not a problem with
the Perl language.
-Joe
 
O

Ozgun Erdogan

I don't believe you. Show us some real code, and I bet that we can
show you how it can be done correctly. Perl programmers read the
values of a hash all the time without modifying other values in the hash.

What I meant in here is that, the iterator is part of the hash
structure itself. From what gdb shows me, the hash structure in Perl
has RITER and EITER in it that are used for iterating over the hash.
Yes, that is expected behavior if Print() is explicitly defined to
alter its arguments. Like this:

I see your point, but that's not what I meant. What I tried to say is,
I'm doing a variable read, and it's modifying the values (RITER and
EITER) in the hash data structure. So, when you read $x, the value of
$x (maybe not the part of the structure that stores the value 4, but
some other value in the data structure) changes.

Data::Structure::Util never does a hash write. It only iterates over
the values of the hash.
 
O

Ozgun Erdogan

Clearly, this will end in an infinite recursion. But is this a bug in
perl? One has to keep track of the references to make sure that circular
ones are detected correctly.

I think another example will make it clear why I think Perl's
hv_iterinit()/hv_iternext() functions are not implemented as expected.
Technically, these functions are doing reads. Say, you have two
threads:

T1
----
grab_read_lock(hash1);
hv_iterinit(hash1);
while (not_endof_hash(hash1));
{
hv_iternext(hash1);
print(value(hash1));
}
release_read_lock(hash1);

T2
----
grab_read_lock(hash1);
hv_iterinit(hash1);
while (not_endof_hash(hash1));
{
hv_iternext(hash1);
print(value(hash1));
}
release_read_lock(hash1);

Technically, independent of scheduling, you should see each of hash1's
elements printed twice. This is because hv_iterinit()/hv_iternext()
are supposed to do reads. Honestly, I haven't tried this, but am
pretty sure it'll mess up in Perl. On the other hand, in C, you can
do:

HE *curr_it = get_hv_head(hash1);
... and walk over the hash with your own C functions ...
You aren't modifying a hash's values when iterating over it. You are
however modifying the internal state of the iterator. There is no other
way to do it in any language.

No, the problem is, the value of the iterator is in the hash structure
itself. And yes, you can easily do it in any other language, C++ STL
libraries provide you a data structure (like <vector>) and an iterator
that's independent of the data structure. Try the above example with
STL, and it'll give you what you want (or what I expect).
So the bug is in the module and not in perl. You should report this to
its author.

No, that depends on your perspective. I'm aware that implementing
hashes in a different way would break Perl's lazy deletion idea.
However, implementing it this way breaks one of the fundamental rules
of programming language theory: A read is supposed to do a read.

Ozgun.
 
T

Tassilo v. Parseval

Also sprach Ozgun Erdogan:
I think another example will make it clear why I think Perl's
hv_iterinit()/hv_iternext() functions are not implemented as expected.
Technically, these functions are doing reads. Say, you have two
threads:

T1
----
grab_read_lock(hash1);
hv_iterinit(hash1);
while (not_endof_hash(hash1));
{
hv_iternext(hash1);
print(value(hash1));
}
release_read_lock(hash1);

T2
----
grab_read_lock(hash1);
hv_iterinit(hash1);
while (not_endof_hash(hash1));
{
hv_iternext(hash1);
print(value(hash1));
}
release_read_lock(hash1);

Technically, independent of scheduling, you should see each of hash1's
elements printed twice. This is because hv_iterinit()/hv_iternext()
are supposed to do reads. Honestly, I haven't tried this, but am
pretty sure it'll mess up in Perl.

I haven't tried. However, perl is thread-safe and hashes are used with
threads. So far I haven't yet heard any complains about this particular
aspect.
No, the problem is, the value of the iterator is in the hash structure
itself. And yes, you can easily do it in any other language, C++ STL
libraries provide you a data structure (like <vector>) and an iterator
that's independent of the data structure. Try the above example with
STL, and it'll give you what you want (or what I expect).

That's because you create each iterator object manually. If you do:

void function (map<int,int>::iterator i) {
cout << i->first << i->second;
}

map<int,int> m;
m[1] = 1;
m[2] = 2;
map<int,int>::iterator i = m.begin();
cout << i->first << i->second;
function(++i);

that is equivalent to Perl's:

sub function { print each %{$_[0]} }
%hash = (1=>1, 2=>2);
print each %hash;
function(\%hash);

The latter's even more powerful as values can be added to the hash in
function(). In C++ you'd additionally have to pass the map to the
function.
No, that depends on your perspective. I'm aware that implementing
hashes in a different way would break Perl's lazy deletion idea.
However, implementing it this way breaks one of the fundamental rules
of programming language theory: A read is supposed to do a read.

It still is a read. The fact that you do not have to create iterators
yourself in Perl is a feature, and a good one IMO.

The fact that that's the way Perl hashes behave is well known. The
module of this particular module appears to neglect it if his module
falls into infinite recursion. In order to avoid it he has to save the
iterators value somewhere and restore it later when he wants to continue
where he left off.

Tassilo
 
O

Ozgun Erdogan

that, most thread implementations (POSIX threads for instance) don't
know of a 'write lock'.

POSIX threads just give you the basics. You can easily get read/write
locks by using these primitives. Type "POSIX threads read write locks"
in Google. And this POSIX argument is completely out of the point. C
doesn't have any primitive types that result in a write when you do a
read.
Secure access to shared data happens through
mutexes which are independent of the kind of access (read/write) that
might happen. So the distinction into read and write lock is moot.

Well, yes the distinction between read and write lock is moot. Unless
of course you are writing concurrent programs. Like databases, like
operating systems, etc..

I just gave "two threads" as a concrete example to show that reading
the hash structures was absolutely doing writes. There is a
distinction between a read and a write, and from my perspective (and
from my perspective), a read should do a read. This independent of
your claim for "the distinction into read and write lock is moot."
(besides, there is a very fine distinction there, like 30 years of
computer science history).
 
J

Joe Schaefer

(e-mail address removed) (Ozgun Erdogan) writes:

[...]
I just gave "two threads" as a concrete example to show
that reading the hash structures was absolutely doing writes.

each %foo

is an iterator. Pretending it isn't won't resolve your
complaint. Somewhere somebody decided that it was good
to include an iterator API for perl hashes, PL theory
notwithstanding.
 
C

ctcgag

I think another example will make it clear why I think Perl's
hv_iterinit()/hv_iternext() functions are not implemented as expected.
Technically, these functions are doing reads. Say, you have two
threads:

T1
----
grab_read_lock(hash1);
hv_iterinit(hash1);
while (not_endof_hash(hash1));
{
hv_iternext(hash1);
print(value(hash1));
}
release_read_lock(hash1);

T2
----
grab_read_lock(hash1);
hv_iterinit(hash1);
while (not_endof_hash(hash1));
{
hv_iternext(hash1);
print(value(hash1));
}
release_read_lock(hash1);

Technically, independent of scheduling, you should see each of hash1's
elements printed twice. This is because hv_iterinit()/hv_iternext()
are supposed to do reads.

In my book, things that "are supposed to do reads" generally don't end
in "init".

Xho
 
O

Ozgun Erdogan

In my book, things that "are supposed to do reads" generally don't end
in "init".

Xho

Thanks for backing me up Xho. That's what I was trying to say the
whole thread. If I want to do a hash read from the beginning, why
would I need to call init (change) the hash structure, right? Take
init out of the picture in T1/T2, and you'd get completely
non-deterministic behavior.
 
C

ctcgag

Thanks for backing me up Xho. That's what I was trying to say the
whole thread. If I want to do a hash read from the beginning, why
would I need to call init (change) the hash structure, right? Take
init out of the picture in T1/T2, and you'd get completely
non-deterministic behavior.

I didn't know I was backing you up! It doesn't sound like you want to do
a simple hash read, it sounds like you want to do a hash iteration, which
is different. It seems to me that you are insisting a hash iteration is
(or should be) a simple read, but obviously it is no such thing. (As
evidenced by the fact that you need to call init in your pseudocode).

If you say "I want an iterator separate from the hash itself", well, I
can't disagree with that. You want what you want. Maybe you want to use
some other language. That's also fine. But you seem to be saying
"Everyone else should want that too." Well, we don't.

Xho
 
J

Joe Schaefer

If I want to do a hash read from the beginning, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
*Boggle*

Take init out of the picture in T1/T2, and you'd get completely
non-deterministic behavior.

perl isn't threaded, so the relevance of T1/T2 escapes me.
Have you considered rewriting whatever it is you're doing
around a breadth-first search instead of a depth-first one?
 
O

Ozgun Erdogan

perl isn't threaded, so the relevance of T1/T2 escapes me.
Have you considered rewriting whatever it is you're doing
around a breadth-first search instead of a depth-first one?

That may fix it too. I re-implemented hv_iterinit/hv_iternext, and
that seemed to solve the problem for me.

I brought up the T1/T2 point just to make the distinction between
reads/writes clearer. I can see that the iterator is within the hash
structure to make lazy deletion possible. I can also see that somebody
made the decision based on that.

However, this decision poses some strong limitations on concurrent
programming in Perl, and makes primitive data types in Perl "very out
of the ordinary" - from a PL design perspective (yup, I'm repeating
myself).

As for Xho's post, I've never said "everybody should want this", there
is a fine line between C/Java and Perl, and this is one of those
features that draws the line.
 
T

Tassilo v. Parseval

Also sprach (e-mail address removed):
(e-mail address removed) (Ozgun Erdogan) wrote:

I didn't know I was backing you up! It doesn't sound like you want to do
a simple hash read, it sounds like you want to do a hash iteration, which
is different. It seems to me that you are insisting a hash iteration is
(or should be) a simple read, but obviously it is no such thing. (As
evidenced by the fact that you need to call init in your pseudocode).

If you say "I want an iterator separate from the hash itself", well, I
can't disagree with that. You want what you want. Maybe you want to use
some other language. That's also fine. But you seem to be saying
"Everyone else should want that too." Well, we don't.

Also, there is keys() which is a simple way to iterate over a hash
without requiring any state.

Tassilo
 
J

Joe Schaefer

That may fix it too. I re-implemented hv_iterinit/hv_iternext, and
that seemed to solve the problem for me.

Hmm, I'm wondering whether there's a simpler way...
Looking at a 5.8.x implementation of hv_iterinit:

==================================================
I32
Perl_hv_iterinit(pTHX_ HV *hv)
{
register XPVHV* xhv;
HE *entry;

if (!hv)
Perl_croak(aTHX_ "Bad hash");
xhv = (XPVHV*)SvANY(hv);
entry = xhv->xhv_eiter; /* HvEITER(hv) */
if (entry && HvLAZYDEL(hv)) { /* was deleted earlier? */
HvLAZYDEL_off(hv);
hv_free_ent(hv, entry);
}
xhv->xhv_riter = -1; /* HvRITER(hv) = -1 */
xhv->xhv_eiter = Null(HE*); /* HvEITER(hv) = Null(HE*) */
/* used to be xhv->xhv_fill before 5.004_65 */
return XHvTOTALKEYS(xhv);
}
==================================================

I'm pretty sure that the iterator's entire state is managed by
xhv_riter and xhv_eiter. By copying those to automatic C variables
before doing any recursive calls, and then restoring those values
when control is returned, you should be able to patch
Data::Structure::Util to get it working. You may need to muck
with HvLAZYDEL before restoring, but that seems relatively
staightforward.

I suspect you tried this already, so I'm wondering what
sort of problems you encountered.

I brought up the T1/T2 point just to make the distinction between
reads/writes clearer. I can see that the iterator is within the hash
structure to make lazy deletion possible.

Personally I don't think lazy deletion is much of a big deal.
IMO the iterator belongs in the hash because you can't
implement Perl's C<each %foo> without associating an iterator
to the hash itself. If you don't think it belongs in the
struct, where else would you put it?
 
O

Ozgun Erdogan

when control is returned, you should be able to patch
Data::Structure::Util to get it working. You may need to muck
with HvLAZYDEL before restoring, but that seems relatively
staightforward.

I suspect you tried this already, so I'm wondering what
sort of problems you encountered.

This works perfectly fine. I took a pretty similar approach and got
Data::Structure::Util to work. I think you need to pay special
attention when doing graph walks with loops in them (I don't know,
maybe file system implementations with sym. links, certain networking
problems?).
Personally I don't think lazy deletion is much of a big deal.
IMO the iterator belongs in the hash because you can't
implement Perl's C<each %foo> without associating an iterator
to the hash itself. If you don't think it belongs in the
struct, where else would you put it?

From my perspective, lazy deletion is the biggest argument against
taking the iterator out. First, taking it out means a big change to
Perl's internals. Second, when you're walking over a hash, if you
don't pay attention and delete the element you're walking on, then
you'd get a seg. fault. Not a typical Perl behavior. Even though C++
STL clearly warns against doing this, it happened to me once or twice
- just because I was careless.

One alternative would be to have an independent iterator outside the
hash. Have an iterator struct that contains two elements, RITER and
EITER, and implement three functions: associate_to_hash(),
hv_iterinit_special(), hv_iternext_special(). Again, this all depends
on your perspective and your user base.
 
J

Joe Schaefer

(e-mail address removed) (Ozgun Erdogan) writes:


[...]
One alternative would be to have an independent iterator outside the
hash. Have an iterator struct that contains two elements, RITER and
EITER, and implement three functions: associate_to_hash(),
hv_iterinit_special(), hv_iternext_special(). Again, this all depends
on your perspective and your user base.

Parrot is the right place to look for thread-safe implementations.
Dan Sugalski is the architect and he's well aware of the limitations
of perl5. Perhaps it'd be a good idea for you to get involved with the
perl6.internals list...

Best wishes.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top