Unable to debug Perl script

Artemis Fowl

Hello all,

I face a peculiar problem when I try to debug my Perl script. The
script fetches values from an Excel sheet and prints them into
different files. I needed to debug this script, but when I try to
run it under the debugger, I get this error.

"Bizarre copy of HASH in leave at excel_extract.pl line 114.
at excel_extract.pl line 114
Debugged program terminated. Use q to quit or R to restart,
use o inhibit_exit to avoid stopping after program termination,
h q, h R or h o to get additional info."

I tried searching for help on the net. Couldn't find much information
about it. Seems like a peculiar error.
Does anyone know why this happens? It would be a lot of help if you
could shine some light on this :)

Warm Regards,
Artemis.
 
Peter Scott

I face a peculiar problem when I try to debug my perl script. This
script is used to fetch values from an excel sheet and print it into
different files. I needed to debug this script. When I do try to
debug, I get this error.

"Bizarre copy of HASH in leave at excel_extract.pl line 114.
at excel_extract.pl line 114
I tried searching for help on the net. Couldn't find much information
about it. Seems like a peculiar error.

This is due to a bug in some C or XS code somewhere, probably in a CPAN
module. First upgrade to the latest version of everything, including
perl. If you still get the error, reduce it to the shortest program you
can, preferably under 20 lines, and post it as a bug on rt.perl.org or
just email the author of the most excel-specific module your program uses.
It may not be their fault but you can't be expected to do more unless you
know how.
 
Peter Scott

FWIW, google finds a short method of generating this error.

This is perl, v5.8.8 built for i386-linux-thread-multi

perl -Te '@{%h}{x}'
Bizarre copy of HASH in leave at -e line 1.

Elegant. Looks fixed in 5.10. Either change 27350 or 25808.
 
Ben Morrow

Quoth Peter Scott:
Elegant. Looks fixed in 5.10. Either change 27350 or 25808.

I should perhaps point out that this doesn't mean what you might think,
and that in 5.10 its meaning has also been fixed.

~% perl5.8.8 -le'%h = qw/a b/; %{"1/8"} = qw/a c/; print @{%h}{a}'
b
~% perl5.10.0 -le'%h = qw/a b/; %{"1/8"} = qw/a c/; print @{%h}{a}'
c

Perl used to allow you to treat a hash or array as a reference to
itself; this was a bug, and has now been (partly) fixed. The way the
expression now evaluates is

Evaluate %h in scalar context -> '1/8'
Evaluate @{'1/8'}{a} as a symbolic ref

which is why the above gives 'c'. But since 5.8 and earlier incorrectly
sliced %h rather than %{'1/8'}, you can't rely on this. (It would be
stupid behaviour to rely on, in any case, since the exact value of a
hash in scalar context has never been guaranteed. The only formal
statement in the docs is that the value will be true iff the hash has
any elements.)

Under 'use strict' you get 'Can't use string ("1/8") as a HASH
reference' with perls at least as far back as 5.6.1, so this won't be a
problem in any normal code. If you want to slice %h, the correct syntax
is simply

@h{a}

Ben
 
szr

Ben said:
I should perhaps point out that this doesn't mean what you might
think, and that in 5.10 its meaning has also been fixed.

~% perl5.8.8 -le'%h = qw/a b/; %{"1/8"} = qw/a c/; print @{%h}{a}'
b
~% perl5.10.0 -le'%h = qw/a b/; %{"1/8"} = qw/a c/; print @{%h}{a}'
c

Perl used to allow you to treat a hash or array as a reference to
itself; this was a bug, and has now been (partly) fixed. The way the
expression now evaluates is

Evaluate %h in scalar context -> '1/8'
Evaluate @{'1/8'}{a} as a symbolic ref

which is why the above gives 'c'. But since 5.8 and earlier
incorrectly sliced %h rather than %{'1/8'}, you can't rely on this.
(It would be stupid behaviour to rely on, in any case, since the
exact value of a hash in scalar context has never been guaranteed.
The only formal statement in the docs is that the value will be true
iff the hash has any elements.)

Under 'use strict' you get 'Can't use string ("1/8") as a HASH
reference' with perls at least as far back as 5.6.1, so this won't be
a problem in any normal code. If you want to slice %h, the correct
syntax is simply

@h{a}

Having read all this inspired me to run a few tests, and I found
something odd regarding allocation:

$ perl5.8.8 -Mstrict -we 'my %h; @h{1..1} = (1..100); print "[",
scalar %h, "]\n";'
[1/8]

$ perl5.8.8 -Mstrict -we 'my %h; @h{1..2} = (1..100); print "[",
scalar %h, "]\n";'
[2/8]

$ perl5.8.8 -Mstrict -we 'my %h; @h{1..3} = (1..100); print "[",
scalar %h, "]\n";'
[3/8]

$ perl5.8.8 -Mstrict -we 'my %h; @h{1..4} = (1..100); print "[",
scalar %h, "]\n";'
[3/8]

$ perl5.8.8 -Mstrict -we 'my %h; @h{1..5} = (1..100); print "[",
scalar %h, "]\n";'
[4/8]


I get the same using 5.10.0, 5.8.2, and 5.8.0. 5.6.1, however, shows
the fourth line as [4/8], and the fifth as [5/8], which is what I would
have expected. It seems Perl 5.8.0 and above sometimes incorrectly return
the number of used buckets: in the fourth line there are four
key-value pairs, but only 3 buckets. How can this be?
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to szr]

$ perl5.8.8 -Mstrict -we 'my %h; @h{1..4} = (1..100); print "[",
scalar %h, "]\n";'
[3/8]

$ perl5.8.8 -Mstrict -we 'my %h; @h{1..5} = (1..100); print "[",
scalar %h, "]\n";'
[4/8]


I get the same using 5.10.0, 5.8.2, and 5.8.0. 5.6.1, however, shows
the fourth line as [4/8], and the 5th as [5/8],

So the hashing algorithm in 5.6.1 is slightly better (on this particular
set of keys). [No surprise for me; I suspect I know who optimized it. ;-]

With randomized hashing, 5 people with 8 possible birth-weekdays would
have a quite large chance of a collision, 1 - 8*7*6*5*4 / 5^8 = 80%
(birthday paradox). So it is not surprising that what you got is a
collision.
which is what I would have expected. It seems Perl 5.8.0 and above
sometimes incorrectly return the number of used buckets, as in the
fourth line, there are four key-value pairs, but only 3
buckets.... how can this be?

Each bucket may hold an unlimited number of keys. [If you are lucky,
most buckets have only one key, and key lookup is quite quick.]

Hope this helps,
Ilya
 
Ben Morrow

Quoth "szr":
Having read all this inspired me to run a few tests, and I found
something odd regarding allocation:

$ perl5.8.8 -Mstrict -we 'my %h; @h{1..1} = (1..100); print "[",
scalar %h, "]\n";'
[1/8]

$ perl5.8.8 -Mstrict -we 'my %h; @h{1..2} = (1..100); print "[",
scalar %h, "]\n";'
[2/8]

$ perl5.8.8 -Mstrict -we 'my %h; @h{1..3} = (1..100); print "[",
scalar %h, "]\n";'
[3/8]

$ perl5.8.8 -Mstrict -we 'my %h; @h{1..4} = (1..100); print "[",
scalar %h, "]\n";'
[3/8]

$ perl5.8.8 -Mstrict -we 'my %h; @h{1..5} = (1..100); print "[",
scalar %h, "]\n";'
[4/8]

I get the same using 5.10.0, 5.8.2, and 5.8.0. 5.6.1, however, shows
the fourth line as [4/8], and the 5th as [5/8], which is what I would
have expected. It seems Perl 5.8.0 and above sometimes incorrectly return
the number of used buckets, as in the fourth line, there are four
key-value pairs, but only 3 buckets.... how can this be?

Learn how hash tables work. A 'bucket' isn't a key, but a set of keys
that hash to the same value; after that perl will do a linear scan
through all the keys in the bucket looking for one that matches.
Obviously, for efficiency, you want this final linear scan to be as
short as possible; this is why it is important to use a hash function
that distributes the keys evenly between the buckets.

Presumably the hash function was tweaked in 5.8, and two of the strings
'1'..'5' now end up in the same bucket; I would expect that this was
done to make some real-world set of keys distribute better, but I don't
know.

Ben
 
Ilya Zakharevich

[A complimentary Cc of this posting was NOT [per weedlist] sent to
Ilya Zakharevich]

With randomized hashing, 5 people with 8 possible birth-weekdays would
have a quite large chance of a collision, 1 - 8*7*6*5*4 / 5^8 = 80%
                                                          ^^^
                                                          8^5

(birthday paradox). So it is not surprising that what you got is a
collision.

[The answer is AFAIK correct, only the expression was wrong...]

Sorry,
Ilya
 
comp.lang.c++

Quoth "szr" <[email protected]>:

...
Presumably the hash function was tweaked in 5.8, and two of the strings
'1'..'5' now end up in the same bucket; I would expect that this was
done to make some real-world set of keys distribute better, but I don't
know.

Or maybe hv.h provides the clue:

/* hash a key */
...
The "hash seed" feature was added in Perl 5.8.1
to perturb the results to avoid "algorithmic
complexity attacks".
 
Ben Morrow

Quoth "comp.lang.c++":
Or maybe hv.h provides the clue:

/* hash a key */
...
The "hash seed" feature was added in Perl 5.8.1
to perturb the results to avoid "algorithmic
complexity attacks".

Nope. Firstly, the hashing appears to have changed *before* 5.8.1;
secondly, as of 5.8.2 (IIRC) the random-hash-seed behaviour only kicks
in on hashes that are actually under attack.

Ben
 
comp.lang.c++

Quoth "comp.lang.c++" <[email protected]>:

Nope. Firstly, the hashing appears to have changed *before* 5.8.1;
secondly, as of 5.8.2 (IIRC) the random-hash-seed behaviour only kicks
in on hashes that are actually under attack.

Seems strange the new hashing behavior - at least in the example
mentioned - is less distributive than 5.6.1. That is, the same 2 keys
now hash the same and get bucketed together whereas with 5.6.1 they
didn't. It just seems very counter-intuitive that 2 different, single
character keys would hash to the same value in any case if the
algorithm bore any resemblance to the classic one:

int i = key_length;
unsigned int hash = 0;
char *s = key;
while (i--) { hash = hash * 33 + *s++; }
 
Ben Morrow

Quoth "comp.lang.c++":
Seems strange the new hashing behavior - at least in the example
mentioned - is less distributive than 5.6.1. That is, the same 2 keys
now hash the same and get bucketed together whereas with 5.6.1 they
didn't. It just seems very counter-intuitive
that 2 different, single character keys would hash
to the same value in any case if the algorithm
bore any resemblance to the classic one:

int i = key_length;
unsigned int hash = 0;
char *s = key;
while (i--) { hash = hash * 33 + *s++; }

5.6 used exactly that; 5.8 changed it to

char *s_PeRlHaSh_tmp = str;
unsigned char *s_PeRlHaSh = (unsigned char *)s_PeRlHaSh_tmp;
I32 i_PeRlHaSh = len;
U32 hash_PeRlHaSh = 0;
while (i_PeRlHaSh--) {
    hash_PeRlHaSh += *s_PeRlHaSh++;
    hash_PeRlHaSh += (hash_PeRlHaSh << 10);
    hash_PeRlHaSh ^= (hash_PeRlHaSh >> 6);
}
hash_PeRlHaSh += (hash_PeRlHaSh << 3);
hash_PeRlHaSh ^= (hash_PeRlHaSh >> 11);
(hash) = (hash_PeRlHaSh + (hash_PeRlHaSh << 15));

which apparently performs better on real-world data.

Ben
 
