Bug in &= (bitwise or)

Anno Siegel

I am observing this strange behavior:

# prepare a string
my $str = 'aa';
$str &= 'a'; # shorten it
print "str: $str\n"; # a single "a" as expected

# $str = "$str"; # this heals the defect (if any)

# something is wrong, though
die "Ha!\n" unless $str =~ /a+$/; # this dies!

The pattern should, of course, match. Similar patterns, like /a$/ and
/a+/ do match, but /a+$/ isn't recognized. Copying the string into itself
normalizes the behavior. "use bytes" makes no difference.

Whether the bug (or am I missing something?) is in &= or the regex
engine (gasp) is anyone's guess. My money is on string-truncation
by &=. It would be rarely-exercised code, other bitwise operations
don't shorten.

Anno
 
Tassilo v. Parseval

Also sprach Anno Siegel:
I am observing this strange behavior:

# prepare a string
my $str = 'aa';
$str &= 'a'; # shorten it
print "str: $str\n"; # a single "a" as expected

# $str = "$str"; # this heals the defect (if any)

# something is wrong, though
die "Ha!\n" unless $str =~ /a+$/; # this dies!

The pattern should, of course, match. Similar patterns, like /a$/ and
/a+/ do match, but /a+$/ isn't recognized. Copying the string into itself
normalizes the behavior. "use bytes" makes no difference.

Whether the bug (or am I missing something?) is in &= or the regex
engine (gasp) is anyone's guess. My money is on string-truncation
by &=. It would be rarely-exercised code, other bitwise operations
don't shorten.

After the bitwise-and, the string appears not to be NULL-terminated any
longer, at least not at the offset where perl usually finds the NULL
termination. That might be confusing the regex engine.

For testing what the raw string looks like after the bitwise-and, you
can use:


use Inline C => Config => BUILD_NOISY => 1;
use Inline C => <<'EOC';

void test (SV *sv) {
    int i = 0;
    char *c = SvPVX(sv);

    /* print every byte of the allocated buffer (SvLEN), not just SvCUR */
    while (i++ < SvLEN(sv))
        printf("%i,", *c++);

    sv_dump(sv);
}

EOC

my $a = 'aa';
$a &= 'a';
test($a);

Then I am not sure myself what the result of

$s = 'aa' & 'a'

should be.

Tassilo
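
For reference, the perlop section on bitwise string operators describes the intended result: & acts as though the longer operand were truncated to the length of the shorter, while | and ^ act as though the shorter operand were padded with zero bytes. A minimal sketch of what that means for these operands (the variable names are illustrative only, and the "bitwise" feature is assumed to be off):

```perl
use strict;
use warnings;

# Both operands are strings, so & | ^ work bytewise
# (assumes the "bitwise" feature is not enabled, e.g. no "use v5.28").
my $and = 'aa' & 'a';    # result is as long as the shorter operand
my $or  = 'aa' | 'a';    # result is as long as the longer operand
my $xor = 'aa' ^ 'a';    # likewise; first byte is "a" ^ "a" = "\0"

printf "and: %d byte(s), or: %d byte(s), xor: %d byte(s)\n",
    length $and, length $or, length $xor;
```

So only & can return something shorter than its longer operand, which fits the suspicion that the shortening path is the rarely exercised one.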
 
attn.steven.kuo

Anno said:
I am observing this strange behavior:

# prepare a string
my $str = 'aa';
$str &= 'a'; # shorten it
print "str: $str\n"; # a single "a" as expected

# $str = "$str"; # this heals the defect (if any)

# something is wrong, though
die "Ha!\n" unless $str =~ /a+$/; # this dies!

The pattern should, of course, match. Similar patterns, like /a$/ and
/a+/ do match, but /a+$/ isn't recognized. Copying the string into itself
normalizes the behavior. "use bytes" makes no difference.

Whether the bug (or am I missing something?) is in &= or the regex
engine (gasp) is anyone's guess. My money is on string-truncation
by &=. It would be rarely-exercised code, other bitwise operations
don't shorten.


According to Devel::Peek, string truncation results
in a non-NUL-terminated Perl string? Not sure
if this narrows down the problem...


use Devel::Peek;

my $string = 'aa';
Dump($string);

$string &= 'a';
Dump($string);

$string = "$string";
Dump($string);


__END__

SV = PV(0x11ac80) at 0x129f90
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x11a4f8 "aa"\0
  CUR = 2
  LEN = 3
SV = PV(0x11ac80) at 0x129f90
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x11a4f8 "a"
  CUR = 1
  LEN = 3
SV = PV(0x11ac80) at 0x129f90
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x11a4f8 "a"\0
  CUR = 1
  LEN = 3
 
Big and Blue

Anno said:
I am observing this strange behavior:

I'm observing one in the subject - given that &= is a bitwise AND.
# prepare a string
my $str = 'aa';
$str &= 'a'; # shorten it

Hmmmm...surely you've changed the last character to a NUL byte?
The pattern should, of course, match. Similar patterns, like /a$/ and
/a+/ do match, but /a+$/ isn't recognized.

Which would fit with the string actually being "a\000".
Whether the bug (or am I missing something?) is in &= or the regex
engine (gasp) is anyone's guess. My money is on string-truncation
by &=. It would be rarely-exercised code, other bitwise operations
don't shorten.

If you "use re qw( debug );" and change the &= line to:

$str &= "\000a";

you'll find that this leaves you with "\000a", so I'm guessing that the
string you have created does end with a NUL, but Perl is confused as to
whether it is there?
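
One way to check whether a NUL byte is really part of the string (as opposed to sitting just past its end) is to look at length() and a hex dump of the value. A small sketch using only core Perl; the variable names are illustrative:

```perl
use strict;
use warnings;

my $str = 'aa';
$str &= 'a';

# length() reports the logical length (CUR); unpack 'H*' shows every
# byte that belongs to the string as two hex digits.
my $hex = unpack 'H*', $str;
printf "length=%d hex=%s\n", length $str, $hex;

# For comparison, a string that really does end in a NUL byte:
my $with_nul = "a\0";
my $nul_hex  = unpack 'H*', $with_nul;
printf "length=%d hex=%s\n", length $with_nul, $nul_hex;
```

If the two lines differ, the result of &= is not "a\000" as far as Perl's own length bookkeeping is concerned.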
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Tassilo v. Parseval], who wrote:
After the bitwise-and, the string appears not to be NULL-terminated any
longer, at least not at the offset where perl usually finds the NULL
termination. That might be confusing the regex engine.

For testing what the raw string looks like after the bitwise-and, you
can use:

perl -MDevel::Peek -wle "my $a = q(aa); $a &= q(a); print Dump $a"

SV = PV(0x40c64) at 0x40a24
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x42020 "a"
  CUR = 1
  LEN = 3

As you can see, PV is not null-terminated. Here is how
null-terminated stuff is output:


perl -MDevel::Peek -wle "my $a = q(a); print Dump $a"
SV = PV(0x40c64) at 0x40a24
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x42020 "a"\0
  CUR = 1
  LEN = 2

Hope this helps,
Ilya
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Anno Siegel], who wrote:
my $str = 'aa';
$str &= 'a'; # shorten it
die "Ha!\n" unless $str =~ /a+$/; # this dies!
Whether the bug (or am I missing something?) is in &= or the regex
engine (gasp) is anyone's guess.

Both.

&= should (as any Perl operation) produce \0-terminated string.

REx engine (as any Perl operation) should work on non-\0-terminated
strings too.

The only reason to have \0-termination is to allow the string to be
passed to system calls (like open()) AS IS.

Hope this helps,
Ilya
 
Tassilo v. Parseval

Also sprach Ilya Zakharevich:
[A complimentary Cc of this posting was sent to
Tassilo v. Parseval], who wrote:
After the bitwise-and, the string appears not to be NULL-terminated any
longer, at least not at the offset where perl usually finds the NULL
termination. That might be confusing the regex engine.

For testing what the raw string looks like after the bitwise-and, you
can use:

perl -MDevel::Peek -wle "my $a = q(aa); $a &= q(a); print Dump $a"

SV = PV(0x40c64) at 0x40a24
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x42020 "a"
  CUR = 1
  LEN = 3

As you can see, PV is not null-terminated. Here is how
null-terminated stuff is output:

Yes, well, I am aware of Dump() and how a NULL-termination is rendered.
It was after I saw the above output that I became curious and wanted to
see what characters actually were in the PV slot.

To me it seems that the perl core isn't quite sure whether it should
adhere to SvCUR or instead rather believe what is in PV. In any case,
perl obviously gets confused when

SvPVX(sv)[SvCUR(sv)] != '\0'

Did someone already file a bugreport?

Tassilo
 
Anno Siegel

Tassilo v. Parseval said:
Did someone already file a bugreport?

I will. Want to check against bleadperl first. I'll also at least go
through the motions of seeing if it has been reported before.

Anno
 
Anno Siegel

Anno Siegel said:
I will. Want to check against bleadperl first. I'll also at least go
through the motions of seeing if it has been reported before.

[Anno again]

The bug is still in perl-5.9.2, I've sent a report. Fun with perlbug, as
usual.

BTW, the combination of bitwise operations and regex matching that tickles
the bug isn't as exotic as it may seem. When you work with vec(), trailing
zero bytes in a string are essentially invisible -- strings behave as if
padded with infinitely many zeroes. Therefore trailing zeroes can make
strings look different (to eq) that are really the same as far as vec()
is concerned. To get rid of trailing zeroes, s/\0+$// offers itself,
particularly after &= which may have created them even if the operands
didn't have any.

Anno
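
The vec() point can be illustrated with a short sketch. The strings and variable names here are made up for the illustration; the s/\0+$// normalization is the one mentioned in the post:

```perl
use strict;
use warnings;

my $x = "a";
my $y = "a\0";

# vec() reads past the end of a string as zero, so it cannot tell
# these two apart...
my $same_for_vec = vec($x, 1, 8) == vec($y, 1, 8);
# ...but eq compares the actual bytes.
my $same_for_eq  = $x eq $y;

# & can produce trailing zero bytes that neither operand had.
my $z = "\x01\x02" & "\x01\x04";     # "\x01\x00"

# Normalize by stripping trailing zero bytes.
(my $norm_y = $y) =~ s/\0+$//;
(my $norm_z = $z) =~ s/\0+$//;

printf "vec agrees: %s, eq agrees: %s, equal after stripping: %s\n",
    map { $_ ? 'yes' : 'no' } $same_for_vec, $same_for_eq, $norm_y eq $x;
```

A regex anchored at the end of the string right after a bitwise operation is exactly the combination that trips over the missing terminator.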
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Tassilo v. Parseval], who wrote:
For testing what the raw string looks like after the bitwise-and, you
can use:

Is not it much easier to parse the output of Devel::Peek, and read the
PV by unpack()?
my $a = 'aa';
$a &= 'a';
test($a);

For those who are too lazy to run this, the result is

97,97,0
Then I am not sure myself what the result of

$s = 'aa' & 'a'

should be.

I think the current result is both correct and intuitive enough
(modulo two bugs which comprise this problem). It is compatible with
both

a) junk-in-junk-out ("what is after end of 'a' is junk")
b) strings behave as if followed by infinitely many \0s.

By (b), the output string should also be considered as having
infinitely many \0s; the question is where to stop this flow. And (a)
looks as a reasonable argument to choose this cut-off point.

[My opinion may be a little bit skewed, since I do not remember
whether it was me who decided on this behaviour. ;-]

Hope this helps,
Ilya
 
Tassilo v. Parseval

Also sprach Ilya Zakharevich:
Is not it much easier to parse the output of Devel::Peek, and read the
PV by unpack()?

No, it wasn't for me. :)

Can you give an example how to do it with unpack? I feel the 'P'
template is needed but I never know how to use that one.
For those who are too lazy to run this, the result is

97,97,0


I think the current result is both correct and intuitive enough
(modulo two bugs which comprise this problem). It is compatible with
both

a) junk-in-junk-out ("what is after end of 'a' is junk")
b) strings behave as if followed by infinitely many \0s.

By (b), the output string should also be considered as having
infinitely many \0s; the question is where to stop this flow. And (a)
looks as a reasonable argument to choose this cut-off point.

What are those two bugs you mentioned? For me the real bug is that an
'impossible' string value can be constructed thus. I would expect:

('aa' & 'a') eq "a\0"

Taking (b) into account, the smaller string should be padded with '\0'
which, on bit-wise ANDing, should yield '\0'.
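
The two readings can be compared directly. The sketch below pads the shorter operand by hand, which gives the "a\0" expected here, and sets it next to what the built-in & returns for operands of unequal length (perlop documents & as truncating to the shorter operand). The variable names are illustrative only:

```perl
use strict;
use warnings;

my ($long, $short) = ('aa', 'a');

# Reading (b) taken literally: pad the shorter operand with "\0" first.
my $padded   = $short . "\0" x (length($long) - length($short));
my $explicit = $long & $padded;          # "a\0", two bytes

# What the built-in operator returns for operands of unequal length.
my $builtin  = $long & $short;           # "a", one byte

# Dropping the bytes that came only from padding reconciles the two.
my $stripped = substr $explicit, 0, length $short;

printf "explicit=%s builtin=%s stripped=%s\n",
    map { unpack 'H*', $_ } $explicit, $builtin, $stripped;
```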

There's another oddity:

$ perl -MDevel::Peek -e 'my $a = 'aa'; $a &= 'a'; Dump($a)'
SV = PV(0x814ce90) at 0x814cc6c
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x815d628 "a"
  CUR = 1
  LEN = 3

$ perl -MDevel::Peek -e 'my $a = 'aa' & 'a'; Dump($a)'
SV = PV(0x814cf20) at 0x814cc6c
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x815c0e8 "a"\0
  CUR = 1
  LEN = 2

Why are those two not equivalent?
[My opinion may be a little bit skewed, since I do not remember
whether it was me who decided on this behaviour. ;-]

I am sure it is. ;-)

Tassilo
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Tassilo v. Parseval], who wrote:
Also sprach Ilya Zakharevich:


No, it wasn't for me. :)

Can you give an example how to do it with unpack? I feel the 'P'
template is needed but I never know how to use that one.

You are right: I thought that one can easily get the result of Dump
into a variable. Probably not easy... So to do it without fork()
would not be easy:

#!/usr/bin/perl -wl

use strict;
use Devel::Peek;

# Prepare what to inspect
my $a = 'aa';
$a &= 'a';

defined (my $pid = open my $p, '-|') or die "Can't fork() to self-pipe: $!";
if ($pid) { # parent
    my $out;
    {
        local $/;
        $out = <$p>;
        close $p or die;
    }
    # Parse output of Dump using the expected format below:
    my ($addr, $len) = ($out =~ m/
        ^ \s+ PV \s* = \s* (0x[[:xdigit:]]+) \b
        .*?
        ^ \s+ LEN \s* = \s* (\d+) \b
    /xsm);
    die "unexpected format of output of Dump" unless $addr and $len;

    my $buff = unpack "P$len", pack 'J', hex $addr;
    print ord for split //, $buff;
} else { # kid
    open STDERR, '>&', \*STDOUT or die;
    Dump $a;
    ###SV = PV(0x40c64) at 0x40a24
    ###  REFCNT = 1
    ###  FLAGS = (PADBUSY,PADMY,POK,pPOK)
    ###  PV = 0x42020 "a"
    ###  CUR = 1
    ###  LEN = 3
}
__END__
What are those two bugs you mentioned? For me the real bug is that an
'impossible' string value can be constructed thus.

Well, the REx engine operates in terms of start-of-string and
end-of-string. It should not read behind.

Moreover, IMO, it is important to support variables which are not
\0-terminated as wide as possible. E.g., this way one could do
substr() with copy-on-modify semantic.
I would expect:

('aa' & 'a') eq "a\0"

Taking (b) into account, the smaller string should be padded with '\0'
which, on bit-wise ANDing, should yield '\0'.

... And, since this \0 comes from "extrapolated" values, it should be
"deextrapolated"; in other words, stripped.
There's another oddity:
$ perl -MDevel::Peek -e 'my $a = 'aa'; $a &= 'a'; Dump($a)'
SV = PV(0x814ce90) at 0x814cc6c
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x815d628 "a"
  CUR = 1
  LEN = 3

We know this already...
$ perl -MDevel::Peek -e 'my $a = 'aa' & 'a'; Dump($a)'
SV = PV(0x814cf20) at 0x814cc6c
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x815c0e8 "a"\0
  CUR = 1
  LEN = 2

Here 'aa' & 'a' is a temporary; most probably not \0-terminated. Now
the assignment operator fills $a from the values in the temporary; as
any well-behaved Perl operator, it does not care whether there is a
trailing \0. So it does not know that the temporary is "buggy".

Hope this helps,
Ilya
 
Tassilo v. Parseval

Also sprach Ilya Zakharevich:
[A complimentary Cc of this posting was sent to
Tassilo v. Parseval
Also sprach Ilya Zakharevich:


No, it wasn't for me. :)

Can you give an example how to do it with unpack? I feel the 'P'
template is needed but I never know how to use that one.

You are right: I thought that one can easily get the result of Dump
into a variable. Probably not easy... So to do it without fork()
would not be easy:

[...]

Ah, thank you. I have to make a mental note that the p/P templates work
on memory addresses (I don't like the term 'pointer' which is used in
`perldoc -f pack`).
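
A shorter sketch of the same idea without the fork, assuming that pack 'p' on a plain variable yields the address of that variable's string buffer and that the buffer has at least CUR + 1 bytes allocated (as it does in the dumps shown in this thread). It reads raw memory, so it is meant for debugging only; the variable names are illustrative:

```perl
use strict;
use warnings;

my $str = 'aa';
$str &= 'a';

my $cur = length $str;          # logical length (CUR); plain bytes here

# pack 'p' gives a packed pointer to $str's buffer; unpack "P$n" then
# copies $n raw bytes from that address. Read CUR + 1 bytes so the byte
# where the terminating NUL is expected comes along too.
my $n     = $cur + 1;
my $raw   = unpack "P$n", pack 'p', $str;
my @bytes = map { ord } split //, $raw;

print join(',', @bytes), "\n";
print $bytes[-1] == 0 ? "NUL-terminated\n" : "not NUL-terminated\n";
```

On a perl affected by the bug the last byte would be the leftover 97 seen in the Inline C output; on a perl where &= terminates its result it should be 0.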
Well, the REx engine operates in terms of start-of-string and
end-of-string. It should not read behind.
Agreed.

Moreover, IMO, it is important to support variables which are not
\0-terminated as wide as possible. E.g., this way one could do
substr() with copy-on-modify semantic.

Is that the current state of affairs or rather an item on the
wishlist?
... And, since this \0 comes from "extrapolated" values, it should be
"deextrapolated"; in other words, stripped.

I have to admit that I never really read what perlop has to say on the
bit-wise AND for strings of differing length. Now that Abigail spelled
it out for me in that parallel posting I see it a little more clearly.
We know this already...


Here 'aa' & 'a' is a temporary; most probably not \0-terminated. Now
the assignment operator fills $a from the values in the temporary; as
any well-behaved Perl operator, it does not care whether there is a
trailing \0. So it does not know that the temporary is "buggy".

That can't be the explanation, because:

$ perl -MDevel::Peek -e 'my ($b, $c) = qw/aa a/; my $a = $b & $c; Dump($a)'
SV = PV(0x814ce78) at 0x8160d28
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x8166520 "a"\0
  CUR = 1
  LEN = 2

and:

$ perl -MDevel::Peek -e 'my $b = q/aa/; my $a = $b & 'a'; Dump($a)'
SV = PV(0x814cf38) at 0x8160cd8
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x8163d48 "a"\0
  CUR = 1
  LEN = 2

Tassilo
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Tassilo v. Parseval], who wrote:
Is that the current state of affairs or rather an item on the
wishlist?

It is one of those things perl *must* have to be considered a serious
string-manipulation language. Without efficient and flexible "string
type" many operations which would be easy to do in many other
languages would take centuries in Perl (linear algorithms become
quadratic in Perl).

I do not expect that 5.9 has it (although this particular part would
be easy to implement). Please surprise me. ;-)
That can't be the explanation

However, it is. ;-)

I do not see why you think your examples contradict my argument. All
of them inspect results of assignment operator. In all of them the
result is fine (as my explanation implies).

Hope this helps,
Ilya
 
Anno Siegel

Anno Siegel said:
Anno Siegel said:
I will. Want to check against bleadperl first. I'll also at least go
through the motions of seeing if it has been reported before.

[Anno again]

The bug is still in perl-5.9.2, I've sent a report. Fun with perlbug, as
usual.

...and fixed, at least the bug in &= is. The one in m// (relying on a
trailing zero) seems to be still there, but now it will be harder to
produce such strings in Perl.

The bug tracking ticket is #37616, if anyone cares.

Anno
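
The original report can be turned into a small regression check. This is only a sketch; on a perl that includes the &= fix both results are expected to be 1, while on an affected version the first one would be 0:

```perl
use strict;
use warnings;

my $str = 'aa';
$str &= 'a';                       # shorten it, as in the original report

# Match directly on the result of &=.
my $direct = $str =~ /a+$/ ? 1 : 0;

# Match on a fresh copy, the workaround from the first post.
my $copy   = "$str";
my $copied = $copy =~ /a+$/ ? 1 : 0;

print "direct match: $direct, match after copy: $copied\n";
```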
 
