Why can't I xor strings?

A

Andrew Dalke

Paul said:
I think != is appropriate in this situation.
if bool(x) != bool(y): ...

while I prefer my already mentioned

if bool(x) + bool(y) == 1:
print "One and only one given"

That's because the pure logic ones, both != and
xor, confuse me while arithmetic doesn't.

It's also extensible to, say, "all three
parameters are either empty strings or non-empty
strings".

if 0 < bool(x) + bool(y) + bool(z) < 3:
print "that's not allowed"

BTW, I've used an idiom like this with some
success

def blah(filename = None, infile = None, url = None):
if (bool(filename is None) + bool(infile is None) +
bool(url is None)) != 2:
raise TypeError("Must specify one and only one of "
"'filename', 'infile', or 'url'")
if filename is not None:
infile = open(filename)
elif url is not None:
infile = urllib.urlopen(url)
...


Try that with an xor.

Andrew
(e-mail address removed)
 
G

Grant Edwards

Grant> I don't know what you mean by that. Nobody seems to have a
Grant> problem with "and" "or" and "not" operators using the truth
Grant> values of strings. What is there about "xor" that
Grant> precludes it from behaving similarly?

It's just that logical xor is an extremely rare beast.

Probably so, but that doesn't support the arguement that
there's something wrong with a logical xor argument coercing
it's operands to boolean values unless one also argues that
the logical "and", "or" and "not" operators should also not
coerce their operands to booleans.
I would probably prefer to see the operation expanded to the
more typical and-or-nots in real code.

If logical operators shouldn't coerce operands, then what you
should see is:

if (bool(string1) and (not bool(string2)) or ((not bool(string1)) and bool(string2))
 
G

Grant Edwards

Why the distinction? In your code you call bool on
an object at least once and perhaps twice. The
truth of an object should only be checked once. You
also have asymmetric return values. Consider

s1 s2 s1 xor s2
"A" "B" False
"A" "" True
"" "B" "B"
"" "" False

Hey, _that_ particular bug isn't my fault. Somebody else
decided that "and" and "or" don't return boolean values like
they should. ;)
Esthetics suggest that either "A"/"" return "A" or that
""/"B" return True. Mine does the latter. Yours does
neither. Probably the Pythonic way, assuming 'xor'
can be considered Pythonic, is to return the object
which gave the True value, like this

def xor_f(x, y):
bx = bool(x)
by = bool(y)
if bx:
if not by:
return bx
return False
else:
if by:
return by
return False

That's ugly. But then again, "A" or "B" returning "A" is ugly.

While I agree with your points, they're immaterial to the
argument I was making. The poster to which I responded was
arguing that "xor" didn't make sense because having it coerce
it's arguments to booleans was "wrong".
 
J

Jeremy Bowers

While I agree with your points, they're immaterial to the
argument I was making. The poster to which I responded was
arguing that "xor" didn't make sense because having it coerce
it's arguments to booleans was "wrong".

I didn't say "wrong", I said "non-obvious".

And, with respect, the fact that you have to argue your point is evidence
in my favor. Of course "proof" would require a comprehensive survey, and I
think we can all agree this point isn't worth such effort :)

But I do think that a bitwise operator should silently transform to a
logical operator for strings is not obvious.

What is 388488839405842L ^ "hello"?

Python says that's an "unsupported operand type(s) for ^: 'long' and
'str'". Why is it wrong?

This would also work if Python were more like C++ and we could define

xor(string, string)
xor(int, int)

and be done with it, but in Python, there should be an obvious meaning for
int ^ string, and there isn't.

It is also true that I recommended the OP consider subclassing string to
make ^ do what he wants. But it seems to be reasonably well expected that
while user classes can do what they like with operators (as long as they
are willing to pay the sometimes-subtle prices, especially the ones
involved with poorly defining __cmp__), that the core language should not
do such things.
 
G

Grant Edwards

But I do think that a bitwise operator should silently
transform to a logical operator for strings is not obvious.

What is 388488839405842L ^ "hello"?

No, I wouldn't want that either. What I was talking about was
the possibility of an "xor" logical operator that would behave
in a manner similar to the other logical operators in that it
coerces it's parameters to boolean values.
 
B

Bengt Richter

I didn't say "wrong", I said "non-obvious".

And, with respect, the fact that you have to argue your point is evidence
in my favor. Of course "proof" would require a comprehensive survey, and I
think we can all agree this point isn't worth such effort :)

But I do think that a bitwise operator should silently transform to a
logical operator for strings is not obvious.

What is 388488839405842L ^ "hello"?

Python says that's an "unsupported operand type(s) for ^: 'long' and
'str'". Why is it wrong?
I agree it is not wrong. The user is reminded that s/he should be explicit
and create a type that behaves as desired. However (;-) ...

What's right about accepting 3^7 ? Why should xor be defined for integers?
IMO we have implicit subtyping of integers as vectors or column matrices
of bools and operations element by element and implicit re-presentation
of the result as integer.

This dual interpretation of course reflects CPU hardware functionality, and
I would argue that familiarity (not to mention the expediency of conciseness)
has bred acceptance of operator notation without explicit cruft such as
int(boolvec(3)^boolvec(7)) instead of 3^7. The trouble is that a boolvec
should have a length, and boolvec(3) really hides implicit determination of length
(by dropping sign bits of unsigned value or extending signed integer sign bits infinitely),
and the ^ operation between boolvecs of different length hides implicit normalization to
the length of the longer (or infinity with compressed representation)). Again,
CPU hardware legacy comes into play, with the hidden imposition of length=32, sometimes
64 now, and we are forced (happily IMO) into defining what we mean in terms of
physical-representation-independent abstractions.

So the question becomes

What is boolvec(388488839405842L) ^ boolvec("hello")?

and boolvec("hello") is the more ambiguous one. If we wrote

boolvec("hello", as_bytes=True)

I think most would have an idea of what it meant -- until they
started to think about endianness ;-) I.e., what is the value of the following?

(boolvec("hello", asbytes=True) & boolvec(0xffff)).as_string() #note '&' for simpler example

should it be

"he\x00\x00\x00"

or do you see it as big-endian

"\x00\x00\x00lo"

and should "sign" bits be dropped, so the result would be "he" or "lo" ?
Or -- should boolvec("hello") default to boolvec(bool("hello"), length=1) ?
This would also work if Python were more like C++ and we could define

xor(string, string)
xor(int, int)

and be done with it, but in Python, there should be an obvious meaning for
int ^ string, and there isn't.

Notice that no one complains about intgr^intgr
not being defined as int(bool(intgr)^bool(intgr)) ?;-)
It is also true that I recommended the OP consider subclassing string to
make ^ do what he wants. But it seems to be reasonably well expected that
while user classes can do what they like with operators (as long as they
are willing to pay the sometimes-subtle prices, especially the ones
involved with poorly defining __cmp__), that the core language should not
do such things.
Except where there is an accepted legacy of such things being done already ;-)

BTW, what is the rationale behind this:
>>> ['',(),[],0, 0.0, 0L].count(0) 3
>>> ['',(),[],0, 0.0, 0L].count(()) 1
>>> ['',(),[],0, 0.0, 0L].count([]) 1
>>> ['',(),[],0, 0.0, 0L].count('') 1
>>> ['',(),[],0, 0.0, 0L].count(0.0) 3
>>> ['',(),[],0, 0.0, 0L].count(0L)
3

BTW2, just had the thought that ',' could be generalized as an operator. E.g.,

obj, x

would mean

type(obj).__dict__['__comma__'](obj, x)

unless __comma__ was undefined. And you could have __rcomma__ for x, obj.
Returning the same type object would support multiple commas as in obj, x, y

Of course, you would get interesting effects in such contexts as print obj,x,y
and foo(obj,x,y) or even (obj,x,y) vs (obj,x,y,) vs ((obj,x,y),) ;-)

Probably sequence-building should have priority and be overridden by parenthesized
expression. So print obj,x would be effectively be print str(obj),x and
print (obj,x,y) would be print str(type(obj).__dict__['__comma__'](obj,x)).
Similarly for arg list formation.

Regards,
Bengt Richter
 
D

David Bolen

Grant Edwards said:
Only because Python lacks a logical xor operator, so you're
used to thinking of ^ as a bitwise operator. What if you saw

string1 xor string2?

Wouldn't you expect it to be equivalent to

(string1 and (not string2)) or ((not string1) and string2)

Yes, no problem. I was definitely working with the bitwise operator
which is what I thought the OP was originally desiring.
I don't know what you mean by that. Nobody seems to have a
problem with "and" "or" and "not" operators using the truth
values of strings. What is there about "xor" that precludes it
from behaving similarly?

I think we're just talking about different items. I was referring to
the bitwise (^) xor operator (numeric), but not to a logical xor
operator. I'd have no problem with a separate logical operator that
behaved like the other logical operators.

The fact that saying "xor" can imply either isn't helping :)

-- David
 
D

David Bolen

Grant Edwards said:
While I agree with your points, they're immaterial to the
argument I was making. The poster to which I responded was
arguing that "xor" didn't make sense because having it coerce
it's arguments to booleans was "wrong".

Well, actually I said:

"Logical operators work that way but not numerics ..."

so I'm not arguing against a logical "xor" working that way, but not
the existing xor (^) operation in Python, which is bitwise.

-- David
 
J

Jeremy Bowers

What's right about accepting 3^7 ? Why should xor be defined for integers?
IMO we have implicit subtyping of integers as vectors or column matrices
of bools and operations element by element and implicit re-presentation
of the result as integer.

I'm intrigued but torn by your arguments.

For positive numbers, a number really is its vector of bits. In the
mathematical sense of "equal" (which I usually express in English as "two
equal things are fully substitutable with each other in all relevant
contexts"), where all base numbers are written in base 10:

10 base 10 = 11 base 9 = 22 base 4 = 1010 base 2

The fact that it happens to really be a bit vector in the computer in this
case actually doesn't count for anything, because just as that bit vector
*is*, in every conceivable mathematical way, a base 10 number, it is also
a base 3 number. I can and have defined ternary math before, and
experimental computers have been built on ternary bases and ternary logic,
and 44 in a binary computer is 44 in a ternary computer.

Only our choice of negative numbers, practically speaking, makes a
difference between the number-as-bitstring and number-qua-number, and
something as high level as Python can conceptually use an extra "negative"
bit so even that distinction goes away. In fact, playing with xor'ing
longs makes me wonder if Python doesn't *already* do that.
What is boolvec(388488839405842L) ^ boolvec("hello")?

I'd rather see this as the fairly meaningless "388488839405842L base 2",
meaningless because the base of a number does not affect its value, and
bitwise-xor, as the name implies, operates on the number base two.

Since I reject the need to cast ^ in terms of boolvec, I don't feel
compelled to try to define "boolvec" for strings. Cast it into a number
explicitly, refusing the temptation to guess.
Except where there is an accepted legacy of such things being done already ;-)

BTW, what is the rationale behind this:
['',(),[],0, 0.0, 0L].count(0) 3
['',(),[],0, 0.0, 0L].count(()) 1
['',(),[],0, 0.0, 0L].count([]) 1
['',(),[],0, 0.0, 0L].count('') 1
['',(),[],0, 0.0, 0L].count(0.0) 3
['',(),[],0, 0.0, 0L].count(0L)
3

Again, considered abstractly as numbers, 0 is zero no matter how you slice
it. Abstractly, even floats have a binary representation, just as they
have a decimal representation, and we could even xor them. Realistically,
it is much less useful and there is the infinite-length problem of floats
to deal with.

Theoretically, no float should ever equal an int or a long, since an Int
or Long conceptually identifies exactly one point and a float should be
seen as representing a range of numbers depending on the precision that
can be made arbitrarily small but never truly identifies one point. This
is another one of those "practicality beats purity" things.
 
B

Bengt Richter

I'm intrigued but torn by your arguments.

For positive numbers, a number really is its vector of bits. In the
^^^^^[1] [2]^^ ^^^[3] ^^^^[4]
[1] Abstract, non-negative integers?
[2] "is" as in "is identical with"? I doubt it.
[3] In what sense does an abstract number posess (its) a vector of bits?
I take it 'its' is in the sense of a 1:1 mapping to the corresponding
(unique) vector of "bits".
[4] A two's complement representation of an (abstract) integer, has "bits"
as _numerical_ values in the sum series with corresponding numerical
coefficients that are powers of two. But "bits" in the context of
bitwise ^ or & or | do not represent abstract _numerical_ values or 0 or 1,
they represent logical values False or True (and 0<->False, 1<->True is
just a convention that we're used to.

mathematical sense of "equal" (which I usually express in English as "two
equal things are fully substitutable with each other in all relevant
contexts"), where all base numbers are written in base 10:

10 base 10 = 11 base 9 = 22 base 4 = 1010 base 2

The fact that it happens to really be a bit vector in the computer in this
[1]^^ [2]^^^^^^^^^ ^^^[3]
[1] The abstract integer value?
[2] have 1:1 mapping to?
[3] bit is short for binary _digit_, not binary logic-value -- i.e., it's numeric,
and ties in with your base 2 representation.
case actually doesn't count for anything, because just as that bit vector
*is*, in every conceivable mathematical way, a base 10 number, it is also
1:1 correspondence is not identity. IOW, there's no such thing as "a base 10 number"
There is a set of abstract integers, and each can be put into 1:1 correpondence with
a set of symbols and rules. A "base 10 number" is composed with a set of digits [0-9]
and a rule for ordered composition and a rule for an interpretation to map back to
the abstract value. We can discuss each part of this process, but the bottom line is
that when we say that represented values are equal, we are saying that representations
map back to the same unique abstract value, not that representations per se are identical.
a base 3 number. I can and have defined ternary math before, and
experimental computers have been built on ternary bases and ternary logic,
and 44 in a binary computer is 44 in a ternary computer.
Sure, but again we are talking unique abstract integers vs various representations
and the rules for composing them and interpreting them. BTW, did you ever hear
of bi-quinary? That was a 4-bit representation of a decimal digit where the msb
had a "coefficient" of 5 and the 3 lsb's counted in binary modulo 5 ;-)
Only our choice of negative numbers, practically speaking, makes a
difference between the number-as-bitstring and number-qua-number, and
something as high level as Python can conceptually use an extra "negative"
bit so even that distinction goes away. In fact, playing with xor'ing
longs makes me wonder if Python doesn't *already* do that.
Yes, it plays the as-if-extended-infinitely-to-the-left game with sign bits,
but the hex representation of negative longs is not much good for seeing what
the bits are in a negative number >;-(
I'd rather see this as the fairly meaningless "388488839405842L base 2",
meaningless because the base of a number does not affect its value, and
bitwise-xor, as the name implies, operates on the number base two.
Well, to be nit-picky, I still think it does not really operate on a number
at all. It converts a number to an ordered set of bools, by way of an intermediate
two's complement _representation_ whose _numeric_ bit values are mapped to bool values.
Then, after the operating with the ordered sets of bools, the resulting set is mapped
back to a two's complement integer representation again, which can be interpreted as
an abstract integer value again ;-)
Since I reject the need to cast ^ in terms of boolvec, I don't feel
compelled to try to define "boolvec" for strings. Cast it into a number
explicitly, refusing the temptation to guess.
Except where there is an accepted legacy of such things being done already ;-)

BTW, what is the rationale behind this:
['',(),[],0, 0.0, 0L].count(0) 3
['',(),[],0, 0.0, 0L].count(()) 1
['',(),[],0, 0.0, 0L].count([]) 1
['',(),[],0, 0.0, 0L].count('') 1
['',(),[],0, 0.0, 0L].count(0.0) 3
['',(),[],0, 0.0, 0L].count(0L)
3
I guess this shows what's happening
... def __cmp__(self, other): return 0
...
>>> ['',(),[],0, 0.0, 0L, joker()].count('') 2
>>> ['',(),[],0, 0.0, 0L, joker()].count(()) 2
>>> ['',(),[],0, 0.0, 0L, joker()].count(0) 4
>>> ['',(),[],0, 0.0, 0L, joker()].count(0.0) 4
>>> ['',(),[],0, 0.0, 0L, joker()].count(0L) 4
>>> ['',(),[],0, 0.0, 0L, joker()].count(joker())
7
Again, considered abstractly as numbers, 0 is zero no matter how you slice
it. Abstractly, even floats have a binary representation, just as they
I think of "float" as indicating representation technique, as opposed to "real"
which I think of as a point in the continuous abstract -+infinity interval
of real numbers.
have a decimal representation, and we could even xor them. Realistically,
it is much less useful and there is the infinite-length problem of floats
to deal with.
Infinite-length problem...must...not...go...there... ;-)
Theoretically, no float should ever equal an int or a long, since an Int
or Long conceptually identifies exactly one point and a float should be
seen as representing a range of numbers depending on the precision that
IMO a float should be seen as what it is (a _representation_), which may be
interpreted variously, just like integers ;-)

IOW, a typical float is a 64-bit IEEE-754 double, so there are 2**64
possible states of the representation machinery. Not all of
those states are legal floating point number representations, but each one
that _is_ legal corresponds to an _exact_ abstract real value. It _can_ be
the exact value you wanted to represent, or not, depending on your programming
problem. Inexactness in not a property of floating point representations, it is
a property of their _relation_ to the exact abstract values the programmer is trying
to represent. 0.0 represents the exact same abstract value as integer 0
(perhaps by way of mapping integers conceived as separate to a subset of reals).
can be made arbitrarily small but never truly identifies one point. This
I disagree. 0.0 truly identifies one point. So does

3.141592653589793115997963468544185161590576171875

which is the _exact_ value of math.pi in decimal representation.
But this is _not_ the exact value of the abstract mathematical pi.

But that is not a problem with floating point numbers' not representing
exact values. It's that the set of available exact values is finite, and
you have to choose one to approximate (including right on sometimes) an abstract value.
(Or you might choose two to represent _exact_ bounds on a value, for interval math).
is another one of those "practicality beats purity" things.
I think the fact that repr(math.pi) is not printed as above is
a case of "practicality beats purity" ;-)
True

IMO floats have their own kind of purity, besides being practical ;-)

Regards,
Bengt Richter
 
J

Jeremy Bowers

For positive numbers, a number really is its vector of bits. In the
^^^^^[1] [2]^^ ^^^[3] ^^^^[4]
[1] Abstract, non-negative integers?

Integers is more convenient here, but any one dimensional number will do.
It so happens that computers do not encode "reals" in the naive way, to
allow for more variation in magnitude.
[2] "is" as in "is identical with"? I doubt it.

With respect, this isn't something to doubt or not doubt. There is one,
and only one, way to represent any positive number in base two, since
encoding sign is not an issue. Assuming an extra bit to show sign, there
is one and only one way to represent any negative number, too. (Zero gets
to be the exception since then you can have positive and negative zero,
which are theoretically identical, but on some machines that implement
them have practical differences.)

Endianness is just some wackiness in the ordering, but does not really
change that. 60000 + 50000 = 110000 no matter what endianess you have.
[3] In what sense does an abstract number posess (its) a vector of bits?

It does not. It *is* a series of ones and zeros, in base two. It *is* a
series of ones, zeros, and twos, in base three. It *is* a series of [0-9]
in base ten. It is all of these things at once, and an infinite number of
other things besides.

1 + 1 fully *is* 2. There is no difference mathematically. If there is,
your math is broken.
[4] A two's complement representation of an (abstract) integer, has
"bits"
as _numerical_ values in the sum series with corresponding numerical
coefficients that are powers of two. But "bits" in the context of
bitwise ^ or & or | do not represent abstract _numerical_ values or
0 or 1, they represent logical values False or True (and 0<->False,
1<->True is just a convention that we're used to.

No. They represent both of those things at once. 1 *is* True, and 0 *is*
false, in the domain of bitwise operators.

There is no "either/or" here, only "both/and".
IOW, there's no such thing as "a base 10 number"... A "base 10 number" is composed with a set of
digits [0-9] and a rule for ordered composition and a rule for an
interpretation to map back to the abstract value.

Well, I thank you for supplying the missing definition, I suppose.
We can discuss each
part of this process, but the bottom line is that when we say that
represented values are equal, we are saying that representations map
back to the same unique abstract value, not that representations per se
are identical.

Yep, that is part of my point, see below.
Well, to be nit-picky, I still think it does not really operate on a
number at all. It converts a number to an ordered set of bools, by way
of an intermediate two's complement _representation_ whose _numeric_ bit
values are mapped to bool values. Then, after the operating with the
ordered sets of bools, the resulting set is mapped back to a two's
complement integer representation again, which can be interpreted as an
abstract integer value again ;-)

In the domain of binary logic, I do not concede a difference between "1"
and "True", on the grounds that there is no way in the domain of binary
logic to distinguish the two.
IOW, a typical float is a 64-bit IEEE-754 double, so there are 2**64
possible states of the representation machinery. Not all of those states
are legal floating point number representations, but each one that _is_
legal corresponds to an _exact_ abstract real value. It _can_ be the
exact value you wanted to represent, or not, depending on your
programming problem. Inexactness in not a property of floating point
representations, it is a property of their _relation_ to the exact
abstract values the programmer is trying to represent. 0.0 represents
the exact same abstract value as integer 0 (perhaps by way of mapping
integers conceived as separate to a subset of reals).

This is a matter of definition. My problem with your definition mostly
lies in the fact that it is not as practically useful as mine. Someone
operating on the "floats are almost, but not quite, exact values" is
*much* less well equipped to handle the way floats bite you than someone
who operates with "floats represent a value and have an error range around
that value", because then they should understand they need to compute not
just with the values, but track the buildup of that error range as well.
It's a point-of-view thing.
I disagree. 0.0 truly identifies one point. So does

3.141592653589793115997963468544185161590576171875

which is the _exact_ value of math.pi in decimal representation. But
this is _not_ the exact value of the abstract mathematical pi.

Python 2.3.4 (#1, Sep 28 2004, 20:15:25)
[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.True

Look at the last digits. (Python is just a convenience here, this is
fundamental to all float representation schemes, as opposed to things like
Decimal.)

This is why I say they represent a range. In set theory, we'd call this an
equivalence class for equality. There is *no* way to distinguish between
3.1415926535897931159979634685441851615905761718758327502398752309 and
3.14159265358979311599796346854418516159057617187500000000000 with floats
(of the size Python uses, add more digits for some other bit count), not
even in theory. They are the same in every way. That's because the closest
float represents a range that covers them both. You can shrink that range
arbitrarily small but you can not get to a unique point, such that only
one true real will compare equal to that number and all others do not, not
even in theory.
 
G

Grant Edwards

With respect, this isn't something to doubt or not doubt.
There is one, and only one, way to represent any positive
number in base two, since encoding sign is not an issue.
Assuming an extra bit to show sign, there is one and only one
way to represent any negative number, too.

That's news to me. I've used three different base-2
representations for negative numbers in the past week, and I
can think of at least one other one I've used in the past.
(Zero gets to be the exception since then you can have
positive and negative zero,

That depends on which base-2 representation you've chosen. In
two's compiliment and excess-N representations, there is only
one zero value. In signed-magnitude there may be two.
 
A

Alex Martelli

Andrew Dalke said:
Grant Edwards:

BTW, another way to get that is

if bool(str1) + bool(str2) == 1:
print "one and only one of them was empty"

If this is getting into a many-ways-to-skin-the-cat context, I want to
make sure I get my entry in:

[str1, str2].count('') == 1

Other possibilities (I think -- untested, and it's late here):

len([x for x in (str1, str2) if x]) == 1

'' == min(str1, str2) != max(str1, str2)

len(str1+str2) == max(len(str1),len(str2)) != 0

sum(map(bool, (str1, str2))) == 1

not str1 ^ not str2

(str1 or str2) and not min(str1, str2)

not str1 != not str2

str1 != str2 and (str1+str2) == (str2+str1)

I guess I'd better stop here because I can't actually _prove_ the last
one of these actually implies one of the strings is empty but I can't
find counterexamples either...!-)


Alex
 
A

Alex Martelli

Jeremy Bowers said:
The basic problem is that there is no obvious "xor on string" operation
*in general*, even if you stipulate "bitwise". In particular, what does it
mean to bitwise xor two different length strings?

What I, personally, would expect here, would be to leave the trailing
part of the longer string "untouched". I.e., just like:

def bitwisexor(s1, s2):
result = [ chr(ord(a) ^ ord(b)) for a, b in zip(s1, s2) ]
# at most one of the following 2 stmts actually does anything:
result.extend(s1[len(result):])
result.extend(s2[len(result):])

return ''.join(result)

I realize this is the same as imagining the shorter string to be
conceptually padded with as many '\x0' characters as needed -- that just
seems the only "intuitively natural" way to extend "sequence xor'ing" to
sequences of different length...


Alex
 
A

Alex Martelli

Grant Edwards said:
used to thinking of ^ as a bitwise operator. What if you saw

string1 xor string2?

Wouldn't you expect it to be equivalent to

(string1 and (not string2)) or ((not string1) and string2)

Personally, I would not expect 'xor' to be non-commutative, and yet the
expression YOU apparently would expect it to be equivalent to obviousy
is...:

'foo' xor ''

would be True, while

'' xor 'foo'

would be 'foo'. At least, that's how the "equivalent to" expression you
give evaluates, I believe. The problem is that "not x" is a bool for
any x, while "x and y" isn't.

If I had to design the least-astonishing definition of xor in Python, I
would probably have to settle for a mix of the type-producing behavior
of and/or (always return one of the two operands, a wonderful thing) vs
not (always return a bool, and it's hard to imagine it doing otherwise).
"a xor b" would return one of the two operands if exactly one of them
evaluates to true in a boolean context, but False if both or neither do
(sigh).
E.g.,

def xor(a, b):
if (a and b) or (not a and not b): return False
else: return a or b

Unfortunately, it seems to me that his unholy admixture of return types
(hard to avoid...), as well as the inevitable lack of short-circuiting,
makes such a hypothetical 'xor' _WAY_ less useful than the existing
'and'/'or' operators are:-(


Alex
 
B

Bengt Richter

For positive numbers, a number really is its vector of bits. In the
^^^^^[1] [2]^^ ^^^[3] ^^^^[4]
[1] Abstract, non-negative integers?

Integers is more convenient here, but any one dimensional number will do.
It so happens that computers do not encode "reals" in the naive way, to
allow for more variation in magnitude.
True. But my point (just about everywhere in the post) was to draw attention
to the distinction between abstract entities (e.g. numbers) and concrete representations.
[2] "is" as in "is identical with"? I doubt it.

With respect, this isn't something to doubt or not doubt. There is one,
and only one, way to represent any positive number in base two, since
I think we are focusing on different things. When you say "a number really is
its vector of bits" I think wait: a number is a number and a vector of bits
is a vector of bits, but you can't say one _is_ the other. You can say they
correspond in some way. When you say of a number "its vector of bits" I presume
that "its" refers to "its" partner in the number<->bit vector relationship,
not in the sense that an aggregate has its components. To me, a (one-dimensional)
number is a primitive entity, and "has" no parts, though it may correspond to
a representation that does have parts and order etc.

I am hard put to think of anything that is _identical_ with any of the
possible representations that it may be put in correspondence with.
encoding sign is not an issue. Assuming an extra bit to show sign, there
is one and only one way to represent any negative number, too. (Zero gets
to be the exception since then you can have positive and negative zero,
which are theoretically identical, but on some machines that implement
them have practical differences.)

Endianness is just some wackiness in the ordering, but does not really
change that. 60000 + 50000 = 110000 no matter what endianess you have.
Wackiness must be accounted for ;-)
[3] In what sense does an abstract number posess (its) a vector of bits?

It does not. It *is* a series of ones and zeros, in base two. It *is* a
series of ones, zeros, and twos, in base three. It *is* a series of [0-9]
in base ten. It is all of these things at once, and an infinite number of
other things besides.
Depends on what the meaning of "is" is ;-) Your "is" seems to mean
"can be represented by" in my terms. That's not "is" in my book ;-)
1 + 1 fully *is* 2. There is no difference mathematically. If there is,
your math is broken.
You are apparently ignoring representation and just thinking in terms of
what is indicated. So yes, '1 + 1' represents something and '2' represents
something, and for some getabstractvalue, getabstractvalue('1 + 1') _is_
getabstractvalue('2'). But whatever your representation, there is that
function that maps to the abstract mathematical value.
[4] A two's complement representation of an (abstract) integer, has
"bits"
as _numerical_ values in the sum series with corresponding numerical
coefficients that are powers of two. But "bits" in the context of
bitwise ^ or & or | do not represent abstract _numerical_ values or
0 or 1, they represent logical values False or True (and 0<->False,
1<->True is just a convention that we're used to.

No. They represent both of those things at once. 1 *is* True, and 0 *is*
false, in the domain of bitwise operators.
ISTM we have different concepts of "is" ;-)
There is no "either/or" here, only "both/and".
IOW, there's no such thing as "a base 10 number"... A "base 10 number" is composed with a set of
digits [0-9] and a rule for ordered composition and a rule for an
interpretation to map back to the abstract value.

Well, I thank you for supplying the missing definition, I suppose.
Well, I made up an ad hoc definition to try to express my thinking on it ;-)
Yep, that is part of my point, see below.


In the domain of binary logic, I do not concede a difference between "1"
and "True", on the grounds that there is no way in the domain of binary
logic to distinguish the two.
It's a matter of interpretation. I view 1 and True as different, sort of like
[QUOTE= said:
IOW, a typical float is a 64-bit IEEE-754 double, so there are 2**64
possible states of the representation machinery. Not all of those states
are legal floating point number representations, but each one that _is_
legal corresponds to an _exact_ abstract real value. It _can_ be the
exact value you wanted to represent, or not, depending on your
programming problem. Inexactness in not a property of floating point
representations, it is a property of their _relation_ to the exact
abstract values the programmer is trying to represent. 0.0 represents
the exact same abstract value as integer 0 (perhaps by way of mapping
integers conceived as separate to a subset of reals).

This is a matter of definition. My problem with your definition mostly
lies in the fact that it is not as practically useful as mine. Someone
operating on the "floats are almost, but not quite, exact values" is[/QUOTE]
I didn't say that. I said they do represent _exact_ values, and that you
can use the available exact values as you like, including intepreting them
as tokens representing an interval on the real axis containing all the values
that will round to the exact value in question.
*much* less well equipped to handle the way floats bite you than someone
who operates with "floats represent a value and have an error range around
that value", because then they should understand they need to compute not
just with the values, but track the buildup of that error range as well.
It's a point-of-view thing.
I disagree. 0.0 truly identifies one point. So does

3.141592653589793115997963468544185161590576171875

which is the _exact_ value of math.pi in decimal representation. But
this is _not_ the exact value of the abstract mathematical pi.

Python 2.3.4 (#1, Sep 28 2004, 20:15:25)
[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.True
Two different numbers were rounded to the same exact number ;-)
Look at the last digits. (Python is just a convenience here, this is
fundamental to all float representation schemes, as opposed to things like
Decimal.)

This is why I say they represent a range. In set theory, we'd call this an
I agree they can be used to _represent_ a range. The range (actually a half-open
continuous interval I think) will be defined in terms of the _exact_ value
and the neighboring _exact_ values available in the set of values with concrete
representations in the particular hardware.
equivalence class for equality. There is *no* way to distinguish between
3.1415926535897931159979634685441851615905761718758327502398752309 and
3.14159265358979311599796346854418516159057617187500000000000 with floats
(of the size Python uses, add more digits for some other bit count), not
even in theory. They are the same in every way. That's because the closest
float represents a range that covers them both. You can shrink that range
Now tell me _exactly_ what that "range" is. I know it's a messy algorithm,
but the intevals are _exactly_ defined, so where will the _exact_ mathematical
values of the limits come from? ;-) (we'll ignore fpu rounding modes ;-)
arbitrarily small but you can not get to a unique point, such that only
one true real will compare equal to that number and all others do not, not
even in theory.
Well, yes, we can only create finite sets of representations ;-)
But you can "get to a unique point" by switching interpretation. I.e.,
if you take the floating point number's exact value as such, instead of
using that value and neighbors as a means to define an interval, then
you do have a unique point. E.g., all the exact values of a 53-bit integer
can be represented _exactly_ by a typical 64-bit double. The floating point
representation is no less exact than the corresponding int or long.

Regards,
Bengt Richter
 
A

Alex Martelli

Bernhard Herzog said:
These give syntax errors, actually. I wouldn't have expected that
either, though :)

Yep, you need parentheses, as in (not str1) &c.
Counter example:

Yep, excellent counterexample, thanks. So let's try instead:

(str1+str2==str2!=str1) or (str2+str1==str1!=str2)

I _do_ really want something based on concatenation equaling one of the
strings iff the other's empty...


Alex
 
A

Alex Martelli

Grant Edwards said:
Probably so, but that doesn't support the arguement that
there's something wrong with a logical xor argument coercing
it's operands to boolean values unless one also argues that
the logical "and", "or" and "not" operators should also not
coerce their operands to booleans.

One of the greatest sources of usefulness for 'and' and 'or' is exactly
that these operators _don't_ coerce their operands -- they always return
one of the objects that are given as their operands, never any
'coercion' of it. This lets one code, e.g.,

k, v = (onedict or another).popitem()

instead of

if onedict:
k, v = onedict.popitem()
else:
k, v = another.popitem()

Operator 'xor' couldn't possibly ensure this useful behavior, alas.


Alex
 
G

Grant Edwards

One of the greatest sources of usefulness for 'and' and 'or' is exactly
that these operators _don't_ coerce their operands

Surely they must coerce the operands for the "test" part if not
for the "return" part.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,141
Latest member
BlissKeto
Top