"Byte" type?

John Nagle

With "bytearray", the element type is considered to be "unsigned byte",
or so says PEP 3137: "The element data type is always 'B' (i.e. unsigned byte)."

Let's try:

Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
>>> xx = b'x'
>>> repr(xx)
"'x'"
>>> repr(xx[0])
"'x'"
>>> repr(xx[0][0])
"'x'"
>>>

But that's not what "repr" indicates. The bytearray element is apparently
being promoted to "bytes" as soon as it comes out of the array.

John Nagle
 
Steve Holden

Erik said:
John said:
With "bytearray", the element type is considered to be "unsigned
byte",
or so says PEP 3137: "The element data type is always 'B' (i.e.
unsigned byte)."

Let's try:

Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
>>> xx = b'x'
>>> repr(xx)
"'x'"
>>> repr(xx[0])
"'x'"
>>> repr(xx[0][0])
"'x'"

But that's not what "repr" indicates. The bytearray element is
apparently
being promoted to "bytes" as soon as it comes out of the array.

There's no distinct byte type. A single element of a bytes object is
also a bytes object.
Beware, also, that in 2.6 the "bytes" type is essentially an ugly hack
to enable easier forward compatibility with the 3.X series ...

regards
Steve
 
Benjamin Peterson

Steve Holden said:
Beware, also, that in 2.6 the "bytes" type is essentially an ugly hack
to enable easier forward compatibility with the 3.X series ...

It's not an ugly hack. It just isn't all that you might hope it'd live up to be.
 
Steve Holden

Benjamin said:
It's not an ugly hack. It just isn't all that you might hope it'd live up to be.
I take it back
It's not an ugly hack
It's just an aliased type

regards
Steve
 
John Nagle

Benjamin said:
It's not an ugly hack. It just isn't all that you might hope it'd live up to be.

The semantics aren't what the naive user would expect. One would
expect an element of a bytearray to be a small integer. But instead,
it has string-like behavior. "+" means concatenate, not add.
The bit operators don't work at all.

Python 2.6.1 ...
>>> a = b'A'
>>> b = b'B'
>>> a+b
'AB'
>>> a[0]+b[0]
'AB'
>>> a & b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for &: 'str' and 'str'

Given that the intent of bytearray is that it's a data type for
handling raw binary data of unknown format, one might expect it to behave like
"array.array('B')", which is an array of unsigned bytes that are
treated as integers. But that's not how "bytearray" works. "bytearray"
is more like the old meaning of "str", before Unicode support, circa Python 2.1.

I sort of understand the mindset, but the documentation needs to be improved.
Right now, we have a few PEPs and the 2.6 "New features" article, but
no comprehensive documentation. The relationship between "str", "unicode",
"bytearray", "array.array('B')", and integers, and how this changes from
version to version of Python, needs to be made clearer, or conversion
to 2.6/3.0 will not happen rapidly.

John Nagle
 
Mark Tolonen

John Nagle said:
Benjamin said:
It's not an ugly hack. It just isn't all that you might hope it'd live up
to be.

The semantics aren't what the naive user would expect. One would
expect an element of a bytearray to be a small integer. But instead,
it has string-like behavior. "+" means concatenate, not add.
The bit operators don't work at all.

Python 2.6.1 ...
>>> a = b'A'
>>> b = b'B'
>>> a+b
'AB'
>>> a[0]+b[0]
'AB'
>>> a & b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for &: 'str' and 'str'

Given that the intent of bytearray is that it's a data type for
handling raw binary data of unknown format, one might expect it to behave
like
"array.array('B')", which is an array of unsigned bytes that are
treated as integers. But that's not how "bytearray" works. "bytearray"
is more like the old meaning of "str", before Unicode support, circa
Python 2.1.

It *is* the old meaning of str. It isn't a bytearray object in 2.6.X (and
it isn't a bytearray object in 3.X either, but a bytes object):

Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)]
on win32
Type "help", "copyright", "credits" or "license" for more information.
'x'

As Steve said, it is just an aliased type. In 3.X it is really a bytes
object:

Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) [MSC v.1500 32 bit (Intel)]
on win32
Type "help", "copyright", "credits" or "license" for more information.
120
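
The difference described above can be checked directly. This is only a minimal sketch, assuming a Python 3 interpreter (under 2.6, where bytes is an alias for str, indexing b'x' gives the one-character string 'x' instead). The variable names are illustrative and not from the original post.

```python
from array import array

data = b'x'                # immutable bytes object in Python 3
first = data[0]            # indexing yields a small int, not a length-1 bytes

buf = bytearray(b'AB')     # mutable counterpart; elements are also small ints
arr = array('B', b'AB')    # array of unsigned bytes, for comparison

print(type(first).__name__, first)   # element type and value of b'x'[0]
print(buf[0] + buf[1])               # integer addition, not concatenation
print(buf[0] & buf[1])               # bit operators work on the elements
print(arr[0] == buf[0])              # array('B') and bytearray agree
```

In this sketch the elements of bytes, bytearray and array.array('B') all behave as integers, which is the behaviour John expected to see.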

-Mark
 
John Nagle

Because b'x' is NOT a bytearray. It is a bytes object. When you actually use
a bytearray, it behaves like you expect.
type(b'x')
ba = bytearray(b'abc')
ba[0] + ba[1]
195

That's indeed how Python 2.6 works. But that's not how
PEP 3137 says it's supposed to work.

Guido:

"I propose the following type names at the Python level:

* bytes is an immutable array of bytes (PyString)
* bytearray is a mutable array of bytes (PyBytes)"

....

"Indexing bytes and bytearray returns small ints (like the bytes type in
3.0a1, and like lists or array.array('B'))."
(Not true in Python 2.6 - indexing a "bytes" object returns a "bytes"
object with length 1.)

"b1 + b2: concatenation. With mixed bytes/bytearray operands, the return type is
that of the first argument (this seems arbitrary until you consider how += works)."
(Not true in Python 2.6 - concatenation returns a bytearray in both cases.)
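
The mixed-operand rule quoted from the PEP can be tried out as follows. This is a sketch that assumes a Python 3 interpreter, where bytes and bytearray are separate types; the variable names are made up for illustration.

```python
immutable = b'ab'
mutable = bytearray(b'cd')

# The result takes the type of the first (left-hand) operand.
left_bytes = immutable + mutable
left_bytearray = mutable + immutable

print(type(left_bytes).__name__, left_bytes)
print(type(left_bytearray).__name__, left_bytearray)

# The "+=" case the PEP alludes to: the accumulator keeps its own type.
acc = bytearray(b'ab')
acc += b'cd'
print(type(acc).__name__, acc)
```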

Is this a bug, a feature, a documentation error, or bad design?

John Nagle
 
Steve Holden

John said:
Because b'x' is NOT a bytearray. It is a bytes object. When you
actually use
a bytearray, it behaves like you expect.
type(b'x')
type(bytearray(b'x'))
ba = bytearray(b'abc')
ba[0] + ba[1]
195

That's indeed how Python 2.6 works. But that's not how
PEP 3137 says it's supposed to work.

Guido:

"I propose the following type names at the Python level:

* bytes is an immutable array of bytes (PyString)
* bytearray is a mutable array of bytes (PyBytes)"

...

"Indexing bytes and bytearray returns small ints (like the bytes type in
3.0a1, and like lists or array.array('B'))."
(Not true in Python 2.6 - indexing a "bytes" object returns a "bytes"
object with length 1.)

"b1 + b2: concatenation. With mixed bytes/bytearray operands, the return
type is
that of the first argument (this seems arbitrary until you consider how
+= works)."
(Not true in Python 2.6 - concatenation returns a bytearray in both cases.)

Is this a bug, a feature, a documentation error, or bad design?
It's a feature. In fact all that was done to accommodate easier
migration to 3.x is easily shown in one statement:

So that's why bytes works the way it does in 2.6 ... hence my contested
description of it as an "ugly hack". I am happy to withdraw "ugly", but
I think "hack" could still be held to apply.

regards
Steve
 
John Nagle

Steve said:
John said:
Benjamin said:
It's a feature. In fact all that was done to accommodate easier
migration to 3.x is easily shown in one statement:

True

So that's why bytes works the way it does in 2.6 ... hence my contested
description of it as an "ugly hack". I am happy to withdraw "ugly", but
I think "hack" could still be held to apply.

Agreed. But is this a 2.6 thing, making 2.6 incompatible with 3.0, or
what? How will 3.x do it? The PEP 3137 way, or the Python 2.6 way?

The way it works in 2.6 makes it necessary to do "ord" conversions
where they shouldn't be required.
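
One way to sidestep explicit "ord" calls is to go through bytearray, whose elements are integers. A minimal sketch, assuming the input is a bytes or bytearray object; the helper name byte_values is hypothetical and not part of any library.

```python
def byte_values(data):
    """Return the integer value of every byte in data.

    Wrapping the input in bytearray gives integer elements, so no
    per-element ord() call is needed.
    """
    return list(bytearray(data))


print(byte_values(b'AB'))
print(byte_values(bytearray(b'\x00\xff')))
```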

John Nagle
 
Steve Holden

John said:
Steve said:
John said:
Benjamin Kaplan wrote:


Agreed. But is this a 2.6 thing, making 2.6 incompatible with 3.0, or
what? How will 3.x do it? The PEP 3137 way, or the Python 2.6 way?

The way it works in 2.6 makes it necessary to do "ord" conversions
where they shouldn't be required.
Yes, the hack was to achieve a modicum of compatibility with 3.0 without
having to turn the world upside down.

I haven't used 3.0 enough to say whether bytearray has been correctly
implemented. But I believe the intention is that 3.0 should fully
implement PEP 3137.

regards
Steve
 
John Nagle

Steve said:
John said:
Yes, the hack was to achieve a modicum of compatibility with 3.0 without
having to turn the world upside down.

I haven't used 3.0 enough to say whether bytearray has been correctly
implemented. But I believe the intention is that 3.0 should fully
implement PEP 3137.

If "bytes", a new keyword, works differently in 2.6 and 3.0, that was really
dumb. There's no old code using "bytes". So converting code to 2.6 means
it has to be converted AGAIN for 3.0. That's a good reason to ignore 2.6 as
defective.

John Nagle
 
Hendrik van Rooyen

Christian Heimes said:
John Nagle wrote

Please don't call something dumb that you don't fully understand. It
offends the people who have spent lots of time developing Python --
personal, unpaid and voluntary time!

Crying out, "Please do not criticise me, I am doing it for free!" does
not justify delivering substandard work - that is the nature of the
open source process - if you lift your head and say or do something,
there are bound to be some objections - some thoughtful and valid,
and others merely carping. Being sensitive about it serves no purpose.
I can assure you, the bytes alias and b'' alias have their right to exist.

This is not a helpful response - on the surface JN has a point - If
you have to go through two conversions, then 2.6 does not achieve
what it appears to set out to do. So the issue is simple:

- do you have to convert twice?
- If yes - why? - as he says - there exists no prior code,
so there seems to be no reason not to make it identical
to 3.0

The response answers neither of these valid concerns.

- Hendrik
 
Matthew Woodcraft

Hendrik van Rooyen said:
"Christian Heimes" <lis....s.de> wrote:
on the surface JN has a point - If you have to go through two
conversions, then 2.6 does not achieve what it appears to set out to
do. So the issue is simple:
- do you have to convert twice?
- If yes - why? - as he says - there exists no prior code,
so there seems to be no reason not to make it identical
to 3.0

You don't have to convert twice. You don't have to use 'bytes' in 2.6 at
all. It's there in 2.6 to make some strategies for transition to 3.x
easier.

Note that 'bytes' is not (as JN asserted) a keyword, so its inclusion
won't break existing programs which were using it as an identifier.
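
The keyword point is easy to verify with the standard keyword module. This is a sketch only; the checksum function is a hypothetical example of older code that happens to use 'bytes' as an identifier.

```python
import keyword

# 'bytes' is a builtin name, not a reserved word.
print(keyword.iskeyword('bytes'))
print(keyword.iskeyword('def'))


def checksum(bytes):
    """Sum the byte values modulo 256; the parameter shadows the builtin."""
    return sum(bytearray(bytes)) % 256


print(checksum(b'AB'))
```

Because the name is only shadowed inside the function, code like this keeps working after 'bytes' was added as a builtin.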

-M-
 
Martin v. Löwis

Please don't call something dumb that you don't fully understand. It's
Crying out; "Please do not criticise me, I am doing it for free!" does
not justify delivering sub standard work - that is the nature of the
open source process - if you lift your head and say or do something,
there are bound to be some objections - some thoughtful and valid,
and others merely carping. Being sensitive about it serves no purpose.

Still, John *clearly* doesn't understand what he observes, so asking him
not to draw conclusions until he does understand is not defending
against criticism.
This is not a helpful response - on the surface JN has a point - If
you have to go through two conversions, then 2.6 does not achieve
what it appears to set out to do. So the issue is simple:

- do you have to convert twice?

Depends on how you write your code. If you use the bytearray type
(which John didn't, despite his apparent belief that he did),
then no additional conversion is needed.

Likewise, if you only use byte (not bytearray) literals, without
accessing individual bytes (e.g. if you only ever read and write
them, or pass them to the struct module), 2to3 will do the right
thing.
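
As an illustration of the "read, write, or pass to struct" style, here is a sketch that never indexes individual bytes. The format string and the 'HDR' prefix are invented for the example and do not come from the thread.

```python
import struct

# Pack a 16-bit unsigned value and an 8-bit unsigned value, little-endian.
packed = struct.pack('<HB', 0x1234, 7)

# Byte literals are only concatenated and handed to struct, never indexed,
# so the int-versus-str element question does not come up.
message = b'HDR' + packed

value, flag = struct.unpack('<HB', message[3:])
print(value, flag)
```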
- If yes - why? - as he says - there exists no prior code,
so there seems to be no reason not to make it identical
to 3.0

Sure there is. Making the bytes type and the str type identical
in 2.x gives the easiest way of porting. Adding bytes as a separate
type would have complicated a lot of things.

Regards,
Martin
 
John Nagle

Some of the people involved are on Google's payroll.
Still, John *clearly* doesn't understand what he observes, so asking him
not to draw conclusions until he does understand is not defending
against criticism.


Depends on how you write your code. If you use the bytearray type
(which John didn't, despite his apparent belief that he did),
then no additional conversion is needed.

According to PEP 3137, there should be no distinction between
the two for read purposes. In 2.6, there is. That's a bug.
Likewise, if you only use byte (not bytearray) literals, without
accessing individual bytes (e.g. if you only ever read and write
them, or pass them to the struct module), 2to3 will do the right
thing.


Sure there is. Making the bytes type and the str type identical
in 2.x gives the easiest way of porting. Adding bytes as a separate
type would have complicated a lot of things.

Regards,
Martin

No, it's broken. PEP 3137 says one thing, and the 2.6 implementation
does something else. So code written for 2.6 won't be ready for 3.0.
This defeats the supposed point of 2.6.

John Nagle
 
Martin v. Löwis

Depends on how you write your code. If you use the bytearray type
According to PEP 3137, there should be no distinction between
the two for read purposes. In 2.6, there is. That's a bug.

No. Python 2.6 doesn't implement PEP 3137, and the PEP doesn't claim
that it would, nor do the 2.6 release notes. So that it deviates from
PEP 3137 is not a bug.
No, it's broken. PEP 3137 says one thing, and the 2.6 implementation
does something else. So code written for 2.6 won't be ready for 3.0.
This defeats the supposed point of 2.6.

That's not true: if I write

    if isinstance(x, bytes):
        one_thing()
    elif isinstance(x, unicode):
        another_thing()

then 2to3 will convert it perfectly. 2to3 couldn't have done the
conversion correctly had I written

    if isinstance(x, str):
        one_thing()
    elif isinstance(x, unicode):
        another_thing()

So the introduction of the bytes builtin *does* help the supposed
point of 2.6, even though it doesn't help implementing PEP 3137.
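
For reference, a sketch of what the first snippet might look like once converted for 3.x, assuming 2to3 leaves 'bytes' untouched and renames 'unicode' to 'str'. The function name describe and its return strings are hypothetical stand-ins for one_thing() and another_thing().

```python
def describe(x):
    """Dispatch on binary data versus text, in Python 3 spelling."""
    if isinstance(x, bytes):
        return 'binary data'      # was the bytes branch in the 2.6 source
    elif isinstance(x, str):      # was 'unicode' in the 2.6 source
        return 'text'
    return 'other'


print(describe(b'abc'))
print(describe('abc'))
```

Had the 2.6 source tested for 'str' in the first branch, both branches would end up testing the same type after the rename, which is the ambiguity being pointed out.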

Regards,
Martin
 
Paddy O'Loughlin

2009/2/24 John Nagle said:
  Some of the people involved are on Google's payroll.

Uh, what does that have to do with anything?
It would only be relevant if you are saying that Google is paying them
to do the work (so not just "on their payroll").

More importantly, it's also only relevant if ALL the people
contributing are being paid by Google to do the work, which I'm pretty
sure is not the case.

There are people spending lots of personal, unpaid and voluntary
time developing Python.

Paddy
 
