"Byte" type?

Lie Ryan · Feb 15, 2009

Isn't this creating a regular byte?

Shouldn't creation of bytearray be:

Chris Rebert · Feb 15, 2009

Isn't this creating a regular byte?

Shouldn't creation of bytearray be:

Indeed, and indexing that does give back a single byte (which Python
represents as an integer):

>>> b = bytearray(b'abc')
>>> b[0]
97

Cheers,
Chris
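
A minimal sketch of the distinction, assuming Python 3 syntax for the print calls (the variable names are illustrative only): indexing a bytearray gives an integer, while a one-element slice gives another bytearray.

```python
# Indexing a bytearray yields an int; slicing yields another bytearray.
ba = bytearray(b"abc")

first = ba[0]     # single element: an integer in range(256)
head = ba[0:1]    # one-element slice: still a bytearray

print(type(first).__name__, first)
print(type(head).__name__, head)
```

The same indexing result holds for bytearray on 2.6; only the print syntax here is specific to 3.x.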

John Nagle · Feb 15, 2009

With "bytearray", the element type is considered to be "unsigned byte",
or so says PEP 3137: "The element data type is always 'B' (i.e. unsigned byte)."

Let's try:

Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32

>>> xx = b'x'
>>> repr(xx)
"'x'"
>>> repr(xx[0])
"'x'"
>>> repr(xx[0][0])
"'x'"

But that's not what "repr" indicates. The bytearray element is apparently
being promoted to "bytes" as soon as it comes out of the array.

John Nagle

Steve Holden · Feb 15, 2009

Erik said:
John said:

With "bytearray", the element type is considered to be "unsigned byte",
or so says PEP 3137: "The element data type is always 'B' (i.e. unsigned byte)."

Let's try:

Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32

>>> xx = b'x'
>>> repr(xx)
"'x'"
>>> repr(xx[0])
"'x'"
>>> repr(xx[0][0])
"'x'"

But that's not what "repr" indicates. The bytearray element is
apparently
being promoted to "bytes" as soon as it comes out of the array.

There's no distinct byte type. A single character of a bytes object is
also a bytes object.

Beware, also, that in 2.6 the "bytes" type is essentially an ugly hack
to enable easier forward compatibility with the 3.X series ...

regards
Steve

Benjamin Peterson · Feb 15, 2009

Steve Holden said:
Beware, also, that in 2.6 the "bytes" type is essentially an ugly hack
to enable easier forward compatibility with the 3.X series ...

It's not an ugly hack. It just isn't all that you might hope it'd live up to be.

Steve Holden · Feb 15, 2009

Benjamin said:
It's not an ugly hack. It just isn't all that you might hope it'd live up to be.

I take it back
It's not an ugly hack
It's just an aliased type

regards
Steve

John Nagle · Feb 15, 2009

Benjamin said:
It's not an ugly hack. It just isn't all that you might hope it'd live up to be.

The semantics aren't what the naive user would expect. One would
expect an element of a bytearray to be a small integer. But instead,
it has string-like behavior. "+" means concatenate, not add.
The bit operators don't work at all.

Python 2.6.1 ...

>>> a = b'A'
>>> b = b'B'
>>> a+b
'AB'
>>> a[0]+b[0]
'AB'
>>> a & b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for &: 'str' and 'str'

Given that the intent of bytearray is that it's a data type for
handling raw binary data of unknown format, one might expect it to behave like
"array.array('B')", which is an array of unsigned bytes that are
treated as integers. But that's not how "bytearray" works. "bytearray"
is more like the old meaning of "str", before Unicode support, circa Python 2.1.

I sort of understand the mindset, but the documentation needs to be improved.
Right now, we have a few PEPs and the 2.6 "New features" article, but
no comprehensive documentation. The relationship between "str", "unicode",
"bytearray", "array.array('B')", and integers, and how this changes from
version to version of Python, needs to be made clearer, or conversion
to 2.6/3.0 will not happen rapidly.

John Nagle
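
For comparison, a short sketch of array.array('B') next to an actual bytearray object (not a b'' literal). The variable names are illustrative, and the snippet is written for Python 3, though the element behaviour of bytearray is the same on 2.6.

```python
from array import array

raw = b"AB"
as_array = array("B", raw)      # unsigned-byte array from the array module
as_bytearray = bytearray(raw)   # mutable byte sequence

# Elements of both are plain integers, so arithmetic and bit operators apply.
total = as_bytearray[0] + as_bytearray[1]    # 65 + 66
masked = as_bytearray[0] & as_bytearray[1]   # 0x41 & 0x42

# "+" on the containers themselves still means concatenation.
joined = as_bytearray + as_bytearray
```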

Mark Tolonen · Feb 15, 2009

John Nagle said:
Benjamin said:

It's not an ugly hack. It just isn't all that you might hope it'd live up
to be.

The semantics aren't what the naive user would expect. One would
expect an element of a bytearray to be a small integer. But instead,
it has string-like behavior. "+" means concatenate, not add.
The bit operators don't work at all.

Python 2.6.1 ...

>>> a = b'A'
>>> b = b'B'
>>> a+b
'AB'
>>> a[0]+b[0]
'AB'
>>> a & b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for &: 'str' and 'str'

Given that the intent of bytearray is that it's a data type for
handling raw binary data of unknown format, one might expect it to behave like
"array.array('B')", which is an array of unsigned bytes that are
treated as integers. But that's not how "bytearray" works. "bytearray"
is more like the old meaning of "str", before Unicode support, circa Python 2.1.

It *is* the old meaning of str. It isn't a bytearray object in 2.6.X (and
it isn't a bytearray object in 3.X either, but a bytes object):

Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> type(b'x')
<type 'str'>
>>> b'x'[0]
'x'

As Steve said, it is just an aliased type. In 3.X it is really a bytes
object:

Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> type(b'x')
<class 'bytes'>
>>> b'x'[0]
120

-Mark

John Nagle · Feb 15, 2009

Because b'x' is NOT a bytearray. It is a bytes object. When you actually use
a bytearray, it behaves like you expect.

>>> type(b'x')
<type 'str'>
>>> ba = bytearray(b'abc')
>>> ba[0] + ba[1]
195

That's indeed how Python 2.6 works. But that's not how
PEP 3137 says it's supposed to work.

Guido:

"I propose the following type names at the Python level:

* bytes is an immutable array of bytes (PyString)
* bytearray is a mutable array of bytes (PyBytes)"

....

"Indexing bytes and bytearray returns small ints (like the bytes type in
3.0a1, and like lists or array.array('B'))."
(Not true in Python 2.6 - indexing a "bytes" object returns a "bytes"
object with length 1.)

"b1 + b2: concatenation. With mixed bytes/bytearray operands, the return type is
that of the first argument (this seems arbitrary until you consider how += works)."
(Not true in Python 2.6 - concatenation returns a bytearray in both cases.)

Is this a bug, a feature, a documentation error, or bad design?

John Nagle
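
For reference, a small sketch of the two quoted PEP 3137 rules as they behave on Python 3 (the variable names are illustrative only):

```python
b1 = b"ab"               # immutable bytes
b2 = bytearray(b"cd")    # mutable bytearray

elem = b1[0]             # indexing bytes gives a small int on 3.x
left_bytes = b1 + b2     # first operand is bytes
left_bytearray = b2 + b1 # first operand is bytearray

print(elem, type(left_bytes).__name__, type(left_bytearray).__name__)
```

On 3.x the result type of the mixed concatenation follows the first operand; on 2.6 the first line would instead give the length-1 string 'a'.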

Steve Holden · Feb 15, 2009

John said:
Because b'x' is NOT a bytearray. It is a bytes object. When you actually use
a bytearray, it behaves like you expect.

>>> type(b'x')
<type 'str'>
>>> type(bytearray(b'x'))
<type 'bytearray'>
>>> ba = bytearray(b'abc')
>>> ba[0] + ba[1]
195

That's indeed how Python 2.6 works. But that's not how
PEP 3137 says it's supposed to work.

Guido:

"I propose the following type names at the Python level:

* bytes is an immutable array of bytes (PyString)
* bytearray is a mutable array of bytes (PyBytes)"

...

"Indexing bytes and bytearray returns small ints (like the bytes type in
3.0a1, and like lists or array.array('B'))."
(Not true in Python 2.6 - indexing a "bytes" object returns a "bytes"
object with length 1.)

"b1 + b2: concatenation. With mixed bytes/bytearray operands, the return type is
that of the first argument (this seems arbitrary until you consider how += works)."
(Not true in Python 2.6 - concatenation returns a bytearray in both cases.)

Is this a bug, a feature, a documentation error, or bad design?

It's a feature. In fact all that was done to accommodate easier
migration to 3.x is easily shown in one statement:

So that's why bytes works the way it does in 2.6 ... hence my contested
description of it as an "ugly hack". I am happy to withdraw "ugly", but
I think "hack" could still be held to apply.

regards
Steve
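
The one-line statement referred to in this post is not shown. A check that fits the "aliased type" description, offered as an assumption rather than as the original statement, is an identity comparison between the two names:

```python
import sys

# True on 2.6/2.7, where bytes is just another name for str;
# False on 3.x, where bytes and str are separate types.
aliased = bytes is str
print(sys.version_info[0], aliased)
```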

John Nagle · Feb 21, 2009

Steve said:
John said:

Benjamin said:

It's a feature. In fact all that was done to accommodate easier
migration to 3.x is easily shown in one statement:

True

So that's why bytes works the way it does in 2.6 ... hence my contested
description of it as an "ugly hack". I am happy to withdraw "ugly", but
I think "hack" could still be held to apply.

Agreed. But is this a 2.6 thing, making 2.6 incompatible with 3.0, or
what? How will 3.x do it? The PEP 3137 way, or the Python 2.6 way?

The way it works in 2.6 makes it necessary to do "ord" conversions
where they shouldn't be required.

John Nagle
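
One way to sidestep the ord() calls, sketched under the assumption that copying the data is acceptable, is to wrap the byte string in a bytearray before reading elements. The helper name byte_values is hypothetical.

```python
def byte_values(data):
    """Return the elements of a byte string as a list of ints."""
    # bytearray elements are ints on both 2.6+ and 3.x, so no ord() is needed.
    return list(bytearray(data))

values = byte_values(b"AB")
print(values)
```

Because the conversion goes through bytearray, the same source gives integers whether b'AB' is a 2.6 str or a 3.x bytes object.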


Steve Holden · Feb 21, 2009

John said:
Steve said:

John said:

Benjamin Kaplan wrote:

Agreed. But is this a 2.6 thing, making 2.6 incompatible with 3.0, or
what? How will 3.x do it? The PEP 3137 way, or the Python 2.6 way?

The way it works in 2.6 makes it necessary to do "ord" conversions
where they shouldn't be required.

Yes, the hack was to achieve a modicum of compatibility with 3.0 without
having to turn the world upside down.

I haven't used 3.0 enough to say whether bytearray has been correctly
implemented. But I believe the intention is that 3.0 should fully
implement PEP 3137.

regards
Steve

John Nagle · Feb 21, 2009

Steve said:
John said:

Yes, the hack was to achieve a modicum of compatibility with 3.0 without
having to turn the world upside down.

I haven't used 3.0 enough to say whether bytearray has been correctly
implemented. But I believe the intention is that 3.0 should fully
implement PEP 3137.

If "bytes", a new keyword, works differently in 2.6 and 3.0, that was really
dumb. There's no old code using "bytes". So converting code to 2.6 means
it has to be converted AGAIN for 3.0. That's a good reason to ignore 2.6 as
defective.

John Nagle

Hendrik van Rooyen · Feb 22, 2009

Christian Heimes said:
John Nagle wrote

Please don't call something dumb that you don't fully understand. It
offends the people who have spent lots of time developing Python --
personal, unpaid and voluntary time!

Crying out: "Please do not criticise me, I am doing it for free!" does
not justify delivering substandard work - that is the nature of the
open source process - if you lift your head and say or do something,
there are bound to be some objections - some thoughtful and valid,
and others merely carping. Being sensitive about it serves no purpose.

I can assure you, the bytes alias and b'' alias have their right to exist.

This is not a helpful response - on the surface JN has a point - If
you have to go through two conversions, then 2.6 does not achieve
what it appears to set out to do. So the issue is simple:

- do you have to convert twice?
- If yes - why? - as he says - there exists no prior code,
so there seems to be no reason not to make it identical
to 3.0

The response answers neither of these valid concerns.

- Hendrik

Matthew Woodcraft · Feb 22, 2009

Hendrik van Rooyen said:
"Christian Heimes" <lis....s.de> wrote:
on the surface JN has a point - If you have to go through two
conversions, then 2.6 does not achieve what it appears to set out to
do. So the issue is simple:

- do you have to convert twice?
- If yes - why? - as he says - there exists no prior code,
so there seems to be no reason not to make it identical
to 3.0

You don't have to convert twice. You don't have to use 'bytes' in 2.6 at
all. It's there in 2.6 to make some strategies for transition to 3.x
easier.

Note that 'bytes' is not (as JN asserted) a keyword, so its inclusion
won't break existing programs which were using it as an identifier.

-M-

Martin v. Löwis · Feb 22, 2009

Please don't call something dumb that you don't fully understand. It's

Crying out: "Please do not criticise me, I am doing it for free!" does
not justify delivering substandard work - that is the nature of the
open source process - if you lift your head and say or do something,
there are bound to be some objections - some thoughtful and valid,
and others merely carping. Being sensitive about it serves no purpose.

Still, John *clearly* doesn't understand what he observes, so asking him
not to draw conclusions until he does understand is not defending
against criticism.

This is not a helpful response - on the surface JN has a point - If
you have to go through two conversions, then 2.6 does not achieve
what it appears to set out to do. So the issue is simple:

- do you have to convert twice?

Depends on how you write your code. If you use the bytearray type
(which John didn't, despite his apparent belief that he did),
then no additional conversion is needed.

Likewise, if you only use byte (not bytearray) literals, without
accessing individual bytes (e.g. if you only ever read and write
them, or pass them to the struct module), 2to3 will do the right
thing.

- If yes - why? - as he says - there exists no prior code,
so there seems to be no reason not to make it identical
to 3.0

Sure there is. Making the bytes type and the str type identical
in 2.x gives the easiest way of porting. Adding bytes as a separate
type would have complicated a lot of things.

Regards,
Martin
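
A brief sketch of the "pass them to the struct module" case: the byte strings are only produced and consumed whole, never indexed, so the 2.6 and 3.x difference in element type does not come into play. The format string and values are arbitrary examples.

```python
import struct

# Pack a 16-bit and a 32-bit unsigned integer, big-endian, no padding.
packed = struct.pack(">HI", 1, 2)

# Unpack from a bytes literal; no individual byte is indexed.
fields = struct.unpack(">HI", b"\x00\x01\x00\x00\x00\x02")

print(repr(packed), fields)
```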

John Nagle · Feb 23, 2009

Some of the people involved are on Google's payroll.

Still, John *clearly* doesn't understand what he observes, so asking him
not to draw conclusions until he does understand is not defending
against criticism.

Depends on how you write your code. If you use the bytearray type
(which John didn't, despite his apparent belief that he did),
then no additional conversion is needed.

According to PEP 3137, there should be no distinction between
the two for read purposes. In 2.6, there is. That's a bug.

Likewise, if you only use byte (not bytearray) literals, without
accessing individual bytes (e.g. if you only ever read and write
them, or pass them to the struct module), 2to3 will do the right
thing.

Sure there is. Making the bytes type and the str type identical
in 2.x gives the easiest way of porting. Adding bytes as a separate
type would have complicated a lot of things.

Regards,
Martin

No, it's broken. PEP 3137 says one thing, and the 2.6 implementation
does something else. So code written for 2.6 won't be ready for 3.0.
This defeats the supposed point of 2.6.

John Nagle

Martin v. Löwis · Feb 24, 2009

Depends on how you write your code. If you use the bytearray type

According to PEP 3137, there should be no distinction between
the two for read purposes. In 2.6, there is. That's a bug.

No. Python 2.6 doesn't implement PEP 3137, and the PEP doesn't claim
that it would, nor do the 2.6 release notes. So that it deviates from
PEP 3137 is not a bug.

No, it's broken. PEP 3137 says one thing, and the 2.6 implementation
does something else. So code written for 2.6 won't be ready for 3.0.
This defeats the supposed point of 2.6.

That's not true: if I write

if isinstance(x, bytes):
    one_thing()
elif isinstance(x, unicode):
    another_thing()

then 2to3 will convert it perfectly. 2to3 couldn't have done the
conversion correctly had I written

if isinstance(x, str):
    one_thing()
elif isinstance(x, unicode):
    another_thing()

So the introduction of the bytes builtin *does* help the supposed
point of 2.6, even though it doesn't help implementing PEP 3137.

Regards,
Martin
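
Roughly what the first form looks like after conversion, assuming the only change needed is renaming unicode to str. The classify wrapper and its return values are hypothetical, added so the snippet runs on its own under Python 3.

```python
def classify(x):
    # Python 3 form of the first snippet: bytes stays, unicode becomes str.
    if isinstance(x, bytes):
        return "bytes"
    elif isinstance(x, str):
        return "text"
    return "other"

print(classify(b"a"), classify("a"), classify(1))
```

Had the 2.6 source tested for str first, the same rename would leave two tests for str and none for bytes, which is the failure case described in the post.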

Paddy O'Loughlin · Feb 24, 2009

2009/2/24 John Nagle said:
Some of the people involved are on Google's payroll.

Uh, what does that have to do with anything?
It would only be relevant if you are saying that Google is paying them
to do the work (so not just "on their payroll").

More importantly, it's also only relevant if ALL the people
contributing are being paid by Google to do the work, which I'm pretty
sure is not the case.

There are people spending lots of personal, unpaid and voluntary
time developing Python.

Paddy
