Problems with hex-conversion functions

Arnon Yaari

Hello everyone.
Perhaps I'm missing something, but I see several problems with the two
hex-conversion function pairs that Python offers:
1. binascii.hexlify and binascii.unhexlify
2. bytes.fromhex and bytes.hex

Problem #1:
bytes.hex is not implemented, although it was specified in PEP 358.
This means there is no symmetrical function to accompany
bytes.fromhex.

Problem #2:
Both pairs perform the same function, although The Zen Of Python
suggests that
"There should be one-- and preferably only one --obvious way to do
it."
I do not understand why PEP 358 specified the bytes function pair
although it mentioned the binascii pair...

Problem #3:
bytes.fromhex may receive spaces in the input string, although
binascii.unhexlify may not.
I see no good reason for these two functions to have different
features.
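
As a rough illustration of Problem #3, the following sketch feeds the same hex digits to both functions. The sample string is made up for the example, and how unhexlify reacts to the spaced input is probed with try/except rather than assumed, since that may depend on the Python version in use.

```python
import binascii

spaced = "de ad be ef"              # illustrative input, not from the post
compact = spaced.replace(" ", "")   # same digits without the spaces

# bytes.fromhex skips the spaces between byte pairs.
from_hex = bytes.fromhex(spaced)

# Give unhexlify the same spaced input and record whether it accepts it.
try:
    unhex_spaced = binascii.unhexlify(spaced.encode("ascii"))
except ValueError as exc:           # binascii.Error is a ValueError subclass
    unhex_spaced = None
    print("unhexlify rejected the spaced input:", exc)

# Without spaces, both functions produce the same bytes.
unhex_compact = binascii.unhexlify(compact.encode("ascii"))
print(from_hex == unhex_compact)
```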

Problem #4:
binascii.unhexlify may receive both input types: strings or bytes,
whereas bytes.fromhex raises an exception when given a bytes
parameter.
Again there is no reason for these functions to be different.

Problem #5:
binascii.hexlify returns a bytes type, although ideally converting
to hex should always return a string and converting from hex
should always return bytes.
IMO it makes no sense for hexlify to return bytes, since its
output is a textual representation of other bytes.
This is also the behavior suggested for bytes.hex in PEP 358.
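
To make Problem #5 concrete, here is a small sketch of the extra step a caller needs in order to get a string out of hexlify. The sample bytes are invented for the example.

```python
import binascii

data = b"\xde\xad\xbe\xef"             # illustrative sample bytes

hex_bytes = binascii.hexlify(data)     # bytes object holding ASCII hex digits
hex_text = hex_bytes.decode("ascii")   # explicit decode to obtain a str

print(type(hex_bytes).__name__, type(hex_text).__name__)
print(hex_text)
```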


Problems #4 and #5 call for a decision about the input and output of
the functions being discussed:

Option A : Strict input and output
unhexlify (and bytes.fromhex) may only receive strings and may only
return bytes
hexlify (and bytes.hex) may only receive bytes and may only return
strings

Option B : Robust input and strict output
unhexlify (and bytes.fromhex) may receive bytes and strings and may
only return bytes
hexlify (and bytes.hex) may receive bytes or strings and may only
return strings

Of course we may also consider a third option, which will allow the
return type of all functions to be robust (perhaps specified in a
keyword argument), but as I wrote in the description of problem #5, I
see no sense in that.
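
The two options could be sketched roughly as follows. The function names from_hex_strict, from_hex_robust and to_hex are hypothetical and exist only for this illustration; they wrap the binascii functions rather than change them. The str-input case for hexlify under Option B is left out, because it would need an encoding decision that is not specified here.

```python
import binascii


def from_hex_strict(text):
    """Option A (hypothetical helper): accept only str, return bytes."""
    if not isinstance(text, str):
        raise TypeError("expected str, got %s" % type(text).__name__)
    return binascii.unhexlify(text.encode("ascii"))


def from_hex_robust(text):
    """Option B (hypothetical helper): accept str or bytes, return bytes."""
    if isinstance(text, str):
        text = text.encode("ascii")
    return binascii.unhexlify(text)


def to_hex(data):
    """Both options (hypothetical helper): accept bytes, return str."""
    return binascii.hexlify(data).decode("ascii")


print(from_hex_strict("ab45"), from_hex_robust(b"ab45"), to_hex(b"\xab\x45"))
```

Under Option A, passing bytes to from_hex_strict raises TypeError; under Option B, from_hex_robust treats both input types the same way.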

Note that PEP 3137 describes: "... the more strict definitions of
encoding and decoding in Python 3000: encoding always takes a Unicode
string and returns a bytes sequence, and decoding always takes a bytes
sequence and returns a Unicode string." - suggesting option A.

To repeat problems #4 and #5, the current behavior does not match any
option:
* The return type of binascii.hexlify should be string, and this is
not the current behavior.
As for the input:
* Option A is not the current behavior because binascii.unhexlify may
receive both input types.
* Option B is not the current behavior because bytes.fromhex does not
allow bytes as input.


To fix these issues, three changes should be applied:
1. Deprecate bytes.fromhex. This fixes the following problems:
#4 (go with option B and remove the function that does not allow
bytes input)
#2 (the binascii functions will be the only way to "do it")
#1 (bytes.hex should not be implemented)
2. In order to keep the functionality that bytes.fromhex has over
unhexlify,
the latter function should be able to handle spaces in its input
(fix #3)
3. binascii.hexlify should return string as its return type (fix #5)
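
Change 2 could be prototyped outside the standard library as a thin wrapper. This is only a sketch: unhexlify_lenient is a hypothetical name, and the choice to strip all ASCII whitespace (not just spaces) is an assumption made for the example.

```python
import binascii


def unhexlify_lenient(hexstr):
    """Hypothetical wrapper: drop ASCII whitespace, then call unhexlify."""
    if isinstance(hexstr, str):
        hexstr = hexstr.encode("ascii")
    # bytes.split() with no argument splits on runs of ASCII whitespace,
    # so joining the pieces removes every space, tab and newline.
    compact = b"".join(bytes(hexstr).split())
    return binascii.unhexlify(compact)


print(unhexlify_lenient("de ad be ef"))
```

Note that this removes whitespace anywhere in the input, including inside a pair of digits, which is more permissive than skipping whitespace only between byte pairs.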


Any thoughts are appreciated.
 
Steven D'Aprano

> Hello everyone.
> Perhaps I'm missing something, but I see several problems with the two
> hex-conversion function pairs that Python offers:
> 1. binascii.hexlify and binascii.unhexlify
> 2. bytes.fromhex and bytes.hex
>
> Problem #1:
> bytes.hex is not implemented, although it was specified in PEP 358. This
> means there is no symmetrical function to accompany bytes.fromhex.

That would probably be an oversight. Patches are welcome.


> Problem #2:
> Both pairs perform the same function, although The Zen Of Python
> suggests that
> "There should be one-- and preferably only one --obvious way to do it."

That is not a prohibition against multiple ways of doing something. It is
a recommendation that there should be one obvious way (as opposed to no
way at all, or thirty five non-obvious ways) to do things. Preferably
only one obvious way, but it's not a prohibition against there being an
obvious way and a non-obvious way.


> I do not understand why PEP 358 specified the bytes function pair
> although it mentioned the binascii pair...

Because there are three obvious ways of constructing a sequence of bytes:

(1) from a sequence of characters, with an optional encoding;

(2) from a sequence of pairs of hex digits, such as from a hex dump of a
file;

(3) from a sequence of integers.

(1) and (2) are difficult to distinguish -- should "ab45" be interpreted
as four characters, "a" "b" "4" and "5", or as two pairs of hex digits
"ab" and "45"? The obvious solution is to have two different bytes
constructors.
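
A short sketch of the ambiguity, using the "ab45" example from above. The same four-character string gives different results depending on which constructor is used.

```python
text = "ab45"

as_chars = bytes(text, "ascii")   # (1) four characters -> four bytes
as_hex = bytes.fromhex(text)      # (2) two pairs of hex digits -> two bytes
as_ints = bytes([0xAB, 0x45])     # (3) a sequence of integers -> two bytes

print(len(as_chars), len(as_hex), as_hex == as_ints)
```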

> Problem #3:
> bytes.fromhex may receive spaces in the input string, although
> binascii.unhexlify may not.
> I see no good reason for these two functions to have different features.

There are clearly differences of opinion about how strict to be when
accepting input strings. Personally, I can see arguments for both. Given
that these are two different functions, there's no requirement that they
do exactly the same thing, so I wouldn't even call it a wart. It's just a
difference.

> Problem #4:
> binascii.unhexlify may receive both input types: strings or bytes,
> whereas bytes.fromhex raises an exception when given a bytes parameter.
> Again there is no reason for these functions to be different.

There's no reason for them to be the same. unhexlify() is designed to
take either strings or bytes, mostly for historical reasons: in Python
1.x and 2.x, it is normal to use byte-strings (called 'strings') as the
standard string type, and character-strings ('unicode') are relatively
rare, so unhexlify needs to accept bytes. In Python 3.x, the use of bytes
as character strings is discouraged, hence passing hex digits as bytes to
bytes.fromhex() is illegal.

> Problem #5:
> binascii.hexlify returns a bytes type - although ideally, converting to
> hex should always return string types and converting from hex should
> always return bytes.

This is due to historical reasons -- binascii comes from Python 1.x when
bytes were the normal string type. Modifying binascii to
return strings in Python 3.x (but not 2.6 or 2.7) would probably be a
good idea. Patches are welcome.


> [...]
> To fix these issues, three changes should be applied:
> 1. Deprecate bytes.fromhex.

-1 on that. I disagree strongly: bytes are built-ins, and constructing
bytes from a sequence of hex digits is such a natural and important
function that needing to import a module to do it is silly and wasteful.


> [...]
> 2. In order to keep the functionality that bytes.fromhex has over
> unhexlify, the latter function should be able to handle spaces in
> its input (fix #3)

0 on that. I don't care either way.

> 3. binascii.hexlify should return string as its return type (fix #5)

+1 for the Python 3.x series, -1 for Python 2.6 and 2.7.
 
