Python is annoying

2019-Feb-07, Thursday 09:38
[personal profile] mindstalk
Our code works a lot with binary data (hashes/digests) and hexstring representations of that data. It was written in Python 2, when everything was a string, but some strings were "beef" and some were '\xbe\xef'.

Then we converted to Python 3, which introduced a distinct 'bytes' type for binary data and made strings Unicode everywhere. That led to some type problems which I thought I had figured out, but a recent debugging session revealed I had to think about it some more. Basically, we can now have a hexstring "beef", the bytes object b'\xbe\xef' described by that hexstring... and the bytes b"beef", which is the UTF-8 encoding of the hexstring.
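To make the three cases concrete, here is a small Python 3 sketch. The variable names are mine, purely for illustration:

```python
# Three different things that can all look like "beef" (Python 3).
hexstring = "beef"                     # str: four hex digits
raw = bytes.fromhex(hexstring)         # bytes: the two bytes the hexstring describes
encoded = hexstring.encode("utf-8")    # bytes: the four characters 'b', 'e', 'e', 'f'

print(raw, len(raw))          # b'\xbe\xef' 2
print(encoded, len(encoded))  # b'beef' 4
print(raw == encoded)         # False
```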

In particular, the function binascii.hexlify (aka binascii.b2a_hex), which we used a lot, changed what it returned: a bytes object rather than a str.

Python 2:
>>> import binascii
>>> binascii.a2b_hex("beef")
'\xbe\xef'
>>> binascii.hexlify(_)
'beef'

Python 3:
>>> import binascii
>>> binascii.a2b_hex("beef")
b'\xbe\xef'
>>> binascii.hexlify(_)
b'beef'

vs.
>>> binascii.a2b_hex("beef")
b'\xbe\xef'
>>> _.hex()
'beef'
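
If ported code expects hexlify to hand back a text string, as it did in Python 2, a thin wrapper is one option. This is only a sketch, not what our codebase does, and hexlify_str is a made-up name:

```python
import binascii

def hexlify_str(data: bytes) -> str:
    """Return the hex representation of data as a str (hypothetical helper)."""
    # binascii.hexlify returns bytes on Python 3; hex digits are pure ASCII,
    # so decoding as ASCII is safe.
    return binascii.hexlify(data).decode("ascii")

raw = binascii.a2b_hex("beef")
print(binascii.hexlify(raw))          # b'beef'
print(hexlify_str(raw))               # beef
print(hexlify_str(raw) == raw.hex())  # True
```

On Python 3.5 and later, bytes.hex() gives the same result without the wrapper.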


I found it easy to assume that if one of our functions returned b"beef" and another returned "beef", they were on the same page, when really they were not.
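
A quick sketch of how that assumption goes wrong, assuming one function produces its hex via bytes.hex() and the other via binascii.hexlify (the variable names are illustrative only):

```python
import binascii

raw = bytes.fromhex("beef")

as_str = raw.hex()                # 'beef'  (str)
as_bytes = binascii.hexlify(raw)  # b'beef' (bytes holding ASCII hex digits)

# They look alike when printed but never compare equal in Python 3.
print(as_str == as_bytes)                  # False

# Decoding the hexlify output puts both on the same page.
print(as_str == as_bytes.decode("ascii"))  # True

# Mistaking b'beef' for raw binary and hex-encoding it again does not.
print(as_bytes.hex())                      # 62656566
```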

A bunch of examples follow.



>>> binascii.a2b_hex("beef")
b'\xbe\xef'
>>> binascii.hexlify(_)
b'beef'
>>> _.decode()
'beef'
>>> binascii.a2b_hex("beef")
b'\xbe\xef'
>>> _.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 0: invalid start byte

>>> b"beef".hex()
'62656566'

>>> import hashlib
>>> sha = hashlib.sha256(b'fee')
>>> sha.digest()
b'\xb8\x0c\xda\xae\x9b2+\xba*8\xfd9b\x99x*L\xc6\xb4\xa0\xcc\xf6\x7f\xcc\xbb\xcca|\x94\xa4`&'
>>> sha.digest().hex()
'b80cdaae9b322bba2a38fd396299782a4cc6b4a0ccf67fccbbcc617c94a46026'
>>> sha.hexdigest()
'b80cdaae9b322bba2a38fd396299782a4cc6b4a0ccf67fccbbcc617c94a46026'

>>> bytes.fromhex("cow")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: non-hexadecimal number found in fromhex() arg at position 1
>>> "cow".encode()
b'cow'
>>> "beef".encode()
b'beef'

>>> binascii.b2a_hex(b"beef")
b'62656566'
>>> binascii.b2a_hex(bytes.fromhex("beef"))
b'beef'
>>> bytes.fromhex("beef").hex()
'beef'

