Frustrating circular bytes issue

This is driving me batty... more enjoyment with the Python3
"Everything must be bytes" thing... sigh...
I have a file that contains a class used by other scripts. The class
is fed either a file, or a stream of output from another command, then
interprets that output and returns a set that the main program can
use... confusing, perhaps, but not necessarily important.

The class is created and then called with the load_filename method:


def load_filename(self, filename):
    logging.info("Loading elements from filename: %s", filename)

    file = open(filename, "rb", encoding="utf-8")
    return self.load_file(file, filename)

As you can see, this calls the load_file method, by passing the
filehandle and filename (in common use, filename is actually an
IOStream object).

load_file starts out like this:


def load_file(self, file, filename="<stream>"):
    elements = []
    for string in self._reader(file):
        if not string:
            break

        element = {}


Note that it now calls the private _reader() passing along the
filehandle further in. THIS is where I'm failing:

This is the private _reader function:


def _reader(self, file, size=4096, delimiter=r"\n{2,}"):
    buffer_old = ""
    while True:
        buffer_new = file.read()
        print(type(buffer_new))
        if not buffer_new:
            break
        lines = re.split(delimiter, buffer_old + buffer_new)
        buffer_old = lines.pop(-1)

        for line in lines:
            yield line

    yield buffer_old


(The print statement is something I put in to verify the problem.)

So stepping through this, when _reader executes, it executes read() on
the opened filehandle. Originally, it read in 4096 byte chunks, I
removed that to test a theory. It creates buffer_new with the output
of the read.

Running type() on buffer_new tells me that it's a bytes object.

However no matter what I do:

file.read().decode()
buffer_new.decode() in the lines = re.split() statement
buffer_str = buffer_new.decode()

I always get a traceback telling me that the str object has no decode() method.

If I remove the decode attempts, I get a traceback telling me that it
can't implicitly convert a bytes_object to a str object.

So I'm stuck in a vicious circle and can't see a way out.

Here are sample error messages:
When using the decode() method to attempt to convert the bytes object:
Traceback (most recent call last):
  File "./filter_templates", line 134, in <module>
    sys.exit(main(sys.argv[1:]))
  File "./filter_templates", line 126, in main
    options.whitelist, options.blacklist)
  File "./filter_templates", line 77, in parse_file
    matches = match_elements(template.load_file(file), *args, **kwargs)
  File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line 73, in load_file
    for string in self._reader(file):
  File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line 35, in _reader
    lines = re.split(delimiter, buffer_old + buffer_new.decode())
AttributeError: 'str' object has no attribute 'decode'

It's telling me that buffer_new is a str object.

so if I remove the decode():

Traceback (most recent call last):
  File "./run_templates", line 142, in <module>
    sys.exit(main(sys.argv[1:]))
  File "./run_templates", line 137, in main
    runner.process(args, options.shell)
  File "./run_templates", line 39, in process
    records = self.process_output(process.stdout)
  File "./run_templates", line 88, in process_output
    return template.load_file(output)
  File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line 73, in load_file
    for string in self._reader(file):
  File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line 35, in _reader
    lines = re.split(delimiter, buffer_old + buffer_new)
TypeError: Can't convert 'bytes' object to str implicitly

now it's complaining that buffer_new is a bytes object and can't be
implicitly converted to str.

This is a bug introduced in our conversion from Python 2 to Python 3.
I am really, really starting to dislike some of the things Python3
does... or just am really, really frustrated.
 
Hans Mulder

This is driving me batty... more enjoyment with the Python3
"Everything must be bytes" thing... sigh...
I have a file that contains a class used by other scripts. The class
is fed either a file, or a stream of output from another command, then
interprets that output and returns a set that the main program can
use... confusing, perhaps, but not necessarily important.

It would help if you could post an extract that we can actually run,
to see for ourselves what happens.
The class is created and then called with the load_filename method:


def load_filename(self, filename):
    logging.info("Loading elements from filename: %s", filename)

    file = open(filename, "rb", encoding="utf-8")

When I try this in Python3, I get an error message:

ValueError: binary mode doesn't take an encoding argument


You'll have to decide for yourself whether you want to read strings or
bytes. If you want strings, you'll have to open the file in text mode:

file = open(filename, "rt", encoding="utf-8")

Alternatively, if you want bytes, you must leave off the encoding:


file = open(filename, "rb")
return self.load_file(file, filename)
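
To make the text-versus-bytes choice concrete, here is a minimal sketch that tries all three ways of opening the same file. The temporary file and its contents are made up for illustration; only the open() calls mirror the ones discussed above.

```python
import os
import tempfile

# Create a small UTF-8 file to experiment with (hypothetical content).
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("name: caf\u00e9\n\nname: other\n")
    path = tmp.name

try:
    # Binary mode combined with an encoding is rejected outright.
    try:
        open(path, "rb", encoding="utf-8")
        mode_error = None
    except ValueError as exc:
        mode_error = str(exc)

    # Text mode: read() returns str, already decoded.
    with open(path, "rt", encoding="utf-8") as text_file:
        text_data = text_file.read()

    # Binary mode: read() returns bytes, which you decode yourself.
    with open(path, "rb") as binary_file:
        binary_data = binary_file.read()
finally:
    os.remove(path)

print(mode_error)
print(type(text_data), type(binary_data))
```

Whichever mode you pick, everything downstream (the initial buffer, the regular expression pattern) has to be of the same type as what read() returns.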

As you can see, this calls the load_file method, by passing the
filehandle and filename (in common use, filename is actually an
IOStream object).

load_file starts out like this:


def load_file(self, file, filename="<stream>"):
    elements = []
    for string in self._reader(file):
        if not string:
            break

        element = {}


Note that it now calls the private _reader() passing along the
filehandle further in. THIS is where I'm failing:

This is the private _reader function:


def _reader(self, file, size=4096, delimiter=r"\n{2,}"):
    buffer_old = ""
    while True:
        buffer_new = file.read()
        print(type(buffer_new))
        if not buffer_new:
            break
        lines = re.split(delimiter, buffer_old + buffer_new)
        buffer_old = lines.pop(-1)

        for line in lines:
            yield line

    yield buffer_old


(The print statement is something I put in to verify the problem.)

So stepping through this, when _reader executes, it executes read() on
the opened filehandle. Originally, it read in 4096 byte chunks, I
removed that to test a theory. It creates buffer_new with the output
of the read.

Running type() on buffer_new tells me that it's a bytes object.

However no matter what I do:

file.read().decode()
buffer_new.decode() in the lines = re.split() statement
buffer_str = buffer_new.decode()

I always get a traceback telling me that the str object has no decode() method.

If I remove the decode attempts, I get a traceback telling me that it
can't implicitly convert a bytes_object to a str object.

So I'm stuck in a vicious circle and can't see a way out.

Here are sample error messages:
When using the decode() method to attempt to convert the bytes object:
Traceback (most recent call last):
  File "./filter_templates", line 134, in <module>
    sys.exit(main(sys.argv[1:]))
  File "./filter_templates", line 126, in main
    options.whitelist, options.blacklist)
  File "./filter_templates", line 77, in parse_file
    matches = match_elements(template.load_file(file), *args, **kwargs)
  File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line 73, in load_file
    for string in self._reader(file):
  File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line 35, in _reader
    lines = re.split(delimiter, buffer_old + buffer_new.decode())
AttributeError: 'str' object has no attribute 'decode'

Look carefully at the traceback: this line looks deceptively like
a line in your code, except the file name is different. Your file
is called "./filter_templates" and this line is from a file named
"/usr/lib/python3/dist-packages/checkbox/lib/template.py".

At line 77 in your code, you're calling the "load_file" method on an
instance of a class defined in that file. Your description sounds
as if you meant to call the "load_file" method on an instance of
your own class. In other words, it sounds like you're instantiating
the wrong class.

I can't say for certain, because you've left out that bit of your code.

It's telling me that buffer_new is a str object.

so if I remove the decode():

Traceback (most recent call last):
  File "./run_templates", line 142, in <module>
    sys.exit(main(sys.argv[1:]))
  File "./run_templates", line 137, in main
    runner.process(args, options.shell)
  File "./run_templates", line 39, in process
    records = self.process_output(process.stdout)
  File "./run_templates", line 88, in process_output
    return template.load_file(output)
  File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line 73, in load_file
    for string in self._reader(file):
  File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line 35, in _reader
    lines = re.split(delimiter, buffer_old + buffer_new)
TypeError: Can't convert 'bytes' object to str implicitly

now it's complaining that buffer_new is a bytes object and can't be
implicitly converted to str.
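
One detail in the two tracebacks may be worth a second look: the first one starts in ./filter_templates and the second in ./run_templates, where the argument comes from process.stdout. A possible reading, and it is only a guess without the rest of the code, is that the same reader gets a text stream from one caller and a byte stream from the other, which would produce exactly this pair of errors. Under that assumption, a reader that accepts both could look like the sketch below. The name read_records and the encoding parameter are hypothetical, not part of the checkbox code.

```python
import codecs
import io
import re


def read_records(stream, size=4096, delimiter=r"\n{2,}", encoding="utf-8"):
    """Yield blank-line-separated records from a text or binary stream."""
    # An incremental decoder copes with multi-byte characters that are
    # split across two chunks of a binary stream.
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""
    while True:
        chunk = stream.read(size)
        if not chunk:
            break
        # Normalise to str so the rest of the loop handles a single type.
        if isinstance(chunk, bytes):
            chunk = decoder.decode(chunk)
        parts = re.split(delimiter, pending + chunk)
        pending = parts.pop()
        yield from parts
    # Flush the decoder; this raises if the input ended mid-character.
    pending += decoder.decode(b"", final=True)
    yield pending


from_text = list(read_records(io.StringIO("a: 1\n\nb: 2")))
from_bytes = list(read_records(io.BytesIO("a: 1\n\nb: 2".encode("utf-8"))))
print(from_text)
print(from_bytes)
```

Fixing the callers so that they all pass the same kind of stream is arguably cleaner than accepting both; the sketch only shows that the two errors need not be a circle.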

This is a bug introduced in our conversion from Python 2 to Python 3.
I am really, really starting to dislike some of the things Python3
does... or just am really, really frustrated.

Try creating a script that we can all run that shows the problem.
Then shorten it by cutting out bits that shouldn't matter for
your problem. After each cut, run the script, to make sure you
still have the problem. If the problem goes away, you've cut out
the line with the bug. Put that line back in, and try to figure
out what's wrong with it. You might be able to solve your own
problem without even posting the script to this forum.
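
As an illustration of what such a cut-down script might look like, here is a sketch that keeps only the reader logic from the post and feeds it in-memory streams instead of real files or subprocess output. The names reader and try_reader are made up for this example; io.StringIO and io.BytesIO stand in for a text-mode and a binary-mode file.

```python
import io
import re


def reader(stream, delimiter=r"\n{2,}"):
    # Same logic as the _reader shown in the post, minus the class.
    buffer_old = ""
    while True:
        buffer_new = stream.read()
        if not buffer_new:
            break
        lines = re.split(delimiter, buffer_old + buffer_new)
        buffer_old = lines.pop(-1)
        yield from lines
    yield buffer_old


def try_reader(stream):
    """Return the records, or the name of the exception that was raised."""
    try:
        return list(reader(stream))
    except TypeError as exc:
        return type(exc).__name__


# A text stream works; a byte stream fails on the str + bytes concatenation.
text_result = try_reader(io.StringIO("a: 1\n\nb: 2"))
bytes_result = try_reader(io.BytesIO(b"a: 1\n\nb: 2"))
print(text_result)
print(bytes_result)
```

A script of this size is easy to post in full, and it shows that the outcome depends on what kind of stream the caller passes in.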

Hope this helps,

-- HansM
 
