[Python3] Reading a binary file and writing the bytes verbatim in a UTF-8 file

Guest

Hello.

I have to read the contents of a binary file (a PNG file exactly), and
dump it into an RTF file.

The RTF-file has been opened with codecs.open in utf-8 mode.

As I expected, the utf-8 decoder chokes on some combinations of bits;
how can I tell python to dump the bytes as they are, without
interpreting them?

Thanks.
 
Chris Rebert

> I have to read the contents of a binary file (a PNG file exactly), and
> dump it into an RTF file.
>
> The RTF-file has been opened with codecs.open in utf-8 mode.
>
> As I expected, the utf-8 decoder

You mean encoder.

> chokes on some combinations of bits;

Well yeah, it's supposed to be getting *characters*, not bytes.

> how can I tell python to dump the bytes as they are, without
> interpreting them?

Go around the encoder and write bytes directly to the file:

# Disclaimer: Completely untested

import codecs

raw_rtf = open("path/to/rtf.rtf", 'wb') # binary mode: the codecs writer emits bytes
png = open("path/to/png.png", 'rb') # binary mode: PNG data must not be decoded
writer_factory = codecs.getwriter('utf-8')

encoded_rtf = writer_factory(raw_rtf)
encoded_rtf.write(u"whatever text we want") # use unicode
# ...write more text...

# flush buffers
encoded_rtf.reset()
raw_rtf.flush()

raw_rtf.write(png.read()) # write from bytes to bytes

png.close()
raw_rtf.close()
#END code

I have no idea how you'd go about reading the contents of such a file
in a sensible way.

Cheers,
Chris
 
Chris Rebert

> Go around the encoder and write bytes directly to the file:
>
> # Disclaimer: Completely untested
> encoded_rtf.write(u"whatever text we want") # use unicode

Erm, sorry, since you're apparently using Python 3.x, that line should
have been just:

encoded_rtf.write("whatever text we want") # use unicode

Cheers,
Chris
 
Guest

Thanks, I'll try this.

> I have no idea how you'd go about reading the contents of such a file
> in a sensible way.

The purpose is to embed PNG pictures in an RTF file that will be read
by OpenOffice. It seems that OpenOffice reads RTF in 8-bit, so it
should be ok.

The RTF is produced from a TeX source file encoded in UTF-8, that's
why I mix unicode and 8-bit.
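
As far as I know, RTF picture data inside a {\pict ...} group is written as hexadecimal text by default, with \pngblip marking the source as PNG. Under that assumption, here is a minimal sketch that keeps the file pure ASCII instead of mixing in raw bytes. The helper name pict_group and the sample data are hypothetical, and whether OpenOffice accepts the result is untested.

```python
def pict_group(png_bytes):
    """Wrap raw PNG bytes in an RTF picture group as ASCII hex digits."""
    # bytes.hex() yields two lowercase hex digits per input byte.
    return b"{\\pict\\pngblip " + png_bytes.hex().encode("ascii") + b"}"


# Stand-in data: only the 8-byte PNG signature, not a complete image.
sample = b"\x89PNG\r\n\x1a\n"
group = pict_group(sample)
print(group)
```

The returned value is a bytes object, so it can be written to a file opened in binary mode next to the rest of the document.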
 
Antoine Pitrou

Hello,

> I have to read the contents of a binary file (a PNG file exactly), and
> dump it into an RTF file.
>
> The RTF-file has been opened with codecs.open in utf-8 mode.

You should use the built-in open() function. codecs.open() is outdated in
Python 3.

> As I expected, the utf-8 decoder chokes on some combinations of bits;
> how can I tell python to dump the bytes as they are, without
> interpreting them?

Well, the one thing you have to be careful about is to flush text buffers
before writing binary data. But, for example:
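
The snippet itself is missing here. A minimal sketch that would produce the bytes in the hexdump below, assuming a text-mode file from the built-in open() and its underlying binary buffer (the scratch path is hypothetical; the hexdump uses a file named TEST):

```python
import os
import tempfile

# Hypothetical scratch location for the output file.
path = os.path.join(tempfile.mkdtemp(), "TEST")

with open(path, "w", encoding="utf-8") as f:
    f.write("h\u00e9h\u00e9")    # "hehe" with e-acute; goes through the utf-8 encoder
    f.flush()                    # push pending text down to the binary layer first
    f.buffer.write(b"\xff\x00")  # raw bytes bypass the encoder entirely

# Read the file back in binary mode to inspect exactly what was written.
with open(path, "rb") as f:
    data = f.read()
print(" ".join(format(b, "02x") for b in data))
```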

gives you:

$ hexdump -C TEST
00000000 68 c3 a9 68 c3 a9 ff 00 |h..h....|

(utf-8 encoded text and then two raw bytes which are invalid utf-8)

Another possibility is to open the file in binary mode and do the
encoding yourself when writing text. This might actually be a better
solution, since I'm not sure RTF uses utf-8 by default.
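
A minimal sketch of that second approach, assuming ASCII is an acceptable text encoding for the target format (the path, the sample text and the stand-in PNG bytes are hypothetical):

```python
import os
import tempfile

# Hypothetical output path and payload, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "out.rtf")
png_bytes = b"\x89PNG\r\n\x1a\n"  # stand-in: just the 8-byte PNG signature
TEXT_ENCODING = "ascii"           # assumption; pick whatever the format requires

with open(path, "wb") as out:
    # Text is encoded explicitly before it reaches the file.
    out.write("some text".encode(TEXT_ENCODING))
    # Binary data is written verbatim; no encoder ever sees it.
    out.write(png_bytes)

with open(path, "rb") as f:
    result = f.read()
```

Since there is only one file object and one buffer, no flush is needed between the text and the binary writes.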

Regards

Antoine.
 
Stefan Behnel

Antoine Pitrou, 25.04.2010 02:16:
> Another possibility is to open the file in binary mode and do the
> encoding yourself when writing text. This might actually be a better
> solution, since I'm not sure RTF uses utf-8 by default.

That's a lot cleaner as it doesn't use two interfaces to write to the same
file, and doesn't rely on any specific coordination between those two
interfaces.

Stefan
 
Guest

> Another possibility is to open the file in binary mode and do the
> encoding yourself when writing text. This might actually be a better
> solution, since I'm not sure RTF uses utf-8 by default.

Yes, thanks for this suggestion, it seems the best to me. Actually RTF
is not UTF-8 encoded, it's 8-bit and maybe even ASCII only. Every
unicode char has to be encoded as an escape sequence (\u2022 for
example).
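
A rough sketch of such escaping, under my understanding that the RTF \u control word takes a signed 16-bit decimal number followed by a fallback character, so U+2022 would come out as \u8226? rather than \u2022. The function name rtf_escape is hypothetical, and control characters such as newlines are passed through untouched.

```python
def rtf_escape(text):
    """Return text as ASCII-only RTF: escape \\, { and } and all non-ASCII."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # RTF special characters get a backslash
        elif ord(ch) < 128:
            out.append(ch)         # plain ASCII passes through unchanged
        else:
            # Split into UTF-16 code units so that characters outside the
            # BMP become a surrogate pair of two \u escapes.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                code = int.from_bytes(units[i:i + 2], "little")
                if code > 32767:
                    code -= 65536  # the \u argument is a signed 16-bit value
                out.append("\\u%d?" % code)  # '?' is the fallback character
    return "".join(out)


escaped = rtf_escape("a\u2022b {x}")
print(escaped)
```

The result contains only ASCII, so escaped.encode("ascii") can be written straight to a file opened in binary mode.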

Thanks again.
 