[Python3] Reading a binary file and writing the bytes verbatim in a UTF-8 file

Discussion in 'Python' started by Guest, Apr 23, 2010.

  1. Guest

    Guest Guest


    I have to read the contents of a binary file (a PNG file, to be exact)
    and dump it into an RTF file.

    The RTF-file has been opened with codecs.open in utf-8 mode.

    As I expected, the utf-8 decoder chokes on some combinations of bits;
    how can I tell python to dump the bytes as they are, without
    interpreting them?
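
    A minimal sketch of the failure described above, assuming the error
    comes from treating the PNG bytes as UTF-8 text. Every PNG file starts
    with the same 8-byte signature, and its first byte (0x89) is not a valid
    UTF-8 start byte. The helper name try_decode is hypothetical.

```python
# The fixed 8-byte PNG signature; 0x89 cannot begin a UTF-8 sequence.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def try_decode(data):
    # Hypothetical helper: return the decoded text, or a short
    # description of the failure if the bytes are not valid UTF-8.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return "failed: " + exc.reason


result = try_decode(PNG_SIGNATURE)
print(result)
```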

    Guest, Apr 23, 2010

  2. Guest

    Chris Rebert Guest

    You mean encoder.
    Well yeah, it's supposed to be getting *characters*, not bytes.
    Go around the encoder and write bytes directly to the file:

    # Disclaimer: Completely untested

    import codecs

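    # Open both files in binary mode so raw bytes pass through untouched
    raw_rtf = open("path/to/rtf.rtf", 'wb')
    png = open("path/to/png.png", 'rb')
    writer_factory = codecs.getwriter('utf-8')

    # Wrap the binary file in a writer that encodes text to UTF-8
    encoded_rtf = writer_factory(raw_rtf)
    encoded_rtf.write(u"whatever text we want") # use unicode
    # ...write more text...

    # flush buffers
    encoded_rtf.flush()

    raw_rtf.write(png.read()) # write from bytes to bytes

    png.close()
    raw_rtf.close()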

    #END code

    I have no idea how you'd go about reading the contents of such a file
    in a sensible way.

    Chris Rebert, Apr 23, 2010

  3. Guest

    Chris Rebert Guest

    Erm, sorry, since you're apparently using Python 3.x, that line should
    have been just:

    encoded_rtf.write("whatever text we want") # use unicode

    Chris Rebert, Apr 23, 2010
  4. Guest

    Guest Guest

    Thanks, I'll try this.
    The purpose is to embed PNG pictures in an RTF file that will be read
    by OpenOffice. It seems that OpenOffice reads RTF in 8-bit, so it
    should be ok.

    The RTF is produced from a TeX source file encoded in UTF-8, that's
    why I mix unicode and 8-bit.
    Guest, Apr 23, 2010
  5. Hello,
    You should use the built-in open() function. codecs.open() is outdated in
    Python 3.
    Well, the one thing you have to be careful about is to flush text buffers
    before writing binary data. But, for example:

    gives you:

    $ hexdump -C TEST
    00000000 68 c3 a9 68 c3 a9 ff 00 |h..h....|

    (utf-8 encoded text and then two raw bytes which are invalid utf-8)
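
    A minimal sketch that produces a file with the same eight bytes as the
    dump above. The text "héhé" and the two raw bytes are inferred from the
    hexdump, and the temporary directory is an assumption made so the
    example does not touch the current directory. It relies on the buffer
    attribute of text files returned by open(), which exposes the
    underlying binary stream.

```python
import os
import tempfile

# Hypothetical scratch location; the post used a file named TEST.
path = os.path.join(tempfile.mkdtemp(), "TEST")

with open(path, "w", encoding="utf-8") as f:
    f.write("héhé")              # text goes through the UTF-8 encoder
    f.flush()                    # push pending text down to the binary layer
    f.buffer.write(b"\xff\x00")  # raw bytes bypass the encoder

# Read the file back in binary mode to inspect exactly what was written.
with open(path, "rb") as f:
    data = f.read()
```

    Skipping the flush() call risks the raw bytes landing in the file
    before text that is still sitting in the text layer's buffer.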

    Another possibility is to open the file in binary mode and do the
    encoding yourself when writing text. This might actually be a better
    solution, since I'm not sure RTF uses utf-8 by default.
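
    The binary-mode approach can be sketched as follows. The function name
    write_text_and_bytes, its parameters, and the file names are
    hypothetical, and UTF-8 is only a default here; use whatever encoding
    the target format actually expects.

```python
import os
import tempfile


def write_text_and_bytes(path, text, raw, encoding="utf-8"):
    # Hypothetical helper: the file is opened in binary mode, so text
    # must be encoded explicitly and raw bytes are written unchanged.
    with open(path, "wb") as out:
        out.write(text.encode(encoding))
        out.write(raw)


# Usage with a scratch file and stand-in data (not a real PNG).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.rtf")
write_text_and_bytes(demo_path, "héhé", b"\xff\x00")

with open(demo_path, "rb") as f:
    written = f.read()
```

    Only one file object is involved, so there is no buffer to flush
    between the text and the binary data.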


    Antoine Pitrou, Apr 25, 2010
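  6. Antoine Pitrou, 25.04.2010 02:16:
    "Another possibility is to open the file in binary mode and do the
    encoding yourself when writing text."

    That's a lot cleaner as it doesn't use two interfaces to write to the same
    file, and doesn't rely on any specific coordination between those two
    interfaces.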
    Stefan Behnel, Apr 25, 2010
  7. Guest

    Guest Guest

    Antoine Pitrou wrote: "Another possibility is to open the file in
    binary mode and do the encoding yourself when writing text."

    Yes, thanks for this suggestion, it seems the best to me. Actually RTF
    is not UTF-8 encoded, it's 8-bit and maybe even ASCII only. Every
    unicode char has to be encoded as an escape sequence (\u2022 for
    example).
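
    A minimal sketch of such an escaping step, assuming the RTF convention
    of \uN followed by one fallback character, where N is a signed 16-bit
    decimal number. Under that convention the bullet U+2022 would be
    written as \u8226? rather than with its hexadecimal value. The helper
    name rtf_escape and the "?" fallback are illustrative choices; check
    them against the RTF specification and the target reader.

```python
def rtf_escape(text):
    # Hypothetical helper: return an ASCII-only string for an RTF body.
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # RTF syntax characters need a backslash
        elif ord(ch) < 128:
            out.append(ch)         # plain ASCII passes through unchanged
        else:
            # Split into UTF-16 code units so characters outside the
            # BMP become two escapes (a surrogate pair).
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                n = int.from_bytes(units[i:i + 2], "little")
                if n > 32767:
                    n -= 65536     # express as a signed 16-bit value
                out.append("\\u%d?" % n)
    return "".join(out)


escaped = rtf_escape("caf\u00e9 \u2022 {x}")
rtf_bytes = escaped.encode("ascii")  # safe to write to a binary-mode file
```

    Because the result is pure ASCII, it can be written to the same
    binary-mode file as the picture data without any UTF-8 encoder in
    between.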

    Thanks again.
    Guest, Apr 25, 2010