[Python3] Reading a binary file and writing the bytes verbatim in a UTF-8 file

Discussion in 'Python' started by Guest, Apr 23, 2010.

  1. Guest

    Guest Guest

    Hello.

    I have to read the contents of a binary file (a PNG file exactly), and
    dump it into an RTF file.

    The RTF-file has been opened with codecs.open in utf-8 mode.

    As I expected, the utf-8 decoder chokes on some combinations of bits;
    how can I tell python to dump the bytes as they are, without
    interpreting them?

    Thanks.

    --
    Fabrice DELENTE
    Guest, Apr 23, 2010
    #1

  2. Guest

    Chris Rebert Guest

    Re: [Python3] Reading a binary file and writing the bytes verbatim in a UTF-8 file

    On Fri, Apr 23, 2010 at 9:22 AM, <-one.org> wrote:
    > I have to read the contents of a binary file (a PNG file exactly), and
    > dump it into an RTF file.
    >
    > The RTF-file has been opened with codecs.open in utf-8 mode.
    >
    > As I expected, the utf-8 decoder


    You mean encoder.

    > chokes on some combinations of bits;


    Well yeah, it's supposed to be getting *characters*, not bytes.

    > how can I tell python to dump the bytes as they are, without
    > interpreting them?


    Go around the encoder and write bytes directly to the file:

    # Disclaimer: Completely untested

    import codecs

    raw_rtf = open("path/to/rtf.rtf", 'wb')  # binary mode, so the file accepts bytes
    png = open("path/to/png.png", 'rb')  # binary mode, so the PNG is read as raw bytes
    writer_factory = codecs.getwriter('utf-8')

    encoded_rtf = writer_factory(raw_rtf)
    encoded_rtf.write(u"whatever text we want") # use unicode
    # ...write more text...

    # flush buffers
    encoded_rtf.reset()
    raw_rtf.flush()

    raw_rtf.write(png.read()) # write from bytes to bytes

    png.close()
    raw_rtf.close()
    #END code

    I have no idea how you'd go about reading the contents of such a file
    in a sensible way.

    Cheers,
    Chris
    --
    http://blog.rebertia.com
    Chris Rebert, Apr 23, 2010
    #2

  3. Guest

    Chris Rebert Guest

    Re: [Python3] Reading a binary file and writing the bytes verbatim in a UTF-8 file

    On Fri, Apr 23, 2010 at 9:48 AM, Chris Rebert <> wrote:
    > On Fri, Apr 23, 2010 at 9:22 AM,  <-one.org> wrote:
    >> I have to read the contents of a binary file (a PNG file exactly), and
    >> dump it into an RTF file.

    <snip>
    >> how can I tell python to dump the bytes as they are, without
    >> interpreting them?

    >
    > Go around the encoder and write bytes directly to the file:
    >
    > # Disclaimer: Completely untested

    <snip>
    > encoded_rtf.write(u"whatever text we want") # use unicode


    Erm, sorry, since you're apparently using Python 3.x, that line should
    have been just:

    encoded_rtf.write("whatever text we want") # use unicode

    Cheers,
    Chris
    --
    http://blog.rebertia.com
    Chris Rebert, Apr 23, 2010
    #3
  4. Guest

    Guest Guest

    Re: [Python3] Reading a binary file and writing the bytes verbatim in a UTF-8 file

    Thanks, I'll try this.

    > I have no idea how you'd go about reading the contents of such a file
    > in a sensible way.


    The purpose is to embed PNG pictures in an RTF file that will be read
    by OpenOffice. It seems that OpenOffice reads RTF in 8-bit, so it
    should be ok.

    The RTF is produced from a TeX source file encoded in UTF-8, that's
    why I mix unicode and 8-bit.

    --
    Fabrice DELENTE
    Guest, Apr 23, 2010
    #4
  5. Re: [Python3] Reading a binary file and writing the bytes verbatim in a UTF-8 file

    Hello,

    > I have to read the contents of a binary file (a PNG file exactly), and
    > dump it into an RTF file.
    >
    > The RTF-file has been opened with codecs.open in utf-8 mode.


    You should use the built-in open() function. codecs.open() is outdated in
    Python 3.

    > As I expected, the utf-8 decoder chokes on some combinations of bits;
    > how can I tell python to dump the bytes as they are, without
    > interpreting them?


    Well, the one thing you have to be careful about is to flush text buffers
    before writing binary data. But, for example:

    >>> f = open("TEST", "w", encoding='utf8')
    >>> f.write("héhé")
    4
    >>> f.flush()
    >>> f.buffer.write(b"\xff\x00")
    2
    >>> f.close()


    gives you:

    $ hexdump -C TEST
    00000000 68 c3 a9 68 c3 a9 ff 00 |h..h....|

    (utf-8 encoded text and then two raw bytes which are invalid utf-8)
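The same flush-then-write-to-`f.buffer` pattern can be sketched as a small script. This is only an illustration: the helper name `write_text_then_bytes`, the temporary path, and the demo payload (the 8-byte PNG signature standing in for a real picture) are assumptions, not part of the original session.

```python
import os
import tempfile


def write_text_then_bytes(path, text, payload):
    """Write UTF-8 text, then raw bytes, through one text-mode file object."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)            # goes through the UTF-8 encoder
        f.flush()                # push pending text down to the binary buffer
        f.buffer.write(payload)  # bypass the encoder: bytes are written as-is


# Hypothetical demo data: the PNG signature stands in for a whole PNG file.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "TEST")
png_bytes = b"\x89PNG\r\n\x1a\n"
write_text_then_bytes(demo_path, "héhé", png_bytes)

# Read the file back in binary mode to inspect exactly what was written.
with open(demo_path, "rb") as f:
    result = f.read()
```

The flush matters because the text layer keeps its own pending data; without it, the raw bytes could land in the file before the text that was written first.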

    Another possibility is to open the file in binary mode and do the
    encoding yourself when writing text. This might actually be a better
    solution, since I'm not sure RTF uses utf-8 by default.
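A minimal sketch of that binary-mode approach, assuming the caller decides which encoding the text parts need. The function name `write_mixed` and the sample strings are made up for illustration, and an in-memory `io.BytesIO` stands in for a file opened with mode "wb".

```python
import io


def write_mixed(out, text_before, payload, text_after, encoding):
    """Write text and raw bytes to one binary stream, encoding the text by hand."""
    out.write(text_before.encode(encoding))  # text becomes bytes explicitly
    out.write(payload)                       # binary data is written untouched
    out.write(text_after.encode(encoding))


# io.BytesIO behaves like a file opened with open(path, "wb").
buf = io.BytesIO()
write_mixed(buf, "héhé", b"\xff\x00", "", encoding="utf-8")
data = buf.getvalue()
```

Since there is only one binary interface to the file, no flushing between a text layer and a byte layer is needed.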

    Regards

    Antoine.
    Antoine Pitrou, Apr 25, 2010
    #5
  6. Re: [Python3] Reading a binary file and writing the bytes verbatim in a UTF-8 file

    Antoine Pitrou, 25.04.2010 02:16:
    > Another possibility is to open the file in binary mode and do the
    > encoding yourself when writing text. This might actually be a better
    > solution, since I'm not sure RTF uses utf-8 by default.


    That's a lot cleaner as it doesn't use two interfaces to write to the same
    file, and doesn't rely on any specific coordination between those two
    interfaces.

    Stefan
    Stefan Behnel, Apr 25, 2010
    #6
  7. Guest

    Guest Guest

    Re: [Python3] Reading a binary file and writing the bytes verbatim in a UTF-8 file

    > Another possibility is to open the file in binary mode and do the
    > encoding yourself when writing text. This might actually be a better
    > solution, since I'm not sure RTF uses utf-8 by default.


    Yes, thanks for this suggestion, it seems the best to me. Actually RTF
    is not UTF-8 encoded, it's 8-bit and maybe even ASCII only. Every
    unicode char has to be encoded as an escape sequence (\u2022 for
    example).
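A rough sketch of such escaping, under the assumption (as far as I understand the RTF specification) that the `\uN` keyword takes a signed 16-bit decimal number followed by a fallback character. Under that assumption the bullet U+2022 would come out as `\u8226?` rather than `\u2022`. The function name `rtf_escape` and the `?` fallback are illustrative choices, and control characters such as newlines are passed through unchanged here.

```python
import struct


def rtf_escape(text):
    """Return an ASCII-only string with non-ASCII characters as \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # characters that are RTF syntax themselves
        elif ord(ch) < 128:
            out.append(ch)         # plain ASCII passes through
        else:
            # Split into UTF-16 code units (two for characters above U+FFFF)
            # and read each one as a signed 16-bit decimal value.
            raw = ch.encode("utf-16-le")
            for (unit,) in struct.iter_unpack("<h", raw):
                out.append("\\u%d?" % unit)
    return "".join(out)


escaped = rtf_escape("a•b")            # contains the bullet U+2022
as_bytes = escaped.encode("ascii")     # safe: the result is pure ASCII
```

The escaped text can then be encoded as ASCII and written to a file opened in binary mode, next to the raw picture bytes.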

    Thanks again.

    --
    Fabrice DELENTE
    Guest, Apr 25, 2010
    #7