Extract an image from an RTF file

Discussion in 'Python' started by Bryan.Fodness@gmail.com, Feb 14, 2009.

  1. Bryan.Fodness@gmail.com Guest

    I have a large number of RTF files where the only thing in them is an
    image. I would like to extract the images and save them as PNG files.
    Eventually, I would also like to grab some text that appears in the image.
    I think PIL has something for this.

    Does anyone have any suggestions on how to start?
     
    Bryan.Fodness@gmail.com, Feb 14, 2009
    #1

  2. Terry Reedy Guest

    Bryan.Fodness@gmail.com wrote:
    > I have a large number of RTF files where the only thing in them is an
    > image. I would like to extract them and save them as a png.
    > Eventually, I would like to also grab some text that is on the image.
    > I think PIL has something for this.
    >
    > Does anyone have any suggestion on how to start this?


    The Wikipedia article on Rich Text Format has several links, which lead to
    http://pyrtf.sourceforge.net/
    http://code.google.com/p/pyrtf-ng/
    The former describes itself as RTF generation, including images.
    The latter describes itself as RTF generation and parsing, but only
    claims to be a rewrite of the former.
     
    Terry Reedy, Feb 14, 2009
    #2

  3. Curt Hash Guest

    On Sat, Feb 14, 2009 at 11:01 AM, Terry Reedy wrote:
    > The Wikipedia article on Rich Text Format has several links, which lead to
    > http://pyrtf.sourceforge.net/
    > http://code.google.com/p/pyrtf-ng/
    > The former says rtf generation, including images.
    > The latter says rtf generation and parsing, but only claims to be a
    > rewrite of the former.

    I've written an RTF parser in Python before, but for the purpose of
    filtering and discarding content rather than extracting it.

    Take a look at the specification here:
    http://www.microsoft.com/downloads/...8d-ff06-4207-b476-6b5396a18a2b&displaylang=en

    You will find that images are specified by one or more RTF control
    words followed by a long string of hex data. For this special purpose,
    you will not need to write a parser for the entire specification. Just
    search the file for the correct sequence of control words, extract the
    hex data that follows, and save it to a file.
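
    A minimal sketch of that search-and-extract approach follows. It
    assumes the images sit in \pict groups as hex text (not \bin binary
    data) and that the format is named by one of the blip control words
    listed in the spec. The function names and the extension table are
    illustrative, not part of any library.

```python
import binascii
import re

# Control words that name the image format inside a \pict group
# (assumed mapping to file extensions, for illustration only).
BLIP_EXTENSIONS = {"pngblip": "png", "jpegblip": "jpg",
                   "emfblip": "emf", "wmetafile": "wmf"}

PICT = re.compile(r"\\pict(?![a-zA-Z])")
CONTROL_WORD = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?")


def pict_group_body(rtf_text, start):
    """Return the text of the group open at `start`, skipping nested groups."""
    depth = 1
    pieces = []
    for ch in rtf_text[start:]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        elif depth == 1:
            pieces.append(ch)
    return "".join(pieces)


def extract_pictures(rtf_text):
    """Return a list of (extension, image_bytes) for each hex-encoded \\pict."""
    pictures = []
    for match in PICT.finditer(rtf_text):
        body = pict_group_body(rtf_text, match.end())
        words = [m.group(1) for m in CONTROL_WORD.finditer(body)]
        extension = next(
            (BLIP_EXTENSIONS[w] for w in words if w in BLIP_EXTENSIONS), "bin")
        # Drop the control words, then keep only the hex digits.
        hex_digits = re.sub(r"[^0-9a-fA-F]", "", CONTROL_WORD.sub("", body))
        if hex_digits and len(hex_digits) % 2 == 0:
            pictures.append((extension, binascii.unhexlify(hex_digits)))
    return pictures


def save_pictures(rtf_path):
    """Write every picture found in `rtf_path` next to it; return the names."""
    with open(rtf_path, encoding="latin-1") as handle:
        pictures = extract_pictures(handle.read())
    names = []
    for index, (extension, data) in enumerate(pictures):
        name = "%s.%d.%s" % (rtf_path, index, extension)
        with open(name, "wb") as out:
            out.write(data)
        names.append(name)
    return names
```

    Nested groups are skipped so that hex-looking metadata inside them
    does not get mixed into the image data. If the extracted file is not
    already a PNG, converting it would be a separate step, for example by
    opening it with PIL and saving it under a .png name.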

    It helps to open the RTF document in a text editor and locate the
    specific control group that contains the image, because the format and
    order of the control words vary depending on the application that
    created the file. If all of your documents were created with the same
    application, this will be much easier.
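
    As a rough alternative to reading each file in a text editor, the
    sketch below lists the control words that follow each \pict so the
    variations between files can be compared. The function name and the
    300-character window are arbitrary choices for illustration, and the
    window may run past the end of a short picture group.

```python
import re

PICT = re.compile(r"\\pict(?![a-zA-Z])")
CONTROL_WORD = re.compile(r"\\([a-zA-Z]+)(-?\d+)?")


def pict_control_words(rtf_text, window=300):
    """For each \\pict, list the control words in the next `window` characters."""
    summaries = []
    for match in PICT.finditer(rtf_text):
        snippet = rtf_text[match.end():match.end() + window]
        summaries.append([m.group(0) for m in CONTROL_WORD.finditer(snippet)])
    return summaries
```

    Running this over a handful of files should show whether they all
    use the same sequence of control words before the hex data.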
     
    Curt Hash, Feb 14, 2009
    #3
