Extracting images from a PDF file

Discussion in 'Python' started by Doug Farrell, Dec 27, 2007.

  1. Doug Farrell

    Doug Farrell Guest

    Hi all,

    Does anyone know how to extract images from a PDF file? What I'm looking
    to do is use pdflib_py to open large PDF files on our Linux servers,
    then use PIL to verify image data. I want to do this in order
    to find corrupt images in the PDF files. If anyone could help
    me out, or point me in the right direction, it would be most
    appreciated!

    Also, does anyone know of a way to validate a PDF file?

    Thanks in advance,
    Doug
     
    Doug Farrell, Dec 27, 2007
    #1

  2. Doug Farrell

    Carl K Guest

    Doug Farrell wrote:
    > Hi all,
    >
    > Does anyone know how to extract images from a PDF file? What I'm looking
    > to do is use pdflib_py to open large PDF files on our Linux servers,
    > then use PIL to verify image data. I want to do this in order
    > to find corrupt images in the PDF files. If anyone could help
    > me out, or point me in the right direction, it would be most
    > appreciated!
    >


    If you are ok shelling out to a binary:

    pdfimages - Portable Document Format (PDF) image extractor (version
    3.00)
    http://packages.ubuntu.com/gutsy/text/xpdf-utils

    I am trying to convert the PDF to a PNG, but without having to run external
    commands, so I will understand if you aren't happy with pdfimages.
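
    For anyone who is happy to shell out, the call might look roughly like
    the sketch below. It assumes Python 3 and a pdfimages binary on the PATH;
    the function names, the "img" prefix and the output directory are
    hypothetical. The -j flag asks pdfimages to save JPEG-encoded images as
    .jpg files instead of converting them to PPM/PBM.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdfimages_command(pdf_path, image_root, keep_jpeg=True):
    """Return the argument list for one pdfimages run.

    pdfimages names its output <image_root>-NNN.<ext>.
    """
    command = ["pdfimages"]
    if keep_jpeg:
        command.append("-j")  # keep DCT (JPEG) images as .jpg files
    command += [str(pdf_path), str(image_root)]
    return command


def extract_images(pdf_path, output_dir, prefix="img"):
    """Run pdfimages and return the sorted list of files it wrote."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages not found on PATH")
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    command = build_pdfimages_command(pdf_path, output_dir / prefix)
    # check=True raises CalledProcessError if pdfimages exits with an error.
    subprocess.run(command, check=True)
    return sorted(output_dir.glob(prefix + "-*"))
```

    A call such as extract_images("report.pdf", "extracted") would then
    return the paths of the image files written for that PDF.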

    Carl K
     
    Carl K, Dec 27, 2007
    #2

  3. Doug Farrell

    writeson Guest

    On Dec 27, 1:12 am, Carl K <> wrote:
    > Doug Farrell wrote:
    > > Hi all,
    > >
    > > Does anyone know how to extract images from a PDF file? What I'm looking
    > > to do is use pdflib_py to open large PDF files on our Linux servers,
    > > then use PIL to verify image data. I want to do this in order
    > > to find corrupt images in the PDF files. If anyone could help
    > > me out, or point me in the right direction, it would be most
    > > appreciated!
    >
    > If you are ok shelling out to a binary:
    >
    > pdfimages - Portable Document Format (PDF) image extractor (version
    > 3.00)
    > http://packages.ubuntu.com/gutsy/text/xpdf-utils
    >
    > I am trying to convert the PDF to a PNG, but without having to run external
    > commands, so I will understand if you aren't happy with pdfimages.
    >
    > Carl K


    Carl,

    Thanks for the feedback, and I don't mind shelling out to an external
    command if it gets the job done. Thanks for the link to xpdf-utils,
    I'm going to look into it this morning.
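
    Once the images are on disk, the PIL check mentioned in the original
    question could be sketched as below. This assumes Python 3, the Pillow
    fork of PIL, and a directory that already holds the extracted images;
    check_with_pil and find_corrupt_images are hypothetical names. Since
    Image.verify() does not decode pixel data, the sketch reopens each file
    and calls load() as a second, stricter pass.

```python
from pathlib import Path


def check_with_pil(path):
    """Raise an exception if PIL (Pillow) cannot verify and decode the file."""
    # Third-party import, done here so the scan helper itself has no hard
    # dependency on Pillow.
    from PIL import Image

    with Image.open(path) as image:
        image.verify()  # structural check only, no pixel decoding
    # verify() leaves the image object unusable, so reopen before decoding.
    with Image.open(path) as image:
        image.load()  # full decode, which also catches truncated data


def find_corrupt_images(image_dir, check=check_with_pil):
    """Return (file name, error message) pairs for files that fail `check`."""
    problems = []
    for path in sorted(Path(image_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            check(path)
        except Exception as exc:  # PIL raises several exception types
            problems.append((path.name, str(exc)))
    return problems
```

    The check is passed in as a parameter so that a different or stricter
    decoder can be swapped in without touching the directory scan.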

    Doug
     
    writeson, Dec 27, 2007
    #3
  4. Doug Farrell

    Max Erickson Guest

    Doug Farrell <> wrote:

    > Hi all,
    >
    > Does anyone know how to extract images from a PDF file? What I'm
    > looking to do is use pdflib_py to open large PDF files on our
    > Linux servers, then use PIL to verify image data. I want to do
    > this in order to find corrupt images in the PDF files. If anyone
    > could help me out, or point me in the right direction, it would
    > be most appreciated!
    >
    > Also, does anyone know of a way to validate a PDF file?
    >
    > Thanks in advance,
    > Doug


    There is some discussion here:

    http://nedbatchelder.com/blog/200712.html#e20071210T064608



    max
     
    Max Erickson, Dec 27, 2007
    #4
  5. Doug Farrell

    writeson Guest

    On Dec 27, 10:13 am, writeson <> wrote:
    > On Dec 27, 1:12 am, Carl K <> wrote:
    > > Doug Farrell wrote:
    > > > Hi all,
    > > >
    > > > Does anyone know how to extract images from a PDF file? What I'm looking
    > > > to do is use pdflib_py to open large PDF files on our Linux servers,
    > > > then use PIL to verify image data. I want to do this in order
    > > > to find corrupt images in the PDF files. If anyone could help
    > > > me out, or point me in the right direction, it would be most
    > > > appreciated!
    > >
    > > If you are ok shelling out to a binary:
    > >
    > > pdfimages - Portable Document Format (PDF) image extractor (version
    > > 3.00)
    > > http://packages.ubuntu.com/gutsy/text/xpdf-utils
    > >
    > > I am trying to convert the pdf to a png, but without having to run external
    > > commands. so I will understand if you arn't happy with pdfimages.
    > >
    > > Carl K
    >
    > Carl,
    >
    > Thanks for the feedback, and I don't mind shelling out to an external
    > command if it gets the job done. Thanks for the link to xpdf-utils,
    > I'm going to look into it this morning.
    >
    > Doug


    Hi,

    Our Linux servers run CentOS (4.x, I believe), and the repositories for
    this version don't have xpdf-utils available. I'm going to look into
    adding a repository to the yum configuration (the .repo files under
    /etc/yum.repos.d/; sources.list is the apt equivalent) so that yum can
    install the necessary dependencies for me, as xpdf-utils looks very useful!

    Doug
     
    writeson, Dec 28, 2007
    #5
  6. Doug Farrell

    writeson Guest

    On Dec 27, 2:17 pm, Max Erickson <> wrote:
    > Doug Farrell <> wrote:
    > > Hi all,
    > >
    > > Does anyone know how to extract images from a PDF file? What I'm
    > > looking to do is use pdflib_py to open large PDF files on our
    > > Linux servers, then use PIL to verify image data. I want to do
    > > this in order to find corrupt images in the PDF files. If anyone
    > > could help me out, or point me in the right direction, it would
    > > be most appreciated!
    > >
    > > Also, does anyone know of a way to validate a PDF file?
    > >
    > > Thanks in advance,
    > > Doug
    >
    > There is some discussion here:
    >
    > http://nedbatchelder.com/blog/200712.html#e20071210T064608
    >
    > max


    Max,

    That's a very interesting snippet of code, thanks for posting the
    link! Much appreciated!

    Doug
     
    writeson, Dec 28, 2007
    #6
