How to get text from PDF?

Discussion in 'Perl Misc' started by Shahid, Dec 22, 2008.

  1. Shahid

    Shahid Guest

    Hi all,

    I have a Linux-based web server. I am working on a project for
    which I need to extract text from a PDF file, and I need to know
    which text belongs to which PDF page number.

    Is there any utility/tool that can be installed on Linux and called
    from the command line in PHP through exec() or system() etc. for
    this purpose?

    Please reply urgently.

    Thanks in advance.
     
    Shahid, Dec 22, 2008
    #1

  2. Shahid

    smallpond Guest

    On Dec 22, 10:06 am, Shahid wrote:
    > Hi all,
    >
    > I have a Linux-based web server. I am working on a project for
    > which I need to extract text from a PDF file, and I need to know
    > which text belongs to which PDF page number.
    >
    > Is there any utility/tool that can be installed on Linux and called
    > from the command line in PHP through exec() or system() etc. for
    > this purpose?
    >
    > Please reply urgently.
    >
    > Thanks in advance.

    There is a module on CPAN called PDF::OCR::Thorough, which attempts
    to extract text from PDF documents. I've never used it, and it looks
    like a fair amount of work to set up. If the PDF file has a known,
    simple structure, there may be easier ways.
     
    smallpond, Dec 22, 2008
    #2
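
    For PDFs that carry a real text layer (not scanned images), one
    possibly easier route is the pdftotext command-line tool shipped
    with Poppler and Xpdf. It is not mentioned in the thread, so treat
    what follows as a minimal sketch under that assumption. By default
    pdftotext ends each page with a form feed character, which gives a
    simple way to map text to page numbers. The function names
    split_pages and extract_pages are illustrative only.

```python
import subprocess


def split_pages(text):
    # pdftotext terminates each page with a form feed character (\f, ASCII 12),
    # so splitting on it yields one chunk per page.
    pages = text.split("\f")
    # The form feed after the final page leaves an empty trailing chunk.
    if pages and pages[-1].strip() == "":
        pages.pop()
    # Page numbers are 1-based, matching how PDF viewers count pages.
    return {number: page for number, page in enumerate(pages, start=1)}


def extract_pages(pdf_path):
    # Assumes pdftotext is installed and on PATH.
    # "-" as the output file name sends the text to standard output.
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return split_pages(result.stdout.decode("utf-8", errors="replace"))
```

    The same command can be called from PHP through exec() and the
    output split on the form feed character in the same way. To pull a
    single page instead, pdftotext also accepts -f and -l (first and
    last page), for example: pdftotext -f 3 -l 3 file.pdf -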
