How to extract Arabic Text from PDF file

Shahid

Dear All,
Hi,
I am doing the following task in PHP.
I am using the pdftotext command-line utility from the xpdf package on
Windows and Linux. It successfully extracts English text from PDF files.
Now I need to extract Unicode Arabic text from PDF files. For this, I
tried:

"pdftotext -enc UTF-8 arabicFile.pdf arabicFile.txt"

If I remove the -enc switch, there is empty space in place of the
Arabic text, but the English text is still extracted. With -enc UTF-8,
some Arabic characters are extracted, but not the complete Arabic
text. I have also downloaded and installed the xpdf-Arabic package,
but I still couldn't get the required result, i.e. the Arabic text
from the PDF.
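
A minimal sketch of how the attempt above could be scripted and checked, assuming pdftotext is on the PATH and writes UTF-8 output when given -enc UTF-8. The helper names (build_command, count_arabic, extract) are hypothetical, and counting code points in the Arabic Unicode blocks is only a rough way to see how much Arabic text came through, not a fix for the missing text.

```python
import subprocess


def build_command(pdf_path, txt_path, encoding="UTF-8"):
    """Build the same pdftotext invocation as in the post (hypothetical helper)."""
    return ["pdftotext", "-enc", encoding, pdf_path, txt_path]


def is_arabic(ch):
    """True for code points in the Arabic block or the Arabic presentation forms."""
    cp = ord(ch)
    return (
        0x0600 <= cp <= 0x06FF     # Arabic block
        or 0xFB50 <= cp <= 0xFDFF  # Arabic Presentation Forms-A
        or 0xFE70 <= cp <= 0xFEFC  # Arabic Presentation Forms-B (excludes U+FEFF, the BOM)
    )


def count_arabic(text):
    """Count Arabic code points in a string."""
    return sum(1 for ch in text if is_arabic(ch))


def extract(pdf_path, txt_path):
    """Run pdftotext (assumed to be installed) and return the extracted text."""
    subprocess.run(build_command(pdf_path, txt_path), check=True)
    # Undecodable bytes are replaced rather than raising, so partial output is still visible.
    with open(txt_path, encoding="utf-8", errors="replace") as fh:
        return fh.read()
```

Calling extract("arabicFile.pdf", "arabicFile.txt") and passing the result to count_arabic gives a quick number to compare between runs, for example before and after changing the xpdf configuration.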

Can anyone help on an urgent basis? How do I configure xpdf-Arabic, or
is there some other way?


SHAHID MAHMOOD
 
RedGrittyBrick

Shahid said:
Dear All,
Hi,
I am doing following task in PHP....

This is a Perl newsgroup, not a PHP newsgroup.

I am using pdftotext command line utility of xpdf package for Windows
and Linux. It successfully extracts English text from PDF files. Now I
need to extract Unicoded Arabic text from PDF files. For this, I
tried:

"pdftotext -enc UTF-8 arabicFile.pdf arabicFile.txt"

If I remove -enc switch/parameter, there is empty space in place of
Arabic text, but English text is extracted from PDF. With -enc UTF-8,
some Arabic characters/alphabets are extracted from PDF, but the
complete Arabic text is not extracted. I also have downloaded and
installed the xpdf-Arabic package from internet. I couldn't get the
required result i.e. Arabic Text from PDF.

Can anyone help on urgent basis?
http://www.catb.org/~esr/faqs/smart-questions.html#urgent


How to configure xpdf-Arabic
http://www.catb.org/~esr/faqs/smart-questions.html#forum


or some other way???

"The Java edition of Aspose.Pdf.Kit supports extracting Arabic text from
PDF file."

I've no experience of this product. I expect Google will find many others.
 
Tim Greer

Shahid said:
Dear All,
Hi,
I am doing following task in PHP....

The PHP newsgroup is where you want to ask. Or does some part of your
script use Perl? If so, does that part work, and is your question
really about pdftotext? Either way, I think you've mistakenly posted
to the wrong group.
 
