I am trying to use the rpdf2txt library. Does anyone have an example
of how to use it properly?
http://raa.ruby-lang.org/project/rpdf2txt/
This is taken from bin/rpdf2txt (comments added for your benefit):
<snip>
# create a parser-instance, with the pdf-content as its first argument.
# The second argument is the encoding you want the resulting String to
# have. Note: if you need utf8, I recommend the character-encoding
# library by Nikolai Weibull
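# (the PDF is read in binary mode so line-ending translation on some
# platforms can't corrupt it)
parser = Rpdf2txt::Parser.new(File.open(ARGV[0], 'rb') { |f| f.read }, 'utf8')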
# write to STDOUT, or to a file if a second argument is given
outstream = STDOUT
if ARGV.size == 2
  outstream = File.open(ARGV[1], 'w')
end
# create a callback handler (if you roll your own, be sure to include
# Rpdf2txt::DefaultHandler). outstream needs to respond to :<<
# (padding is defined in the part of bin/rpdf2txt not shown here)
handler = Rpdf2txt::ColumnHandler.new(outstream, padding)
parser.extract_text(handler)
</snip>
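Because the handler only needs outstream to respond to :<<, you are not limited to STDOUT or a File. Here is a minimal sketch in plain Ruby (it runs without rpdf2txt installed). The LineCountingSink class is a hypothetical example, not part of rpdf2txt; the sketch just shows that a String, a StringIO, an Array and a custom class all satisfy that contract, so you could capture the extracted text in memory instead of writing it out.

```ruby
require 'stringio'

# Hypothetical sink (not part of rpdf2txt): collects text in memory and
# counts newlines. The only method the handler relies on is <<.
class LineCountingSink
  attr_reader :text, :lines

  def initialize
    @text  = "".dup   # mutable even with frozen string literals enabled
    @lines = 0
  end

  # Append a chunk and return self, so calls can be chained like IO#<<.
  def <<(chunk)
    chunk = chunk.to_s
    @text << chunk
    @lines += chunk.count("\n")
    self
  end
end

# Candidates for outstream: all of them respond to :<<.
["".dup, StringIO.new, [], LineCountingSink.new].each do |candidate|
  raise "#{candidate.class} lacks <<" unless candidate.respond_to?(:<<)
end

# Capturing into a StringIO instead of STDOUT:
buffer = StringIO.new
buffer << "Page 1\n" << "Page 2\n"
puts buffer.string

sink = LineCountingSink.new
sink << "first line\n" << "second line\n"
puts "#{sink.lines} lines captured"

# With rpdf2txt installed, either object could be passed as outstream
# (padding as in bin/rpdf2txt; untested assumption):
#   handler = Rpdf2txt::ColumnHandler.new(buffer, padding)
#   parser.extract_text(handler)
#   text = buffer.string
```

A StringIO is usually the simplest choice if you want the whole text as a String afterwards; a custom class like the one above is only worth it if you want to process chunks as they arrive.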
There have recently been a couple of major improvements in how rpdf2txt
positions characters. However, there's no official release for that yet.
Since you're just starting out, I would recommend using a daily build from
http://download.ywesee.com/rpdf2txt/rpdf2txt-daily.tar.bz2
or downloading rpdf2txt via git/cogito:

  cg-clone http://scm.ywesee.com/rpdf2txt
Changelog: http://scm.ywesee.com/?p=rpdf2txt;a=summary
HTH, let me know if it works for you.
Hannes