How to automate downloading a PDF from the web in Ruby


Priyank Shah

Hi,

I want to download a PDF file from a website, but I cannot use Rails'
"send_file" because I want to do it in plain Ruby.

e.g.: http://www.example.com/abc.pdf

Given the URL above, I just want to automate downloading "abc.pdf" rather
than clicking on save.

I am looking for some good solutions and suggestions.

Thanks,
Priyank Shah
 

Anurag Priyam

e.g : http://www.example.com/abc.pdf
Now from above URL I just want to automate download "abc.pdf" rather
than click on save.
I am looking for some good solutions and suggestions.

On a Linux box you can happily use wget. Here is a simple use-and-throw
script that I used to download video lectures (of Design of
Machine Elements :)) from NPTEL[1]. I hope this gives you an idea.

require 'nokogiri'
require 'open-uri'

proxy = ENV["http_proxy"]
page = "http://nptel.iitm.ac.in/video.php?courseId=1063"

# get the titles of all the lectures to use as filenames later
# (URI.open is needed on Ruby 3.0+, where plain open no longer accepts URLs)
doc = Nokogiri::HTML(URI.open(page, :proxy => proxy))
titles = doc.search("td.videolink a").map(&:text)

# the download url follows a common pattern: lec01.flv to lec40.flv
# construct each url, and download it
(1..40).each do |i|
  url = format("http://npteldownloads.iitm.ac.in/flv/1063/lec%02d.flv", i)
  lecture = titles[i - 1]
  puts lecture
  # pass the arguments as a list so that a quote in a title cannot break the command
  system("wget", "-c", url, "-O", lecture)
end

[1] http://nptel.iitm.ac.in/
 

Colin Bartlett

e.g : http://www.example.com/abc.pdf
Now from above URL I just want to automate download "abc.pdf" rather
than click on save.
I am looking for some good solutions and suggestions.

On a Linux box you can happily use wget. Here is a simple use-and-throw
script that I used to download video lectures (of Design of
Machine Elements :)) from NPTEL[1]. I hope this gives you an idea.

wget also seems to work well on Microsoft Windows systems, both
standalone and run as a process from Ruby. (I use the system command
rather than %x, but only because I feel more comfortable with system.)

Specifically on running wget from Ruby: I did this quite a lot about
three years ago on a dial-up connection using Microsoft Windows XP,
and much more recently I've been doing it on a broadband connection
using Microsoft Windows Vista. I wrap wget with some Ruby methods
(including calling Dir["path/*"] before and after running wget and
taking the difference to find the downloaded files), and that's fairly
easy to do. (If you'd like to see the wrapping code, I'd be happy to
post it on Github.)
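
For illustration, a wrapper along the lines described above might look like the sketch below. It is not the actual wrapping code from this post: the method names new_files_in and wget are hypothetical, and it assumes a wget executable is on the PATH.

```ruby
# Run the given block and return the paths that appeared in +dir+ while it ran.
def new_files_in(dir)
  before = Dir[File.join(dir, '*')]
  yield
  Dir[File.join(dir, '*')] - before
end

# Download +url+ into +dir+ with wget and return the paths of the new files.
# Assumption: a wget executable can be found on the PATH.
def wget(url, dir = '.')
  new_files_in(dir) do
    # -c resumes a partial download, -P sets the target directory;
    # the argument-list form of system avoids shell quoting problems
    system('wget', '-c', '-P', dir, url) or raise "wget failed for #{url}"
  end
end
```

Calling wget("http://www.example.com/abc.pdf", "downloads") would then return something like ["downloads/abc.pdf"], provided the file was not already there before the call.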

There is a thread here:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/333425
"I would need a ruby "wget version" which works on linux and windows.
I would like to feed it an URL to a .tar.bz2 or .zip or .tar.gz file
and have it download. That's what it basically should do. Right now I
use
system 'wget '+the_url
which does not work on windows easily (but will work on pretty much
all the linuxes out there)"

1. In the thread there were some suggestions that you could probably
just use Net::HTTP or open-uri. I used (and use) wget because (rightly
or wrongly) I suspect it may be more robust with possibly flaky
connections and/or large files (and I already knew how to use wget, so
there wasn't a learning curve). But I will be experimenting with
Net::HTTP, out of curiosity and to get downloads straight into Ruby
(I hope) instead of via a file downloaded by wget.

2. As I said, I found using wget from Ruby worked easily for me on
Microsoft Windows, but I'd be interested in any experiences to the
contrary.

3. I seem to recall there was a wget.rb (something like that) which
wrapped wget and which was installed with the Ruby Windows Installer,
but I've just looked in my MS Windows Ruby and JRuby and couldn't find
it. But it should be easy to write your own wrapper for what you want
to do.
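
To give an idea of the Net::HTTP route from point 1, here is a minimal sketch that streams a download straight to a file. The method name download is made up for this example. It does not follow redirects or retry, so it is not a substitute for wget on flaky connections.

```ruby
require 'net/http'
require 'uri'

# Stream the body of +url+ into the file +dest+ without holding it all in memory.
# Returns the number of bytes written; raises unless the status is 2xx.
def download(url, dest)
  uri = URI.parse(url)
  written = 0
  Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      response.value # raises an HTTP error for any non-2xx status
      File.open(dest, 'wb') do |file|
        # read_body with a block yields the body piece by piece
        response.read_body { |chunk| written += file.write(chunk) }
      end
    end
  end
  written
end
```

For the URL in the original question this would be called as download("http://www.example.com/abc.pdf", "abc.pdf").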
 

Anurag Priyam

[...] but I will be experimenting with Net::HTTP,
out of curiosity and to get downloads straight into Ruby (I hope)
instead of via a file downloaded by wget.

See if the download and unzip functions here[1] help. My fork[2] has
proxy support. I have sent a pull request too, but the developer seems
to be on leave.

It uses the 'progressbar' gem to show progress, but you can safely omit it
and make a dependency-free version :).
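
As a rough idea of what a dependency-free version could look like (this is not the code from the linked rake task), the sketch below uses open-uri's own :content_length_proc and :progress_proc hooks in place of the progressbar gem. The method name download_with_progress and its proxy: and out: parameters are assumptions made for this example.

```ruby
require 'open-uri'

# Download +url+ to +dest+, printing a plain percentage instead of a progress bar.
# +proxy+ may be a proxy URL string, true to consult the http_proxy environment
# variable, or false to connect directly. Progress is written to +out+.
def download_with_progress(url, dest, proxy: true, out: $stderr)
  total = nil
  options = {
    :proxy => proxy,
    # called once with the expected size (nil when the server does not send one)
    :content_length_proc => lambda { |length| total = length },
    # called repeatedly with the number of bytes received so far
    :progress_proc => lambda { |done|
      out.print "\r#{done * 100 / total}%" if total && total > 0
    }
  }
  URI.open(url, options) do |remote|
    File.open(dest, 'wb') { |file| IO.copy_stream(remote, file) }
  end
  out.puts
  dest
end
```

Note that open-uri buffers the whole response (in memory or in a temporary file) before the block runs, so the percentage tracks the transfer while the copy to dest happens at the end.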

[1] https://github.com/maccman/bowline/blob/master/lib/bowline/tasks/libs.rake
[2] https://github.com/yeban/bowline/blob/proxy_support/lib/bowline/tasks/libs.rake
 

Nicholas A.

You can do it with CGI:

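#!/usr/bin/ruby

# one header per line; Content-Length must be the size in bytes
# (assumption: the file being served is abc.pdf in the script's directory)
puts "Content-Type: application/x-unknown"
puts "Content-Length: #{File.size('abc.pdf')}"
puts "Content-Disposition: attachment; filename=abc.pdf"
puts # a single blank line ends the headers

# output the binary file data here...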
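
Filled out a little, the script above could look like the following sketch. Note that this covers the serving side (a script on the web server sending abc.pdf to a browser), not fetching a file from someone else's site. The method name send_download is hypothetical.

```ruby
# Write a CGI download response for the file at +path+ to +out+.
# The headers come first, one per line, and a single blank line
# separates them from the body.
def send_download(path, out = $stdout)
  data = File.binread(path)
  out.binmode # keep the PDF bytes untouched on every platform
  out.print "Content-Type: application/pdf\r\n"
  out.print "Content-Length: #{data.bytesize}\r\n"
  out.print "Content-Disposition: attachment; filename=\"#{File.basename(path)}\"\r\n"
  out.print "\r\n"
  out.print data
end
```

In a CGI script this would be called as send_download('abc.pdf'), with the web server passing everything written to standard output on to the browser.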
 

Michael Peterson

I want to download pdf file from website. But i cannot use rails
"send_file" as i want it in ruby only.

e.g : http://www.example.com/abc.pdf

Now from above URL I just want to automate download "abc.pdf" rather
than click on save.

The simplest pure-Ruby way to do it that I know of is to use the rio
library. I just ran this simple program and it downloaded the PDF
perfectly:

require 'rubygems'
require 'rio'
rio("http://www.sqlite.org/copyright-release.pdf") >
rio('sqlite-copyright-release.pdf')

-Michael
 
