Downloading web pages to a file (Newbie question)

woodyee

Hi! I'm a newbie and my request is probably over my head. Here's what I
want to do:
I go to a website (usually a blog), choose "File/Save As" in my browser,
change the "Save as Type" to a text file, and save it to my desktop. How
can I do this in Ruby? I've found several code samples, but they either
save the HTML code instead of the text or display it in my DOS window
instead of sending it to a file. Thanks!
 
Gene Tani

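You could shell out to a command-line tool:

`lynx -dump URL > page.txt`
(-dump is lynx's switch for printing the rendered page as plain text. wget
and curl have no such switch; `wget -O page.html URL` and
`curl -o page.html URL` save the raw HTML.)
Python's urllib2 would also fetch the raw HTML.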
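
A minimal sketch of the shell-out approach from Ruby, assuming the text-mode browser lynx is installed (its -dump switch prints the rendered page as plain text). The helper name save_command_output is made up for illustration:

```ruby
# Run a command (given as an array, so no shell quoting is needed)
# and write whatever it prints to the file at path.
# Returns the number of characters captured.
def save_command_output(cmd, path)
  output = IO.popen(cmd, &:read)
  File.write(path, output)
  output.length
end

# Hypothetical usage, assuming lynx is on the PATH:
#   save_command_output(['lynx', '-dump', 'http://ruby-doc.org/'], 'ruby-doc.txt')
```

Passing the command as an array avoids problems with URLs that contain characters the shell would interpret, such as & or ?.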
 
rasser

Maybe something like this, which adds a timestamp to the file name:
-------------------------------------------------------------------------

require 'net/http'

# If you are not behind a proxy, just delete the last two params.
h = Net::HTTP.new('blog.company.com', 80, 'proxy.mycompany.com', 8080)

# Which file to get; use e.g. "index.html" if you don't know.
# On current Ruby, get returns only the response; the page is in resp.body.
resp = h.get("/PATH-TO-BLOG/FILE" + ".html")
data = resp.body

# Timestamp down to the hour, e.g. 2006010215.
ts = Time.now.strftime("%Y%m%d%H")
File.open("FILENAME-TO-SAVE-TO-" + ts + ".html", "w") do |f|
  f.write(data)
end
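
The timestamped file name can be pulled into a small helper so the same code can write to the desktop, as the original question asks. This is only a sketch: timestamped_name and save_snapshot are hypothetical names, and the Desktop folder under the home directory is an assumption that may not hold on every system.

```ruby
# Build "PREFIX-YYYYMMDDHH.EXT". %H is the hour on a 24-hour clock, so
# two saves within the same hour get the same name.
def timestamped_name(prefix, ext, time = Time.now)
  "#{prefix}-#{time.strftime('%Y%m%d%H')}.#{ext}"
end

# Write data to a timestamped file under dir and return the full path.
# The default dir (the user's Desktop folder) is an assumption.
def save_snapshot(data, prefix, ext, dir = File.join(Dir.home, 'Desktop'), time = Time.now)
  path = File.join(dir, timestamped_name(prefix, ext, time))
  File.write(path, data)
  path
end

# Hypothetical usage:
#   save_snapshot(resp.body, 'blog', 'html')
```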
 
ChrisH

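This code will strip the HTML tags (mostly). It will also remove any
embedded links:
require 'cgi'
require 'open-uri'
require 'uri'

# Strip tags with a regex, decode entities such as &amp;, and drop any
# leftover "-->" from HTML comments.
def removeHTML(htmlstr)
  CGI.unescapeHTML(htmlstr.gsub(/<[^>]*>/, '')).gsub(/-->/, '').rstrip.chomp
end

# Fetch uri and write its text to the file named out.
def html2txt(uri, out)
  # URI.open is needed on Ruby 3.0+, where plain open no longer accepts URLs.
  URI.open(uri) { |htmldoc|
    File.open(out, 'w') { |of|
      of.print removeHTML(htmldoc.read)
    }
  }
end

site = 'http://ruby-doc.org/'
outFile = URI.parse(site).host + ".txt"
html2txt(site, outFile)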
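
The tag-stripping regex above leaves the contents of <script> and <style> blocks in the output and runs paragraphs together. Here is a rough sketch of one way to tighten that, still using regexes only. html_to_text is a hypothetical name, and the list of tags treated as line breaks is my own choice; a real HTML parser such as Nokogiri would be more robust on messy pages.

```ruby
require 'cgi'

# Turn an HTML string into rough plain text using regexes only.
def html_to_text(html)
  text = html.dup
  # Drop <script>/<style> blocks and comments, contents included.
  text.gsub!(/<(script|style)\b[^>]*>.*?<\/\1>/mi, '')
  text.gsub!(/<!--.*?-->/m, '')
  # Turn common block-level boundaries into line breaks.
  text.gsub!(/<br\s*\/?>|<\/(p|div|h[1-6]|li|tr)>/i, "\n")
  # Remove the remaining tags, then decode entities such as &amp;.
  text.gsub!(/<[^>]*>/, '')
  text = CGI.unescapeHTML(text)
  # Collapse runs of spaces/tabs and trim whitespace around line breaks.
  text.gsub(/[ \t]+/, ' ').gsub(/\s*\n\s*/, "\n").strip
end

# Hypothetical usage with the html2txt idea from the post:
#   File.write('ruby-doc.org.txt', html_to_text(URI.open('http://ruby-doc.org/').read))
```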
 
