Saving HTML as MHT

Albert Schlef

Hello.

I want to download an HTML page, but I also want to save the images it
contains along with it. I was thinking of saving it as an MHT file, which
would make my life easier because I wouldn't have to handle the separate
files. I've checked both of my browsers (Firefox and Opera), but neither
seems to have a command-line switch for saving a URL as an MHT file. I also
searched the net for a Ruby library, but the only one I found seems to work
only on Windows (it ships with a DLL), which is no good for me because I'm
using Ubuntu.

So, my question is:

Given a URL, how can I save this page as MHT?

(My program is in Ruby, but I don't mind delegating this part to a
command-line utility.)
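As a rough illustration of what saving as MHT involves: an MHT file is an MHTML document (RFC 2557), that is, a MIME multipart/related message in which the HTML page and each image are separate parts, each tagged with a Content-Location header holding its original URL. The sketch below builds such a file with only the Ruby standard library. The method names (fetch, image_urls, build_mht, save_as_mht) are hypothetical; the <img> detection is a naive regex rather than an HTML parser; redirects, stylesheets, scripts and CSS background images are ignored; and Net::HTTP does not run JavaScript, so script-loaded images will be missed. Treat it as a starting point under those assumptions, not a finished tool.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: GET +url+ and return [content_type, body_bytes].
# No redirect handling and minimal error handling; a sketch only.
def fetch(url)
  res = Net::HTTP.get_response(URI(url))
  raise "#{url}: HTTP #{res.code}" unless res.is_a?(Net::HTTPSuccess)
  [res["Content-Type"] || "application/octet-stream", res.body.to_s]
end

# Naively collect the src attributes of <img> tags (a real tool should use
# an HTML parser) and resolve them against the page URL.
def image_urls(html, base_url)
  html.scan(/<img\b[^>]*?\ssrc=["']([^"']+)["']/i).flatten.uniq.map do |src|
    URI.join(base_url, src).to_s
  end
end

# Assemble an MHTML document: one multipart/related message whose first part
# is the page itself and whose other parts are the resources, each base64
# encoded and tagged with its original URL in Content-Location.
#   page:  [content_type, bytes]
#   parts: { absolute_url => [content_type, bytes] }
def build_mht(page_url, page, parts, boundary = "----=_MHT_sketch_boundary")
  out = +"MIME-Version: 1.0\r\n"
  out << %(Content-Type: multipart/related; type="text/html"; boundary="#{boundary}"\r\n\r\n)
  [[page_url, page], *parts].each do |location, (type, bytes)|
    out << "--#{boundary}\r\n"
    out << "Content-Type: #{type}\r\n"
    out << "Content-Transfer-Encoding: base64\r\n"
    out << "Content-Location: #{location}\r\n\r\n"
    out << [bytes].pack("m").gsub("\n", "\r\n") # base64 in CRLF-terminated lines
  end
  out << "--#{boundary}--\r\n"
end

# Fetch the page and its images, then write everything to +path+.
def save_as_mht(url, path)
  page = fetch(url)
  parts = image_urls(page[1], url).to_h { |u| [u, fetch(u)] }
  File.binwrite(path, build_mht(url, page, parts))
end
```

For example, save_as_mht("http://example.com/", "page.mht") would fetch the page and its <img> files and write them into page.mht. Whether a particular browser renders the result correctly is worth checking by hand.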
 
Colin Bartlett

Although another post cites Wikipedia as suggesting that the MHT file
format is a lot of effort for not much gain, I have found it useful to
save web pages (including images) as MHT, using each of Opera, Firefox and
Internet Explorer, and then extract what I want (including the images)
from the MHT file.

That said, once a web page has been saved (using plugins if necessary),
whether as MHT, as an HTML file with its images etc. in a subdirectory, or
as a zip archive, it should be fairly easy to extract what you want from
whichever format was used.
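For the MHT case, extraction is mostly a matter of splitting the MIME message on its boundary string and decoding each part. A minimal Ruby sketch, assuming a well-formed file with a single multipart/related level (no nested multiparts) and parts encoded as base64, quoted-printable or not at all; the method names mht_parts and extract_mht are hypothetical:

```ruby
# Split an MHT (multipart/related) string into parts.
# Returns an array of { headers: { "lower-cased-name" => value }, body: bytes }.
def mht_parts(mht)
  # The top-level Content-Type header names the boundary string.
  boundary = mht[/boundary="?([^";\r\n]+)"?/i, 1] or raise ArgumentError, "no MIME boundary found"
  # The first chunk is the top-level header; a chunk starting with "--" is
  # the closing delimiter. Everything else is one part.
  chunks = mht.split("--#{boundary}").drop(1).reject { |c| c.start_with?("--") }
  chunks.map do |chunk|
    head, body = chunk.sub(/\A\r?\n/, "").split(/\r?\n\r?\n/, 2)
    headers = {}
    # Unfold continuation lines, then collect "Name: value" pairs.
    head.to_s.gsub(/\r?\n[ \t]+/, " ").each_line do |line|
      name, value = line.chomp.split(":", 2)
      headers[name.strip.downcase] = value.strip if value
    end
    # The line break before the next delimiter belongs to the delimiter.
    body = body.to_s.sub(/\r?\n\z/, "")
    body =
      case headers["content-transfer-encoding"].to_s.downcase
      when "base64" then body.unpack1("m")
      when "quoted-printable" then body.unpack1("M")
      else body
      end
    { headers: headers, body: body }
  end
end

# Write each part into +dir+, named after the last path segment of its
# Content-Location and prefixed with its index to avoid collisions.
def extract_mht(path, dir)
  Dir.mkdir(dir) unless Dir.exist?(dir)
  mht_parts(File.binread(path)).each_with_index do |part, i|
    location = part[:headers]["content-location"].to_s.sub(/[?#].*\z/, "")
    name = "#{i}_#{File.basename(location)}".gsub(/[^\w.-]/, "_")
    File.binwrite(File.join(dir, name), part[:body])
  end
end
```

Browser-generated MHT files vary, so a real tool would more likely lean on a full MIME parser than on this hand-rolled splitting.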

So: is the problem saving as MHT from the command line specifically, or
saving anything at all (MHT, or HTML plus images) from the command line?
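If HTML plus images in a directory would do instead of MHT, GNU Wget (packaged for Ubuntu) can already save a single page that way from the command line: --page-requisites also fetches the images and stylesheets the page needs, and --convert-links rewrites links so the local copy works offline. A small Ruby wrapper, with hypothetical method names wget_args and save_with_wget, might look like this:

```ruby
# Build the argument list for GNU Wget to save one page plus the files it
# needs in order to display (images, stylesheets) under +dir+.
def wget_args(url, dir)
  [
    "wget",
    "--page-requisites",   # also fetch images, CSS etc. used by the page
    "--convert-links",     # rewrite links so the local copy works offline
    "--adjust-extension",  # save HTML with an .html suffix (older wget: --html-extension)
    "--span-hosts",        # allow requisites hosted on other domains
    "--directory-prefix=#{dir}",
    url
  ]
end

# Passing separate arguments to system avoids going through a shell.
def save_with_wget(url, dir)
  system(*wget_args(url, dir)) or raise "wget failed (exit status #{$?.exitstatus})"
end
```

For example, save_with_wget("http://example.com/page.html", "saved") should leave the page and its requisites under saved/, in one subdirectory per host.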

Could you use Watir, or http://watij.com with JRuby? From a quick look at
their websites these may work, but I haven't tried them yet, partly
because the initial learning curve looks a bit steep and partly because at
the moment (on Microsoft Windows) I can use AutoIt with Ruby to
programmatically switch from a Ruby DOS box to the browser and send
keystrokes to save the page as MHT, plain HTML or whatever. It's not
exactly elegant, but it does (mostly!) work. If all else fails, could you
do something similar on Linux?
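On Linux under X11, xdotool (packaged for Ubuntu) can do roughly what AutoIt does here: find a window by title, activate it and send keystrokes. A crude Ruby sketch, assuming an X11 session, a browser window whose title matches window_name, and a Save dialog that opens with focus in its file-name field; the method names are hypothetical, and the exact keystrokes (and whether MHT is offered at all) depend on the browser:

```ruby
# Hypothetical sequence of xdotool commands that mimics "switch to the
# browser and press Ctrl+S, then type a file name and press Enter".
def xdotool_save_steps(window_name, file_path)
  [
    # Find a visible window whose title matches and wait until it is active.
    %w[xdotool search --onlyvisible --name] + [window_name] + %w[windowactivate --sync],
    %w[xdotool key ctrl+s],      # open the browser's Save dialog
    %w[xdotool key ctrl+a],      # select whatever is in the file-name field
    ["xdotool", "type", "--delay", "50", file_path],
    %w[xdotool key Return]
  ]
end

# Run the steps one by one; the fixed pause is a crude way of giving the
# browser and dialog time to react.
def save_via_browser(window_name, file_path, pause: 1.0)
  xdotool_save_steps(window_name, file_path).each do |cmd|
    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
    sleep pause
  end
end
```

Like the AutoIt approach, this is timing-sensitive and breaks if another window steals focus.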

If you find a reasonably elegant solution, then I'd be very interested.
 
