Copying text out of HTML pages

M. Lesaar

Hello,

There is a web page with links to other pages containing text that I want to
copy into Word. There are quite a lot of links. Is there a way to get the
original texts without clicking on each of those links and copying the text
manually?

Thanks for your help.

Marcel

Roy Schestowitz

If you use Word, I assume that you work under Windows, which lacks the
flexibility this task needs. You can install Cygwin (www.cygwin.com) to get
Linux-style command-line tools, which will let you do the following.

If the page is located at ADDRESS, run the following command:

wget -r -l2 -t1 -N -np -erobots=off ADDRESS

This assumes the links point to pages on the same site (internal links), but
the command can be modified as necessary (see 'man wget').
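If you already have the links as a list (for example pasted into Word and saved as a plain-text file), you do not need the recursive crawl at all: wget's -i option reads URLs from a file. A minimal sketch, assuming hypothetical file names links.txt (your export) and urls.txt (the cleaned list); the helper name extract_urls is also just for illustration:

```shell
# Sketch: pull http(s) URLs out of a plain-text export (e.g. Word "Save As" .txt)
# and print them one per line, dropping duplicates but keeping the original order.
extract_urls() {
  # grep -o prints each match on its own line; whitespace (including the CR
  # of Windows line endings) ends a match, so stray CRs never reach wget
  grep -Eo 'https?://[^[:space:]"<>]+' "$1" | awk '!seen[$0]++'
}
```

Usage: run extract_urls links.txt > urls.txt, then wget -t1 -N -i urls.txt to fetch every listed page into the current directory.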

You should then have a directory (or several directories) containing all the
pages. Hopefully they are plain text rather than HTML; HTML complicates
things, because the markup has to be stripped first. You can then concatenate
the files into one using 'cat' (see 'man cat').
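For the common case where the downloaded pages are HTML, here is a rough sketch that joins them into one text file Word can open. It assumes the pages sit under the directory wget created (by default named after the host) and end in .html or .htm; pages saved without such an extension are skipped (wget's -E option adds one). The function name pages_to_text is hypothetical, and the regex tag stripping is crude: scripts, styles, entities such as &amp;, and tags split across lines will leak through.

```shell
# Sketch: print every .html/.htm file under a directory as plain text,
# each preceded by a header line naming the file.
pages_to_text() {
  find "$1" -type f \( -name '*.html' -o -name '*.htm' \) | sort |
  while IFS= read -r f; do
    printf '===== %s =====\n' "$f"
    # delete anything between < and > on a line, then drop blank lines
    sed -e 's/<[^>]*>//g' "$f" | grep -v '^[[:space:]]*$'
  done
}
```

Usage: pages_to_text example.com > all.txt, then open all.txt in Word.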

I am afraid that I see no simpler alternatives. If you don't perform this
task often, then it is not worth the investment.

Roy

Csaba Gabor

You can do this fairly straightforwardly with VBScript on your Windows system
by fleshing out the outline below into an HTML2Word.vbs file:

Set ie = newIEtoForeground("HTML to Word")
SourcePage = "your web page address"
ie.Navigate SourcePage
' ReadyState 4 means the page has finished loading
Do Until ie.ReadyState = 4 : WScript.Sleep 10 : Loop
' Now get the links for this page and stuff them into an array (links) or some such
For Each link In links
  ie.Navigate link
  Do Until ie.ReadyState = 4 : WScript.Sleep 10 : Loop
  myText = ie.Document.Body.innerText
  ' Save myText to Word here
Next

You can find the code for newIEtoForeground at
http://groups-beta.google.com/group....vbscript/browse_frm/thread/b5a4788bb2dacc09/

Of course, you still need to extract the links from the first page
and then save myText to Word
(microsoft.public.scripting.vbscript can help you with questions),
but this should give you a framework.
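For the first of those steps, a crude way to list the links from a saved copy of the start page is a regex over its href attributes, using the same kind of shell tools mentioned earlier in the thread. This is only a sketch under assumptions: the function name list_links is hypothetical, it catches only double-quoted href="..." values (including those of non-<a> tags such as <link>), and relative links come out relative.

```shell
# Sketch: print the target of every double-quoted href="..." in an HTML file,
# one per line, matching the attribute name case-insensitively.
list_links() {
  grep -Eio 'href="[^"]*"' "$1" | sed -e 's/^[^"]*"//' -e 's/"$//'
}
```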

Csaba Gabor from Vienna

M. Lesaar

Hello,

I actually have problems to understand what to do. But now I have a list in
ms word with only the links listed. How can I get the html-files downloaded
or into word?

Thanks for your help.

Bye. Marcel