Copying text out of HTML pages

Discussion in 'HTML' started by M. Lesaar, Apr 23, 2005.

  1. M. Lesaar

    M. Lesaar Guest

    Hello,

    There is a web page with links to other pages that contain texts which I
    want to copy into Word. There are quite a lot of links. Is there a way
    to get the original texts without clicking on each of those links and
    copying the text manually?

    Thanks for your help.

    Marcel
     
    M. Lesaar, Apr 23, 2005
    #1

  2. Roy Schestowitz

    M. Lesaar wrote:

    > Hello,
    >
    > There is a web page with links to other pages that contain texts which I
    > want to copy into Word. There are quite a lot of links. Is there a way
    > to get the original texts without clicking on each of those links and
    > copying the text manually?
    >
    > Thanks for your help.
    >
    > Marcel


    Since you use Word, I assume that you work under Windows, where you will
    lack flexibility. You can install Cygwin (www.cygwin.com) to get Linux-like
    functionality, which will enable you to do the following.

    If the page is located at ADDRESS, run the following command:

    wget -r -l2 -t1 -N -np -erobots=off ADDRESS

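    This recurses two levels deep (-r -l2), tries each download once (-t1),
    skips files that are not newer than the local copies (-N), never ascends
    to the parent directory (-np) and ignores robots.txt (-erobots=off). It
    assumes internal links, but can be modified as necessary (see 'man wget').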

    You should then have a directory (or several directories) with all the text
    (hopefully not hypertext, which complicates things). You can then append
    the files using 'cat' (see 'man cat').
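
    As a rough sketch of the appending step, assuming the downloaded files
    turn out to be HTML after all: the shell function below walks the
    directory that wget created, strips tags crudely with sed and appends
    everything to one file. The name combine_pages and the paths in the usage
    comment are made up for illustration, and the sed expression is no real
    HTML parser (it only removes tags that open and close on the same line).

```shell
#!/bin/sh
# combine_pages DIR OUT: append every HTML/text file found under DIR to OUT.
# DIR and OUT are hypothetical names; point DIR at the directory wget created.
combine_pages() {
    dir=$1
    out=$2
    : > "$out"    # start with an empty output file
    find "$dir" -type f \( -name '*.html' -o -name '*.htm' -o -name '*.txt' \) | sort |
    while IFS= read -r page; do
        # Header line so each page can be told apart in the combined file
        printf '==== %s ====\n' "$page" >> "$out"
        # Crude tag removal: deletes <...> within a single line only
        sed 's/<[^>]*>//g' "$page" >> "$out"
        printf '\n' >> "$out"
    done
}

# Usage (hypothetical paths):
#   combine_pages ./www.example.com combined.txt
```

    The resulting file can then be opened in Word as plain text. Keep the
    output file outside the downloaded directory so that it is not picked up
    as input.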

    I am afraid that I see no simpler alternatives. If you don't perform this
    task often, then it is not worth the investment.

    Roy

    --
    Roy S. Schestowitz
    http://Schestowitz.com
     
    Roy Schestowitz, Apr 24, 2005
    #2

  3. Csaba Gabor

    Csaba Gabor Guest

    M. Lesaar wrote:
    > There is a web page with links to other pages that contain texts which I
    > want to copy into Word. There are quite a lot of links. Is there a way
    > to get the original texts without clicking on each of those links and
    > copying the text manually?


    You can use VBScript to do this fairly straightforwardly on your Windows system,
    by fleshing out the outline below into an HTML2Word.vbs file:

    Set ie = newIEtoForeground("HTML to Word")
    SourcePage = "your web page address"
    ie.Navigate SourcePage
    Do Until ie.ReadyState = 4 : WScript.Sleep 10 : Loop
    ' TODO: get the links for this page and store them in an array (here called links)
    For Each link In links
        ie.Navigate link
        Do Until ie.ReadyState = 4 : WScript.Sleep 10 : Loop
        myText = ie.Document.Body.innerText
        ' TODO: save myText to Word here
    Next

    You can find the code for newIEtoForeground at
    http://groups-beta.google.com/group....vbscript/browse_frm/thread/b5a4788bb2dacc09/

    Of course you still need to save out the links in the first page
    and then you have to save myText to Word
    (microsoft.public.scripting.vbscript can help you with questions)
    but this should give you a framework.
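
    For those who went the Cygwin route instead, the "get the links for this
    page" step could be sketched in shell rather than through Internet
    Explorer. This is an alternative technique, not part of the VBScript
    framework above; extract_links and the file names are hypothetical, and
    it only handles href values written in double quotes.

```shell
#!/bin/sh
# extract_links FILE: print the href value of every double-quoted link
# in a saved HTML file, one address per line.
extract_links() {
    # grep -o prints each match on its own line; -i also accepts HREF=
    grep -oi 'href="[^"]*"' "$1" |
    # drop everything up to the opening quote, then the closing quote
    sed 's/^[^"]*"//; s/"$//'
}

# Usage (hypothetical file names):
#   extract_links index.html > links.txt
#   wget -i links.txt    # -i reads the addresses to fetch from a file
```

    Since wget's -i option reads the addresses from a file, a list of
    absolute links can be downloaded in one go; relative links would have to
    be turned into full addresses first.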

    Csaba Gabor from Vienna
     
    Csaba Gabor, Apr 24, 2005
    #3
  4. M. Lesaar

    M. Lesaar Guest

    Hello,

    I am actually having trouble understanding what to do. But I now have a
    list in MS Word containing only the links. How can I get the HTML files
    downloaded, or into Word?

    Thanks for your help.

    Bye. Marcel
     
    M. Lesaar, Apr 25, 2005
    #4