saving a webpage's links to the hard disk

Discussion in 'Python' started by Jetus, May 4, 2008.

  1. Jetus

    Jetus Guest

    Is there a good place to look to see where I can find some code that
    will help me to save a webpage's links to the local drive, after I have
    used urllib2 to retrieve the page?
    Many times I have to view these pages when I do not have access to the
    internet.
     
    Jetus, May 4, 2008
    #1

  2. On Sun, 04 May 2008 01:33:45 -0300, Jetus <> wrote:

    > Is there a good place to look to see where I can find some code that
    > will help me to save a webpage's links to the local drive, after I have
    > used urllib2 to retrieve the page?
    > Many times I have to view these pages when I do not have access to the
    > internet.


    Don't reinvent the wheel and use wget
    http://en.wikipedia.org/wiki/Wget
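
    For offline viewing, wget can also rewrite the links for you; something
    like the following (all standard wget options) mirrors a page together
    with the files it needs and converts its links to point at the local
    copies:

    wget --mirror --convert-links --page-requisites http://python.org/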

    --
    Gabriel Genellina
     
    Gabriel Genellina, May 4, 2008
    #2

  3. Jetus

    Guest

    On May 4, 12:33 am, "Gabriel Genellina" <>
    wrote:
    > On Sun, 04 May 2008 01:33:45 -0300, Jetus <> wrote:
    >
    > > Is there a good place to look to see where I can find some code that
    > > will help me to save a webpage's links to the local drive, after I have
    > > used urllib2 to retrieve the page?
    > > Many times I have to view these pages when I do not have access to the
    > > internet.

    >
    > Don't reinvent the wheel and use wget
    > http://en.wikipedia.org/wiki/Wget
    >
    > --
    > Gabriel Genellina


    A lot of the functionality is already present.

    import urllib
    urllib.urlretrieve('http://python.org/', 'main.htm')  # save the page to disk
    from htmllib import HTMLParser
    from formatter import NullFormatter
    parser = HTMLParser(NullFormatter())  # NullFormatter: parse only, no output
    parser.feed(open('main.htm').read())
    parser.close()
    import urlparse
    for a in parser.anchorlist:  # anchorlist collects every href seen
        print urlparse.urljoin('http://python.org/', a)  # make links absolute

    Output snipped:

    ...
    http://python.org/psf/
    http://python.org/dev/
    http://python.org/links/
    http://python.org/download/releases/2.5.2
    http://docs.python.org/
    http://python.org/ftp/python/2.5.2/python-2.5.2.msi
    ...
     
    , May 4, 2008
    #3
  4. Jetus

    Jetus Guest

    On May 4, 7:22 am, wrote:
    > On May 4, 12:33 am, "Gabriel Genellina" <>
    > wrote:
    >
    > > On Sun, 04 May 2008 01:33:45 -0300, Jetus <> wrote:

    >
    > > > Is there a good place to look to see where I can find some code that
    > > > will help me to save a webpage's links to the local drive, after I have
    > > > used urllib2 to retrieve the page?
    > > > Many times I have to view these pages when I do not have access to the
    > > > internet.

    >
    > > Don't reinvent the wheel and use wget
    > > http://en.wikipedia.org/wiki/Wget

    >
    > > --
    > > Gabriel Genellina

    >
    > A lot of the functionality is already present.
    >
    > import urllib
    > urllib.urlretrieve( 'http://python.org/', 'main.htm' )
    > from htmllib import HTMLParser
    > from formatter import NullFormatter
    > parser= HTMLParser( NullFormatter( ) )
    > parser.feed( open( 'main.htm' ).read( ) )
    > import urlparse
    > for a in parser.anchorlist:
    >     print urlparse.urljoin( 'http://python.org/', a )
    >
    > Output snipped:
    >
    > ...
    > http://python.org/psf/
    > http://python.org/dev/
    > http://python.org/links/
    > http://python.org/download/releases/2.5.2
    > http://docs.python.org/
    > http://python.org/ftp/python/2.5.2/python-2.5.2.msi
    > ...


    How can I modify or add to the above code, so that the file references
    are saved to specified local directories, AND the saved webpage makes
    reference to the new saved files in the respective directories?
    Thanks for your help in advance.
     
    Jetus, May 7, 2008
    #4
  5. Jetus

    Guest

    On May 7, 1:40 am, Jetus <> wrote:
    > On May 4, 7:22 am, wrote:
    >
    > > On May 4, 12:33 am, "Gabriel Genellina" <>
    > > wrote:

    >
    > > > On Sun, 04 May 2008 01:33:45 -0300, Jetus <> wrote:

    >
    > > > > Is there a good place to look to see where I can find some code that
    > > > > will help me to save a webpage's links to the local drive, after I have
    > > > > used urllib2 to retrieve the page?
    > > > > Many times I have to view these pages when I do not have access to the
    > > > > internet.

    >
    > > > Don't reinvent the wheel and use wget
    > > > http://en.wikipedia.org/wiki/Wget

    >
    > > > --
    > > > Gabriel Genellina

    >
    > > A lot of the functionality is already present.

    >
    > > import urllib
    > > urllib.urlretrieve( 'http://python.org/', 'main.htm' )
    > > from htmllib import HTMLParser
    > > from formatter import NullFormatter
    > > parser= HTMLParser( NullFormatter( ) )
    > > parser.feed( open( 'main.htm' ).read( ) )
    > > import urlparse
    > > for a in parser.anchorlist:
    > >     print urlparse.urljoin( 'http://python.org/', a )

    >
    > > Output snipped:

    >
    > > ...
    > > http://python.org/psf/
    > > http://python.org/dev/
    > > http://python.org/links/
    > > http://python.org/download/releases/2.5.2
    > > http://docs.python.org/
    > > http://python.org/ftp/python/2.5.2/python-2.5.2.msi
    > > ...

    >
    > How can I modify or add to the above code, so that the file references
    > are saved to specified local directories, AND the saved webpage makes
    > reference to the new saved files in the respective directories?
    > Thanks for your help in advance.


    You'd have to convert the URLs in the loop to file-system paths, creating
    the directories as you go with os.makedirs(). You'd also have to replace
    the links inside the saved file; your best bet might be prefixing them
    with localhost and spawning a small bounce-router.
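
    As a rough, untested sketch of that first idea (the helper names are
    mine, the href rewriting is a naive string replacement rather than a
    real HTML rewrite, and it assumes the anchorlist from the snippet
    earlier in the thread):

    import os, urllib, urlparse

    def save_links(base, anchors, destdir='saved'):
        # Fetch each link into destdir, mirroring the URL's path.
        mapping = {}
        for a in anchors:
            url = urlparse.urljoin(base, a)
            path = urlparse.urlparse(url).path.lstrip('/')
            if not path or path.endswith('/'):
                path += 'index.html'  # directory-style URLs need a filename
            local = os.path.join(destdir, path)
            d = os.path.dirname(local)
            if d and not os.path.isdir(d):
                os.makedirs(d)  # create the local directories as we go
            urllib.urlretrieve(url, local)
            mapping[a] = local
        return mapping

    def rewrite_links(htmlfile, mapping):
        # Point the saved page's hrefs at the local copies.
        text = open(htmlfile).read()
        for a, local in mapping.items():
            text = text.replace('href="%s"' % a, 'href="%s"' % local)
        open(htmlfile, 'w').write(text)

    rewrite_links('main.htm', save_links('http://python.org/', parser.anchorlist))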
     
    , May 7, 2008
    #5
  6. Jetus wrote:

    > On May 4, 7:22 am, wrote:
    >> On May 4, 12:33 am, "Gabriel Genellina" <>
    >> wrote:
    >>
    >> > On Sun, 04 May 2008 01:33:45 -0300, Jetus <> wrote:

    >>
    >> > > Is there a good place to look to see where I can find some code that
    >> > > will help me to save a webpage's links to the local drive, after I have
    >> > > used urllib2 to retrieve the page?
    >> > > Many times I have to view these pages when I do not have access to
    >> > > the internet.

    >>
    >> > Don't reinvent the wheel and use wget
    >> > http://en.wikipedia.org/wiki/Wget

    >>
    >> > --
    >> > Gabriel Genellina

    >>
    >> A lot of the functionality is already present.
    >>
    >> import urllib
    >> urllib.urlretrieve( 'http://python.org/', 'main.htm' )
    >> from htmllib import HTMLParser
    >> from formatter import NullFormatter
    >> parser= HTMLParser( NullFormatter( ) )
    >> parser.feed( open( 'main.htm' ).read( ) )
    >> import urlparse
    >> for a in parser.anchorlist:
    >>     print urlparse.urljoin( 'http://python.org/', a )
    >>
    >> Output snipped:
    >>
    >> ...
    >> http://python.org/psf/
    >> http://python.org/dev/
    >> http://python.org/links/
    >> http://python.org/download/releases/2.5.2
    >> http://docs.python.org/
    >> http://python.org/ftp/python/2.5.2/python-2.5.2.msi
    >> ...

    >
    > How can I modify or add to the above code, so that the file references
    > are saved to specified local directories, AND the saved webpage makes
    > reference to the new saved files in the respective directories?
    > Thanks for your help in advance.


    How about you *try* to do so - and if you have actual problems, come
    back and ask for help? Alternatively, there's always guru.com

    Diez
     
    Diez B. Roggisch, May 7, 2008
    #6
  7. Jetus

    Guest

    On May 7, 8:36 am, "Diez B. Roggisch" <> wrote:
    > Jetus wrote:
    > > On May 4, 7:22 am, wrote:
    > >> On May 4, 12:33 am, "Gabriel Genellina" <>
    > >> wrote:

    >
    > >> > On Sun, 04 May 2008 01:33:45 -0300, Jetus <> wrote:

    >
    > >> > > Is there a good place to look to see where I can find some code that
    > >> > > will help me to save a webpage's links to the local drive, after I have
    > >> > > used urllib2 to retrieve the page?
    > >> > > Many times I have to view these pages when I do not have access to
    > >> > > the internet.

    >
    > >> > Don't reinvent the wheel and use wget
    > >> > http://en.wikipedia.org/wiki/Wget

    >
    > >> > --
    > >> > Gabriel Genellina

    >
    > >> A lot of the functionality is already present.

    >
    > >> import urllib
    > >> urllib.urlretrieve( 'http://python.org/', 'main.htm' )
    > >> from htmllib import HTMLParser
    > >> from formatter import NullFormatter
    > >> parser= HTMLParser( NullFormatter( ) )
    > >> parser.feed( open( 'main.htm' ).read( ) )
    > >> import urlparse
    > >> for a in parser.anchorlist:
    > >>     print urlparse.urljoin( 'http://python.org/', a )

    >
    > >> Output snipped:

    >
    > >> ...
    > >> http://python.org/psf/
    > >> http://python.org/dev/
    > >> http://python.org/links/
    > >> http://python.org/download/releases/2.5.2
    > >> http://docs.python.org/
    > >> http://python.org/ftp/python/2.5.2/python-2.5.2.msi
    > >> ...

    >
    > > How can I modify or add to the above code, so that the file references
    > > are saved to specified local directories, AND the saved webpage makes
    > > reference to the new saved files in the respective directories?
    > > Thanks for your help in advance.

    >
    > How about you *try* to do so - and if you have actual problems, come
    > back and ask for help? Alternatively, there's always guru.com
    >
    > Diez


    I've tried, to no avail. How does the open-source plug-in for Python
    look/work? Firefox was able to spawn Python in a toolbar in a distant
    land. Does it still? I believe under the DOM, you'd return a file named
    X that contains a list of changes to make to the page, or put it at the
    top of one, to be removed by Firefox. At that point, X would pretty much
    be the last lexically-sorted file in a pre-established directory. Files
    are really easy to create and add syntax to, if you create a bunch of
    them. Sector size was bouncing, though, which brings that all the way
    up to the file system.

    // pseudocode: swap each matching link's anchor for its local copy
    // (loop bound assumed)
    for ( int docID= 0; docID < doc.links.length; docID++ ) {
        if ( doc.links[ docID ]== pythonfileA.links[ pyID ] ) {
            doc.links[ docID ].anchor= pythonfileB.links[ pyID ];
            pyID++;
        }
    }
     
    , May 8, 2008
    #7
