saving a webpage's links to the hard disk

Discussion in 'Python' started by Jetus, May 4, 2008.

  1. Jetus

    Jetus Guest

    Is there a good place to find code that will help me save a webpage's
    links to the local drive after I have used urllib2 to retrieve the
    page?
    Many times I have to view these pages when I do not have access to the
    Jetus, May 4, 2008

  2. Don't reinvent the wheel: use wget.
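
As a rough illustration of the wget suggestion, the sketch below builds a wget command line from Python. The helper names `build_wget_command` and `mirror_page`, the example URL, and the `saved` directory are assumptions made for illustration; `--page-requisites`, `--convert-links`, and `--directory-prefix` are standard GNU Wget options.

```python
import subprocess


def build_wget_command(url, dest_dir):
    """Return a wget argument list that saves url plus the files it needs."""
    return [
        "wget",
        "--page-requisites",               # also fetch images, CSS and scripts
        "--convert-links",                 # point saved links at the local copies
        "--directory-prefix=" + dest_dir,  # save everything under dest_dir
        url,
    ]


def mirror_page(url, dest_dir):
    """Run wget; raises CalledProcessError if wget exits with an error."""
    subprocess.run(build_wget_command(url, dest_dir), check=True)


if __name__ == "__main__":
    # Hypothetical URL and directory; only the command is printed here.
    print(" ".join(build_wget_command("http://example.com/", "saved")))
```

Calling `mirror_page` instead of printing would perform the download, assuming wget is installed and on the PATH.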
    Gabriel Genellina, May 4, 2008

  3. castironpi

    castironpi Guest

    A lot of the functionality is already present.

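    import urllib
    # Download the page to a local file (the URL was left empty in the post).
    urllib.urlretrieve( '', 'main.htm' )
    from htmllib import HTMLParser
    from formatter import NullFormatter
    # Parse the saved page; the parser collects every href in anchorlist.
    parser= HTMLParser( NullFormatter( ) )
    parser.feed( open( 'main.htm' ).read( ) )
    import urlparse
    # Resolve each link against the page URL (also left empty) and print it.
    for a in parser.anchorlist:
        print urlparse.urljoin( '', a )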

    Output snipped:
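
The htmllib and urlparse modules used above exist only in Python 2. For readers on Python 3, a minimal sketch of the same idea might look like the following. It assumes the standard `html.parser` and `urllib.parse` modules as replacements; the `LinkCollector` class, the `extract_links` function, and the example.com URLs are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, similar to htmllib's anchorlist."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.anchorlist.append(value)


def extract_links(html_text, base_url):
    """Return absolute URLs for all anchors found in html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return [urljoin(base_url, href) for href in collector.anchorlist]


if __name__ == "__main__":
    # Hypothetical page content and base URL, for illustration only.
    sample = '<a href="a.html">A</a> <a href="/img/b.png">B</a>'
    for link in extract_links(sample, "http://example.com/docs/index.html"):
        print(link)
```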

    castironpi, May 4, 2008
  4. Jetus

    Jetus Guest

    How can I modify or add to the above code, so that the file references
    are saved to specified local directories, AND the saved webpage makes
    reference to the new saved files in the respective directories?
    Thanks for your help in advance.
    Jetus, May 7, 2008
  5. castironpi

    castironpi Guest

    You'd have to convert the filenames in the loop to file system paths;
    try writing them as is with makedirs( ). You'd also have to replace the
    links in the saved file's contents, so your best bet might be prefixing
    them with localhost and spawning a small bounce-router.
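
One possible reading of the "convert to a file system path, then makedirs" step is sketched below in Python 3. The functions `url_to_local_path` and `save_bytes`, the host-then-path directory layout, and the `index.html` default are all assumptions for illustration, not something the thread specifies. Query strings are ignored, so two URLs that differ only in their query would collide.

```python
import os
import tempfile
from urllib.parse import urlparse


def url_to_local_path(url, root):
    """Map a URL to a file path under root, mirroring host and path."""
    parts = urlparse(url)
    path = parts.path
    # Treat a directory-style URL as its index page (an assumed convention).
    if not path or path.endswith("/"):
        path += "index.html"
    # Drop empty, "." and ".." segments so the result stays under root.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    return os.path.join(root, parts.netloc, *segments)


def save_bytes(data, url, root):
    """Write data to the local path for url, creating directories first."""
    target = url_to_local_path(url, root)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as handle:
        handle.write(data)
    return target


if __name__ == "__main__":
    # Hypothetical URL; a temporary directory stands in for the save folder.
    with tempfile.TemporaryDirectory() as root:
        saved = save_bytes(b"<html></html>", "http://example.com/docs/", root)
        print(os.path.relpath(saved, root))
```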
    castironpi, May 7, 2008
  6. How about you *try* to do so - and if you have actual problems, you come
    back and ask for help? Alternatively, there's always

    Diez B. Roggisch, May 7, 2008
  7. castironpi

    castironpi Guest

    I've tried, no avail. How does the open-source plug to Python look/
    work? Firefox was able to spawn Python in a toolbar in a distant
    land. Does it still? I believe under DOM, return a file named X that
    contains a list of changes to make to the page, or put it at the top
    of one, to be removed by Firefox. At that point, X would pretty much
    be the last lexically sorted file in a pre-established directory. Files
    are really easy to create and add syntax to, if you create a bunch of
    them. Sector size was bouncing though, which brings that all the way
    up to file system.

    // Loop bound assumed; the original pseudocode omitted it.
    for( int docID= 0; docID< doc.links.length; docID++ ) {
        if ( doc.links[ docID ]== pythonfileA.links[ pyID ] ) {
            doc.links[ docID ].anchor= pythonfileB.links[ pyID ];
        }
    }
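
The link-swapping idea in the pseudocode can also be done directly on the saved HTML in Python 3, without a browser. The sketch below is one crude way to do it, assuming a dictionary (here called `local_for`, a hypothetical name) that maps each original absolute URL to its local replacement. It uses a regular expression over `href` and `src` attributes rather than a full HTML parser, so unquoted attributes and unusual markup are not handled.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src="..." with single or double quotes.
_LINK_ATTR = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def rewrite_links(html_text, base_url, local_for):
    """Replace each link whose absolute URL is a key of local_for."""

    def swap(match):
        prefix, quote, value = match.groups()
        absolute = urljoin(base_url, value)
        # Links without a local copy are left unchanged.
        replacement = local_for.get(absolute, value)
        return prefix + quote + replacement + quote

    return _LINK_ATTR.sub(swap, html_text)


if __name__ == "__main__":
    # Hypothetical page, base URL and mapping, for illustration only.
    page = '<a href="a.html">A</a> <img src="/img/b.png">'
    mapping = {"http://example.com/docs/a.html": "files/a.html"}
    print(rewrite_links(page, "http://example.com/docs/index.html", mapping))
```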
    castironpi, May 8, 2008