Web Page Downloader

Discussion in 'HTML' started by Chase Preuninger, May 8, 2008.

  1. I want to write a program that downloads web pages and replaces all
    the relative URLs with absolute ones

    EX. files/banner.jpg gets Replaced by http://www.mysite.com/files/banner.jpg

    Where are the locations in which I would have to look to find a url
    that needs to be replaced?
     
    Chase Preuninger, May 8, 2008
    #1
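As background for the question: relative URLs mostly live in a handful of attributes (`href` on `a`/`link`, `src` on `img`/`script`/`iframe`, `action` on `form`, plus `url(...)` inside CSS). A minimal Python sketch of finding them and resolving them against the page's own URL; the base URL and the sample markup are placeholders, not from the thread:

```python
# Sketch: scan start tags for URL-carrying attributes and resolve each
# value against the page's base URL. Not exhaustive -- CSS url(...)
# values and <style>/<script> contents need separate handling.
from html.parser import HTMLParser
from urllib.parse import urljoin

URL_ATTRS = {"href", "src", "action", "background"}

class URLFinder(HTMLParser):
    def __init__(self, base):
        super().__init__()
        self.base = base
        self.found = []   # (original value, absolute form)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                self.found.append((value, urljoin(self.base, value)))

finder = URLFinder("http://www.mysite.com/index.html")
finder.feed('<img src="files/banner.jpg"><a href="http://other.com/">x</a>')
# urljoin resolves "files/banner.jpg" against the page's own URL and
# leaves already-absolute URLs untouched.
```

`urljoin` handles the cases a plain string prefix gets wrong (root-relative `/x` paths, `../` segments, already-absolute URLs).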

  2. Chase Preuninger

    dorayme Guest

    In article
    <>,
    Chase Preuninger <> wrote:

    > I want to write a program that downloads web pages and replaces all
    > the relative URLs with absolute ones
    >
    > EX. files/banner.jpg gets Replaced by http://www.mysite.com/files/banner.jpg
    >
    > Where are the locations in which I would have to look to find a url
    > that needs to be replaced?


    Good question, I don't know if there is a general answer. I know that I
    can do it often by S & R by targeting any href=" that does not have
    after the " a http://

    --
    dorayme
     
    dorayme, May 8, 2008
    #2
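The S & R idea above can be sketched as a regex with a negative lookahead: match `href` values that do not already start with a scheme, and prefix the site root. The base URL and sample markup are placeholders; note this naive prefixing mishandles root-relative (`/x`) and `../` paths, which is part of why there is no fully general answer:

```python
# Sketch of the search-and-replace: rewrite href values that do not
# already begin with http:// or https://. Base URL is a placeholder.
import re

base = "http://www.mysite.com/"
html = '<a href="files/page.html">x</a> <a href="http://other.com/">y</a>'

fixed = re.sub(r'href="(?!https?://)([^"]*)"',
               lambda m: 'href="%s%s"' % (base, m.group(1)),
               html)
# the relative href gets the prefix; the absolute one is left alone
```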

  3. Chase Preuninger

    dorayme Guest

    In article <>,
    Ed Jay <> wrote:

    > dorayme scribed:
    >
    > >In article
    > ><>,
    > > Chase Preuninger <> wrote:
    > >
    > >> I want to write a program that downloads web pages and replaces all
    > >> the relative URLs with absolute ones
    > >>
    > >> EX. files/banner.jpg gets Replaced by
    > >> http://www.mysite.com/files/banner.jpg
    > >>
    > >> Where are the locations in which I would have to look to find a url
    > >> that needs to be replaced?

    > >
    > >Good question, I don't know if there is a general answer. I know that I
    > >can do it often by S & R by targeting any href=" that does not have
    > >after the " a http://

    >
    > Or do a universal search and replace on files/*.jpg.


    No, the question was more general Ed.

    --
    dorayme
     
    dorayme, May 9, 2008
    #3
  4. Chase Preuninger

    dorayme Guest

    In article <>,
    Ed Jay <> wrote:

    > dorayme scribed:
    >
    > >In article <>,
    > > Ed Jay <> wrote:
    > >
    > >> dorayme scribed:
    > >>
    > >> >In article
    > >> ><>,
    > >> > Chase Preuninger <> wrote:
    > >> >
    > >> >> I want to write a program that downloads web pages and replaces all
    > >> >> the relative URLs with absolute ones
    > >> >>
    > >> >> EX. files/banner.jpg gets Replaced by
    > >> >> http://www.mysite.com/files/banner.jpg
    > >> >>
    > >> >> Where are the locations in which I would have to look to find a url
    > >> >> that needs to be replaced?
    > >> >
    > >> >Good question, I don't know if there is a general answer. I know that I
    > >> >can do it often by S & R by targeting any href=" that does not have
    > >> >after the " a http://
    > >>
    > >> Or do a universal search and replace on files/*.jpg.

    > >
    > >No, the question was more general Ed.

    >
    > Then do a general search and replace? :)


    No, this is not right either because there is no such thing as a general
    this. You have to be specific. Hence the problem. I am not meaning to be
    awkward here Ed, it just comes naturally. <g>

    --
    dorayme
     
    dorayme, May 9, 2008
    #4
  5. Chase Preuninger

    dorayme Guest

    In article <>,
    Ed Jay <> wrote:

    > dorayme scribed:
    >
    > >In article <>,
    > > Ed Jay <> wrote:
    > >
    > >> >> Or do a universal search and replace on files/*.jpg.
    > >> >
    > >> >No, the question was more general Ed.
    > >>
    > >> Then do a general search and replace? :)

    > >
    > >No, this is not right either because there is no such thing as a general
    > >this. You have to be specific. Hence the problem. I am not meaning to be
    > >awkward here Ed, it just comes naturally. <g>

    >
    > You awkwardly missed my smiley. ;-) <----- winky


    But you smoothly and elegantly *noted* my <g> <------- grin ?

    --
    dorayme
     
    dorayme, May 9, 2008
    #5
  6. Chase Preuninger

    viza Guest

    Hi

    On May 8, 11:13 pm, Chase Preuninger <>
    wrote:
    > I want to write a program that downloads web pages and replaces all
    > the relative URLs with absolute ones
    >
    > EX. files/banner.jpg gets Replaced by http://www.mysite.com/files/banner.jpg
    >
    > Where are the locations in which I would have to look to find a url
    > that needs to be replaced?


    You are reinventing the wheel:

    http://www.gnu.org/software/wget/

    Use the -k option without the -p option.

    HTH

    viza
     
    viza, May 9, 2008
    #6
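Spelled out (the URL is a placeholder), the long form of viza's options makes the intent clearer:

```shell
# -k, --convert-links    rewrite links in the downloaded page
# -p, --page-requisites  also fetch images/CSS -- deliberately omitted
#                        here, so references to resources that were not
#                        downloaded are converted to absolute URLs
wget -k http://www.mysite.com/index.html
```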
  7. I was talking about something that downloads a web page so that it
    will still work fine in a browser so that means replacing any
    references to an external resource.
     
    Chase Preuninger, May 9, 2008
    #7
  8. Chase Preuninger

    dorayme Guest

    In article
    <>,
    Chase Preuninger <> wrote:

    > I was talking about something that downloads a web page so that it
    > will still work fine in a browser so that means replacing any
    > references to an external resource.


    To take the first part of what you want, do you mean, will work fine
    offline starting with a cleared cache and continue to work fine offline
    to get all the other pages on the website?

    Different browsers have different abilities to save webpages and sites.
    With old Mac IE you could specify the level of links you wanted
    preserved and it would prepare a file that worked entirely off line to
    the depth wanted. Proprietary MS method. Basically it saves a page
    *with* all the images and other stuff (all the info goes into the
    offline file) and goes to the links on it and does the same at those
    (online) pages and so on to the depth specified. You get the lot and can
    view offline later. It worked *quite* well.

    In Safari, you can save a page but not deeper and it resolves the urls
    on that page so that if you are viewing the offline file, it will get to
    the online links ok provided you are online at the time or have the page
    cached. It also is pretty proprietary-looking.

    Firefox is more straightforward and transparent: you get an HTML file,
    and a folder is created with the images and other resources downloaded
    to your machine for that one page.

    I better stop in case no one is reading me...

    --
    dorayme
     
    dorayme, May 9, 2008
    #8
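The depth-limited save described above (fetch a page, follow its links, repeat to a fixed depth) can be sketched in a few lines. This is not MS's code; the fetch and link-extraction functions are parameters, and the tiny in-memory "site" and its URLs are placeholders for illustration:

```python
# Sketch of a depth-limited crawl: fetch(url) returns the page text,
# links_in(html) yields its raw hrefs, and recursion stops at depth 0.
import re
from urllib.parse import urljoin

def crawl(url, depth, fetch, links_in):
    seen = {}
    def go(u, d):
        if d < 0 or u in seen:
            return
        html = fetch(u)
        seen[u] = html
        for raw in links_in(html):
            go(urljoin(u, raw), d - 1)   # resolve relative links first
    go(url, depth)
    return seen

# Tiny in-memory "site" to demonstrate (placeholder URLs):
site = {
    "http://a/":       '<a href="b.html">b</a>',
    "http://a/b.html": '<a href="c.html">c</a>',
    "http://a/c.html": '',
}
got = crawl("http://a/", 1, site.__getitem__,
            lambda h: re.findall(r'href="([^"]*)"', h))
# depth 1: the start page and its direct links, but not c.html
```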
