Web Page Downloader

Discussion in 'HTML' started by Chase Preuninger, May 8, 2008.

  1. I want to write a program that downloads web pages and replaces all
    the relative URLs with absolute ones

    EX. files/banner.jpg gets Replaced by http://www.mysite.com/files/banner.jpg

    Where are the locations in which I would have to look to find a url
    that needs to be replaced?
     
    Chase Preuninger, May 8, 2008
    #1
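As background for the question: relative URLs mostly live in a handful of attributes (`href` on `a`/`link`, `src` on `img`/`script`/`iframe`, `action` on `form`, plus `url(...)` inside CSS). A minimal Python sketch of finding them and resolving them against the page's own URL; the base URL and the sample markup are placeholders, not from the thread:

```python
# Sketch: scan start tags for URL-carrying attributes and resolve each
# value against the page's base URL. Not exhaustive -- CSS url(...)
# values and <style>/<script> contents need separate handling.
from html.parser import HTMLParser
from urllib.parse import urljoin

URL_ATTRS = {"href", "src", "action", "background"}

class URLFinder(HTMLParser):
    def __init__(self, base):
        super().__init__()
        self.base = base
        self.found = []   # (original value, absolute form)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                self.found.append((value, urljoin(self.base, value)))

finder = URLFinder("http://www.mysite.com/index.html")
finder.feed('<img src="files/banner.jpg"><a href="http://other.com/">x</a>')
# urljoin resolves "files/banner.jpg" against the page's own URL and
# leaves already-absolute URLs untouched.
```

`urljoin` handles the cases a plain string prefix gets wrong (root-relative `/x` paths, `../` segments, already-absolute URLs).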

  2. Chase Preuninger

    dorayme Guest

    In article
    <>,
    Chase Preuninger <> wrote:

    > I want to write a program that downloads web pages and replaces all
    > the relative URLs with absolute ones
    >
    > EX. files/banner.jpg gets Replaced by http://www.mysite.com/files/banner.jpg
    >
    > Where are the locations in which I would have to look to find a url
    > that needs to be replaced?


    Good question, I don't know if there is a general answer. I know that I
    can do it often by S & R by targeting any href=" that does not have
    after the " a http://

    --
    dorayme
     
    dorayme, May 8, 2008
    #2
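The S & R idea above can be sketched as a regex with a negative lookahead: match `href` values that do not already start with a scheme, and prefix the site root. The base URL and sample markup are placeholders; note this naive prefixing mishandles root-relative (`/x`) and `../` paths, which is part of why there is no fully general answer:

```python
# Sketch of the search-and-replace: rewrite href values that do not
# already begin with http:// or https://. Base URL is a placeholder.
import re

base = "http://www.mysite.com/"
html = '<a href="files/page.html">x</a> <a href="http://other.com/">y</a>'

fixed = re.sub(r'href="(?!https?://)([^"]*)"',
               lambda m: 'href="%s%s"' % (base, m.group(1)),
               html)
# the relative href gets the prefix; the absolute one is left alone
```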

  3. Chase Preuninger

    dorayme Guest

    In article <>,
    Ed Jay <> wrote:

    > dorayme scribed:
    >
    > >In article
    > ><>,
    > > Chase Preuninger <> wrote:
    > >
    > >> I want to write a program that downloads web pages and replaces all
    > >> the relative URLs with absolute ones
    > >>
    > >> EX. files/banner.jpg gets Replaced by
    > >> http://www.mysite.com/files/banner.jpg
    > >>
    > >> Where are the locations in which I would have to look to find a url
    > >> that needs to be replaced?

    > >
    > >Good question, I don't know if there is a general answer. I know that I
    > >can do it often by S & R by targeting any href=" that does not have
    > >after the " a http://

    >
    > Or do a universal search and replace on files/*.jpg.


    No, the question was more general Ed.

    --
    dorayme
     
    dorayme, May 9, 2008
    #3
  4. Chase Preuninger

    dorayme Guest

    In article <>,
    Ed Jay <> wrote:

    > dorayme scribed:
    >
    > >In article <>,
    > > Ed Jay <> wrote:
    > >
    > >> dorayme scribed:
    > >>
    > >> >In article
    > >> ><>,
    > >> > Chase Preuninger <> wrote:
    > >> >
    > >> >> I want to write a program that downloads web pages and replaces all
    > >> >> the relative URLs with absolute ones
    > >> >>
    > >> >> EX. files/banner.jpg gets Replaced by
    > >> >> http://www.mysite.com/files/banner.jpg
    > >> >>
    > >> >> Where are the locations in which I would have to look to find a url
    > >> >> that needs to be replaced?
    > >> >
    > >> >Good question, I don't know if there is a general answer. I know that I
    > >> >can do it often by S & R by targeting any href=" that does not have
    > >> >after the " a http://
    > >>
    > >> Or do a universal search and replace on files/*.jpg.

    > >
    > >No, the question was more general Ed.

    >
    > Then do a general search and replace? :)


    No, this is not right either because there is no such thing as a general
    this. You have to be specific. Hence the problem. I am not meaning to be
    awkward here Ed, it just comes naturally. <g>

    --
    dorayme
     
    dorayme, May 9, 2008
    #4
  5. Chase Preuninger

    dorayme Guest

    In article <>,
    Ed Jay <> wrote:

    > dorayme scribed:
    >
    > >In article <>,
    > > Ed Jay <> wrote:
    > >
    > >> >> Or do a universal search and replace on files/*.jpg.
    > >> >
    > >> >No, the question was more general Ed.
    > >>
    > >> Then do a general search and replace? :)

    > >
    > >No, this is not right either because there is no such thing as a general
    > >this. You have to be specific. Hence the problem. I am not meaning to be
    > >awkward here Ed, it just comes naturally. <g>

    >
    > You awkwardly missed my smiley. ;-) <----- winky


    But you smoothly and elegantly *noted* my <g> <------- grin ?

    --
    dorayme
     
    dorayme, May 9, 2008
    #5
  6. Chase Preuninger

    viza Guest

    Hi

    On May 8, 11:13 pm, Chase Preuninger <>
    wrote:
    > I want to write a program that downloads web pages and replaces all
    > the relative URLs with absolute ones
    >
    > EX. files/banner.jpg gets Replaced by http://www.mysite.com/files/banner.jpg
    >
    > Where are the locations in which I would have to look to find a url
    > that needs to be replaced?


    You are reinventing the wheel:

    http://www.gnu.org/software/wget/

    Use the -k option without the -p option.

    HTH

    viza
     
    viza, May 9, 2008
    #6
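Spelled out (the URL is a placeholder), the long form of viza's options makes the intent clearer:

```shell
# -k, --convert-links    rewrite links in the downloaded page
# -p, --page-requisites  also fetch images/CSS -- deliberately omitted
#                        here, so references to resources that were not
#                        downloaded are converted to absolute URLs
wget -k http://www.mysite.com/index.html
```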
  7. I was talking about something that downloads a web page so that it
    will still work fine in a browser so that means replacing any
    references to an external resource.
     
    Chase Preuninger, May 9, 2008
    #7
  8. Chase Preuninger

    dorayme Guest

    In article
    <>,
    Chase Preuninger <> wrote:

    > I was talking about something that downloads a web page so that it
    > will still work fine in a browser so that means replacing any
    > references to an external resource.


    To take the first part of what you want, do you mean, will work fine
    offline starting with a cleared cache and continue to work fine offline
    to get all the other pages on the website?

    Different browsers have different abilities to save webpages and sites.
    With old Mac IE you could specify the level of links you wanted
    preserved and it would prepare a file that worked entirely off line to
    the depth wanted. Proprietary MS method. Basically it saves a page
    *with* all the images and other stuff (all the info goes into the
    offline file) and goes to the links on it and does the same at those
    (online) pages and so on to the depth specified. You get the lot and can
    view offline later. It worked *quite* well.

    In Safari, you can save a page but not deeper and it resolves the urls
    on that page so that if you are viewing the offline file, it will get to
    the online links ok provided you are online at the time or have the page
    cached. It also is pretty proprietary-looking.

    Firefox is more straightforward and transparent: you get an HTML file,
    and a folder is created with the images and other resources downloaded
    to your machine for that one page.

    I better stop in case no one is reading me...

    --
    dorayme
     
    dorayme, May 9, 2008
    #8
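The depth-limited save described above (fetch a page, follow its links, repeat to a fixed depth) can be sketched in a few lines. This is not MS's code; the fetch and link-extraction functions are parameters, and the tiny in-memory "site" and its URLs are placeholders for illustration:

```python
# Sketch of a depth-limited crawl: fetch(url) returns the page text,
# links_in(html) yields its raw hrefs, and recursion stops at depth 0.
import re
from urllib.parse import urljoin

def crawl(url, depth, fetch, links_in):
    seen = {}
    def go(u, d):
        if d < 0 or u in seen:
            return
        html = fetch(u)
        seen[u] = html
        for raw in links_in(html):
            go(urljoin(u, raw), d - 1)   # resolve relative links first
    go(url, depth)
    return seen

# Tiny in-memory "site" to demonstrate (placeholder URLs):
site = {
    "http://a/":       '<a href="b.html">b</a>',
    "http://a/b.html": '<a href="c.html">c</a>',
    "http://a/c.html": '',
}
got = crawl("http://a/", 1, site.__getitem__,
            lambda h: re.findall(r'href="([^"]*)"', h))
# depth 1: the start page and its direct links, but not c.html
```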
