How to limit the number of web pages downloaded from a site?

Discussion in 'HTML' started by Nad, Aug 8, 2008.

  1. Nad

    Nad Guest

    I have a very large site with valuable information.
    Is there any way to prevent people from downloading a large
    number of articles? Some people want to download the entire site.

    Any hints or pointers would be appreciated.
     
    Nad, Aug 8, 2008
    #1

  2. dorayme

    dorayme Guest

    Password-protect folders or pages and make users register to get the
    passwords; that would slow them down a bit. But really, if you make
    stuff available publicly...
     
    dorayme, Aug 9, 2008
    #2
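
    For what it's worth, the folder password dorayme suggests is a few
    lines of Apache configuration plus an htpasswd file, assuming the
    host runs Apache and permits .htaccess overrides (an assumption;
    several of the hosts discussed later in this thread would not).
    The paths are illustrative:

        # .htaccess inside the protected folder
        AuthType Basic
        AuthName "Registered readers only"
        # illustrative path; create the file with:
        #   htpasswd -c /home/example/.htpasswd someuser
        AuthUserFile /home/example/.htpasswd
        Require valid-user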

  3. Adrienne Boswell

    Adrienne Boswell Guest

    You could store their IP address in a session and check the length
    of time between requests.
     
    Adrienne Boswell, Aug 9, 2008
    #3
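
    A minimal sketch of the throttle Adrienne describes, written here as
    a Python CGI script. It assumes the host allows CGI at all (later
    posts note many don't); the scratch directory and the 2-second limit
    are illustrative, not from the thread:

        #!/usr/bin/env python3
        # Remember when each client IP last fetched a page and refuse
        # requests that arrive too quickly.
        import os
        import time

        STATE_DIR = "/tmp/throttle"   # hypothetical scratch directory
        MIN_INTERVAL = 2.0            # seconds required between requests

        def too_fast(ip):
            """True if this IP fetched a page less than MIN_INTERVAL
            seconds ago; also records the current request."""
            os.makedirs(STATE_DIR, exist_ok=True)
            stamp = os.path.join(STATE_DIR, ip.replace(":", "_"))
            try:
                last = os.path.getmtime(stamp)
            except OSError:
                last = 0.0
            # rewriting the stamp file sets its mtime to "now"
            with open(stamp, "w"):
                pass
            return time.time() - last < MIN_INTERVAL

        # CGI entry point: the web server sets REMOTE_ADDR for us
        ip = os.environ.get("REMOTE_ADDR", "unknown")
        if too_fast(ip):
            print("Status: 429 Too Many Requests")
            print("Content-Type: text/plain\n")
            print("Slow down, please.")
        else:
            print("Content-Type: text/html\n")
            print("<p>...the article would be served here...</p>")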
  4. Neredbojias

    Neredbojias Guest

    Change the articles' text to Olde Englishe.
     
    Neredbojias, Aug 9, 2008
    #4
  5. Nad

    Nad Guest

    Well, something along those lines.
    The problem is server-side support.
    Some servers do not allow CGI, PHP, JavaScript, or even SSI
    executable commands, and I'd like it to work on ANY server.
     
    Nad, Aug 9, 2008
    #5
  6. Nad

    Nad Guest

    :--}

    I like that!!!
     
    Nad, Aug 9, 2008
    #6
  7. Lars Eighner

    Lars Eighner Guest

    It depends upon what you mean by 'articles.' If you put HTML documents on a
    web server, you are pretty much inviting the public to view/download as much
    of it as they want. If it is 'valuable', why are you giving it away? And
    if you are giving away valuable stuff, what did you expect? What is your
    real concern here?

    If you are only worried about server load, why not zip (or tar and gzip) it
    up and put it on an FTP server? This is most practical for related documents,
    such as parts of a tutorial or parts of a spec. If you are a philanthropist
    who is giving away valuable stuff, you can give it away in big chunks so
    the nickel-and-dime requests don't bug you.

    Well-behaved download-the-whole-site spiders will obey robots.txt, but that
    is pretty much a courtesy thing, and it won't stop anyone who is manually
    downloading a page at a time, and it won't stop rogue or altered spiders.
    Likewise, you can block nice spiders that send a true user-agent ID, but
    not-so-nice spiders can spoof their ID. That's kind of pointless, because
    most of the nice spiders will obey robots.txt anyway.

    You can make pages available through PHP or CGI scripts that keep track of
    the number of documents served via hidden controls. This is easily defeated
    by anyone determined to do so and, like a cheap lock, will only keep the
    honest people out. Beyond that, you can go to various user account schemes up to
    putting your documents on a secure server.

    But I think what you are asking is 'Can I keep my documents public and still
    limit public access?' And the answer to that is: of course not, because
    there is a fundamental contradiction in what you want.
     
    Lars Eighner, Aug 9, 2008
    #7
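
    For reference, the courtesy mechanism Lars mentions is a robots.txt
    file at the site root. A well-behaved mirroring spider honours
    something like the following, and a rogue one ignores it completely.
    The path is illustrative, and Crawl-delay is a common but
    non-standard extension:

        User-agent: *
        Disallow: /articles/
        # honoured by some spiders, ignored by others
        Crawl-delay: 10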
  8. richard

    richard Guest


    If I go to your website and view every one of your pages, guess what?
    I've already downloaded them onto my machine.

    To help stop downloading, you could put on your precious pages a
    simple coded text block, so that they have to type in a code to get
    the page.
     
    richard, Aug 9, 2008
    #8
  9. Neredbojias

    Neredbojias Guest

    <grin>

    Seriously, I don't think there's much you can do that is practical. With
    server-side support you could implement some kind of time limit and/or
    password, but you indicated you didn't want to rely on that. An
    off-the-wall "non-solution" would be to use reasonably long meta page
    redirects, but the user could always come back with a new time limit.
     
    Neredbojias, Aug 9, 2008
    #9
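
    The off-the-wall "non-solution" Neredbojias describes would be one
    meta refresh in the head of each article; after the timeout the
    reader is bounced off the page. The interval and destination here
    are illustrative, and as he says, the user can simply come back:

        <meta http-equiv="refresh" content="600; url=/session-over.html">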
  10. Nad

    Nad Guest

    It doesn't work. For example, Teleport Pro (a program that downloads
    entire sites) lets you specify a login/password.
    So, once they register, they can enter this info and boom...
    Well, the site is 150 megs, over 20k articles.
    And there are plenty of people who would LOVE to have
    the entire site on their own box.
    Then you have a problem. Providers usually charge for
    the amount of traffic. In one month, you'd have to shell
    out some bux just to give the information to the
    "gimme free Coke" zombies.
    That does not make sense.
     
    Nad, Aug 9, 2008
    #10
  11. Nad

    Nad Guest

    Something along those lines.
    I was thinking of detecting automated downloads.
    When people simply look at the information manually,
    it is all fine and dandy.
    But when they start running a program to download hundreds
    if not thousands of articles, that is another issue.

    But there are issues with this.
    Some web hosting vendors do not allow executables
    to run, as they are a security risk. They may not allow
    CGI, PHP, or even the executable commands of SSI and JavaScript.

    Now, in order to detect a page access by the client,
    you either add an SSI include statement that calls CGI
    or JavaScript, or have your pages dynamically assembled
    with PHP. But what if you cannot even run those because
    the provider does not allow it?

    Sure, you can shell out some serious bux to get yourself
    a premier hosting facility where you can have your own
    virtual domain. But if you have spent years developing tools
    to automatically build a 20k+ article site, and are
    willing to give the information away for free, then financing
    a top-notch provider on top of it is not something
    that excites my imagination.

    There is an excellent provider - by.ru. They are free.
    They are huge: several hundred thousand sites and free
    email users. But they do not allow ANY executables to
    run. They even disable the ssi executable statements.

    So, what do you do in that case?
     
    Nad, Aug 9, 2008
    #11
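
    For context, the SSI hook described above is a single directive
    embedded in each page; when the host allows it, the server runs the
    script every time the page is served (the script name here is
    hypothetical):

        <!--#include virtual="/cgi-bin/log-hit.cgi" -->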
  12. Nad

    Nad Guest

    Downloading the entire 150+ meg site, which translates into
    all sorts of things.

    That doesn't work. Some random user may come and download the
    entire site. By the time you put him into robots.txt, it is too late.

    That is not a problem. They can manually download as much as they want.
    But no automated downloads.

    Well, no account schemes, no user verification, no limits beyond
    trying to automatically download the entire site, pretty much.

    Not really. AUTOMATED download.

    I do not see it at the moment. Can you expand on that?
     
    Nad, Aug 9, 2008
    #12
  13. Nad

    Nad Guest

    No problem. You did it manually, and you just viewed the information,
    which is exactly what this site is for.

    Nope. That's a hassle for the user.
    They should be able to move freely around the site
    without any passwords, log-ins or any other crap.
    But they should not be able to download the whole thing.
    That is all.
     
    Nad, Aug 9, 2008
    #13
  14. dorayme

    dorayme Guest

    I understand your concerns and it is natural to worry a bit. But
    consider again.

    That there is Teleport Pro does not actually show that my suggestion
    would not work. Perhaps you are looking at worst-case possibilities
    at every stage. It would limit it to people who knew about this program
    or were prepared to get it. That is one thing. The other thing is that
    granting passwords might be conditional on them agreeing not to do what
    you fear. Is your site a serious site liable to attract serious people?
    You might be surprised how decent most people are if you make things
    clear.

    How sure are you of the likelihood of a whole bunch of people wanting to
    download the whole lot? Most people are wary of overexposing themselves
    to information and will get what they are interested in. So I guess you
    need to do some guessing and some analysis. Perhaps you are worrying
    excessively?

    Presumably you would be hoping your site is used and is useful. If a
    bunch of folk download a small bunch of articles each, this might well
    be the biggest factor, rather than a few who download the lot. You would
    have to make some projections concerning this; you would be in the best
    position to crunch the numbers, as it is your field. If you are more
    successful than you imagine via people doing reasonable things rather
    than unreasonable things, you perhaps ought to be preparing yourself for
    the possibility of serious server charges. I understand your concern to
    limit things, but a huge site carves out a certain territory and you may
    need to consider charging for access.

    The other suggestion I might make is that you provide for the odd
    possibility of some people wanting the lot by offering compressed
    archives and using a server other than your own; there might be some
    free or cheap servers for this express purpose.
     
    dorayme, Aug 9, 2008
    #14
  19. Nad

    Nad Guest

    Well, Google does it. Sure, it is a slightly different setup,
    but they limit the number of queries to 100.
    A time limit on high bandwidth does not work.
    Could you expand on that idea?
     
    Nad, Aug 9, 2008
    #19
  20. Nad

    Nad Guest

    Not really. There are quite a few programs out there.
    Easiest thing in the world to find.
    Do a search on Teleport Pro. A VERY nice program.

    Well, if you use that site, your salary could go up quite a bit
    just in a few months. Is that "serious"?
    :--}

    I wish I had your optimism. But I have seen plenty of evidence
    otherwise.

    It's been done already.

    Who knows. But there is an issue here. I have no doubts about that much.

    That is not a problem.
    But those few count a hundred times more than fair users.

    That is what we are doing here.

    Yep. For about $100/mo I can have my own virtual domain
    on a tier 1 network. But that bites. There is no income from
    this enterprise. Nobody is going to give you a dime for getting
    something. That much I have seen.

    Nope. That is not reasonable. It should be totally free.
    No registration, no charges of any kind.
    Just come and look at anything you want. But don't be a bastard.

    Thanks for the feedback.
     
    Nad, Aug 9, 2008
    #20
