How to limit the number of web pages downloaded from a site?

Discussion in 'Java' started by Nad, Aug 8, 2008.

  1. Nad

    Nad Guest

    I have a very large site with valuable information.
    Is there any way to limit the number of articles that can
    be downloaded from the site?

    What are the options?

    Any hints or pointers would be appreciated.

    --
    The most powerful Usenet tool you have ever heard of.

    NewsMaestro v. 4.0.8 has been released.

    * Several nice improvements and bug fixes.

    Note: In some previous releases some class files were missing.
    As a result, the program would not run.
    Sorry for the inconvenience.

    Web page:
    http://newsmaestro.sourceforge.net/

    Download page:
    http://newsmaestro.sourceforge.net/Download_Information.htm

    Send any feedback, ideas, suggestions, test results to
    newsmaestroinfo \at/ mail.ru.

    Your personal info will not be released and your privacy
    will be honored.
     
    Nad, Aug 8, 2008
    #1

  2. Nad wrote:
    > I have a very large site with valuable information.
    > Is there any way to limit the number of articles that can
    > be downloaded from the site?


    Don't put them up.

    > What are the options?


    Disable your web server? Don't put up so many 'valuable' articles?

    > Any hints or pointers would be appreciated.


    What are you really trying to do?

    --

    Knute Johnson
    email s/nospam/knute2008/

    --
    Posted via NewsDemon.com - Premium Uncensored Newsgroup Service
    ------->>>>>>http://www.NewsDemon.com<<<<<<------
    Unlimited Access, Anonymous Accounts, Uncensored Broadband Access
     
    Knute Johnson, Aug 9, 2008
    #2

  3. Nad

    Nad Guest

    In article <489cd1d8$0$4028$>, Knute Johnson <> wrote:
    >Nad wrote:
    >> I have a very large site with valuable information.
    >> Is there any way to limit the number of articles that can
    >> be downloaded from the site?

    >
    >Don't put them up.


    Well, that's not quite an option.

    >> What are the options?


    >Disable your web server?


    Think harder.

    >Don't put up so many 'valuable' articles?


    Why?

    >> Any hints or pointers would be appreciated.


    >What are you really trying to do?


    Trying to prevent downloading the entire 150 meg sized site.
    Simple as that.
     
    Nad, Aug 9, 2008
    #3
  4. Nad

    Nad Guest

    In article <g7ikhn$bnb$>, (Nad) wrote:
    >In article <489cd1d8$0$4028$>, Knute Johnson
    > <> wrote:
    >>Nad wrote:
    >>> I have a very large site with valuable information.
    >>> Is there any way to limit the number of articles that can
    >>> be downloaded from the site?

    >>
    >>Don't put them up.

    >
    >Well, that's not quite an option.
    >
    >>> What are the options?

    >
    >>Disable your web server?

    >
    >Think harder.
    >
    >>Don't put up so many 'valuable' articles?

    >
    >Why?


    Don't you think that what YOU have to say
    is valuable information?

    :--}

    >>> Any hints or pointers would be appreciated.

    >
    >>What are you really trying to do?

    >
    >Trying to prevent downloading the entire 150 meg sized site.
    >Simple as that.
    >
     
    Nad, Aug 9, 2008
    #4
  5. Nad

    Nad Guest

    In article <g7ilpv$gd0$>, (Nad) wrote:
    >In article <g7ikhn$bnb$>, (Nad) wrote:
    >>In article <489cd1d8$0$4028$>, Knute Johnson
    >> <> wrote:
    >>>Nad wrote:
    >>>> I have a very large site with valuable information.
    >>>> Is there any way to limit the number of articles that can
    >>>> be downloaded from the site?
    >>>
    >>>Don't put them up.

    >>
    >>Well, that's not quite an option.
    >>
    >>>> What are the options?

    >>
    >>>Disable your web server?

    >>
    >>Think harder.
    >>
    >>>Don't put up so many 'valuable' articles?

    >>
    >>Why?

    >
    >Don't you think that what YOU have to say
    >is valuable information?


    >:--}


    Here is the list of Java "experts" on this group.
    Don't you think what they have to say is valuable information?
    Btw, you can add or comment out any of them.
    We'll look at that and possibly adjust the list.
    Have fun.

    ;
    ; Comments are OK
    ;
    // This is a comment, it will be ignored
    ; and so will the blank lines

    ; Anyone from Sun talking on Java is good enough of an expert
    @sun.com

    ; Well, if we are talking about the database issues, then ...
    ; Oracle.com
    @oracle.com

    ; Da main priest on comp.lang.java.programmer
    ; He's been posting for slightly more than a year to cljp,
    ; but he knows plenty, that's fer sure
    ; Lew <>
    Lew

    ; He is da priest number number 2
    ; "Andrew Thompson" <>
    Andrew Thompson

    ; Eric Sosman is java expert from Sun
    ; Eric Sosman <>
    ; Is that true? :--}
    ; Otherwise, we'll have to take him out from global library :--}
    Eric Sosman

    ; She's quite good, no kwestions abouts its
    ; Patricia Shanahan <>
    Patricia Shanahan

    ; Da priest number 3, well somewhere in that range
    ; "Daniel Pitts" <>
    Daniel Pitts

    ; One of my favorites. Smart dude and not obnoxious
    ; as other priests. Actually, he isn't a priest.
    ; Piotr Kobzda <>
    Piotr Kobzda

    ; Thomas Hawtin <>
    Thomas Hawtin

    ; John Ersatznom <>
    John Ersatznom

    ; Brandon McCombs <>
    Brandon McCombs

    ; Arne Vajhøj <>
    Arne Vajhøj

    ; "Oliver Wong" <>
    Oliver Wong

    ; Shawn is clueless newbie
    ;Shawn <>

    ; Chris Uppal

    ;"Jeff Higgins" <>
    Jeff Higgins

    ; "Karl Uppiano" <>
    Karl Uppiano

    ; Joshua Cranmer <>
    Joshua Cranmer

    ; Knute Johnson <>
    Knute Johnson

    ; Robert Klemme <>
    Robert Klemme

    ; Tom Forsmo <>
    Tom Forsmo

    ; Nigel Wade <>
    Nigel Wade

    ; Twisted <>
    Twisted

    ; Manivannan Palanichamy <>
    Manivannan Palanichamy

    ; "Mike Schilling" <>
    "Mike Schilling"

    ; Owen Jacobson <>
    Owen Jacobson

    ; Wojtek <>
    ; Wojtek

    ; Expert on regular expressions at least
    ; Lars Enderin <>
    Lars Enderin

    ; Expert on regular expressions at least
    ;Jussi Piitulainen <>
    Jussi Piitulainen

    ; "shweta" <>
    shweta

    ; Tom McGlynn
    ;

    ; Tom Anderson <>
    Tom Anderson

    ; Alexey <>
    inline_four

    ; "Ted Hopp" <>
    Ted Hopp

    ; "Kenneth P. Turvey" <>
    Kenneth P. Turvey

    >>>> Any hints or pointers would be appreciated.

    >>
    >>>What are you really trying to do?

    >>
    >>Trying to prevent downloading the entire 150 meg sized site.
    >>Simple as that.
    >>
     
    Nad, Aug 9, 2008
    #5
  6. Nad

    Jeff Higgins Guest

    Nad wrote:
    >
    > Any hints or pointers would be appreciated.
    >


    <http://sourceforge.net/project/stats/?group_id=218169&ugn=newsmaestro>
     
    Jeff Higgins, Aug 9, 2008
    #6
  7. Nad

    Jeff Higgins Guest

    Nad wrote:
    >I have a very large site with valuable information.
    > Is there any way to limit the number of articles that can
    > be downloaded from the site?
    >


    <http://sourceforge.net/project/stats/?group_id=218169&ugn=newsmaestro>

    Whatever you are doing now seems to be working well.
     
    Jeff Higgins, Aug 9, 2008
    #7
  8. Nad

    Cork Soaker Guest

    Nad wrote:
    > I have a very large site with valuable information.
    > Is there any way to limit the number of articles that can
    > be downloaded from the site?
    >
    > What are the options?


    You're an idiot.
    Everyone agrees.
     
    Cork Soaker, Aug 9, 2008
    #8
  9. Nad

    Roedy Green Guest

    On Fri, 08 Aug 2008 22:45:42 GMT, (Nad) wrote, quoted
    or indirectly quoted someone who said :

    >I have a very large site with valuable information.
    >Is there any way to limit the number of articles that can
    >be downloaded from the site?
    >
    >What are the options?
    >
    >Any hints or pointers would be appreciated.


    1. cookie. This will defeat only casual browsers. Even a newbie could
    delete his cookies.

    2. IP. The problem with this is everyone coming from some large
    corporation is going to talk to you with the same IP. You will treat
    them all as identical.

    3. Make people register and get an account/password/certificate.
    Someone trying to defeat you could register many times under different
    throw-away email addresses.

    4. Make people pay a nominal sum to register, or to fill the tank with
    gas.

    I think the Internet should have a price on each page that is
    automatically collected by the system itself. It could have ad-free
    for pay and ad-subsidised versions. The page could be cached by the
    system all over the net for months at a time.
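    Option 1 from the list above, identifying a visitor by cookie, could be sketched in plain Java roughly like this: the server tags each new visitor with an opaque random token via a Set-Cookie header, and later requests carry the token back so they can be tallied per visitor. The cookie name and token format here are assumptions for illustration, not anything from the thread.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Sketch of cookie-based visitor tagging. A real server would attach
// this header to the first response and read the cookie back on later
// requests to identify the visitor.
public class VisitorCookie {
    private static final SecureRandom RNG = new SecureRandom();

    // Generate an unguessable visitor token (128 random bits, hex-encoded).
    static String newToken() {
        return new BigInteger(128, RNG).toString(16);
    }

    // Build the Set-Cookie header value a server would send.
    static String setCookieHeader(String token) {
        return "visitor=" + token + "; Path=/; HttpOnly";
    }

    public static void main(String[] args) {
        System.out.println(setCookieHeader(newToken()));
    }
}
```

    As the post notes, this only identifies casual visitors: anyone can delete the cookie and appear as a new visitor.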
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
     
    Roedy Green, Aug 9, 2008
    #9
  10. Nad

    Roedy Green Guest

    On Sat, 09 Aug 2008 01:23:28 +0100, Cork Soaker
    <> wrote, quoted or indirectly quoted someone
    who said :

    >You're an idiot.
    >Everyone agrees.


    I don't agree. I don't know what the data is. For example, if he has
    a real estate website, he does not want a competitor screenscraping
    his entire database.
     
    Roedy Green, Aug 9, 2008
    #10
  11. Nad

    Arne Vajhøj Guest

    Roedy Green wrote:
    > On Sat, 09 Aug 2008 01:23:28 +0100, Cork Soaker
    > <> wrote, quoted or indirectly quoted someone
    > who said :
    >> You're an idiot.
    >> Everyone agrees.

    >
    > I don't agree. I don't know what the data is. For example, if he has
    > a real estate website, he does not want a competitor screenscraping
    > his entire database.


    Did you notice his signature ?

    Arne
     
    Arne Vajhøj, Aug 9, 2008
    #11
  12. Nad

    Arne Vajhøj Guest

    Arne Vajhøj wrote:
    > Roedy Green wrote:
    >> On Sat, 09 Aug 2008 01:23:28 +0100, Cork Soaker
    >> <> wrote, quoted or indirectly quoted someone
    >> who said :
    >>> You're an idiot.
    >>> Everyone agrees.

    >>
    >> I don't agree. I don't know what the data is. For example, if he has
    >> a real estate website, he does not want a competitor screenscraping
    >> his entire database.

    >
    > Did you notice his signature ?


    Not Cork's but Nad's/NewsMaestro's !

    Arne
     
    Arne Vajhøj, Aug 9, 2008
    #12
  13. Nad

    Nad Guest

    In article <>, Roedy Green
    <> wrote:
    >On Fri, 08 Aug 2008 22:45:42 GMT, (Nad) wrote, quoted
    >or indirectly quoted someone who said :
    >
    >>I have a very large site with valuable information.
    >>Is there any way to limit the number of articles that can
    >>be downloaded from the site?
    >>
    >>What are the options?
    >>
    >>Any hints or pointers would be appreciated.

    >
    >1. cookie. This will defeat only casual browsers. Even a newbie could
    >delete his cookies.


    Well, when someone wants to download your site,
    it is not clear how a cookie would help, because it is all done
    in one session.

    >2. IP. The problem with this is everyone coming from some large
    >corporation is going to talk to you with the same IP. You will treat
    >them all as identical.


    Well, I was thinking more along the lines of counting
    the number of page requests. If that count exceeds some number
    in ONE session, then put up a page saying sorry, you want
    too much. It is one thing when people want to see some information.
    But it is a different thing when they want to suck it dry.

    Plus there is the bandwidth issue. Not many providers like
    having their bandwidth sucked up. That's why some of them charge
    by the amount of traffic.
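    A minimal plain-Java version of that counting idea (no servlet API; the class name and the 100-page limit are assumptions) might look like this: keep a count per session key, which could be a cookie token or an IP, and refuse once the threshold is crossed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: count page requests per session key and refuse once a
// threshold is crossed. The limit of 100 pages is an arbitrary guess.
public class PageCounter {
    private static final int MAX_PAGES_PER_SESSION = 100;
    private final ConcurrentHashMap<String, AtomicInteger> counts =
            new ConcurrentHashMap<>();

    // Returns true if the request should be served,
    // false for the "sorry, you want too much" page.
    public boolean allow(String sessionKey) {
        int n = counts.computeIfAbsent(sessionKey, k -> new AtomicInteger())
                      .incrementAndGet();
        return n <= MAX_PAGES_PER_SESSION;
    }

    public static void main(String[] args) {
        PageCounter pc = new PageCounter();
        for (int i = 0; i < 101; i++) pc.allow("203.0.113.9");
        System.out.println(pc.allow("203.0.113.9")); // false: over the limit
    }
}
```

    In a real deployment the map would also need to expire old entries, or the counts never reset.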

    >3. Make people register and get an account/password/certificate.
    >Someone trying to defeat you could register many times under different
    >throw-away email addresses.


    >4. Make people pay a nominal sum to register, or to fill the tank with
    >gas.


    >I think the Internet should have a price on each page that is
    >automatically collected by the system itself. It could have ad-free
    >for pay and ad-subsidised versions. The page could be cached by the
    >system all over the net for months at a time.


    Well, at this moment, this is not doable. Unless things are
    automatic and people register once, and from then on can access
    anything on the Internet they want, it is impractical.
    Whenever I see a site that wants me to register, I usually just
    go back. First of all, the registration procedure is a pain
    in the neck. You have to fill in all sorts of fields, disclose
    your personal information, and you name it. Your email address
    could be used to send you tons of spam, for one thing. How do you
    know what's on their mind? Not many sites give you an opt-out
    option in each spam email they send you, and you may have to
    spend time either trying to contact them with a request,
    which is a waste of time, or create another rule in your
    firewall to block their address or their entire domain...
     
    Nad, Aug 9, 2008
    #13
  14. Nad wrote:
    > In article <489cd1d8$0$4028$>, Knute Johnson <> wrote:
    >> Nad wrote:
    >>> I have a very large site with valuable information.
    >>> Is there any way to limit the number of articles that can
    >>> be downloaded from the site?

    >> Don't put them up.

    >
    > Well, that's not quite an option.
    >
    >>> What are the options?

    >
    >> Disable your web server?

    >
    > Think harder.
    >
    >> Don't put up so many 'valuable' articles?

    >
    > Why?
    >
    >>> Any hints or pointers would be appreciated.

    >
    >> What are you really trying to do?

    >
    > Trying to prevent downloading the entire 150 meg sized site.
    > Simple as that.
    >


    Then write a throttle for your site, each IP gets only so many bytes and
    then it is cut off.

    I'm still not clear what you are doing that you don't want people to see
    your data yet you want people to see your data?
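    One rough way to build such a throttle in plain Java: keep a running byte total per IP address and stop serving once a budget is spent. The 10 MB budget and the class name are assumptions; the thread's 150 MB site size only suggests the limit should sit well below the whole site.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the per-IP byte throttle suggested above: each IP gets a
// byte budget, and once it is spent the connection is refused.
public class ByteThrottle {
    private static final long BUDGET_BYTES = 10L * 1024 * 1024; // assumed limit
    private final ConcurrentHashMap<String, AtomicLong> used =
            new ConcurrentHashMap<>();

    // Record that n bytes went to ip; returns false once the budget is spent.
    public boolean charge(String ip, long n) {
        long total = used.computeIfAbsent(ip, k -> new AtomicLong()).addAndGet(n);
        return total <= BUDGET_BYTES;
    }

    public static void main(String[] args) {
        ByteThrottle th = new ByteThrottle();
        System.out.println(th.charge("198.51.100.7", 11L * 1024 * 1024)); // false: over budget
    }
}
```

    As noted earlier in the thread, everyone behind a corporate proxy shares one IP, so a budget this tight would punish them collectively.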

    --

    Knute Johnson
    email s/nospam/knute2008/

    --
    Posted via NewsDemon.com - Premium Uncensored Newsgroup Service
    ------->>>>>>http://www.NewsDemon.com<<<<<<------
    Unlimited Access, Anonymous Accounts, Uncensored Broadband Access
     
    Knute Johnson, Aug 9, 2008
    #14
  15. Nad

    Nad Guest

    In article <>, Cork Soaker
    <> wrote:
    >Nad wrote:
    >> I have a very large site with valuable information.
    >> Is there any way to limit the number of articles that can
    >> be downloaded from the site?
    >>
    >> What are the options?

    >
    >You're an idiot.
    >Everyone agrees.


    Get lost, funky ass.
    I could give a dead flying chicken about what "everyone",
    "agrees" or does not "agree".
    You, lil funkay chickan, keep following tails of other
    rats, running at forever maddening speed,
    faster, and faster and faster.
    Little did you know, you are running toward da abyss.
    Sure, it feels "nice" to be in the middle of the herd.
    Because when wolves and lions come,
    you think you have a better chance to "survive".
    But you don't even know to "survive" for WHAT?
    Ever thought about this?

    Now, can you dig it, suxy?
     
    Nad, Aug 9, 2008
    #15
  16. Nad

    Nad Guest

    In article <489d06ec$0$4048$>, Knute Johnson
    <> wrote:
    >Nad wrote:
    >> In article <489cd1d8$0$4028$>, Knute Johnson

    > <> wrote:
    >>> Nad wrote:
    >>>> I have a very large site with valuable information.
    >>>> Is there any way to limit the number of articles that can
    >>>> be downloaded from the site?
    >>> Don't put them up.

    >>
    >> Well, that's not quite an option.
    >>
    >>>> What are the options?

    >>
    >>> Disable your web server?

    >>
    >> Think harder.
    >>
    >>> Don't put up so many 'valuable' articles?

    >>
    >> Why?
    >>
    >>>> Any hints or pointers would be appreciated.

    >>
    >>> What are you really trying to do?

    >>
    >> Trying to prevent downloading the entire 150 meg sized site.
    >> Simple as that.
    >>

    >
    >Then write a throttle for your site, each IP gets only so many bytes and
    >then it is cut off.


    And how exactly do you do that?
    With what tools, languages or scripts?
     
    Nad, Aug 9, 2008
    #16
  17. Nad

    Nad Guest

    In article <489cde91$0$4016$>, "Jeff Higgins"
    <> wrote:
    >
    >Nad wrote:
    >>I have a very large site with valuable information.
    >> Is there any way to limit the number of articles that can
    >> be downloaded from the site?
    >>

    >
    ><http://sourceforge.net/project/stats/?group_id=218169&ugn=newsmaestro>
    >
    >Whatever you are doing now seems to be working well.


    Sure it does.
    Da Diamond Sword...

    Are you having fun yet?
    Otherwise it is kinda boring with all these priests,
    flying up there in the clouds, thinking they are here
    to fix everyone and put them "right".
    Sickos.

    Btw, your articles do not appear on some servers.
     
    Nad, Aug 9, 2008
    #17
  18. Nad

    Roedy Green Guest

    On Sat, 09 Aug 2008 01:02:22 GMT, (Nad) wrote, quoted
    or indirectly quoted someone who said :

    >Well, when someone wants to download your site,
    >it is not clear how cookie would help. Because it is all done
    >in one session.


    The cookie just helps identify him. You keep a tally in a database of
    how many pages he has downloaded. You add an ever-growing delay to
    the response as he gets greedy. He will eventually give up, thinking
    your site is overloaded.

    A clever hacker might try deleting cookies so he will appear to be a
    new customer.
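    The ever-growing delay described above could be sketched like this; the free-page allowance and the per-page step are made-up numbers, not anything measured.

```java
// Sketch of the tarpit: the more pages a visitor has already pulled,
// the longer each new response is delayed, so a greedy client just
// thinks the site is overloaded and gives up.
public class Tarpit {
    private static final int FREE_PAGES = 50;    // no delay below this (assumed)
    private static final long STEP_MILLIS = 250; // extra delay per excess page (assumed)

    // Delay to apply before serving the visitor's pageCount-th page.
    static long delayMillis(int pageCount) {
        if (pageCount <= FREE_PAGES) return 0;
        return (pageCount - FREE_PAGES) * STEP_MILLIS;
    }

    static void serve(int pageCount) throws InterruptedException {
        Thread.sleep(delayMillis(pageCount)); // tarpit the greedy client
    }

    public static void main(String[] args) {
        System.out.println(Tarpit.delayMillis(60)); // 10 excess pages -> 2500
    }
}
```

    The page tally would be keyed by the tracking cookie (or IP), which is exactly why deleting cookies resets the clock for a clever hacker.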

     
    Roedy Green, Aug 9, 2008
    #18
  19. Nad

    Roedy Green Guest

    On Sat, 09 Aug 2008 01:02:22 GMT, (Nad) wrote, quoted
    or indirectly quoted someone who said :

    >Well, I was thinking more along the lines of counting
    >a number of page references. If that count exceeds some number
    >in ONE session, then put up some page saying sorry, you want
    >too much. It is one thing when people want to see some information.
    >But it is a different thing when they want to suck it out dry.


    What does session mean if they don't logon? Perhaps the last hour?
     
    Roedy Green, Aug 9, 2008
    #19
  20. Nad

    Roedy Green Guest

    On Sat, 09 Aug 2008 01:02:22 GMT, (Nad) wrote, quoted
    or indirectly quoted someone who said :

    >
    >Well, at this moment, this is not doable.


    That's right. The Internet was not designed to efficiently deliver
    large files. They should be encrypted and stored all over the place.
    When you ask for one, you get the nearest copy. The right to open
    and look inside is separate from the right to get or cache a copy.

    You find out what file you need, then get a copy, then open it.
    Conceptually, every change to a distributed file is a new file with a
    new ID. The original author's job is only to tell people the ID of
    the file they want, not to serve it. The master distribution site
    will also respond to queries about a given ID to know if it has been
    replaced, and by what.

    People are free to keep old copies around if they like.
    So for example if you wanted to download the JDK from Sun, you would
    go to the sun website. It would give your browser the ID of the JDK
    1.6.0_7 bundle. Your browser would hand that number to your IAP which
    would look for the closest copy, and arrange a download, possibly a
    simultaneous download of parts of it from different sites for speed
    and to share the load.

    Instead of coming all the way from California, it would likely come
    off one of the IAP's computers, or a server within 10 km. This cuts
    down hugely on Internet bandwidth chewed up.

    Multiply the effect when you consider all the videos people download.
    If they came from a nearby server, response could be much faster.
    Automatic use of multiple servers could pretty well guarantee you
    would not have to deal with dropouts.
     
    Roedy Green, Aug 9, 2008
    #20
