How to limit the number of web pages downloaded from a site?

Discussion in 'Java' started by Nad, Aug 8, 2008.

  1. Nad

    Nad Guest

    I have a very large site with valuable information.
    Is there any way to limit the number of articles that can
    be downloaded from the site?

    What are the options?

    Any hints or pointers would be appreciated.

    --
    The most powerful Usenet tool you have ever heard of.

    NewsMaestro v. 4.0.8 has been released.

    * Several nice improvements and bug fixes.

    Note: In some previous releases some class files were missing.
    As a result, the program would not run.
    Sorry for the inconvenience.

    Web page:
    http://newsmaestro.sourceforge.net/

    Download page:
    http://newsmaestro.sourceforge.net/Download_Information.htm

    Send any feedback, ideas, suggestions, test results to
    newsmaestroinfo \at/ mail.ru.

    Your personal info will not be released and your privacy
    will be honored.
     
    Nad, Aug 8, 2008
    #1

  2. Knute Johnson

    Knute Johnson Guest

    Don't put them up.
    Disable your web server? Don't put up so many 'valuable' articles?
    What are you really trying to do?
     
    Knute Johnson, Aug 9, 2008
    #2

  3. Nad

    Nad Guest

    Well, that's not quite an option.
    Think harder.
    Trying to prevent someone from downloading the entire 150 MB site.
    Simple as that.
     
    Nad, Aug 9, 2008
    #3
  4. Nad

    Nad Guest

    Don't you think that what YOU have to say
    is valuable information?

    :--}
     
    Nad, Aug 9, 2008
    #4
  5. Nad

    Nad Guest

    Here is the list of Java "experts" here.
    Don't you think what they have to say is valuable information?
    Btw, you can add or comment out any of them.
    We'll look at that and possibly adjust the list.
    Have fun.

    ;
    ; Comments are OK
    ;
    // This is a comment, it will be ignored
    ; and so will the blank lines

    ; Anyone from Sun talking on Java is good enough of an expert
    @sun.com

    ; Well, if we are talking about the database issues, then ...
    ; Oracle.com
    @oracle.com

    ; Da main priest on comp.lang.java.programmer
    ; He's been posting for slightly more than a year to cljp,
    ; but he knows plenty, that's fer sure
    ; Lew <>
    Lew

    ; He is da priest number 2
    ; "Andrew Thompson" <>
    Andrew Thompson

    ; Eric Sosman is java expert from Sun
    ; Eric Sosman <>
    ; Is that true? :--}
    ; Otherwise, we'll have to take him out from global library :--}
    Eric Sosman

    ; She's quite good, no kwestions abouts its
    ; Patricia Shanahan <>
    Patricia Shanahan

    ; Da priest number 3, well somewhere in that range
    ; "Daniel Pitts" <>
    Daniel Pitts

    ; One of my favorites. Smart dude and not obnoxious
    ; as other priests. Actually, he isn't a priest.
    ; Piotr Kobzda <>
    Piotr Kobzda

    ; Thomas Hawtin <>
    Thomas Hawtin

    ; John Ersatznom <>
    John Ersatznom

    ; Brandon McCombs <>
    Brandon McCombs

    ; Arne Vajhøj <>
    Arne Vajhøj

    ; "Oliver Wong" <>
    Oliver Wong

    ; Shawn is clueless newbie
    ;Shawn <>

    ; Chris Uppal

    ;"Jeff Higgins" <>
    Jeff Higgins

    ; "Karl Uppiano" <>
    Karl Uppiano

    ; Joshua Cranmer <>
    Joshua Cranmer

    ; Knute Johnson <>
    Knute Johnson

    ; Robert Klemme <>
    Robert Klemme

    ; Tom Forsmo <>
    Tom Forsmo

    ; Nigel Wade <>
    Nigel Wade

    ; Twisted <>
    Twisted

    ; Manivannan Palanichamy <>
    Manivannan Palanichamy

    ; "Mike Schilling" <>
    "Mike Schilling"

    ; Owen Jacobson <>
    Owen Jacobson

    ; Wojtek <>
    ; Wojtek

    ; Expert on regular expressions at least
    ; Lars Enderin <>
    Lars Enderin

    ; Expert on regular expressions at least
    ;Jussi Piitulainen <>
    Jussi Piitulainen

    ; "shweta" <>
    shweta

    ; Tom McGlynn
    ;

    ; Tom Anderson <>
    Tom Anderson

    ; Alexey <>
    inline_four

    ; "Ted Hopp" <>
    Ted Hopp

     
    Nad, Aug 9, 2008
    #5
  6. Jeff Higgins

    Jeff Higgins Guest

    Jeff Higgins, Aug 9, 2008
    #6
  7. Jeff Higgins

    Jeff Higgins Guest

    Jeff Higgins, Aug 9, 2008
    #7
  8. Cork Soaker

    Cork Soaker Guest

    You're an idiot.
    Everyone agrees.
     
    Cork Soaker, Aug 9, 2008
    #8
  9. Roedy Green

    Roedy Green Guest

    1. A cookie. This will defeat only casual browsers; even a newbie
    could delete his cookies.

    2. IP address. The problem with this is that everyone coming from
    some large corporation is going to talk to you with the same IP, so
    you will treat them all as identical. (A rough sketch of this
    approach follows the list.)

    3. Make people register and get an account/password/certificate.
    Someone trying to defeat you could register many times under
    different throw-away email addresses.

    4. Make people pay a nominal sum to register, or to fill the tank
    with gas.
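
    As a rough illustration of option 2, a per-IP request counter can be
    written as a servlet filter. This is only a sketch, assuming a Java
    servlet container; the class name and the cap are invented, and a
    real version would also have to expire old entries so counts reset
    over time.

    import java.io.IOException;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicInteger;
    import javax.servlet.*;
    import javax.servlet.http.HttpServletResponse;

    public class IpLimitFilter implements Filter {
        // Arbitrary cap; tune to what a human reader plausibly views.
        private static final int MAX_REQUESTS = 500;
        private final ConcurrentHashMap<String, AtomicInteger> counts =
            new ConcurrentHashMap<String, AtomicInteger>();

        public void init(FilterConfig config) {}

        public void doFilter(ServletRequest req, ServletResponse res,
                FilterChain chain) throws IOException, ServletException {
            String ip = req.getRemoteAddr();
            AtomicInteger n = counts.get(ip);
            if (n == null) {
                AtomicInteger fresh = new AtomicInteger();
                n = counts.putIfAbsent(ip, fresh);
                if (n == null) n = fresh;
            }
            if (n.incrementAndGet() > MAX_REQUESTS) {
                // Refuse vaguely; a custom "sorry" page would also work.
                ((HttpServletResponse) res).sendError(
                    HttpServletResponse.SC_SERVICE_UNAVAILABLE,
                    "Too many requests");
                return;
            }
            chain.doFilter(req, res);
        }

        public void destroy() {}
    }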

    I think the Internet should have a price on each page that is
    automatically collected by the system itself. It could have paid
    ad-free and ad-subsidised versions. The page could be cached by the
    system all over the net for months at a time.
     
    Roedy Green, Aug 9, 2008
    #9
  10. Roedy Green

    Roedy Green Guest

    I don't agree. I don't know what the data is. For example, if he has
    a real estate website, he does not want a competitor screenscraping
    his entire database.
     
    Roedy Green, Aug 9, 2008
    #10
  11. Arne Vajhøj

    Arne Vajhøj Guest

    Did you notice his signature?

    Arne
     
    Arne Vajhøj, Aug 9, 2008
    #11
  12. Arne Vajhøj

    Arne Vajhøj Guest

    Not Cork's but Nad's/NewsMaestro's!

    Arne
     
    Arne Vajhøj, Aug 9, 2008
    #12
  13. Nad

    Nad Guest

    Well, when someone wants to download your site, it is not clear how
    a cookie would help, because it is all done in one session.
    I was thinking more along the lines of counting the number of page
    references. If that count exceeds some number in ONE session, put up
    a page saying sorry, you want too much. It is one thing when people
    want to see some information, but a different thing when they want
    to suck it out dry.
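
    Something like that counter, written as a servlet filter against the
    standard HttpSession, might look as follows. This is only a sketch;
    the attribute key, the cap, and the error page are made up.

    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.*;

    public class SessionQuotaFilter implements Filter {
        private static final int MAX_PAGES = 100; // pages per session

        public void init(FilterConfig config) {}

        public void doFilter(ServletRequest req, ServletResponse res,
                FilterChain chain) throws IOException, ServletException {
            HttpSession session =
                ((HttpServletRequest) req).getSession(true);
            Integer count = (Integer) session.getAttribute("pageCount");
            int n = (count == null) ? 1 : count.intValue() + 1;
            session.setAttribute("pageCount", Integer.valueOf(n));
            if (n > MAX_PAGES) {
                // The "sorry, you want too much" page.
                ((HttpServletResponse) res)
                    .sendRedirect("/quota-exceeded.html");
                return;
            }
            chain.doFilter(req, res);
        }

        public void destroy() {}
    }

    Note the catch: the container tracks the session with a cookie (or
    URL rewriting), so a scraper that throws cookies away gets a fresh
    session, and a fresh count, on every request.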

    Plus the bandwidth issue. Not many providers like the idea of their
    bandwidth being sucked dry. That's why some of them charge by the
    amount of traffic.
    As for registration and payment, at this moment that is not doable.
    Unless things are automatic and people register once, and from then
    on access anything on the Internet they want, it is impractical.
    Whenever I see some site that wants me to register, I usually just
    go back. First of all, the registration procedure is a pain in the
    neck. You have to fill in all sorts of fields, disclose your
    personal information, and you name it. Your email address could be
    used to send you tons of spam, for one thing. How do you know what's
    on their mind? Not many sites give you an opt-out option in each
    spam email they send you, and you may have to spend time either
    trying to contact them with a request, which is a waste of time, or
    creating another rule in your firewall to block their address or
    their entire domain...
     
    Nad, Aug 9, 2008
    #13
  14. Knute Johnson

    Knute Johnson Guest

    Then write a throttle for your site: each IP gets only so many bytes
    and then it is cut off.
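
    A rough sketch of such a throttle, again as a hypothetical servlet
    filter: the quota, the class names, and the response wrapper are all
    invented for illustration, and a real version would reset or expire
    the per-IP totals.

    import java.io.IOException;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;
    import javax.servlet.*;
    import javax.servlet.http.*;

    public class ByteQuotaFilter implements Filter {
        private static final long MAX_BYTES = 10L * 1024 * 1024; // 10 MB

        private final ConcurrentHashMap<String, AtomicLong> sent =
            new ConcurrentHashMap<String, AtomicLong>();

        public void init(FilterConfig config) {}

        public void doFilter(ServletRequest req, ServletResponse res,
                FilterChain chain) throws IOException, ServletException {
            String ip = req.getRemoteAddr();
            AtomicLong total = sent.get(ip);
            if (total == null) {
                AtomicLong fresh = new AtomicLong();
                total = sent.putIfAbsent(ip, fresh);
                if (total == null) total = fresh;
            }
            if (total.get() > MAX_BYTES) {
                ((HttpServletResponse) res).sendError(
                    HttpServletResponse.SC_FORBIDDEN, "Quota exceeded");
                return;
            }
            CountingResponse counting =
                new CountingResponse((HttpServletResponse) res);
            chain.doFilter(req, counting);
            total.addAndGet(counting.bytesWritten());
        }

        public void destroy() {}

        /** Counts bytes sent through the response's output stream. */
        static class CountingResponse extends HttpServletResponseWrapper {
            private long bytes = 0;

            CountingResponse(HttpServletResponse res) { super(res); }

            long bytesWritten() { return bytes; }

            public ServletOutputStream getOutputStream()
                    throws IOException {
                final ServletOutputStream out = super.getOutputStream();
                return new ServletOutputStream() {
                    public void write(int b) throws IOException {
                        out.write(b);
                        bytes++;
                    }
                };
            }
        }
    }

    This only counts the byte-stream path; output through getWriter()
    would need a similar wrapper.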

    I'm still not clear on what you are doing: you don't want people to
    see your data, yet you want people to see your data?
     
    Knute Johnson, Aug 9, 2008
    #14
  15. Nad

    Nad Guest

    Get lost, funky ass.
    I could give a dead flying chicken about what "everyone",
    "agrees" or does not "agree".
    You, lil funkay chickan, keep following tails of other
    rats, running at forever maddening speed,
    faster, and faster and faster.
    Lil did you know, you are running to da abyss.
    Sure, it feels "nice" to be in the middle of the herd.
    Because when wolves and lions come,
    you think you have a better chance to "survive".
    But you don't even know to "survive" for WHAT?
    Ever thought about this?

    Now, can you dig it, suxy?
     
    Nad, Aug 9, 2008
    #15
  16. Nad

    Nad Guest

    And how exactly do you do that?
    With what tools, languages or scripts?
     
    Nad, Aug 9, 2008
    #16
  17. Nad

    Nad Guest

    Sure it does.
    Da Diamond Sword...

    Are you having fun yet?
    Otherwise it is kinda boring with all these priests,
    flying up there in the clouds, thinking they are here
    to fix everyone and put them "right".
    Sickos.

    Btw, your articles do not appear on some servers.
     
    Nad, Aug 9, 2008
    #17
  18. Roedy Green

    Roedy Green Guest

    The cookie just helps identify him. You keep a tally in a database of
    how many pages he has downloaded. You add an ever-growing delay to
    the response as he gets greedy. He will eventually give up, thinking
    your site is overloaded.
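
    The growing delay might be no more than a few lines. This sketch,
    with invented numbers, doubles the wait for each page past a free
    allowance and caps it at about a minute:

    /** Sleep longer and longer once a visitor passes the free pages. */
    static void throttle(int pagesSoFar) throws InterruptedException {
        final int FREE_PAGES = 50;
        if (pagesSoFar <= FREE_PAGES) {
            return;                       // normal visitors never wait
        }
        int excess = pagesSoFar - FREE_PAGES;
        // 500 ms, 1 s, 2 s, ... capped near 60 s.
        long delayMs = Math.min(60000L, 250L << Math.min(excess, 8));
        Thread.sleep(delayMs);
    }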

    A clever hacker might try deleting cookies so he will appear to be a
    new customer.
     
    Roedy Green, Aug 9, 2008
    #18
  19. Roedy Green

    Roedy Green Guest

    What does "session" mean if they don't log on? Perhaps the last hour?
     
    Roedy Green, Aug 9, 2008
    #19
  20. Roedy Green

    Roedy Green Guest

    That's right. The Internet was not designed to deliver large files
    efficiently. They should be encrypted and stored all over the place.
    When you ask for one, you get the nearest copy. The right to open
    and look inside is separate from the right to get or cache a copy.

    You find out what file you need, then get a copy, then open it.
    Conceptually, every change to a distributed file is a new file with
    a new ID. The original author's job is only to tell people the ID of
    the file they want, not to serve it. The master distribution site
    will also respond to queries about a given ID, to say whether it has
    been replaced, and by what.
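
    One plausible way to mint such IDs, sketched here with SHA-256 as an
    assumed choice of hash (the post itself names no algorithm): hash
    the file's bytes, so any change produces a different ID.

    import java.io.*;
    import java.security.MessageDigest;

    public class ContentId {
        /** Returns a hex ID derived from the file's contents. */
        public static String idFor(File f) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            InputStream in = new FileInputStream(f);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    md.update(buf, 0, n);
                }
            } finally {
                in.close();
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString(); // change a byte, get a new ID
        }
    }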

    People are free to keep old copies around if they like.
    So, for example, if you wanted to download the JDK from Sun, you
    would go to the Sun website. It would give your browser the ID of
    the JDK 1.6.0_7 bundle. Your browser would hand that number to your
    IAP, which would look for the closest copy and arrange a download,
    possibly a simultaneous download of parts of it from different sites
    for speed and to share the load.

    Instead of coming all the way from California, it would likely come
    off one of the IAP's computers, or a server within 10 km. This cuts
    down hugely on Internet bandwidth chewed up.

    Multiply the effect when you consider all the videos people
    download. If they came from a nearby server, response could be much
    faster, and automatic use of multiple servers could pretty well
    guarantee you would not have to deal with dropouts.
     
    Roedy Green, Aug 9, 2008
    #20
