How to limit the number of web pages downloaded from a site?


Nad

I have a very large site with valuable information.
Is there any way to limit the number of articles that can
be downloaded from the site?

What are the options?

Any hints or pointers would be appreciated.

--
The most powerful Usenet tool you have ever heard of.

NewsMaestro v. 4.0.8 has been released.

* Several nice improvements and bug fixes.

Note: In some previous releases some class files were missing.
As a result, the program would not run.
Sorry for the inconvenience.

Web page:
http://newsmaestro.sourceforge.net/

Download page:
http://newsmaestro.sourceforge.net/Download_Information.htm

Send any feedback, ideas, suggestions, test results to
newsmaestroinfo \at/ mail.ru.

Your personal info will not be released and your privacy
will be honored.
 

Knute Johnson

Nad said:
> I have a very large site with valuable information.
> Is there any way to limit the number of articles that can
> be downloaded from the site?

Don't put them up.

> What are the options?

Disable your web server? Don't put up so many 'valuable' articles?

> Any hints or pointers would be appreciated.

What are you really trying to do?
 

Nad

Don't you think that what YOU have to say
is valuable information?

Here is the list of Java "experts" in this group.
Don't you think what they have to say is valuable information?
Btw, you can add or comment out any of them.
We'll look at that and possibly adjust the list.
Have fun.

;
; Comments are OK
;
// This is a comment, it will be ignored
; and so will the blank lines

; Anyone from Sun talking on Java is good enough of an expert
@sun.com

; Well, if we are talking about the database issues, then ...
; Oracle.com
@oracle.com

; Da main priest on comp.lang.java.programmer
; He's been posting for slightly more than a year to cljp,
; but he knows plenty, that's fer sure
; Lew <[email protected]>
Lew

; He is da priest number 2
; "Andrew Thompson" <[email protected]>
Andrew Thompson

; Eric Sosman is a Java expert from Sun
; Eric Sosman <[email protected]>
; Is that true? :--}
; Otherwise, we'll have to take him out from global library :--}
Eric Sosman

; She's quite good, no kwestions abouts its
; Patricia Shanahan <[email protected]>
Patricia Shanahan

; Da priest number 3, well somewhere in that range
; "Daniel Pitts" <[email protected]>
Daniel Pitts

; One of my favorites. Smart dude and not as obnoxious
; as the other priests. Actually, he isn't a priest.
; Piotr Kobzda <[email protected]>
Piotr Kobzda

; Thomas Hawtin <[email protected]>
Thomas Hawtin

; John Ersatznom <[email protected]>
John Ersatznom

; Brandon McCombs <[email protected]>
Brandon McCombs

; =?ISO-8859-1?Q?Arne_Vajh=F8j?= <[email protected]>
Arne Vajhøj

; "Oliver Wong" <[email protected]>
Oliver Wong

; Shawn is clueless newbie
;Shawn <[email protected]>

; Chris Uppal

;"Jeff Higgins" <[email protected]>
Jeff Higgins

; "Karl Uppiano" <[email protected]>
Karl Uppiano

; Joshua Cranmer <[email protected]>
Joshua Cranmer

; Knute Johnson <[email protected]>
Knute Johnson

; Robert Klemme <[email protected]>
Robert Klemme

; Tom Forsmo <[email protected]>
Tom Forsmo

; Nigel Wade <[email protected]>
Nigel Wade

; Twisted <[email protected]>
Twisted

; Manivannan Palanichamy <[email protected]>
Manivannan Palanichamy

; "Mike Schilling" <[email protected]>
"Mike Schilling"

; Owen Jacobson <[email protected]>
Owen Jacobson

; Wojtek <[email protected]>
; Wojtek

; Expert on regular expressions at least
; Lars Enderin <[email protected]>
Lars Enderin

; Expert on regular expressions at least
;Jussi Piitulainen <[email protected]>
Jussi Piitulainen

; "shweta" <[email protected]>
shweta

; Tom McGlynn
; (e-mail address removed)

; Tom Anderson <[email protected]>
Tom Anderson

; Alexey <[email protected]>
inline_four

; "Ted Hopp" <[email protected]>
Ted Hopp

 

Cork Soaker

Nad said:
> I have a very large site with valuable information.
> Is there any way to limit the number of articles that can
> be downloaded from the site?
>
> What are the options?

You're an idiot.
Everyone agrees.
 

Roedy Green

> I have a very large site with valuable information.
> Is there any way to limit the number of articles that can
> be downloaded from the site?
>
> What are the options?
>
> Any hints or pointers would be appreciated.

1. cookie. This will defeat only casual browsers. Even a newbie could
delete his cookies.

2. IP. The problem with this is everyone coming from some large
corporation is going to talk to you with the same IP. You will treat
them all as identical.

3. Make people register and get an account/password/certificate.
Someone trying to defeat you could register many times under different
throw-away email addresses.

4. Make people pay a nominal sum to register, or to fill the tank with
gas.

I think the Internet should have a price on each page that is
automatically collected by the system itself. It could have ad-free
for pay and ad-subsidised versions. The page could be cached by the
system all over the net for months at a time.
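
To make options 1 and 2 concrete: in a Java servlet (this group's home turf), both identifiers come straight off the request. A minimal sketch; the "visitor" cookie name and the class name are made up for illustration:

import java.io.IOException;
import java.util.UUID;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch only: shows where a servlet gets the two identifiers from
// options 1 and 2 above. The "visitor" cookie name is invented.
public class VisitorIdServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Option 2: the client IP -- one address may cover a whole NATed office.
        String ip = req.getRemoteAddr();

        // Option 1: a tracking cookie -- trivially deleted by the client.
        String visitor = null;
        Cookie[] cookies = req.getCookies();
        if (cookies != null) {
            for (Cookie c : cookies) {
                if ("visitor".equals(c.getName())) {
                    visitor = c.getValue();
                }
            }
        }
        if (visitor == null) {
            visitor = UUID.randomUUID().toString();
            resp.addCookie(new Cookie("visitor", visitor));
        }

        resp.setContentType("text/plain");
        resp.getWriter().println("ip=" + ip + " visitor=" + visitor);
    }
}

Either identifier can then key the counting and throttling discussed further down the thread.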
 

Roedy Green

> You're an idiot.
> Everyone agrees.

I don't agree. I don't know what the data is. For example, if he has
a real estate website, he does not want a competitor screenscraping
his entire database.
 

Arne Vajhøj

Roedy said:
> I don't agree. I don't know what the data is. For example, if he has
> a real estate website, he does not want a competitor screenscraping
> his entire database.

Did you notice his signature?

Arne
 

Nad

Roedy Green said:
> 1. cookie. This will defeat only casual browsers. Even a newbie could
> delete his cookies.

Well, when someone wants to download your site,
it is not clear how a cookie would help, because it is all done
in one session.
> 2. IP. The problem with this is everyone coming from some large
> corporation is going to talk to you with the same IP. You will treat
> them all as identical.

Well, I was thinking more along the lines of counting the number of
page references. If that count exceeds some number in ONE session,
then put up a page saying sorry, you want too much. It is one thing
when people want to see some information. But it is a different thing
when they want to suck it dry.

Plus there is the bandwidth issue. Not many providers like the idea of
having their bandwidth sucked up. That's why some of them charge by
the amount of traffic.
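
A minimal sketch of that per-session counter, assuming a Java servlet container; the limit, the attribute name and the "sorry" message are placeholders:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

// Sketch only: counts pages served in one HTTP session and refuses
// further requests once a made-up limit is exceeded.
public class SessionQuotaFilter implements Filter {
    private static final int MAX_PAGES_PER_SESSION = 50; // placeholder number

    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest hreq = (HttpServletRequest) req;
        HttpSession session = hreq.getSession(true);

        Integer seen = (Integer) session.getAttribute("pageCount");
        int pages = (seen == null) ? 1 : seen.intValue() + 1;
        session.setAttribute("pageCount", Integer.valueOf(pages));

        if (pages > MAX_PAGES_PER_SESSION) {
            ((HttpServletResponse) res).sendError(
                    HttpServletResponse.SC_SERVICE_UNAVAILABLE,
                    "Sorry, you want too much.");
            return;
        }
        chain.doFilter(req, res);
    }
}

The filter would be mapped to /* in web.xml. Note the weakness Roedy points out below: without a logon, a "session" is just a cookie, and a client that refuses the session cookie starts a fresh count on every request.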
> 3. Make people register and get an account/password/certificate.
> Someone trying to defeat you could register many times under different
> throw-away email addresses.
>
> 4. Make people pay a nominal sum to register, or to fill the tank with
> gas.
>
> I think the Internet should have a price on each page that is
> automatically collected by the system itself. It could have ad-free
> for pay and ad-subsidised versions. The page could be cached by the
> system all over the net for months at a time.

Well, at this moment, this is not doable. Unless things are automatic
and people register once, and from then on can access anything on the
Internet they want, it is impractical. Whenever I see a site that
wants me to register, I usually just go back. First of all, the
registration procedure is a pain in the neck. You have to fill in all
sorts of fields, disclose your personal information and you name it.
Your email address could be used to send you tons of spam, for one
thing. How do you know what they have in mind? Not many sites give you
an opt-out option in each spam email they send you, and you may have
to spend time either trying to contact them with a request, which is a
waste of time, or creating another rule in your firewall to block
their address or their entire domain...
 

Knute Johnson

Nad said:
> Well, that's not quite an option.
>
> Think harder.
>
> Trying to prevent downloading the entire 150 meg sized site.
> Simple as that.

Then write a throttle for your site: each IP gets only so many bytes and
then it is cut off.

I'm still not clear on what you are doing: you don't want people to see
your data, yet you want people to see your data?
 

Nad

Cork Soaker said:
> You're an idiot.
> Everyone agrees.

Get lost, funky ass.
I could give a dead flying chicken about what "everyone",
"agrees" or does not "agree".
You, lil funkay chickan, keep following tails of other
rats, running at forever maddening speed,
faster, and faster and faster.
Lil did you know, you are running to da abyss.
Sure, it feels "nice" to be in the middle of the herd.
Because when wolves and lions come,
you think you have a better chance to "survive".
But you don't even know to "survive" for WHAT?
Ever thought about this?

Now, can you dig it, suxy?
 

Nad

Knute Johnson said:
> Then write a throttle for your site: each IP gets only so many bytes and
> then it is cut off.

And how exactly do you do that?
With what tools, languages or scripts?
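
For a servlet-based site, one shape such a throttle could take is a Filter that counts the bytes it writes per client IP and refuses further requests once a budget is spent. A rough sketch, written against the servlet 2.x API of the time; the budget, class names and in-memory bookkeeping are invented, entries are never expired, and getWriter() would need the same wrapping as getOutputStream():

import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

// Sketch only: per-IP byte budget, enforced by counting bytes written
// through the response's output stream.
public class ByteThrottleFilter implements Filter {
    private static final long BUDGET_BYTES = 20L * 1024 * 1024; // placeholder: ~20 MB per IP

    private final ConcurrentHashMap<String, AtomicLong> served =
            new ConcurrentHashMap<String, AtomicLong>();

    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String ip = req.getRemoteAddr();
        AtomicLong total = served.get(ip);
        if (total == null) {
            served.putIfAbsent(ip, new AtomicLong());
            total = served.get(ip);
        }
        HttpServletResponse hres = (HttpServletResponse) res;
        if (total.get() > BUDGET_BYTES) {
            hres.sendError(HttpServletResponse.SC_FORBIDDEN, "Download limit reached");
            return;
        }
        CountingResponse counting = new CountingResponse(hres);
        chain.doFilter(req, counting);
        total.addAndGet(counting.bytesWritten());
    }

    // Wrapper that counts bytes going out through getOutputStream().
    private static class CountingResponse extends HttpServletResponseWrapper {
        private long count;

        CountingResponse(HttpServletResponse response) {
            super(response);
        }

        long bytesWritten() {
            return count;
        }

        public ServletOutputStream getOutputStream() throws IOException {
            final ServletOutputStream out = super.getOutputStream();
            return new ServletOutputStream() {
                public void write(int b) throws IOException {
                    out.write(b);
                    count++;
                }
            };
        }
    }
}

As Roedy noted earlier, everyone behind one corporate proxy shares an IP, so a real deployment would pick the budget generously.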
 

Roedy Green

> Well, when someone wants to download your site,
> it is not clear how a cookie would help, because it is all done
> in one session.

The cookie just helps identify him. You keep a tally in a database of
how many pages he has downloaded. You add an ever-growing delay to
the response as he gets greedy. He will eventually give up, thinking
your site is overloaded.

A clever hacker might try deleting cookies so he will appear to be a
new customer.
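
A minimal sketch of that scheme as another servlet Filter. Roedy keeps the tally in a database; for brevity this one keeps it in memory, and the cookie name, free-page allowance and delay curve are all invented:

import java.io.IOException;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch only: identify the visitor by a cookie, tally pages,
// and slow the response down more and more as the tally grows.
public class TarPitFilter implements Filter {
    private static final int FREE_PAGES = 100;      // placeholder
    private static final long MAX_DELAY_MS = 30000; // placeholder cap

    private final ConcurrentHashMap<String, AtomicInteger> tally =
            new ConcurrentHashMap<String, AtomicInteger>();

    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest hreq = (HttpServletRequest) req;
        HttpServletResponse hres = (HttpServletResponse) res;

        // Find or issue the identifying cookie (the name "visitor" is invented).
        String id = null;
        Cookie[] cookies = hreq.getCookies();
        if (cookies != null) {
            for (Cookie c : cookies) {
                if ("visitor".equals(c.getName())) {
                    id = c.getValue();
                }
            }
        }
        if (id == null) {
            id = UUID.randomUUID().toString();
            hres.addCookie(new Cookie("visitor", id));
        }

        // Tally pages for this visitor (Roedy would keep this in a database).
        AtomicInteger n = tally.get(id);
        if (n == null) {
            tally.putIfAbsent(id, new AtomicInteger());
            n = tally.get(id);
        }
        int pages = n.incrementAndGet();

        // Ever-growing delay once the visitor gets greedy.
        if (pages > FREE_PAGES) {
            long delay = Math.min(MAX_DELAY_MS, (pages - FREE_PAGES) * 250L);
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        chain.doFilter(req, res);
    }
}

Deleting the "visitor" cookie resets the tally, which is exactly the clever-hacker case Roedy mentions, so in practice this would be combined with an IP-based count as well.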
 

Roedy Green

> Well, I was thinking more along the lines of counting the number of
> page references. If that count exceeds some number in ONE session,
> then put up a page saying sorry, you want too much. It is one thing
> when people want to see some information. But it is a different thing
> when they want to suck it dry.

What does session mean if they don't log on? Perhaps the last hour?
 

Roedy Green

> Well, at this moment, this is not doable.

That's right. The Internet was not designed to efficiently deliver
large files. They should be encrypted and stored all over the place.
When you ask for one, you get the nearest copy. The right to open
and look inside is separate from the right to get or cache a copy.

You find out what file you need, then get a copy, then open it.
Conceptually, every change to a distributed file is a new file with a
new ID. The original author's job is only to tell people the ID of
the file they want, not to serve it. The master distribution site
would also respond to queries about a given ID, to say whether it has
been replaced, and by what.

People are free to keep old copies around if they like.
So, for example, if you wanted to download the JDK from Sun, you would
go to the Sun website. It would give your browser the ID of the JDK
1.6.0_7 bundle. Your browser would hand that number to your IAP, which
would look for the closest copy and arrange a download, possibly a
simultaneous download of parts of it from different sites, for speed
and to share the load.

Instead of coming all the way from California, it would likely come
off one of the IAP's computers, or a server within 10 km. This cuts
down hugely on Internet bandwidth chewed up.

Multiply the effect when you consider all the videos people download.
If they came from a nearby server, response could be much faster.
Automatic use of multiple servers could pretty well guarantee you
would not have to deal with dropouts.
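
Roedy does not say how those IDs would be assigned; one way to get the property he describes, that any change to a file automatically yields a new ID, is to make the ID a hash of the file's contents. A small illustration in Java:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of content addressing: the "ID" is the SHA-256 hash of the
// file's bytes, so any edit yields a new ID -- a new file, in effect.
public class FileId {

    public static String idOf(String path)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        InputStream in = new FileInputStream(path);
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                sha.update(buf, 0, n);
            }
        } finally {
            in.close();
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("ID of " + args[0] + " is " + idOf(args[0]));
    }
}

Finding "the nearest copy" of an ID and answering the replaced-by queries are the larger pieces the post leaves open.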
 
