How to limit the number of web pages downloaded from a site?


Adrienne Boswell

Gazing into my crystal ball I observed (e-mail address removed) (Nad) writing in
Now, in order to detect a page access by the client,
you either add an SSI include statement, use CGI or
JavaScript, or make your pages assemble dynamically
with PHP. But what if you cannot even run those because
the provider does not allow it?

That is transparent to the client, and you find a provider that allows
scripting on their servers. There are plenty of free hosts that do so.
Look through this group for some recently mentioned hosts.
 

Neredbojias

Well, Google does it. Sure, it is a slightly different setup,
but they limit the number of queries to 100.

Sure, but tell me they do it without server-side techniques which you so
explicitly eschewed...
Could you expand on that idea?

I don't think it would work, but perhaps a 10-15 minute meta refresh in the page head.
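For reference, the tag meant here sits in the document head and looks something like the following sketch; the content value is in seconds, so 600-900 covers 10-15 minutes:

    <!-- ask the browser to reload this page after 10 minutes -->
    <meta http-equiv="refresh" content="600">

It throttles nothing by itself; a scripted downloader simply ignores it.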
 

Nad

Ed Mullen said:
I must not be following you. It's ok if I view every document on your
site one at a time. But, it's not ok to have a program on my machine
spider your site and do it automatically for me so I can view them at my
leisure later.

Realize this: the information is definitely of value, if you are
interested in these issues to begin with. That information was
collected and processed by tools that took many years to develop.
And I know for a fact that 99.99% of people only like a free Coke.
They do not value someone else's work if they can take advantage of
it.

I do not mind you downloading SOME of it. But, at the same time,
I am not interested in people that take all this work and start
"doing business" with it at my expense, without putting a penny
into it and without moving a finger to make that happen.

Now, is that problem valid?
Or is it some kind of invention?
Have you ever created anything, spent years on it,
and got fucked at the end?
How do you like THAT fer breakfast?

You see, for you it is a mindfuck pretty much.
It is a philosophy, be it of perverted kind.
It is not something YOU live.
It is something you mindfuck about while dealing with lives
of OTHER people. It is simply a mental masturbation excersize
for you. You see anything?

So, I have a LOT of patience.
But when my patience runs out,
I'll just kick you on the arse so hard,
that you are going to fly like a dead chicken.

Clear enough?
In the first case, once the document is displayed in my browser I can
save it to my local disk. And do so manually for every other document
on your site.

That's fine. No problem with it.
Look at the topic list. Select a chapter.
Look at all the articles. Read ANY of them without limit.
But don't **** with me.
What is not clear?
So, assuming I really did want to read all the documents, what you are
doing by preventing automated downloading is annoying me, your targeted user.

First of all, did I FORCE you to even bother with it?
Did I charge you a penny?
Did you get TOP notch info?
Did you benefit from it? And you did.
Now, you want me to spend years on something
and you wouldn't even bother to give me a dime
even if I had nothing to eat.

How do YOU like that fer breakfast, jack?
What's the point?

The point is that you need to learn to appreciate life
and pay where it is due, and if you are given some benefit
that is of value to you, you have to be grateful and make
sure you give something back, and not just be a parasite.
What is not clear?

What's the point of you MINDFUCKING with all this?
What is YOUR interest in this all?
Do you mind if I live my life the way I see it?
Or do you want to clean up my brain,
so I see the reality the way YOU want it?
 

Nad

Adrienne Boswell said:
Gazing into my crystal ball I observed (e-mail address removed) (Nad) writing in


That is transparent to the client, and you find a provider that allows
scripting on their servers. There are plenty of free hosts that do so.
Look through this group for some recently mentioned hosts.

Well, I'd like to know about those servers. I'll review the
posts. Do you know of any offhand?

But I suspect there is a limit on the site size, isn't there?
The sites I'll be creating are in the range of 150-250 megs.
The one I am using now has no limit on size.

Btw, I was wondering whether it is worth creating a site on HTML,
using this group's archive, going back a couple of years.

Unfortunately, I am not an HTML expert and the site has to
be organized around two categories of information:

1) Code examples
2) Expert opinions

Now, to generate the site automatically, you have to process
the archives with fancy, multi-stage filters to make sure
you get only articles that ARE exactly on topic for some
issue. In other words, find a needle in a haystack.

The information is categorized by different issues
and I have no idea what those are for HTML. You have to spend
some time on it and create a category list of interesting or
important issues.

Secondly, you need to know who the "experts" are around here.
I have never participated in this group. So, I'd have to review
tons of posts and decide which people really know
what they are talking about.

Things like that. But who knows, you may see a very nice site
on HTML in the near future that would contain thousands of
code examples. If you have any requests, ideas
or suggestions, that would help. If you can recommend a list
of "experts", that'd help. They don't have to be the HTML gods.
They just have to know what they are talking about and be
helpful in the followups.

Also, if you can give me a list of HTML issues that are worth
creating a chapter from, that would help.

Just type one entry per line.
 

richard

No problem. You did it manually, and you just viewed the information,
which is exactly what this site is for.


Nope. That's a hassle for the user.
They should be able to move freely around the site
without any passwords, log-ins or any other crap.
But they should not be able to download the whole thing.

That is all.


Then sell access rights.
Kind of like paying for a ticket to go see a movie at the theatre.
No ticket, no entry.

Many sites don't allow visitors to get to see all their pages unless
they pay for membership. That would stop automated programs from
downloading your stuff.
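As a rough sketch of what "pay for membership" means in practice: on an Apache host, a directory can be gated behind HTTP basic authentication with an .htaccess file along these lines (the path and realm name here are placeholders, not anything from this thread):

    # Require a login for everything in this directory.
    # The password file is created with the htpasswd utility.
    AuthType Basic
    AuthName "Members only"
    AuthUserFile /home/user/.htpasswd
    Require valid-user

That conflicts with the stated "no passwords" requirement, but it is the standard way to do what is suggested above.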
 

Peter J Ross

In alt.html on Sat, 09 Aug 2008 14:44:52 -0700, richard wrote:
Then sell access rights.
Kind of like paying for a ticket to go see a movie at the theatre.
^^^^^
No ticket, no entry.

Many sites don't allow visitors to get to see all their pages unless
they pay for membership. That would stop automated programs from
downloading your stuff.

You misspelled "movy", Bullis. HTH.

[xposted, f-up set. Nothing to see here!]
 

Raymond SCHMIT

No problem. You did it manually, and you just viewed the information,
which is exactly what this site is for.


Nope. That's a hassle for the user.
They should be able to move freely around the site
without any passwords, log-ins or any other crap.
But they should not be able to download the whole thing.

That is all.

You should post on your site some complete documents, some documents
with only the first paragraph, and a list of all the documents' titles.
And an e-mail address (obfuscated) where interested people can ask you
to furnish them a maximum of 10 documents per month, after first paying
3 dollars per pack sent.
 

Nad

You should post on your site some complete documents, some documents
with only the first paragraph, and a list of all the documents' titles.
And an e-mail address (obfuscated) where interested people can ask you
to furnish them a maximum of 10 documents per month, after first paying
3 dollars per pack sent.

Well, I don't particularly like this idea.
This is technical information and people are extremely busy
with all sorts of other things. Plus, there is all sorts of
information on the Internet that they can find even though
it is not going to be as easy as on this site. But they don't
even know it. So, they'll just go back to search engine and
look elsewhere. People do not like the hassles. I, personally,
whenever I see some site with all these conditions attached,
simply go back and look elsewhere, even though I have to spend
hours to find something. Sure, on this site, I could find what
I need in minutes. Not only that, but I could learn quite a
lot while traversing it.

What I am doing here right now is looking at different options.
Most people say that with the limitations you placed, it can
not be done, and I knew that already. I was just interested
in investigating all the ins and outs of it before I make a
decision on what is the best way to go about it.

So, let's put it this way:
We have some options: CGI, PHP, SSI executable
includes (on each page) that can run some script, and, finally,
JavaScript.

Can I ask you this:
If it is done with CGI, what is the exact code that would do
what is needed?

Again, the specification is to count the number of page hits
from the same IP during a certain period of time. What is needed
on the server side to count the accesses to ANY page?
The alternative is to have the SSI includes on each page
that run some script.

If it is done with PHP, what exactly is the code that would do it?

If it is done with JavaScript, what is the code to do it?

If it is done some other way, what is the exact code to do it?

Any feedback would be appreciated.
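Nobody in the thread has posted actual code, so here is a minimal sketch of the PHP route, assuming a flat-file counter, a one-hour window and a 100-hit limit (all illustrative values, not anything specified above). It counts hits per client IP in a sliding window and refuses service over the threshold; remember that shared proxies mean one IP is not always one visitor.

    <?php
    // ratelimit.php - a minimal per-IP hit counter (sketch only).
    // Pull it into the top of every PHP page with:
    //   require 'ratelimit.php';
    $window   = 3600;   // length of the counting window, in seconds
    $max_hits = 100;    // page views allowed per window per IP
    $ip   = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : 'unknown';
    $file = sys_get_temp_dir() . '/hits_' . md5($ip);
    $now  = time();

    // Reload this IP's timestamps, keeping only those still in the window.
    $hits = array();
    if (is_readable($file)) {
        foreach (file($file, FILE_IGNORE_NEW_LINES) as $t) {
            if ((int)$t > $now - $window) {
                $hits[] = (int)$t;
            }
        }
    }

    $hits[] = $now;   // record the current request
    file_put_contents($file, implode("\n", $hits), LOCK_EX);

    if (count($hits) > $max_hits) {
        header('HTTP/1.1 503 Service Unavailable');
        exit('Request limit reached; please try again later.');
    }

For the SSI route, each page would instead carry a directive such as <!--#include virtual="/cgi-bin/ratelimit.cgi" --> and the same counting logic would live in the CGI script. There is no JavaScript equivalent worth writing: the client controls JavaScript, so a downloader simply ignores it.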
 

Dr J R Stockton

Nad said:
I have a very large site with valuable information.
Is there any way to prevent downloading a large number
of articles. Some people want to download the entire site.

Any hints or pointers would be appreciated.

If you have a well-crafted index.htm page, and a robots.txt file that
allows robot access only to that page, then it seems likely that the
proportion of accesses from those whose searches found something that
might have been of interest but was not will be significantly reduced.
Certainly using such a robots.txt works for me to reduce total
download.
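As a sketch, such a robots.txt might read as follows; note that Allow is an extension honoured by the major crawlers rather than part of the original robots.txt standard:

    # Let well-behaved robots index the front page only.
    User-agent: *
    Allow: /index.htm
    Disallow: /

It only restrains robots that choose to obey it.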

Keep page sizes down, so that a page access which turned out to be
uninteresting or only partly interesting does not cost you so many
bytes.

Omit inessential figures from the text pages, link to them instead, so
that a click is needed and will open a new tab or window. Maybe do
similar with tables.

Check how the access is counted. If a page in plain HTML requires 50 kB
but can be compressed to 25 kB, is it delivered compressed and is it
counted as 25 or 50 kB?
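Whether compression happens at all depends on the host; on Apache with mod_deflate enabled it can typically be requested from .htaccess with something like the sketch below, though what your own host permits and how it meters traffic is another matter:

    # Compress textual responses before sending them.
    <IfModule mod_deflate.c>
      AddOutputFilterByType DEFLATE text/html text/plain text/css
    </IfModule>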

Consider zipping material, as a possible means of deterring mere
passers-by. Consider compressing material in a manner less easy of
access - zip with password or a rarer compressing tool. Consider
encoding material by writing not in English but, say, in German. You
can always if necessary rephrase your German so that translate tools
make reasonable sense of it.

Don't expect any of these to prevent all downloading of the whole site;
they are merely ways likely to reduce downloading by those who don't
need the material.
 

Bergamot

Nad said:
They should be able to move freely around the site
without any passwords, log-ins or any other crap.
But they should not be able to download the whole thing.

Have you looked at your server logs to find out how the whole site is
being downloaded? Add those UAs to your robots.txt file. That should
stop things like wget.
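For example, a stock wget honours robots.txt by default, so an entry like this sketch would turn it away, though anyone determined can tell their tool to ignore the file:

    # Refuse the whole site to wget.
    User-agent: wget
    Disallow: /

It is a politeness convention, not an enforcement mechanism.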
 

Chaddy2222

Nad said:
Well, I'd like to know about those servers. I'll review the
posts. Do you know of any off hand?

But I suspect there is a limit on the site size, isn't there?
The sites I'll be creating are in the range of 150-250 megs.
The one I am using now has no limit on size.
There is NO SUCH THING as unlimited, especially when it comes to web
servers. The best you can get is a VPS or dedicated server.
Check out http://www.servergrade.com.au
They are not free, but they do have very good deals on web hosting and
domain names, especially if you buy your domain and web hosting as
one package; then the domain only costs $1 per year!

Btw, I was wondering whether it is worth creating a site on HTML,
using this group's archive, going back a couple of years.
No, it is not at all. Also, if you create sites using content from
other sites, you can be sued for being in breach of copyright laws in
many or most countries of the world, and Google will ban you from its
listings for having duplicated content.
 

David Segall

If it is done with PHP, what exactly is the code that would do it?

If it is done some other way, what is the exact code to do it?
As you have explained in a separate thread, you want to obtain the
information on your site for free by downloading the content of this
and other newsgroups. You even want to obtain free advice on how to edit
this content. You expect someone to provide, without charge, the
"exact code" to prevent others from downloading the valuable
information that you have obtained without paying for it. Of course,
all this should run on a server you don't pay for. I'm not surprised
that you are paranoid about someone stealing your site.

You have answered your own question. If you make the content public it
is not possible to prevent downloading it. It is relatively easy, and
quite common, to write a program that emulates a human downloading the
contents of a site. As you point out, if you charge for the
information, then your audience will look for the information
elsewhere and find it from the same sources that you used for your
site.

Live with it. Someone will "build on" your published work just as you
have "built on" other people's work. You can use copyright laws to
protect the exact content of your site and you can use your skills in
maintaining and updating your content to make sure that your site is
more attractive than theirs.
 
