obtaining permission



Larry Gates

I made a recent original post and received the following response from a
clp.misc regular:

Larry said:
use strict;
use warnings;
use LWP::Simple;

# load the complete content of the url in question
# via LWP::Simple::get(...)

my $t = get 'http://www.fourmilab.ch/cgi-bin/Yoursky?z=1&lat=35.0836&ns=North&lon=106.651&ew=West';

print "t is $t";

# perl scraper2.pl

C:\MinGW\source>perl scraper2.pl
Use of uninitialized value in concatenation (.) or string at scraper2.pl line 14.
t is
C:\MinGW\source>

I would have expected $t to have the whole page. What gives?

I don't know. It works for me.

However, the site disallows via robots.txt automatic access to their
cgi-bin, so whatever you are attempting to do, you'd better stop doing it.

#end excerpt
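For what it's worth, one way to see *why* get() came back undef is to drop down to LWP::UserAgent and inspect the status line — LWP::Simple just swallows the failure. A sketch, using the same URL as the excerpt above (kept on one line so the query string isn't broken by an accidental wrap):

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $url = 'http://www.fourmilab.ch/cgi-bin/Yoursky?z=1&lat=35.0836&ns=North&lon=106.651&ew=West';

my $ua  = LWP::UserAgent->new( timeout => 30 );
my $res = $ua->get($url);

if ( $res->is_success ) {
    print $res->decoded_content;
}
else {
    # LWP::Simple::get() only returns undef on failure; here we
    # can see the actual reason, e.g. "403 Forbidden" or a timeout.
    die "GET failed: ", $res->status_line, "\n";
}
```

If the server is refusing the default libwww-perl agent string, that would show up here as a 403 rather than a silent undef.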

I thought I would be clever and avoid such issues by choosing the most
small-c catholic thing I stumbled upon, and instead ambled into Tad's
response:

However, what you appear to be doing violates Google's ToS:

http://groups.google.com/intl/en/googlegroups/terms_of_service3.html

you agree that when using the Service, you will not:
...
use any robot, spider, site search/retrieval application, or other
device to retrieve or index any portion of the Service or collect
information about users for any unauthorized purpose;

# end second excerpt

First of all, I don't think of extensions of my keyboard as a robot.
Robots don't consist of 2 sprained hands and no spares.

Secondly, I send these sites many fewer keystrokes with perl than I do with
my browser.

Thirdly, I've got all the time in the world to obtain explicit legal
permission to do what I want with either of these entities.

How do I do this?
--
larry gates

Any false value is gonna be fairly boring in Perl, mathematicians
notwithstanding.
-- Larry Wall in <[email protected]>
 

Tim Greer

Larry said:
> I made a recent original post and received the following response from
> a clp.misc regular:
>
> [snip]
>
> Thirdly, I've got all the time in the world to obtain explicit legal
> permission to do what I want with either of these entities.
>
> How do I do this?

If you automate something to obtain data from a remote site
infrequently, you are likely fine, if it's just a replacement for using
your browser and saving a file. If you plan to do it frequently, or
have a site that triggers the connection/download each time someone
visits a script/area of your site, then it could be (or will be) more
frequent, even creating a load on the remote site. So, ask that site's
admin or webmaster. I don't understand the question about how you get
permission, other than to suggest you contact the appropriate person at
said site/company and ask. Another issue is whether they block any
uncommon user agents in an effort to prevent automated scripts. There
could be a number of reasons beyond that why they forbid it on their
site.
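The "don't re-fetch on every visit" advice can be handled with a small local cache: only hit the remote site when the saved copy is older than some threshold. A sketch — the URL, cache file name, and six-hour cutoff are all arbitrary placeholders:

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

my $url     = 'http://www.example.com/page';   # placeholder URL
my $cache   = 'page.cache';                    # local copy
my $max_age = 6 * 60 * 60;                     # re-fetch after 6 hours

my $page;
if ( -e $cache && ( time - ( stat $cache )[9] ) < $max_age ) {
    # Cache is fresh enough: serve the local copy and leave
    # the remote site alone.
    open my $fh, '<', $cache or die "read $cache: $!";
    local $/;                                  # slurp mode
    $page = <$fh>;
}
else {
    $page = get($url);
    die "fetch failed\n" unless defined $page;
    open my $fh, '>', $cache or die "write $cache: $!";
    print {$fh} $page;
}
```

This way even a page that is viewed constantly on your end costs the remote site at most a few requests a day.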
 

Larry Gates

Tim said:
> If you automate something to obtain data from a remote site
> infrequently, you are likely fine, if it's just a replacement for
> using your browser and saving a file.
>
> [snip]

I think the prohibition has much more to do with sending the page a lot of
information and gumming up the works.
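If gumming up the works is the real concern, LWP::RobotUA is the polite route: it fetches and obeys robots.txt on its own and enforces a delay between requests to the same host. A sketch — the agent name and contact address are made up:

```perl
use strict;
use warnings;
use LWP::RobotUA;

# LWP::RobotUA checks robots.txt before each request and
# throttles itself between hits to the same host.
my $ua = LWP::RobotUA->new(
    agent => 'my-sky-fetcher/0.1',       # hypothetical agent name
    from  => 'me@example.com',           # contact address, made up here
);
$ua->delay(1);    # minutes to wait between requests to one host

my $res = $ua->get('http://www.fourmilab.ch/cgi-bin/Yoursky?z=1&lat=35.0836&ns=North&lon=106.651&ew=West');
print $res->is_success ? $res->decoded_content : $res->status_line, "\n";
```

Since fourmilab's robots.txt disallows its cgi-bin, this would come back with a "Forbidden by robots.txt" status rather than fetching the page — which is the point: it refuses automatically instead of leaving the judgment call to you.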

I found a link to the webmaster of the fourmilab site and said as much.
We'll see what he says.
--
larry gates

There are still some other things to do, so don't think if I didn't fix
your favorite bug that your bug report is in the bit bucket. (It may be,
but don't think it. :)
-- Larry Wall in <[email protected]>
 
