Python - why doesn't this script work?

Ohmster

I am trying to use this cool script that some MIT guy wrote and it just
does not work, I get a stream of errors when I try to run it. It is
supposed to visit a URL and snag all of the pictures on the site. Here is
the script:
http://web.mit.edu/pgbovine/www/image-harvester/image-harvester.py

Here is my output when I try to run it on my Fedora 6 machine:

[ohmster@ohmster bench]$ image-harvester.py
http://public.fotki.com/DaGennelman/
/home/ohmster/scripts/image-harvester.py: line 59: from: command not found
[ohmster@ohmster bench]$

The script is to be followed up with another one to weed out the small
thumbnails and banner images, here is the base URL:
http://web.mit.edu/pgbovine/www/image-harvester/

Line 59 in image-harvester.py reads as follows:

59: from sgmllib import SGMLParser
60: import urllib
70: from urlparse import urlparse, urljoin
71: import re
72: import os


Can anyone tell me what is wrong with this script and why it will not run?
It does not like the command "from", is there such a command in python?
Does this mean that python has the "import" command but not the "from"
command or do we not know this yet as it hangs right away when it hits the
very first word of the script, "from"? Maybe this is not a Linux script or
something? I wonder why it needs the x-server anyway, I tried running it
from an ssh term window and it had a fit about no x-server so now I am
doing this in a gnome term window. This looked so cool too. :(

Please be patient with me, I do not know python at all, I just want for
this script to work and if I see enough working examples of python, I may
just take up study on it, but for right now, I do not know the language.
Total newbie.

Thanks.
 
John McMonagle

Your linux shell thinks it is running a shell script (from is not a
valid command in bash).

To execute this script with the python interpreter type (from a shell
prompt):

python image-harvester.py http://some.url.whatever/images_page

Read the comments at the beginning of the script and you will discover
all sorts of important usage information.
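
For anyone landing here later, a note on why the error points at line 59, plus a small check. Presumably the first 58 lines of the script are `#` comments, which bash also treats as comments, so `from` on line 59 is the first thing the shell chokes on. The X server complaint most likely comes from line 60: in a shell, `import` is typically ImageMagick's screen-capture tool, which needs an X display. Below is a minimal sketch (Python 3; the helper names `first_line` and `has_python_shebang` are made up for illustration and are not part of image-harvester.py) that tells you whether a file starts with a Python shebang line:

```python
def first_line(path):
    """Return the first line of a file, without the trailing newline."""
    with open(path, "r", errors="replace") as handle:
        return handle.readline().rstrip("\n")


def has_python_shebang(path):
    """True if the file starts with a '#!' line that mentions python."""
    line = first_line(path)
    return line.startswith("#!") and "python" in line
```

If `has_python_shebang("image-harvester.py")` returns False, running the file directly will hand it to the shell, and you need to start it with `python` in front as shown above.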

Regards,

John
 
Adam Atlas

I think you're executing it as a shell script. Run
"python image-harvester.py", or add "#!/usr/bin/env python" to the top of
the file.
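
The shebang route also needs the file to be executable. Here is a rough sketch of both steps in Python 3, assuming you are fine with the script being rewritten in place; `make_directly_runnable` and `SHEBANG` are hypothetical names picked for this example. Note that adding a line at the top shifts every line number in the script by one.

```python
import os
import stat

# The line suggested above; it must be the very first line of the file.
SHEBANG = "#!/usr/bin/env python\n"


def make_directly_runnable(path):
    """Prepend a shebang if one is missing and set the executable bits."""
    with open(path, "r") as handle:
        source = handle.read()
    if not source.startswith("#!"):
        with open(path, "w") as handle:
            handle.write(SHEBANG + source)
    # Equivalent in spirit to running `chmod +x` on the file.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

After something like `make_directly_runnable("image-harvester.py")`, typing the script name at the prompt should start Python instead of bash. Doing the same by hand with a text editor and `chmod +x` works just as well.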
 
Ohmster

Here is my output when I try to run it on my Fedora 6 machine:

[ohmster@ohmster bench]$ image-harvester.py
http://public.fotki.com/DaGennelman/
/home/ohmster/scripts/image-harvester.py: line 59: from: command not found
[ohmster@ohmster bench]$

The original page for this script is here:
http://web.mit.edu/pgbovine/www/image-harvester.htm

I figured it out, I have to run python I think first then the script and
the URL like this:
$ python image-harvester.py http://public.fotki.com/DaGennelman/

Now that actually seems to be doing something and it sure is busy now. It
is making a lot of little subdirectories in my test directory. I had to
copy image-harvester.py to the test directory first, then run python and
image-harvester.py w/URL and it is going to town. Tons of subfolders, so
far not images yet but it is not done. At least it is doing something now
and not bitching and hanging. I guess I had to call up python and pass it
to the script as the script does not seem to pull up python on its own. So
far I have 60 directories and about 45 robots.txt but no jpg files yet. I
will let you know what happens.

Feel free to jump right in with your input on how this should or won't work
and what can be done to make it better. I have all of my scripts in a
$HOME/scripts/ directory and it is in my path but running this from another
directory does not work if image-harvester.py is not in the harvest
directory where I run the script from. I can right click on the image and
save it but the amazing script trips all over itself with these wacky file
names. I am all ears if someone figures it out.
 
J.O. Aho

Ohmster said:
Here is my output when I try to run it on my Fedora 6 machine:
[ohmster@ohmster bench]$ image-harvester.py
http://public.fotki.com/DaGennelman/
/home/ohmster/scripts/image-harvester.py: line 59: from: command not found

Check line 59 in the python script and you see which command you are missing.
I bet you didn't read what it said on the page
http://web.mit.edu/pgbovine/www/image-harvester.htm

Install the programs that are mentioned on the page before you use the python script.
 
Ohmster

[snip]
Did you bother reading the comments? If you had, you'd
know that's not how you run it.
When run as directed (and common sense dictates),
it works fine.
[snip]

I figured it out, I have to run python I think first then the script and
the URL like this:

$ python image-harvester.py http://public.fotki.com/DaGennelman/

Now that actually seems to be doing something and it sure is busy now. It
is making a lot of little subdirectories in my test directory. I had to
copy image-harvester.py to the test directory first, then run python and
image-harvester.py w/URL and it is going to town. Tons of subfolders, so
far not images yet but it is not done. At least it is doing something now
and not bitching and hanging. I guess I had to call up python and pass it
to the script as the script does not seem to pull up python on its own. So
far I have 60 directories and about 45 robots.txt but no jpg files yet. I
will let you know what happens. I think that these images are protected by
script, you never get a valid URL to the image file, just referrers, and
numbers and what not. When the image is finally displayed in your browser,
then you can save it but not until then. Pretty good way to stop a
harvester. Is this assumption pretty much correct or is there a way to make
this work? Now that I use python as the first command, I can run it in an
ssh window now and do not require an x-server.
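
One way to test the "protected by script" guess is to look at what image URLs are actually present in the HTML the server sends back. The sketch below is only an illustration and not how image-harvester.py works internally: the script imports the Python 2 modules sgmllib and urlparse, while this uses the Python 3 standard library (`html.parser`, `urllib.parse`). `ImageCollector` and `collect_image_urls` are names invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the src of every <img> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:
                # urljoin turns relative paths into absolute URLs.
                self.image_urls.append(urljoin(self.base_url, value))


def collect_image_urls(html_text, base_url):
    """Return absolute image URLs found in a chunk of HTML text."""
    collector = ImageCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.image_urls
```

If the list comes back empty or only holds thumbnails for a page that shows full-size pictures in a browser, the images are probably inserted by JavaScript after the page loads, and a plain HTML parser will never see them.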

Feel free to jump right in with your input on how this should or won't work
and what can be done to make it better. I have all of my scripts in a
$HOME/scripts/ directory and it is in my path but running this from another
directory does not work if image-harvester.py is not in the harvest
directory where I run the script from. I can right click on the image and
save it but the amazing script trips all over itself with these wacky file
names. I am all ears if someone figures it out.
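
On the $HOME/scripts question: the shell searches PATH only for the command name itself, so with `python image-harvester.py` the shell looks up `python` on PATH, and Python then opens `image-harvester.py` relative to the current directory. That is why the copy in the scripts directory is not found. A small sketch of looking the script up yourself (Python 3; `find_on_path` is a made-up helper, and `path_value` is an optional override added only to make it easy to try out):

```python
import os


def find_on_path(script_name, path_value=None):
    """Return the full path of the first PATH entry holding script_name, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, script_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

The simpler fix is to pass the full path, for example `python $HOME/scripts/image-harvester.py <url>`, from whatever directory the downloads should land in.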
 
Ohmster

Your linux shell thinks it is running a shell script (from is not a
valid command in bash).

To execute this script with the python interpreter type (from a shell
prompt):

python image-harvester.py http://some.url.whatever/images_page

Read the comments at the beginning of the script and you will discover
all sorts of important usage information.

Regards,

John

Thanks John.
 
Ohmster

Check line 59 in the python script and you see which command you are
missing. I bet you didn't read what it said on the page
http://web.mit.edu/pgbovine/www/image-harvester.htm

Install the programs that are mentioned on the page before you use the
python script.

I figured it out, see my other reply. I have to run this command beginning
with "python". I still don't get the results I want but I think it is
because the images are protected with script. My other post in this thread
gives the details. If you have more ideas, I am all ears.

Thanks AHO.
 
Ohmster

I think you're executing it as a shell script. Run
"python image-harvester.py", or add "#!/usr/bin/env python" to the top of
the file.

Hey that is a cool idea, I think I will try it. I found out what is wrong
and did not get the results I want, I think the images are protected with
script. See my other post in this thread for details.

 
cokofreedom

Do note that the reason you may not see images is that the website
has, correctly, identified your program as an automated bot and
blocked its access to things...
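
The pile of robots.txt files mentioned earlier fits with this: robots.txt is where a site tells crawlers what they may fetch. As a hedged illustration (Python 3 standard library; the sample rules and the names `build_rules` and `may_fetch` are invented for this example and say nothing about what the real site publishes), here is roughly how a crawler can check a URL against such rules:

```python
from urllib.robotparser import RobotFileParser


def build_rules(robots_txt):
    """Parse the text of a robots.txt file into a rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(rules, user_agent, url):
    """True if the rules allow this user agent to fetch the URL."""
    return rules.can_fetch(user_agent, url)
```

With made-up rules such as `"User-agent: *\nDisallow: /images/\n"`, a URL under /images/ is refused while other pages are allowed. A site can also block bots in other ways, such as checking the User-Agent or Referer headers, which this check will not reveal.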
 
Ohmster

cokofreedom said:
Do note that the reason you may not see images is that the website
has, correctly, identified your program as an automated bot and
blocked its access to things...

Probably so, I did not get anything, even with all of that flurry of
activity, the results were 0 images. :(
 
