fetching webpage

yookyung · Dec 30, 2005

I am trying to crawl webpages in the CiteSeer domain (a collection of research
papers, mostly in computer science).

I have used the following code snippet.

#####
import urllib

sock = urllib.urlopen("http://citeseer.ist.psu.edu")
webcontent = sock.read().split('\n')
sock.close()
print webcontent
########

Then I get the following output, which is an error page from the server
rather than the page content.

['', '', ' ', '', ' The server encountered an
internal error and was ', ' unable to complete your request.', '', '
', '', ' Error message:', '
<br />', '', '
', '', ' The server encountered an internal error and was ',
' unable to complete your request. Either the server is', ' overloaded
or there was an error in a CGI script.', '', ' ', '',
'', '']

However, the URL is valid, and it works fine if I open it in my web
browser.
If I use a different URL (http://www.google.com instead of
http://citeseer.ist.psu.edu), the same code works.

What is wrong?
Could it be that the CiteSeer web server inspects the HTTP request, sees
something that it doesn't like, and rejects the request?
What should I do?

Thank you.

Best regards,
Yookyung
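
One way to test the hypothesis above (that the server inspects the request) is to send an explicit User-Agent header, since urllib identifies itself as "Python-urllib/x.y" by default. The following is a minimal sketch in Python 3 (urllib.request), not the Python 2 urllib used in the snippet above. The USER_AGENT string and the helper names build_request and fetch_lines are made up for illustration, and it is only an assumption that the User-Agent is what the CiteSeer server objects to.

```python
import urllib.request

# Hypothetical identifier for illustration; substitute any descriptive string.
USER_AGENT = "Mozilla/5.0 (compatible; example-crawler/0.1)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_lines(url):
    """Fetch url with the custom header and return the body as a list of lines."""
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        # Decode leniently, because the page encoding is not known in advance.
        text = response.read().decode("utf-8", errors="replace")
    return text.split("\n")
```

Calling fetch_lines("http://citeseer.ist.psu.edu") would then perform the same fetch as the original snippet, but with the custom header attached. If the result is unchanged, the User-Agent is probably not the cause.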

charlespina · Dec 30, 2005

I went to the URL you posted, and it looks like that error is the
content you should be receiving. Try refreshing your browser cache; you
could be loading a cached page.

Charles
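
To tell whether the server is really answering with an error page, it can help to look at the HTTP status code instead of only printing the body. Below is a rough sketch in Python 3, where urllib.request.urlopen raises urllib.error.HTTPError for responses such as 500. The function name fetch_with_status and its opener parameter are hypothetical, added here only for illustration.

```python
import urllib.error
import urllib.request


def fetch_with_status(url, opener=urllib.request.urlopen):
    """Return (status_code, body_text) for url.

    HTTP error responses are returned as data instead of being raised,
    so the caller can see both the status code and the error page.
    """
    try:
        with opener(url, timeout=30) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so the error page is readable.
        body = err.read().decode("utf-8", errors="replace")
        return err.code, body
```

A status of 500 together with the "internal error" text would suggest that the server itself is failing on the request, and that the browser is showing an older cached copy.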
