Python: 404 error when trying to log in to a webpage using 'urllib' and 'HTTPCookieProcessor'


KMeans Algorithm

I'm trying to log in to a webpage using 'urllib' with this piece of code:

---------
import urllib2, urllib

opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
login = urllib.urlencode({'username': 'john', 'password': 'foo'})
url = "https://www.mysite.com/loginpage"
req = urllib2.Request(url, login)
try:
    resp = urllib2.urlopen(req)
    print resp.read()
except urllib2.HTTPError, e:
    print ":( Error = " + str(e.code)
 

Chris Angelico

> What am I doing wrong? Thank you very much.

I can't say what's actually wrong, but I have a few ideas for getting
more information out of the system...
> opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())

You don't do anything with this opener - could you have a cookie problem?
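For illustration, a minimal sketch of actually sending the request through the opener so cookies persist - shown here with Python 3's urllib.request, which is where urllib2's pieces ended up (the urllib2 calls have the same shape):

```python
import urllib.request
from http.cookiejar import CookieJar

# Keep an explicit jar so the cookies the server sets can be inspected
jar = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Either call opener.open(req) directly, or install the opener so that
# plain urlopen() routes through it too:
urllib.request.install_opener(opener)
```

Without one of those two steps, urlopen() goes through the default opener and the cookie processor never sees the traffic.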
> req = urllib2.Request(url, login)

> But I get a "404" error (Not Found). The page "https://www.mysite.com/loginpage" does exist (note the httpS, since I'm not sure whether that's the key to my problem).

> If I try with

Note that adding a data parameter changes the request from a GET to a
POST. I'd normally expect the server to respond 404 to both or
neither, but differing responses are theoretically possible.
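You can see the method flip just by constructing the request - Python 3 spelling shown (urllib2's Request has the same get_method() call; the URL below is a placeholder):

```python
import urllib.parse
import urllib.request

url = "https://www.example.com/loginpage"  # placeholder URL
data = urllib.parse.urlencode({'username': 'john', 'password': 'foo'}).encode()

# With a data payload the request becomes a POST...
post_req = urllib.request.Request(url, data)
print(post_req.get_method())  # POST

# ...and without one it stays a GET.
get_req = urllib.request.Request(url)
print(get_req.get_method())  # GET
```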

It's also possible that you're getting redirected, and that (maybe
because cookies aren't being retained??) the destination is 404. I'm
not familiar with urllib2, but if you get a response object back, you
can call .geturl() on it - no idea how that goes with HTTP errors,
though.

You may want to look at the exception's .reason attribute - might be
more informative than .code.
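In Python 3 terms (urllib2.HTTPError became urllib.error.HTTPError), .code, .reason and .geturl() all live on the exception object. A sketch that builds the error by hand purely to show the attributes - normally urlopen() raises it for you:

```python
import io
import urllib.error
from email.message import Message

# Hand-constructed 404; the URL and message are placeholders.
err = urllib.error.HTTPError(
    "https://www.example.com/loginpage",  # url
    404,                                  # code
    "Not Found",                          # msg, surfaced as .reason
    Message(),                            # response headers
    io.BytesIO(b""),                      # response body
)
print(err.code)      # 404
print(err.reason)    # Not Found
print(err.geturl())  # https://www.example.com/loginpage
```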

As a last resort, try firing up Wireshark or something and watch
exactly what gets sent and received. I went looking through the docs
for a "verbose" mode or a "debug" setting but can't find one - that'd
be ideal if it exists, though.

Hope that's of at least some help!

ChrisA
 

Chris Angelico


PS. If it's not an intranet site and the URL isn't secret, it'd help
if we could actually try things out. One of the tricks I like to use
is to access the same page with a different program/library - maybe
wget, or bare telnet, or something like that. Sometimes one succeeds
and another doesn't, and then you dig into the difference (once I
found that a web server failed unless the request headers were in a
particular order - that was a pain to (a) find, and (b) work around!).

ChrisA
 

xDog Walker

> As a last resort, try firing up Wireshark or something and watch
> exactly what gets sent and received. I went looking through the docs
> for a "verbose" mode or a "debug" setting but can't find one - that'd
> be ideal if it exists, though.

I think you can set a debug flag on httplib before using urllib to get the
header traffic printed. I don't recall exactly how to do it, though.
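If memory serves, the usual route was passing debuglevel to the HTTP handlers when building the opener; in Python 3 spelling that would be:

```python
import urllib.request

# Handlers built with debuglevel=1 print the request and response
# headers to stdout for every request this opener makes.
opener = urllib.request.build_opener(
    urllib.request.HTTPHandler(debuglevel=1),
    urllib.request.HTTPSHandler(debuglevel=1),
)
urllib.request.install_opener(opener)
```

After install_opener(), even plain urlopen() calls go through the debugging handlers.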
 
