HTTPConnection - HEAD request

gervaz

Hi all, can someone tell me why the read() function in the following
py3 code returns b''?
b''

Thanks,

Mattia
 
Ian Kelly

You mean why does it return an empty byte sequence? Because the HEAD
method only requests the response headers, not the body, so the body
is empty. If you want to see the response body, use GET.
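To see the difference side by side, here is a minimal sketch. It assumes a throwaway local server instead of the site from the original question; DemoHandler, fetch and run_demo are names made up for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html><body>hello</body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: same headers for HEAD and GET, body only for GET."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()  # headers only, no body

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(method, port):
    """Send one request to the local demo server; return (status, headers, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request(method, "/")
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders()), resp.read()
    finally:
        conn.close()


def run_demo():
    """Start the server on a free port and issue one HEAD and one GET request."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch("HEAD", port), fetch("GET", port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for status, headers, body in run_demo():
        print(status, headers.get("Content-Length"), body)
```

Both responses carry the same status and Content-Length header, but read() only returns data for the GET.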

Cheers,
Ian
 
gervaz

The fact is that I have a list of URLs, and I wanted to retrieve the
minimum information necessary to tell whether each link is a valid
HTML page or, e.g., a picture or something else. As far as I
understood from http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html,
the HEAD method is the one that lets you do this. But it seems it
doesn't work.

Any help?

Mattia
 
Chris Angelico

It's not working because of a few issues.

Twitter doesn't accept requests that come without a Host: header, so
you'll need to provide that. Also, your "HTTP 1.0" is going as the
body of the request, which is quite unnecessary. What you were getting
was a 301 redirect, as you can confirm thus:
[('Date', 'Fri, 17 Jun 2011 08:31:31 GMT'), ('Server', 'Apache'),
('Location', 'http://twitter.com/'), ('Cache-Control', 'max-age=300'),
('Expires', 'Fri, 17 Jun 2011 08:36:31 GMT'), ('Vary',
'Accept-Encoding'), ('Connection', 'close'), ('Content-Type',
'text/html; charset=iso-8859-1')]

(Note the Location header - the server's asking you to go to
twitter.com by name.)

h.request("HEAD", "/", None, {"Host": "twitter.com"})

Now we have a request that the server's prepared to answer:
200

The headers are numerous, so I won't quote them here, but you get a
Content-Length which tells you the size of the page that you would
get, plus a few others that may be of interest. But note that there's
still no body on a HEAD request:
b''

If you want to check validity, the most important part is the code:
404

Twitter might be a bad example for this, though, as the above call
will succeed if there is a user of that name (for instance, replacing
"/aasdfadefa" with "/rosuav" changes the response to a 200). You also
have to contend with the possibility that the server won't allow HEAD
requests at all, in which case just fall back on GET.

But all this isn't certain, even so. There are some misconfigured
servers that actually send a 200 response when a page doesn't exist.
But you can probably ignore those sorts of hassles, and just code to
the standard.
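Putting those pieces together, a rough sketch of a link checker might look like this. The function names probe and looks_like_html are made up for illustration, and treating 405 or 501 as "HEAD not allowed" is an assumption about how such servers typically answer. Note that http.client adds a Host header on its own, built from the host name you pass in.

```python
import http.client


def probe(host, path="/", port=80, timeout=10):
    """Return (status, content_type) for a resource, trying HEAD first.

    Falls back to GET if the server rejects HEAD with 405 or 501
    (an assumption about how "HEAD not allowed" is usually signalled).
    """
    for method in ("HEAD", "GET"):
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            # http.client sends a Host header derived from `host` automatically
            conn.request(method, path)
            resp = conn.getresponse()
            status = resp.status
            content_type = resp.getheader("Content-Type", "")
        finally:
            conn.close()
        if method == "HEAD" and status in (405, 501):
            continue  # server refuses HEAD, retry with GET
        return status, content_type


def looks_like_html(status, content_type):
    """True for a 200 response whose media type is text/html."""
    media_type = content_type.split(";")[0].strip().lower()
    return status == 200 and media_type == "text/html"
```

Something like looks_like_html(*probe("example.com", "/some/page")) would then give a yes/no answer per link; redirects (301/302) are not followed here and would need their own handling via the Location header.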

Hope that helps!

Chris Angelico
 
Adam Tauno Williams

Because there is no body in the response to a HEAD request. What is
useful are the Content-Type, Content-Length, and ETag headers.

Is r.getcode() == 200? That indicates a successful response; you
*always* must check the response code before interpreting the response.

Also I'm pretty sure that "HTTP 1.0" is wrong.
 
gervaz

Ok, thanks for the replies, just another question in order to have a
similar behaviour using a different approach...
I decided to implement this solution:

class HeadRequest(urllib.request.Request):
    def get_method(self):
        return "HEAD"

Now I download the url using:

r = HeadRequest(url, None, self.headers)
c = urllib.request.urlopen(r)

but I don't know how to retrieve the response status (e.g. 200) as in
the previous examples with a different implementation...

Any suggestion?

Thanks,

Mattia
 
Elias Fotinis

Use c.getcode() to get the response code. When you're testing interactively, you might find printing the headers with print(c.headers) quite handy.

Don't forget to close the response (c.close()) when your script exits its experimental state.
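For what it's worth, here is a minimal sketch of the same idea without the subclass. It assumes Python 3.3 or later, where Request accepts a method argument; head_status is a made-up name. The with block closes the response for you, and since urlopen raises HTTPError for 4xx/5xx answers, the sketch reads the code from the exception in that case.

```python
import urllib.error
import urllib.request


def head_status(url, headers=None, timeout=10):
    """Issue a HEAD request via urllib and return (status, content_type)."""
    req = urllib.request.Request(url, headers=headers or {}, method="HEAD")
    try:
        # the context manager closes the response when the block exits
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised; the exception still carries code and headers
        return err.code, err.headers.get("Content-Type", "")
```

Keep in mind that urlopen follows redirects on its own, so the status you get back is the one from the final response, not the 301/302 along the way.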
 
