HTTPConnection - HEAD request

gervaz · Jun 16, 2011

Hi all, can someone tell me why the read() function in the following
py3 code returns b''?
b''

Thanks,

Mattia

Ian Kelly · Jun 17, 2011

Hi all, can someone tell me why the read() function in the following
py3 code returns b''?

b''

You mean why does it return an empty byte sequence? Because the HEAD
method only requests the response headers, not the body, so the body
is empty. If you want to see the response body, use GET.

Cheers,
Ian

gervaz · Jun 17, 2011

You mean why does it return an empty byte sequence? Because the HEAD
method only requests the response headers, not the body, so the body
is empty. If you want to see the response body, use GET.

Cheers,
Ian

The fact is that I have a list of urls and I wanted to retrieve the
minimum necessary information in order to understand if the link is a
valid html page or e.g. a picture or something else. As far as I
understood here http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
the HEAD command is the one that lets you do this. But it seems it
doesn't work.

Any help?

Mattia

Chris Angelico · Jun 17, 2011

The fact is that I have a list of urls and I wanted to retrieve the
minimum necessary information in order to understand if the link is a
valid html page or e.g. a picture or something else. As far as I
understood here http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
the HEAD command is the one that lets you do this. But it seems it
doesn't work.

It's not working because of a few issues.

Twitter doesn't accept requests that come without a Host: header, so
you'll need to provide that. Also, your "HTTP 1.0" is going as the
body of the request, which is quite unnecessary. What you were getting
was a 301 redirect, as you can confirm thus:
[('Date', 'Fri, 17 Jun 2011 08:31:31 GMT'), ('Server', 'Apache'),
('Location', 'http://twitter.com/'), ('Cache-Control', 'max-age=300'),
('Expires', 'Fri, 17 Jun 2011 08:36:31 GMT'), ('Vary',
'Accept-Encoding'), ('Connection', 'close'), ('Content-Type',
'text/html; charset=iso-8859-1')]

(Note the Location header - the server's asking you to go to
twitter.com by name.)

h.request("HEAD","/",None,{"Host":"twitter.com"})

Now we have a request that the server's prepared to answer:
200

The headers are numerous, so I won't quote them here, but you get a
Content-Length which tells you the size of the page that you would
get, plus a few others that may be of interest. But note that there's
still no body on a HEAD request:
b''
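For reference, a minimal sketch of the kind of HEAD request described above, using http.client. The helper name head() and its parameters are made up for illustration and are not the code from the original post. http.client normally fills in the Host header from the connection host on its own; it is passed explicitly here only to mirror the call shown above.

```python
import http.client


def head(host, path="/", port=80, timeout=10):
    """Send a HEAD request; return (status, headers, body).

    Hypothetical helper for illustration only.
    """
    # Host header value: include the port only when it is not the default.
    host_header = host if port == 80 else "{}:{}".format(host, port)
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path, headers={"Host": host_header})
        resp = conn.getresponse()
        body = resp.read()  # a HEAD response carries no body, so this is b''
        # dict() keeps only one value if a header name is repeated.
        return resp.status, dict(resp.getheaders()), body
    finally:
        conn.close()
```

Calling something like head("twitter.com") would return the status code, the headers (including Content-Length where the server sends it), and an empty body.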

If you want to check validity, the most important part is the code:
404

Twitter might be a bad example for this, though, as the above call
will succeed if there is a user of that name (for instance, replacing
"/aasdfadefa" with "/rosuav" changes the response to a 200). You also
have to contend with the possibility that the server won't allow HEAD
requests at all, in which case just fall back on GET.

But all this isn't certain, even so. There are some misconfigured
servers that actually send a 200 response when a page doesn't exist.
But you can probably ignore those sorts of hassles, and just code to
the standard.
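The HEAD-then-GET fallback mentioned above could be sketched roughly as follows. The function name probe() is hypothetical, and treating 405 (Method Not Allowed) and 501 (Not Implemented) as "HEAD is not supported" is an assumption; a given server may signal this differently.

```python
import http.client

# Assumption: these statuses are taken to mean "this server refuses HEAD".
HEAD_UNSUPPORTED = (405, 501)


def probe(host, path="/", port=80, timeout=10):
    """Try HEAD first, fall back on GET; return (method_used, status).

    Hypothetical helper for illustration only.
    """
    for method in ("HEAD", "GET"):
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request(method, path)
            status = conn.getresponse().status
        finally:
            # Closing the connection discards any body we did not read.
            conn.close()
        if method == "HEAD" and status in HEAD_UNSUPPORTED:
            continue  # retry the same path with GET
        return method, status
```

The status returned is still whatever the server claims, so a misconfigured server that answers 200 for a missing page will look valid here as well.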

Hope that helps!

Chris Angelico

Adam Tauno Williams · Jun 17, 2011

Hi all, can someone tell me why the read() function in the following
py3 code returns b''
b''

Because there is no body in a HEAD request. What is useful are the
Content-Type, Content-Length, and etag headers.
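As an illustration of using the Content-Type header to tell an HTML page from a picture, here is a small sketch. The function names and the three categories are assumptions made for the example, and a server can of course send a wrong or missing Content-Type.

```python
def media_type(content_type):
    """Strip parameters such as '; charset=...' and lowercase the rest."""
    return content_type.split(";", 1)[0].strip().lower()


def classify(content_type):
    """Roughly classify a Content-Type header value.

    Returns 'html', 'image' or 'other'. Hypothetical helper; the set of
    categories is chosen only for this example.
    """
    mt = media_type(content_type or "")
    if mt in ("text/html", "application/xhtml+xml"):
        return "html"
    if mt.startswith("image/"):
        return "image"
    return "other"
```

For the header quoted earlier in the thread, classify("text/html; charset=iso-8859-1") gives "html".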

Is r.getcode() == 200? That indicates a successful response; you
*always* must check the response code before interpreting the response.

Also I'm pretty sure that "HTTP 1.0" is wrong.

gervaz · Jun 17, 2011

Because there is no body in a HEAD request. What is useful are the
Content-Type, Content-Length, and etag headers.

Is r.getcode() == 200? That indicates a successful response; you
*always* must check the response code before interpreting the response.

Also I'm pretty sure that "HTTP 1.0" is wrong.

Ok, thanks for the replies, just another question in order to have a
similar behaviour using a different approach...
I decided to implement this solution:

class HeadRequest(urllib.request.Request):
    def get_method(self):
        return "HEAD"

Now I download the url using:

r = HeadRequest(url, None, self.headers)
c = urllib.request.urlopen(r)

but I don't know how to retrieve the request status (e.g. 200) as in
the previous examples with a different implementation...

Any suggestion?

Thanks,

Mattia

Elias Fotinis · Jun 19, 2011

I decided to implement this solution:

class HeadRequest(urllib.request.Request):
    def get_method(self):
        return "HEAD"

Now I download the url using:

r = HeadRequest(url, None, self.headers)
c = urllib.request.urlopen(r)

but I don't know how to retrieve the request status (e.g. 200) as in
the previous examples with a different implementation...

Use c.getcode() to get the response code. When you're testing interactively, you might find printing the headers with "print(c.headers)" quite handy.

Don't forget to close the response (c.close()) when your script exits its experimental state.
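Putting these suggestions together, one possible sketch with urllib.request follows. It uses the method="HEAD" argument of Request (available since Python 3.3) instead of the subclass; the function name head_status() is hypothetical. Note that urlopen() raises urllib.error.HTTPError for 4xx and 5xx responses, so in that case the code has to be read from the exception.

```python
import urllib.error
import urllib.request


def head_status(url, headers=None, timeout=10):
    """Issue a HEAD request; return (status_code, headers_dict).

    Hypothetical helper for illustration only.
    """
    req = urllib.request.Request(url, headers=headers or {}, method="HEAD")
    try:
        # The with-block closes the response, so no explicit c.close().
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.getcode(), dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # Error statuses such as 404 arrive as an exception, not a response.
        result = err.code, dict(err.headers.items())
        err.close()
        return result
```

A call such as head_status(url) then gives the code to check against 200 before looking at the headers.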
