Is it possible to get image size before/without downloading?

aldonnelley

Hi there: a bit of a left-field question, I think.
I'm writing a program that analyses image files downloaded with a basic
crawler, and it's slow, mainly because I only want to analyse files
within a certain size range, and I'm having to download all the files
on the page, open them, get their size, and then only analyse the ones
that are in that size range.
Is there a way (in python, of course!) to get the size of images before
or without downloading them? I've checked around, and I can't seem to
find anything promising...

Anybody got any clues?

Cheers, Al.
 
Josiah Manson

In the headers of an HTTP response, most servers will specify a
Content-Length, which is the number of bytes in the body of the response.
Normally, when using the GET method, the headers are returned with the
body following. You can instead make a HEAD request, for which the server
returns only the headers, and these will hopefully tell you the file
size.

If you want to know the actual dimensions of the image, I don't know of
anything in HTTP that will tell you. You will probably just have to
download the image to find that out. Relevant HTTP specs below if you
care.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

The above is true regardless of language. In Python there appears to be an
httplib module (renamed http.client in Python 3). I would call its request
method with the method set to HEAD.

http://docs.python.org/lib/httpconnection-objects.html
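
For reference, the HEAD-request idea above might look roughly like this in present-day Python. This is only a sketch: it uses urllib.request from the standard library rather than httplib, and the names remote_size and in_size_range are made up for illustration. Servers are not obliged to send Content-Length, so the function returns None when the header is missing.

```python
import urllib.request


def remote_size(url, timeout=10):
    """Return the Content-Length a server reports for url, or None if absent."""
    # A HEAD request asks for the headers only; no body is transferred.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def in_size_range(size, smallest, largest):
    """True if size is known and lies within the inclusive byte range."""
    return size is not None and smallest <= size <= largest
```

A crawler could call remote_size on each image URL and only download those for which in_size_range(size, smallest, largest) is true. Whether to skip or download files of unknown size is a judgment call; the sketch skips them.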
 
aldonnelley

Thanks Josiah

I thought as much... Still, it'll help me immensely to cut the
downloads from a page to only those that are within a file-size range,
even if this gets me some images that are out-of-spec dimensionally.

Cheers, Al.

(Oh, and if anyone still has a bright idea about how to get image
dimensions without downloading, it'd be great to hear!)
 
Peter Otten

aldonnelley said:
Is there a way (in python, of course!) to get the size of images before
or without downloading them? I've checked around, and I can't seem to
find anything promising...

PIL can determine the size of an image from some "large enough" chunk at
the beginning of the image, e.g.:

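from io import BytesIO
from urllib.request import urlopen

from PIL import Image  # Pillow, the maintained fork of PIL

# Read only the first 512 bytes; PIL parses the size from the file header.
with urlopen("http://www.python.org/images/success/nasa.jpg") as f:
    s = BytesIO(f.read(512))
print(Image.open(s).size)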

Peter
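
A possible refinement of the partial read, offered as a sketch rather than anything tested against real servers: HTTP also has a Range request header, so the client can ask for only the leading bytes. The helper name fetch_prefix and the 512-byte default are assumptions for illustration. Servers are free to ignore Range and send the whole file, which is why the read is capped as well.

```python
import urllib.request


def fetch_prefix(url, nbytes=512, timeout=10):
    """Return at most the first nbytes of the resource at url."""
    # Ask the server for just the leading bytes. Servers may ignore Range
    # and send the whole file, so the read below is capped as well.
    request = urllib.request.Request(
        url, headers={"Range": "bytes=0-%d" % (nbytes - 1)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read(nbytes)
```

The returned bytes can be wrapped in a file-like object and handed to Image.open in the same way as the chunk in the example above. If 512 bytes turns out to be too little for a given image, retrying with a larger nbytes is the obvious fallback.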
 
Marc 'BlackJack' Rintsch

aldonnelley said:
(Oh, and if anyone still has a bright idea about how to get image
dimensions without downloading, it'd be great to hear!)

Most image formats have some sort of header containing the dimensions,
so it's enough to download this header. How much of the file has to be
read, and how the information is encoded, depends on the image format.

Ciao,
Marc 'BlackJack' Rintsch
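
To make the header idea concrete, here is a minimal sketch that pulls the dimensions out of the first bytes of a PNG or GIF file using only the struct module. The function name image_dimensions is made up for illustration, and only these two formats are handled. JPEG is harder, because its dimensions sit in a marker segment that can follow variable-length metadata, so a library such as PIL is the more practical choice there.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_dimensions(header):
    """Return (width, height) parsed from leading bytes, or None if unknown."""
    # PNG: 8-byte signature, then the IHDR chunk (4-byte length, b"IHDR"),
    # then width and height as big-endian 32-bit integers.
    if (header[:8] == PNG_SIGNATURE and header[12:16] == b"IHDR"
            and len(header) >= 24):
        return struct.unpack(">II", header[16:24])
    # GIF: 6-byte signature, then width and height as little-endian
    # 16-bit integers.
    if header[:6] in (b"GIF87a", b"GIF89a") and len(header) >= 10:
        return struct.unpack("<HH", header[6:10])
    return None
```

For these two formats the first 24 bytes of the file are enough, so only a tiny part of each image needs to be transferred before deciding whether to fetch the rest.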
 
