Dimitrios Apostolou
Hello list,
I want to limit the download speed when using urllib2. In particular,
having several parallel downloads, I want to make sure that their total
speed doesn't exceed a maximum value.
I can't find a simple way to achieve this. After some research I can try
a few things, but I'm stuck on the details:
1) Can I overload some method in _socket.py to achieve this, and perhaps
make this generic enough to work even with other libraries than urllib2?
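A minimal sketch of (1), wrapping a socket-like object rather than patching
_socket.py itself (the class name and the sleep arithmetic are my own
assumptions, not an existing API):

```python
import time

class ThrottledReceiver(object):
    """Wrap any object with a recv() method and sleep inside recv()
    so the average receive rate stays under a bytes-per-second cap.
    (Hypothetical wrapper; not part of socket or urllib2.)"""

    def __init__(self, sock, max_bytes_per_sec):
        self._sock = sock
        self._max = float(max_bytes_per_sec)
        self._start = None
        self._received = 0

    def recv(self, bufsize):
        data = self._sock.recv(bufsize)
        if self._start is None:
            self._start = time.time()
        self._received += len(data)
        # Sleep until the wall clock catches up with the byte budget.
        expected = self._received / self._max
        elapsed = time.time() - self._start
        if expected > elapsed:
            time.sleep(expected - elapsed)
        return data

    def __getattr__(self, name):
        # Delegate everything else to the wrapped socket.
        return getattr(self._sock, name)
```

For a *total* cap across several parallel downloads, the byte counter and
start time would have to be shared (and lock-protected) between all the
wrapped sockets rather than kept per instance.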
2) There is the urllib.urlretrieve() function which accepts a reporthook
parameter. Perhaps I can have reporthook to increment a global counter and
sleep as necessary when a threshold is reached.
However, there is nothing similar in urllib2. Isn't urllib2 supposed
to be a superset of urllib in functionality? Why is there no reporthook
parameter in any of urllib2's functions?
Moreover, even the existing reporthook mechanism doesn't seem quite
right: reporthook(blocknum, bs, size) is always called with bs=8K, even
for the last block, and sometimes (blocknum*bs > size) is possible, if the
server sends a wrong Content-Length HTTP header.
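The counter-and-sleep idea in (2) could look roughly like this for
urllib.urlretrieve() (the helper name and the rate arithmetic are my own
assumptions):

```python
import time

def make_throttled_reporthook(max_bytes_per_sec):
    """Build a reporthook for urllib.urlretrieve() that sleeps
    whenever the average download rate exceeds the given cap."""
    state = {"start": None, "received": 0}

    def reporthook(blocknum, blocksize, totalsize):
        if state["start"] is None:
            state["start"] = time.time()
        # blocksize is nominal (8K even for the last block), so this
        # count can overshoot slightly, as noted above.
        state["received"] += blocksize
        expected = state["received"] / float(max_bytes_per_sec)
        elapsed = time.time() - state["start"]
        if expected > elapsed:
            time.sleep(expected - elapsed)

    return reporthook
```

e.g. urllib.urlretrieve(url, filename, make_throttled_reporthook(50 * 1024)).
For a total cap over parallel downloads, the state dict would have to be
shared between the hooks and protected by a lock.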
3) Perhaps I can use filehandle.read(1024) and manually read as many
chunks of data as I need. However, I think this would generally be
inefficient, and I'm not sure how it would interact with urllib2's
internal buffering.
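A sketch of (3): read fixed-size chunks from the response object and sleep
between reads (the function name and defaults are my own; a chunk size
larger than 1024 would reduce the per-call overhead):

```python
import time

def copy_rate_limited(fileobj, out, max_bytes_per_sec, chunk_size=8192):
    """Copy fileobj (e.g. the file-like object returned by
    urllib2.urlopen) to out, sleeping between reads so the
    average rate stays under max_bytes_per_sec."""
    start = time.time()
    received = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        out.write(chunk)
        received += len(chunk)
        # Sleep off any time we are ahead of the byte budget.
        expected = received / float(max_bytes_per_sec)
        elapsed = time.time() - start
        if expected > elapsed:
            time.sleep(expected - elapsed)
    return received
```

Since this throttles per response object, a shared counter (plus a lock)
would again be needed to cap the combined rate of parallel downloads.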
So how do you think I can achieve rate limiting in urllib2?
Thanks in advance,
Dimitris
P.S. And something simpler: How can I disallow urllib2 to follow
redirections to foreign hosts?
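For the P.S.: one route might be to subclass urllib2.HTTPRedirectHandler,
override redirect_request(), and refuse the redirect (return None or raise
HTTPError) when the new URL points at a different host. The host comparison
itself could be as simple as this (the function name is my own, and the
dual import is only so the snippet runs on both Python 2 and 3):

```python
try:
    from urlparse import urlparse      # Python 2
except ImportError:
    from urllib.parse import urlparse  # Python 3

def same_host(original_url, redirect_url):
    """True if both URLs point at the same host, so a redirect
    handler can reject redirections to foreign hosts."""
    return urlparse(original_url).hostname == urlparse(redirect_url).hostname
```

The handler subclass would call same_host(req.get_full_url(), newurl)
inside redirect_request() and only hand the new Request back when it
returns True.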