Roedy Green
I used a thread pool to speed up the screen scraping I use to find out
which bookstores carry which books. Then I discovered that some bookstores
were sometimes returning 403 Forbidden codes. I think they do this if
you have more than one request outstanding from a given IP. I later
discovered that Xenu link checker was getting 403 codes for links that
BrokenLinks (which does one probe at a time) found to be 200
(OK).
So I think screen-scraping, link-checking and similar code needs some
mechanism to optionally avoid hitting a site with more than one request
at a time, or perhaps even to pause X seconds between requests.
It might do that with an explicit Semaphore, by ordering the requests to
increase the distance between probes to the same site, by reducing the
pool size... ??
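
Here is a minimal sketch of the Semaphore option in Java. It is only an illustration under assumptions, not tested against any real bookstore site: the class name HostThrottle, the call method and the minGapMillis parameter are all hypothetical names made up for this example. It keeps one fair Semaphore per host so that at most one request per site is outstanding, and it sleeps if the previous probe to that host finished less than minGapMillis ago.

```java
import java.net.URI;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical helper: one outstanding request per host, with a minimum gap between probes. */
final class HostThrottle {

    /** Per-host state. The two plain fields are only touched while holding the permit. */
    private static final class Gate {
        final Semaphore permit = new Semaphore(1, true); // single permit, FIFO ordering
        boolean used;              // false until the first probe to this host completes
        long lastFinishedNanos;    // System.nanoTime() when the last probe finished
    }

    private final ConcurrentHashMap<String, Gate> gates = new ConcurrentHashMap<>();
    private final long minGapMillis;

    /** minGapMillis is an assumed tuning knob; 0 means "no pause, just serialize per host". */
    HostThrottle(long minGapMillis) {
        this.minGapMillis = minGapMillis;
    }

    /** Runs request, blocking until no other request to the same host is in flight. */
    <T> T call(URI uri, Callable<T> request) throws Exception {
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        Gate gate = gates.computeIfAbsent(host, h -> new Gate());
        gate.permit.acquire();
        try {
            if (gate.used) {
                long sinceLastMillis = (System.nanoTime() - gate.lastFinishedNanos) / 1_000_000;
                long waitMillis = minGapMillis - sinceLastMillis;
                if (waitMillis > 0) {
                    Thread.sleep(waitMillis); // pause between probes to the same site
                }
            }
            return request.call();
        } finally {
            gate.used = true;
            gate.lastFinishedNanos = System.nanoTime();
            gate.permit.release();
        }
    }
}
```

Each pool task would wrap its fetch, e.g. throttle.call(uri, () -> fetch(uri)), where fetch stands for whatever your own code does to retrieve the page. The catch is that a waiting task ties up a pool thread, so ordering the work queue so that consecutive requests go to different sites would still help keep the pool busy.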
--
Roedy Green Canadian Mind Products
http://mindprod.com
For me, the appeal of computer programming is that
even though I am quite a klutz,
I can still produce something, in a sense
perfect, because the computer gives me as many
chances as I please to get it right.