Pagecrawling websites with Python


writeson

Hi all,

We've got an application we wrote in Python called pagecrawler that
generates a list of URLs based on SQL queries. It then runs through
this list, 'browsing' one of our staging servers for all of those
URLs. We do this to build the site dynamically, but each page
generated by a URL is saved as a static HTML file. Anyway, the
pagecrawler program uses Python threads to try to build the pages as
fast as it can. The list of URLs is stored in a queue, and the thread
objects pull URLs from the queue and fetch them until the queue is
empty. This works okay, but it still seems to take a long time to
build the site this way, even though the actual pages take only
milliseconds to generate (the pages are generated with PHP on a
separate server). Does anyone have any insight into whether this is a
reasonable approach to building web pages, or whether we should look
at another design?
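
Roughly, the structure looks like this. This is just a simplified
sketch of the queue-plus-threads pattern, not the actual pagecrawler
code; the worker count, URL list, and fetch handling below are
placeholders, and it only uses the standard library queue, threading,
and urllib modules.

import queue
import threading
import urllib.request

NUM_WORKERS = 8  # placeholder; the real worker count is tuned per run


def worker(url_queue):
    # Each thread pulls URLs from the shared queue until it is empty.
    while True:
        try:
            url = url_queue.get_nowait()
        except queue.Empty:
            return
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                resp.read()  # the real crawler saves the response as a static HTML file
        except OSError:
            pass  # a real crawler would log and possibly retry failed URLs
        finally:
            url_queue.task_done()


def crawl(urls):
    url_queue = queue.Queue()
    for url in urls:
        url_queue.put(url)
    threads = [threading.Thread(target=worker, args=(url_queue,))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    url_queue.join()  # block until every URL has been fetched


if __name__ == "__main__":
    # Placeholder URL list; pagecrawler builds its list from SQL queries.
    crawl(["http://staging.example.com/page/%d" % i for i in range(100)])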

Thanks in advance,
Doug
 

writeson

Swaroop,

Thanks for the reply. I'll take a look at HarvestMan and see whether
we can use it directly or get some ideas from its source code. :)

Doug
 
