Pagecrawling websites with Python


writeson

Hi all,

We've got an application we wrote in Python called pagecrawler that
generates a list of URLs based on SQL queries. It then runs through
this list, 'browsing' one of our staging servers for all of those
URLs. We do this to build the site dynamically, but each page
generated by a URL is saved as a static HTML file. Anyway, the
pagecrawler program uses Python threads to try to build the pages as
fast as it can. The list of URLs is stored in a queue, and the thread
objects pull URLs from the queue and fetch them until the queue is
empty. This works okay, but it still seems to take a long time to
build the site this way, even though the actual pages take only
milliseconds to generate (the pages are generated with PHP on a
separate server). Does anyone have any insight into whether this is a
reasonable approach to building web pages, or whether we should look
at another design?
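
Roughly, the structure looks like this. This is just a simplified
sketch of the queue-plus-threads pattern, not the actual pagecrawler
code; the worker count, URL list, and fetch handling below are
placeholders, and it only uses the standard library queue, threading,
and urllib modules.

import queue
import threading
import urllib.request

NUM_WORKERS = 8  # placeholder; the real worker count is tuned per run


def worker(url_queue):
    # Each thread pulls URLs from the shared queue until it is empty.
    while True:
        try:
            url = url_queue.get_nowait()
        except queue.Empty:
            return
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                resp.read()  # the real crawler saves the response as a static HTML file
        except OSError:
            pass  # a real crawler would log and possibly retry failed URLs
        finally:
            url_queue.task_done()


def crawl(urls):
    url_queue = queue.Queue()
    for url in urls:
        url_queue.put(url)
    threads = [threading.Thread(target=worker, args=(url_queue,))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    url_queue.join()  # block until every URL has been fetched


if __name__ == "__main__":
    # Placeholder URL list; pagecrawler builds its list from SQL queries.
    crawl(["http://staging.example.com/page/%d" % i for i in range(100)])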

Thanks in advance,
Doug
 

writeson

Swaroop,

Thanks for the reply. I'll take a look at HarvestMan and see whether
we can use it directly or get some ideas from its source code. :)

Doug
 
