Indexing web pages - in Python?

Dan Stromberg

Are there any open source search engines written in Python for indexing a
given collection of (internal only) HTML pages? Right now I'm talking
about dozens, but hopefully it'll be hundreds or thousands at some point.

I'm thinking some sort of CGI script, with perhaps a cron job that updates
the indexes.

I'm not particularly looking for something that has a full RDBMS behind
it - just a file that stores indexes. I'll go with an RDBMS-based
solution if I must, but I don't think that's really needed at this point.

TIA
 
Kevin T. Ryan

You could try:

http://gnosis.cx/download/indexer.py

There is an extensive write-up by the author at:

http://gnosis.cx/publish/programming/charming_python_15.txt

Might be something you'd be interested in ...
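
If you end up rolling your own instead, here is a rough sketch of the "just a file that stores indexes" idea using only the standard library. To be clear, this is not how indexer.py works; it is a minimal inverted index that I am assuming would be enough for a few thousand pages. All of the names (build_index, save_index, load_index, search) are made up for illustration, and storing the index as JSON is my own assumption.

```python
import json
import re
from collections import defaultdict
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def words_in_html(html):
    """Return the set of lowercase alphanumeric words in a page's text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))


def build_index(root):
    """Map each word to the sorted list of *.html files under root that contain it."""
    index = defaultdict(set)
    for path in Path(root).rglob("*.html"):
        html = path.read_text(encoding="utf-8", errors="replace")
        for word in words_in_html(html):
            index[word].add(path.relative_to(root).as_posix())
    return {word: sorted(paths) for word, paths in index.items()}


def save_index(index, index_file):
    """Write the index to a single JSON file (assumed storage format)."""
    Path(index_file).write_text(json.dumps(index), encoding="utf-8")


def load_index(index_file):
    """Read an index previously written by save_index."""
    return json.loads(Path(index_file).read_text(encoding="utf-8"))


def search(index, query):
    """Return the files that contain every word in the query (AND search)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    hits = set(index.get(terms[0], []))
    for term in terms[1:]:
        hits &= set(index.get(term, []))
    return sorted(hits)
```

The cron job would call build_index() and save_index() on the document tree, and the CGI script would only need load_index() and search(). There is no ranking, stemming, or phrase matching here, so treat it as a starting point rather than a finished search engine.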
 
