Automated HTML code extraction and documenting

sirleech · Sep 13, 2005

I have an interesting problem on hand. My goal is to find all pages
within our institution's web domain that contain a particular snippet
of form-posting code, then produce a list of those pages.

We are searching for this code because we need to find every page
that links to our campus search engine. Anyone who makes a web page on
campus can add the search engine easily with the appropriate code, so
we do not know how widely it has been deployed. We will soon be
changing the way the search engine is accessed, so we must find and
correct every instance of the old code.

The files under our domain are not centralized; they sit on many
different web servers throughout the university, so a query on one web
server will not work.

My thought is that we need HTML spidering software that crawls the
entire domain, searches each page for the specified code segment, and
records where it was found.
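
To make the idea concrete, here is a rough sketch of such a spider in
Python, using only the standard library. It is an illustration under
assumptions, not a finished tool: the names find_pages, fetch, and
in_domain are made up for this example, the match is a plain
case-insensitive substring test, and a real crawl should also honor
robots.txt and throttle its requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its text, or None on failure or non-HTML."""
    try:
        with urlopen(url, timeout=10) as resp:
            if "html" not in resp.headers.get("Content-Type", ""):
                return None
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # Network and HTTP errors are OSError subclasses; bad URLs raise ValueError.
        return None


def in_domain(url, domain):
    """True if the URL's host is `domain` itself or any subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def find_pages(start_url, domain, snippet, fetch_page=fetch, max_pages=10000):
    """Breadth-first crawl of `domain`; return URLs whose HTML contains `snippet`."""
    queue = deque([start_url])
    seen = {start_url}
    hits = []
    fetched = 0
    needle = snippet.lower()
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch_page(url)
        fetched += 1
        if html is None:
            continue
        if needle in html.lower():
            hits.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link = urldefrag(urljoin(url, href)).url
            if (link.startswith(("http://", "https://"))
                    and in_domain(link, domain)
                    and link not in seen):
                seen.add(link)
                queue.append(link)
    return hits
```

It would be called with something like
find_pages("http://www.example.edu/", "example.edu", 'action="http://search.example.edu/old.cgi"'),
where the host names and the form action are placeholders for the real
ones. Because it only follows links, pages that nothing links to will
be missed, so several starting points (one per known server) may be
needed.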

If anyone has a better solution, or can point me in the right
direction, it would be much appreciated.

