Searching Content

sql

I am developing a Content Management System in ASP.NET with a SQL
Server database. The main content is stored as HTML (I'm still not sure
this is the way to go!) so that it is displayed on the page with the
formatting the user specified when adding the content to a page. I am
working on a search facility, which searches the main content for the
search word(s) entered. The problem is that the content contains HTML,
so the markup itself (tag names, attributes) could contain the word(s)
the user searched for. I have 2 possibilities here, neither of which I
like:

1. Store the HTML and a plain text version of the content, and search
the plain text version
2. Strip the HTML in the VB .NET code, then search the resulting string
again to see if the word(s) are still present
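
To make option 2 concrete, here is a rough sketch of the strip-then-search idea. It is written in Python rather than VB .NET purely for brevity, and the helper names (strip_html, content_matches) are made up for illustration. It relies on the standard library's html.parser to keep only text nodes, so tag names and attribute values never reach the search. The same extraction step could be run once at save time to produce the plain text copy for option 1.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only text nodes, so tag names and attributes are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Join chunks with spaces and collapse whitespace runs.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def content_matches(html, words):
    """True if every search word occurs in the visible text, ignoring case.

    This is a plain substring test, not whole-word matching.
    """
    text = strip_html(html).lower()
    return all(word.lower() in text for word in words)
```

One limitation of this sketch: because chunks are joined with spaces, a word that is split by inline markup (for example only its first letters in bold) will not match as a whole, and the contents of any script or style elements are treated as text.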

Anyone got any ideas on how this should be done?
 
Gaurav Vaish (www.EdujiniOnline.com)

Another option:

1. Turn on the Indexing Service
2. Don't keep the content in the SQL database; keep it in the file
system as raw HTML files
3. Create a catalog for the folder
4. Create a linked server in SQL Server mapped to the Indexing Service
to get a view
5. Apply full-text search on the view!

Want the code? ... eh! :)


--
Happy Hacking,
Gaurav Vaish | www.mastergaurav.com
www.edujinionline.com
http://eduzine.edujinionline.com
-----------------------------------------
 
sql

Gaurav,

Thanks for the reply. I just want to clarify what I have done here. The
website has just 1 page, default.aspx, which is totally data driven.
The menus and the content are built in the page load event by passing a
page id to the database. It is the main content which is stored as HTML
so that when it is displayed it has the required formatting (e.g. bold
text, italics, bullet points etc.)

So basically none of the pages in the website actually exist as HTML
files on the file system of the IIS box. Hope this makes sense.

Thanks.
 
Gaurav Vaish (www.EdujiniOnline.com)

I am assuming that none of the dynamic HTML content contains form
elements that could submit the main form of the page (the
[runat='server'] form), if any, because that may break ASP.NET
processing.

Also, I am assuming that what you want to do is grab the HTML from the
row for which the search matched and display its contents.

Since you would not be looking for a 'WHERE' or 'LIKE' match but a
free-text match, I would suggest one of these options:

1. Use Microsoft Indexing Service (MIS) to index your files. Instead of
putting the HTML content in the db, put it in files and let MIS do the
job. The results are fairly good. We have been using it for our internal
purposes... basically to test our KM product.

2. Buy Google MiniSearch and let it index the documents. You query it
using its APIs (Web Service enabled). You trust Google? I do... at least
for search. Do look at the cost figures... MiniSearch can index up to
around 1 million documents of any type (you'd be interested in HTML
content only) but costs around $x,000 (I don't recall if x = 2 or
x = 5 :D).


If you are looking to scale up your operations and have control over the
hosting environment, my personal recommendation would be Google
MiniSearch, since MIS tends to get much slower as the size of the
repository grows (size >= 100-200k documents; well, we have a mix of
text, HTML, and Office [doc, ppt, xls] documents).


Hope that helps!


 
