Rob Meade
Hi all,
OK - this leads on from speaking to a couple of people here and in the SQL
Server group...
I have an application which allows the user to type their text into a form.
They add 'happy' tags around their words, and the app then replaces these
with the HTML equivalent and saves it to the database...
Thus far this has been working very well.
I've been asked to add search functionality to the site now, and whilst I've
already made a good start on this, one slight fly in my coding oink-ment is
the fact that the content I search through contains things like:
<b>hello world</b>
My initial search syntax might end something like this:
where PageContent Like '% hello %'
This would run off and try to find all the instances of 'hello' where it's a
word in its own right, but as I've seen now, in the example above it
wouldn't find the word because the opening <b> tag sits directly in front of
it, so there is no space before 'hello' for the pattern to match.
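To make the failure concrete, here is a minimal sketch that reproduces it. It makes some assumptions: SQLite (through Python's sqlite3 module) stands in for SQL Server, and the Pages table and PageId column are made-up names, only PageContent comes from my query above.

```python
import sqlite3

# Hypothetical schema: one Pages table holding the stored HTML.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Pages (PageId INTEGER PRIMARY KEY, PageContent TEXT)")
conn.executemany(
    "INSERT INTO Pages (PageId, PageContent) VALUES (?, ?)",
    [
        (1, "<b>hello world</b>"),        # word sits directly after a tag
        (2, "we say hello to everyone"),  # word has a space on both sides
    ],
)

# Same pattern as in the post: a space is required on each side of the word.
rows = conn.execute(
    "SELECT PageId FROM Pages WHERE PageContent LIKE '% hello %' ORDER BY PageId"
).fetchall()
matched_ids = [page_id for (page_id,) in rows]
print(matched_ids)  # only page 2 matches; page 1 is missed
```

The same pattern would also miss 'hello' as the very first or very last word of the content, or when it is followed by punctuation, so the tags are not the only thing that can hide a word from it.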
Aaron (and others) mentioned a few ways to get around this but suggested the
underlying problem is that I have the formatting and data in the same
table....
There are currently 100+ pages, so fixing/changing this could be a bit of a
sod. Luckily I work closely with the company using this, so I am happy to
spend the extra time and, page by page if need be, change each one to
correct it.
What I am unable to come up with yet is a 'good' way to separate the
formatting from the text.
Thoughts so far:
2 tables: one with formatted text used only for display, and a second used
only for searching. The content would be written to both tables when the
page is created, and to both again when it is updated.
This would be the 'easiest' way (apart from the 100+ pages already created),
but I don't 'personally' think it's the best approach because of the data
duplication.
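A rough sketch of how the two-table idea might look, to show what "written to both tables" would involve. It rests on assumptions: SQLite via Python stands in for SQL Server, and PageDisplay, PageSearch, strip_tags, save_page and search are all hypothetical names made up for illustration.

```python
import sqlite3
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only the text of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Assumption: a tag counts as a word boundary, so text from
        # neighbouring elements does not run together.
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse whitespace and pad both ends so '% word %' also matches
    # the first and last word of the page.
    return " " + " ".join("".join(parser.parts).split()) + " "


def save_page(conn, page_id, html):
    # One transaction, so the display copy and search copy cannot drift apart.
    with conn:
        conn.execute("INSERT OR REPLACE INTO PageDisplay VALUES (?, ?)",
                     (page_id, html))
        conn.execute("INSERT OR REPLACE INTO PageSearch VALUES (?, ?)",
                     (page_id, strip_tags(html)))


def search(conn, word):
    # Note: '%' or '_' inside word would act as wildcards here.
    rows = conn.execute(
        "SELECT PageId FROM PageSearch WHERE PageText LIKE ? ORDER BY PageId",
        (f"% {word} %",),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PageDisplay (PageId INTEGER PRIMARY KEY, PageContent TEXT)")
conn.execute("CREATE TABLE PageSearch (PageId INTEGER PRIMARY KEY, PageText TEXT)")
save_page(conn, 1, "<b>hello world</b>")
print(search(conn, "hello"))
```

Treating every tag as a word boundary is a trade-off: it keeps neighbouring paragraphs from merging, but a word that is only partly bold would be split in two. The existing 100+ pages could in principle be backfilled by running each stored page through the same save routine rather than editing them by hand.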
Another thought was to have a lookup table. This would contain the page id,
and then many rows for each page, each holding the character position of an
opening formatting tag, the character position of its closing tag, the type
of tag, and the 'detail' for the tag, i.e.
pageid:    1
charpos:   10
tagtype:   1 (<a href>)
tagdetail: <a href="http://www.mydomain.com" title="my domain">
This would enable me to strip the tags from the data table (only one now),
but there would be an overhead when putting it all back together for
display, and also when saving the page initially or updating it later, as
the save would have to run through the content and find every tag....
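A minimal sketch of the strip-and-reassemble procedure, to get a feel for that overhead. It simplifies the layout above: each tag (opening or closing) is recorded as its own (charpos, tagdetail) pair instead of pairing open and close positions in one row, and tagtype is left out. The names split_tags and join_tags are hypothetical, and the simple regex assumes the app generates well-formed tags with no '>' inside attribute values.

```python
import re

# Naive tag matcher: assumes no '>' appears inside an attribute value.
TAG = re.compile(r"<[^>]+>")


def split_tags(html):
    """Return (plain_text, tags); tags is a list of (charpos, tagdetail)
    where charpos is the offset in the plain text the tag goes before."""
    text_parts = []
    tags = []
    plain_len = 0
    last_end = 0
    for match in TAG.finditer(html):
        chunk = html[last_end:match.start()]
        text_parts.append(chunk)
        plain_len += len(chunk)
        tags.append((plain_len, match.group(0)))
        last_end = match.end()
    text_parts.append(html[last_end:])
    return "".join(text_parts), tags


def join_tags(text, tags):
    """Rebuild the HTML; tags must be in their original document order."""
    out = []
    last = 0
    for charpos, tagdetail in tags:
        out.append(text[last:charpos])
        out.append(tagdetail)
        last = charpos
    out.append(text[last:])
    return "".join(out)


html = ('Visit <a href="http://www.mydomain.com" title="my domain">'
        'my <b>site</b></a> now')
text, tags = split_tags(html)
print(text)
print(tags)
print(join_tags(text, tags) == html)
```

Several tags can share one character position (here both closing tags do), so a real lookup table would presumably need an extra sequence column to keep them in the right order. Any edit to the plain text would also shift every stored position after it, which is part of the update overhead.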
Because of the freedom the user has (i.e. it's not just 'header' and 'body'
where I always make the header bold or something), there are quite a few
tags that can be used, and some of them have variable data inside, e.g. the
hyperlinks for web pages or email addresses. I also have image and document
tags for a repository of images and documents, and so on...
Other than these 2 ideas I cannot think of anything else at this time. I
can't just use CSS because again I have the hyperlinks etc., and even then
there would need to be 'something' in the data that says "this has to be
bold"...
Anyone got any thoughts/ideas...?
As I said, I'm more than happy to change the 100+ pages and remove the tags
from the text, thus correcting the problem with the search, but I need a way
that will definitely work before I even think about climbing that mountain!
Thanks for your time,
Regards
Rob