asp innerText?

Giles

In DHTML, body.innerText nicely strips out the raw textual contents of a
formatted page. Is there a straightforward way to do this with a server-side
ASP function (e.g. on a string containing the HTML)? It is to fill a
database field used for a simple search routine.
I don't have permission on this server to use 3rd party components, it's
plain IIS6.
Thanks.
Giles
 
Bob Barrows [MVP]

Giles said:
In DHTML, body.innerText nicely strips out the raw textual contents
of a formatted page. Is there a straightforward way to do this with a
server-side ASP function (e.g. on a string containing the HTML)? It
is to fill a database field used for a simple search routine.
I don't have permission on this server to use 3rd party components,
it's plain IIS6.

Use a Regular Expression.
Bob Barrows
 
Giles

from Bob Barrows [MVP]
Use a Regular Expression.
Bob Barrows

RegExp is a black art to me! Off the top of my head:
delete from "<head" to "/head>"
delete from "<style" to "/style>" (in case not in head)
delete from "<script" to "/script>" (in case not in head)
replace anything in chevrons with nothing.
replace line-breaks with spaces
replace multiple spaces with single spaces
replace HTML entities with literals
Does that sound about right?
thanks, Giles
 
Bob Barrows [MVP]

Giles said:
RegExp is a black art to me!

Somewhat to me as well ...
A couple of people in this group (Chris Hohmann comes to mind) have it down
pretty well. There are some websites out there that provide libraries of
regular expression patterns.
Off the top of my head:
delete from "<head" to "/head>"
delete from "<style" to "/style>" (in case not in head)
delete from "<script" to "/script>" (in case not in head)
replace anything in chevrons with nothing.
replace line-breaks with spaces
replace multiple spaces with single spaces
replace HTML entities with literals
Does that sound about right?

I guess so, but why are you leaving the closing and opening brackets?
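
For what it's worth, the steps listed above can be sketched with the RegExp object that is built into VBScript, so no third-party component is involved. This is only a rough sketch, not a real HTML parser: HtmlToText and ReplaceAll are made-up names, the comment-stripping step is an addition that was not in the list, only five common named entities are decoded, and tags are replaced with a space rather than with nothing so that words in adjacent elements do not run together. Each pattern matches the whole tag, opening and closing brackets included.

```vbscript
<%
' Replace every match of a pattern (case-insensitive, all occurrences).
Function ReplaceAll(text, pattern, replacement)
    Dim re: Set re = New RegExp
    re.Pattern = pattern
    re.Global = True
    re.IgnoreCase = True
    ReplaceAll = re.Replace(text, replacement)
End Function

' Hypothetical helper: reduce an HTML string to searchable plain text.
Function HtmlToText(html)
    Dim s: s = html

    ' Drop <head>, <style> and <script> blocks together with their content.
    ' [\s\S] is used because "." does not match line breaks in VBScript.
    s = ReplaceAll(s, "<head\b[\s\S]*?</head\s*>", " ")
    s = ReplaceAll(s, "<style\b[\s\S]*?</style\s*>", " ")
    s = ReplaceAll(s, "<script\b[\s\S]*?</script\s*>", " ")

    ' Addition to the list: drop HTML comments before the generic tag pass.
    s = ReplaceAll(s, "<!--[\s\S]*?-->", " ")

    ' Remove anything left in chevrons (assumption: a space, not nothing).
    s = ReplaceAll(s, "<[^>]*>", " ")

    ' Decode a few named entities; &amp; goes last so that "&amp;lt;"
    ' ends up as the literal text "&lt;" rather than "<".
    s = Replace(s, "&nbsp;", " ")
    s = Replace(s, "&lt;", "<")
    s = Replace(s, "&gt;", ">")
    s = Replace(s, "&quot;", """")
    s = Replace(s, "&amp;", "&")

    ' Collapse line breaks and runs of whitespace into single spaces.
    s = ReplaceAll(s, "\s+", " ")

    HtmlToText = Trim(s)
End Function

Response.Write HtmlToText("<p>Hello <b>world</b></p>")   ' writes: Hello world
%>
```

The trade-off of inserting a space for each tag is that inline markup in the middle of a word (e.g. <em>un</em>likely) splits the word in two, which may or may not matter for a simple search field.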
 
Justin Piper

Giles said:
In DHTML, body.innerText nicely strips out the raw textual contents of a
formatted page. Is there a straightforward way to do this with a server-side
ASP function (e.g. on a string containing the HTML)? It is to fill a
database field used for a simple search routine.

If you can, you might consider using the Indexing Service instead of
rolling your own search routine.

http://www.codeproject.com/asp/indexserver.asp

If that's not an option, you should be able to automate Internet Explorer
from an ASP page and read InnerText from the document it renders.

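<% Option Explicit

' Start a hidden Internet Explorer instance and load an empty page.
Dim ie: Set ie = CreateObject("InternetExplorer.Application")
ie.Navigate "about:blank"

' Wait until about:blank has finished loading before touching the document.
Do While ie.Busy Or ie.ReadyState <> 4 ' 4 = READYSTATE_COMPLETE
Loop

' Write the markup into the document so the browser parses it.
Dim doc: Set doc = ie.Document
doc.open
doc.writeln "<dl>"
doc.writeln "<dt>em</dt>"
doc.writeln "<dd>Indicates <em>emphasis</em></dd>"
doc.writeln "<dt>strong</dt>"
doc.writeln "<dd>Indicates <strong>stronger emphasis</strong></dd>"
doc.writeln "</dl>"
doc.close

' Read the text first, then quit so no iexplore.exe process is left behind.
Dim plainText: plainText = doc.documentElement.InnerText
Set doc = Nothing
ie.Quit
Set ie = Nothing

Response.ContentType = "text/plain"
Response.Write plainText
%>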
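
A lighter variant of the same idea, offered as a sketch under assumptions rather than a tested recipe: Windows also registers the MSHTML document under the "htmlfile" ProgID, which can parse a string without starting a browser process. InnerTextOf is a made-up name, and whether the "htmlfile" object can be created under the IIS account on a given server is an assumption to verify. Scripts embedded in the markup may be executed by the parser, so it is best used only on HTML you trust.

```vbscript
<%
' Hypothetical helper: let MSHTML parse an HTML string and return its text.
Function InnerTextOf(html)
    ' Assumption: the "htmlfile" ProgID is available to the ASP process.
    Dim doc: Set doc = CreateObject("htmlfile")
    doc.open
    doc.write CStr(html)   ' CStr passes the value as a plain string
    doc.close
    InnerTextOf = doc.body.innerText
    Set doc = Nothing
End Function

Response.ContentType = "text/plain"
Response.Write InnerTextOf("<dl><dt>em</dt><dd>Indicates <em>emphasis</em></dd></dl>")
%>
```

Because it reads doc.body rather than doc.documentElement, the contents of a head section in the input are not included in the result.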
 
Tom Kaminski [MVP]

Giles said:
In DHTML, body.innerText nicely strips out the raw textual contents of a
formatted page. Is there a straightforward way to do this with a
server-side ASP function (e.g. on a string containing the HTML)? It is to
fill a database field used for a simple search routine.
I don't have permission on this server to use 3rd party components, it's
plain IIS6.

With ASP you have complete control over the content of the page before it
gets written so it's not clear to me why you would need to do this ...
 
