Java Design For A News Filter


Ben Jessel

I am doing the technical design for a news syndication system that:

1) Reads news feeds (XML/RSS) from user-defined sources.
2) Filters the news items by applying user-defined search
expressions to the subject and body portions of the XML.
3) Stores the results in a database so that people can view the
filtered news.
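Steps 1 and 2 both depend on pulling the title and description out of each RSS item before any search expression is applied. Below is a minimal sketch using the JDK's built-in DOM parser. It assumes RSS 2.0, where the subject is <title> and the body is <description>. The class and record names are hypothetical, and records need Java 16 or later:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RssItemReader {

    /** Hypothetical value type: the two parts of an item the filter searches. */
    record NewsItem(String subject, String body) {}

    /** Parses an RSS 2.0 document and extracts title/description from each <item>. */
    static List<NewsItem> readItems(String rssXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));

        List<NewsItem> items = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("item");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element item = (Element) nodes.item(i);
            items.add(new NewsItem(childText(item, "title"), childText(item, "description")));
        }
        return items;
    }

    /** Text of the first descendant with the given tag, or "" if it is missing. */
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() > 0 ? list.item(0).getTextContent().trim() : "";
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss version=\"2.0\"><channel><title>Demo</title>"
            + "<item><title>Java Programmer wanted</title>"
            + "<description>Senior role, UML a plus</description></item>"
            + "</channel></rss>";
        for (NewsItem item : readItems(rss)) {
            System.out.println(item.subject() + " | " + item.body());
        }
    }
}
```

Each NewsItem would then go through the filter, and only the matches would be written to the database in step 3.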

I've had a look at the options:

1) Write the whole thing from scratch and devise an algorithm for text
searching. This would have to handle boolean logic (e.g. "must match Java
AND Programmer but not Coffee" OR "must match java AND UML") and
possibly regular expressions (these could be dropped from scope).
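Option 1 is less daunting if the boolean logic is kept separate from the text matching. Below is a minimal sketch of a recursive-descent parser that compiles a query into a java.util.function.Predicate. It makes a few assumptions. Queries are written with AND, OR, NOT and parentheses, so "but not Coffee" becomes "AND NOT coffee". A term matches a whole word, case-insensitively. Regular expressions are left out of scope. BooleanQuery, compile and words are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class BooleanQuery {

    private final List<String> tokens;
    private int pos = 0;

    private BooleanQuery(String query) {
        // Pad parentheses with spaces so they become separate tokens.
        String padded = query.replace("(", " ( ").replace(")", " ) ");
        tokens = new ArrayList<>(Arrays.asList(padded.trim().split("\\s+")));
    }

    /** Compiles e.g. "(java AND programmer AND NOT coffee) OR (java AND uml)". */
    public static Predicate<Set<String>> compile(String query) {
        BooleanQuery parser = new BooleanQuery(query);
        Predicate<Set<String>> result = parser.parseOr();
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + parser.tokens.get(parser.pos));
        }
        return result;
    }

    /** Lower-cases the text and splits it into a set of words. */
    public static Set<String> words(String text) {
        Set<String> result = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        result.remove(""); // split() yields "" when the text starts with a non-word char
        return result;
    }

    /** Consumes the next token if it equals the keyword (case-insensitive). */
    private boolean accept(String keyword) {
        if (pos < tokens.size() && tokens.get(pos).equalsIgnoreCase(keyword)) {
            pos++;
            return true;
        }
        return false;
    }

    // orExpr := andExpr ("OR" andExpr)*
    private Predicate<Set<String>> parseOr() {
        Predicate<Set<String>> left = parseAnd();
        while (accept("OR")) {
            left = left.or(parseAnd());
        }
        return left;
    }

    // andExpr := unary ("AND" unary)*   (AND binds tighter than OR)
    private Predicate<Set<String>> parseAnd() {
        Predicate<Set<String>> left = parseUnary();
        while (accept("AND")) {
            left = left.and(parseUnary());
        }
        return left;
    }

    // unary := "NOT" unary | "(" orExpr ")" | term
    private Predicate<Set<String>> parseUnary() {
        if (accept("NOT")) {
            return parseUnary().negate();
        }
        if (accept("(")) {
            Predicate<Set<String>> inner = parseOr();
            if (!accept(")")) {
                throw new IllegalArgumentException("Missing ')'");
            }
            return inner;
        }
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of query");
        }
        String term = tokens.get(pos++).toLowerCase();
        return set -> set.contains(term);
    }

    public static void main(String[] args) {
        Predicate<Set<String>> filter =
            compile("(java AND programmer AND NOT coffee) OR (java AND uml)");
        System.out.println(filter.test(words("Senior Java Programmer wanted")));
        System.out.println(filter.test(words("Java programmer, free coffee")));
    }
}
```

Each feed item could then be tested with something like filter.test(BooleanQuery.words(subject + " " + body)). Compiling each user's query once and reusing the Predicate reduces the cost per item to a few hash lookups.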

Advantages
Totally meets requirements.

Disadvantages
Complex coding.
Time-intensive.

2) Use XPath. This would involve creating stylesheets on the fly that
contain the appropriate logic. Some translation between XPath's search
syntax and what the user enters may be required.

Advantages
Less Flexible

Disadvantages
May not be flexible enough (could you express "must match Java AND
Programmer but not Coffee" OR "must match java AND UML" in XPath?).
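XPath 1.0 can express that particular example, since contains(), and, or and not() are all standard. The sketch below uses the JDK's javax.xml.xpath API against an RSS 2.0 document. It assumes the subject and body are the <title> and <description> children of each <item>, and the class and helper names are hypothetical.

There are two caveats. First, contains() is case-sensitive and matches substrings, so "java" also matches "javascript"; the translate() call handles ASCII lower-casing but not the substring problem. Second, user terms containing an apostrophe would need escaping before being spliced into the expression.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathFilterDemo {

    // Subject and body joined, then lower-cased (ASCII only) because contains() is case-sensitive.
    private static final String TEXT = "translate(concat(title, ' ', description), "
        + "'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')";

    /** XPath test for one lower-case term (substring match, not whole word). */
    static String has(String term) {
        return "contains(" + TEXT + ", '" + term + "')";
    }

    /** ("java" AND "programmer" AND NOT "coffee") OR ("java" AND "uml"). */
    static String exampleQuery() {
        return "//item[(" + has("java") + " and " + has("programmer")
            + " and not(" + has("coffee") + ")) or (" + has("java") + " and " + has("uml") + ")]";
    }

    /** Returns the titles of the items selected by the given XPath expression. */
    static List<String> matchingTitles(String rssXml, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList items = (NodeList) xpath.evaluate(
            expr, new InputSource(new StringReader(rssXml)), XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            titles.add(xpath.evaluate("title", items.item(i)));
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss><channel>"
            + "<item><title>Java Programmer wanted</title><description>Great team</description></item>"
            + "<item><title>Java Programmer</title><description>Free coffee</description></item>"
            + "<item><title>UML modeller</title><description>Java shop</description></item>"
            + "</channel></rss>";
        System.out.println(matchingTitles(rss, exampleQuery()));
    }
}
```

This runs the XPath directly rather than generating an XSLT stylesheet, but the same expression could be dropped into an xsl:for-each or xsl:template match.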

3) Save everything to the database and use the database's full-text
retrieval.

Advantages
Simple and easy.

Disadvantages
May be slow.
Bit of a hacky workaround.
Databases are not search engines!


I'd really appreciate some comments, as going down the wrong route
could be a world of pain!

Thanks,

Ben