Finding the K most common words in a collection of documents

Hi,

I am a new member here.

I have done an assignment on text analysis. In the first phase, I extracted text from multiple web pages and applied some rules from Latent Semantic Analysis. I removed stop words (prepositions, articles, etc.) and brackets. Then I applied a stemming algorithm (the Porter algorithm) to remove suffixes. Finally, I stored the remaining words in a separate hash table for each document, along with their frequencies.
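To make the first phase concrete, here is a minimal sketch of that pipeline in Python. It is only an illustration under assumptions: the stop word list is a tiny made-up sample, `crude_stem` is a rough stand-in for the Porter algorithm (not Porter itself), and the document names and texts are invented.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; a real list is much longer).
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "and", "is", "to", "was"}

def crude_stem(word):
    """Rough suffix stripper used as a stand-in for the Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three letters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def term_frequencies(text):
    """Lowercase, tokenize, drop stop words, stem, and count what is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(stems)

# Hypothetical documents; in the assignment these come from web pages.
documents = {
    "doc1": "The patient was sick and the doctors treated the sick patient.",
    "doc2": "An ill traveller visited the doctor.",
}

# One frequency table (hash table) per document.
tables = {name: term_frequencies(text) for name, text in documents.items()}
```

Here `tables["doc1"]` maps each remaining stem to its count in that document, which corresponds to the per-document hash tables described above.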

Now, in the second phase, I have to find the list of the most common words across all the documents. Mind you, "most common" does not mean the most frequently occurring words. I mean that if one document contains "sick" and the same document or any other document contains "ill", then that should also count toward the common words list. The words will not be given by the user; the words already in the documents will be used for the search. Any idea or algorithm, please?
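To show the kind of result I am after, here is a rough sketch, assuming "common" means shared across documents once synonyms are merged. The synonym groups are hand-written and hypothetical (in practice they might come from a thesaurus such as WordNet), and the function names and sample tables are made up for illustration.

```python
from collections import Counter

# Hypothetical hand-written synonym groups (an assumption for this sketch).
SYNONYM_GROUPS = [
    {"sick", "ill", "unwell"},
    {"doctor", "physician"},
]

# Map every word to one representative label for its group
# (the alphabetically smallest member, chosen arbitrarily).
CONCEPT_OF = {word: min(group) for group in SYNONYM_GROUPS for word in group}

def concept(word):
    """Return the group label for a word, or the word itself if ungrouped."""
    return CONCEPT_OF.get(word, word)

def k_most_common_concepts(tables, k):
    """Count in how many documents each concept appears; return the top k."""
    doc_counts = Counter()
    for freqs in tables.values():
        # A set, so each concept counts at most once per document.
        concepts_in_doc = {concept(w) for w in freqs}
        doc_counts.update(concepts_in_doc)
    return doc_counts.most_common(k)

# Hypothetical per-document frequency tables from the first phase.
tables = {
    "doc1": {"patient": 2, "sick": 2, "doctor": 1, "treat": 1},
    "doc2": {"ill": 1, "travel": 1, "physician": 1},
    "doc3": {"sick": 1, "holiday": 1},
}

top = k_most_common_concepts(tables, 2)
print(top)  # [('ill', 3), ('doctor', 2)]
```

In this sketch "sick" and "ill" are counted as one concept that appears in all three documents, which is the behaviour I tried to describe. Summing the word frequencies per concept instead of counting documents would be another possible ranking.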
 
