slash
Hi,
I am trying to write a script that will let me manipulate words
in a certain way while keeping track of the documents those
words came from. For example, say my corpus consists of
these three documents with the following contents:
DocID 1.TXT
Compose your message
DocID 2.TXT
Use this form to post your message
DocID 3.TXT
Remember that it can be viewed by millions
Now, when I process all the files, I want to be able to see
that "message" is a word that appears in both DocID 1.TXT and
DocID 2.TXT.
How can I do this in Perl? Is this what an inverted index is, minus the
term frequencies, etc.? I am under time pressure and would like to know
whether existing code for this, or at least pseudocode, is available
somewhere.
I would certainly appreciate any help.
Thanks,
Slash
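To make the structure being asked about concrete, here is a minimal sketch, assuming the three sample documents are held in memory rather than read from disk. It maps each lowercased word to the set of DocIDs that contain it, which is essentially an inverted index without term frequencies. The names %docs, build_index and docs_for are illustrative only and do not come from any existing module.

```perl
use strict;
use warnings;

# Sample corpus from the post, held in memory (an assumption; a real
# script would read each file from disk instead).
my %docs = (
    '1.TXT' => 'Compose your message',
    '2.TXT' => 'Use this form to post your message',
    '3.TXT' => 'Remember that it can be viewed by millions',
);

# Build the index as a hash of hashes: word => { doc_id => 1 }.
# The inner hash acts as a set, so a word repeated within one
# document is recorded only once for that document.
sub build_index {
    my ($docs) = @_;
    my %index;
    for my $doc_id (keys %$docs) {
        for my $word (split ' ', lc($docs->{$doc_id})) {
            $word =~ s/[^a-z0-9]//g;    # crude punctuation stripping
            next unless length $word;
            $index{$word}{$doc_id} = 1;
        }
    }
    return \%index;
}

# Return the sorted list of documents that contain a word
# (an empty list if the word was never seen).
sub docs_for {
    my ($index, $word) = @_;
    my $set = $index->{ lc($word) } || {};
    return sort keys %$set;
}

my $index = build_index(\%docs);
print "message: ", join(', ', docs_for($index, 'message')), "\n";
# prints "message: 1.TXT, 2.TXT"
```

Storing a count instead of 1 in the inner hash (for example, $index{$word}{$doc_id}++) would turn this into an index that also carries term frequencies.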