Roedy Green
Have you ever noticed how quotation websites have the same
quotations with tiny variations, or the same quote attributed to
several different authors? Sometimes there is a short and a long
version of the same quotation.
I was wondering how you might detect these.
I thought you might do it by converting everything to lower case, stripping
punctuation and normalising white space to a single space.
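A minimal sketch of that normalisation step in Python. Two choices here are assumptions, not part of the original idea: "punctuation" means anything that is not a letter, digit, underscore or white space, and apostrophes are deleted outright so that "don't" stays one word instead of splitting into "don" and "t".

```python
import re

def normalise(text: str) -> str:
    """Lower-case text, strip punctuation and collapse white space to single spaces."""
    text = text.lower()
    # Assumption: delete apostrophes (straight and curly) so "don't" stays one word.
    text = text.replace("'", "").replace("\u2019", "")
    # Treat every other non-word, non-space character as a word break.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of spaces, tabs and newlines into one space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

print(normalise("Don't  panic,\nit's only a quote!"))  # dont panic its only a quote
```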
Then you would remove common words.
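Removing common words can be sketched as a filter over the already-normalised words. `STOP_WORDS` is a tiny hypothetical list for illustration; a practical one would be much longer. The filter keeps the surviving words in their original order, since the matching step still needs that order.

```python
# Hypothetical, deliberately tiny stop-word list; a real one would be far longer.
STOP_WORDS = frozenset({"a", "an", "the", "and", "or", "of", "to", "in", "is", "it", "that"})

def content_words(normalised: str) -> list[str]:
    """Split already-normalised text into words and drop common ones, keeping order."""
    return [w for w in normalised.split() if w not in STOP_WORDS]

print(content_words("a stitch in time saves nine"))
# ['stitch', 'time', 'saves', 'nine']
```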
Then you need to match in a way where word order matters but precise
matching does not. Just how would that work?
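One possible answer, offered as a sketch rather than a definitive method: treat each quotation as a sequence of words and compute the longest common subsequence (LCS) of the two sequences. An LCS respects word order but tolerates inserted or missing words, which gives two useful scores: the LCS divided by the longer quotation's word count (high for tiny variations) and the LCS divided by the shorter one's (high when one quotation is a short version of the other). The function names and the stop-word list are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = frozenset({"a", "an", "the", "and", "or", "of", "to", "in", "is", "it", "that"})

def tokens(text: str) -> list[str]:
    """Lower-case, drop apostrophes, turn other punctuation into spaces,
    split on white space and drop common words, keeping word order."""
    text = text.lower().replace("'", "").replace("\u2019", "")
    return [w for w in re.sub(r"[^\w\s]", " ", text).split() if w not in STOP_WORDS]

def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence: order matters, gaps are allowed."""
    prev = [0] * (len(b) + 1)          # DP row for the previous element of a
    for x in a:
        cur = [0]                      # DP row being built for element x
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]

def similarity(q1: str, q2: str) -> tuple[float, float]:
    """Return (overlap, containment).

    overlap     = LCS / word count of the longer quote; near 1.0 for tiny variations.
    containment = LCS / word count of the shorter quote; near 1.0 when one quote
                  is a shortened version of the other.
    """
    a, b = tokens(q1), tokens(q2)
    if not a or not b:
        return 0.0, 0.0
    common = lcs_length(a, b)
    return common / max(len(a), len(b)), common / min(len(a), len(b))

print(similarity("Measure twice, cut once.",
                 "Measure twice, cut once, and you will rarely need to cut again."))
# (0.4, 1.0)
```

Here the short version scores only 0.4 on overlap but 1.0 on containment, which flags it as a shortened form of the longer one, while reversing the two clauses ("cut once, measure twice") drops both scores to 0.5 because order matters. The cut-off for calling two quotations the same (say, containment above 0.8) is a guess that would need tuning against real data. Comparing whole words means a spelling variant such as "colour" versus "color" counts as a mismatch; letting words match within a small edit distance would be one way to loosen that. The table is O(n·m) in the word counts, cheap for single quotations but too slow to compare every pair on a large site without first grouping likely candidates, for example by shared rare words.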