David Weldon
Problem: 1 million+ strings (Set A) need to be matched against 1 million+
substrings (Set B). For example:
Set A =
iliketotraveltohawaii
travelmagazine
Set B =
travel
hawaii
A(1,2) match "travel"
A(1) matches "hawaii"
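To make the expected result concrete, here is a brute-force sketch in Ruby. It only illustrates the desired output for the example above; the method name match_substrings is made up for illustration, and comparing every string against every substring this way would be far too slow for sets with 1 million+ entries.

```ruby
# Brute-force illustration of the desired output (not a scalable solution).
set_a = %w[iliketotraveltohawaii travelmagazine]
set_b = %w[travel hawaii]

# Map each substring in Set B to the 1-based positions of the
# Set A strings that contain it. (Hypothetical helper name.)
def match_substrings(set_a, set_b)
  set_b.each_with_object({}) do |sub, matches|
    matches[sub] = set_a.each_index
                        .select { |i| set_a[i].include?(sub) }
                        .map { |i| i + 1 }
  end
end

matches = match_substrings(set_a, set_b)
matches.each { |sub, ids| puts "A(#{ids.join(',')}) contains \"#{sub}\"" }
```

The cost is one include? call per (string, substring) pair, which is why this only works as a correctness check on small samples.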
What is the best approach to take with this problem? I have tried using
Ferret:
http://www.ruby-forum.com/topic/132772
Indexing is fast, but searching is very slow. I think Ferret could be a
good choice if the strings were split into words first, so that
iliketotraveltohawaii -> "i like to travel to hawaii". I have a way to
do that splitting, but it can't be trusted when the input contains
misspellings (that's an entirely different forum topic). Is there
something obvious that I'm missing here?
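For readers unfamiliar with the splitting step, here is a minimal sketch of dictionary-based word segmentation in Ruby. It is not the solution mentioned above, just one common way to do it; the DICTIONARY contents and the segment method are assumptions made up for this example.

```ruby
require "set"

# Hypothetical tiny dictionary; a real one would be far larger.
DICTIONARY = Set.new(%w[i like to travel hawaii magazine])

# Returns an array of dictionary words that exactly covers text,
# or nil when no such split exists.
def segment(text, dictionary)
  # best[i] holds one list of words covering text[0...i], or nil.
  best = Array.new(text.length + 1)
  best[0] = []
  (1..text.length).each do |finish|
    (0...finish).each do |start|
      next if best[start].nil?
      word = text[start...finish]
      if dictionary.include?(word)
        best[finish] = best[start] + [word]
        break
      end
    end
  end
  best[text.length]
end

words = segment("iliketotraveltohawaii", DICTIONARY)
puts words.join(" ") if words
```

A misspelled input such as "travlmagazine" has no valid split against this dictionary, so segment returns nil, which is the weakness with misspellings described above.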