Yash
We have a list of regular expressions such as:
vodafone.*horoscope
vodafone.*pxtsend
vodafone
yahoo
There are a few hundred such regular expressions in total.
Our program reads a large file with millions of URLs. For each URL, it
has to find the best match in the list of regular expressions.
For example, www.vodafone.com matches vodafone, and
www.vodafone.abc.horoscope matches vodafone.*horoscope.
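Basically, we need to find the regular expression that best matches a
given URL: when several patterns match (www.vodafone.abc.horoscope also
matches plain vodafone), the more specific one should win.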
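A minimal brute-force baseline, assuming "best" means "most specific" (the vodafone.*horoscope example suggests this, but the post does not define it). Each pattern is compiled once, every pattern is tried against each URL with re.search, and the winner is the match with the most literal characters. The specificity score and the names PATTERNS, best_match and match_file are hypothetical, chosen for illustration only.

```python
import re

PATTERNS = ["vodafone.*horoscope", "vodafone.*pxtsend", "vodafone", "yahoo"]

def specificity(pattern):
    # Hypothetical score: characters outside the ".*" wildcards,
    # with the raw pattern length as a tie-breaker.
    return (len(pattern.replace(".*", "")), len(pattern))

# Compile once and sort most specific first, so the first hit is the best one.
COMPILED = sorted(((p, re.compile(p)) for p in PATTERNS),
                  key=lambda item: specificity(item[0]), reverse=True)

def best_match(url):
    """Return the most specific pattern that matches url, or None."""
    for pattern, regex in COMPILED:
        if regex.search(url):
            return pattern
    return None

def match_file(lines):
    """Yield (url, best pattern) for each non-empty line, e.g. from an open file."""
    for line in lines:
        url = line.strip()
        if url:
            yield url, best_match(url)
```

This costs one regex search per pattern per URL (a few hundred per URL here), so it is mostly useful as a correctness reference against which faster approaches can be checked.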
Do you have any suggestions or pointers for doing this very
efficiently? Suggestions about which data structures and algorithms
to use would be especially helpful.
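One possible data-structure answer, sketched under a strong assumption: every pattern is plain literal text joined by ".*" (true for all four examples, but any other regex syntax would break it). Matching a.*b.*c then reduces to finding a, then b after the end of a, and so on with str.find, with no regex engine at all. Patterns are grouped by their first fragment so one failed search rules out the whole group. The index layout, the scoring and all function names are hypothetical.

```python
from collections import defaultdict

PATTERNS = ["vodafone.*horoscope", "vodafone.*pxtsend", "vodafone", "yahoo"]

def fragments(pattern):
    # Assumes ".*" is the only regex syntax used; everything else is literal.
    return [f for f in pattern.split(".*") if f]

def matches(frags, url, start=0):
    """Greedy left-to-right search: each fragment must occur after the previous one."""
    pos = start
    for frag in frags:
        idx = url.find(frag, pos)
        if idx < 0:
            return False
        pos = idx + len(frag)
    return True

def build_index(patterns):
    """Group patterns by first literal fragment: {head: [(pattern, rest), ...]}."""
    index = defaultdict(list)
    for p in patterns:
        frags = fragments(p)
        if frags:
            index[frags[0]].append((p, frags[1:]))
    return index

def best_literal_match(url, index):
    """Return the matching pattern with the most literal characters, or None."""
    best, best_score = None, -1
    for head, group in index.items():
        first = url.find(head)
        if first < 0:
            continue  # the head is missing, so no pattern in this group can match
        for pattern, rest in group:
            # The leftmost occurrence of the head leaves the most room for the rest.
            if matches(rest, url, first + len(head)):
                score = len(head) + sum(len(f) for f in rest)
                if score > best_score:
                    best, best_score = pattern, score
    return best
```

This still does one substring search per distinct first fragment for each URL. If that is too slow, a common next step is to locate all fragments in a single pass over the URL with a multi-pattern string matcher such as Aho-Corasick, then confirm only the patterns whose fragments were all found.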
Thanks