[Fredrik Lundh]
> Francois Pinard wrote:
> > Given the above,
> > build_regexp(['this', 'that', 'the-other'])
> > yields the string 'th(?:is|at|e\\-other)', which one may choose to
> > `re.compile' before use.
> The SRE compiler looks for common prefixes, so "th(?:is|at|e\\-other)" is
> no different from "this|that|the-other" on the engine level.
Thanks for the note. So the `build_regexp' function is not useful after
all. It was written, indirectly, to work around a speed problem in the GNU
regexp engine, but it seems the Python regexp engine already knows better.
As I wrote earlier, I first saw Emacs Lisp `regexp-opt' used within `enscript'.
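The original `build_regexp' is not shown here, so what follows is only a
hypothetical sketch in the spirit of `regexp-opt', not the actual code. It
assumes Python 3.7 or later (dicts keep insertion order, so alternatives come
out in the order the words were given), and the helper `_trie_to_regexp' is
an invented name. It folds the words into a character trie and prints each
node back as a regexp fragment, which is enough to reproduce the
'th(?:is|at|e\\-other)' string quoted above.

```python
import re

def build_regexp(words):
    """Hypothetical sketch: fold the common prefixes of `words` into one regexp.

    Assumes Python 3.7+, where dicts keep insertion order, so alternatives
    appear in the order the words were given.
    """
    # Build a character trie; the key '' marks that a word ends at this node.
    trie = {}
    for word in words:
        node = trie
        for char in word:
            node = node.setdefault(char, {})
        node[''] = {}
    return _trie_to_regexp(trie)

def _trie_to_regexp(node):
    """Hypothetical helper: turn one trie node into a regexp fragment."""
    # One alternative per outgoing character; since their first characters
    # differ, at most one alternative can match at a given position.
    alternatives = [re.escape(char) + _trie_to_regexp(child)
                    for char, child in node.items() if char != '']
    if not alternatives:
        return ''
    if len(alternatives) == 1:
        body = alternatives[0]
    else:
        body = '(?:' + '|'.join(alternatives) + ')'
    if '' in node:
        # A word also ends here, so the continuation is optional.  The greedy
        # '?' tries the longer continuation first, so longer words win.
        if len(alternatives) == 1 and len(body) > 1:
            body = '(?:' + body + ')'
        body += '?'
    return body

print(build_regexp(['this', 'that', 'the-other']))              # th(?:is|at|e\-other)
print(re.match(build_regexp(['the', 'there']), 'therefore').group())  # there
```

The greedy `?' on the optional continuation is what makes the factored form
prefer the longest word: for ['the', 'there'] the sketch yields
'the(?:re)?', and the engine tries the 're' branch before falling back.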
A speed comparison of the two methods shows them to be roughly
equivalent. One small difference is that `build_regexp', when one of
the words is a prefix of another, automatically recognises the longest one,
while a naive regexp of '|'.join(words) recognises whatever happens to be
listed first. Of course, this is easily solved by sorting, then reversing
the word list before producing the naive regexp.
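The sorting fix can be sketched as below. The name `naive_regexp' is
hypothetical, and the `re.escape' call is an added assumption so that words
such as 'the-other' are matched literally. `sorted(words, reverse=True)'
gives the same order as sorting and then reversing.

```python
import re

def naive_regexp(words):
    """Hypothetical helper: a plain '|'.join of `words`, longest-first.

    A proper prefix always sorts before any word it begins, so sorting in
    reverse order lists every longer word ahead of its prefixes.
    """
    return '|'.join(re.escape(word) for word in sorted(words, reverse=True))

words = ['the', 'there', 'this']
# Unsorted: the first listed alternative that matches wins.
print(re.match('|'.join(words), 'therefore').group())      # the
# Sorted and reversed: 'there' is tried before 'the'.
print(re.match(naive_regexp(words), 'therefore').group())  # there
```

Only the prefix relation matters here; words that merely share a prefix,
like 'this' and 'there', can appear in any order without changing the result.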