Besides, when we're talking about performance on this scale, the order
in which the terms are evaluated by the regex engine might matter:
we'd want it to check against the more common case first to minimize
the time spent backtracking.
The only problem is that this cannot be done by looking at the expression
alone - the optimizer would need additional data about the inputs. That
would make the reordering more complicated, and the result would probably
no longer be "tidy".
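To illustrate what "additional data about inputs" could mean, here is a minimal sketch in Python (used only as an example of a backtracking engine). The helper name frequency_ordered_alternation and the sample text are made up for illustration; the idea is simply to count the terms in representative input and emit the alternation with the most frequent term first.

```python
import re
from collections import Counter

def frequency_ordered_alternation(terms, sample_text):
    """Hypothetical helper: put the terms seen most often in sample_text first."""
    # Count whole-word occurrences in the representative sample.
    words = Counter(re.findall(r"\w+", sample_text))
    # sorted() is stable, so terms with equal counts keep the written order.
    ordered = sorted(terms, key=lambda t: -words[t])
    return r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"

sample = "rabbit rabbit dog rabbit cat rabbit"
pattern = frequency_ordered_alternation(["cat", "cow", "dog", "rabbit"], sample)
print(pattern)  # \b(?:rabbit|cat|dog|cow)\b
```

Note that the result is ordered by frequency rather than alphabetically, which is exactly why it would no longer look "tidy". Reordering like this is only safe when no term is a prefix of another, or when the terms are anchored as with the \b here.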
I came to think that Roedy's main concern was human readability and not
performance of the expression. However, changing a human-crafted
expression is probably not wise - for performance reasons (see below),
but also for readability: the author may have chosen the order of the
terms deliberately, so that the expression reads best as written.
Now, regex engines don't _have_ to use the ordering of the terms as a
suggestion for the order of evaluation, but as far as I'm aware,
that's how it's usually done.
That does make sense: since backtracking NFA engines try the alternatives
in the order written (as opposed to DFAs, which in effect track all
alternatives at once), it is wise to honor the human-given order, because
then the author has a chance to do that optimization.
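A small sketch of the ordering behavior, using Python's re module as one example of a backtracking engine (other engines may behave differently; a POSIX leftmost-longest matcher would report "ab" in both cases):

```python
import re

# The same two alternatives, written in different orders.
short_first = re.compile(r"a|ab")
long_first = re.compile(r"ab|a")

# The engine takes the first alternative that leads to a match.
print(short_first.match("ab").group())  # a
print(long_first.match("ab").group())   # ab
```

So in such an engine the written order is not just a hint about speed; it can even decide which text is matched.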
So if "rabbit" is much more prevalent in your expected input than any of
"cat", "cow" or "dog", alphabetizing the terms would yield suboptimal
performance.
Right, but see above.
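If someone wants to check whether the order matters for their data, a rough measurement could look like the sketch below. It again uses Python's re and timeit; the input text is invented, and the numbers will depend on the engine and the machine, so no result is claimed here.

```python
import re
import timeit

# Invented input in which "rabbit" dominates.
text = " ".join(["rabbit"] * 10_000 + ["cat", "cow", "dog"])

alphabetical = re.compile(r"\b(?:cat|cow|dog|rabbit)\b")
common_first = re.compile(r"\b(?:rabbit|cat|cow|dog)\b")

for name, rx in [("alphabetical", alphabetical), ("common first", common_first)]:
    # Time 20 full scans of the text with each ordering.
    seconds = timeit.timeit(lambda: rx.findall(text), number=20)
    print(f"{name}: {seconds:.3f}s")
```

Both expressions find the same words; only the order in which the alternatives are tried differs.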
(That said, if you find that such low-level details about the regex's
efficiency matter enough to be significant for your program,
you should probably rethink your whole approach.)
Yes, for word search there are probably better approaches. I personally
haven't come across a case where this order would have mattered. In
many cases the cost of I/O will be dominant anyway.
Cheers
robert