Why is regex so slow?

Johannes Bauer

> All the O() tells you is the general shape of the line.

Nitpick: it only gives an *upper bound* for the complexity. Any function
that is within O(n) is also within O(n^2). Usually when people say O()
they actually mean capital Theta (which is the correct term).
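For reference, these are the standard textbook definitions behind the nitpick (added here, not quoted from the thread), plus the f(n) = n example worked through:

```latex
\begin{align*}
f \in O(g) &\iff \exists\, c > 0,\ n_0 \ \forall n \ge n_0:\ 0 \le f(n) \le c\,g(n) \\
f \in \Theta(g) &\iff \exists\, c_1, c_2 > 0,\ n_0 \ \forall n \ge n_0:\ c_1\,g(n) \le f(n) \le c_2\,g(n) \\[4pt]
n \le 1 \cdot n^2 \ (n \ge 1) &\;\Rightarrow\; n \in O(n^2) \\
c_1 n^2 \le n \iff c_1 \le \tfrac{1}{n} \to 0 &\;\Rightarrow\; n \notin \Theta(n^2), \qquad n \in \Theta(n)
\end{align*}
```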
> It's perfectly feasible that for the range of values of n that you care
> about in a particular application, there's an O(n^2) algorithm that's way
> faster than another O(log(n)) algorithm. [Though that becomes a lot less
> likely as n gets large.]
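A toy model makes the "range of n you care about" point concrete. The cost functions below are assumptions chosen purely for illustration (n*n steps for a hypothetical quadratic algorithm, 100 steps per probe for a hypothetical binary-search-style one), not measurements, and the helper names are made up:

```c
#include <assert.h>

/* Hypothetical cost models (made-up constants, not measurements):
   algorithm A costs n*n steps; algorithm B costs 100 steps per probe of a
   binary search, which makes at most floor(log2 n) + 1 probes on n items. */
static unsigned long quadratic_cost(unsigned long n)
{
    return n * n;
}

static unsigned long log_cost(unsigned long n)
{
    unsigned long probes = 0;
    while (n > 0) {          /* bit length of n == floor(log2 n) + 1 */
        probes++;
        n >>= 1;
    }
    return 100 * probes;
}

/* Largest n in [1, limit] where the quadratic model is strictly cheaper
   (0 if it never is). */
static unsigned long last_n_where_quadratic_wins(unsigned long limit)
{
    unsigned long last = 0;
    for (unsigned long n = 1; n <= limit; n++)
        if (quadratic_cost(n) < log_cost(n))
            last = n;
    return last;
}
```

With these constants the quadratic model is cheaper for every n up to 22 (484 vs. 500 steps) and loses from n = 23 on (529 vs. 500), which is the "less likely as n gets large" part.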

Since O() only gives upper bounds, it's also possible for an algorithm
within O(n^2) to always be faster than another algorithm within O(log n).
The O(n^2) algorithm could be Theta(1).
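A concrete (made-up) pair of cost functions shows this: f is constant, so it sits in O(n^2) as well as in Theta(1), yet it beats an honest Theta(log n) function for every n.

```latex
\begin{align*}
f(n) &= 5 \in \Theta(1), \quad 5 \le 5\,n^2 \ (n \ge 1) &&\Rightarrow\; f \in O(n^2) \\
g(n) &= \log_2 n + 10, \quad \log_2 n \le g(n) \le 11 \log_2 n \ (n \ge 2) &&\Rightarrow\; g \in \Theta(\log n) \subseteq O(\log n) \\
f(n) &= 5 < 10 \le g(n) &&\text{for every } n \ge 1
\end{align*}
```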

Regards,
Johannes

--
Not publicly, at least!
Ah, the latest and to date most brilliant stroke of our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa
 
Roy Smith

I'd just like to point out that your simple loop is looking at every
character of the input string. The simple "'ENQ' not in line" test can look
at the third character of the string and if it's none of 'E', 'N' or 'Q'
skip to checking the 6th and then the 9th. It doesn't have to touch the
intervening characters at all.
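To put numbers on that, here is a minimal C sketch (the function names are hypothetical) that counts how many text characters get compared by a straightforward left-to-right scan versus a Boyer-Moore-Horspool scan. Horspool checks each window's last character first and jumps ahead when that character cannot occur in the pattern; it is a simpler relative of the search in CPython's fastsearch.h, not the same code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: try every start position, compare left to right.
   *reads counts how many text characters get compared. */
static long naive_find(const char *text, size_t n, const char *pat, size_t m,
                       size_t *reads)
{
    *reads = 0;
    for (size_t i = 0; i + m <= n; i++) {
        size_t j = 0;
        while (j < m) {
            (*reads)++;
            if (text[i + j] != pat[j])
                break;
            j++;
        }
        if (j == m)
            return (long)i;
    }
    return -1;
}

/* Hypothetical helper: Boyer-Moore-Horspool. After each window, shift by an
   amount chosen from the text character under the pattern's last position:
   the full pattern length if it does not occur in pat[:-1], otherwise just
   far enough to line up its rightmost occurrence in pat[:-1]. */
static long horspool_find(const char *text, size_t n, const char *pat, size_t m,
                          size_t *reads)
{
    assert(m > 0);
    size_t shift[UCHAR_MAX + 1];
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        shift[c] = m;                        /* not in pat[:-1]: jump a whole pattern */
    for (size_t k = 0; k + 1 < m; k++)
        shift[(unsigned char)pat[k]] = m - 1 - k;

    *reads = 0;
    for (size_t i = 0; i + m <= n; i += shift[(unsigned char)text[i + m - 1]]) {
        size_t j = m;                        /* compare right to left */
        while (j > 0) {
            (*reads)++;
            if (text[i + j - 1] != pat[j - 1])
                break;
            j--;
        }
        if (j == 0)
            return (long)i;
    }
    return -1;
}
```

On 30 characters containing no E, N or Q, `naive_find` compares 28 characters while `horspool_find` compares 10: the 3rd, 6th, 9th, and so on.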

It's been a while since I looked at Boyer-Moore in detail. Looking at Objects/stringlib/fastsearch.h from the 2.7.4 source, it occurs to me that:

    /* create compressed boyer-moore delta 1 table */

    /* process pattern[:-1] */
    for (i = 0; i < mlast; i++) {
        STRINGLIB_BLOOM_ADD(mask, p[i]);
        if (p[i] == p[mlast])
            skip = mlast - i - 1;
    }
    /* process pattern[-1] outside the loop */
    STRINGLIB_BLOOM_ADD(mask, p[mlast]);

is essentially (well, sort of) the same as the compile() step of a regex. For the (presumably) common use case of searching many strings for the same substring (which is what we're doing here), it seems like it would be a win to cache the mask and reuse it if the search string's id is the same as the last search string's id. The overhead on cache misses would be a single pointer comparison. Has anybody looked at doing that?
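The caching idea can be sketched in C. This is a simplified, byte-only model written for illustration, not CPython code: `search_cache`, `build_table`, `cached_find` and the `BLOOM`/`BLOOM_ADD` macros are hypothetical names, the 64-bit mask width is an assumption, and the search loop only approximates the 2.7 fast-search path. The "compile" step mirrors the excerpt above and is rerun only when the pattern pointer (or length) differs from the previous call.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a 64-bit Bloom mask. */
#define BLOOM_WIDTH 64
#define BLOOM_ADD(mask, ch) ((mask) |= 1ULL << ((unsigned char)(ch) & (BLOOM_WIDTH - 1)))
#define BLOOM(mask, ch)     ((mask) & (1ULL << ((unsigned char)(ch) & (BLOOM_WIDTH - 1))))

/* Hypothetical one-slot cache, keyed on the pattern's address and length. */
typedef struct {
    const char *pattern;       /* NULL until the first search */
    size_t len;
    unsigned long long mask;   /* Bloom filter of the pattern's characters */
    size_t skip;               /* extra shift after a last-character match */
    unsigned long rebuilds;    /* how often the "compile" step actually ran */
} search_cache;

/* The "compile" step: the same preprocessing as the fastsearch.h excerpt. */
static void build_table(search_cache *c, const char *p, size_t m)
{
    size_t mlast = m - 1;
    c->mask = 0;
    c->skip = mlast - 1;
    for (size_t i = 0; i < mlast; i++) {    /* process pattern[:-1] */
        BLOOM_ADD(c->mask, p[i]);
        if (p[i] == p[mlast])
            c->skip = mlast - i - 1;
    }
    BLOOM_ADD(c->mask, p[mlast]);           /* process pattern[-1] */
    c->pattern = p;
    c->len = m;
    c->rebuilds++;
}

/* Index of the first occurrence of p[0..m) in s[0..n), or -1. */
static long cached_find(search_cache *c, const char *s, size_t n,
                        const char *p, size_t m)
{
    assert(m >= 2);                 /* 1-character patterns need a separate path */
    if (c->pattern != p || c->len != m)
        build_table(c, p, m);       /* rebuild only on a miss; a hit costs just this check */
    if (n < m)
        return -1;

    size_t mlast = m - 1;
    for (size_t i = 0; i <= n - m; i++) {
        if (s[i + mlast] == p[mlast]) {
            size_t j = 0;           /* last characters agree: verify the rest */
            while (j < mlast && s[i + j] == p[j])
                j++;
            if (j == mlast)
                return (long)i;
            /* s[i+m] not in the pattern: no window starting at i+1..i+m can match */
            if (i + m < n && !BLOOM(c->mask, s[i + m]))
                i += m;
            else
                i += c->skip;
        } else if (i + m < n && !BLOOM(c->mask, s[i + m])) {
            i += m;                 /* same Bloom-filter jump */
        }
    }
    return -1;
}
```

Searching several lines with the same `pat` pointer runs `build_table` once; switching to another pattern rebuilds. Keying on the address alone assumes the pattern is immutable and still alive: if a different pattern could later occupy the same address, this sketch would reuse a stale table, so a real implementation would presumably need to hold a reference to the pattern or compare contents.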
 
