(e-mail address removed) wrote:
: Hello. I've discovered that this regex is a bottleneck:
: /(?:<!\-.*?>.*?){5}/sig
: It tries to locate as many html comments in chunks of five which can
: make for quite some possibilities in longer files. Is there a way to
: optimize this or do you consider it to be simply poor practice?
First, there are HTML parsers that may help do whatever you want to do,
but ignoring that for the moment...
To start with, a comment does not end with >, it ends with --> (and it
starts with <!--, so why not test for that correctly as well?)
<!--.*?-->
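
To show the difference between the two delimiter tests, here is a minimal sketch in Python (its re module accepts the same patterns; re.S plays the role of /s). The sample string is made up for illustration.

```python
import re

# Hypothetical input: the first comment contains a '>' in its body.
html = "<!-- a > b --> text <!-- second -->"

# Delimiters as written in the original regex: stops at the first '>'.
loose = re.compile(r"<!\-.*?>", re.S)
# Full delimiters: requires '<!--' to open and '-->' to close.
strict = re.compile(r"<!--.*?-->", re.S)

loose_matches = loose.findall(html)
strict_matches = strict.findall(html)

print(loose_matches)   # ['<!-- a >', '<!-- second -->']
print(strict_matches)  # ['<!-- a > b -->', '<!-- second -->']
```

The loose version cuts the first comment short at the embedded >, which is probably not what was intended.
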
If you know the comments can't have > in them, then a character class
would be quicker than .*?
<!--[^>]*>
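
A quick sketch of what the character class version does and where its assumption bites, again in Python with invented sample strings:

```python
import re

# Character-class version: the body may contain anything except '>'.
char_class = re.compile(r"<!--[^>]*>")

# Hypothetical inputs: one well-behaved, one with '>' inside a comment.
clean = "<p>x</p><!-- note one --><!-- note two -->"
tricky = "<!-- if a > b then -->"

clean_matches = char_class.findall(clean)
tricky_matches = char_class.findall(tricky)

print(clean_matches)   # ['<!-- note one -->', '<!-- note two -->']
print(tricky_matches)  # ['<!-- if a >']
```

So it is only safe when you know the comment bodies never contain > .
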
Next, I wonder why you would need to find comments in blocks of 5.
Even if you really do want blocks of 5 comments at a time, the /g says to
match globally, so the search is repeated through the entire file, and
wherever fewer than 5 comments remain the engine has to try every way of
stretching the .*? pieces before it can give up. I suspect that is the
biggest bottleneck.
I suspect you don't really want /g at all.
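
Here is a rough sketch of the difference between a single match and a global one, in Python, where search() stands in for a match without /g and finditer() for a match with /g. The 12-comment input is invented, and the pattern uses the corrected delimiters.

```python
import re

# Hypothetical sample: 12 comments separated by filler text.
text = " x ".join(f"<!-- c{i} -->" for i in range(1, 13))

block_of_five = re.compile(r"(?:<!--.*?-->.*?){5}", re.S)
single_comment = re.compile(r"<!--.*?-->", re.S)

# Without /g: one search, which stops at the first block of five.
first = block_of_five.search(text)
comments_in_first = len(single_comment.findall(first.group()))

# With /g: the search restarts after each match until the input runs out.
all_blocks = [m.group() for m in block_of_five.finditer(text)]

print(comments_in_first, len(all_blocks))  # 5 2
```

Twelve comments give two full blocks. The last two comments cannot form a block, and that failed final attempt is where the engine does its backtracking, so on a long file the cost of the global version can grow quickly.
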
Also, the .*? is a potential bug, because it does not _prevent_ the re
from matching two (or more) comments at the place you intend to match a
single comment, it simply says "match no more than is necessary to get a
match", so the regex engine could be trying combinations of multiple
comments in an attempt to get a {5} /g match to work.
I'm not sure if the above _is_ a bug, but I can't say it isn't. The
character class I mentioned is not prone to this issue as it simply can't
match past the > , but that assumes (as I mentioned) that the comments
never use > .
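
To make the .*? concern concrete, here is a small sketch in Python. The trailing END marker is purely hypothetical; it just stands in for "whatever the regex needs to see next" so that the engine is forced to stretch the lazy part.

```python
import re

# Hypothetical input: two comments, only the second is followed by END.
text = "<!-- a --> junk <!-- b --> END"

# Lazy version: '.*?' may grow past the first '-->' if that is what it
# takes for the rest of the pattern to match.
lazy = re.compile(r"<!--.*?-->\s*END", re.S)
# Character-class version: the body can never cross a '>'.
bounded = re.compile(r"<!--[^>]*>\s*END")

lazy_match = lazy.search(text).group()
bounded_match = bounded.search(text).group()

print(lazy_match)     # <!-- a --> junk <!-- b --> END
print(bounded_match)  # <!-- b --> END
```

The lazy pattern swallows both comments in the slot meant for one, while the character class version fails at the first comment and moves on to the second.
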
Finally, /i is to ignore case, but nothing you look for uses case, so why
specify it? (Though I doubt that makes a difference here.)