R
Randy Kramer
This is going to seem a little strange (for a number of reasons I might
mention below), but I would like to iterate through a string from beginning
to end, examining each character, then discarding (or moving) them
(elsewhere).
This is so that I can, upon finding certain character(s), run one of several
REs on the next part of the string. I'm looking to find a very efficient way
to do this.
Here are the approaches I'm considering so far. Any other suggestions?
* ?? (Is there any method which chops the first character from the string?
How efficient is that (especially compared to the method described in the
next bullet).
* Reverse the string, then use something like s[length (-1)] to examine
then chop to discard the last character. The problem with this approach is
that I then have to reverse the string again to have the REs work. (I
(briefly) thought about just setting up the REs in reverse, but I foresee a
lot of difficulty there--scanning through the string in reverse (from last
character to first, negates the advantage I was hoping to gain, that of
starting the RE match only from positions where the RE could possibly match.)
Barring anything better, the approach I may take is to iterate through the
string and, when I find a potential RE match, use s[n,len] to return a
partial string to be checked against the RE.
Aside: I'm trying to make a very efficient parser in Ruby for a wiki-like
thing I want to build, and am trying to avoid the approach of simply making
multiple scans through the document for each of a fairly large number of REs.
I might be guilty of premature optimization, but I prefer to think of it as
doing some proof of concept testing before committing to a design.
I have done some tests that show a 1 to 10% savings in time by taking a
similar approach for REs that could only match at the beginning of a string.
(At some point I'll "publish" those results on WikiLearn (or the next
incarnation thereof).) The next REs are considerably more complex as they
can match anywhere in the string--if the savings from the same approach for
them is only 1 to 10%, the complexity will not be worth it. If by some
chance it exceeds say 50%, I will seriously consider that complexity.
Randy Kramer
mention below), but I would like to iterate through a string from beginning
to end, examining each character, then discarding (or moving) them
(elsewhere).
This is so that I can, upon finding certain character(s), run one of several
REs on the next part of the string. I'm looking to find a very efficient way
to do this.
Here are the approaches I'm considering so far. Any other suggestions?
* ?? (Is there any method which chops the first character from the string?
How efficient is that (especially compared to the method described in the
next bullet).
* Reverse the string, then use something like s[length (-1)] to examine
then chop to discard the last character. The problem with this approach is
that I then have to reverse the string again to have the REs work. (I
(briefly) thought about just setting up the REs in reverse, but I foresee a
lot of difficulty there--scanning through the string in reverse (from last
character to first, negates the advantage I was hoping to gain, that of
starting the RE match only from positions where the RE could possibly match.)
Barring anything better, the approach I may take is to iterate through the
string and, when I find a potential RE match, use s[n,len] to return a
partial string to be checked against the RE.
Aside: I'm trying to make a very efficient parser in Ruby for a wiki-like
thing I want to build, and am trying to avoid the approach of simply making
multiple scans through the document for each of a fairly large number of REs.
I might be guilty of premature optimization, but I prefer to think of it as
doing some proof of concept testing before committing to a design.
I have done some tests that show a 1 to 10% savings in time by taking a
similar approach for REs that could only match at the beginning of a string.
(At some point I'll "publish" those results on WikiLearn (or the next
incarnation thereof).) The next REs are considerably more complex as they
can match anywhere in the string--if the savings from the same approach for
them is only 1 to 10%, the complexity will not be worth it. If by some
chance it exceeds say 50%, I will seriously consider that complexity.
Randy Kramer