robic0 said:
Finding text from the phrase back to the keyword is indeed hard since
searches proceed left to right in general.
That is actually getting to the heart of my question. If somebody could
tell me how to do this, then my problem would be solved.
I looked in the O'Reilly regular expressions book but could not find a
section on it, although I only skimmed through the chapters.
I had a feeling that lookaheads could be helpful, because they only
mark a position without consuming text. But I guess that still leaves me
with defining where that position is, so I figured they aren't the
answer to my question.
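For what it's worth, a lookahead does not have to pin down one position in advance. It can be repeated in front of every character that gets matched. Here is a minimal sketch of that idea, assuming the sample string from the script at the end of this post; the variable name $between is just something I made up for the illustration.

```perl
use strict;
use warnings;

# Same sample string as in the script at the end of this post.
my $string =
'<tr> dfsdfre <tr>fsdsfd35gd <tr>khf758 <tr>afdga654jhuotj <input type="text"> 67kfbs356<tr>sh tu65 <tr> hbrubs<tr>';

# (?!<tr>) is zero-width: it only checks that "<tr>" does not start here.
# Repeating "(?!<tr>)." steps forward one character at a time and refuses
# to step onto another <tr>, so only the last <tr> before the input tag
# can start a successful match.
my ($between) = $string =~ m{<tr>((?:(?!<tr>).)*?)<input type="text">}s;

print "found: $between\n" if defined $between;
```

With this string, $between should end up holding the text between the fourth <tr> and the input tag. I have not tried this on real HTML, where attributes and whitespace inside the tags would need a looser pattern.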
HTML/XML is indeed easier to parse because of its markup, and yet it is
one of the hardest things to do correctly.
Actually the real example consists of HTML. So there are all kinds of
different tags, and of course they can vary. I want to grab a group of
radio buttons that are contained inside a table. But I only want to
grab the first and last row of the table and everything within.
The back of the expression is easy, but searching from the first radio
button to the left is difficult: up to that point the document contains
all kinds of tags, words and so on, so you always catch something extra
at the front. That's why I would like to search from right to left.
Then I could ignore all the noise in front.
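One way to get a right-to-left search out of a left-to-right engine is to reverse the text and the markers, match, and then reverse the capture back. The following is only a sketch under that assumption, using the sample string from the script at the end of this post; $reversed, $end_marker, $start_marker and $between are names I picked for the example.

```perl
use strict;
use warnings;

# Same sample string as in the script at the end of this post.
my $string =
'<tr> dfsdfre <tr>fsdsfd35gd <tr>khf758 <tr>afdga654jhuotj <input type="text"> 67kfbs356<tr>sh tu65 <tr> hbrubs<tr>';

# Reverse the text, so "right to left" becomes an ordinary forward scan.
my $reversed = scalar reverse $string;

# The markers have to be written backwards as well.
my $end_marker   = scalar reverse '<input type="text">';
my $start_marker = scalar reverse '<tr>';

my $between;
# Reversed input tag first, then a lazy capture up to the nearest
# reversed <tr>, which is the closest <tr> to the left in the original.
if ($reversed =~ /\Q$end_marker\E(.*?)\Q$start_marker\E/s) {
    $between = scalar reverse $1;    # flip the capture back
}

print "found: $between\n" if defined $between;
```

Note that this starts from the last input tag in the text, since that is the first one seen in the reversed string. If there are several radio buttons, the search would have to be anchored differently.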
Some other alternatives:
- Method 1 is a negated character class with one character, '<'.
<> are very powerful delimiters.
This would fail of course, because it will capture many other tags in
front of my radio button group.
- Method 2 is an alternative to the negative assertion construct (?!...)
that I believe was mentioned by another poster. I believe the method below
is a close approximation to negative assertions.
I'm not at all comfortable with negative assertions; logically, however,
that is the only way.
I made all the tags start tags, and narrowed down the regex to the range
of interest, the start/end text;
This could work if I let it run through to the very end. However,
I'm not sure; I have to try.
use strict;
use warnings;
my $string =
'<tr> dfsdfre <tr>fsdsfd35gd <tr>khf758 <tr>afdga654jhuotj <input type="text"> 67kfbs356<tr>sh tu65 <tr> hbrubs<tr>';
# -- method 1 --
my ($capt) = $string =~ m!(<tr>[^<]*<input type="text">)!;
print "found: $capt\n";
# -- method 2 --
while ($string =~ /<tr>(.*?)(?:(<tr>)|<input type="text">)/g)
#                      1   1(  2    2|                   )
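{
    if (defined $2)
    {
        # Stopped at another <tr> instead of the input tag: back up over
        # that <tr> (4 characters) so the next attempt starts from it.
        pos($string) = pos($string) - 4;
        next;
    }
    print "found: $1\n";
}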
__END__
found: <tr>afdga654jhuotj <input type="text">
found: afdga654jhuotj
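A third possibility, besides the two methods in the script above, might be to let a greedy .* in front of the pattern do the backing up. This is only a sketch under the assumption that there is a single input tag of interest; $between is a name made up for the example.

```perl
use strict;
use warnings;

# Same sample string as in the script above.
my $string =
'<tr> dfsdfre <tr>fsdsfd35gd <tr>khf758 <tr>afdga654jhuotj <input type="text"> 67kfbs356<tr>sh tu65 <tr> hbrubs<tr>';

# The leading .* first swallows the whole string and then gives characters
# back one at a time, so the <tr> that finally matches is the rightmost one
# that still has an input tag somewhere after it.
my ($between) = $string =~ m{\A.*<tr>(.*?)<input type="text">}s;

print "found: $between\n" if defined $between;
```

If the text holds several input tags, this picks the last one only, so it is not a drop-in replacement for the while loop in method 2.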