Brian McCauley said:
hi community...
I'm having a hard time with a "really good" regex.
My regex should substitute/insert (in a PDF) something like
/V () with my new value...
It works the first time, when I really have /V (),
but the second time it substitutes only up to the first ), without
checking for a backslash before the ).
s#([^\\]/V\s*\()((\))|(.*?[^\\]?\)))#$1$value)#ms
I'm totally confused and have no idea where the mistake in this regex is.
Well since you do not define "something like" it is impossible to know
what your regex is supposed to match. Without knowing that there's
only so much one can do.
First we can remove some of the () that do nothing.
s#([^\\]/V\s*\()(\)|.*?[^\\]?\))#$1$value)#ms
Then we can observe that the subpattern /.*?[^\\]?/ will match exactly
the same as /.*?/
s#([^\\]/V\s*\()(\)|.*?\))#$1$value)#ms
Next we observe that ')', the only thing matched by the subpattern
/\)/, could also be the 'best' match for /.*?\)/
It follows that /\)|.*?\)/ will simplify to /.*?\)/
s#([^\\]/V\s*\().*?\)#$1$value)#ms
You did a really nice simplification.
So I've simplified your regex so it's easier to see what it does. But
without knowing what it was supposed to do I can't say how to change
it so that it does what you want.
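As a quick check of the simplification steps, here is a minimal sketch that runs the original and the simplified substitution side by side. Only the two regexes come from the thread; the sub names and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrappers: only the two regexes are taken from the thread.
sub replace_original {
    my ($obj, $value) = @_;
    $obj =~ s#([^\\]/V\s*\()((\))|(.*?[^\\]?\)))#$1$value)#ms;
    return $obj;
}

sub replace_simplified {
    my ($obj, $value) = @_;
    $obj =~ s#([^\\]/V\s*\().*?\)#$1$value)#ms;
    return $obj;
}

my $bsl = '\\';    # a single backslash, kept in a variable for readability

# Made-up samples: an empty value, and a value containing an escaped ).
for my $obj ('x/V ()', "x/V (a$bsl)b)") {
    print "$obj => ", replace_original($obj, 'new'),
          " | ", replace_simplified($obj, 'new'), "\n";
}
```

On both samples the two subs should return the same string. On the second sample both stop at the escaped ), which is the behaviour the original question complains about.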
that's what i really need to learn - thank u a lot !
Random shot-in-the-dark. Your original mistake was a spurious ? after [^\\]
s#([^\\]/V\s*\()(\)|.*?[^\\]\))#$1$value)#ms
one of my mistakes...
Although I'm usually not a fan of negative look-behind I think this is
a case I would use it.
s#((?<!\\)/V\s*\().*?(?<!\\)\)#$1$value)#ms
Works fine until I use different numbers of \ in front of the ).
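One way to cope with any number of backslashes, sketched here under the assumption that the value contains no unescaped nested parentheses, is to stop looking behind and instead consume the string one unit at a time: either an ordinary character or a backslash together with the character it escapes. The sub name set_value and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread.
sub set_value {
    my ($obj, $value) = @_;
    # After "/V (", consume either an ordinary character or a backslash plus
    # the character it escapes. A ) then only closes the string when an even
    # number of backslashes (possibly zero) stands directly before it.
    $obj =~ s#((?<!\\)/V\s*\()(?:[^\\)]|\\.)*\)#$1$value)#s;
    return $obj;
}

my $bsl = '\\';    # a single backslash
my $obj = "x /V (first text$bsl$bsl$bsl)second text$bsl)last text$bsl$bsl) /DA (/Cour 10 Tf 0 g)";

$obj = set_value($obj, "easy text$bsl)more");
print "$obj\n";
$obj = set_value($obj, 'plain');
print "$obj\n";
```

Because every backslash is consumed together with the character after it, the pattern never has to count backslashes, and the substitution can be repeated on its own output.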
I'd also consider using the /x qualifier.
only for easier reading or why else ?
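As far as I know, /x changes only how the pattern is written (whitespace is ignored and # starts a comment), not what it matches. A small sketch with the look-behind regex from above, once compact and once spread out with /x; the sub names are hypothetical, and the delimiters are switched to {} because # now starts a comment.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the look-behind regex from the thread.
sub replace_compact {
    my ($obj, $value) = @_;
    $obj =~ s#((?<!\\)/V\s*\().*?(?<!\\)\)#$1$value)#ms;
    return $obj;
}

sub replace_commented {
    my ($obj, $value) = @_;
    $obj =~ s{
        ( (?<!\\) /V \s* \( )   # capture "/V (" when no backslash precedes it
        .*?                     # as little as possible
        (?<!\\) \)              # up to a ) with no backslash directly before it
    }{$1$value)}xms;
    return $obj;
}

my $bsl = '\\';    # a single backslash
my $obj = "x/V (a$bsl)b) y";
print replace_compact($obj, 'new'), "\n";
print replace_commented($obj, 'new'), "\n";
```

Both calls should print the same line. Inside an /x pattern it is safer to keep $ and @ out of the comments, since the pattern is still interpolated.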
Alternative shot-in-the-dark, see FAQ: Can I use Perl regular
expressions to match balanced text?
Does this mean I can't do it with Perl 5?!
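For what it's worth, a sketch of the balanced-text idea, assuming a perl of 5.10 or later (recursive subpatterns such as (?2) are not available before that) and assuming the value may contain unescaped but balanced parentheses. The sub name set_value_balanced and the sample data are hypothetical.

```perl
use strict;
use warnings;
use 5.010;    # recursive subpatterns need perl 5.10 or later

# Hypothetical helper, not from the thread.
sub set_value_balanced {
    my ($obj, $value) = @_;
    $obj =~ s{
        (?<!\\) ( /V \s* )      # group 1: the key, not preceded by a backslash
        (                       # group 2: one balanced parenthesised string
            \(
            (?: [^\\()]         #   an ordinary character
              | \\ .            #   or a backslash and the character it escapes
              | (?2)            #   or a nested balanced string
            )*
            \)
        )
    }{$1($value)}xs;
    return $obj;
}

my $bsl = '\\';    # a single backslash
my $obj = "x /V (a (nested) b$bsl) c) /DA (/Cour 10 Tf 0 g)";
print set_value_balanced($obj, 'new'), "\n";
```

On older perls the FAQ entry mentioned above is the place to look.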
here is my solution...
(maybe not the shortest - but it works in every case i know) :
$bsl = '\\'; # only for easier reading
$new1 = "$bsl)$bsl$bsl$bsl)simple text$bsl$bsl$bsl$bsl$bsl)$bsl$bsl";
$new2 = "easy text$bsl$bsl$bsl)simple text$bsl$bsl$bsl$bsl$bsl)";
$new3 = "$bsl$bsl$bsl)simple text";
$obj = ".../T (element) /FT /Tx
/V (first text$bsl$bsl$bsl)second text$bsl)last text$bsl$bsl)
/DA (/Cour 10 Tf 0 g)...";
print "$obj\n";
# test a little bit within foreach
foreach $new ($new1,$new2,$new3,$new3,$new3,$new2,$new2,$new1,$new3) {
    $obj =~ s#([^\\]/V\s*\()(.*\))#$1#ms;  # cut out everything after "/V (" up to the last )
    $_ = $2;                                # the part to replace plus the rest to keep
    @_ = split (/(\\+\))/ms,$_);            # split on runs of backslashes followed by )
    if (!$#_) {                             # no backslashes in front of any )
        s#(.*?)\)#$new)#ms;                 # replace up to the first )
        $obj .= $_;
    } else {
        while (@_) {
            $_ = shift @_;
            if (/\\+\)/) {                  # backslashes followed by )
                last if length($_)%2;       # even number of backslashes => unescaped )
            } elsif (/^\)$/) {              # a lone )
                last;
            } elsif (/.*?\)(.*)/ms) {       # ) in the middle or at the end
                unshift @_,$1;              # save possible rest
                last;
            }                               # i miss the else ;(
        }
    }
    $obj =~ s#([^\\]/V\s*\()#$1$new)#ms;
    $obj.=join('',@_);
    print "$obj\n";
}
best regards
Thomas