really hard regex ;(

Thomas

Hi community,
I am struggling with a "really good" regex.
My regex should substitute/insert (in a PDF) my new value into something like
/V ()
The regex works the first time, when I really have /V (),
but the second time it substitutes only up to the first ) without checking for a
\\ before the )

s#([^\\]/V\s*\()((\))|(.*?[^\\]?\)))#$1$value)#ms

I'm totally confused and have no idea where the mistake in this regex is.

best regards
Thomas
 
Anno Siegel

It appears you are trying to come up with a regex that deals with nested
constructs. Don't. Use a module that deals with such things, like
Text::Balanced.

It may be possible to do this with a regex, but that would involve a
recursive qr// (an obscenity) and is no fun at all.
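
To give an idea of what such a recursive pattern looks like, here is a minimal sketch. It assumes Perl 5.10 or later, where `(?1)` can recurse into a capture group without the older `(??{ ... })` construct. The helper name `first_group` and the sample string are made up for illustration and are not from this thread.

```perl
use strict;
use warnings;

# One balanced parenthesised group, honouring backslash escapes.
# Needs Perl 5.10+ for the recursion and the possessive quantifiers.
my $balanced = qr{
    (                   # group 1: a complete balanced group
        \(
        (?: [^()\\]++   # a run of ordinary characters
          | \\ .        # a backslash and the character it escapes
          | (?1)        # a nested group, matched by recursing into group 1
        )*+
        \)
    )
}xs;

# Hypothetical helper: returns the first balanced group in the text, or undef.
sub first_group {
    my ($text) = @_;
    return $text =~ $balanced ? $1 : undef;
}

print first_group('/V (a (nested) \) value) /DA (x)'), "\n";
```

The pattern treats `\)` as ordinary content and only stops at the closing parenthesis that balances the first opening one.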

Anno
 
Brian McCauley

Well since you do not define "something like" it is impossible to know
what your regex is supposed to match. Without knowing that there's
only so much one can do.

First we can remove some of the () that do nothing.

s#([^\\]/V\s*\()(\)|.*?[^\\]?\))#$1$value)#ms

Then we can observe that the subpattern /.*?[^\\]?/ will match exactly
the same as /.*?/

s#([^\\]/V\s*\()(\)|.*?\))#$1$value)#ms

Next we observe that ')', the only thing matched by the subpattern
/\)/, could also be the 'best' match for /.*?\)/

It follows that /\)|.*?\)/ will simplify to /.*?\)/

s#([^\\]/V\s*\().*?\)#$1$value)#ms
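
The simplification can be checked with a small script. This is only a sketch: the helper `both_forms` and the three sample strings are invented for the test. It applies the original and the simplified substitution to the same input and compares the results.

```perl
use strict;
use warnings;

# Apply the original and the simplified substitution to copies of the
# same input and return both results.
sub both_forms {
    my ($text, $value) = @_;
    (my $orig = $text) =~ s#([^\\]/V\s*\()((\))|(.*?[^\\]?\)))#$1$value)#ms;
    (my $simp = $text) =~ s#([^\\]/V\s*\().*?\)#$1$value)#ms;
    return ($orig, $simp);
}

for my $in (' /V () /DA (x)', ' /V (old) /DA (x)', ' /V (a\)b) /DA (x)') {
    my ($orig, $simp) = both_forms($in, 'NEW');
    print "$in => $orig | ", ($orig eq $simp ? 'same' : 'different'), "\n";
}
```

On these inputs both forms give identical output. For the third input both stop at the escaped `\)` and leave `b)` behind, which is the behaviour described in the question.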

So I've simplified your regex so it's easier to see what it does. But
without knowing what it was supposed to do I can't say how to change
it so that it does what you want.

Random shot-in-the-dark. Your original mistake was a spurious ? after [^\\]

s#([^\\]/V\s*\()(\)|.*?[^\\]\))#$1$value)#ms

Although I'm usually not a fan of negative look-behind I think this is
a case I would use it.

s#((?<!\\)/V\s*\().*?(?<!\\)\)#$1$value)#ms
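
A quick way to see what the look-behind version does is to run it on a few sample strings. The helper name `replace_lookbehind` and the samples are assumptions made for this sketch. The third sample ends the value with an escaped backslash (`\\`) directly before the closing parenthesis.

```perl
use strict;
use warnings;

# The look-behind substitution from above, wrapped in a helper.
sub replace_lookbehind {
    my ($text, $value) = @_;
    $text =~ s#((?<!\\)/V\s*\().*?(?<!\\)\)#$1$value)#ms;
    return $text;
}

my @samples = (
    ' /V () /DA (x)',         # empty value
    ' /V (a\)b) /DA (x)',     # escaped ) inside the value
    ' /V (a\\\\) /DA (x)',    # value ends in an escaped backslash
);
print replace_lookbehind($_, 'NEW'), "\n" for @samples;
```

The first two samples come out as ` /V (NEW) /DA (x)`. In the third, the closing parenthesis is preceded by a backslash, so the look-behind skips it and the match runs on to the `)` of the next field, which leaves only ` /V (NEW)`. A single-character look-behind cannot tell an escaped `)` from a `)` that follows an escaped backslash.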

I'd also consider using the /x qualifier.
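
As a sketch of what /x buys you: the same look-behind regex, spread over several lines with comments. /x does not change what the pattern matches. It only makes the regex engine ignore unescaped whitespace and treat # as the start of a comment. The delimiters are switched from # to braces here so that # is free for comments, and the helper name `set_value` is invented for the example.

```perl
use strict;
use warnings;

# Same substitution as the look-behind version, written with /x.
sub set_value {
    my ($text, $value) = @_;
    $text =~ s{
        (               # group 1: the key and the opening paren
            (?<!\\)     #   not preceded by a backslash
            /V \s* \(
        )
        .*?             # the old value, as short as possible
        (?<!\\) \)      # a closing paren not preceded by a backslash
    }{$1$value)}xms;
    return $text;
}

print set_value(' /V (old) /DA (x)', 'NEW'), "\n";
```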

Alternative shot-in-the-dark, see FAQ: Can I use Perl regular
expressions to match balanced text?

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
 
Thomas

Bill Segraves said:
> If you mean to replace instances of
> /V (old_value)
> with
> /V (new_value)
>
> in a PDF file, your efforts are likely misguided, as a PDF file is not
> intended to be edited in this fashion. While you may be able to make the
> substitutions you desire in some cases, your efforts will often be
> frustrated by the generation of invalid PDF files.

You are right ;) I could tell you many stories about frustration caused by
PDF generation.

> You'll enjoy greater success if you use the import/export (FDF, XFDF), and
> submit methods that are officially supported for PDFs.

For sure, this is one possible solution, but not without some
problems for me:
1) Import/export only works with Adobe writer or an equivalent (or did I
miss something?).
2) The forms I want to publish should be encoded so that nobody can change
any content they shouldn't change.
=> The final result of all the work (mine and everyone else's) should be a PDF
which is totally read-only, so other people can read the document
but nobody can change it.
3) Exporting form data is not the method I want to offer my
people. They should be able to get a filled form and send it to my
script without any other tools.

> IMO, your objective is more appropriate for newsgroup comp.text.pdf, where
> you'll find a number of experts willing to guide you on legitimate issues
> with the use of PDF files.

Maybe, but I think my problem is much closer to Perl regexes than to
anything PDF-specific. That's why I posted it here.

> Good luck.
> Bill Segraves

Thanks ;)
Thomas
 
Thomas

Brian McCauley said:
> Well since you do not define "something like" it is impossible to know
> what your regex is supposed to match. Without knowing that there's
> only so much one can do.
>
> First we can remove some of the () that do nothing.
>
> s#([^\\]/V\s*\()(\)|.*?[^\\]?\))#$1$value)#ms
>
> Then we can observe that the subpattern /.*?[^\\]?/ will match exactly
> the same as /.*?/
>
> s#([^\\]/V\s*\()(\)|.*?\))#$1$value)#ms
>
> Next we observe that ')', the only thing matched by the subpattern
> /\)/, could also be the 'best' match for /.*?\)/
>
> It follows that /\)|.*?\)/ will simplify to /.*?\)/
>
> s#([^\\]/V\s*\().*?\)#$1$value)#ms

You did a really nice simplification.

> So I've simplified your regex so it's easier to see what it does. But
> without knowing what it was supposed to do I can't say how to change
> it so that it does what you want.

That's what I really need to learn. Thank you a lot!

> Random shot-in-the-dark. Your original mistake was a spurious ? after [^\\]
>
> s#([^\\]/V\s*\()(\)|.*?[^\\]\))#$1$value)#ms

One of my mistakes...

> Although I'm usually not a fan of negative look-behind I think this is
> a case I would use it.
>
> s#((?<!\\)/V\s*\().*?(?<!\\)\)#$1$value)#ms

Works fine until I use varying numbers of \

> I'd also consider using the /x qualifier.

Only for easier reading, or why else?

> Alternative shot-in-the-dark, see FAQ: Can I use Perl regular
> expressions to match balanced text?

Does this mean I can't do it with Perl 5?! ;)

Here is my solution
(maybe not the shortest, but it works in every case I know):

use strict;
use warnings;

my $bsl  = '\\'; # only for easier reading
my $new1 = "$bsl)$bsl$bsl$bsl)simple text$bsl$bsl$bsl$bsl$bsl)$bsl$bsl";
my $new2 = "easy text$bsl$bsl$bsl)simple text$bsl$bsl$bsl$bsl$bsl)";
my $new3 = "$bsl$bsl$bsl)simple text";
my $obj  = ".../T (element) /FT /Tx
/V (first text$bsl$bsl$bsl)second text$bsl)last text$bsl$bsl)
/DA (/Cour 10 Tf 0 g)...";
print "$obj\n";
# test a little bit within foreach
foreach my $new ($new1,$new2,$new3,$new3,$new3,$new2,$new2,$new1,$new3) {
    $obj =~ s#([^\\]/V\s*\()(.*)#$1#ms; # cut off everything after "/V ("
    $_ = $2;                            # old value plus the rest to save
    @_ = split (/(\\+\))/ms,$_);        # split at runs of BSL followed by )
    if (!$#_) {                         # no BSL in front of any )
        s#.*?\)##ms;                    # drop the old value up to the first )
        @_ = ($_);                      # keep only the rest
    } else {
        while (@_) {
            $_ = shift @_;
            if (/\\+\)/) {              # BSL run with added )
                last if length($_)%2;   # even number of BSL => unquoted )
            } elsif (/^\)$/) {          # single char
                last;
            } elsif (/.*?\)(.*)/ms) {   # ) in the middle or at the end
                unshift @_,$1;          # save possible rest
                last;
            }                           # anything else is part of the old value
        }
    }
    $obj =~ s#([^\\]/V\s*\()#$1$new)#ms; # insert the new value and its )
    $obj .= join('',@_);                 # put the saved rest back
    print "$obj\n";
}
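
For comparison, the same replacement can be sketched as a single substitution that consumes every backslash together with the character it escapes, so the number of backslashes in front of a `)` no longer matters. This is a sketch under two assumptions: the value contains no unescaped nested parentheses, and the helper name `set_pdf_value` is invented here.

```perl
use strict;
use warnings;

# Replace the value after /V, treating a backslash plus the following
# character as one unit so that an escaped paren never ends the value.
sub set_pdf_value {
    my ($text, $value) = @_;
    $text =~ s{
        ( (?<!\\) /V \s* \( )   # group 1: key and opening paren, kept as-is
        (?: [^\\)]              # any ordinary character
          | \\ .                # or a backslash plus the character it escapes
        )*
        \)                      # the first closing paren that is not escaped
    }{$1$value)}xs;
    return $text;
}

# Same old value as in the test data above: \\\) then \) then \\)
my $obj = ' /V (first text\\\\\\)second text\\)last text\\\\) /DA (x)';
print set_pdf_value($obj, 'NEW'), "\n";
```

Because backslashes are always consumed in pairs with the next character, `\)` and `\\\)` stay inside the value while `\\)` ends it.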

best regards
Thomas
 
