really hard regex ;(

Discussion in 'Perl Misc' started by Thomas, Oct 17, 2003.

  1. Thomas

    Thomas Guest

    hi community...
    i am struggling with a "really good" regex -
    my regex should substitute/insert (in a pdf) something like
    /V () with my new value...
    my regex works the first time, when i really have /V ()
    but the second time it substitutes only up to the first ) without checking for a
    \\ before the )

    s#([^\\]/V\s*\()((\))|(.*?[^\\]?\)))#$1$value)#ms

    im totally confused and have no idea where my mistake is in this regex

    best regards
    Thomas
    Thomas, Oct 17, 2003
    #1

  2. Anno Siegel

    Anno Siegel Guest

    Thomas <> wrote in comp.lang.perl.misc:
    > hi community...
    > i have a hard work with a "really good" regex -
    > my regex should substitute/insert (in pdf) something like
    > /V () with my new value...
    > my regex is working for the first time when i really have /V ()
    > but second time i substitute only until first ) without checked for a
    > \\ before the )
    >
    > s#([^\\]/V\s*\()((\))|(.*?[^\\]?\)))#$1$value)#ms
    >
    > im totally confused and have no idea where my mistake is in this regex


    It appears you are trying to come up with a regex that deals with nested
    constructs. Don't. Use a module that deals with such things, like
    Text::Balanced.

    It may be possible to do this with a regex, but that would involve a
    recursive qr// (an obscenity) and is no fun at all.
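
    A minimal sketch of the Text::Balanced route, assuming the text after /V
    starts with the opening parenthesis. The sample string and variable names
    are made up for illustration, and the sample only contains unescaped,
    balanced parentheses; how the module treats backslash-escaped ones should
    be checked against its documentation before relying on it.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical sample: a PDF-like fragment whose value contains nested parentheses.
my $obj   = '/T (element) /V (old (nested) value) /DA (/Cour 10 Tf 0 g)';
my $value = 'new value';

my $result = $obj;
if ($obj =~ m{/V\s*}g) {
    my $start = pos($obj);              # offset just after "/V "
    my $tail  = substr($obj, $start);   # text beginning at the opening "("

    # In list context extract_bracketed returns (extracted, remainder, prefix);
    # the extracted part is undef when no balanced group is found.
    my ($old, $rest) = extract_bracketed($tail, '()');
    $result = substr($obj, 0, $start) . "($value)" . $rest if defined $old;
}
print "$result\n";
```

    The module does the bracket counting, so the regex only has to find the
    /V key.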

    Anno
    Anno Siegel, Oct 17, 2003
    #2

  3. Brian McCauley

    (Thomas) writes:

    > hi community...
    > i have a hard work with a "really good" regex -
    > my regex should substitute/insert (in pdf) something like
    > /V () with my new value...
    > my regex is working for the first time when i really have /V ()
    > but second time i substitute only until first ) without checked for a
    > \\ before the )
    >
    > s#([^\\]/V\s*\()((\))|(.*?[^\\]?\)))#$1$value)#ms
    >
    > im totally confused and have no idea where my mistake is in this regex


    Well since you do not define "something like" it is impossible to know
    what your regex is supposed to match. Without knowing that there's
    only so much one can do.

    First we can remove some of the () that do nothing.

    s#([^\\]/V\s*\()(\)|.*?[^\\]?\))#$1$value)#ms

    Then we can observe that the subpattern /.*?[^\\]?/ will match exactly
    the same as /.*?/

    s#([^\\]/V\s*\()(\)|.*?\))#$1$value)#ms

    Next we observe that ')', the only thing matched by the subpattern
    /\)/, could also be the 'best' match for /.*?\)/

    It follows that /\)|.*?\)/ will simplify to /.*?\)/

    s#([^\\]/V\s*\().*?\)#$1$value)#ms
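
    The simplification can be checked on a small input. This is only a
    sketch: the sample string and the names $sample and $value are invented
    for the test, with one escaped ")" inside the old value.

```perl
use strict;
use warnings;

# Hypothetical sample: the old value contains one escaped ")".
my $sample = 'x /V (old\) text) /DA (/Cour 10 Tf 0 g)';
my $value  = 'NEW';

# The regex as originally posted.
(my $original = $sample) =~ s#([^\\]/V\s*\()((\))|(.*?[^\\]?\)))#$1$value)#ms;

# The fully simplified regex.
(my $simplified = $sample) =~ s#([^\\]/V\s*\().*?\)#$1$value)#ms;

print "original:   $original\n";
print "simplified: $simplified\n";
print "same result\n" if $original eq $simplified;
```

    Both versions stop at the first ")", escaped or not, and leave
    " text)" behind, which is the behaviour described in the question.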

    So I've simplified your regex so it's easier to see what it does. But
    without knowing what it was supposed to do I can't say how to change
    it so that it does what you want.

    Random shot-in-the-dark. Your original mistake was a spurious ? after [^\\]

    s#([^\\]/V\s*\()(\)|.*?[^\\]\))#$1$value)#ms

    Although I'm usually not a fan of negative look-behind I think this is
    a case I would use it.

    s#((?<!\\)/V\s*\().*?(?<!\\)\)#$1$value)#ms
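
    A quick check of the look-behind version, again on an invented sample
    ($sample and $value are illustrative names, not from the original post):

```perl
use strict;
use warnings;

# Hypothetical sample: the old value contains one escaped ")".
my $sample = 'x /V (old\) text) /DA (/Cour 10 Tf 0 g)';
my $value  = 'NEW';

# (?<!\\) rejects a ")" that has a backslash directly in front of it,
# so the match runs on to the next ")" that does not.
(my $fixed = $sample) =~ s#((?<!\\)/V\s*\().*?(?<!\\)\)#$1$value)#ms;

print "$fixed\n";   # x /V (NEW) /DA (/Cour 10 Tf 0 g)
```

    The look-behind inspects only one character, so it cannot tell an
    escaped ")" from a ")" that follows an escaped backslash.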

    I'd also consider using the /x qualifier.
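
    As an illustration of what /x buys, here is the same look-behind
    substitution spread over several lines with comments. Brace delimiters
    are used here (an arbitrary choice for this sketch) so that "#" is free
    to start a comment; the sample data is invented.

```perl
use strict;
use warnings;

# Hypothetical sample: the old value contains one escaped ")".
my $sample = 'x /V (old\) text) /DA (/Cour 10 Tf 0 g)';
my $value  = 'NEW';

(my $result = $sample) =~ s{
    ( (?<!\\) /V \s* \( )   # capture group 1: "/V (" not preceded by a backslash
    .*?                     # the old value, matched as briefly as possible
    (?<!\\) \)              # the first ")" not preceded by a backslash
}{$1$value)}msx;

print "$result\n";   # x /V (NEW) /DA (/Cour 10 Tf 0 g)
```

    Under /x, whitespace in the pattern is ignored and "#" starts a comment,
    so the behaviour is unchanged and only the layout differs.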

    Alternative shot-in-the-dark, see FAQ: Can I use Perl regular
    expressions to match balanced text?

    --
    \\ ( )
    . _\\__[oo
    .__/ \\ /\@
    . l___\\
    # ll l\\
    ###LL LL\\
    Brian McCauley, Oct 17, 2003
    #3
  4. Thomas

    Thomas Guest

    "Bill Segraves" <> wrote in message news:<2u%jb.1484$>...
    > If you mean to replace instances of
    > /V (old_value)
    > with
    > /V (new_value)
    >
    > in a PDF file, your efforts are likely misguided, as a PDF file is not
    > intended to be edited in this fashion. While you may be able to make the
    > substitutions you desire in some cases, your efforts will often be
    > frustrated by the generation of invalid PDF files.


    you are right ;) i could tell you many stories about the frustration
    of generating PDFs

    > You'll enjoy greater success if you use the import/export (FDF, XFDF), and
    > submit methods that are officially supported for PDFs.


    for sure, this is one possible solution...but not without some
    problems for me...
    1) import/export only works with adobe writer or an equivalent (or did i
    miss something)...
    2) the forms i want to publish should be encoded so nobody can change
    any content they shouldn't change
    => the final result of all the work (mine & the world's) should be a PDF
    which is totally read-only - so other people can read the document
    but nobody can change it...
    3) exporting form data is not the method i want to offer my
    people...they should be able to get a filled form and send it to my
    script without any other tools...

    > IMO, your objective is more appropriate for newsgroup comp.text.pdf, where
    > you'll find a number of experts willing to guide you on legitimate issues
    > with the use of PDF files.


    maybe - but i think my problem is much closer to perl regex than to
    anything pdf-specific...that's why i posted it here...

    > Good luck.
    > Bill Segraves


    thanks ;)
    Thomas
    Thomas, Oct 20, 2003
    #4
  5. Thomas

    Thomas Guest

    Brian McCauley <> wrote in message news:<>...
    > (Thomas) writes:
    >
    > > hi community...
    > > i have a hard work with a "really good" regex -
    > > my regex should substitute/insert (in pdf) something like
    > > /V () with my new value...
    > > my regex is working for the first time when i really have /V ()
    > > but second time i substitute only until first ) without checked for a
    > > \\ before the )
    > >
    > > s#([^\\]/V\s*\()((\))|(.*?[^\\]?\)))#$1$value)#ms
    > >
    > > im totally confused and have no idea where my mistake is in this regex

    >
    > Well since you do not define "something like" it is impossible to know
    > what your regex is supposed to match. Without knowing that there's
    > only so much one can do.
    >
    > First we can remove some of the () that do nothing.
    >
    > s#([^\\]/V\s*\()(\)|.*?[^\\]?\))#$1$value)#ms
    >
    > Then we can observe that the subpattern /.*?[^\\]?/ will match exactly
    > the same as /.*?/
    >
    > s#([^\\]/V\s*\()(\)|.*?\))#$1$value)#ms
    >
    > Next we observe that ')', the only thing matched by the subpattern
    > /\)/, could also be the 'best' match for /.*?\)/
    >
    > It follows that /\)|.*?\)/ will simplify to /.*?\)/
    > s#([^\\]/V\s*\().*?\)#$1$value)#ms

    you did a really nice simplification

    > So I've simplified your regex so it's easier to see what it does. But
    > without knowing what it was supposed to do I can't say how to change
    > it so that it does what you want.

    that's what i really need to learn - thank you a lot !

    > Random shot-in-the-dark. Your original mistake was a spurious ? after [^\\]
    > s#([^\\]/V\s*\()(\)|.*?[^\\]\))#$1$value)#ms

    one of my mistakes...

    > Although I'm usually not a fan of negative look-behind I think this is
    > a case I would use it.
    > s#((?<!\\)/V\s*\().*?(?<!\\)\)#$1$value)#ms

    works fine until i use different counts of \
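
    The failure with other backslash counts can be reproduced, and one common
    alternative is to consume the value as "ordinary character or backslash
    plus one character". That alternative is not from the thread; it is a
    sketch on invented data, and it does not handle unescaped nested
    parentheses.

```perl
use strict;
use warnings;

# Hypothetical sample: the old value ends in an escaped backslash (two
# backslashes in the data), so the ")" right after it really closes the value.
my $sample = 'x /V (old\\\\) /DA (/Cour 10 Tf 0 g)';
my $value  = 'NEW';

# Look-behind version: sees a backslash before ")" and skips the real end.
(my $lookbehind = $sample) =~ s#((?<!\\)/V\s*\().*?(?<!\\)\)#$1$value)#ms;

# Alternative: step over the value one unit at a time, where a unit is either
# a character that is not a backslash or ")", or a backslash plus any character.
(my $pairs = $sample) =~ s#((?<!\\)/V\s*\()(?:[^\\)]|\\.)*\)#$1$value)#ms;

print "look-behind: $lookbehind\n";   # x /V (NEW)
print "pairs:       $pairs\n";        # x /V (NEW) /DA (/Cour 10 Tf 0 g)
```

    Because backslashes are always consumed in pairs with the character they
    escape, any count of them before the ")" is handled the same way.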

    > I'd also consider using the /x qualifier.

    only for easier reading or why else ?

    > Alternative shot-in-the-dark, see FAQ: Can I use Perl regular
    > expressions to match balanced text?

    this means i cant do it with perl5 ?! ;)
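
    For later readers: Perl 5.10 and newer support named recursion inside a
    pattern, which makes balanced matching possible in a single regex. The
    sketch below needs such a Perl (it postdates this thread), and the sample
    data and the group name "paren" are made up for illustration.

```perl
use strict;
use warnings;
use 5.010;   # (?&name) recursion and possessive quantifiers need Perl 5.10+

# Hypothetical sample: nested parentheses, an escaped ")" and an escaped backslash.
my $sample = 'x /V (old (nested\)) value\\\\) /DA (/Cour 10 Tf 0 g)';
my $value  = 'NEW';

(my $result = $sample) =~ s{
    ( (?<!\\) /V \s* )          # group 1: the key and any whitespace after it
    (?<paren>                   # one balanced "(...)" unit
        \(
        (?: [^()\\]++           # a run of ordinary characters
          | \\ .                # a backslash and the character it escapes
          | (?&paren)           # a nested balanced unit
        )*+
        \)
    )
}{$1($value)}xs;

print "$result\n";   # x /V (NEW) /DA (/Cour 10 Tf 0 g)
```

    The recursion handles nesting and the backslash alternative handles
    escapes, so the whole old value is replaced in one substitution.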

    here is my solution...
    (maybe not the shortest - but it works in every case i know) :

    $bsl = '\\'; # only for easier reading
    $new1 = "$bsl)$bsl$bsl$bsl)simple text$bsl$bsl$bsl$bsl$bsl)$bsl$bsl";
    $new2 = "easy text$bsl$bsl$bsl)simple text$bsl$bsl$bsl$bsl$bsl)";
    $new3 = "$bsl$bsl$bsl)simple text";
    $obj = ".../T (element) /FT /Tx
    /V (first text$bsl$bsl$bsl)second text$bsl)last text$bsl$bsl)
    /DA (/Cour 10 Tf 0 g)...";
    print "$obj\n";
    # test a little bit within foreach
    foreach $new ($new1,$new2,$new3,$new3,$new3,$new2,$new2,$new1,$new3) {
        $obj =~ s#([^\\]/V\s*\()(.*\))#$1#ms; # cut out everything that follows /V (
        $_ = $2; # the value to replace plus the rest to save
        @_ = split (/(\\+\))/ms,$_); # split on runs of BSL's followed by )
        if (!$#_) { # no BSL's in front of )
            s#(.*?)\)#$new)#ms; # until first )
            $obj .= $_;
        } else {
            while (@_) {
                $_ = shift @_;
                if (/\\+\)/) { # BSL's followed by )
                    last if length($_)%2; # even number of BSL's => unquoted )
                } elsif (/^\)$/) { # single char
                    last;
                } elsif (/.*?\)(.*)/ms) { # ) in the middle or at the end
                    unshift @_,$1; # save possible rest
                    last;
                } # i miss the else ;(
            }
        }
        $obj =~ s#([^\\]/V\s*\()#$1$new)#ms;
        $obj.=join('',@_);
        print "$obj\n";
    }

    best regards
    Thomas
    Thomas, Oct 20, 2003
    #5
