help with regexp

Discussion in 'Perl Misc' started by Marc Girod, Feb 7, 2013.

  1. Marc Girod

    Marc Girod Guest

    Hello,

    I intend to start fixing an issue I have with a regexp of mine, but I
    thought I might ask for comments even before I start myself.

    I wanted to catch the text between '%[' ... ']N?l' brackets in a
    'format specification' [*].
    My first attempt worked well at first, with format strings such as
    '%Vn %[^O13]Nl\n':

    $fmt =~ s/\%\[(.*?)\](N?)l/$ph/

    The text I catch is itself a regexp, but which I process in isolation
    [I extend the format specification, so that '^O13' will be used as a
    filter.]

    Unfortunately, later, I started to use bolder format strings, such as
    e.g.:
    '%Vn %[Foo]NSa %[^O13]Nl\n'

    My first regexp obviously bled over the two sets of brackets...
    A first naive fix was:

    $fmt =~ s/\%\[([^\]]*?)\](N?)l/$ph/

    However, I can foresee that this prevents other valid specs, such as
    e.g.:
    '%Vn %[^[OE]]Nl\n'
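
    The behaviour of the two patterns above can be checked directly against
    the three format strings quoted in this post. What follows is only a
    minimal sketch, not code from the thread: the helper name `capture` and
    the variable names are made up for illustration, and the substitution
    with $ph is left out so that only the captured text is shown.

```perl
use strict;
use warnings;

# The two patterns from the post, compiled for matching only.
my $lazy  = qr/\%\[(.*?)\](N?)l/;
my $naive = qr/\%\[([^\]]*?)\](N?)l/;

# Hypothetical helper: return what the first group captures, or undef.
sub capture {
    my ($re, $fmt) = @_;
    return $fmt =~ $re ? $1 : undef;
}

# Single quotes keep the trailing \n as two literal characters,
# exactly as in the format strings quoted above.
my @formats = (
    '%Vn %[^O13]Nl\n',
    '%Vn %[Foo]NSa %[^O13]Nl\n',
    '%Vn %[^[OE]]Nl\n',
);

for my $fmt (@formats) {
    for my $case ([lazy => $lazy], [naive => $naive]) {
        my ($name, $re) = @$case;
        my $got = capture($re, $fmt);
        printf "%-5s %-28s => %s\n", $name, $fmt,
            defined $got ? "'$got'" : 'no match';
    }
}
```

    On the second string the lazy pattern captures 'Foo]NSa %[^O13', which
    is the bleed described above; on the third string the naive pattern
    finds no match at all, because it cannot step over the inner ']'.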

    I can also see that my strategy works only with *one* such field, but
    I am willing to accept that, if I can support complex regexps inside
    it.

    The question I have is: am I doomed to implement a parser?
    Or can I find a reasonable way out e.g. with look ahead?

    Of course, I'll post whatever I come up with myself, if I get anywhere
    (I won't jump on it right away...)

    Thanks!
    Marc

    *: I give the link to the man page for this, but I don't expect you to
    need to read it:
    <http://publib.boulder.ibm.com/infocenter/cchelp/v7r0m1/topic/com.ibm.rational.clearcase.cc_ref.doc/topics/fmt_ccase.htm>
     
    Marc Girod, Feb 7, 2013
    #1

  2. Ben Morrow writes:
    > Quoth Marc Girod:
    >> I wanted to catch the text between '%[' ... ']N?l' brackets in a
    >> 'format specification' [*].
    >> My first attempt worked well at first, with format strings such as
    >> '%Vn %[^O13]Nl\n':
    >>
    >> $fmt =~ s/\%\[(.*?)\](N?)l/$ph/


    [...]

    >> The question I have is: am I doomed to implement a parser?
    >> Or can I find a reasonable way out e.g. with look ahead?

    >
    > You are doomed to implement a parser, but you can do so using the regex
    > engine :).


    Not really. A 'parser' would be something which does a grammatical analysis
    of a sequence of tokens. This here is a lexical analyzer.
     
    Rainer Weikusat, Feb 7, 2013
    #2

  3. Marc Girod

    Marc Girod Guest

    Thanks Ben (and Rainer),

    I didn't have any chance to touch it myself today...

    On Feb 7, 2:37 pm, Ben Morrow wrote:

    > I am assuming the spec here requires matching brackets inside a %[]Nl?
    > Can non-matching brackets be escaped?


    I cannot see how non-matching brackets could make any sense there.
    So, this would likely be an error, and I'd have to report it.
    Now, maybe not in this scope, although...

    I'd rather not force escaping inner brackets.
    But that's my choice.

    > If you don't allow escaping of unbalanced brackets, the simple answer is
    > to use Regexp::Common::balanced. If you do, you will need to use 5.10,
    > and write out the recursion yourself:

    ....
    > The trick is the (?-1) group, which says 'start again at the top of the
    > nearest enclosing () group'.
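
    The code Ben posted is elided in this copy of the thread, so the
    following is not his pattern. It is only a minimal sketch of the general
    technique he describes, assuming that brackets inside the field must
    balance and that there is no escaping. It needs Perl 5.10 or later for
    (?-1) and the possessive ++ quantifier. The names $field_re and
    extract_filter are hypothetical.

```perl
use 5.010;
use strict;
use warnings;

# Braces are used as delimiters so the square brackets in the
# pattern are never mistaken for the end of the regexp.
my $field_re = qr{
    \%
    (                   # group 1: one balanced block, brackets included
        \[
        (?: [^\[\]]++   # a run of non-bracket characters
          | (?-1)       # or a nested block: recurse into group 1
        )*
        \]
    )
    (N?) l              # group 2: optional N, then a literal l
}x;

# Hypothetical helper: return (filter, N-flag), or an empty list
# when the format string holds no well-formed field.
sub extract_filter {
    my ($fmt) = @_;
    return unless $fmt =~ $field_re;
    return (substr($1, 1, -1), $2);   # strip the outer brackets
}

my ($filter, $flag) = extract_filter('%Vn %[Foo]NSa %[^[OE]]Nl\n');
print "filter: $filter, flag: '$flag'\n" if defined $filter;
```

    Because group 1 includes the outer brackets, they are stripped with
    substr afterwards. A field such as %[Foo]NSa is skipped because the
    trailing 'l' is missing, and an unbalanced field does not match at all.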


    I'll have to play with both of these suggestions!
    Thanks!
    Marc
     
    Marc Girod, Feb 7, 2013
    #3
  4. Marc Girod

    Marc Girod Guest

    On Feb 7, 7:15 pm, Marc Girod wrote:

    > I'll have to play with both of these suggestions!


    I am very impressed.
    Regexp::Common qw /balanced/ gives me a starting point (I have to use
    {-keep}, and work out how to discriminate the 'wrong' brackets, e.g.
    %[...]NSa, from the right ones, and how to strip the brackets);
    but yours works fully as such (er... I had to switch from m[...] to
    e.g. m{...}: my Perl (5.14.2 on Cygwin) got confused and reported
    'Invalid [] range "?-1" in regex').

    I wasn't aware of this recursive option.
    Only ashamed that I didn't even try...
    Thanks!
    Marc
     
    Marc Girod, Feb 7, 2013
    #4
  5. Marc Girod

    Marc Girod Guest

    On Feb 7, 10:37 pm, Ben Morrow wrote:

    >     %[ac]Nl          # simple brackets


    Yes

    >     %[a[b[c]d]e]Nl      # nested brackets
    >     %[a\Nl           # an escaped bracket
    >     %[a[[c]d]Nl         # a Perl character class containing [c
    >     %[a[]c]d]Nl         # a Perl character class containing ]c
    >     %[a[^]c]d]Nl        # a Perl character class not containing ]c


    Honestly, I believe only the first is relevant...
    I.e. I'll take the contents and use it as a regexp to filter 'label
    types'.
    So, one level of character class may be useful, but brackets are not
    themselves legal characters for 'label types', so all the rest is
    moot, isn't it?

    Thanks again anyway!
    Marc
     
    Marc Girod, Feb 8, 2013
    #5
  6. Marc Girod

    Marc Girod Guest

    On Feb 8, 7:46 pm, Ben Morrow wrote:

    > Oh, well, that's much easier then:


    Right you are (with label types matching [\w-]+).
    Thanks.
    Marc
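
    Ben's simpler pattern is also elided here. As a hedged illustration of
    where the thread ends up, the sketch below accepts non-bracket
    characters plus at most one level of character class inside the field,
    swaps the field for a placeholder, and then uses the captured text as a
    filter. The placeholder value, the label names and the helper
    split_filter are all assumptions made for this example.

```perl
use strict;
use warnings;

my $field = qr{
    \% \[
    (                       # group 1: the filter regexp
        (?: [^\[\]]         # any non-bracket character
          | \[ [^\[\]]* \]  # or one flat character class
        )*
    )
    \] (N?) l               # closing bracket, optional N, then l
}x;

# Hypothetical helper: replace the first field with $ph and return
# (new format, filter, N-flag), or an empty list if nothing matched.
sub split_filter {
    my ($fmt, $ph) = @_;
    return unless $fmt =~ s/$field/$ph/;
    return ($fmt, $1, $2);
}

my ($rest, $filter, $flag) =
    split_filter('%Vn %[Foo]NSa %[^O13]Nl\n', '<LABELS>');

# Made-up label types, all within [\w-]+ as mentioned above.
my @labels = qw(O13_REL FOO-1 O13-fix);
my @kept   = grep { /$filter/ } @labels;

print "format: $rest\n";
print "kept:   @kept\n";
```

    Since brackets are not legal in label types, one flat character class
    is the most the filter can usefully contain, so no recursion is needed.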
     
    Marc Girod, Feb 19, 2013
    #6