help with regexp

Marc Girod · Feb 7, 2013

Hello,

I intend to start fixing an issue I have with a regexp of mine, but I
thought I might ask for comments even before I start myself.

I wanted to catch the text between '%[' ... ']N?l' brackets in a
'format specification' [*].
My first attempt worked well at first, with format strings such as
'%Vn %[^O13]Nl\n':

$fmt =~ s/\%\[(.*?)\](N?)l/$ph/

The text I catch is itself a regexp, but which I process in isolation
[I extend the format specification, so that '^O13' will be used as a
filter.]

Unfortunately, later, I started to use bolder format strings, such as
e.g.:
'%Vn %[Foo]NSa %[^O13]Nl\n'

My first regexp obviously bled over the two sets of brackets...
A first naive fix was:

$fmt =~ s/\%\[([^\]]*?)\](N?)l/$ph/

However, I can foresee that this rules out other valid specs, such as:
'%Vn %[^[OE]]Nl\n'

I can also see that my strategy works only with *one* such field, but
I am willing to accept that, if I can support complex regexps inside
it.
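To make the two failure modes above concrete, here is a small sketch that runs both patterns against the three format strings quoted in this post. The helper name capture() is made up for the illustration; the patterns are the ones shown above, minus the substitution.

```perl
use strict;
use warnings;

# The two patterns from the post, as match-only regexps.
my $lazy    = qr/%\[(.*?)\](N?)l/;        # first attempt
my $negated = qr/%\[([^\]]*?)\](N?)l/;    # naive fix

# The three format strings quoted in the post (single quotes keep \n literal).
my @formats = (
    '%Vn %[^O13]Nl\n',
    '%Vn %[Foo]NSa %[^O13]Nl\n',
    '%Vn %[^[OE]]Nl\n',
);

# Hypothetical helper: return the captured text, or undef when nothing matches.
sub capture {
    my ($re, $fmt) = @_;
    return $fmt =~ $re ? $1 : undef;
}

for my $fmt (@formats) {
    for my $pair ([lazy => $lazy], [negated => $negated]) {
        my ($name, $re) = @$pair;
        my $got = capture($re, $fmt);
        printf "%-8s %-28s => %s\n", $name, $fmt,
            defined $got ? "'$got'" : 'no match';
    }
}
```

On the second format string the lazy pattern captures across both sets of brackets, while the negated class picks out only '^O13'. On the third, the negated class finds no match at all, because it stops at the first ']'.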

The question I have is: am I doomed to implement a parser?
Or can I find a reasonable way out e.g. with look ahead?

Of course, I'll post what I get to myself, if I do (I won't jump to it
right away...)

Thanks!
Marc

*: I give the link to the man page for this, but I don't expect you to
need to read it:
<http://publib.boulder.ibm.com/infocenter/cchelp/v7r0m1/topic/com.ibm.rational.clearcase.cc_ref.doc/topics/fmt_ccase.htm>

Rainer Weikusat · Feb 7, 2013

Ben Morrow said:
> Quoth Marc Girod:
>> I wanted to catch the text between '%[' ... ']N?l' brackets in a
>> 'format specification' [*].
>> My first attempt worked well at first, with format strings such as
>> '%Vn %[^O13]Nl\n':
>>
>> $fmt =~ s/\%\[(.*?)\](N?)l/$ph/
>>
>> [...]
>>
>> The question I have is: am I doomed to implement a parser?
>> Or can I find a reasonable way out e.g. with look ahead?
>
> You are doomed to implement a parser, but you can do so using the regex
> engine.

Not really. A 'parser' would be something which does a grammatical analysis
of a sequence of tokens. This here is a lexical analyzer.

Marc Girod · Feb 7, 2013

Thanks Ben (and Rainer),

I didn't have any chance to touch it myself today...

> I am assuming the spec here requires matching brackets inside a %[]Nl?
> Can non-matching brackets be escaped?

I cannot see how non-matching brackets could make any sense there.
So, this would likely be an error, and I'd have to report it.
Now, maybe not in this scope, although...

I'd rather not force escaping inner brackets.
But that's my choice.

> If you don't allow escaping of unbalanced brackets, the simple answer is
> to use Regexp::Common::balanced. If you do, you will need to use 5.10,
> and write out the recursion yourself: ....
> The trick is the (?-1) group, which says 'start again at the top of the
> nearest enclosing () group'.
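The pattern elided in the quote is not reproduced here. What follows is an independent sketch of the same (?-1) recursion technique, assuming Perl 5.10 or later and assuming that a backslash escapes the next character inside the field. The names $balanced, $field, extract_field() and the capture names body and n are made up for the illustration.

```perl
use strict;
use warnings;

# One balanced bracket pair, nested to any depth.
# (?-1) recurses into the capture group opened most recently before it,
# which here is the group wrapping the whole bracket pair.
my $balanced = qr{
    (
        \[
        (?: \\. | [^\[\]\\]++ | (?-1) )*+
        \]
    )
}x;

# The whole field: percent, bracketed body, optional N, then l.
my $field = qr{
    % \[
    (?<body> (?: \\. | [^\[\]\\]++ | $balanced )*+ )
    \] (?<n> N? ) l
}x;

# Hypothetical helper: returns (body, n) on a match, an empty list otherwise.
sub extract_field {
    my ($fmt) = @_;
    return $fmt =~ $field ? ($+{body}, $+{n}) : ();
}

for my $fmt ('%Vn %[Foo]NSa %[^O13]Nl\n', '%Vn %[^[OE]]Nl\n') {
    my ($body, $n) = extract_field($fmt);
    print "$fmt => ", defined $body ? "'$body'" : 'no match', "\n";
}
```

Because the body is matched possessively and must be followed by ']', an optional 'N' and 'l', the '%[Foo]NSa' field is skipped and the nested '^[OE]' is captured whole. The {} delimiters also keep the brackets in the pattern from clashing with the quoting.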

I'll have to play with both of these suggestions!
Thanks!
Marc

Marc Girod · Feb 7, 2013

> I'll have to play with both of these suggestions!

I am very impressed.
Regexp::Common qw /balanced/ gives me a starting point: I have to use
{-keep}, work out how to discriminate the 'wrong' brackets (e.g.
%[...]NSa) from the right ones, and strip the brackets.
But yours works fully as such. (Er... I had to switch from m[...] to
e.g. m{...}: my Perl (5.14.2 on Cygwin) got confused and told me:
'Invalid [] range "?-1" in regex'.)

I wasn't aware of this recursive option.
Only ashamed that I didn't even try...
Thanks!
Marc

Marc Girod · Feb 8, 2013

> %[ac]Nl # simple brackets

Yes

> %[a[b[c]d]e]Nl # nested brackets
> %[a\Nl # an escaped bracket
> %[a[[c]d]Nl # a Perl character class containing [c
> %[a[]c]d]Nl # a Perl character class containing ]c
> %[a[^]c]d]Nl # a Perl character class not containing ]c

Honestly, I believe only the first is relevant...
I.e. I'll take the contents and use it as a regexp to filter 'label
types'.
So, one level of character class may be useful, but brackets are not
themselves legal characters for 'label types', so all the rest is
moot, isn't it?
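As a sketch of that restricted case: the pattern below accepts plain characters plus at most one level of character class inside the field, then uses the captured text as a regexp to filter a list of names. The function filter_label_types(), the sample label type names, and the choice of an unanchored match are all assumptions made for the illustration.

```perl
use strict;
use warnings;

# Field contents: plain characters, or a single non-nested character class.
my $field = qr{
    % \[
    ( (?: [^\[\]]++ | \[ [^\[\]]++ \] )*+ )   # contents, group 1
    \] (N?) l
}x;

# Hypothetical helper: keep only the label types matching the field's regexp.
# When the format has no such field, every type is kept.
sub filter_label_types {
    my ($fmt, @types) = @_;
    my ($filter) = $fmt =~ $field or return @types;
    my $re = qr/$filter/;    # dies if the captured text is not a valid regexp
    return grep { $_ =~ $re } @types;
}

# Made-up label type names, only for demonstration.
my @types = qw(O13_REL O14_REL FOO_O13);
print join(' ', filter_label_types('%Vn %[^O13]Nl\n', @types)), "\n";
```

With the filter '^O13', only the name starting with O13 survives. A format such as '%Vn %[^[OE]]Nl\n' would be accepted too, since it uses just one level of brackets inside the field.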

Thanks again anyway!
Marc

Marc Girod · Feb 19, 2013

> Oh, well, that's much easier then:

Right you are (with label types matching [\w-]+).
Thanks.
Marc
