Multi-line regular expression match question

Guillermo Riojas

Hi,

I have been trying to create a solid regular expression to match a
possible multi-line expression, without success. After several hours I
almost got there, but not to the point I would like, so I'm hoping
somebody can point me in the right direction.
Here is an example of what I am dealing with:

01xxxxxxxxxxxxxx
|-:
<20>ABCD
<30>edfghi212
-|
|-:
<20>EFGH
<30>hjkli3232
-|
89xxxxxxxxxxxxx

I need to match anything that is enclosed between "|-:" and "-|".
So far I've got "/^\{\|-:$.*^-\|$/m". This one is greedy, returning the
complete set instead of each match; I just haven't figured out how to
make it reluctant enough to return them one by one.

The expected matches should look something like this:

1:
|-:
<20>ABCD
<30>edfghi212
-|

2:
|-:
<20>EFGH
<30>hjkli3232
-|

Currently it is returning:
1:

|-:
<20>ABCD
<30>edfghi212
-|
|-:
<20>EFGH
<30>hjkli3232
-|


Any suggestion is greatly appreciated.
And finally, any good regular expressions book ?? =)

Cheers,
guillermo
 
Jesús Gabriel y Galán

irb(main):018:0> s =<<EOF
irb(main):019:0" 01xxxxxxxxxxxxxx
irb(main):020:0" |-:
irb(main):021:0" <20>ABCD
irb(main):022:0" <30>edfghi212
irb(main):023:0" -|
irb(main):024:0" |-:
irb(main):025:0" <20>EFGH
irb(main):026:0" <30>hjkli3232
irb(main):027:0" -|
irb(main):028:0" 89xxxxxxxxxxxxx
irb(main):029:0" EOF
=> "01xxxxxxxxxxxxxx\n|-:\n<20>ABCD\n<30>edfghi212\n-|\n|-:\n<20>EFGH\n<30>hjkli3232\n-|\n89xxxxxxxxxxxxx\n"
irb(main):036:0> s.scan(/(\|-:.*?-\|)/m)
=> [["|-:\n<20>ABCD\n<30>edfghi212\n-|"], ["|-:\n<20>EFGH\n<30>hjkli3232\n-|"]]

Jesus.
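
A small follow-up sketch on the result above: the inner arrays appear because the pattern contains a capture group, and String#scan returns the groups when there are any. Dropping the parentheses should give a flat array of strings. The names SAMPLE and blocks_in are made up for this example and are not from the thread.

```ruby
# Sample input from the original question.
SAMPLE = <<EOF
01xxxxxxxxxxxxxx
|-:
<20>ABCD
<30>edfghi212
-|
|-:
<20>EFGH
<30>hjkli3232
-|
89xxxxxxxxxxxxx
EOF

# Without a capture group, String#scan returns the whole matches as
# plain strings instead of one-element arrays.
def blocks_in(text)
  text.scan(/\|-:.*?-\|/m)
end

# Print the matches numbered 1, 2, ... as in the question.
blocks_in(SAMPLE).each_with_index do |block, i|
  puts "#{i + 1}:"
  puts block
  puts
end
```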
 
Ammar Ali

I need to match anything that is enclosed between "|-:" and "-|".
So far I've got "/^\{\|-:$.*^-\|$/m". This one is greedy, returning the
complete set instead of each match; I just haven't figured out how to
make it reluctant enough to return them one by one.

If you're using ruby 1.9 you can use this; *? is the reluctant version of *:

/^\|-:$.*?^-\|$/m

Note that I removed the '\{' from your original pattern. It is not
needed in this case.
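
To see the greedy and reluctant forms side by side, here is a minimal sketch. The input is a shortened stand-in for the data in the question, and the constant names are assumptions made for the example.

```ruby
# Shortened stand-in for the data in the question (two blocks).
TWO_BLOCKS = "01xx\n|-:\n<20>ABCD\n-|\n|-:\n<20>EFGH\n-|\n89xx\n"

GREEDY    = /^\|-:$.*^-\|$/m   # .* grabs as much as it can
RELUCTANT = /^\|-:$.*?^-\|$/m  # .*? stops at the first closing line

p TWO_BLOCKS.scan(GREEDY)
# => ["|-:\n<20>ABCD\n-|\n|-:\n<20>EFGH\n-|"]
p TWO_BLOCKS.scan(RELUCTANT)
# => ["|-:\n<20>ABCD\n-|", "|-:\n<20>EFGH\n-|"]
```

In Ruby, ^ and $ always match at line boundaries, so the /m flag here only makes the dot match newlines.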

If you're on 1.8, one possibility is:

/^\|-:$(?m:.*?)(?=^-\|)^-\|$/

But there are other, probably more efficient, ways to do it as well.
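
One such alternative, sketched here as an assumption rather than something proposed in the thread, is a plain line-by-line scan that avoids regular expressions entirely. The method name extract_blocks is hypothetical, and this has not been benchmarked against the patterns above.

```ruby
# Collect every run of lines that starts with a "|-:" line and ends
# with a "-|" line. Lines outside such a run are ignored.
def extract_blocks(text)
  blocks  = []
  current = nil                  # lines of the block being read, or nil

  text.each_line do |raw|
    line = raw.chomp
    if line == "|-:"
      current = [line]           # opening marker: start a fresh block
    elsif current
      current << line
      if line == "-|"            # closing marker: block is complete
        blocks << current.join("\n")
        current = nil
      end
    end
  end

  blocks
end

text = "01xx\n|-:\n<20>ABCD\n<30>edfghi212\n-|\n|-:\n<20>EFGH\n<30>hjkli3232\n-|\n89xx\n"
extract_blocks(text).each { |block| puts block, "" }
```

An opening marker seen while a block is still open simply restarts the block, and an unclosed block at the end of the input is dropped; both choices are assumptions about what the data should mean.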
And finally, any good regular expressions book ?? =)

Jeffrey Friedl's Mastering Regular Expressions is an excellent read
and covers regular expressions inside and out, literally. Here's the
O'Reilly link:

http://oreilly.com/catalog/9780596528126

HTH,
Ammar
 
Ammar Ali

If you're using ruby 1.9 you can use this; *? is the reluctant version of *:

 /^\|-:$.*?^-\|$/m
---8<---

If you're on 1.8, one possibility is:

 /^\|-:$(?m:.*?)(?=^-\|)^-\|$/

I was under the impression that the reluctant versions of the four
quantifiers were only available under ruby 1.9, but they are apparently
available under 1.8 as well. I used one in the example I showed for 1.8
without noticing.

Regards,
Ammar
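
For reference, a short sketch of the four quantifiers in their greedy and reluctant forms, assuming "the four quantifiers" means *, +, ? and {n,m}. The string "aaa" is just an illustrative input.

```ruby
# Each line: greedy form first, reluctant form second.
# String#[] with a regex returns the first match.
p "aaa"[/a*/],     "aaa"[/a*?/]      # "aaa" and ""
p "aaa"[/a+/],     "aaa"[/a+?/]      # "aaa" and "a"
p "aaa"[/a?/],     "aaa"[/a??/]      # "a"   and ""
p "aaa"[/a{2,3}/], "aaa"[/a{2,3}?/]  # "aaa" and "aa"
```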
 
Guillermo Riojas

Jesús Gabriel y Galán wrote in post #962571:
irb(main):036:0> s.scan(/(\|-:.*?-\|)/m)
=> [["|-:\n<20>ABCD\n<30>edfghi212\n-|"], ["|-:\n<20>EFGH\n<30>hjkli3232\n-|"]]

Jesus.

Thank you very much =)
Works like a charm.
guillermo

--
Posted via http://www.ruby-forum.com/.
 
Guillermo Riojas

Ammar Ali wrote in post #962576:
I was under the impression that the reluctant versions of the four
quantifiers were only available under ruby 1.9, but they are apparently
available under 1.8 as well. I used one in the example I showed for 1.8
without noticing.

Regards,
Ammar

Thanks for the clarification, Ammar, it works perfectly. Thanks a lot
for the book recommendation too, very useful.

guillermo
