Regexp/scan question

Peter Szinek

Hello,

I need to match a chunk of code like this:

....
....
#begin here
...}
......end
...}
......}
.....end
...
...

I need to match from the "#begin here" line up to the n-th closing token
(i.e. '}' or 'end'). n can be arbitrary (let's assume that it is
meaningful, i.e. the text contains at least n '}' and 'end' tokens in total).

Example
match_stuff(2):

#begin here
...}
......end

match_stuff(4):

#begin here
...}
......end
...}
......}

etc.

What is the best way to accomplish this? I have been trying with
scan() but have not succeeded yet.

TIA,
Peter

__
http://www.rubyrailways.com
 
Carlos

Peter said:
Hello,

I need to match a chunk of code like this:

....
....
#begin here
...}
......end
...}
......}
.....end
...
...

I need to match from the "#begin here" line up to the n-th closing token
(i.e. '}' or 'end'). n can be arbitrary (let's assume that it is
meaningful, i.e. the text contains at least n '}' and 'end' tokens in total).

n = 4
text =~ /#begin(.*(\}|end)){#{n}}/m

?

(not tested).
 
Robert Klemme

n = 4
text =~ /#begin(.*(\}|end)){#{n}}/m

?

(not tested).

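IMHO this does not work because of the greedy ".*", which runs past the
n-th closing token whenever more tokens follow. You could try the
reluctant form, i.e. ".*?". Also, the grouping does not capture the whole
sequence: a repeated group keeps only its last iteration.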

robert
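
A small sketch of the two points above, using the sample string Peter posts elsewhere in this thread. The variable names are only for illustration; String#[] with a regexp returns the whole matched text, and MatchData#captures shows what the groups hold.

```ruby
# Sample string from the thread; n is the number of closing tokens wanted.
text = '.... #begin aaaa end bbb } ccc end ddd'
n = 2

greedy = /#begin(.*(\}|end)){#{n}}/m   # Carlos' original suggestion
lazy   = /#begin(.*?(\}|end)){#{n}}/m  # same pattern with a reluctant .*?

# String#[] returns the full match, regardless of capture groups.
greedy_match = text[greedy]
lazy_match   = text[lazy]

puts greedy_match  # => #begin aaaa end bbb } ccc end  (three tokens, not two)
puts lazy_match    # => #begin aaaa end bbb }

# A repeated group remembers only its final iteration, so the captures
# never contain the whole sequence.
last_iteration = text.match(lazy).captures
p last_iteration   # => [" bbb }", "}"]
```

The greedy version still matches, but it stretches to the last closing token it can reach, so it is only correct when exactly n tokens follow the marker.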
 
Peter Szinek

Carlos said:
n = 4
text =~ /#begin(.*(\}|end)){#{n}}/m

Sorry, I need to 'scan' it. I have been playing around with similar
regexps, but they did not work out. E.g. also yours:

irb(main):007:0> text = '.... #begin aaaa end bbb } ccc end ddd'
=> ".... #begin aaaa end bbb } ccc end ddd"
irb(main):008:0> n = 2
=> 2
irb(main):009:0> text.scan(/#begin(.*(\}|end)){#{n}}/m)
=> [[" ccc end", "end"]]

does not work with scan...

Cheers,
Peter

__
http://www.rubyrailways.com
 
Peter Szinek

IMHO this does not work because of the greedy ".*". You could try with
reluctant, i.e. ".*?". Also the grouping does not catch the whole
sequence.

Yeah, I tried to correct these problems but I am still not quite there...

Carlos' regexp, vol. 2 (with the non-greedy ?)

irb(main):007:0> text = '.... #begin aaaa end bbb } ccc end ddd'
=> ".... #begin aaaa end bbb } ccc end ddd"
irb(main):008:0> n = 2
=> 2
irb(main):009:0> text.scan(/#begin(.*?(\}|end)){#{n}}/m)
=> [[" bbb }", "}"]]

And I would like to get

[["#begin aaaa end bbb }"]]

OK, I know that I did not specify the problem correctly the first
time; maybe it is clearer now...

Cheers,
Peter

__
http://www.rubyrailways.com
 
Carlos

Peter said:
Carlos said:
n = 4
text =~ /#begin(.*(\}|end)){#{n}}/m


Sorry, I need to 'scan' it. I have been playing around with similar
regexps, but they did not work out. E.g. also yours:

irb(main):007:0> text = '.... #begin aaaa end bbb } ccc end ddd'
=> ".... #begin aaaa end bbb } ccc end ddd"
irb(main):008:0> n = 2
=> 2
irb(main):009:0> text.scan(/#begin(.*(\}|end)){#{n}}/m)
=> [[" ccc end", "end"]]

does not work with scan...

To make it work with scan, just make the parens non-capturing. When the
pattern contains capturing groups, scan returns the groups instead of the
whole match:

irb(main):001:0> text = "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
=> "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
irb(main):002:0> text.scan(/#begin(?:.*?(?:\}|end)){2}/m)
=> ["#begin aaa end bbb }", "#begin ddd end eee end"]
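
Wrapped up as the match_stuff(n) from the original question, this might look like the sketch below. The method body, the extra text argument and the start: keyword are assumptions for illustration, not something anyone in the thread posted.

```ruby
# Hypothetical helper: returns every chunk that runs from the start marker
# up to and including the n-th closing token ('}' or 'end').
def match_stuff(text, n, start: '#begin here')
  # Non-capturing groups (?:...) make scan return whole matches;
  # the reluctant .*? stops at the nearest closing token each time.
  pattern = /#{Regexp.escape(start)}(?:.*?(?:\}|end)){#{Integer(n)}}/m
  text.scan(pattern)
end

# The sample layout from the question.
code = <<~TEXT
  ....
  #begin here
  ...}
  ......end
  ...}
  ......}
  .....end
  ...
TEXT

p match_stuff(code, 2)  # => ["#begin here\n...}\n......end"]
p match_stuff(code, 4)  # => ["#begin here\n...}\n......end\n...}\n......}"]

# Why the capturing version looked wrong with scan: with groups present,
# scan returns the groups (last iteration only) instead of the match.
sample = '.... #begin aaaa end bbb } ccc end ddd'
with_groups    = sample.scan(/#begin(.*?(\}|end)){2}/m)
without_groups = sample.scan(/#begin(?:.*?(?:\}|end)){2}/m)
p with_groups     # => [[" bbb }", "}"]]
p without_groups  # => ["#begin aaaa end bbb }"]
```

One caveat: the bare "end" alternative also matches inside longer words such as "append", so real source code would probably need something stricter, e.g. \bend\b.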

Good luck.
--
 
Peter Szinek

To make it work with scan just make the parens non-capturing:

irb(main):001:0> text = "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
=> "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
irb(main):002:0> text.scan(/#begin(?:.*?(?:\}|end)){2}/m)
=> ["#begin aaa end bbb }", "#begin ddd end eee end"]

Ha! That was the trick I have been looking for! Muchas Gracias, Carlos.

Cheers,
Peter

__
http://www.rubyrailways.com
 
