too greedy of a regexp

Dave Rose · Nov 9, 2006

i have a regexp: /(^BillHead(.*))(^Bill_End(.*))/m that's too greedy for
processing
a billing extract file containing:
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
...etc.... to EOF....

..i get the whole file matched....i just want each invoice...
it will eventually be in a oneliner like
a=File.read("billfile").scan(regexp)

so what is the non-greedy way for the above regexp to properly match
each invoice...

Jan Svitok · Nov 9, 2006

i have a regexp: /(^BillHead(.*))(^Bill_End(.*))/m that's too greedy for
processing
a billing extract file containing:
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
...etc.... to EOF....

..i get the whole file matched....i just want each invoice...
it will eventually be in a oneliner like
a=File.read("billfile").scan(regexp)

so what is the non-greedy way for the above regexp to properly match
each invoice...

try:

/(^BillHead(.*?))(^Bill_End(.*?))\n/m

or

/(^BillHead(.*?))(^Bill_End([^\n].*))\n/m

notice the .*? instead of .*

*? has some pecularities, that were discussed here some time ago, so
perhaps you'd want to find them in the archives. (search for 'greedy'
or 'regex' - I don't remeber now)

Robert Klemme · Nov 9, 2006

Jan said:
i have a regexp: /(^BillHead(.*))(^Bill_End(.*))/m that's too greedy for
processing
a billing extract file containing:
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
...etc.... to EOF....

..i get the whole file matched....i just want each invoice...
it will eventually be in a oneliner like
a=File.read("billfile").scan(regexp)

so what is the non-greedy way for the above regexp to properly match
each invoice...

Click to expand...

try:

/(^BillHead(.*?))(^Bill_End(.*?))\n/m

or

/(^BillHead(.*?))(^Bill_End([^\n].*))\n/m

notice the .*? instead of .*

*? has some pecularities, that were discussed here some time ago, so
perhaps you'd want to find them in the archives. (search for 'greedy'
or 'regex' - I don't remeber now)

I would also remove the last .* because that likely eats up the rest of
the document. So that would be

/^BillHead(.*?)^BillEnd/m

Another approach is to do

s.split(/^(Bill(?:Head|End))/m)

and then go through the array.

irb(main):006:0> "BillHead\nfoo\nbar\nBillEnd".split(/^(Bill(?:Head|End))/m)
=> ["", "BillHead", "\nfoo\nbar\n", "BillEnd"]

Kind regards

robert

Dave Rose · Nov 9, 2006

Robert said:
Jan said:

...etc.... to EOF....

Click to expand...

/(^BillHead(.*?))(^Bill_End(.*?))\n/m

or

/(^BillHead(.*?))(^Bill_End([^\n].*))\n/m

notice the .*? instead of .*

Click to expand...

=> ["", "BillHead", "\nfoo\nbar\n", "BillEnd"]

Kind regards

robert

i played around in irb with a shorten extract file and found that:
b=File.read("drbilp.txt").scan(/(^BillHead(.*?))(^Bill_End(\d*)(\s*UBPBILP1\n)(.*?))/m)
works in that separates each invoice in an sub-array of size=6
in which b[x][0]+b[x][2] completes that task of reading,scanning
correctly
and puting all in a ruby 'container' that i can do an each on....thanx
dave

How to try a range of hex values in C# code ?	0	Nov 19, 2022
Regexp - start and end of line or string	1	Jan 16, 2011
How can I click on an entry of my Kendo/Telerik Grid and show more detailed information in a Detail view?	0	Sep 13, 2022
Quickie - Regexp for a string not at the beginning of the line	16	Oct 25, 2012
regexp exclusion search - find matches NOT ending with a string?	8	Jul 17, 2009
Newbie regexp question	5	Sep 15, 2006
questions of idiom	3	Jun 7, 2010
[RegExp] Making non-greedy; Escaping parentheses?	3	Sep 12, 2003

too greedy of a regexp

Dave Rose

Jan Svitok

Robert Klemme

Dave Rose

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads