too greedy of a regexp

Discussion in 'Ruby' started by Dave Rose, Nov 9, 2006.

  1. Dave Rose

    Dave Rose Guest

    I have a regexp, /(^BillHead(.*))(^Bill_End(.*))/m, that's too greedy for
    processing a billing extract file containing:
    BillHead...<<<much information here>>\n
    <<one or more detail lines here\n>>
    Bill_End...<<<much information here>>\n
    BillHead...<<<much information here>>\n
    <<one or more detail lines here\n>>
    Bill_End...<<<much information here>>\n
    ...etc.... to EOF....

    I get the whole file matched, but I just want each invoice.
    It will eventually be in a one-liner like
    a=File.read("billfile").scan(regexp)

    So what is the non-greedy way for the above regexp to properly match
    each invoice?

    --
    Posted via http://www.ruby-forum.com/.
     
    Dave Rose, Nov 9, 2006
    #1

  2. Jan Svitok

    Jan Svitok Guest

    Try:

    /(^BillHead(.*?))(^Bill_End(.*?))\n/m

    or

    /(^BillHead(.*?))(^Bill_End([^\n]*))\n/m

    Notice the .*? instead of .* (with /m, a greedy .* also matches
    newlines, so it runs all the way to the last Bill_End in the file).

    *? has some peculiarities that were discussed here some time ago, so
    perhaps you'd want to find them in the archives. (Search for 'greedy'
    or 'regex'; I don't remember which now.)
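To see the difference on a small input, here is a minimal sketch. The sample text and the variable names are made up for illustration; only the BillHead/Bill_End markers and the two regexps come from the posts above.

```ruby
# Hypothetical two-invoice sample; the real extract file is not shown in the thread.
data = "BillHead A1\ndetail 1\nBill_End A1\n" \
       "BillHead B2\ndetail 2\ndetail 3\nBill_End B2\n"

greedy = /(^BillHead(.*))(^Bill_End(.*))/m
lazy   = /(^BillHead(.*?))(^Bill_End(.*?))\n/m

# In Ruby, /m lets . match newlines, so the greedy .* runs to the end of
# the string and backtracks only as far as the LAST Bill_End.
greedy_matches = data.scan(greedy)

# The lazy .*? stops at the first Bill_End it can reach, once per invoice.
lazy_matches = data.scan(lazy)

puts greedy_matches.size
puts lazy_matches.size

# Groups 1 and 3 together hold one whole invoice (minus the final newline).
invoices = lazy_matches.map { |head, _, tail, _| head + tail }
puts invoices.first
```

With this sample the greedy pattern should yield a single match spanning both invoices, while the lazy one yields one match per invoice.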
     
    Jan Svitok, Nov 9, 2006
    #2

  3. Robert Klemme

    I would also remove the last .* because it likely eats up the rest of
    the document. That would give

    /^BillHead(.*?)^Bill_End/m

    Another approach is to do

    s.split(/^(Bill(?:Head|End))/m)

    and then go through the array.

    irb(main):006:0> "BillHead\nfoo\nbar\nBillEnd".split(/^(Bill(?:Head|End))/m)
    => ["", "BillHead", "\nfoo\nbar\n", "BillEnd"]
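"Going through the array" could look roughly like the sketch below. It keeps the BillHead/BillEnd spelling of the irb example; the sample string, the variable names, and the choice to collect detail lines per invoice are assumptions for illustration.

```ruby
# Hypothetical sample with two invoices, using the markers from the irb example.
s = "BillHead\nfoo\nbar\nBillEnd\nBillHead\nbaz\nBillEnd\n"

# Because the marker is in a capture group, split keeps it in the result.
parts = s.split(/^(Bill(?:Head|End))/m)

invoices = []
current = nil
parts.each do |part|
  case part
  when "BillHead"
    current = []                      # a new invoice starts
  when "BillEnd"
    invoices << current if current    # the invoice is complete
    current = nil
  else
    # Text between markers: keep the non-empty lines of an open invoice.
    current.concat(part.split("\n").reject(&:empty?)) if current
  end
end

p invoices
```

Note that anything following a marker on the same line ends up at the start of the next array element, so real header and trailer lines would need a little extra handling.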

    Kind regards

    robert
     
    Robert Klemme, Nov 9, 2006
    #3
  4. Dave Rose

    Dave Rose Guest

    I played around in irb with a shortened extract file and found that:
    b=File.read("drbilp.txt").scan(/(^BillHead(.*?))(^Bill_End(\d*)(\s*UBPBILP1\n)(.*?))/m)
    works, in that it separates each invoice into a sub-array of size 6,
    in which b[x][0]+b[x][2] completes the task of reading and scanning
    correctly and putting it all in a Ruby 'container' that I can do an
    each on. Thanks,
    dave
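For anyone reading along, a rough sketch of that last step follows. The sample text is invented to fit the regexp above (the real drbilp.txt layout is not shown in the thread), and the names text, re, and invoices are only illustrative.

```ruby
# Hypothetical sample shaped to fit the regexp; not the real file contents.
text = "BillHead 001\nline a\nBill_End001 UBPBILP1\n" \
       "BillHead 002\nline b\nline c\nBill_End002 UBPBILP1\n"

re = /(^BillHead(.*?))(^Bill_End(\d*)(\s*UBPBILP1\n)(.*?))/m

# scan returns one sub-array per match, with one entry per capture group.
b = text.scan(re)

# Group 1 (index 0) is the head plus details, group 3 (index 2) is the trailer line.
invoices = b.map { |groups| groups[0] + groups[2] }

invoices.each do |invoice|
  puts invoice
  puts "----"
end
```

The trailing (.*?) has nothing after it to force it to grow, so in this sketch the sixth group is always an empty string.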


     
    Dave Rose, Nov 9, 2006
    #4
