regex dynamic count modifier {min, max} ?

Discussion in 'Ruby' started by jOhn, Feb 8, 2008.

  1. jOhn

    jOhn Guest

    [Note: parts of this message were removed to make it a legal post.]

    Here is an idea; tell me if it could be accomplished by some other means.


    To parse a logic statement like this:

    fn:function3(fn:function2(fn:function1(xargs)))

    Using a regex sorta like so to grep the function-start-pattern(s) :

    /\A((fn\:[\w\-]+)[ ]*\([ ]*)+\z/i

    It would be nice to ensure the proper count of ')', without confusion if say
    the xargs had ')' literal or escaped string value(s) in there.

    One way is to provide a count ref for function-start-pattern, so I could
    then group a pattern for match on post-xargs ')' and force the {min,max}
    count by some backref to count-of-(function-start-pattern) =X and put that
    in there for the (function-end-pattern)+{X,X}.

    Then it might be something like this (ignore the lack of a match on possible
    xargs for now) :

    /\A((fn\:[\w\-]+)[ ]*\([ ]*)+[ ]*(\)){$#1,$#1}\z/i

    Where $#1 would be the count ref of the first group etc. Then there would be
    matching count-left-side-( and count-right-side-).

    Or perhaps I don't understand enough about the internals of regex to know
    that this is impossible.
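
    No regex engine I know of lets a quantifier refer to the repeat count of an
    earlier group, so `{$#1,$#1}` above is hypothetical syntax. One workaround is
    two passes in plain Ruby: count the openers first, then interpolate that
    count into a second pattern. A minimal sketch, assuming the innermost
    arguments contain no parentheses; `OPENER` and `balanced_call?` are names
    invented here for illustration:

```ruby
# Hypothetical helper, not from the thread.
# One "fn:name(" opener, with optional spaces around the parenthesis.
OPENER = /fn:[\w\-]+[ ]*\([ ]*/i

# True when the string is a run of openers, a paren-free argument,
# and exactly as many ')' as there were openers.
def balanced_call?(str)
  head = str[/\A(?:#{OPENER})+/]       # all leading openers, or nil
  return false unless head

  depth = head.scan(OPENER).size       # how many "fn:name(" were seen
  rest  = str[head.size..-1]           # arguments plus closing parens
  !!(rest =~ /\A[^()]*\){#{depth}}\z/) # count is interpolated here
end

puts balanced_call?("fn:function3(fn:function2(fn:function1(xargs)))")
puts balanced_call?("fn:function3(fn:function2(xargs)")
```

    This keeps the counting outside the regex engine, so it does not cope
    with ')' inside quoted arguments.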

    -ntcm
    jOhn, Feb 8, 2008
    #1

  2. jOhn

    t3chn0n3rd Guest


    Could you use awk to parse this further?
    t3chn0n3rd, Feb 8, 2008
    #2

  3. Robert Klemme


    Parsing nested structures is not possible with standard regular
    expressions. IIRC they added something to Perl regexps to do that and
    it may be possible with Ruby 1.9, but I do not know the 1.9 regexp
    engine well enough to answer that off the top of my head.

    So the usual approach is to use a context free grammar and parser.
    You can find parser generators in the RAA.
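
    For a grammar this small, a hand-written recursive descent parser may be
    enough instead of a generated one. A minimal sketch using `StringScanner`
    from the standard library; the grammar in the comment and the names
    `parse_call` and `parse` are assumptions made for this example, not
    something specified in the thread:

```ruby
require 'strscan'

# Grammar assumed for this sketch:
#   call := "fn:" NAME "(" arg ")"
#   arg  := call | quoted string | run of characters without parentheses
def parse_call(ss)
  name = ss.scan(/fn:[\w\-]+/i) or raise "expected fn:name at #{ss.pos}"
  ss.skip(/\s*\(\s*/)           or raise "expected '(' at #{ss.pos}"

  arg = if ss.check(/fn:/i)
          parse_call(ss)        # nested call: recurse
        else
          # A quoted string may contain ')' without ending the call.
          ss.scan(/"[^"]*"|'[^']*'|[^()]*/).strip
        end

  ss.skip(/\s*\)/) or raise "expected ')' at #{ss.pos}"
  { name: name, arg: arg }
end

# Parses a whole string and rejects anything left over after the call.
def parse(str)
  ss = StringScanner.new(str)
  tree = parse_call(ss)
  raise "trailing input at #{ss.pos}" unless ss.eos?
  tree
end

p parse("fn:function3(fn:function2(fn:function1(xargs)))")
```

    Each nesting level becomes one hash, so the depth of the result mirrors
    the depth of the input, and an unbalanced string raises at the position
    where the parser got stuck.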

    If you just want to ensure counts match you could do something like this:

    raise "brackets do not match!" if
      str.scan(/\(/).size != str.scan(/\)/).size

    However, this does not ensure proper nesting. A bit more sophisticated:

    c = 0  # current nesting depth
    str.scan(/[()]/) do |m|
      case m
      when "("
        c += 1
      when ")"
        c -= 1
        # $` is the text before the offending ")"
        raise "Mismatch at '#{$`}'" if c < 0
      else
        raise "Programming error"
      end
    end
    raise "Mismatch" unless c == 0

    But now you get pretty close to a decent parser. :)
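
    The counter can be made to ignore parentheses inside quoted strings, which
    was the original worry about literal ')' in the arguments. A sketch,
    assuming double- and single-quoted strings with backslash escapes; `TOKEN`
    and `parens_balanced?` are names invented for this example:

```ruby
# Hypothetical helper, not from the thread.
# A token is a whole quoted string (skipped) or a single parenthesis.
TOKEN = /"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|[()]/

def parens_balanced?(str)
  depth = 0
  str.scan(TOKEN) do |tok|
    case tok
    when "(" then depth += 1
    when ")"
      depth -= 1
      return false if depth < 0  # a ')' closed something never opened
    end
    # Quoted strings fall through and leave depth unchanged.
  end
  depth.zero?
end

puts parens_balanced?('fn:a(fn:b(")"))')
puts parens_balanced?('fn:a(fn:b(x)')
```

    Because the quoted alternatives come first in the pattern, a parenthesis
    is only counted when it is not part of a complete quoted string.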

    Kind regards

    robert

    --
    use.inject do |as, often| as.you_can - without end
    Robert Klemme, Feb 8, 2008
    #3
  4. jOhn

    tho_mica_l Guest

    > To parse a logic statement like this:
    >
    > fn:function3(fn:function2(fn:function1(xargs)))


    Are you looking for something like this (ruby19):

    def get_fns(string, count=0)
      m = /(?<fn>
             fn:[\w\-]+
             \s*\(\s*
             (\g<fn>|[^)]*)
             \s*\)
           )\s*/xi.match(string)
      if m
        n = /^(fn:[\w\-]+)(\s*)\((.*?)\)$/.match(m['fn'])
        puts "FN #{count}: #{n[1]} with args #{n[3]}"
        get_fns(n[3], count + 1)
      end
    end
    a = "foo fn:function3(fn:function2(fn:function1(xargs))) foo(bar) bar"
    get_fns(a)

    > a = "foo fn:function3(fn:function2(fn:function1(xargs))) foo(bar) bar"
    > get_fns(a)

    FN 0: fn:function3 with args fn:function2(fn:function1(xargs))
    FN 1: fn:function2 with args fn:function1(xargs)
    FN 2: fn:function1 with args xargs
    => nil
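
    The same `\g<fn>` subexpression call can also answer the original yes/no
    question directly when the pattern is anchored at both ends. A sketch,
    assuming Ruby 1.9 or later and innermost arguments without parentheses;
    `NESTED_CALL` and `nested_call?` are illustrative names:

```ruby
# Hypothetical validator, not from the thread.
# \g<fn> re-enters the named group, so each "fn:name(" must be
# closed by its own ")" before the end of the string.
NESTED_CALL = /\A(?<fn>fn:[\w\-]+\s*\(\s*(?:\g<fn>|[^()]*)\s*\))\z/i

def nested_call?(str)
  !NESTED_CALL.match(str).nil?
end

puts nested_call?("fn:function3(fn:function2(fn:function1(xargs)))")
puts nested_call?("fn:function3(fn:function2(xargs)")
```

    Here the engine does the counting implicitly through recursion, so no
    explicit `{min,max}` quantifier is needed.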

    Regards,
    Thomas.
    tho_mica_l, Feb 8, 2008
    #4
  5. jOhn

    jOhn Guest

    Wow, good job, Thomas.

    jOhn, Feb 8, 2008
    #5
  6. jOhn

    jOhn Guest

    I modified it slightly to skip over parentheses inside single or double
    quotes:

    def get_fns(string, count=0)
      m = /(?<fn>
             fn:[\w\-]+
             \s*\(\s*
             (\g<fn>|(".+")*('.+')*[^)]*)
             \s*\)
           )\s*/xi.match(string)
      if m
        n = /^(fn:[\w\-]+)(\s*)\((.*?)\)$/.match(m['fn'])
        puts "FN #{count}: #{n[1]} with args #{n[3]}"
        get_fns(n[3], count + 1)
      end
    end
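
    A greedy `.+` between quotes can run past the first closing quote when an
    argument list holds more than one string. One possible variant uses
    negated character classes so that each quoted string stops at its own
    closing quote. This is a sketch only, assuming quotes are never escaped
    inside strings; `QUOTE_AWARE_CALL` and `outer_call` are names made up for
    the example:

```ruby
# Hypothetical variant, not from the thread.
QUOTE_AWARE_CALL = /
  (?<fn>
    fn:[\w\-]+
    \s*\(\s*
    (?: \g<fn>                                # a nested call, or
      | (?: "[^"]*" | '[^']*' | [^()"'] )*    # quoted strings and plain text
    )
    \s*\)
  )
/xi

# Returns the first complete fn:...(...) call found in str, or nil.
def outer_call(str)
  m = QUOTE_AWARE_CALL.match(str)
  m && m[:fn]
end

puts outer_call(%q{x fn:a(fn:b("p)q", 'r(s') ) y})
```

    Parentheses are only allowed in the argument text when they sit inside a
    quoted string, so the closing ')' of each call is matched unambiguously.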


    jOhn, Feb 8, 2008
    #6
  7. jOhn

    ThoML Guest

    ThoML, Feb 9, 2008
    #7
