Regexp match question on interpolated strings...

Discussion in 'Ruby' started by Richard Kilmer, Oct 5, 2004.

  1. If I had the source for a string:

    "name = #{person.first_name+" "+person.last_name} ... Ok?"

    And assuming I could find the first and last double quotes, how would I
    parse out the #{ ... } with a regular expression since anything can fall
    between the #{ ... } braces in a string?

    Thanks in advance.

    -rich
    Richard Kilmer, Oct 5, 2004
    #1

  2. Joe Cheng (Guest)

    Richard Kilmer wrote:
    > If I had the source for a string:
    >
    > "name = #{person.first_name+" "+person.last_name} ... Ok?"
    >
    > And assuming I could find the first and last double quotes, how would I
    > parse out the #{ ... } with a regular expression since anything can fall
    > between the #{ ... } braces in a string?


    Hmmm, if I understand your question, and if you really knew where the
    first and last double quotes were, you could calculate the number of
    chars between them, and do something like this:

    /#\{.*?\".{<number_of_chars>}\".*?}/

    But it seems like if you want to be able to get more dynamic/flexible
    than that, you really want to parse the expression for real--which is
    something I believe regexes aren't powerful enough for. You'd either
    have to write a parser by hand, or use something like racc:

    http://i.loveruby.net/en/prog/racc.html
    Joe Cheng, Oct 5, 2004
    #2
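
Joe Cheng's suggestion of writing a parser by hand can be sketched without any regexp at all: find each `#{` and count braces until the depth returns to zero. This is only a minimal sketch, and the method name `interpolations` is made up for illustration. It assumes braces inside the embedded code are balanced, so a lone `}` inside a nested string literal or comment would still confuse it.

```ruby
# Hypothetical helper (not from the thread): returns the source text of each
# #{ ... } segment by counting braces instead of using a regexp.
def interpolations(src)
  found = []
  i = 0
  while (start = src.index('#{', i))
    depth = 1
    j = start + 2
    while j < src.length && depth > 0
      case src[j]
      when '{' then depth += 1
      when '}' then depth -= 1
      end
      j += 1
    end
    break if depth > 0                    # unterminated interpolation: stop scanning
    found << src[(start + 2)...(j - 1)]   # text between "#{" and its matching "}"
    i = j
  end
  found
end

src = '"name = #{person.first_name+" "+person.last_name} ... Ok?"'
puts interpolations(src)
```

For the string from the original question this prints the expression between the braces, and nested blocks such as `a.map { |x| x }` are kept whole because the counter only stops at the brace that closes the interpolation.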

  3. Richard Kilmer wrote:
    > If I had the source for a string:
    >
    > "name = #{person.first_name+" "+person.last_name} ... Ok?"
    >
    > And assuming I could find the first and last double quotes, how would I
    > parse out the #{ ... } with a regular expression since anything can fall
    > between the #{ ... } braces in a string?
    >
    > Thanks in advance.
    >
    > -rich
    >
    >
    >

    Regular expressions are not able to "count" more than a finite number of
    states, and the number of states is fixed at compile time. That is
    because regular expressions map to finite automata. So it is impossible
    to match opening and closing braces in an unknown expression. For this
    to work always you need a model that can enter unbounded many states.

    But beware, your computer is also only a finite state machine, just one
    with a lot of states. The number of its states is bounded by the size of
    its RAM (and hard disk).

    If you are sure that there will be no closing braces inside the
    braces, you could match
    /\#\{(.*?)\}/ =~ string

    or, allowing at most one pair of inner braces,

    /\#\{([^\{}]*(\{.*?\}|).*?)\}/ =~ string

    As you can see, it starts to get ugly.

    Regards,

    Brian
    --
    Brian Schröder
    http://ruby.brian-schroeder.de/
    Brian Schröder, Oct 5, 2004
    #3
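
The two patterns from post #3 can be tried directly in Ruby. The sketch below uses made-up sample strings (single-quoted, so nothing is interpolated) to show where each pattern works and where it comes up short. The variable names are illustrative only.

```ruby
# Brian's patterns, copied from the post above.
simple    = /\#\{(.*?)\}/
one_level = /\#\{([^\{}]*(\{.*?\}|).*?)\}/

# Hypothetical inputs with zero, one, and two levels of inner braces.
flat   = 'x #{a+b} y'
nested = 'x #{a{b}} y'
deeper = 'x #{a{b{c}}} y'

puts simple.match(flat)[1]        # a+b
puts simple.match(nested)[1]      # a{b      (stops at the first closing brace)
puts one_level.match(nested)[1]   # a{b}
puts one_level.match(deeper)[1]   # a{b{c}   (one closing brace short)
```

Each extra level of nesting would need another hand-written layer in the pattern, which is the practical side of the finite-state argument in the post.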
  4. On Oct 5, 2004, at 4:09 AM, Brian Schröder wrote:

    > Regular expressions are not able to "count" more than a finite number
    > of states, and the number of states is fixed at compile time. That is
    > because regular expressions map to finite automata. So it is
    > impossible to match opening and closing braces in an unknown
    > expression. For this to work always you need a model that can enter
    > unbounded many states.


    Just for the sake of clarity, you are speaking of Ruby's regular
    expressions here. Perl's regex engine has no such limitation. Using
    the (?? ... ) construct, Perl regular expressions can parse balanced
    delimiters. I miss this feature and would love to see Ruby add
    something similar in the future.

    James Edward Gray II
    James Edward Gray II, Oct 5, 2004
    #4
  5. On Oct 4, 2004, at 10:27 PM, Richard Kilmer wrote:

    > If I had the source for a string:
    >
    > "name = #{person.first_name+" "+person.last_name} ... Ok?"
    >
    > And assuming I could find the first and last double quotes, how would I
    > parse out the #{ ... } with a regular expression since anything can
    > fall
    > between the #{ ... } braces in a string?


    I would use:

    sub(/^(.+?)\#\(.+\}/m, '\1')

    Hope that helps.

    James Edward Gray II
    James Edward Gray II, Oct 5, 2004
    #5
  6. ts (Guest)

    >>>>> "J" == James Edward Gray <> writes:

    J> expressions here. Perl's regex engine has no such limitation. Using
    J> the (?? ... ) construct, Perl regular expressions can parse balanced
    ^^

    I've always found the choice of these 2 characters strange ...

    J> delimiters. I miss this feature and would love to see Ruby add
    J> something similar in the future.

    This ?

    svg% cat b.rb
    #!ruby -rjj
    ["(aaa(bbbc)xxx)", "(aaa(bb(b)c)xxx)"].each do |m|
      p $& if /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/ =~ m
    end
    /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/.dump
    svg%

    svg% ruby b.rb
    "(aaa(bbbc)xxx)"
    "(aaa(bb(b)c)xxx)"
    Regexp /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/
    0 call 2
    1 jump 19
    2 mem-start-push 1
    3 exact1 (
    4 push-if-peek-next ) ===> -1
    5 null-check-start 0
    6 push 13
    7 cclass-not (-) (2)
    8 push 12
    9 cclass-not (-) (2)
    10 pop
    11 jump 8
    12 jump 14
    13 call 2
    14 null-check-end-memst-push 0
    15 jump 4
    16 exact1 )
    17 mem-end-rec 1
    18 return
    19 end
    Optimize EXACT : (
    svg%



    Guy Decoux
    ts, Oct 5, 2004
    #6
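
The pattern in post #6 is easier to follow when spread out with the `/x` flag. The sketch below is the same expression with each part labelled. It assumes Ruby 1.9 or later, where the named group and `\g<aa>` call syntax is available. The constant name `BALANCED_PARENS` is made up, and the `.dump` call from the post is left out because it is not part of standard Ruby.

```ruby
# Same pattern as in the post above, written with /x so the parts can be labelled.
BALANCED_PARENS = /
  (?<aa>            # named group "aa"
    \(              # an opening parenthesis
    (?:
      (?>[^()]+)    # a run of non-parenthesis characters, matched atomically
      |
      \g<aa>        # or a nested balanced group: call "aa" again, recursively
    )*
    \)              # the matching closing parenthesis
  )
/x

["(aaa(bbbc)xxx)", "(aaa(bb(b)c)xxx)"].each do |s|
  puts s[BALANCED_PARENS]
end
```

The recursion through `\g<aa>` is what lets the pattern follow nesting of any depth, which a classical regular expression cannot do.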
  7. James Edward Gray II wrote:
    >
    > On Oct 4, 2004, at 10:27 PM, Richard Kilmer wrote:
    >
    >> If I had the source for a string:
    >>
    >> "name = #{person.first_name+" "+person.last_name} ... Ok?"
    >>
    >> And assuming I could find the first and last double quotes, how would I
    >> parse out the #{ ... } with a regular expression since anything can fall
    >> between the #{ ... } braces in a string?

    >
    >
    > I would use:
    >
    > .sub(/^(.+?)\#\(.+\}/m, '\1')


    This would be:
    sub(/^(.+?)\#\{.+\}/m, '\1')
                  ^
    Why are you preferring the greedy match? And if I get it right this
    substitutes
    "name = #{person.first_name+" "+person.last_name} ... Ok?"
    to
    "name = ... Ok?"

    I don't think that is what is asked? Or am I wrong?

    regards,

    Brian

    >
    > Hope that helps.
    >
    > James Edward Gray II
    >
    >



    --
    Brian Schröder
    http://ruby.brian-schroeder.de/
    Brian Schröder, Oct 5, 2004
    #7
  8. Markus (Guest)

    On Tue, 2004-10-05 at 07:04, James Edward Gray II wrote:
    > On Oct 5, 2004, at 4:09 AM, Brian Schröder wrote:
    >
    > > Regular expressions are not able to "count" more than a finite number
    > > of states, and the number of states is fixed at compile time. That is
    > > because regular expressions map to finite automata. So it is
    > > impossible to match opening and closing braces in an unknown
    > > expression. For this to work always you need a model that can enter
    > > unbounded many states.

    >
    > Just for the sake of clarity, you are speaking of Ruby's regular
    > expressions here. Perl's regex engine has no such limitation. Using
    > the (?? ... ) construct, Perl regular expressions can parse balanced
    > delimiters. I miss this feature and would love to see Ruby add
    > something similar in the future.


    I think Brian's point is true of regular expressions in general,
    not any particular implementation. If the perl idiom you mention can in
    fact do general purpose matching of unbounded depth, it doesn't mean
    that "regular expressions" can do this, but rather that Larry has
    implemented a more powerful parser and (incorrectly) called it "regular
    expressions."

    If this isn't clear, consider an analogy: if I write a language and
    include a trailing-dot-digit idiom, such that 1.6 can be used as an
    integer, does it mean that '1.6' is now an integer, or that I've
    implemented some form of real numbers and mislabeled them 'integers'?

    -- Markus
    Markus, Oct 5, 2004
    #8
  9. On Oct 5, 2004, at 9:44 AM, Brian Schröder wrote:

    > This would be:
    > .sub(/^(.+?)\#\{.+\}/m, '\1')
    > ^
    > Why are you preferring the greedy match?


    If it's known there are no braces in the string save the #{ ... }, I
    think it's much preferable. {}s are certainly allowed in Ruby code.

    > And if I get it right this substitutes
    > "name = #{person.first_name+" "+person.last_name} ... Ok?"
    > to
    > "name = ... Ok?"
    >
    > I don't think that is what is asked? Or am I wrong?


    Hmm, rereading the original message, I believe you are right. My
    apologies.

    James Edward Gray II
    James Edward Gray II, Oct 5, 2004
    #9
  10. On Oct 5, 2004, at 9:25 AM, ts wrote:

    >>>>>> "J" == James Edward Gray <> writes:

    > J> delimiters. I miss this feature and would love to see Ruby add
    > J> something similar in the future.
    >
    > This ?
    >
    > svg% cat b.rb
    > #!ruby -rjj
    > ["(aaa(bbbc)xxx)", "(aaa(bb(b)c)xxx)"].each do |m|
    > p $& if /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/ =~ m
    > end
    > /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/.dump


    Wow. I can't decipher how, but that sure appears to work, though not
    in my Ruby. ;) What is this magical "jj" library you loaded?

    James Edward Gray II
    James Edward Gray II, Oct 5, 2004
    #10
  11. ts (Guest)

    >>>>> "J" == James Edward Gray <> writes:

    J> Wow. I can't decipher how, but that sure appears to work, though not
    J> in my Ruby. ;)

    it's Oniguruma (the re engine for 1.9)

    J> What is this magical "jj" library you loaded?

    jj is like ii: it will only work at moulon :)


    Guy Decoux
    ts, Oct 5, 2004
    #11
  12. On Oct 5, 2004, at 10:42 AM, ts wrote:

    >>>>>> "J" == James Edward Gray <> writes:

    >
    > J> Wow. I can't decipher how, but that sure appears to work, though
    > not
    > J> in my Ruby. ;)
    >
    > it's Oniguruma (the re engine for 1.9)


    In that case, I guess my wishes have already been answered, I just
    haven't caught up with the results yet. Thanks for the demonstration.
    I'm looking forward to playing with Oniguruma...

    James Edward Gray II
    James Edward Gray II, Oct 5, 2004
    #12
  13. I am working in an environment that is neither Ruby nor Perl. The piece of
    code looks something like this:

    { name = "Double Quoted String";
      begin = "\"";
      end = "\"";
      foregroundColor = "#66CC33";
      patterns = (
        { name = "Interpolated String";
          match = "#\\{([^\\}]*)\\}";
          foregroundColor = "#aaaaaa";
        }
      );
    },

    This is a syntax highlighting system for an editor. As you can see, you can
    use either begin="regexp"; end="regexp" or match="regexp" and patterns can
    be nested.

    What I have works, assuming that the code inside the #{ ... } does not
    itself contain braces (which is limiting, I know).

    -rich


    On 10/5/04 10:04 AM, "James Edward Gray II" <>
    wrote:

    > On Oct 5, 2004, at 4:09 AM, Brian Schröder wrote:
    >
    >> Regular expressions are not able to "count" more than a finite number
    >> of states, and the number of states is fixed at compile time. That is
    >> because regular expressions map to finite automata. So it is
    >> impossible to match opening and closing braces in an unknown
    >> expression. For this to work always you need a model that can enter
    >> unbounded many states.

    >
    > Just for the sake of clarity, you are speaking of Ruby's regular
    > expressions here. Perl's regex engine has no such limitation. Using
    > the (?? ... ) construct, Perl regular expressions can parse balanced
    > delimiters. I miss this feature and would love to see Ruby add
    > something similar in the future.
    >
    > James Edward Gray II
    >
    >
    >
    Richard Kilmer, Oct 5, 2004
    #13
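
For comparison, the recursive technique from post #6 can be adapted from parentheses to the braces of `#{ ... }`. This is a Ruby 1.9+ sketch only. The thread does not say whether the editor's regexp engine supports `\g<name>` calls, and the names `INTERPOLATION`, `code`, and `nested` are made up for illustration.

```ruby
INTERPOLATION = /
  \#\{                            # opening of the interpolation
  (?<code>
    (?:
      (?>[^\{\}]+)                # text without braces
      |
      (?<nested>                  # or one balanced pair of inner braces ...
        \{ (?: (?>[^\{\}]+) | \g<nested> )* \}
      )                           # ... which may itself contain balanced pairs
    )*
  )
  \}                              # the closing brace that belongs to the opening
/x

src = '"name = #{person.first_name+" "+person.last_name} ... Ok?"'
puts INTERPOLATION.match(src)[:code]

deep = '"#{a{b{c}}} y"'
puts INTERPOLATION.match(deep)[:code]   # a{b{c}}
```

Unlike the `[^\}]*` pattern in the post, this keeps inner braces intact, but it still only counts braces: a lone `}` inside a string literal within the embedded code would end the match early.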
