Regexp match question on interpolated strings...

Richard Kilmer · Oct 5, 2004

If I had the source for a string:

"name = #{person.first_name+" "+person.last_name} ... Ok?"

And assuming I could find the first and last double quotes, how would I
parse out the #{ ... } with a regular expression since anything can fall
between the #{ ... } braces in a string?

Thanks in advance.

-rich

Joe Cheng · Oct 5, 2004

Richard said:
If I had the source for a string:

"name = #{person.first_name+" "+person.last_name} ... Ok?"

And assuming I could find the first and last double quotes, how would I
parse out the #{ ... } with a regular expression since anything can fall
between the #{ ... } braces in a string?

Hmmm, if I understand your question, and if you really knew where the
first and last double quotes were, you could calculate the number of
chars between them, and do something like this:

/#\{.*?\".{<number_of_chars>}\".*?}/

But it seems like if you want to be able to get more dynamic/flexible
than that, you really want to parse the expression for real--which is
something I believe regexes aren't powerful enough for. You'd either
have to write a parser by hand, or use something like racc:

http://i.loveruby.net/en/prog/racc.html
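
A minimal sketch of the "parser by hand" route, for illustration only. It assumes the string source is already isolated and simply counts braces; the method name interpolations is hypothetical, and braces inside nested string literals or comments are counted too, so this is not a full Ruby parser.

```ruby
# Hypothetical helper: returns the code inside each #{ ... } in str.
# It counts braces, so nested { } pairs stay with their interpolation.
def interpolations(str)
  results = []
  i = 0
  while (start = str.index('#{', i))
    depth = 1
    j = start + 2
    # Walk forward until the brace opened by #{ is closed again.
    while j < str.length && depth > 0
      case str[j]
      when '{' then depth += 1
      when '}' then depth -= 1
      end
      j += 1
    end
    break if depth > 0 # unbalanced input: stop scanning
    results << str[(start + 2)...(j - 1)]
    i = j
  end
  results
end

src = 'name = #{person.first_name+" "+person.last_name} ... Ok?'
puts interpolations(src)
```

Because the depth counter is an ordinary integer, the nesting depth is not limited the way it is in a fixed regular expression.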

Brian Schröder · Oct 5, 2004

Richard said:
If I had the source for a string:

"name = #{person.first_name+" "+person.last_name} ... Ok?"

And assuming I could find the first and last double quotes, how would I
parse out the #{ ... } with a regular expression since anything can fall
between the #{ ... } braces in a string?

Thanks in advance.

-rich

Regular expressions are not able to "count" more than a finite number of
states, and the number of states is fixed at compile time. That is
because regular expressions map to finite automata. So it is impossible
to match opening and closing braces in an unknown expression. For this
to work always you need a model that can enter unbounded many states.

But beware, your computer is also only a finite state machine with a lot
of states. The number of its states is bounded by the size of ram (and
harddisk).

If you are sure that there will be no closing braces inside the
braces, you could match
/\#\{(.*?)\}/ =~ string

or including at most one pair of inside braces

/\#\{([^\{}]*(\{.*?\}|).*?)\}/ =~ string

As you see it begins to get ugly now.
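
To make the limits concrete, here is a small sketch that runs the two patterns above against sample strings. The constant names and the sample strings other than the original one are made up for the demonstration.

```ruby
# The two patterns from this post, named only for the demonstration.
FLAT      = /\#\{(.*?)\}/                    # no braces inside #{ ... }
ONE_LEVEL = /\#\{([^\{}]*(\{.*?\}|).*?)\}/   # at most one inner { ... } pair

simple = 'name = #{person.first_name+" "+person.last_name} ... Ok?'
block  = 'total = #{items.inject(0) { |sum, i| sum + i }} items'
deep   = 'x = #{a { b { c } } } y'

puts simple[FLAT, 1]      # whole expression: no inner braces
puts block[FLAT, 1]       # cut off at the first closing brace
puts block[ONE_LEVEL, 1]  # whole expression: one inner pair is handled
puts deep[ONE_LEVEL, 1]   # cut off again: two levels of nesting
```

Each extra level of nesting needs another hand-written layer in the pattern, which is the "finite number of states" limit in practice.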

Regards,

Brian

James Edward Gray II · Oct 5, 2004

Regular expressions are not able to "count" more than a finite number
of states, and the number of states is fixed at compile time. That is
because regular expressions map to finite automata. So it is
impossible to match opening and closing braces in an unknown
expression. For this to work always you need a model that can enter
unbounded many states.

Just for the sake of clarity, you are speaking of Ruby's regular
expressions here. Perl's regex engine has no such limitation. Using
the (?? ... ) construct, Perl regular expressions can parse balanced
delimiters. I miss this feature and would love to see Ruby add
something similar in the future.

James Edward Gray II

James Edward Gray II · Oct 5, 2004

If I had the source for a string:

"name = #{person.first_name+" "+person.last_name} ... Ok?"

And assuming I could find the first and last double quotes, how would I
parse out the #{ ... } with a regular expression since anything can fall
between the #{ ... } braces in a string?

I would use:

sub(/^(.+?)\#\(.+\}/m, '\1')

Hope that helps.

James Edward Gray II

ts · Oct 5, 2004

J> expressions here. Perl's regex engine has no such limitation. Using
J> the (?? ... ) construct, Perl regular expressions can parse balanced
^^

I've always found the choice of these 2 characters strange ...

J> delimiters. I miss this feature and would love to see Ruby add
J> something similar in the future.

This ?

svg% cat b.rb
#!ruby -rjj
["(aaa(bbbc)xxx)", "(aaa(bb(b)c)xxx)"].each do |m|
p $& if /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/ =~ m
end
/(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/.dump
svg%

svg% ruby b.rb
"(aaa(bbbc)xxx)"
"(aaa(bb(b)c)xxx)"
Regexp /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/
0 call 2
1 jump 19
2 mem-start-push 1
3 exact1 (
4 push-if-peek-next ) ===> -1
5 null-check-start 0
6 push 13
7 cclass-not (-) (2)
8 push 12
9 cclass-not (-) (2)
10 pop
11 jump 8
12 jump 14
13 call 2
14 null-check-end-memst-push 0
15 jump 4
16 exact1 )
17 mem-end-rec 1
18 return
19 end
Optimize EXACT : (
svg%
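
The same \g<name> subexpression call can be pointed at braces instead of parentheses. A minimal sketch, assuming a Ruby with the Oniguruma-style engine built in (1.9 or later) and assuming any braces inside nested string literals are themselves balanced. INTERP and interpolated_code are hypothetical names.

```ruby
# '#' followed by a balanced { ... } group; \g<braces> re-enters the
# named group, so inner pairs can nest to any depth.
INTERP = /\#(?<braces>\{(?:(?>[^{}]+)|\g<braces>)*\})/

# Hypothetical helper: the code inside every #{ ... } found in str.
def interpolated_code(str)
  found = []
  pos = 0
  while (m = INTERP.match(str, pos))
    found << m[0][2..-2] # drop the leading '#{' and the trailing '}'
    pos = m.end(0)
  end
  found
end

src = 'name = #{person.first_name+" "+person.last_name} ... Ok?'
puts interpolated_code(src)
```

The atomic group (?>...) is kept from the example above so that a failed match does not retry every way of splitting a run of non-brace characters.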

Guy Decoux

Brian Schröder · Oct 5, 2004

James said:
I would use:

.sub(/^(.+?)\#\(.+\}/m, '\1')

This would be:
sub(/^(.+?)\#\{.+\}/m, '\1')
^
Why are you preferring the greedy match? And if I get it right this
substitutes
"name = #{person.first_name+" "+person.last_name} ... Ok?"
to
"name = ... Ok?"

I don't think that is what is asked? Or am I wrong?

regards,

Brian

Markus · Oct 5, 2004

Just for the sake of clarity, you are speaking of Ruby's regular
expressions here. Perl's regex engine has no such limitation. Using
the (?? ... ) construct, Perl regular expressions can parse balanced
delimiters. I miss this feature and would love to see Ruby add
something similar in the future.

I think Brian's point is true of regular expressions in general,
not any particular implementation. If the perl idiom you mention can in
fact do general purpose matching of unbounded depth, it doesn't mean
that "regular expressions" can do this, but rather that Larry has
implemented a more powerful parser and (incorrectly) called it "regular
expressions."

If this isn't clear, consider an analogy: if I write a language and
include a trailing-dot-digit idiom, such that 1.6 can be used as an
integer, does it mean that '1.6' is now an integer or that I've
implemented some form of real numbers and mislabeled them 'integers'?

-- Markus

James Edward Gray II · Oct 5, 2004

This would be:
.sub(/^(.+?)\#\{.+\}/m, '\1')
^
Why are you preferring the greedy match?

If it's known there are no braces in the string save the #{ ... }, I
think it's much preferable. {}s are certainly allowed in Ruby code.

And if I get it right this substitutes
"name = #{person.first_name+" "+person.last_name} ... Ok?"
to
"name = ... Ok?"

I don't think that is what is asked? Or am I wrong?

Hmm, rereading the original message, I believe you are right. My
apologies.

James Edward Gray II

James Edward Gray II · Oct 5, 2004

J> delimiters. I miss this feature and would love to see Ruby add
J> something similar in the future.

This ?

svg% cat b.rb
#!ruby -rjj
["(aaa(bbbc)xxx)", "(aaa(bb(b)c)xxx)"].each do |m|
p $& if /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/ =~ m
end
/(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/.dump

Wow. I can't decipher how, but that sure appears to work, though not
in my Ruby.

What is this magical "jj" library you loaded?

James Edward Gray II

ts · Oct 5, 2004

J> Wow. I can't decipher how, but that sure appears to work, though not
J> in my Ruby.

it's Oniguruma (the re engine for 1.9)

J> What is this magical "jj" library you loaded?

jj is like ii: it will only work at moulon

Guy decoux

James Edward Gray II · Oct 5, 2004

J> Wow. I can't decipher how, but that sure appears to work, though not
J> in my Ruby.

it's Oniguruma (the re engine for 1.9)

In that case, I guess my wishes have already been answered, I just
haven't caught up with the results yet. Thanks for the demonstration.
I'm looking forward to playing with Oniguruma...

James Edward Gray II

Richard Kilmer · Oct 5, 2004

I am working in an environment that is neither Ruby nor Perl. The piece of
code looks something like this:

{   name = "Double Quoted String";
    begin = "\"";
    end = "\"";
    foregroundColor = "#66CC33";
    patterns = (
        {   name = "Interpolated String";
            match = "#\\{([^\\}]*)\\}";
            foregroundColor = "#aaaaaa";
        }
    );
},

This is a syntax highlighting system for an editor. As you can see, you can
use either begin="regexp"; end="regexp" or match="regexp" and patterns can
be nested.

What I have works...assuming that the code inside the #{ ... } does not,
itself contain braces (which is limiting, I know).

-rich
