Regexp match question on interpolated strings...

Richard Kilmer

If I had the source for a string:

"name = #{person.first_name+" "+person.last_name} ... Ok?"

And assuming I could find the first and last double quotes, how would I
parse out the #{ ... } with a regular expression since anything can fall
between the #{ ... } braces in a string?

Thanks in advance.

-rich
 
Joe Cheng

Richard said:
If I had the source for a string:

"name = #{person.first_name+" "+person.last_name} ... Ok?"

And assuming I could find the first and last double quotes, how would I
parse out the #{ ... } with a regular expression since anything can fall
between the #{ ... } braces in a string?

Hmmm, if I understand your question, and if you really knew where the
first and last double quotes were, you could calculate the number of
chars between them, and do something like this:

/#\{.*?\".{<number_of_chars>}\".*?}/

But it seems like if you want to be able to get more dynamic/flexible
than that, you really want to parse the expression for real--which is
something I believe regexes aren't powerful enough for. You'd either
have to write a parser by hand, or use something like racc:

http://i.loveruby.net/en/prog/racc.html
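For a sense of what "a parser by hand" could look like at its smallest, here is a sketch of a brace-counting scanner. The method name `interpolations` is made up for illustration, and the sketch assumes that braces inside nested string literals or comments do not need special treatment; a real parser would have to track those too.

```ruby
# Hypothetical helper: returns the code found inside each #{ ... } in +source+.
# Assumption: every { and } counts toward nesting depth, even inside nested
# string literals or comments.
def interpolations(source)
  results = []
  pos = 0
  while (start = source.index('#{', pos))
    depth = 1
    j = start + 2
    # Walk forward until the brace opened by #{ is closed again.
    while j < source.length && depth > 0
      case source[j]
      when '{' then depth += 1
      when '}' then depth -= 1
      end
      j += 1
    end
    break if depth > 0 # unbalanced input: stop scanning
    results << source[(start + 2)...(j - 1)]
    pos = j
  end
  results
end

p interpolations('"name = #{person.first_name+" "+person.last_name} ... Ok?"')
```

Because the depth counter is an ordinary integer, the nesting depth is not limited by the pattern, which is the part a classical regular expression cannot express.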
 
Brian Schröder

Richard said:
If I had the source for a string:

"name = #{person.first_name+" "+person.last_name} ... Ok?"

And assuming I could find the first and last double quotes, how would I
parse out the #{ ... } with a regular expression since anything can fall
between the #{ ... } braces in a string?

Thanks in advance.

-rich
Regular expressions cannot "count" beyond a finite number of states, and
that number is fixed when the pattern is compiled, because regular
expressions map to finite automata. So it is impossible to match opening
and closing braces nested to arbitrary depth in an unknown expression. For
this to always work you need a model that can enter unboundedly many states.

But beware, your computer is also only a finite state machine, just one
with a lot of states. The number of its states is bounded by the size of
its RAM (and hard disk).

If you are sure that there will be no closing braces inside the
braces, you could match
/\#\{(.*?)\}/ =~ string

or, allowing at most one pair of inner braces,

/\#\{([^\{}]*(\{.*?\}|).*?)\}/ =~ string

As you see it begins to get ugly now.
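To make the limits of the two patterns concrete, the following sketch runs them against the string from the question and against two invented inputs with nested braces. The constant names and the nested sample strings are illustrative assumptions, not part of the original post.

```ruby
# The two patterns from above, unchanged.
LAZY      = /\#\{(.*?)\}/
ONE_LEVEL = /\#\{([^\{}]*(\{.*?\}|).*?)\}/

# Single-quoted so Ruby does not interpolate the samples themselves.
FLAT   = '"name = #{person.first_name+" "+person.last_name} ... Ok?"'
NESTED = 'a #{x.map { |e| e } } b' # one inner pair of braces (made-up input)
DEEPER = 'a #{x { y { z } } } b'   # two levels of nesting (made-up input)

p FLAT[LAZY, 1]        # no inner braces, so the whole expression is captured
p NESTED[LAZY, 1]      # stops at the first closing brace
p NESTED[ONE_LEVEL, 0] # one inner pair is still handled
p DEEPER[ONE_LEVEL, 0] # two levels: the match ends before the real closing brace
```

Each extra level of nesting would need another hand-written layer in the pattern, which is why the approach gets ugly quickly.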

Regards,

Brian
 
James Edward Gray II

Regular expressions are not able to "count" more than a finite number
of states, and the number of states is fixed at compile time. That is
because regular expressions map to finite automata. So it is
impossible to match opening and closing braces in an unknown
expression. For this to work always you need a model that can enter
unbounded many states.

Just for the sake of clarity, you are speaking of Ruby's regular
expressions here. Perl's regex engine has no such limitation. Using
the (?? ... ) construct, Perl regular expressions can parse balanced
delimiters. I miss this feature and would love to see Ruby add
something similar in the future.

James Edward Gray II
 
James Edward Gray II

If I had the source for a string:

"name = #{person.first_name+" "+person.last_name} ... Ok?"

And assuming I could find the first and last double quotes, how would I
parse out the #{ ... } with a regular expression since anything can fall
between the #{ ... } braces in a string?

I would use:

sub(/^(.+?)\#\(.+\}/m, '\1')

Hope that helps.

James Edward Gray II
 
ts

J> expressions here. Perl's regex engine has no such limitation. Using
J> the (?? ... ) construct, Perl regular expressions can parse balanced
        ^^

I've always found the choice of these 2 characters strange ...

J> delimiters. I miss this feature and would love to see Ruby add
J> something similar in the future.

This ?

svg% cat b.rb
#!ruby -rjj
["(aaa(bbbc)xxx)", "(aaa(bb(b)c)xxx)"].each do |m|
  p $& if /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/ =~ m
end
/(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/.dump
svg%

svg% ruby b.rb
"(aaa(bbbc)xxx)"
"(aaa(bb(b)c)xxx)"
Regexp /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/
0 call 2
1 jump 19
2 mem-start-push 1
3 exact1 (
4 push-if-peek-next ) ===> -1
5 null-check-start 0
6 push 13
7 cclass-not (-) (2)
8 push 12
9 cclass-not (-) (2)
10 pop
11 jump 8
12 jump 14
13 call 2
14 null-check-end-memst-push 0
15 jump 4
16 exact1 )
17 mem-end-rec 1
18 return
19 end
Optimize EXACT : (
svg%



Guy Decoux
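The same `\g<name>` recursion can be pointed at braces instead of parentheses. The sketch below assumes a Ruby whose regex engine supports subexpression calls (Oniguruma or a descendant); the constant name, group name, and sample string are invented for illustration, and braces inside nested string literals are not special-cased.

```ruby
# A "#" followed by a balanced { ... } group, using a recursive call to the
# named group. Assumption: every { and } counts, even inside string literals.
INTERPOLATION = /\#(?<braces>\{(?:[^{}]|\g<braces>)*\})/

# Single-quoted so Ruby does not interpolate the sample itself.
SAMPLE = 'total = #{items.map { |i| i.price }.sum} ... Ok?'

match = INTERPOLATION.match(SAMPLE)
puts match[0]        # the whole #{ ... } span
puts match[0][2..-2] # the code between the outer braces
```

Slicing the whole match avoids relying on what the named group holds after the recursive calls have run.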
 
Brian Schröder

James said:
I would use:

.sub(/^(.+?)\#\(.+\}/m, '\1')

This would be:
sub(/^(.+?)\#\{.+\}/m, '\1')
^
Why are you preferring the greedy match? And if I get it right this
substitutes
"name = #{person.first_name+" "+person.last_name} ... Ok?"
to
"name = ... Ok?"

I don't think that is what is asked? Or am I wrong?

regards,

Brian
 
Markus

Just for the sake of clarity, you are speaking of Ruby's regular
expressions here. Perl's regex engine has no such limitation. Using
the (?? ... ) construct, Perl regular expressions can parse balanced
delimiters. I miss this feature and would love to see Ruby add
something similar in the future.

I think Brian's point is true of regular expressions in general,
not any particular implementation. If the Perl idiom you mention can in
fact do general purpose matching of unbounded depth, it doesn't mean
that "regular expressions" can do this, but rather that Larry has
implemented a more powerful parser and (incorrectly) called it "regular
expressions."

If this isn't clear, consider an analogy: if I write a language and
include a trailing-dot-digit idiom, such that 1.6 can be used as an
integer, does it mean that '1.6' is now an integer or that I've
implemented some form of real numbers and mislabeled them 'integers'?

-- Markus
 
James Edward Gray II

This would be:
.sub(/^(.+?)\#\{.+\}/m, '\1')
^
Why are you preferring the greedy match?

If it's known there are no braces in the string save the #{ ... }, I
think it's much preferable. {}s are certainly allowed in Ruby code.
And if I get it right this substitutes
"name = #{person.first_name+" "+person.last_name} ... Ok?"
to
"name = ... Ok?"

I don't think that is what is asked? Or am I wrong?

Hmm, rereading the original message, I believe you are right. My
apologies.

James Edward Gray II
 
James Edward Gray II

J> delimiters. I miss this feature and would love to see Ruby add
J> something similar in the future.

This ?

svg% cat b.rb
#!ruby -rjj
["(aaa(bbbc)xxx)", "(aaa(bb(b)c)xxx)"].each do |m|
  p $& if /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/ =~ m
end
/(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/.dump

Wow. I can't decipher how, but that sure appears to work, though not
in my Ruby. ;) What is this magical "jj" library you loaded?

James Edward Gray II
 
ts

J> Wow. I can't decipher how, but that sure appears to work, though not
J> in my Ruby. ;)

it's Oniguruma (the re engine for 1.9)

J> What is this magical "jj" library you loaded?

jj is like ii: it will only work at moulon :)


Guy Decoux
 
James Edward Gray II

J> Wow. I can't decipher how, but that sure appears to work, though not
J> in my Ruby. ;)

it's Oniguruma (the re engine for 1.9)

In that case, I guess my wishes have already been answered, I just
haven't caught up with the results yet. Thanks for the demonstration.
I'm looking forward to playing with Oniguruma...

James Edward Gray II
 
Richard Kilmer

I am working in an environment that is neither Ruby nor Perl. The piece of
code looks something like this:

{   name = "Double Quoted String";
    begin = "\"";
    end = "\"";
    foregroundColor = "#66CC33";
    patterns = (
        {   name = "Interpolated String";
            match = "#\\{([^\\}]*)\\}";
            foregroundColor = "#aaaaaa";
        }
    );
},

This is a syntax highlighting system for an editor. As you can see, you can
use either begin="regexp"; end="regexp" or match="regexp" and patterns can
be nested.

What I have works, assuming that the code inside the #{ ... } does not
itself contain braces (which is limiting, I know).

-rich
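As a rough check of that limitation outside the editor, the `match` string can be tried in Ruby once the config-file escaping (`\\` for `\`) is undone. This is only a sketch: it assumes the editor's regex dialect treats this particular pattern the same way Ruby does, and the nested sample string is invented.

```ruby
# The "match" value from the config with the doubled backslashes collapsed.
# Single quotes keep Ruby from interpolating or unescaping the pattern text.
PATTERN = Regexp.new('#\{([^\}]*)\}')

SIMPLE     = '"name = #{person.first_name+" "+person.last_name} ... Ok?"'
WITH_BLOCK = '"#{list.map { |x| x.to_s }.join}"' # made-up input with inner braces

p PATTERN.match(SIMPLE)[1]     # the full interpolated expression
p PATTERN.match(WITH_BLOCK)[1] # ends at the block's closing brace
```

In the second case the highlighted span would stop at the inner closing brace, so the rest of the interpolation would be colored as ordinary string text.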
 
