regex extension to handle matching parens?

ivo welch

Dear Experts: I am very new to ruby, literally having just read the
ruby book.

I want to write a program that does basic LaTeX parsing, so I need to
match '}' closings to the opening '{'. (yes, I understand that LaTeX
has very messy syntax, so this will only work for certain LaTeX docs.)
Does a gem exist that facilitates closing-paren-matching fairly
painlessly? For example,

sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'

So I want my "\caption" matcher program to detect the closing brace
and give me everything between the opener and closer (i.e., "my table
\label{table-label} example: $\sqrt{2+\sqrt{2}}$"). Possible?

I searched this mailing list first, but I only found discussions of
this issue from years back. I understand that this is not, strictly
speaking, a regular expression. I come from a perl background, where
there are now some libraries that handle matching parens alongside the
built-in regex engine (Regexp::Common::balanced and Text::Balanced).
I was hoping I could find a similar gem for ruby.

help appreciated.

Sincerely,

/iaw
 
Rob Biedenharn

I think that you need to look at what Oniguruma might be able to do.
http://oniguruma.rubyforge.org/

I believe I've seen it demonstrated that balanced open/close pairs can
be found with this regular expression engine. It might be ugly, but
then you probably expected that.

-Rob
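
A minimal sketch of what this can look like, assuming Ruby 1.9 or later, where an Oniguruma-family engine is the built-in regex engine and a named group can call itself with \g<name>. The names BALANCED_BRACES and caption_argument are made up for this illustration, and delete_prefix assumes Ruby 2.5 or later.

```ruby
# Single quotes keep the LaTeX backslashes literal.
sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'

# A "{", then any mix of non-brace characters and nested balanced groups,
# then the matching "}". \g<braces> re-enters the group recursively.
BALANCED_BRACES = /(?<braces>\{(?:[^{}]|\g<braces>)*\})/

# Hypothetical helper: returns the text between the braces that follow
# \caption, or nil when there is no balanced \caption{...} in the text.
def caption_argument(text)
  m = text.match(/\\caption#{BALANCED_BRACES}/)
  return nil unless m

  # m[0] is the whole match, "\caption{...}". Drop the command name,
  # then strip the outermost pair of braces.
  m[0].delete_prefix('\caption')[1..-2]
end

puts caption_argument(sample)
# prints: my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$
```

The pattern knows nothing about LaTeX escapes such as \{, so it only suits documents where every brace is a grouping brace.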
 
ivowel

thank you, rob. great reference. now I know that it can be done.
alas, this doc is a little over my head. can someone who has used
this construct possibly please show me how I would try it on my simple
example?

sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'


accomplishing this is actually not ugly at all in perl:

use Regexp::Common;
my $matchingarg = qr/($RE{balanced}{-parens=>'{}'})/;
/\\caption$matchingarg/;
print "The \\caption argument is $1\n";

of course, perl is ugly in many other respects, but here, it does
nicely.

regards, /iaw
 
William James

ivowel said:
can someone who has used this construct possibly please show me how
I would try it on my simple example?


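sample = " \\caption{my table \\label{table-label}
example: $\\sqrt{2+\\sqrt{2}}$} more here {}"


# Returns the balanced group at the start of str, fences included.
# str must begin with one of ( { [ <
def bal_fences str
  left = str[0,1]
  # Look up the opener together with its closer, e.g. "{}".
  fences = /[#{Regexp.escape "(){}[]<>"[ /#{Regexp.escape left}./ ]}]/
  accum = "" ; count = 0
  # /m lets "." cross the newline in the sample, so no text is dropped.
  str.scan( /.*?#{fences}/m ){|s|
    count += if s[-1,1] == left ; 1 else -1 end
    accum << s
    break if 0 == count
  }
  accum
end


p bal_fences( sample[ /caption(.*)/m, 1 ] )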
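
The scan-based function above counts every fence it meets, including a LaTeX-escaped one such as \{. Here is a hedged sketch of a different technique, a depth counter driven by StringScanner from Ruby's standard strscan library, that steps over backslash pairs. The helper name braced_argument is hypothetical, and the sketch only handles curly braces.

```ruby
require 'strscan'

# Hypothetical helper: returns the argument of the first `command{...}` in
# text, or nil when the command is absent or its braces never balance.
def braced_argument(text, command)
  scanner = StringScanner.new(text)
  return nil unless scanner.scan_until(/#{Regexp.escape(command)}\{/)

  start = scanner.charpos   # character index just past the opening brace
  depth = 1
  while depth > 0
    # Stop at the next brace or backslash pair, whichever comes first.
    return nil unless scanner.scan_until(/\\.|[{}]/m)

    case scanner.matched
    when '{' then depth += 1
    when '}' then depth -= 1
    end                     # pairs such as "\{" or "\l" leave depth unchanged
  end
  text[start...(scanner.charpos - 1)]
end

sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'
puts braced_argument(sample, '\caption')
# prints: my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$
```

Unlike bal_fences, the result excludes the outer braces, which is the form the original question asked for.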
 
Robert Klemme

ivowel said:
of course, perl is ugly in many other respects, but here, it does
nicely.

Ugliness often means bad maintainability... I'd probably use a
different approach which also works with simpler regular expressions:

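# untested
Node = Struct.new :parent, :children

current = root = Node.new nil, []
# input is the string to parse; the capture group keeps the delimiters
tokens = input.split(/([\[\](){}])/)

tokens.each do |token|
  case token
  when /\A[\[({]\z/
    # opener: attach a child node to the tree and descend into it
    child = Node.new current, []
    current.children << child
    current = child
  when /\A[\])}]\z/
    # closer: climb back to the enclosing node
    current = current.parent
  else
    current.children << token
  end
end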

In other words: build a rudimentary context free parser. Depends of
course on what you want to do.

Cheers

robert
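
A self-contained sketch of the parser idea applied to the \caption sample, assuming only curly braces matter. It swaps the Node struct for plain nested arrays to stay short, and the names parse_groups and render are hypothetical.

```ruby
# Hypothetical helper: parses text into nested arrays, one array per
# {...} group, with the plain text between braces kept as strings.
def parse_groups(text)
  root = []
  stack = [root]
  # The capture group makes split keep the braces as separate tokens.
  text.split(/([{}])/).each do |token|
    case token
    when '{'
      child = []
      stack.last << child
      stack.push(child)
    when '}'
      raise ArgumentError, 'unbalanced "}"' if stack.size == 1
      stack.pop
    else
      stack.last << token unless token.empty?
    end
  end
  raise ArgumentError, 'unbalanced "{"' unless stack.size == 1
  root
end

# Turns the contents of a group back into text, restoring nested braces.
def render(node)
  node.map { |child| child.is_a?(Array) ? "{#{render(child)}}" : child }.join
end

sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'
tree = parse_groups(sample)
# The group right after the text token ending in "\caption" is its argument.
index = tree.index { |node| node.is_a?(String) && node.end_with?('\caption') }
caption = render(tree[index + 1])
puts caption
# prints: my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$
```

The tree costs more code than a single regex, but it keeps every nested group available, which helps if \label or \sqrt arguments are needed later.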
 
