string parsing / regexp question

Ryan Krauss

I need to parse the following string:

$$\pmatrix{{\it x_2}\cr 0\cr 1\cr }=\pmatrix{\left({{{\it m_2}\,s^2
}\over{k}}+1\right)\,{\it x_1}-{{F}\over{k}}\cr -{{{\it m_2}\,s^2\,F
}\over{k}}-F+\left({\it m_2}\,s^2\,\left({{{\it m_2}\,s^2}\over{k}}+1
\right)+{\it m_2}\,s^2\right)\,{\it x_1}\cr 1\cr }$$

The first thing I need to do is extract the arguments to \pmatrix{ }
on both the left and right hand sides of the equal sign, so that the
first argument is extracted as

{\it x_2}\cr 0\cr 1\cr

and the second is

\left({{{\it m_2}\,s^2
}\over{k}}+1\right)\,{\it x_1}-{{F}\over{k}}\cr -{{{\it m_2}\,s^2\,F
}\over{k}}-F+\left({\it m_2}\,s^2\,\left({{{\it m_2}\,s^2}\over{k}}+1
\right)+{\it m_2}\,s^2\right)\,{\it x_1}\cr 1\cr

The trick is that there are extra curly braces inside the \pmatrix{ }
strings and I don't know how to write a regexp that would count the
number of open and close curly braces and make sure they match, so
that it can find the correct ending curly brace.

Any suggestions?

I would prefer a regexp solution, but am open to other approaches.

Thanks,

Ryan
 
Paul McGuire

As Tim Grove points out, writing a grammar for this expression is
really pretty simple, especially using the latest version of
pyparsing, which includes a new helper method, nestedExpr. Here is
the whole program to parse your example:

from pyparsing import *

data = r"""$$\pmatrix{{\it x_2}\cr 0\cr 1\cr }=
\pmatrix{\left({{{\it m_2}\,s^2
}\over{k}}+1\right)\,{\it x_1}-{{F}\over{k}}\cr -{{{\it
m_2}\,s^2\,F
}\over{k}}-F+\left({\it m_2}\,s^2\,\left({{{\it
m_2}\,s^2}\over{k}}+1
\right)+{\it m_2}\,s^2\right)\,{\it x_1}\cr 1\cr }$$"""

PMATRIX = Literal(r"\pmatrix")
nestedBraces = nestedExpr("{","}")
grammar = "$$" + PMATRIX + nestedBraces + "=" + \
PMATRIX + nestedBraces + \
"$$"
res = grammar.parseString(data)
print(res)

This prints the following:

['$$', '\\pmatrix', [['\\it', 'x_2'], '\\cr', '0\\cr', '1\\cr'], '=',
'\\pmatrix', ['\\left(', [[['\\it', 'm_2'], '\\,s^2'], '\\over',
['k']], '+1\\right)\\,', ['\\it', 'x_1'], '-', [['F'], '\\over',
['k']], '\\cr', '-', [[['\\it', 'm_2'], '\\,s^2\\,F'], '\\over',
['k']], '-F+\\left(', ['\\it', 'm_2'], '\\,s^2\\,\\left(', [[['\\it',
'm_2'], '\\,s^2'], '\\over', ['k']], '+1', '\\right)+', ['\\it',
'm_2'], '\\,s^2\\right)\\,', ['\\it', 'x_1'], '\\cr', '1\\cr'], '$$']

Okay, maybe this looks a bit messy. But believe it or not, the
returned results give you access to each grammar element as:

['$$', '\\pmatrix', [nested arg list], '=', '\\pmatrix',
[nested arg list], '$$']

Not only has the parser handled the {} nesting levels, but it has
structured the returned tokens according to that nesting. (The '{}'s
are gone now, since their delimiting function has been replaced by the
nesting hierarchy in the results.)
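To make the idea of "nesting replaces the braces" concrete, here is a rough standard-library sketch of brace counting that builds the same kind of nested list. It is only an illustration, not how pyparsing implements nestedExpr; the function name parse_nested is made up for this example, and it assumes braces are never escaped (no \{ or \}).

```python
def parse_nested(text, opener="{", closer="}"):
    """Return a nested list that mirrors the brace nesting of text.

    Hypothetical helper for illustration only; assumes unescaped braces.
    """
    root = []
    stack = [root]   # stack[-1] is the list currently being filled
    buf = []         # characters collected since the last brace

    def flush():
        # Split the collected text on whitespace and store the pieces.
        token = "".join(buf).strip()
        if token:
            stack[-1].extend(token.split())
        buf.clear()

    for ch in text:
        if ch == opener:
            flush()
            child = []
            stack[-1].append(child)   # one level deeper
            stack.append(child)
        elif ch == closer:
            flush()
            if len(stack) == 1:
                raise ValueError("unbalanced closing brace")
            stack.pop()               # back up one level
        else:
            buf.append(ch)
    flush()
    if len(stack) != 1:
        raise ValueError("unbalanced opening brace")
    return root


print(parse_nested(r"{{\it x_2}\cr 0\cr 1\cr }"))
# [[['\\it', 'x_2'], '\\cr', '0\\cr', '1\\cr']]
```

The stack depth plays the role of the brace counter: each "{" pushes a new list, each "}" pops back to the enclosing one.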

You could use tuple assignment to get at the individual fields:
dummy,dummy,lhs_args,dummy,dummy,rhs_args,dummy = res

Or you could access the fields in res using list indexing:
lhs_args, rhs_args = res[2],res[5]

But both of these methods will break if you decide to extend the
grammar with additional or optional fields.

A safer approach is to give the grammar elements results names, as in
this slightly modified version of grammar:

grammar = "$$" + PMATRIX + nestedBraces("lhs_args") + "=" + \
PMATRIX + nestedBraces("rhs_args") + \
"$$"

Now you can access the parsed fields as if the results were a dict
with keys "lhs_args" and "rhs_args", or as an object with attributes
named "lhs_args" and "rhs_args":

res = grammar.parseString(data)
print(res["lhs_args"])
print(res["rhs_args"])
print(res.lhs_args)
print(res.rhs_args)

Note that the default behavior of nestedExpr is to give back a nested
list of the elements according to how the original text was nested
within braces.

If you just want the original text, add a parse action to nestedBraces
to do this for you (keepOriginalText is another pyparsing builtin).
The parse action is executed at parse time so that there is no post-
processing needed after the parsed results are returned:

nestedBraces.setParseAction(keepOriginalText)
grammar = "$$" + PMATRIX + nestedBraces("lhs_args") + "=" + \
PMATRIX + nestedBraces("rhs_args") + \
"$$"

res = grammar.parseString(data)
print(res)
print(res.lhs_args)
print(res.rhs_args)

Now this program returns the original text for the nested brace
expressions:

['$$', '\\pmatrix', '{{\\it x_2}\\cr 0\\cr 1\\cr }', '=', '\\pmatrix',
'{\\left({{{\\it m_2}\\,s^2 \n }\\over{k}}+1\\right)\\,{\\it x_1}-{{F}
\\over{k}}\\cr -{{{\\it m_2}\\,s^2\\,F \n }\\over{k}}-F+\\left({\\it
m_2}\\,s^2\\,\\left({{{\\it m_2}\\,s^2}\\over{k}}+1 \n \\right)+{\\it
m_2}\\,s^2\\right)\\,{\\it x_1}\\cr 1\\cr }', '$$']
['{{\\it x_2}\\cr 0\\cr 1\\cr }']
['{\\left({{{\\it m_2}\\,s^2 \n }\\over{k}}+1\\right)\\,{\\it x_1}-{{F}
\\over{k}}\\cr -{{{\\it m_2}\\,s^2\\,F \n }\\over{k}}-F+\\left({\\it
m_2}\\,s^2\\,\\left({{{\\it m_2}\\,s^2}\\over{k}}+1 \n \\right)+{\\it
m_2}\\,s^2\\right)\\,{\\it x_1}\\cr 1\\cr }']
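If all you need is the original text of each \pmatrix argument and you would rather not add a dependency, a plain brace counter is probably enough. The sketch below uses only the standard library; balanced_group and pmatrix_args are names invented for this example, the sample string is a shortened stand-in for the real data, and it assumes the braces are never escaped.

```python
def balanced_group(text, start):
    """Return the index just past the '}' that matches the '{' at text[start]."""
    if start >= len(text) or text[start] != "{":
        raise ValueError("expected '{' at position %d" % start)
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:      # found the matching close brace
                return i + 1
    raise ValueError("unbalanced braces")


def pmatrix_args(text, command=r"\pmatrix"):
    """Return the raw text inside the braces following each command."""
    args = []
    pos = text.find(command)
    while pos != -1:
        start = pos + len(command)
        end = balanced_group(text, start)
        args.append(text[start + 1:end - 1])   # drop the outer braces
        pos = text.find(command, end)
    return args


# Shortened stand-in for the string in the question.
sample = r"$$\pmatrix{{\it x_2}\cr 0\cr 1\cr }=\pmatrix{{{F}\over{k}}\cr 1\cr }$$"
lhs_args, rhs_args = pmatrix_args(sample)
print(lhs_args)
print(rhs_args)
```

Unlike the parse-action version, this returns the argument text without the enclosing braces, which is the form asked for in the original question.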

You can find more info on pyparsing at http://pyparsing.wikispaces.com.

Cheers!
-- Paul
 
Ryan Krauss

Interesting. Thanks Paul and Tim. This looks very promising.

Ryan

 
Ryan Krauss


I can't seem to access pyparsing on wikispaces. Is there something
wrong with the website right now?
 
