regular expressions, stack and nesting

Aaron Brady

Hi,

Every so often the group gets a request for parsing an expression. I
think it would be significantly easier to do if regular expressions
could modify a stack. However, since you might nearly as well write
Python, maybe there is a compromise.
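
For comparison, here is a minimal sketch of the "might as well write Python" end of that trade-off: the regular expression only splits the input into tokens, and an ordinary Python list serves as the stack. The names TOKEN and parse_nested are made up for illustration, and the sketch assumes the only nesting markers are round parentheses.

```python
import re

# Tokens: an opening paren, a closing paren, or a run of other non-space text.
TOKEN = re.compile(r"\(|\)|[^()\s]+")

def parse_nested(text):
    """Return nested lists mirroring the parentheses in text (hypothetical helper)."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for match in TOKEN.finditer(text):
        token = match.group()
        if token == "(":
            child = []
            stack[-1].append(child)  # attach the new level to its parent
            stack.append(child)      # and descend into it
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')' at offset %d" % match.start())
            stack.pop()              # climb back to the enclosing level
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unclosed '('")
    return root
```

Calling parse_nested("a (b (c d) e) f") gives a list whose second element is itself a list holding "b", the inner list for "c d", and "e". The regex never sees the stack; all the nesting logic lives in the loop.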

Could the Secret Labs' regular expression engine be modified to
operate on lists, for example, or a mutable non-string type?

Details (juicy and otherwise):

One alternative is to build a new string on every match, removing the
matched expression and replacing it with a tag. (This, by the way,
takes at least one out-of-band character.) Each match means building a
string from at least three parts, maybe five: the lead, the opening
marker, the inside of the match, the closing marker, and the tail. With
ropes each rebuild would stay cheap, but with plain strings every
rebuild copies the whole string, so the total cost is
O(string length * number of matches).
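
The rebuild-on-every-match idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the groups are delimited by round parentheses, the tags are single characters taken from the Unicode private-use area starting at U+E000 (so the input must not already contain them), and the names INNERMOST and reduce_with_tags are invented for the example. It uses the three-part form (lead, tag, tail).

```python
import re

# Innermost group: parentheses containing no further parentheses.
INNERMOST = re.compile(r"\(([^()]*)\)")

def reduce_with_tags(text, first_tag=0xE000):
    """Replace innermost groups with one-character tags until none remain.

    Returns the reduced string and a dict mapping each tag to the text
    it replaced. Assumes text contains no private-use characters.
    """
    table = {}
    next_tag = first_tag
    while True:
        m = INNERMOST.search(text)
        if m is None:
            break
        tag = chr(next_tag)  # the out-of-band character
        next_tag += 1
        table[tag] = m.group(1)
        # Three-part rebuild: lead + tag + tail. This copies the whole
        # string, which is where the O(length * matches) cost comes from.
        text = text[:m.start()] + tag + text[m.end():]
    return text, table
```

Given "f(a, g(b), h(c))", the loop tags "(b)" first, then "(c)", then the outer group, leaving "f" followed by a single tag. The table can then be walked to recover the nesting.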

Another alternative is to create a new unicode object API,
PyUnicode_FROM_DATA, which creates a string object from an existing
buffer, but does not copy it. I expect this would receive -1 from
many people, not least because it breaks immutability of strings.

ctypes character arrays, arrays, and buffer objects are additional
possibilities.
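
As a rough illustration of the buffer-object route: in current Python 3 (not necessarily the version discussed in this thread), a bytes pattern can be matched against a bytearray, which is mutable. The sketch below overwrites each innermost group in place with a filler byte of the same length, so no new string is built. The names INNERMOST and blank_innermost, and the choice of "#" as filler, are assumptions made for the example; the input must not already contain the filler inside a meaningful position.

```python
import re

# Bytes pattern: innermost parentheses with no parentheses inside.
INNERMOST = re.compile(rb"\(([^()]*)\)")

def blank_innermost(buf):
    """Overwrite each innermost (...) in buf with '#' bytes, in place.

    buf is a bytearray. Returns the inner texts in the order found.
    """
    found = []
    while True:
        m = INNERMOST.search(buf)
        if m is None:
            break
        found.append(bytes(m.group(1)))  # copy out before overwriting
        start, end = m.span()
        # Same-length slice assignment: the buffer is modified in place
        # and never resized.
        buf[start:end] = b"#" * (end - start)
    return found
```

Each pass still rescans from the start of the buffer, so this removes the copying cost but not the repeated searching.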
 
Chris Rebert

2009/3/22 Aaron Brady said:
Hi,

Every so often the group gets a request for parsing an expression.  I
think it would be significantly easier to do if regular expressions
could modify a stack.  However, since you might nearly as well write
Python, maybe there is a compromise.

If you need to parse something of decent complexity, you ought to use
an actual, proper parser generator, e.g. PLY, pyparsing, ANTLR, etc.
Abusing regular expressions like that to kludge jury-rigged parsers
together can only lead to pain when special cases and additional
grammar complexity emerge and start breaking the parser in difficult
ways. I'm not seeing the use case for your suggestion.

Cheers,
Chris
 
Aaron Brady

Chris Rebert said:
If you need to parse something of decent complexity, you ought to use
an actual, proper parser generator, e.g. PLY, pyparsing, ANTLR, etc.
Abusing regular expressions like that to kludge jury-rigged parsers
together can only lead to pain when special cases and additional
grammar complexity emerge and start breaking the parser in difficult
ways. I'm not seeing the use case for your suggestion.

Cheers,
Chris

Hey, I don't see the use case either, but that doesn't stop everyone
and their pet snake from asking about it. </snippity>

I guess I'm looking at something on the scale of a recipe. Farewell,
dreams and glory. What do you think anyway?

P.S. What if the topics were, "kludge jury-rigged parsers"?
 
