Trivial string substitution/parser

Samuel

Hi,

How would you implement a simple parser for the following string:

---
In this string $variable1 is substituted, while \$variable2 is not.
---

I know how to write a parser, but I am looking for an elegant (and lazy)
way. Any idea?

-Samuel
 
Duncan Booth

Samuel said:
Hi,

How would you implement a simple parser for the following string:

The elegant and lazy way would be to change your specification so that $
characters are escaped by $$ not by backslashes. Then you can write:
'In this string hello is substituted, while $variable2 is not.'
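
The code that produced the output above is missing from this copy of the thread. A minimal sketch of what it may have looked like, assuming the standard library's string.Template and a keyword argument named after the variable (the value "hello" is taken from the output shown):

```python
from string import Template

# "$$" is Template's built-in escape for a literal "$".
t = Template("In this string $variable1 is substituted, while $$variable2 is not.")

# The variable name is passed as an ordinary keyword argument.
result = t.substitute(variable1="hello")
print(result)
```

This only illustrates the `$$` convention; it does not handle backslash escapes.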

If you must insist on using backslash escapes (which introduces the
question of how you get backslashes into the output: do they have to be
escaped as well?) then use string.Template with a custom pattern.
 
Samuel

The elegant and lazy way would be to change your specification so that $
characters are escaped by $$ not by backslashes. Then you can write:

Thanks, however, turns out my specification of the problem was
incomplete: In addition, the variable names are not known at compilation
time.
I just did it that way, this looks fairly easy already:

-------------------
import re

def variable_sub_cb(match):
    prepend = match.group(1)
    varname = match.group(2)
    value = get_variable(varname)
    return prepend + value

string_re = re.compile(r'(^|[^\\])\$([a-z][\w_]+\b)', re.I)

input = r'In this string $variable1 is substituted,'
input += r' while \$variable2 is not.'

print(string_re.sub(variable_sub_cb, input))
-------------------
 
Josiah Carlson

Samuel said:
Thanks, however, turns out my specification of the problem was
incomplete: In addition, the variable names are not known at compilation
time.

You mean at edit-time.

Can be replaced by...

...as per the standard **kwargs passing semantics.
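
The two snippets Josiah compared are missing from this copy of the thread. As a hedged illustration of the point only: when the names are known only at run time, they can live in a dictionary and be unpacked with `**`. The dictionary name `values` and its contents are assumptions for the example.

```python
from string import Template

# Hypothetical run-time data: the keys are not known when the code is written.
values = {"variable1": "hello"}

t = Template("In this string $variable1 is substituted, while $$variable2 is not.")

# Unpack the dictionary into keyword arguments.
result = t.substitute(**values)
print(result)
```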


- Josiah
 
Graham Breed

Samuel wrote:
Thanks, however, turns out my specification of the problem was
incomplete: In addition, the variable names are not known at compilation
time.
I just did it that way, this looks fairly easy already:

-------------------
import re

def variable_sub_cb(match):
    prepend = match.group(1)
    varname = match.group(2)
    value = get_variable(varname)
    return prepend + value

string_re = re.compile(r'(^|[^\\])\$([a-z][\w_]+\b)', re.I)

input = r'In this string $variable1 is substituted,'
input += r' while \$variable2 is not.'

print(string_re.sub(variable_sub_cb, input))
-------------------

It gets easier:

import re

def variable_sub_cb(match):
    return get_variable(match.group(1))

string_re = re.compile(r'(?<!\\)\$([A-Za-z]\w+)')

def get_variable(varname):
    return globals()[varname]

variable1 = 'variable 1'

input = r'In this string $variable1 is substituted,'
input += r' while \$variable2 is not.'

print(string_re.sub(variable_sub_cb, input))

or even

import re

def variable_sub_cb(match):
    return globals()[match.group(1)]

variable1 = 'variable 1'
input = (r'In this string $variable1 is substituted,'
         r' while \$variable2 is not.')

print(re.sub(r'(?<!\\)\$([A-Za-z]\w+)', variable_sub_cb, input))


Graham
 
Duncan Booth

Josiah Carlson said:
You mean at edit-time.

Can be replaced by...

...as per the standard **kwargs passing semantics.

You don't even need to do that. substitute will accept a dictionary as a
positional argument:

t.substitute(vars)

If you use both forms then the keyword arguments take priority.
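
A short sketch of both forms, assuming a made-up template and made-up values; it shows the dictionary passed positionally and a keyword argument overriding one of its entries:

```python
from string import Template

t = Template("$greeting, $name!")
values = {"greeting": "Hello", "name": "Samuel"}  # hypothetical data

# A mapping can be passed as the single positional argument.
from_dict = t.substitute(values)

# When both forms are used, the keyword argument wins over the mapping.
overridden = t.substitute(values, name="Duncan")

print(from_dict)
print(overridden)
```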

Also, of course, vars just needs to be something which quacks like a dict:
it can do whatever it needs to do such as looking up a database or querying
a server to generate the value only when it needs it, or even evaluating
the name as an expression; in the OP's case it could call get_variable.

Anyway, the question seems to be moot since the OP's definition of 'elegant
and lazy' includes regular expressions and reinvented wheels.

... and in another message Graham Breed said:
def get_variable(varname):
    return globals()[varname]

Doesn't the mere thought of creating global variables with unknown names
make you shudder?
 
Graham Breed

Duncan Booth wrote:
Also, of course, vars just needs to be something which quacks like a dict:
it can do whatever it needs to do such as looking up a database or querying
a server to generate the value only when it needs it, or even evaluating
the name as an expression; in the OP's case it could call get_variable.

And in case that sounds difficult, the code is

class VariableGetter:
    def __getitem__(self, key):
        return get_variable(key)

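
To show how that class plugs into string.Template, here is a self-contained sketch. The `get_variable` below is only a stand-in backed by a dictionary, since the OP's real function is not shown in the thread; `_store` is likewise an assumption.

```python
from string import Template

# Hypothetical stand-in for the OP's get_variable(); the real one is not shown.
_store = {"variable1": "hello"}

def get_variable(varname):
    return _store[varname]

class VariableGetter:
    """Looks like a dict to Template: it only needs __getitem__."""
    def __getitem__(self, key):
        return get_variable(key)

t = Template("In this string $variable1 is substituted, while $$variable2 is not.")

# Each $name in the template triggers one get_variable(name) call.
result = t.substitute(VariableGetter())
print(result)
```

The lookup happens lazily, one name at a time, so the values never have to be collected up front.
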
Anyway, the question seems to be moot since the OP's definition of 'elegant
and lazy' includes regular expressions and reinvented wheels.

Your suggestion of subclassing string.Template will also require a
regular expression -- and a fairly hairy one as far as I can work out
from the documentation. There isn't an example and I don't think it's
the easiest way of solving this problem. But if Samuel really wants
backslash escaping it'd be easier to do a replace('$$','$$$$') and
replace('\\$', '$$') (or replace('\\$','\\$$') if he really wants the
backslash to persist) before using the template.
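
Spelled out, the two replace() calls might be wrapped like this. The function name `substitute_backslash_escaped` is made up for the example, and this is the variant that drops the backslash from the output:

```python
from string import Template

def substitute_backslash_escaped(text, values):
    # Double any existing "$$" first so Template renders it back as "$$".
    text = text.replace("$$", "$$$$")
    # Turn the backslash escape "\$" into Template's own escape "$$".
    text = text.replace("\\$", "$$")
    return Template(text).substitute(values)

source = r"In this string $variable1 is substituted, while \$variable2 is not."
result = substitute_backslash_escaped(source, {"variable1": "hello"})
print(result)
```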

Then, if he really does want to reject single letter variable names,
or names beginning with a backslash, he'll still need to subclass
Template and supply a regular expression, but a simpler one.

... and in another message Graham Breed said:
def get_variable(varname):
    return globals()[varname]

Doesn't the mere thought of creating global variables with unknown names
make you shudder?

Not at all. It works, it's what the shell does, and it's easy to test
interactively. Obviously the application code wouldn't look like
that.
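
For comparison, a sketch of how the same regular expression could be used with an explicit dictionary instead of globals(). The names `VARIABLE_RE` and `substitute` are assumptions for the example, not anything from the thread.

```python
import re

# Same pattern as above: "$name" not preceded by a backslash.
VARIABLE_RE = re.compile(r'(?<!\\)\$([A-Za-z]\w+)')

def substitute(text, variables):
    """Replace each unescaped $name with variables[name]."""
    return VARIABLE_RE.sub(lambda match: variables[match.group(1)], text)

source = r'In this string $variable1 is substituted, while \$variable2 is not.'
result = substitute(source, {'variable1': 'variable 1'})
print(result)
```

As with the versions above, the backslash in front of the escaped variable stays in the output, and an unknown name raises KeyError.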


Graham
 
Graham Breed

Duncan Booth wrote:
If you must insist on using backslash escapes (which introduces the
question of how you get backslashes into the output: do they have to be
escaped as well?) then use string.Template with a custom pattern.

If anybody wants this, I worked out the following regular expression
which seems to work:

(?P<escaped>\\)\$ |                  # backslash escape pattern
\$(?:
  (?P<named>[_a-z][_a-z0-9]*) |      # delimiter and Python identifier
  {(?P<braced>[_a-z][_a-z0-9]*)} |   # delimiter and braced identifier
  (?P<invalid>)                      # Other ill-formed delimiter exprs
)

The clue is string.Template.pattern.pattern

So you compile that with verbose and case-insensitive flags and set it
to "pattern" in a string.Template subclass. (In fact you don't have
to compile it, but that behaviour's undocumented.) Something like
...     (?P<escaped>\\\\)\\$ | # backslash escape pattern
...     \$(?:
...     (?P<named>[_a-z][_a-z0-9]*) | # delimiter and identifier
...     {(?P<braced>[_a-z][_a-z0-9]*)} | # ... and braced identifier
...     (?P<invalid>) # Other ill-formed delimiter exprs
...     )
...     """
...     pattern = re.compile(regexp, re.I | re.X)
...
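
The opening lines of that session are missing from this copy of the thread. A self-contained sketch of the same idea follows, with a made-up class name `BackslashTemplate`. It assigns the pattern as a plain, uncompiled string and relies on string.Template compiling it with the verbose and case-insensitive flags itself; that is the form I am most confident runs on current Python 3, so treat it as an assumption rather than documented behaviour.

```python
import string

class BackslashTemplate(string.Template):
    # Left uncompiled on purpose: Template compiles it (verbose, ignore-case).
    pattern = r"""
    (?P<escaped>\\)\$                  |  # backslash-escaped delimiter
    \$(?:
      (?P<named>[_a-z][_a-z0-9]*)      |  # delimiter and identifier
      {(?P<braced>[_a-z][_a-z0-9]*)}   |  # delimiter and braced identifier
      (?P<invalid>)                       # anything else is ill-formed
    )
    """

source = r"In this string $variable1 is substituted, while \$variable2 is not."
result = BackslashTemplate(source).substitute(variable1="hello")
print(result)
```

With this pattern a lone backslash elsewhere in the text is left alone; only the pair of backslash and `$` is treated as an escape, and it comes out as a bare `$`.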


Graham
 
