re.compile versus r''

Terrence Brannon

Hello, I'm using a tool (PLY) which apparently expects the tokens to
be created using r''.

But because one token is a rather complex regular expression, I want
to create the regular expression programmatically.

How can I generate a string and then create something of the same type
that the r'' function does?

Concretely, in the program below, consonant is not the same type as
t_NAME, but I assume that it needs to be for PLY to use it for
tokenizing:

import re

t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'

guttural = 'kh?|gh?|\"n'
palatal = '(?:chh?|jh?|\~n)'
cerebral = '\.(?:th?|dh?|n)'
dental = '(?:th?|dh?|n)'
semivowel = '[yrlv]'
sibilant = '[\"\.]?s'
aspirant = 'h'

consonant = re.compile('|'.join([guttural , palatal , cerebral ,
dental , semivowel , sibilant , aspirant]))

print(consonant)  # a compiled pattern object
print(t_NAME)     # a plain string
 
Fredrik Lundh

r'' is an alternative syntax for string literals that affects how escape
sequences are interpreted; there's no separate string type for strings
created by this syntax.

</F>
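
To illustrate the point above, here is a minimal sketch (the variable names are only for illustration). It compares a raw literal with an ordinary literal that spells out the same characters, and then shows that a compiled pattern, unlike either literal, is not a string:

```python
import re

raw = r'\d+\.\d*'          # raw literal: backslashes are kept as written
plain = '\\d+\\.\\d*'      # ordinary literal: each backslash doubled by hand

print(type(raw) is type(plain))   # both literals produce a plain str
print(raw == plain)               # and here they hold the same characters

compiled = re.compile(raw)
print(type(compiled) is str)      # a compiled pattern is a different kind of object
print(compiled.pattern == raw)    # though it still carries its source string
```

So the mismatch in the original program comes from the call to re.compile, not from the missing r prefix.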
 
Chris Rebert

The "r" prefix isn't a function or a type, it's merely a special
literal syntax for strings that's handy when you're writing regexes
and therefore have to deal with another level of backslash escaping.
See the second to last paragraph of
http://docs.python.org/ref/strings.html for more info.

Regards,
Chris
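
Applied to the program in the question, this suggests building the pattern as an ordinary string and skipping re.compile. The following is only a sketch: the name t_CONSONANT is hypothetical, and it assumes PLY reads nothing more than the string value bound to a t_ name, which is worth checking against the PLY documentation. The pieces are written as raw literals so that every backslash reaches the regex engine unchanged.

```python
import re

# Pattern fragments from the original post, written as raw strings.
guttural = r'kh?|gh?|\"n'
palatal = r'(?:chh?|jh?|\~n)'
cerebral = r'\.(?:th?|dh?|n)'
dental = r'(?:th?|dh?|n)'
semivowel = r'[yrlv]'
sibilant = r'[\"\.]?s'
aspirant = r'h'

# Join the fragments into one pattern string. No re.compile here:
# the result is a plain str, the same type as t_NAME.
# t_CONSONANT is a hypothetical token name, not taken from the post.
t_CONSONANT = '|'.join([guttural, palatal, cerebral,
                        dental, semivowel, sibilant, aspirant])

t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'

print(type(t_CONSONANT) is type(t_NAME))

# Sanity check with the re module only (PLY is not needed for this).
for sample in ['kh', '.th', '~n', '.s', 'a']:
    print(sample, bool(re.fullmatch(t_CONSONANT, sample)))
```

Compiling the joined string with re is useful for testing the pattern, but the value handed to the tokenizer would stay a string.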
