any chance regular expressions are cached?

mh

I've got a bit of code in a function like this:

s=re.sub(r'\n','\n'+spaces,s)
s=re.sub(r'^',spaces,s)
s=re.sub(r' *\n','\n',s)
s=re.sub(r' *$','',s)
s=re.sub(r'\n*$','',s)

Is there any chance that these will be cached somewhere, and save
me the trouble of having to declare some global re's if I don't
want to have them recompiled on each function invocation?

Many TIA!
Mark
 
Tim Chase

s=re.sub(r'\n','\n'+spaces,s)
s=re.sub(r'^',spaces,s)
s=re.sub(r' *\n','\n',s)
s=re.sub(r' *$','',s)
s=re.sub(r'\n*$','',s)

Is there any chance that these will be cached somewhere, and save
me the trouble of having to declare some global re's if I don't
want to have them recompiled on each function invocation?
....
Explicit is better than implicit
....


Sounds like what you want is to use the compile() call to compile
once, and then use the resulting objects:

re1 = re.compile(r'\n')
re2 = re.compile(r'^')
...
s = re1.sub('\n' + spaces, s)
s = re2.sub(spaces, s)
...


The compile() should be done once (outside loops, possibly at a
module level, as, in a way, they're constants) and then you can
use the resulting object without the overhead of compiling.
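A minimal sketch of that advice applied to the five substitutions from the question. The function name `indent_block` and the constant names are invented here for illustration; only the patterns come from the original post.

```python
import re

# Compiled once, at import time. The names are illustrative only.
NEWLINE = re.compile(r'\n')
START = re.compile(r'^')
SPACES_BEFORE_NEWLINE = re.compile(r' *\n')
TRAILING_SPACES = re.compile(r' *$')
TRAILING_NEWLINES = re.compile(r'\n*$')


def indent_block(s, spaces):
    """Apply the question's five substitutions with precompiled patterns."""
    s = NEWLINE.sub('\n' + spaces, s)        # indent every line after the first
    s = START.sub(spaces, s)                 # indent the first line
    s = SPACES_BEFORE_NEWLINE.sub('\n', s)   # drop spaces at the end of each line
    s = TRAILING_SPACES.sub('', s)           # drop spaces at the end of the string
    s = TRAILING_NEWLINES.sub('', s)         # drop trailing newlines
    return s


print(repr(indent_block('foo\nbar \nzot\n', '  ')))
```

Calling the function repeatedly reuses the same five pattern objects, so nothing is recompiled or looked up by pattern string on each invocation.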

-tkc
 
Ryan Ginstrom

On Behalf Of Tim Chase
Sounds like what you want is to use the compile() call to
compile once, and then use the resulting objects:

re1 = re.compile(r'\n')
re2 = re.compile(r'^')
...
s = re1.sub('\n' + spaces, s)
s = re2.sub(spaces, s)

Yes. And I would go a step further and suggest that regular expressions are
best avoided in favor of simpler things when possible. That will make the
code easier to debug, and probably faster.

A couple of examples:

>>> text = """spam spam spam
... spam spam
...
...
... spam
...
... spam"""
>>> # normalize newlines
>>> print "\n".join([line for line in text.splitlines() if line])
spam spam spam
spam spam
spam
spam

Regards,
Ryan Ginstrom
 
Terry Reedy

| I've got a bit of code in a function like this:
|
| s=re.sub(r'\n','\n'+spaces,s)
| s=re.sub(r'^',spaces,s)
| s=re.sub(r' *\n','\n',s)
| s=re.sub(r' *$','',s)
| s=re.sub(r'\n*$','',s)
|
| Is there any chance that these will be cached somewhere, and save
| me the trouble of having to declare some global re's if I don't
| want to have them recompiled on each function invocation?

The last time I looked, several versions ago, re did cache.
Don't know if still true. Not part of spec, I don't think.

tjr
 
Steven D'Aprano

I've got a bit of code in a function like this:

s=re.sub(r'\n','\n'+spaces,s)
s=re.sub(r'^',spaces,s)
s=re.sub(r' *\n','\n',s)
s=re.sub(r' *$','',s)
s=re.sub(r'\n*$','',s)

Is there any chance that these will be cached somewhere, and save me the
trouble of having to declare some global re's if I don't want to have
them recompiled on each function invocation?


At the interactive interpreter, type "help(re)" [enter]. A page or two
down, you will see:

    purge()
        Clear the regular expression cache


and looking at the source code I see many calls to _compile() which
starts off with:

def _compile(*key):
    # internal: compile pattern
    cachekey = (type(key[0]),) + key
    p = _cache.get(cachekey)
    if p is not None:
        return p

So yes, the re module caches its regular expressions.
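One way to observe the cache from the outside, as a rough sketch. It relies on CPython behaviour that is an implementation detail rather than a documented guarantee: compiling the same pattern string twice hands back the same object. Only `re.purge()` is documented; the variable names are made up for this example.

```python
import re

pattern = r' *\n'

first = re.compile(pattern)
second = re.compile(pattern)
# In CPython the second call is served from the module's internal cache,
# so both names refer to the same compiled pattern object.
same_object = first is second

re.purge()  # documented: clears the regular expression cache
third = re.compile(pattern)
# After purging, the pattern has to be compiled again into a new object.
recompiled = third is not first

print(same_object, recompiled)
```

The cache has a bounded size, so a program that uses a very large number of distinct patterns may still see recompilation.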

Having said that, at least four out of the five examples you give are
good examples of when you SHOULDN'T use regexes.

re.sub(r'\n','\n'+spaces,s)

is better written as s.replace('\n', '\n'+spaces). Don't believe me?
Check this out:

>>> Timer("re.sub('\\n', '\\n'+spaces, s)",
... "import re;from __main__ import s, spaces").timeit()
7.4031901359558105
>>> Timer("s.replace('\\n', '\\n'+spaces)",
... "import re;from __main__ import s, spaces").timeit()
1.6208670139312744

The regex is nearly five times slower than the simple string replacement.


Similarly:

re.sub(r'^',spaces,s)

is better written as spaces+s, which is nearly eleven times faster.

Also:

re.sub(r' *$','',s)
re.sub(r'\n*$','',s)

are just slow ways of writing s.rstrip(' ') and s.rstrip('\n').
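A small check of those equivalences, offered as a sketch; the sample strings are invented for the example. It also shows one edge case where ' *$' and rstrip(' ') differ: without re.MULTILINE, `$` matches just before a final newline as well as at the very end of the string.

```python
import re

spaces = '    '
s = 'foo\nbar\n\nzot  \n\n'

# Each line pairs a regex call from the question with a string-method version.
assert re.sub(r'\n', '\n' + spaces, s) == s.replace('\n', '\n' + spaces)
assert re.sub(r'^', spaces, s) == spaces + s
assert re.sub(r'\n*$', '', s) == s.rstrip('\n')

t = 'zot   '
assert re.sub(r' *$', '', t) == t.rstrip(' ')

# Edge case: spaces followed by one final newline.
# The regex removes the spaces; rstrip(' ') leaves the string unchanged.
u = 'zot  \n'
regex_result = re.sub(r' *$', '', u)
rstrip_result = u.rstrip(' ')
print(repr(regex_result), repr(rstrip_result))
```

In the original function the earlier ' *\n' substitution has already removed spaces before newlines, so the edge case should not change its result.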
 
John Machin

I've got a bit of code in a function like this:

s=re.sub(r'\n','\n'+spaces,s)
s=re.sub(r'^',spaces,s)
s=re.sub(r' *\n','\n',s)
s=re.sub(r' *$','',s)
s=re.sub(r'\n*$','',s)

Is there any chance that these will be cached somewhere, and save
me the trouble of having to declare some global re's if I don't
want to have them recompiled on each function invocation?

Yes they will be cached. But do yourself a favour and check out the
string methods.

E.g.

>>> import re
>>> def opfunc(s, spaces):
...     s=re.sub(r'\n','\n'+spaces,s)
...     s=re.sub(r'^',spaces,s)
...     s=re.sub(r' *\n','\n',s)
...     s=re.sub(r' *$','',s)
...     s=re.sub(r'\n*$','',s)
...     return s
...
>>> def myfunc(s, spaces):
...     return '\n'.join(spaces + x.rstrip() if x.rstrip() else '' for x in s.splitlines())
...
>>> t1 = 'foo\nbar\nzot\n'
>>> t2 = 'foo\nbar \nzot\n'
>>> t3 = 'foo\n\nzot\n'
>>> [opfunc(s, ' ') for s in (t1, t2, t3)]
[' foo\n bar\n zot', ' foo\n bar\n zot', ' foo\n\n zot']
>>> [myfunc(s, ' ') for s in (t1, t2, t3)]
[' foo\n bar\n zot', ' foo\n bar\n zot', ' foo\n\n zot']
 
John Machin

>>> def myfunc(s, spaces):
...     return '\n'.join(spaces + x.rstrip() if x.rstrip() else '' for x in s.splitlines())

Better:

...     return '\n'.join((spaces + x).rstrip() for x in s.splitlines())
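A runnable side-by-side of the regex version and the shorter string-method version, using the three test strings from the thread. The `edge` input is an addition for this sketch, not something from the thread; it shows that the two functions are not interchangeable for every input.

```python
import re


def opfunc(s, spaces):
    # The five substitutions from the original question.
    s = re.sub(r'\n', '\n' + spaces, s)
    s = re.sub(r'^', spaces, s)
    s = re.sub(r' *\n', '\n', s)
    s = re.sub(r' *$', '', s)
    s = re.sub(r'\n*$', '', s)
    return s


def myfunc(s, spaces):
    # The shorter string-method version.
    return '\n'.join((spaces + x).rstrip() for x in s.splitlines())


samples = ['foo\nbar\nzot\n', 'foo\nbar \nzot\n', 'foo\n\nzot\n']
results = [(opfunc(s, ' '), myfunc(s, ' ')) for s in samples]
agree = all(a == b for a, b in results)

# Several trailing blank lines: opfunc strips them all, myfunc keeps
# the newlines that separate the (now empty) trailing lines.
edge = 'foo\n\n\n'
edge_results = (opfunc(edge, ' '), myfunc(edge, ' '))

print(agree, edge_results)
```

Note also that rstrip() with no argument removes tabs and other whitespace, whereas the regexes only remove spaces, so inputs with trailing tabs would differ too.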
 
Arnaud Delobelle

On Mar 10, 3:39 am, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
[...]
Having said that, at least four out of the five examples you give are
good examples of when you SHOULDN'T use regexes.

re.sub(r'\n','\n'+spaces,s)

is better written as s.replace('\n', '\n'+spaces). Don't believe me?
Check this out:


>>> Timer("re.sub('\\n', '\\n'+spaces, s)",
... "import re;from __main__ import s, spaces").timeit()
7.4031901359558105
>>> Timer("s.replace('\\n', '\\n'+spaces)",
... "import re;from __main__ import s, spaces").timeit()
1.6208670139312744

The regex is nearly five times slower than the simple string replacement.

I agree that the second version is better, but most of the time in the
first one is spent compiling the regexp, so the comparison is not
really fair:

Regexps are still more than twice as slow.
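A rough way to rerun this three-way comparison yourself: module-level re.sub (which goes through the pattern cache on every call), a precompiled pattern, and str.replace. This is a sketch with an arbitrary sample string and repeat count, not the timing code from the thread; the absolute numbers and ratios will vary with the Python version and machine.

```python
import re
from timeit import timeit

s = 'foo\nbar\nzot\n' * 10   # sample input, chosen arbitrarily
spaces = '    '
newline = re.compile(r'\n')  # precompiled once, outside the timed code

number = 20_000              # repetitions per measurement, also arbitrary

t_module = timeit(lambda: re.sub(r'\n', '\n' + spaces, s), number=number)
t_compiled = timeit(lambda: newline.sub('\n' + spaces, s), number=number)
t_replace = timeit(lambda: s.replace('\n', '\n' + spaces), number=number)

# All three approaches must produce the same string for the timing to be fair.
results_match = (
    re.sub(r'\n', '\n' + spaces, s)
    == newline.sub('\n' + spaces, s)
    == s.replace('\n', '\n' + spaces)
)

print(f're.sub (module level): {t_module:.3f}s')
print(f'compiled pattern:      {t_compiled:.3f}s')
print(f'str.replace:           {t_replace:.3f}s')
```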
 
