OK to memoize re objects?

kj

My Python code is filled with assignments of regexp objects to
global variables at the top level; e.g.:

_spam_re = re.compile('^(?:ham|eggs)$', re.I)

Don't like it. My Perl-pickled brain wishes that re.compile was
a memoizing method, so that I could use it anywhere, even inside
tight loops, without ever having to worry about the overhead of
regexp compilation.

Of course, I can do the memoization myself. Would it be a bad
idea? How much state does a re object keep? Or to put it differently,
what should be avoided to keep a regexp object essentially "stateless",
so that its memoization makes sense?
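For illustration, a do-it-yourself memoizer could look something like the sketch below. The helper names `compiled` and `is_spam` are made up for this example and are not part of the re module. It relies on the fact that a compiled pattern object carries no per-match state (the result of each match lives in a separate match object), so sharing one pattern object between callers is safe.

```python
import functools
import re

@functools.lru_cache(maxsize=None)
def compiled(pattern, flags=0):
    # Hypothetical helper: compile each (pattern, flags) pair once and
    # hand back the same pattern object on every later call.
    return re.compile(pattern, flags)

def is_spam(word):
    # Safe to call in a tight loop: after the first call this is only
    # a cache lookup followed by the match itself.
    return compiled(r'^(?:ham|eggs)$', re.I).match(word) is not None
```

Whether this buys anything over the module-level functions depends on how many distinct patterns the whole program uses.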

TIA!

kynn
 
Robert Kern

kj said:
My Python code is filled with assignments of regexp objects to
global variables at the top level; e.g.:

_spam_re = re.compile('^(?:ham|eggs)$', re.I)

Don't like it. My Perl-pickled brain wishes that re.compile was
a memoizing method, so that I could use it anywhere, even inside
tight loops, without ever having to worry about the overhead of
regexp compilation.

Just use re.search(), etc. They already memoize the compiled regex objects.
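A quick way to see the caching at work, as a sketch only: in CPython, compiling the same pattern with the same flags twice hands back the very same object. That identity is an implementation detail rather than a documented guarantee, so treat the check below as an observation, not something to build on.

```python
import re

# Two separate calls with an identical pattern and identical flags.
first = re.compile(r'^(?:ham|eggs)$', re.I)
second = re.compile(r'^(?:ham|eggs)$', re.I)

# In CPython the second call is served from the module's cache.
same_object = first is second

# The module-level functions go through the same cache.
found = re.search(r'^(?:ham|eggs)$', 'EGGS', re.I)
```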

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
kj

Just use re.search(), etc. They already memoize the compiled regex objects.

Thanks.

I find the docs are pretty confusing on this point. They first
make the point of noting that pre-compiling regular expressions is
more efficient, and then *immediately* shoot down this point by
saying that one need not worry about pre-compiling in most cases.
From the docs:

...using compile() and saving the resulting regular expression
object for reuse is more efficient when the expression will be
used several times in a single program.

Note: The compiled versions of the most recent patterns passed
to re.match(), re.search() or re.compile() are cached, so
programs that use only a few regular expressions at a time
needn't worry about compiling regular expressions.

Honestly I don't know what to make of this... I would love to see
an example in which re.compile was unequivocally preferable, to
really understand what the docs are saying here...

kynn
 
Ethan Furman

kj said:
Thanks.

I find the docs are pretty confusing on this point. They first
make the point of noting that pre-compiling regular expressions is
more efficient, and then *immediately* shoot down this point by
saying that one need not worry about pre-compiling in most cases.

...using compile() and saving the resulting regular expression
object for reuse is more efficient when the expression will be
used several times in a single program.

Note: The compiled versions of the most recent patterns passed
to re.match(), re.search() or re.compile() are cached, so
programs that use only a few regular expressions at a time
needn't worry about compiling regular expressions.

Honestly I don't know what to make of this... I would love to see
an example in which re.compile was unequivocally preferable, to
really understand what the docs are saying here...

kynn

Looking in the code for re in 2.5:
..
..
..
_MAXCACHE = 100
..
..
..
    if len(_cache) >= _MAXCACHE:
        _cache.clear()
..
..
..

so when the cache fills up, you lose the entire cache. On the other hand,
I (a re novice, to be sure) have only used between two and five regexes
in any one program... it'll be a while before I hit _MAXCACHE!

~Ethan~
 
Ethan Furman

Nobody said:
Do you know how many REs import-ed modules are using? The cache isn't
reserved for __main__.

As a matter of fact, I haven't got a clue. :)

Fortunately, I always use .compile to save my re's. Seems simpler to me
that way.

~Ethan~
 
Steven D'Aprano

I find the docs are pretty confusing on this point. They first make the
point of noting that pre-compiling regular expressions is more
efficient, and then *immediately* shoot down this point by saying that
one need not worry about pre-compiling in most cases. From the docs:

...using compile() and saving the resulting regular expression
object for reuse is more efficient when the expression will be used
several times in a single program.

Note: The compiled versions of the most recent patterns passed to
re.match(), re.search() or re.compile() are cached, so programs that
use only a few regular expressions at a time needn't worry about
compiling regular expressions.

Honestly I don't know what to make of this... I would love to see an
example in which re.compile was unequivocally preferable, to really
understand what the docs are saying here...

I find it entirely understandable. If you have only a few regexes, then
there's no need to pre-compile them yourself, because the re module
caches them. Otherwise, don't rely on the cache -- it may help or it may
not; no promises are made.

The nature of the cache isn't explained because it is an implementation
detail. As it turns out, the current implementation is a single cache in
the re module, so every module that does "import re" shares the one
cache. The cache is also completely emptied once it reaches a certain
number of objects, so the cache may be flushed at arbitrary times out of
your control. Or it might not.
 
Hyuga

Do you know how many REs import-ed modules are using? The cache isn't
reserved for __main__.

Based on this, I'd say that the best policy would be that if you only
have a handful of simple REs that are used only on occasion, it's
probably not worth using re.compile--even if they fall out of cache,
it shouldn't take a noticeable amount of time to recompile them.

If, however, these are either complex REs, or REs that are being used
very frequently, say in a loop, you might as well save the compiled RE
somewhere just to be sure it doesn't have to be recompiled at any
point.
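As a small sketch of that second case, reusing the pattern from the original post: the regex is compiled once at import time and the loop only calls the saved object's method. The function name `count_matches` is made up for this example.

```python
import re

# Compiled once, at import time; no cache lookup is needed afterwards.
_spam_re = re.compile(r'^(?:ham|eggs)$', re.I)

def count_matches(lines):
    # The loop reuses the saved pattern object, so nothing that other
    # modules do to the shared re cache can force a recompile here.
    return sum(1 for line in lines if _spam_re.match(line))
```

For example, `count_matches(['ham', 'EGGS', 'spam', 'hamster'])` counts the first two entries only.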
 
