Can I rely on...

Emanuele D'Arrigo

Hi everybody,

I just had a bit of a shiver for something I'm doing often in my code
but that might be based on a wrong assumption on my part. Take the
following code:

import re

pattern = "aPattern"

compiledPatterns = [ ]
compiledPatterns.append(re.compile(pattern))

if(re.compile(pattern) in compiledPatterns):
    print("The compiled pattern is stored.")

As you can see I'm effectively assuming that every time re.compile()
is called with the same input pattern it will return the exact same
object rather than a second, identical, object. In interactive tests
via python shell this seems to be the case but... can I rely on it -
always- being the case? Or is it one of those implementation-specific
issues?

And what about any other function or class/method? Is there a way to
discriminate between methods and functions that when invoked twice
with the same arguments will return the same object and those that in
the same circumstances will return two identical objects?

If the answer is no, am I right to state that in the case portrayed
above the only way to be safe is to use the following code instead?

for item in compiledPatterns:
    if(item.pattern == pattern):
        print("The compiled pattern is stored.")
 
MRAB

Emanuele said:
> Hi everybody,
>
> I just had a bit of a shiver for something I'm doing often in my code
> but that might be based on a wrong assumption on my part. Take the
> following code:
>
> pattern = "aPattern"
>
> compiledPatterns = [ ]
> compiledPatterns.append(re.compile(pattern))
>
> if(re.compile(pattern) in compiledPatterns):
>     print("The compiled pattern is stored.")

You don't need parentheses in the 'if', or the 'print' in Python 2.x.

> As you can see I'm effectively assuming that every time re.compile()
> is called with the same input pattern it will return the exact same
> object rather than a second, identical, object. In interactive tests
> via python shell this seems to be the case but... can I rely on it -
> always- being the case? Or is it one of those implementation-specific
> issues?

The re module has a cache of patterns, so if the pattern is already
known then it'll return the existing compiled pattern. However, the
cache has a limited size. In reality, no two pattern objects are equal.

> And what about any other function or class/method? Is there a way to
> discriminate between methods and functions that when invoked twice
> with the same arguments will return the same object and those that in
> the same circumstances will return two identical objects?
>
> If the answer is no, am I right to state that in the case portrayed
> above the only way to be safe is to use the following code instead?
>
> for item in compiledPatterns:
>     if(item.pattern == pattern):

This is the same as using 'in'.
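
MRAB's point about the cache can be checked with a small sketch. It assumes CPython's re module, and it uses re.purge() (the documented call that clears the cache) to stand in for an entry being evicted once the cache fills up. Whether the first two calls return the same object is an implementation detail, so the code only reports it.

```python
import re

pattern = "aPattern"

first = re.compile(pattern)
second = re.compile(pattern)
# On CPython this is usually True because of the internal cache,
# but nothing in the documentation promises it.
same_before_purge = first is second

re.purge()  # clear the cache, standing in for an eviction
third = re.compile(pattern)
# 'first' is still referenced, so the newly compiled object is a different one.
same_after_purge = first is third

print(same_before_purge, same_after_purge)
```

Once the cached entry is gone, an identity-based membership test against a list of previously compiled patterns can no longer be relied on, even though the pattern strings still match.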
 
Albert Hopkins

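In general, no. You cannot rely on objects instantiated with the same
parameters to be equal. E.g.:

>>> class Thing(object):  # class name is illustrative
...     def __init__(self, foo):
...         self.foo = foo
...
>>> Thing('a') == Thing('a')
False

If, however, the designer of the class implements it as such
(and documents it as well) then you can. E.g.:

>>> class Thing(object):  # class name is illustrative
...     def __init__(self, foo):
...         self.foo = foo
...     def __eq__(self, other):
...         return self.foo == other.foo
...
>>> Thing('a') == Thing('a')
True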

For functions/methods it really depends on the implementation. For
example, do we *really* want two calls to always return equal results,
even though we passed the same arguments?
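
One illustration of that question, assuming the standard random module as the example (the thread does not show which function the poster had in mind): random.randint is called with identical arguments every time, and equal results on every call would defeat its purpose.

```python
import random

# Same arguments on every call; the results are expected to vary.
results = {random.randint(1, 10**9) for _ in range(100)}

# A set keeps only distinct values, so its size shows how many
# different results the 100 identical calls produced.
print(len(results))
```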

For the re module, unless it's documented that compiling the same
pattern twice always returns the same object, you should not rely on
it, because it's an implementation detail that may change in the future.
 
Terry Reedy

Emanuele said:
> Hi everybody,
>
> I just had a bit of a shiver for something I'm doing often in my code
> but that might be based on a wrong assumption on my part. Take the
> following code:
>
> pattern = "aPattern"
>
> compiledPatterns = [ ]
> compiledPatterns.append(re.compile(pattern))
>
> if(re.compile(pattern) in compiledPatterns):

Note that this generally takes time proportional to the length of the
list. And as MRAB said, drop the parens.

>     print("The compiled pattern is stored.")
>
> As you can see I'm effectively assuming that every time re.compile()
> is called with the same input pattern it will return the exact same
> object rather than a second, identical, object. In interactive tests
> via python shell this seems to be the case but... can I rely on it -
> always- being the case? Or is it one of those implementation-specific
> issues?

As MRAB indicated, this only works because the CPython re module itself
has a cache so you do not have to make one. It is, however, limited to
100 or so since programs that use patterns repeatedly generally use a
limited number of patterns. Caches usually use a dict so that
cache[input] == output and lookup is O(1).

> And what about any other function or class/method? Is there a way to
> discriminate between methods and functions that when invoked twice
> with the same arguments will return the same object and those that in
> the same circumstances will return two identical objects?

In general, a function that calculates and returns an object will return
a new object. The exceptions are exceptions.

> If the answer is no, am I right to state that in the case portrayed
> above the only way to be safe is to use the following code instead?
>
> for item in compiledPatterns:
>     if(item.pattern == pattern):

Yes. Unless you are comparing against None (or True or False in Py3) or
specifically know otherwise, you probably want '==' rather than 'is'.

Terry Jan Reedy
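
A minimal sketch of the two points above: a function that calculates its result returns a new object each time, and '==' is usually the right comparison except for singletons such as None. The function name squares is made up for the illustration.

```python
def squares(n):
    # Builds and returns a fresh list on every call.
    return [i * i for i in range(n)]

a = squares(4)
b = squares(4)
print(a == b)  # same contents
print(a is b)  # two separate list objects

value = None
# Identity is the usual test for the None singleton.
print(value is None)
```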
 
alex23

> I just had a bit of a shiver for something I'm doing often in my code
> but that might be based on a wrong assumption on my part. Take the
> following code:
>
> pattern = "aPattern"
>
> compiledPatterns = [ ]
> compiledPatterns.append(re.compile(pattern))
>
> if(re.compile(pattern) in compiledPatterns):
>     print("The compiled pattern is stored.")

Others have discussed the problem with relying on the compiled RE
objects being the same, but one option may be to use a dict instead of
a list and matching on the pattern string itself:

compiledPatterns = { }
if pattern not in compiledPatterns:
    compiledPatterns[pattern] = re.compile(pattern)
else:
    print("The compiled pattern is stored.")
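
The same idea can be wrapped in a small helper so that callers never compile a pattern string twice. This is only a sketch; get_compiled and compiled_patterns are hypothetical names, not part of the re module.

```python
import re

compiled_patterns = {}  # pattern string -> compiled pattern object

def get_compiled(pattern):
    """Hypothetical helper: compile each pattern string at most once."""
    if pattern not in compiled_patterns:
        compiled_patterns[pattern] = re.compile(pattern)
    return compiled_patterns[pattern]

p1 = get_compiled("aPattern")
p2 = get_compiled("aPattern")
# Both names refer to the single object stored in the dict,
# regardless of what re's own cache does.
print(p1 is p2)
```

Because the lookup key is the pattern string, the membership test no longer depends on whether re.compile() happens to return the same object.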
 
R. David Murray

alex23 said:
> > I just had a bit of a shiver for something I'm doing often in my code
> > but that might be based on a wrong assumption on my part. Take the
> > following code:
> >
> > pattern = "aPattern"
> >
> > compiledPatterns = [ ]
> > compiledPatterns.append(re.compile(pattern))
> >
> > if(re.compile(pattern) in compiledPatterns):
> >     print("The compiled pattern is stored.")
>
> Others have discussed the problem with relying on the compiled RE
> objects being the same, but one option may be to use a dict instead of
> a list and matching on the pattern string itself:
>
> compiledPatterns = { }
> if pattern not in compiledPatterns:
>     compiledPatterns[pattern] = re.compile(pattern)
> else:
>     print("The compiled pattern is stored.")

FYI this is almost exactly the approach the re module takes to
caching the expressions. (The difference is re adds a type
token to the front of the key.)
 
Bruno Desthuilliers

Emanuele D'Arrigo wrote:
> Hi everybody,
>
> I just had a bit of a shiver for something I'm doing often in my code
> but that might be based on a wrong assumption on my part.

Do not assume. Either check or use another solution. My 2 cents...

> Take the
> following code:
>
> pattern = "aPattern"
>
> compiledPatterns = [ ]
> compiledPatterns.append(re.compile(pattern))
>
> if(re.compile(pattern) in compiledPatterns):

You don't need the parens around the test.

>     print("The compiled pattern is stored.")
>
> As you can see I'm effectively assuming that every time re.compile()
> is called with the same input pattern it will return the exact same
> object rather than a second, identical, object. In interactive tests
> via python shell this seems to be the case but... can I rely on it -
> always- being the case? Or is it one of those implementation-specific
> issues?

I can't tell - I'm not willing to write a serious test for it. IIRC, the
re module maintains a cache of already seen patterns, but I didn't
bother reading the implementation. Anyway: why don't you use a dict
instead, using the source (i.e. the string representation) of the
pattern as the key?

i.e.:

pattern = "aPattern"
compiled_patterns = {}

compiled_patterns[pattern] = re.compile(pattern)

# ...

if pattern in compiled_patterns:
    print("The compiled pattern is stored.")

> And what about any other function or class/method? Is there a way to
> discriminate between methods and functions that when invoked twice
> with the same arguments will return the same object and those that in
> the same circumstances will return two identical objects?

Except reading the source code (hey, this is OSS, isn't it), I don't see
any reliable way to know this - unless it is clearly documented of course.

> If the answer is no, am I right to state that in the case portrayed
> above the only way to be safe is to use the following code instead?
>
> for item in compiledPatterns:
>     if(item.pattern == pattern):

Once again, using a dict will be *way* more efficient.
 
