Can I rely on...

Discussion in 'Python' started by Emanuele D'Arrigo, Mar 19, 2009.

  1. Hi everybody,

    I just had a bit of a shiver for something I'm doing often in my code
    but that might be based on a wrong assumption on my part. Take the
    following code:

    pattern = "aPattern"

    compiledPatterns = [ ]
    compiledPatterns.append(re.compile(pattern))

    if(re.compile(pattern) in compiledPatterns):
        print("The compiled pattern is stored.")

    As you can see I'm effectively assuming that every time re.compile()
    is called with the same input pattern it will return the exact same
    object rather than a second, identical object. In interactive tests
    via the Python shell this seems to be the case, but... can I rely on it
    *always* being the case? Or is it one of those implementation-specific
    issues?

    And what about any other function or class/method? Is there a way to
    discriminate between methods and functions that when invoked twice
    with the same arguments will return the same object and those that in
    the same circumstances will return two identical objects?

    If the answer is no, am I right to state that in the case portrayed
    above the only way to be safe is to use the following code instead?

    for item in compiledPatterns:
        if(item.pattern == pattern):
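The loop above stops at the `if`, so its body is missing. A minimal sketch of the same check, assuming the goal is only to report whether a pattern compiled from the same source string is already in the list; `is_stored` is a hypothetical helper name. It compares the `.pattern` strings with `==`, so it does not depend on `re.compile()` handing back the same object, but it also ignores flags: `re.compile("a")` and `re.compile("a", re.I)` would both count as stored.

```python
import re

def is_stored(pattern, compiled_patterns):
    """Hypothetical helper: True if any compiled pattern came from `pattern`."""
    # Compare the source strings (value equality), not the pattern objects.
    return any(item.pattern == pattern for item in compiled_patterns)

compiled_patterns = [re.compile("aPattern")]
if is_stored("aPattern", compiled_patterns):
    print("The compiled pattern is stored.")
```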
    Emanuele D'Arrigo, Mar 19, 2009

  2. Emanuele D'Arrigo wrote:
    > Hi everybody,
    >
    > I just had a bit of a shiver for something I'm doing often in my code
    > but that might be based on a wrong assumption on my part. Take the
    > following code:
    >
    > pattern = "aPattern"
    >
    > compiledPatterns = [ ]
    > compiledPatterns.append(re.compile(pattern))
    >
    > if(re.compile(pattern) in compiledPatterns):
    >     print("The compiled pattern is stored.")
    >

    You don't need parentheses in the 'if', or the 'print' in Python 2.x.

    > As you can see I'm effectively assuming that every time re.compile()
    > is called with the same input pattern it will return the exact same
    > object rather than a second, identical, object. In interactive tests
    > via python shell this seems to be the case but... can I rely on it -
    > always- being the case? Or is it one of those implementation-specific
    > issues?
    >

    The re module has a cache of patterns, so if the pattern is already
    known then it'll return the existing compiled pattern. However, the
    cache has a limited size. In reality, no two pattern objects are equal.
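A small experiment makes the cache visible. This is a sketch of CPython behaviour, not a language guarantee: `re.purge()` is the documented way to clear the module's cache, and after calling it the same source string yields a brand-new pattern object, which is exactly the case an identity-based `in` test would miss.

```python
import re

pattern = "aPattern"

a = re.compile(pattern)
b = re.compile(pattern)
# In CPython the second call is served from re's internal cache,
# so both names refer to the very same object.
print(a is b)  # True in CPython

re.purge()  # documented: clears the regular expression cache
c = re.compile(pattern)
# After the purge a new pattern object is built from scratch.
print(a is c)  # False
print(a.pattern == c.pattern)  # True: the source strings still match
```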

    > And what about any other function or class/method? Is there a way to
    > discriminate between methods and functions that when invoked twice
    > with the same arguments will return the same object and those that in
    > the same circumstances will return two identical objects?
    >
    > If the answer is no, am I right to state the in the case portrayed
    > above the only way to be safe is to use the following code instead?
    >
    > for item in compiledPatterns:
    >     if(item.pattern == pattern):
    >

    This is the same as using 'in'.
    MRAB, Mar 19, 2009

  3. On Thu, 2009-03-19 at 08:42 -0700, Emanuele D'Arrigo wrote:
    > Hi everybody,
    >
    > I just had a bit of a shiver for something I'm doing often in my code
    > but that might be based on a wrong assumption on my part. Take the
    > following code:
    >
    > pattern = "aPattern"
    >
    > compiledPatterns = [ ]
    > compiledPatterns.append(re.compile(pattern))
    >
    > if(re.compile(pattern) in compiledPatterns):
    >     print("The compiled pattern is stored.")
    >
    > As you can see I'm effectively assuming that every time re.compile()
    > is called with the same input pattern it will return the exact same
    > object rather than a second, identical, object. In interactive tests
    > via python shell this seems to be the case but... can I rely on it -
    > always- being the case? Or is it one of those implementation-specific
    > issues?
    >
    > And what about any other function or class/method? Is there a way to
    > discriminate between methods and functions that when invoked twice
    > with the same arguments will return the same object and those that in
    > the same circumstances will return two identical objects?
    >
    > If the answer is no, am I right to state the in the case portrayed
    > above the only way to be safe is to use the following code instead?
    >
    > for item in compiledPatterns:
    >     if(item.pattern == pattern):


    In general, no. You cannot rely on objects instantiated with the same
    parameters to compare equal. E.g.:

    >>> class N(object):
    ...     def __init__(self, foo):
    ...         self.foo = foo
    ...
    >>> a = N('m')
    >>> b = N('m')
    >>> a == b
    False

    If, however, the designer of the class implements equality that way
    (and documents it as well), then you can. E.g.:

    >>> del N
    >>> class N(object):
    ...     def __init__(self, foo):
    ...         self.foo = foo
    ...     def __eq__(self, other):
    ...         return self.foo == other.foo
    ...
    >>> a = N('m')
    >>> b = N('m')
    >>> a == b
    True
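The `__eq__` in that session raises `AttributeError` when an `N` is compared with an object that has no `foo` attribute, and on Python 3 defining `__eq__` alone makes instances unhashable. A slightly more defensive sketch (an editorial variant, not the poster's code): returning `NotImplemented` lets Python fall back to its default comparison, and `__hash__` keeps equal objects usable as dict keys and set members.

```python
class N(object):
    def __init__(self, foo):
        self.foo = foo

    def __eq__(self, other):
        # Defer to Python's fallback for unrelated types instead of
        # crashing on a missing .foo attribute.
        if not isinstance(other, N):
            return NotImplemented
        return self.foo == other.foo

    def __ne__(self, other):
        # Needed on Python 2, where != is not derived from __eq__.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Objects that compare equal must have equal hashes.
        return hash(self.foo)
```

With this version `N('m') == N('m')` is true, `N('m') == 'm'` is simply false, and `{N('m'), N('m')}` holds a single element.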

    For functions/methods it really depends on the implementation. For
    example, do we *really* want the following to always be true, even
    though we passed the same arguments?

    >>> import random
    >>> random.randint(0, 10) == random.randint(0, 10)


    For the re module, unless it's documented that

    >>> re.compile(p) == re.compile(p)


    is always true, you should not rely on it, because it's an
    implementation detail that may change in the future.
    Albert Hopkins, Mar 19, 2009
  4. Emanuele D'Arrigo wrote:
    > Hi everybody,
    >
    > I just had a bit of a shiver for something I'm doing often in my code
    > but that might be based on a wrong assumption on my part. Take the
    > following code:
    >
    > pattern = "aPattern"
    >
    > compiledPatterns = [ ]
    > compiledPatterns.append(re.compile(pattern))
    >
    > if(re.compile(pattern) in compiledPatterns):


    Note that this generally takes time proportional to the length of the
    list. And as MRAB said, drop the parens.

    >     print("The compiled pattern is stored.")
    >
    > As you can see I'm effectively assuming that every time re.compile()
    > is called with the same input pattern it will return the exact same
    > object rather than a second, identical, object. In interactive tests
    > via python shell this seems to be the case but... can I rely on it -
    > always- being the case? Or is it one of those implementation-specific
    > issues?


    As MRAB indicated, this only works because the CPython re module itself
    has a cache, so you do not have to make one. It is, however, limited to
    100 or so entries, since programs that use patterns repeatedly generally
    use a limited number of patterns. Caches usually use a dict, so that
    cache[input] == output and lookup is O(1).
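If a program should not lean on `re`'s internal cache, one option is to own the cache explicitly. A sketch using `functools.lru_cache` (available from Python 3.2, so later than this thread); `cached_compile` is a hypothetical name and `maxsize=100` simply mirrors the limit mentioned above. Identity is only preserved while an entry stays in the cache: once it is evicted, the next call builds a new object.

```python
import functools
import re

@functools.lru_cache(maxsize=100)  # bounded, dict-backed cache
def cached_compile(pattern, flags=0):
    """Hypothetical helper: an explicit cache around re.compile."""
    return re.compile(pattern, flags)

p1 = cached_compile("aPattern")
p2 = cached_compile("aPattern")
print(p1 is p2)  # True: the second call is a cache hit
print(cached_compile.cache_info())
```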

    > And what about any other function or class/method? Is there a way to
    > discriminate between methods and functions that when invoked twice
    > with the same arguments will return the same object and those that in
    > the same circumstances will return two identical objects?


    In general, a function that calculates and returns an object will return
    a new object. The exceptions are exceptions.
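That rule of thumb in code: an ordinary function builds a fresh object on every call, so two results can be equal without being the same object. `make_list` is a hypothetical example function.

```python
def make_list(n):
    # Hypothetical example: builds and returns a brand-new list on every call.
    return list(range(n))

first = make_list(3)
second = make_list(3)
print(first == second)  # True: equal contents
print(first is second)  # False: two distinct list objects
```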

    >
    > If the answer is no, am I right to state the in the case portrayed
    > above the only way to be safe is to use the following code instead?
    >
    > for item in compiledPatterns:
    >     if(item.pattern == pattern):


    Yes. Unless you are comparing against None (or True or False in Py3) or
    specifically know otherwise, you probably want '==' rather than 'is'.
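A short illustration of that advice, using a hypothetical `find` helper: `==` compares values, while `is` is reserved for singletons such as `None`.

```python
def find(items, target):
    # Hypothetical helper: return the matching item, or None if absent.
    for item in items:
        if item == target:  # value comparison: use ==
            return item
    return None

result = find(["a", "b"], "c")
if result is None:  # None is a singleton, so 'is' is the idiom here
    print("not found")
```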

    Terry Jan Reedy
    Terry Reedy, Mar 19, 2009
  5. On Mar 20, 1:42 am, "Emanuele D'Arrigo" <> wrote:
    > I just had a bit of a shiver for something I'm doing often in my code
    > but that might be based on a wrong assumption on my part. Take the
    > following code:
    >
    > pattern = "aPattern"
    >
    > compiledPatterns = [ ]
    > compiledPatterns.append(re.compile(pattern))
    >
    > if(re.compile(pattern) in compiledPatterns):
    >     print("The compiled pattern is stored.")


    Others have discussed the problem with relying on the compiled RE
    objects being the same, but one option may be to use a dict instead of
    a list and matching on the pattern string itself:

    compiledPatterns = { }
    if pattern not in compiledPatterns:
        compiledPatterns[pattern] = re.compile(pattern)
    else:
        print("The compiled pattern is stored.")
    alex23, Mar 20, 2009
  6. alex23 <> wrote:
    > On Mar 20, 1:42 am, "Emanuele D'Arrigo" <> wrote:
    > > I just had a bit of a shiver for something I'm doing often in my code
    > > but that might be based on a wrong assumption on my part. Take the
    > > following code:
    > >
    > > pattern = "aPattern"
    > >
    > > compiledPatterns = [ ]
    > > compiledPatterns.append(re.compile(pattern))
    > >
    > > if(re.compile(pattern) in compiledPatterns):
    > >     print("The compiled pattern is stored.")

    >
    > Others have discussed the problem with relying on the compiled RE
    > objects being the same, but one option may be to use a dict instead of
    > a list and matching on the pattern string itself:
    >
    > compiledPatterns = { }
    > if pattern not in compiledPatterns:
    >     compiledPatterns[pattern] = re.compile(pattern)
    > else:
    >     print("The compiled pattern is stored.")


    FYI this is almost exactly the approach the re module takes to
    caching the expressions. (The difference is re adds a type
    token to the front of the key.)

    --
    R. David Murray http://www.bitdance.com
    R. David Murray, Mar 20, 2009
  7. Emanuele D'Arrigo wrote:
    > Hi everybody,
    >
    > I just had a bit of a shiver for something I'm doing often in my code
    > but that might be based on a wrong assumption on my part.


    Do not assume. Either check or use another solution. My 2 cents...

    > Take the
    > following code:
    >
    > pattern = "aPattern"
    >
    > compiledPatterns = [ ]
    > compiledPatterns.append(re.compile(pattern))
    >
    > if(re.compile(pattern) in compiledPatterns):


    You don't need the parens around the test.

    >     print("The compiled pattern is stored.")
    >
    > As you can see I'm effectively assuming that every time re.compile()
    > is called with the same input pattern it will return the exact same
    > object rather than a second, identical, object. In interactive tests
    > via python shell this seems to be the case but... can I rely on it -
    > always- being the case? Or is it one of those implementation-specific
    > issues?


    I can't tell: I'm not willing to write a serious test for it. IIRC, the
    re module maintains a cache of already seen patterns, but I didn't
    bother reading the implementation. Anyway: why don't you use a dict
    instead, using the "source" (i.e. the string representation) of the
    pattern as the key?

    i.e.:

    pattern = "aPattern"
    compiled_patterns = {}

    compiled_patterns[pattern] = re.compile(pattern)

    # ...

    if pattern in compiled_patterns:
        print("The compiled pattern is stored.")


    > And what about any other function or class/method? Is there a way to
    > discriminate between methods and functions that when invoked twice
    > with the same arguments will return the same object and those that in
    > the same circumstances will return two identical objects?


    Short of reading the source code (hey, this is OSS, isn't it?), I don't
    see any reliable way to know this, unless it is clearly documented, of course.

    > If the answer is no, am I right to state the in the case portrayed
    > above the only way to be safe is to use the following code instead?
    >
    > for item in compiledPatterns:
    >     if(item.pattern == pattern):


    Once again, using a dict will be *way* more efficient.
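To make the efficiency difference concrete, a sketch contrasting the two lookups discussed in this thread; the 1000 generated pattern strings are arbitrary test data and `in_list`/`in_dict` are hypothetical names. Scanning the list costs time proportional to its length, while the dict lookup hashes the key. Timings depend on the machine, so no numbers are quoted here.

```python
import re
import timeit

patterns = ["foo%d" % i for i in range(1000)]

# List of compiled patterns: each lookup scans the list, O(n).
compiled_list = [re.compile(p) for p in patterns]

def in_list(pattern):
    return any(item.pattern == pattern for item in compiled_list)

# Dict keyed by the source string: each lookup hashes the key, O(1) on average.
compiled_dict = {p: re.compile(p) for p in patterns}

def in_dict(pattern):
    return pattern in compiled_dict

print(timeit.timeit(lambda: in_list("foo999"), number=1000))
print(timeit.timeit(lambda: in_dict("foo999"), number=1000))
```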
    Bruno Desthuilliers, Mar 20, 2009