Can I rely on...

Discussion in 'Python' started by Emanuele D'Arrigo, Mar 19, 2009.

  1. Sorry for the double-post, the first one was sent by mistake before
    completion.

    Hi everybody,

    I just had a bit of a shiver for something I'm doing often in my code
    but that might be based on a wrong assumption on my part. Take the
    following code:

    import re

    pattern = "aPattern"

    compiledPatterns = []
    compiledPatterns.append(re.compile(pattern))

    if re.compile(pattern) in compiledPatterns:
        print("The compiled pattern is stored.")

    As you can see I'm effectively assuming that every time re.compile()
    is called with the same input pattern it will return the exact same
    object rather than a second, identical, object. In interactive tests
    via python shell this seems to be the case but... can I rely on it -
    always- being the case?

    If the answer is no, am I right to state that in the case portrayed
    above the only way to be safe is to use the following code instead?

    for item in compiledPatterns:
        if item.pattern == pattern:
            print("The compiled pattern is stored.")
            break

    And what about any other function or class/method? Is there a way to
    discriminate between methods and functions that when invoked twice
    with the same arguments will return the same object and those that in
    the same circumstances will return two identical objects? Or is it one
    of those implementation-specific issues?

    Manu
     
    Emanuele D'Arrigo, Mar 19, 2009
    #1

  2. Emanuele D'Arrigo

    MRAB Guest

    Emanuele D'Arrigo wrote:
    [snip]
    > If the answer is no, am I right to state that in the case portrayed
    > above the only way to be safe is to use the following code instead?
    >
    > for item in compiledPatterns:
    >     if item.pattern == pattern:
    >         print("The compiled pattern is stored.")
    >         break
    >

    Correction to my last post: this isn't the same as using 'in'.

    It should work, but remember that it compares only the pattern and not
    any flags you might have used in the original re.compile().
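
One way to sidestep both concerns (object identity and forgotten flags) is to index the stored patterns by the inputs to re.compile() instead of by the compiled object. The following is only a minimal sketch: the names compiled_patterns, remember and is_stored are hypothetical, not something from the re module or from the code above.

```python
import re

# Hypothetical registry: keys are (pattern, flags) tuples, values are the
# compiled objects. Strings and integer flags have well-defined equality,
# so membership tests on the keys do not depend on how re.compile() behaves.
compiled_patterns = {}

def remember(pattern, flags=0):
    """Compile once per (pattern, flags) pair and return the stored object."""
    key = (pattern, flags)
    if key not in compiled_patterns:
        compiled_patterns[key] = re.compile(pattern, flags)
    return compiled_patterns[key]

def is_stored(pattern, flags=0):
    """Check for a stored pattern by its source string and flags."""
    return (pattern, flags) in compiled_patterns

remember("aPattern")
remember("aPattern", re.IGNORECASE)

print(is_stored("aPattern"))                 # True
print(is_stored("aPattern", re.IGNORECASE))  # True
print(is_stored("aPattern", re.MULTILINE))   # False
```

The same source string compiled with different flags gets its own entry, which is the distinction a comparison on item.pattern alone would miss.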
     
    MRAB, Mar 19, 2009
    #2

  3. Thank you everybody for the informative replies.

    I'll have to comb my code for all the instances of "item in sequence"
    statement because I suspect some of them are as unsafe as my first
    example. Oh well. One more lesson learned.

    Thank you again.

    Manu
     
    Emanuele D'Arrigo, Mar 19, 2009
    #3
  4. "Emanuele D'Arrigo" wrote:
    > Thank you everybody for the informative replies.
    >
    > I'll have to comb my code for all the instances of "item in sequence"
    > statement because I suspect some of them are as unsafe as my first
    > example. Oh well. One more lesson learned.


    You may have far fewer unsafe cases than you think, depending
    on how you understood the answers you got, some of which
    were a bit confusing. Just to make sure it is clear
    what is going on in your example....

    From the documentation of 'in':


    x in s      True if an item of s is equal to x, else False

    (http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-buffer-xrange)

    Note the use of 'equal' there. So for lists and tuples,

    if x in s: do_something()

    is the same as

    for item in s:
        if item == x:
            do_something()
            break

    So:

    >>> s = ['sdb*&', 'uuyh', 'foo']
    >>> x = 'sdb*&'
    >>> x is s[0]
    False
    >>> x in s
    True

    (I used a string with special characters in it to avoid Python's
    interning of identifier-like strings so that x and s[0] would not be
    the same object).
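
The same distinction can be seen with user-defined objects. This is a small sketch with two hypothetical classes, Point and Token, made up for illustration: one defines __eq__, the other does not and therefore falls back to the default comparison, which is by identity.

```python
class Point:
    """Hypothetical value type: equal when the coordinates match."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash((self.x, self.y))

class Token:
    """Hypothetical type with no __eq__: default equality is identity."""
    def __init__(self, text):
        self.text = text

points = [Point(1, 2)]
tokens = [Token("a")]

point_found = Point(1, 2) in points      # distinct object, equal value
token_found = Token("a") in tokens       # distinct object, no equality defined
same_token_found = tokens[0] in tokens   # the very same object

print(point_found, token_found, same_token_found)  # True False True
```

So 'in' itself behaves the same way in both cases; what differs is whether the type of the stored objects defines a meaningful equality.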

    Your problem with the regex example is that re makes no promise that
    patterns compiled from the same source string will compare equal to
    each other. Thus their _equality_ is not guaranteed. Switching to
    using an equals comparison won't help you avoid your problem in
    the example you showed.

    Now, if you have a custom sequence type, 'in' and an '==' loop
    might produce different results, since 'in' is evaluated by the special
    method __contains__ if it exists (and by iterating over the sequence and
    comparing for equality if it doesn't). But the _intent_ of __contains__
    is that comparison be by equality, not object identity, so if the two are
    not the same, something weird is going on and there'd better be a good
    reason for it :)
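
As a rough illustration of the two paths, here is a sketch with two hypothetical classes: Bag defines __contains__ and keeps to the equality convention, while Countdown defines only __iter__, so 'in' falls back to iterating and comparing each item.

```python
class Bag:
    """Hypothetical container that defines __contains__ explicitly."""
    def __init__(self, items):
        self._items = list(items)

    def __contains__(self, x):
        # Compare by equality, matching the intent of __contains__.
        return any(item == x for item in self._items)

class Countdown:
    """Hypothetical iterable without __contains__; 'in' iterates over it."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))

bag_has = [1, 2] in Bag([[1, 2], [3]])   # a new list, equal to a stored one
countdown_has = 2 in Countdown(3)        # found while iterating 3, 2, 1
countdown_lacks = 5 in Countdown(3)      # iteration ends without a match

print(bag_has, countdown_has, countdown_lacks)  # True True False
```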

    In summary, 'in' is the thing to use if you want to know if your
    sample object is _equal to_ any of the objects in the container.
    As long as equality is meaningful for the objects involved, there's
    no reason to switch to a loop.

    --
    R. David Murray http://www.bitdance.com
     
    R. David Murray, Mar 19, 2009
    #4