list comprehension problem

Discussion in 'Python' started by mk, Oct 29, 2009.

  1. mk

    mk Guest

    Hello everyone,

    print hosts
    hosts = [ s.strip() for s in hosts if s is not '' and s is not None and
    s is not '\n' ]
    print hosts

    ['9.156.44.227\n', '9.156.46.34 \n', '\n']
    ['9.156.44.227', '9.156.46.34', '']

    Why does the hosts list after list comprehension still contain '' in
    last position?

    I checked that:

    print hosts
    hosts = [ s.strip() for s in hosts if s != '' and s != None and s != '\n' ]
    print hosts

    ...works as expected:

    ['9.156.44.227\n', '9.156.46.34 \n', '\n']
    ['9.156.44.227', '9.156.46.34']


    Are there two '\n' strings in the interpreter's memory or smth so the
    identity check "s is not '\n'" does not work as expected?

    This is weird. I expected that at all times there is only one '\n'
    string in Python's cache or whatever that all labels meant by the
    programmer as '\n' string actually point to. Is that wrong assumption?



    Regards,
    mk
    mk, Oct 29, 2009
    #1

  2. mk wrote:

    > Hello everyone,
    >
    > print hosts
    > hosts = [ s.strip() for s in hosts if s is not '' and s is not None and
    > s is not '\n' ]
    > print hosts
    >
    > ['9.156.44.227\n', '9.156.46.34 \n', '\n']
    > ['9.156.44.227', '9.156.46.34', '']
    >
    > Why does the hosts list after list comprehension still contain '' in
    > last position?
    >
    > I checked that:
    >
    > print hosts
    > hosts = [ s.strip() for s in hosts if s != '' and s != None and s != '\n' ]
    > print hosts
    >
    > ...works as expected:
    >
    > ['9.156.44.227\n', '9.156.46.34 \n', '\n']
    > ['9.156.44.227', '9.156.46.34']
    >
    >
    > Are there two '\n' strings in the interpreter's memory or smth so the
    > identity check "s is not '\n'" does not work as expected?
    >
    > This is weird. I expected that at all times there is only one '\n'
    > string in Python's cache or whatever that all labels meant by the
    > programmer as '\n' string actually point to. Is that wrong assumption?


    Yes. Never use "is" unless you know 100% that you are talking about the same
    object, not just equality.
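A minimal sketch of the difference, written for Python 3 (the thread itself uses Python 2 print statements). The names `raw`, `kept`, `a` and `b` are made up for illustration. Whether two equal strings are one object is up to the interpreter, so the identity result is printed but not relied on.

```python
raw = ['9.156.44.227\n', '9.156.46.34 \n', '\n']
newline = '\n'

# Compare values with != and reserve "is not" for None.
kept = [s.strip() for s in raw
        if s is not None and s != '' and s != newline]
print(kept)

# Two strings with equal values, built separately at run time.
a = ''.join(['9.156.', '44.227'])
b = ''.join(['9.156.', '44.227'])
print(a == b)  # same value
print(a is b)  # implementation-dependent: may be True or False
```

The equality-based filter drops the bare newline entry regardless of how the interpreter happens to store it.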

    Diez
    Diez B. Roggisch, Oct 29, 2009
    #2

  3. mk

    Falcolas Guest

    On Oct 29, 9:31 am, "Diez B. Roggisch" <> wrote:
    > mk wrote:
    > > Hello everyone,

    >
    > > print hosts
    > > hosts = [ s.strip() for s in hosts if s is not '' and s is not None and
    > > s is not '\n' ]
    > > print hosts

    >
    > > ['9.156.44.227\n', '9.156.46.34 \n', '\n']
    > > ['9.156.44.227', '9.156.46.34', '']

    >
    > > Why does the hosts list after list comprehension still contain '' in
    > > last position?

    >
    > > I checked that:

    >
    > > print hosts
    > > hosts = [ s.strip() for s in hosts if s != '' and s != None and s != '\n' ]
    > > print hosts

    >
    > > ...works as expected:

    >
    > > ['9.156.44.227\n', '9.156.46.34 \n', '\n']
    > > ['9.156.44.227', '9.156.46.34']

    >
    > > Are there two '\n' strings in the interpreter's memory or smth so the
    > > identity check "s is not '\n'" does not work as expected?

    >
    > > This is weird. I expected that at all times there is only one '\n'
    > > string in Python's cache or whatever that all labels meant by the
    > > programmer as '\n' string actually point to. Is that wrong assumption?

    >
    > Yes. Never use "is" unless you know 100% that you are talking about the same
    > object, not just equality.
    >
    > Diez


    I'd also recommend trying the following filter, since it is identical
    to what you're trying to do, and will probably catch some additional
    edge cases without any additional effort from you.

    [s.strip() for s in hosts if s.strip()]

    This will check the results of s.strip(), and since empty strings are
    considered false, they will not make it into your results.
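As a small illustration of those additional edge cases (the extra whitespace-only entries below are invented for the example, they are not from the original list):

```python
hosts = ['9.156.44.227\n', '9.156.46.34 \n', '\n', '   \t\n', '']

# An entry survives only if something is left after stripping whitespace.
cleaned = [s.strip() for s in hosts if s.strip()]
print(cleaned)
```

Entries that consist only of spaces, tabs or newlines strip down to the empty string, which is false, so they are dropped along with `''` and `'\n'`.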

    Garrick
    Falcolas, Oct 29, 2009
    #3
  4. mk

    MRAB Guest

    Diez B. Roggisch wrote:
    > mk wrote:
    >
    >> Hello everyone,
    >>
    >> print hosts
    >> hosts = [ s.strip() for s in hosts if s is not '' and s is not None and
    >> s is not '\n' ]
    >> print hosts
    >>
    >> ['9.156.44.227\n', '9.156.46.34 \n', '\n']
    >> ['9.156.44.227', '9.156.46.34', '']
    >>
    >> Why does the hosts list after list comprehension still contain '' in
    >> last position?
    >>
    >> I checked that:
    >>
    >> print hosts
    >> hosts = [ s.strip() for s in hosts if s != '' and s != None and s != '\n' ]
    >> print hosts
    >>
    >> ...works as expected:
    >>
    >> ['9.156.44.227\n', '9.156.46.34 \n', '\n']
    >> ['9.156.44.227', '9.156.46.34']
    >>
    >>
    >> Are there two '\n' strings in the interpreter's memory or smth so the
    >> identity check "s is not '\n'" does not work as expected?
    >>
    >> This is weird. I expected that at all times there is only one '\n'
    >> string in Python's cache or whatever that all labels meant by the
    >> programmer as '\n' string actually point to. Is that wrong assumption?

    >
    > Yes. Never use "is" unless you know 100% that you are talking about the same
    > object, not just equality.
    >

    Some objects are singletons, ie there's only ever one of them. The most
    common singleton is None. In virtually every other case you should be
    using "==" and "!=".
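A short sketch of that rule of thumb. The function name `describe` and its return strings are hypothetical, chosen only to show which operator goes with which check.

```python
def describe(value):
    """Classify a value, using identity only for the None singleton."""
    if value is None:      # identity: there is only one None
        return 'missing'
    if value == '':        # equality: any empty string counts
        return 'empty'
    return 'present'

print(describe(None), describe(''), describe('9.156.44.227'))
```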
    MRAB, Oct 29, 2009
    #4
  5. Falcolas wrote:
    (snip)
    >
    > I'd also recommend trying the following filter, since it is identical
    > to what you're trying to do, and will probably catch some additional
    > edge cases without any additional effort from you.
    >
    > [s.strip() for s in hosts if s.strip()]


    The problem with this expression is that it calls str.strip two times...
    Sometimes, a more lispish approach is better:

    whatever = filter(None, map(str.strip, hosts))

    or just a plain procedural loop:

    whatever = []
    for s in hosts:
        s = s.strip()
        if s:
            whatever.append(s)

    As far as I'm concerned, I have a clear preference for the first
    version, but well, YMMV...
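One caveat for readers on Python 3, where `filter` and `map` return lazy iterators rather than lists: the first version needs a `list()` call if a real list is wanted. A sketch, with `also` as an illustrative name for an equivalent comprehension that strips each entry only once:

```python
hosts = ['9.156.44.227\n', '9.156.46.34 \n', '\n']

# In Python 3, filter(...) is an iterator; list() materialises it.
whatever = list(filter(None, map(str.strip, hosts)))

# Same result with a comprehension over the already-stripped values.
also = [t for t in map(str.strip, hosts) if t]

print(whatever)
print(also)
```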
    Bruno Desthuilliers, Oct 29, 2009
    #5
  6. mk wrote:
    > Hello everyone,
    >
    > print hosts
    >
    > ['9.156.44.227\n', '9.156.46.34 \n', '\n']
    >



    Just for the record, where did you get this "hosts" list from? (Hint:
    depending on the answer, there might be a way to avoid having to filter
    the list.)
    Bruno Desthuilliers, Oct 29, 2009
    #6

  7. > Some objects are singletons, ie there's only ever one of them. The most
    > common singleton is None. In virtually every other case you should be
    > using "==" and "!=".


    Please correct me if I am wrong, but I believe you meant to say some
    objects are immutable, in which case you would be correct.

    Nick Stinemates, Oct 30, 2009
    #7
  8. mk

    alex23 Guest

    On Oct 30, 1:10 pm, Nick Stinemates <> wrote:
    > > Some objects are singletons, ie there's only ever one of them. The most
    > > common singleton is None. In virtually every other case you should be
    > > using "==" and "!=".

    >
    > Please correct me if I am wrong, but I believe you meant to say some
    > objects are immutable, in which case you would be correct.


    You're completely wrong. Immutability has nothing to do with identity,
    which is what 'is' is testing for:

    >>> t1 = (1,2,3) # an immutable object
    >>> t2 = (1,2,3) # another immutable object
    >>> t1 is t2

    False
    >>> t1 == t2

    True

    MRAB was referring to the singleton pattern[1], of which None is the
    predominant example in Python. None is _always_ None, as it's always
    the same object.

    1: http://en.wikipedia.org/wiki/Singleton_pattern
    alex23, Oct 30, 2009
    #8
  9. mk

    Terry Reedy Guest

    alex23 wrote:
    > On Oct 30, 1:10 pm, Nick Stinemates <> wrote:
    >>> Some objects are singletons, ie there's only ever one of them. The most
    >>> common singleton is None. In virtually every other case you should be
    >>> using "==" and "!=".

    >> Please correct me if I am wrong, but I believe you meant to say some
    >> objects are immutable, in which case you would be correct.

    >
    > You're completely wrong. Immutability has nothing to do with identity,
    > which is what 'is' is testing for:


    What immutability has to do with identity is that 'two' immutable
    objects with the same value *may* actually be the same object,
    *depending on the particular version of a particular implementation*.

    >
    >>>> t1 = (1,2,3) # an immutable object
    >>>> t2 = (1,2,3) # another immutable object


    Whether or not this is 'another' object or the same object is irrelevant
    for all purposes except identity checking. It is completely up to the
    interpreter.

    >>>> t1 is t2

    > False


    In this case, but it could have been True.

    >>>> t1 == t2

    > True
    >
    > MRAB was referring to the singleton pattern[1], of which None is the
    > predominant example in Python. None is _always_ None, as it's always
    > the same object.


    And in 3.x, the same is true of True and False.
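A sketch that separates what the language guarantees from what an implementation may choose to do. The helper `report` is hypothetical. Only the None and bool lines are guaranteed to show identity; the tuple lines can print either True or False for the identity part depending on the interpreter.

```python
def report(label, a, b):
    """Print and return (equal, identical) for two objects."""
    result = (a == b, a is b)
    print(label, result)
    return result

# Equal immutable values: identity is left to the implementation.
t_equal, t_same = report('tuples', tuple([1, 2, 3]), tuple([1, 2, 3]))
e_equal, e_same = report('empty tuples', tuple([]), tuple([]))

# Singletons: identity is guaranteed by the language.
n_equal, n_same = report('None', None, None)
b_equal, b_same = report('bools', bool(1), bool('x'))
```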
    Terry Reedy, Oct 30, 2009
    #9
  10. mk

    alex23 Guest

    Terry Reedy <> wrote:
    > alex23 wrote:
    > > You're completely wrong. Immutability has nothing to do with identity,
    > > which is what 'is' is testing for:

    >
    > What immutability has to do with identity is that 'two' immutable
    > objects with the same value *may* actually be the same object,
    > *depending on the particular version of a particular implementation*.


    See, I prefer a little more certainty in my code. Isn't this why we
    continually caution people against relying on implementation details?

    > >>>> t1 is t2

    > > False

    > In this case, but it could have been True.


    Yes, and if my aunt had a penis she'd be my uncle. But she doesn't. So
    what's the point here? Under certain implementations, _some_ immutable
    objects _may_ share identity, but you shouldn't rely on it? Are you
    trying to advocate a use for this behaviour by highlighting it?

    I'm honestly not getting your point here.

    > > MRAB was referring to the singleton pattern[1], of which None is the
    > > predominant example in Python. None is _always_ None, as it's always
    > > the same object.

    >
    > And in 3.x, the same is true of True and False.


    None of which refutes or lessens anything I wrote. What my post was
    _countering_ was the claim that immutables should use identity checks,
    mutables should use equality checks. That the implementation caches
    _some_ objects for performance reasons certainly doesn't make that
    claim any less wrong.

    Again, what was the point of this other than "things differ on the
    implementation level"? Isn't it better to talk about the level of the
    language that you can _expect_ to be consistent?
    alex23, Oct 31, 2009
    #10
  11. mk

    Terry Reedy Guest

    alex23 wrote:
    > Terry Reedy <> wrote:
    >> alex23 wrote:
    >>> You're completely wrong. Immutability has nothing to do with identity,

    ....
    > I'm honestly not getting your point here.


    Let me try again, a bit differently.

    I claim that the second statement, and therefore the first, can be seen
    as wrong. I also claim that (Python) programmers need to understand why.

    In mathematics, we generally have immutable values whose 'identity' is
    their value. There is, for example, only one, immutable, empty set.

    In informatics, and in particular in Python, in order to have
    mutability, we have objects with value and an identity that is separate
    from their value. There can be, for example, multiple mutable empty
    sets. Identity is important because we must care about which empty set
    we add things to. 'Identity' is only needed because of 'mutability', so
    it is mistaken to say they have nothing to do with each other.

    Ideally, from both a conceptual and space efficiency view, an
    implementation would allow only one copy for each value of immutable
    classes. This is what new programmers assume when they blithely use 'is'
    instead of '==' (as would usually be correct in math).

    However, for time efficiency reasons, there is no unique copy guarantee,
    so one must use '==' instead of 'is', except in those few cases where
    there is a unique copy guarantee, either by the language spec or by
    one's own design, when one must use 'is' and not '=='. Here 'must'
    means 'must to be generally assured of program correctness as intended'.
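One common case of a unique copy guaranteed "by one's own design" is a private sentinel object. The sketch below is an illustration under that assumption; `_MISSING` and `lookup` are hypothetical names, not something from the thread.

```python
_MISSING = object()  # hypothetical sentinel: exactly one such object exists

def lookup(mapping, key, default=_MISSING):
    """Return mapping[key]; fall back to default only if one was passed."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:          # identity is the right test here
        if default is _MISSING:
            raise KeyError(key)
        return default
    return value

print(lookup({'a': None}, 'a'))    # a stored None is a real value
print(lookup({}, 'a', 0))          # falls back to the default
```

Using `==` here would be wrong in principle, because an arbitrary stored value could define its own `__eq__`; `is` asks only whether this is the sentinel itself.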

    We obviously agree on this guideline.

    Terry Jan Reedy
    Terry Reedy, Oct 31, 2009
    #11
  12. On Sat, 31 Oct 2009 14:12:40 -0400, Terry Reedy wrote:

    > alex23 wrote:
    >> Terry Reedy <> wrote:
    >>> alex23 wrote:
    >>>> You're completely wrong. Immutability has nothing to do with
    >>>> identity,

    > ...
    > > I'm honestly not getting your point here.

    >
    > Let me try again, a bit differently.
    >
    > I claim that the second statement, and therefore the first, can be seen
    > as wrong. I also claim that (Python) programmers need to understand why.
    >
    > In mathematics, we generally have immutable values whose 'identity' is
    > their value. There is, for example, only one, immutable, empty set.



    I think it's more than that -- I don't think pure mathematics makes any
    distinction at all between identity and equality. There are no instances
    at all, so you can't talk about individual values. It's not that the
    empty set is a singleton, because the empty set isn't a concrete object-
    with-existence at all. It's an abstraction, and as such, questions of
    "how many separate empty sets are there?" are meaningless.

    There are an infinite number of empty sets that differ according to their
    construction:

    The set of all American Presidents called Boris Nogoodnik.
    The set of all human languages with exactly one noun and one verb.
    The set of all fire-breathing mammals.
    The set of all real numbers equal to sqrt(-1).
    The set of all even prime numbers other than 2.
    The set of all integers between 0 and 1 exclusive.
    The set of all integers between 1 and 2 exclusive.
    The set of all positive integers between 2/5 and 4/5.
    The set of all multiples of five between 26 and 29.
    The set of all non-zero circles in Euclidean geometry where the radius
    equals the circumference.
    ....

    I certainly wouldn't say all fire-breathing mammals are integers between
    0 and 1, so those sets are "different", and yet clearly they're also "the
    same" in some sense. I think this demonstrates that the question of how
    many different empty sets is meaningless -- it depends on what you mean
    by different and how many.



    > In informatics, and in particular in Python, in order to have
    > mutability, we have objects with value and an identity that is separate
    > from their value.


    I think you have this backwards. We have value and identity because of
    the hardware we use -- we store values in memory locations, which gives
    identity. Our universe imposes the distinction between value and
    identity. To simulate immutability and singletons is hard, and needs to
    be worked at.

    Nevertheless, it would be possible to go the other way. Given
    hypothetical hardware which only supported mutable singletons, we could
    simulate multiple instances. It would be horribly inefficient, but it
    could be done. Imagine a singleton-mutable-set implementation, something
    like this:

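    class set:
        # Pseudocode: `id` stands in for the explicit identity of each "set",
        # and `singleton` for the single underlying mutable object.
        def __init__(id):
            return singleton
        def add(id, obj):
            singleton.elements.append((id, obj))
        def __contains__(id, element):
            return (id, element) in singleton.elements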


    and so forth.

    You might notice that this is not terribly different from how one might
    define non-singleton sets. The difference being, Python sets have
    identity implied by storage in distinct memory locations, while this
    hypothetical singleton-set has to explicitly code for identity.
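For readers who want something they can run, here is one possible rendering of that idea in ordinary Python 3. It is a sketch, not the hypothetical hardware: `SharedStoreSet` and its attributes are invented names, and an integer counter plays the role of the explicitly coded identity.

```python
import itertools

class SharedStoreSet:
    """Every instance writes to one shared list; an integer id tells them apart."""
    _elements = []             # the single shared store
    _ids = itertools.count()   # source of explicit identities

    def __init__(self):
        self._id = next(self._ids)

    def add(self, obj):
        pair = (self._id, obj)
        if pair not in self._elements:
            self._elements.append(pair)

    def __contains__(self, obj):
        return (self._id, obj) in self._elements

first = SharedStoreSet()
second = SharedStoreSet()
first.add('x')
print('x' in first, 'x' in second)
```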



    > There can be, for example, multiple mutable empty
    > sets. Identity is important because we must care about which empty set
    > we add things to. 'Identity' is only needed because of 'mutability', so
    > it is mistaken to say they have nothing to do with each other.


    True, but it is not a mistake to say that identity and mutability are
    independent: there are immutable singletons, and mutable singletons, and
    immutable non-singletons, and mutable non-singletons. Clearly, knowing
    that an object is mutable doesn't tell you whether it is a singleton or
    not, and knowing it is a singleton doesn't tell you whether it is
    immutable or not.

    E.g. under normal circumstances modules are singletons, but they are
    mutable; frozensets are immutable, but they are not singletons.
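A sketch of both halves of that example in Python 3. It uses the standard `json` module purely as a convenient module to import twice; `demo_marker` is a throwaway attribute added and removed for the demonstration.

```python
import importlib
import json

# Importing the same module again yields the same object under normal circumstances.
again = importlib.import_module('json')
same_module = again is json

# Modules are mutable: an attribute set through one name is visible via the other.
json.demo_marker = 'set via one name'
seen = again.demo_marker
del json.demo_marker

# Frozensets are immutable, yet equal ones need not be the same object.
f1 = frozenset([1, 2])
f2 = frozenset([1, 2])
print(same_module, seen, f1 == f2, f1 is f2)
```

The last value printed is implementation-dependent, which is exactly the point being made.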


    > Ideally, from both a conceptual and space efficiency view, an
    > implementation would allow only one copy for each value of immutable
    > classes.


    Ideally, from a complexity of implementation view, an implementation
    would allow an unlimited number of copies of each value of immutable
    classes.


    > This is what new programmers assume when they blithely use 'is'
    > instead of '==' (as would usually be correct in math).


    Nah, I think you're crediting them with far more sophistication than they
    actually have. I think most people in general, including many new
    programmers, simply don't have a good grasp of the conceptual difference
    between equality and identity. In plain language, "is" and its
    grammatical forms "be", "are", "am" etc. have many meanings:

    (1) Set membership testing:
    Socrates is a man.
    This is a hammer.

    (2) Existence:
    There is a computer language called Python.
    There is a monster under the bed.

    (3) Identity:
    Iron Man is Tony Stark.
    The butler is the murderer.

    (4) Mathematical equality:
    If x is 5, and y is 11, then y is 2x+1.

    (5) Equivalence:
    The winner of this race is the champion.
    The diameter of a circle is twice the radius.

    (6) Metaphoric equivalence:
    Kali is death.
    Life is like a box of chocolates.

    (7) Testing of state:
    My ankle is sore.
    Fred is left-handed.

    (8) Consequence
    If George won the lottery, he would say he is happy.

    (9) Cost
    A cup of coffee is $3.


    Only two of these usages work at all in any language I know of: equality
    and identity testing, although it would be interesting to consider a
    language that allowed type testing:

    45 is an int -> returns True
    "abc" is a float -> returns False

    Some languages, like HyperTalk (from memory) and related languages, make
    "is" a synonym for equals.

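In Python itself, the type test imagined above is spelled with the built-in `isinstance` rather than with `is`. A two-line sketch:

```python
checks = [isinstance(45, int), isinstance("abc", float)]
print(checks)
```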

    > However, for time efficiency reasons, there is no unique copy guarantee,
    > so one must use '==' instead of 'is', except in those few cases where
    > there is a unique copy guarantee, either by the language spec or by
    > one's own design, when one must use 'is' and not '=='. Here 'must'
    > means 'must to be generally assured of program correctness as intended'.
    >
    > We obviously agree on this guideline.



    Yes.



    --
    Steven
    Steven D'Aprano, Nov 1, 2009
    #12

  13. > ....
    > There are an infinite number of empty sets
    > that differ according to their construction:
    > ....
    > The set of all fire-breathing mammals.
    > ....


    Apparently, you have never been a witness
    to someone who recently ingested one of
    Cousin Chuy's Super-Burritos .......... :)


    --
    Stanley C. Kitching
    Human Being
    Phoenix, Arizona
    Cousin Stanley, Nov 1, 2009
    #13
  14. mk

    Mel Guest

    Steven D'Aprano wrote:

    > (6) Metaphoric equivalence:
    > Kali is death.
    > Life is like a box of chocolates.


    OK to here, but this one switches between metaphor and simile, and arguably,
    between identity and equality.

    Mel.
    Mel, Nov 1, 2009
    #14
  15. Steven D'Aprano wrote:
    > There are an infinite number of empty sets that differ according to their
    > construction:
    >
    > The set of all American Presidents called Boris Nogoodnik.
    > The set of all human languages with exactly one noun and one verb.
    > The set of all fire-breathing mammals.
    > The set of all real numbers equal to sqrt(-1).
    > The set of all even prime numbers other than 2.
    > The set of all integers between 0 and 1 exclusive.
    > The set of all integers between 1 and 2 exclusive.
    > The set of all positive integers between 2/5 and 4/5.
    > The set of all multiples of five between 26 and 29.
    > The set of all non-zero circles in Euclidean geometry where the radius
    > equals the circumference.
    > ...


    Logically, they're all the same, by extensionality. There is of course a
    difference between the reference of an expression and its meaning, but
    logical truth only depends on reference.

    In mathematical logic 'the thing, that ...' can be expressed with the
    iota operator (i...), defined like this:

    ((ia)phi e b) := (Ec)((c e b) & (Aa)((a = c) <-> phi)).

    with phi being a formula, E and A the existential and universal
    quantifiers, resp., e the membership relation, & the conjunction operator
    and <-> the bi-conditional operator.

    When we want to find out if two sets s1 and s2 are the same we only need to
    look at their extensions, so given:

    (i s1)(Ay)(y e s1 <-> y is a fire-breathing animal)
    (i s2)(Ay)(y e s2 <-> y is a real number equal to sqrt(-1))

    we only need to find out if:

    (Ax)(x is a fire-breathing animal <-> x is a real number equal to
    sqrt(-1)).

    And since neither kind of thing exists, it follows that s1 = s2.
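Python's own sets follow the same extensional rule for `==`, while `is` still tells the two objects apart. A small sketch using two of the constructions from the list quoted above (the search range for the first set is an arbitrary choice for the example):

```python
# Integers strictly between 0 and 1.
s1 = {n for n in range(-100, 100) if 0 < n < 1}

# Multiples of five between 26 and 29.
s2 = {n for n in range(26, 30) if n % 5 == 0}

print(s1 == s2)   # equal by extension: both have no members
print(s1 is s2)   # two separately created mutable objects
```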

    BTW, '=' is usually defined as:

    a = b := (AP)(Pa <-> Pb)

    AKA the Leibniz-Principle, but this definition is 2nd order logic. If we
    have sets at our disposal when we're axiomatizing mathematics, we can
    also define it 1st-orderly:

    a = b := (Ac)((a e c) <-> (b e c))

    Regards,
    Mick.
    Mick Krippendorf, Nov 1, 2009
    #15
  16. On Sun, 01 Nov 2009 21:32:15 +0100, Mick Krippendorf wrote:

    > When we want to find out if two sets s1 and s2 are the same we only need to
    > look at their extensions, so given:
    >
    > (i s1)(Ay)(y e s1 <-> y is a fire-breathing animal)
    > (i s2)(Ay)(y e s2 <-> y is a real number equal to sqrt(-1))
    >
    > we only need to find out if:
    >
    > (Ax)(x is a fire-breathing animal <-> x is a real number equal to
    > sqrt(-1)).
    >
    > And since neither kind of thing exists, it follows that s1 = s2.


    That assumes that all({}) is defined as true. That is a common definition
    (Python uses it), it is what classical logic uses, and it often leads to
    the "obvious" behaviour you want, but there is no a priori reason to
    accept that all({}) is true, and indeed it leads to some difficulties:

    All invisible men are alive.
    All invisible men are dead.

    are both true. Consequently, not all logic systems accept vacuous truths.

    http://en.wikipedia.org/wiki/Vacuous_truth
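The Python behaviour mentioned above is easy to check. In this sketch the list `invisible_men` and the `'alive'` key are invented for the example; the list is empty, so neither generator ever evaluates its body.

```python
invisible_men = []  # there are none

all_alive = all(man['alive'] for man in invisible_men)
all_dead = all(not man['alive'] for man in invisible_men)
any_alive = any(man['alive'] for man in invisible_men)

print(all_alive, all_dead, any_alive)
```

Both universal claims come out true while the existential one comes out false, which is the vacuous-truth convention in action.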



    --
    Steven
    Steven D'Aprano, Nov 1, 2009
    #16
  17. Steven D'Aprano wrote:
    > On Sun, 01 Nov 2009 21:32:15 +0100, Mick Krippendorf wrote:
    >>
    >> (Ax)(x is a fire-breathing animal <-> x is a real number equal to
    >> sqrt(-1)).
    >>
    >> And since neither kind of thing exists, it follows that s1 = s2.

    >
    > That assumes that all({}) is defined as true. That is a common definition
    > (Python uses it), it is what classical logic uses, and it often leads to
    > the "obvious" behaviour you want, but there is no a priori reason to
    > accept that all({}) is true, and indeed it leads to some difficulties:
    >
    > All invisible men are alive.
    > All invisible men are dead.
    >
    > are both true. Consequently, not all logic systems accept vacuous truths.
    >
    > http://en.wikipedia.org/wiki/Vacuous_truth


    You're right, of course, but I'm an old-fashioned Quinean guy :) Also,
    in relevance logic and similar systems my beloved proof that there are
    no facts (Davidson's Slingshot) goes down the drain. So I think I'll
    stay with classical logic FTTB.

    Regards,
    Mick.
    Mick Krippendorf, Nov 2, 2009
    #17
  18. mk

    Aahz Guest

    In article <>,
    Falcolas <> wrote:
    >
    >I'd also recommend trying the following filter, since it is identical
    >to what you're trying to do, and will probably catch some additional
    >edge cases without any additional effort from you.
    >
    >[s.strip() for s in hosts if s.strip()]


    This breaks if s might be None
    --
    Aahz () <*> http://www.pythoncraft.com/

    [on old computer technologies and programmers] "Fancy tail fins on a
    brand new '59 Cadillac didn't mean throwing out a whole generation of
    mechanics who started with model As." --Andrew Dalke
    Aahz, Nov 2, 2009
    #18
  19. On Mon, Nov 2, 2009 at 10:11 PM, Aahz <> wrote:
    > In article <..com>,
    > Falcolas  <> wrote:
    >>
    >>I'd also recommend trying the following filter, since it is identical
    >>to what you're trying to do, and will probably catch some additional
    >>edge cases without any additional effort from you.
    >>
    >>[s.strip() for s in hosts if s.strip()]

    >
    > This breaks if s might be None


    If you don't want Nones in your list just make a check for it...
    [s.strip() for s in hosts if s is not None and s.strip()]
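A sketch showing both the failure and the guarded version, with a None entry added to the sample list for illustration:

```python
hosts = ['9.156.44.227\n', None, '9.156.46.34 \n', '\n']

# Without the guard, None has no strip() method and the comprehension fails.
error = None
try:
    [s.strip() for s in hosts if s.strip()]
except AttributeError as exc:
    error = exc

# With the guard, "and" short-circuits before strip() is called on None.
cleaned = [s.strip() for s in hosts if s is not None and s.strip()]
print(cleaned)
```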
    Krister Svanlund, Nov 2, 2009
    #19
  20. mk

    Paul Rudin Guest

    Falcolas <> writes:

    > [s.strip() for s in hosts if s.strip()]


    There's something in me that rebels against seeing the same call
    twice. I'd probably write:

    filter(None, (s.strip() for s in hosts))
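On Python 3.8 or later, an assignment expression is another way to strip each entry once while staying inside a comprehension. This is a sketch of that alternative, not something proposed in the thread; `once` and `via_filter` are illustrative names, and the None guard is included to cover the case raised earlier.

```python
hosts = ['9.156.44.227\n', None, '9.156.46.34 \n', '\n']

# t is bound to the stripped value in the condition and reused as the result.
once = [t for s in hosts if s is not None and (t := s.strip())]

# The filter version, wrapped in list() because filter is lazy in Python 3.
via_filter = list(filter(None, (s.strip() for s in hosts if s is not None)))

print(once)
print(via_filter)
```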
    Paul Rudin, Nov 3, 2009
    #20