list comprehension problem

mk · Oct 29, 2009

Hello everyone,

print hosts
hosts = [ s.strip() for s in hosts if s is not '' and s is not None and
s is not '\n' ]
print hosts

['9.156.44.227\n', '9.156.46.34 \n', '\n']
['9.156.44.227', '9.156.46.34', '']

Why does the hosts list still contain '' in the last position after the
list comprehension?

I checked that:

print hosts
hosts = [ s.strip() for s in hosts if s != '' and s != None and s != '\n' ]
print hosts

...works as expected:

['9.156.44.227\n', '9.156.46.34 \n', '\n']
['9.156.44.227', '9.156.46.34']

Are there two '\n' strings in the interpreter's memory or something, so
that the identity check "s is not '\n'" does not work as expected?

This is weird. I expected that at all times there is only one '\n'
string in Python's cache or whatever that all labels meant by the
programmer as '\n' string actually point to. Is that a wrong assumption?

Regards,
mk

Diez B. Roggisch · Oct 29, 2009

mk said:
This is weird. I expected that at all times there is only one '\n'
string in Python's cache or whatever that all labels meant by the
programmer as '\n' string actually point to. Is that a wrong assumption?

Yes. Never use "is" unless you know 100% that you are talking about the same
object, not just equality.

Diez
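
A minimal sketch of why the identity test lets the '\n' entry through, written in Python 3 syntax (the thread itself uses Python 2 print statements). The names literal, built and cleaned are illustrative only. Two strings with the same characters always compare equal with ==, but whether they are the same object is left to the implementation, so "is" cannot be relied on to filter by value.

```python
# Two strings with the same value: one written as a literal, one
# assembled at run time (as strings read from a file or socket are).
literal = "9.156.44.227"
built = "".join(["9.156.", "44.227"])

same_value = (literal == built)    # compares characters
same_object = (literal is built)   # compares identity: implementation-dependent

# The filter from the question, rewritten to compare values.
# "is not None" stays, because None is a guaranteed singleton.
hosts = ['9.156.44.227\n', '9.156.46.34 \n', '\n']
cleaned = [s.strip() for s in hosts
           if s is not None and s != '' and s != '\n']

print(same_value, same_object)
print(cleaned)
```

In the original list the '\n' entry passed all three "is not" tests, and strip() then turned it into the '' seen in the output.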

Falcolas · Oct 29, 2009

I'd also recommend trying the following filter, since it is identical
to what you're trying to do, and will probably catch some additional
edge cases without any additional effort from you.

[s.strip() for s in hosts if s.strip()]

This will check the results of s.strip(), and since empty strings are
considered false, they will not make it into your results.

Garrick
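
A short sketch of the "additional edge cases" this filter picks up, assuming Python 3 and a few made-up whitespace-only entries added to the sample data. Anything that strips down to the empty string is dropped, not just '' and '\n'.

```python
# Sample data from the question plus hypothetical whitespace-only entries.
hosts = ['9.156.44.227\n', '9.156.46.34 \n', '\n', '   \n', '\t\n', '']

# An empty string is falsy, so any entry that strips to '' is skipped.
cleaned = [s.strip() for s in hosts if s.strip()]

print(cleaned)
```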

MRAB · Oct 29, 2009

Diez said:
Yes. Never use "is" unless you know 100% that you are talking about the same
object, not just equality.

Some objects are singletons, ie there's only ever one of them. The most
common singleton is None. In virtually every other case you should be
using "==" and "!=".
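
As an illustration of that rule of thumb, here is a small sketch with a hypothetical helper named describe: the check against None uses "is", while the check against a string value uses "==".

```python
def describe(host=None):
    """Classify a host entry (hypothetical helper, for illustration only)."""
    if host is None:            # None is a singleton: identity is the right test
        return "missing"
    if host.strip() == "":      # string values: compare with ==, never with "is"
        return "blank"
    return "host " + host.strip()

print(describe())
print(describe("\n"))
print(describe("9.156.44.227\n"))
```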

Bruno Desthuilliers · Oct 29, 2009

Falcolas wrote:
(snip)
I'd also recommend trying the following filter, since it is identical
to what you're trying to do, and will probably catch some additional
edge cases without any additional effort from you.

[s.strip() for s in hosts if s.strip()]

The problem with this expression is that it calls str.strip two times...
Sometimes, a more lispish approach is better:

whatever = filter(None, map(str.strip, hosts))

or just a plain procedural loop:

whatever = []
for s in hosts:
    s = s.strip()
    if s:
        whatever.append(s)

As far as I'm concerned, I have a clear preference for the first
version, but well, YMMV...
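
A sketch of the same two ideas under Python 3, where filter() and map() return lazy iterators rather than lists, so the result has to be wrapped in list() if a list is wanted. The names via_filter and via_nested are illustrative. The second form keeps the comprehension style while still calling strip() only once per entry.

```python
hosts = ['9.156.44.227\n', '9.156.46.34 \n', '\n']

# filter(None, ...) keeps only truthy items; list() materialises the iterator.
via_filter = list(filter(None, map(str.strip, hosts)))

# Strip once in an inner generator, then test the stripped value.
via_nested = [s for s in (h.strip() for h in hosts) if s]

print(via_filter)
print(via_nested)
```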

Bruno Desthuilliers · Oct 29, 2009

mk wrote:

Hello everyone,

print hosts

['9.156.44.227\n', '9.156.46.34 \n', '\n']

Just for the record, where did you get this "hosts" list from? (Hint:
depending on the answer, there might be a way to avoid having to filter
the list at all.)

Nick Stinemates · Oct 30, 2009

Some objects are singletons, ie there's only ever one of them. The most
common singleton is None. In virtually every other case you should be
using "==" and "!=".

Please correct me if I am wrong, but I believe you meant to say some
objects are immutable, in which case you would be correct.

alex23 · Oct 30, 2009

Please correct me if I am wrong, but I believe you meant to say some
objects are immutable, in which case you would be correct.

You're completely wrong. Immutability has nothing to do with identity,
which is what 'is' is testing for:
True

MRAB was referring to the singleton pattern[1], of which None is the
predominant example in Python. None is _always_ None, as it's always
the same object.

1: http://en.wikipedia.org/wiki/Singleton_pattern

Terry Reedy · Oct 30, 2009

alex23 said:
You're completely wrong. Immutability has nothing to do with identity,
which is what 'is' is testing for:

What immutability has to do with identity is that 'two' immutable
objects with the same value *may* actually be the same object,
*depending on the particular version of a particular implementation*.

Whether or not this is 'another' object or the same object is irrelevant
for all purposes except identity checking. It is completely up to the
interpreter.

False

In this case, but it could have been True.

True

MRAB was referring to the singleton pattern[1], of which None is the
predominant example in Python. None is _always_ None, as it's always
the same object.

And in 3.x, the same is true of True and False.
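
A sketch of the distinction being argued over, using illustrative names. The Python language reference guarantees that two separately created empty lists are distinct objects; for immutable values such as tuples it only says they may or may not be the same object, which is why the last result is deliberately left unstated.

```python
# Mutable objects: each list display creates a new, distinct object.
c = []
d = []
lists_equal = (c == d)          # same value
lists_identical = (c is d)      # guaranteed to be different objects

# An alias is a second name for one object, so identity holds.
alias = c
alias_identical = (alias is c)

# Immutable objects: equal values, but identity is up to the implementation.
t1 = tuple([1, 2, 3])
t2 = tuple([1, 2, 3])
tuples_equal = (t1 == t2)
tuples_identical = (t1 is t2)   # may be True or False; do not rely on it

print(lists_equal, lists_identical, alias_identical)
print(tuples_equal, tuples_identical)
```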

alex23 · Oct 31, 2009

Terry Reedy said:
What immutability has to do with identity is that 'two' immutable
objects with the same value *may* actually be the same object,
*depending on the particular version of a particular implementation*.

See, I prefer a little more certainty in my code. Isn't this why we
continually caution people against relying on implementation details?

In this case, but it could have been True.

Yes, and if my aunt had a penis she'd be my uncle. But she doesn't. So
what's the point here? Under certain implementations, _some_ immutable
objects _may_ share identity, but you shouldn't rely on it? Are you
trying to advocate a use for this behaviour by highlighting it?

I'm honestly not getting your point here.

MRAB was referring to the singleton pattern[1], of which None is the
predominant example in Python. None is _always_ None, as it's always
the same object.

And in 3.x, the same is true of True and False.

None of which refutes or lessens anything I wrote. What my post was
_countering_ was the claim that immutables should use identity checks,
mutables should use equality checks. That the implementation caches
_some_ objects for performance reasons certainly doesn't make that
claim any less wrong.

Again, what was the point of this other than "things differ on the
implementation level"? Isn't it better to talk about the level of the
language that you can _expect_ to be consistent?

Terry Reedy · Oct 31, 2009

alex23 said:
....
> I'm honestly not getting your point here.

Let me try again, a bit differently.

I claim that the second statement, and therefore the first, can be seen
as wrong. I also claim that (Python) programmers need to understand why.

In mathematics, we generally have immutable values whose 'identity' is
their value. There is, for example, only one, immutable, empty set.

In informatics, and in particular in Python, in order to have
mutability, we have objects with value and an identity that is separate
from their value. There can be, for example, multiple mutable empty
sets. Identity is important because we must care about which empty set
we add things to. 'Identity' is only needed because of 'mutability', so
it is mistaken to say they have nothing to do with each other.

Ideally, from both a conceptual and space efficiency view, an
implementation would allow only one copy for each value of immutable
classes. This is what new programmers assume when they blithely use 'is'
instead of '==' (as would usually be correct in math).

However, for time efficiency reasons, there is no unique copy guarantee,
so one must use '==' instead of 'is', except in those few cases where
there is a unique copy guarantee, either by the language spec or by
one's own design, when one must use 'is' and not '=='. Here 'must' means
'must, in order to be generally assured of program correctness as intended'.

We obviously agree on this guideline.

Terry Jan Reedy
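
One common case of a unique-copy guarantee "by one's own design" is a private sentinel object. The sketch below is only an illustration; the names _MISSING and lookup are hypothetical. Because exactly one _MISSING object exists, "is" is the correct test, and it still works when None is itself a legitimate stored value.

```python
_MISSING = object()   # a unique object that no caller can supply by accident

def lookup(mapping, key, default=_MISSING):
    """Return mapping[key]; fall back to default, or raise if none was given."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:           # identity test against our own singleton
        if default is _MISSING:
            raise KeyError(key)
        return default
    return value

print(lookup({"a": None}, "a"))         # a stored None is returned as-is
print(lookup({}, "a", "fallback"))
```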

Steven D'Aprano · Nov 1, 2009

Let me try again, a bit differently.

I claim that the second statement, and therefore the first, can be seen
as wrong. I also claim that (Python) programmers need to understand why.

In mathematics, we generally have immutable values whose 'identity' is
their value. There is, for example, only one, immutable, empty set.

I think it's more than that -- I don't think pure mathematics makes any
distinction at all between identity and equality. There are no instances
at all, so you can't talk about individual values. It's not that the
empty set is a singleton, because the empty set isn't a concrete
object-with-existence at all. It's an abstraction, and as such, questions of
"how many separate empty sets are there?" are meaningless.

There are an infinite number of empty sets that differ according to their
construction:

The set of all American Presidents called Boris Nogoodnik.
The set of all human languages with exactly one noun and one verb.
The set of all fire-breathing mammals.
The set of all real numbers equal to sqrt(-1).
The set of all even prime numbers other than 2.
The set of all integers between 0 and 1 exclusive.
The set of all integers between 1 and 2 exclusive.
The set of all positive integers between 2/5 and 4/5.
The set of all multiples of five between 26 and 29.
The set of all non-zero circles in Euclidean geometry where the radius
equals the circumference.
....

I certainly wouldn't say all fire-breathing mammals are integers between
0 and 1, so those sets are "different", and yet clearly they're also "the
same" in some sense. I think this demonstrates that the question of how
many different empty sets there are is meaningless -- it depends on what
you mean by different and how many.

In informatics, and in particular in Python, in order to have
mutability, we have objects with value and an identity that is separate
from their value.

I think you have this backwards. We have value and identity because of
the hardware we use -- we store values in memory locations, which gives
identity. Our universe imposes the distinction between value and
identity. To simulate immutability and singletons is hard, and needs to
be worked at.

Nevertheless, it would be possible to go the other way. Given
hypothetical hardware which only supported mutable singletons, we could
simulate multiple instances. It would be horribly inefficient, but it
could be done. Imagine a singleton-mutable-set implementation, something
like this:

class set:
    # Pseudocode: every set shares one singleton; id tells them apart.
    def __init__(id):
        return singleton
    def add(id, obj):
        singleton.elements.append((id, obj))
    def __contains__(id, element):
        return (id, element) in singleton.elements

and so forth.

You might notice that this is not terribly different from how one might
define non-singleton sets. The difference being, Python sets have
identity implied by storage in distinct memory locations, while this
hypothetical singleton-set has to explicitly code for identity.
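
A runnable approximation of that thought experiment, under the assumption that a single module-level list plays the role of the one mutable singleton. The names _elements, set_add and set_contains are hypothetical. Identity is carried explicitly as the set_id argument instead of being implied by where an object lives in memory.

```python
_elements = []   # the one shared mutable store for every simulated set

def set_add(set_id, obj):
    """Add obj to the simulated set named set_id."""
    pair = (set_id, obj)
    if pair not in _elements:       # keep set semantics: no duplicates
        _elements.append(pair)

def set_contains(set_id, obj):
    """Report whether obj belongs to the simulated set named set_id."""
    return (set_id, obj) in _elements

set_add("a", 1)
set_add("b", 2)
print(set_contains("a", 1), set_contains("a", 2), set_contains("b", 2))
```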

There can be, for example, multiple mutable empty
sets. Identity is important because we must care about which empty set
we add things to. 'Identity' is only needed because of 'mutability', so
it is mistaken to say they have nothing to do with each other.

True, but it is not a mistake to say that identity and mutability are
independent: there are immutable singletons, and mutable singletons, and
immutable non-singletons, and mutable non-singletons. Clearly, knowing
that an object is mutable doesn't tell you whether it is a singleton or
not, and knowing it is a singleton doesn't tell you whether it is
immutable or not.

E.g. under normal circumstances modules are singletons, but they are
mutable; frozensets are immutable, but they are not singletons.
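
Both halves of that example can be checked with a short sketch, assuming a normal import system where modules are cached in sys.modules. The attribute demo_flag is made up for the demonstration and removed again afterwards.

```python
import importlib
import math

# Importing the same module again returns the cached module object.
again = importlib.import_module("math")
same_module = (again is math)

# Modules are mutable: an attribute set through one name shows up via the other.
math.demo_flag = True
module_was_mutated = again.demo_flag
del math.demo_flag

# frozensets are immutable, yet equal frozensets are separate values to build.
a = frozenset([1, 2])
b = frozenset([2, 1])
frozensets_equal = (a == b)
try:
    a.add(3)                        # frozenset has no mutating methods
    frozenset_is_mutable = True
except AttributeError:
    frozenset_is_mutable = False

print(same_module, module_was_mutated, frozensets_equal, frozenset_is_mutable)
```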

Ideally, from both a conceptual and space efficiency view, an
implementation would allow only one copy for each value of immutable
classes.

Ideally, from a complexity of implementation view, an implementation
would allow an unlimited number of copies of each value of immutable
classes.

This is what new programmers assume when they blithely use 'is'
instead of '==' (as would usually be correct in math).

Nah, I think you're crediting them with far more sophistication than they
actually have. I think most people in general, including many new
programmers, simply don't have a good grasp of the conceptual difference
between equality and identity. In plain language, "is" and its
grammatical forms "be", "are", "am" etc. have many meanings:

(1) Set membership testing:
Socrates is a man.
This is a hammer.

(2) Existence:
There is a computer language called Python.
There is a monster under the bed.

(3) Identity:
Iron Man is Tony Stark.
The butler is the murderer.

(4) Mathematical equality:
If x is 5, and y is 11, then y is 2x+1.

(5) Equivalence:
The winner of this race is the champion.
The diameter of a circle is twice the radius.

(6) Metaphoric equivalence:
Kali is death.
Life is like a box of chocolates.

(7) Testing of state:
My ankle is sore.
Fred is left-handed.

(8) Consequence:
If George won the lottery, he would say he is happy.

(9) Cost:
A cup of coffee is $3.

Only two of these usages work at all in any language I know of: equality
and identity testing, although it would be interesting to consider a
language that allowed type testing:

45 is an int -> returns True
"abc" is a float -> returns False

Some languages, like HyperTalk (from memory) and related languages, make
"is" a synonym for equals.

However, for time efficiency reasons, there is no unique copy guarantee,
so one must use '==' instead of 'is', except in those few cases where
there is a unique copy guarantee, either by the language spec or by
one's own design, when one must use 'is' and not '=='. Here 'must' means
'must, in order to be generally assured of program correctness as intended'.

We obviously agree on this guideline.

Yes.

Cousin Stanley · Nov 1, 2009

....
There are an infinite number of empty sets
that differ according to their construction:
....
The set of all fire-breathing mammals.
....

Apparently, you have never been a witness
to someone who recently ingested one of
Cousin Chuy's Super-Burritos ..........

Mel · Nov 1, 2009

Steven said:
(6) Metaphoric equivalence:
Kali is death.
Life is like a box of chocolates.

OK to here, but this one switches between metaphor and simile, and arguably,
between identity and equality.

Mel.

Mick Krippendorf · Nov 1, 2009

Steven said:
There are an infinite number of empty sets that differ according to their
construction:

The set of all American Presidents called Boris Nogoodnik.
The set of all human languages with exactly one noun and one verb.
The set of all fire-breathing mammals.
The set of all real numbers equal to sqrt(-1).
The set of all even prime numbers other than 2.
The set of all integers between 0 and 1 exclusive.
The set of all integers between 1 and 2 exclusive.
The set of all positive integers between 2/5 and 4/5.
The set of all multiples of five between 26 and 29.
The set of all non-zero circles in Euclidean geometry where the radius
equals the circumference.
...

Logically, they're all the same, by extensionality. There is of course a
difference between the reference of an expression and its meaning, but
logical truth only depends on reference.

In mathematical logic 'the thing that ...' can be expressed with the
iota operator (i...), defined like this:

((ia)phi e b) := (Ec)((c e b) & (Aa)((a = c) <-> phi)).

with phi being a formula, E and A the existential and universal
quantifiers, respectively, e the membership relation, & the conjunction
operator and <-> the biconditional operator.

When we want to find out if two sets s1 and s2 are the same we only need to
look at their extensions, so given:

(i s1)(Ay)(y e s1 <-> y is a fire-breathing animal)
(i s2)(Ay)(y e s2 <-> y is a real number equal to sqrt(-1))

we only need to find out if:

(Ax)(x is a fire-breathing animal <-> x is a real number equal to
sqrt(-1)).

And since neither kind of thing exists, it follows that s1 = s2.
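
The extensional view maps directly onto Python's "==" for sets. The sketch below builds two of the empty sets from the list above in different ways; restricting the integers to a small finite range is an assumption made so the code can run. The sets compare equal because they have the same members (none), even though they are two separate objects.

```python
# "The set of all multiples of five between 26 and 29."
multiples_of_five = {n for n in range(26, 30) if n % 5 == 0}

# "The set of all integers between 0 and 1 exclusive" (searched over a
# finite range only, as an assumption for this sketch).
between_zero_and_one = {n for n in range(-10, 11) if 0 < n < 1}

equal_by_extension = (multiples_of_five == between_zero_and_one)
same_object = (multiples_of_five is between_zero_and_one)

print(equal_by_extension, same_object)
```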

BTW, '=' is usually defined as:

a = b := (AP)(Pa <-> Pb)

AKA the Leibniz Principle, but this definition is 2nd order logic. If we
have sets at our disposal when we're axiomatizing mathematics, we can
also define it 1st-orderly:

a = b := (Ac)((a e c) <-> (b e c))

Regards,
Mick.

Steven D'Aprano · Nov 1, 2009

When we want to find out if two sets s1 and s2 are the same we only need to
look at their extensions, so given:

(i s1)(Ay)(y e s1 <-> y is a fire-breathing animal)
(i s2)(Ay)(y e s2 <-> y is a real number equal to sqrt(-1))

we only need to find out if:

(Ax)(x is a fire-breathing animal <-> x is a real number equal to
sqrt(-1)).

And since neither kind of thing exists, it follows that s1 = s2.

That assumes that all({}) is defined as true. That is a common definition
(Python uses it), it is what classical logic uses, and it often leads to
the "obvious" behaviour you want, but there is no a priori reason to
accept that all({}) is true, and indeed it leads to some difficulties:

All invisible men are alive.
All invisible men are dead.

are both true. Consequently, not all logic systems accept vacuous truths.

http://en.wikipedia.org/wiki/Vacuous_truth
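
Python's behaviour here is easy to check. In this sketch the empty list invisible_men is a stand-in for a collection with no members; all() over it is True whatever the condition is, so both contradictory-sounding claims hold, while any() over it is False.

```python
invisible_men = []   # no such men exist

# Vacuous truth: a universal claim over an empty collection is True.
all_alive = all(man == "alive" for man in invisible_men)
all_dead = all(man == "dead" for man in invisible_men)

# The existential counterpart is False for an empty collection.
any_alive = any(man == "alive" for man in invisible_men)

print(all_alive, all_dead, any_alive)
print(all({}))
```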

Mick Krippendorf · Nov 2, 2009

Steven said:
That assumes that all({}) is defined as true. That is a common definition
(Python uses it), it is what classical logic uses, and it often leads to
the "obvious" behaviour you want, but there is no a priori reason to
accept that all({}) is true, and indeed it leads to some difficulties:

All invisible men are alive.
All invisible men are dead.

are both true. Consequently, not all logic systems accept vacuous truths.

http://en.wikipedia.org/wiki/Vacuous_truth

You're right, of course, but I'm an old-fashioned Quinean guy.

Also, in relevance logic and similar systems my beloved proof that there
are no facts (Davidson's Slingshot) goes down the drain. So I think I'll
stay with classical logic for the time being.

Regards,
Mick.

Aahz · Nov 2, 2009

I'd also recommend trying the following filter, since it is identical
to what you're trying to do, and will probably catch some additional
edge cases without any additional effort from you.

[s.strip() for s in hosts if s.strip()]

This breaks if s might be None

Krister Svanlund · Nov 2, 2009

I'd also recommend trying the following filter, since it is identical
to what you're trying to do, and will probably catch some additional
edge cases without any additional effort from you.

[s.strip() for s in hosts if s.strip()]

This breaks if s might be None

If you don't want Nones in your list just make a check for it...
[s.strip() for s in hosts if s is not None and s.strip()]
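
A possible variant that handles None and also strips each entry only once, shown as a sketch with a made-up None added to the sample data. The inner generator skips None before strip() is called, and the outer comprehension drops whatever stripped down to the empty string.

```python
hosts = ['9.156.44.227\n', None, '9.156.46.34 \n', '\n', '']

# Inner generator: skip None (identity test), strip everything else once.
# Outer comprehension: keep only non-empty results.
cleaned = [t for t in (s.strip() for s in hosts if s is not None) if t]

print(cleaned)
```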

Paul Rudin · Nov 3, 2009

Falcolas said:
[s.strip() for s in hosts if s.strip()]

There's something in me that rebels against seeing the same call
twice. I'd probably write:

filter(None, (s.strip() for s in hosts))
