None in string => TypeError?

Roy Smith

We noticed recently that:

raises (at least in Python 2.7)

TypeError: 'in <string>' requires string as left operand, not NoneType

This is surprising. The description of the 'in' operator is, 'True if an item of s is equal to x, else False'. From that, I would assume it behaves as if it were written:

for item in iterable:
    if item == x:
        return True
else:
    return False

Why the extra type check for str.__contains__()? That seems very unpythonic. Duck typing, and all that.
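
For reference, the behaviour is easy to reproduce. The sketch below is not from the original post: the helper name `try_in` and the sample values are made up for illustration, and it assumes Python 3, where a non-string left operand is still rejected. It contrasts a list, whose membership test falls back to equality, with a str.

```python
def try_in(item, container):
    """Return the result of `item in container`, or a description of the TypeError raised."""
    try:
        return item in container
    except TypeError as exc:
        return "TypeError: {}".format(exc)

# List membership compares items for equality, so None is simply "not found".
print(try_in(None, ['C', 'S', 'R', 'P']))

# str membership is a substring search and refuses a non-string left operand.
print(try_in(None, 'CSRP'))
```

The first call prints False; the second prints the TypeError message rather than a boolean.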
 
Ian Kelly

I guess for the same reason that you get a TypeError if you test
whether the number 4 is in a string: it can't ever be, so it's a
nonsensical comparison. It could return False, but the comparison is
more likely to be symptomatic of a bug in the code than intentional,
so it makes some noise instead.
 
Paul Sokolovsky

Hello,

This is very Pythonic; Python is a strictly typed language. There's no
way None could possibly be "inside" a string, so if you're trying to
look for it there, you're doing something wrong, and you're told so.

Also, it's not an "extra check"; if anything, it means fewer checks. The
"in" operator checks the types of its arguments for sanity once
at the start, and then just looks for a substring within the string. You
suggest that it should check each element's type in a loop, which is a
great waste since, once again, nothing but a string can be inside another
string.
 
MRAB

When working with strings, it's not entirely the same. For example:
True

If you iterated over the string, it would return False.
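
To make that difference concrete, here is a small sketch. The function `naive_contains` and the sample string are illustrative assumptions, not code from the thread. It implements the item-by-item loop from the original question and compares it with the real operator on a multi-character substring.

```python
def naive_contains(iterable, x):
    """Membership as an item-by-item equality scan, as the original question assumed."""
    for item in iterable:
        if item == x:
            return True
    return False

text = "spam"

# The real operator does a substring search on strings.
print("pa" in text)

# Iterating a string yields single characters, so no item ever equals "pa".
print(naive_contains(text, "pa"))
```

The first line prints True and the second prints False, which is why str cannot simply reuse the generic loop.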
 
Steven D'Aprano

We noticed recently that:


raises (at least in Python 2.7)

That goes back to at least Python 1.5, when member tests only accepted a
single character, not a substring:

Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: string member test needs char left operand


It's a matter of taste whether predicate functions should always return a
bool, or sometimes raise an exception. Would you be surprised that this
raises TypeError?

"my string".startswith(None)


A predicate function could swallow any exception, e.g. be the logical
equivalent of:

try:
    return True if the condition holds, else return False
except:
    return False # or True as needed


but that is, I think, an anti-pattern, as it tends to hide errors rather
than be useful. Most of the time, doing `[] in "xyz"` is an error, so
returning False is not a useful thing to do.
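
A rough illustration of why swallowing the error hides bugs follows. The helper `lenient_in` and the scenario (an integer code point passed where a character was meant) are hypothetical, chosen only to show the effect.

```python
def lenient_in(needle, haystack):
    """Anti-pattern sketch: report False instead of raising on a bad operand."""
    try:
        return needle in haystack
    except TypeError:
        return False

code_point = 61  # hypothetical bug: the caller meant chr(61), i.e. "="

# The lenient predicate quietly answers False, so the bug goes unnoticed.
print(lenient_in(code_point, "a=b"))

# The built-in operator raises, pointing straight at the mistake.
try:
    code_point in "a=b"
except TypeError as exc:
    print("TypeError:", exc)

# What the caller actually wanted.
print(chr(code_point) in "a=b")
```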

I think that Python has been moving away from the "swallow exceptions"
model in favour of letting errors propagate. E.g. hasattr used to swallow
a lot more exceptions than it does now, and order comparisons (less than,
greater than etc.) of dissimilar types used to return a version-dependent
arbitrary but consistent result (e.g. all ints compared less than all
strings), but in Python 3 that is now an error.
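
The ordering change is easy to check. This is a minimal sketch assuming Python 3; the helper name `try_less_than` is made up for the example.

```python
def try_less_than(a, b):
    """Return a < b, or the name of the exception raised by the comparison."""
    try:
        return a < b
    except TypeError as exc:
        return type(exc).__name__

print(try_less_than(1, 2))    # same types: an ordinary comparison
print(try_less_than(1, "2"))  # int vs str: Python 3 refuses to order them
```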
 
Steven D'Aprano

This is very Pythonic; Python is a strictly typed language. There's no way
None could possibly be "inside" a string,

Then `None in some_string` could immediately return False, instead of
raising an exception.

so if you're trying to look
for it there, you're doing something wrong, and told so.

This, I think, is the important factor. `x in somestring` is almost
always an error if x is not a string. If you want to accept None as well:

x is not None and x in somestring

does the job nicely.
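
Wrapped in a function, that guard might look like the sketch below. The name `safe_in` is an assumption for illustration. Note that it only special-cases None; other non-string operands still raise, which matches the point that they are almost always errors.

```python
def safe_in(x, somestring):
    """Treat None as 'not present'; any other non-string operand still raises TypeError."""
    return x is not None and x in somestring

print(safe_in(None, "CSRP"))  # short-circuits before the membership test
print(safe_in("C", "CSRP"))   # ordinary substring test
```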
 
Chris Angelico

Then `None in some_string` could immediately return False, instead of
raising an exception.

Note, by the way, that CPython does have some optimizations that
immediately return False. If you ask whether a 16-bit string is in an 8-bit
string, e.g. "\u1234" in "asdf", it knows instantly that it cannot
possibly be, and it just returns False. The "None in string" check is
different, and deliberately so.

I do prefer the thrown error. Some things make absolutely no sense,
and even if it's technically valid to say "No, the integer 61 is not
in the string 'asdf'", raising is likely to be more helpful to someone
who thinks that characters and integers are equivalent. You'll get an
exception immediately, instead of trying to figure out why it's
returning False.

ChrisA
 
Roy Smith

This is very Pythonic; Python is a strictly typed language. There's no
way None could possibly be "inside" a string, so if you're trying to
look for it there, you're doing something wrong, and told so.

Well, the code we've got is:

hourly_data = [(t if status in 'CSRP' else None) for (t, status) in hours]

where status can be None. I don't think I'm doing anything wrong. I wrote exactly what I mean :) We've changed it to:

hourly_data = [(t if (status and status in 'CSRP') else None) for (t, status) in hours]

but that's pretty ugly. In retrospect, I suspect:

hourly_data = [(t if status in set('CSRP') else None) for (t, status) in hours]

is a little cleaner.
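
Here is a side-by-side sketch of the three versions. The `hours` sample data and the function names are invented for illustration; only the comprehensions themselves come from the post.

```python
# Hypothetical sample: (hour, status) pairs, where status may be None.
hours = [(9, 'C'), (10, None), (11, 'X'), (12, 'P')]

def original(hours):
    # Raises TypeError as soon as a None status is reached.
    return [(t if status in 'CSRP' else None) for (t, status) in hours]

def guarded(hours):
    # The truthiness test short-circuits before the substring check.
    return [(t if (status and status in 'CSRP') else None) for (t, status) in hours]

def with_set(hours):
    # Set membership uses hashing and equality, so None is simply absent.
    return [(t if status in set('CSRP') else None) for (t, status) in hours]

print(guarded(hours))   # [9, None, None, 12]
print(with_set(hours))  # [9, None, None, 12]
```

On this sample the guarded and set-based versions agree, while `original(hours)` raises the TypeError discussed above.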
 
Chris Angelico

In retrospect, I suspect:

hourly_data = [(t if status in set('CSRP') else None) for (t, status) in hours]

is a little cleaner.

I'd go with this. It's clearer that a status of 'SR' should result in
False, not True. (Presumably that can never happen, but it's easier to
read.) I'd be inclined to use set literal syntax, even though it's a
bit longer - again to make it clear that these are four separate
strings that you're checking against.

Alternatively, you could go "if (status or '0') in 'CSRP'", which would
work, but be quite cryptic. (It would also mean that '' is not deemed
to be in the string, same as the set() transformation does.)
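
The edge cases can be tabulated with a short sketch. The function names and the list of sample statuses are assumptions made for illustration. One caveat worth spelling out: `in` binds more tightly than `or`, so the alternative only behaves as described when `status or '0'` is parenthesized.

```python
def in_set(status):
    """Set membership: only the four single-character statuses match."""
    return status in set('CSRP')

def or_zero(status):
    """Substitute '0' for None or '' before the substring test."""
    return (status or '0') in 'CSRP'

def unparenthesized(status):
    """Parses as status or ('0' in 'CSRP'): truthy for any non-empty status."""
    return bool(status or '0' in 'CSRP')

for status in (None, '', 'C', 'SR', 'X'):
    print(repr(status), in_set(status), or_zero(status), unparenthesized(status))
```

The set version rejects 'SR', the parenthesized substring version accepts it, and the unparenthesized form accepts even 'X'.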

ChrisA
 
Ian Kelly

Depending on how much work this has to do, I might also consider
moving the set construction outside the list comprehension since it
doesn't need to be repeated on every iteration.
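
A minimal sketch of that hoisting, assuming a module-level constant is acceptable; the names `VALID_STATUSES` and `hourly` and the sample data are hypothetical.

```python
# Built once at import time instead of once per element of the comprehension.
VALID_STATUSES = frozenset('CSRP')

def hourly(hours):
    """Keep t where the status is one of the valid single-character codes."""
    return [(t if status in VALID_STATUSES else None) for (t, status) in hours]

print(hourly([(9, 'C'), (10, None), (11, 'SR')]))  # [9, None, None]
```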
 
Chris Angelico

Set literal notation will accomplish that, too, for what it's worth.

hourly_data = [(t if status in {'C','S','R','P'} else None) for (t, status) in hours]

  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x012BE660, file "<pyshell#10>", line 2>)
              3 LOAD_CONST               2 ('x.<locals>.<listcomp>')
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              0 (hours)
             12 GET_ITER
             13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             16 STORE_FAST               0 (hourly_data)
             19 LOAD_CONST               0 (None)
             22 RETURN_VALUE

dis.dis(x.__code__.co_consts[1])

  2           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
              9 UNPACK_SEQUENCE          2
             12 STORE_FAST               1 (t)
             15 STORE_FAST               2 (status)
             18 LOAD_FAST                2 (status)
             21 LOAD_CONST               5 (frozenset({'R', 'S', 'C', 'P'}))
             24 COMPARE_OP               6 (in)
             27 POP_JUMP_IF_FALSE       36
             30 LOAD_FAST                1 (t)
             33 JUMP_FORWARD             3 (to 39)
             36 LOAD_CONST               4 (None)
             39 LIST_APPEND              2
             42 JUMP_ABSOLUTE            6
             45 RETURN_VALUE

isinstance(x.__code__.co_consts[1].co_consts[5],set)
False

Interestingly, the literal appears to be a frozenset rather than a
regular set. The compiler must have figured out that it can never be
changed, and optimized.
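
For anyone who wants to repeat the experiment, here is a hedged sketch. Bytecode layout and constant indices vary between CPython versions (and newer versions may inline the comprehension), so instead of indexing `co_consts` directly it walks the code object and any nested code objects. The helper `frozenset_constants` is an assumption for illustration, and `hours` is passed as a parameter here rather than read from a global.

```python
import dis

def x(hours):
    hourly_data = [(t if status in {'C', 'S', 'R', 'P'} else None) for (t, status) in hours]
    return hourly_data

def frozenset_constants(code):
    """Collect frozenset constants from a code object and its nested code objects."""
    found = [c for c in code.co_consts if isinstance(c, frozenset)]
    for c in code.co_consts:
        if hasattr(c, 'co_consts'):  # nested code object, e.g. a <listcomp>
            found.extend(frozenset_constants(c))
    return found

# On CPython builds that apply this optimization, the set literal shows up here.
print(frozenset_constants(x.__code__))

# Full disassembly, for comparison with the listing above.
dis.dis(x)
```

Whether the list is non-empty depends on the interpreter; other implementations are free to build the set at run time.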

Also, this is the first time I've seen None as a constant other than
the first. Usually co_consts[0] is None, but this time co_consts[4] is
None.

ChrisA
 
