Is there any way to say ignore case with "in"?

tinnews · Apr 4, 2008

Is there any way in python to say

if string1 in string2:
    <do something>

ignoring the case of string1 and string2?

I know I could use:-

if lower(string1) in lower(string2):
    <do something>

but it somehow feels there ought to be an easier (tidier?) way.

Fredrik Lundh · Apr 4, 2008

Is there any way in python to say

if string1 in string2:
    <do something>

ignoring the case of string1 and string2?

if string1.lower() in string2.lower():
    ...

(there's no case-insensitive version of the "in" operator in stock Python)

</F>
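
For readers who want the idiom wrapped up, here is a minimal sketch in Python 3. The helper names `contains_ignore_case` and `contains_ignore_case_re` are made up for illustration and are not part of the standard library. The second variant uses `re.IGNORECASE`, which is a different technique from lowercasing both sides and may behave differently on some non-ASCII input.

```python
import re

def contains_ignore_case(needle, haystack):
    # Hypothetical helper: the lowercase-both-sides idiom from the reply above.
    return needle.lower() in haystack.lower()

def contains_ignore_case_re(needle, haystack):
    # Alternative sketch: escape the needle so it is matched literally,
    # then let the regex engine ignore case.
    return re.search(re.escape(needle), haystack, re.IGNORECASE) is not None

print(contains_ignore_case("AND", "sPaM aNd eGgS"))
print(contains_ignore_case_re("AND", "sPaM aNd eGgS"))
```

Either helper keeps the call site short if the test is needed in many places.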

7stud · Apr 4, 2008

Is there any way in python to say

if string1 in string2:
    <do something>

ignoring the case of string1 and string2?

I know I could use:-

if lower(string1) in lower(string2):
    <do something>

but it somehow feels there ought to be an easier (tidier?) way.

Easier? You mean like some kind of mind meld?

Steve Holden · Apr 5, 2008

7stud said:
Easier? You mean like some kind of mind meld?

That's right, DWIM mode Python. Rock on!

regards
Steve

Martin v. Löwis · Apr 6, 2008

Easier? You mean like some kind of mind meld?

Interestingly enough, it shouldn't be (but apparently is) obvious that

a.lower() in b.lower()

is a way of expressing "a is a substring of b, with case-insensitive
matching". Can we be sure that these are really the same concepts,
and if so, is

a.upper() in b.upper()

also equivalent?

It's probably a common assumption that, for any character c,
c.lower()==c.upper().lower(). Yet,

py> [i for i in range(65536) if unichr(i).upper().lower() != unichr(i).lower()]
[181, 305, 383, 837, 962, 976, 977, 981, 982, 1008, 1009, 1010, 1013, 7835, 8126]
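
A rough Python 3 equivalent of the check above, for anyone without a Python 2 interpreter: `unichr` became `chr`. The function name `round_trip_mismatches` is hypothetical. The resulting list depends on the Unicode version bundled with the interpreter, and Python 3 applies full case mappings (for example, 'ß'.upper() is 'SS'), so it need not match the output quoted above.

```python
def round_trip_mismatches(limit=65536):
    # Code points whose upper-then-lower form differs from their plain lower form.
    return [i for i in range(limit)
            if chr(i).upper().lower() != chr(i).lower()]

mismatches = round_trip_mismatches()
print(len(mismatches))
print(383 in mismatches)  # 383 is U+017F, LATIN SMALL LETTER LONG S
```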

Take, for example, U+017F, LATIN SMALL LETTER LONG S. Its .lower() is
the same character, as the character is already in lower case.
Its .upper() is U+0053, LATIN CAPITAL LETTER S. Notice that the LONG
is gone - there is no upper-case version of a "long s".
Its .upper().lower() is U+0073, LATIN SMALL LETTER S.

So should case-insensitive matching match the small s with the small
long s, as they have the same upper-case letter?

Regards,
Martin
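
One possible answer in Python 3.3 and later is `str.casefold()`, which applies Unicode case folding rather than lowercasing. The sketch below is an illustration, not something proposed in this thread; the helper name `contains_casefold` is hypothetical.

```python
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

def contains_casefold(needle, haystack):
    # Hypothetical helper: compare case-folded forms instead of lowercased ones.
    return needle.casefold() in haystack.casefold()

print(LONG_S.lower() == LONG_S)        # already lower case
print(LONG_S.upper())                  # the plain capital S
print(LONG_S.casefold())               # folds to the plain small s
print(contains_casefold("s", LONG_S))  # so small s matches long s
print("s".lower() in LONG_S.lower())   # lowercasing alone does not match
```

Under case folding the small s and the long s are treated as the same letter, which is one way of answering the question posed above.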

ijoshua · Apr 6, 2008

That's right, DWIM mode Python. Rock on!

If it is common enough, define a custom type of string. I have
appended a simple version that should work for your example of `in`.
You would probably want to define all of the builtin str methods for
this class to be really useful.

Regards,
Josh

---
# cistr.py

import operator

class cistr(object):
    """A type of string that ignores character case
    for the right side of the `in` operator.

    >>> 'AND' in cistr('sPaM aNd eGgS')
    True
    """
    def __init__(self, string):
        # Store the lowercased form once, at construction time.
        self.string = str(string).lower()

    def __contains__(self, other):
        # Lowercase the left operand and do an ordinary substring test.
        return operator.contains(self.string, other.lower())

    def __repr__(self):
        return 'cistr(%r)' % (self.string)

    def lower(self):
        return self.string

if '__main__' == __name__:
    string1 = 'AND'
    string2 = 'sPaM aNd eGgS'
    # A single parenthesised argument prints the same on Python 2 and 3.
    print('%r in %r ? %r' % (string1, string2, string1 in string2))
    print('%r in %r ? %r' % (string1, cistr(string2), string1 in cistr(string2)))
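
A variation on the same idea, sketched for Python 3: subclassing `str` means the ordinary string methods come along for free, and only `__contains__` needs overriding. The class name `CIStr` is hypothetical, and using `casefold()` instead of `lower()` is a substitution made here, not what the post above does.

```python
class CIStr(str):
    """Hypothetical str subclass whose `in` test ignores case."""

    def __contains__(self, other):
        # Fold both operands, then defer to the normal substring test.
        return str.__contains__(self.casefold(), other.casefold())

text = CIStr("sPaM aNd eGgS")
print("AND" in text)
print("ham" in text)
print(text.upper())  # inherited str methods still work
```

As with the original class, this only takes effect when the special string is the right-hand operand of `in`.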

Paul McGuire · Apr 6, 2008

Another surprise (or maybe not so surprising) - this "upper != lower"
is not symmetric. Using the inverse of your list comp, I get

>>> [i for i in range(65536) if unichr(i).lower().upper() != unichr(i).upper()]
[304, 1012, 8486, 8490, 8491]

Instead of 15 exceptions to the rule, conversion to upper has only 5
exceptions. So perhaps comparison of uppers is, while not foolproof,
less likely to encounter these exceptions? Or at least, it is simpler
to code explicit tests for them.

-- Paul
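
To see which characters those five code points are, the standard `unicodedata` module can name them. A small sketch in Python 3 (the variable name `exceptions` is just for illustration):

```python
import unicodedata

# The five code points reported in the list above.
exceptions = [304, 1012, 8486, 8490, 8491]

for cp in exceptions:
    print("U+%04X %s" % (cp, unicodedata.name(chr(cp))))
```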

Mel · Apr 7, 2008

I don't know what meaning is carried by all those differences in
lower-case glyphs. Converting to upper seems to fold together a lot
of variant pi's and rho's, which I think would be roughly a good thing.
I seem to recall that the tiny iota (ypogegrammeni) has or had
grammatical significance. The other effect would be conflating
physics' Angstrom unit and Kelvin unit signs with ring-a and K.
Application programmers beware.

Mel.
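
The Kelvin sign makes the trade-off concrete. In this Python 3 sketch (the function name `compare_strategies` is hypothetical), lowercasing folds U+212A KELVIN SIGN into an ordinary k, while uppercasing leaves it distinct from K. Whether that conflation is wanted depends on the application.

```python
KELVIN = "\u212a"  # KELVIN SIGN, visually similar to the letter K

def compare_strategies(a, b):
    # Report whether a is found in b under each normalisation.
    return {
        "lower": a.lower() in b.lower(),
        "upper": a.upper() in b.upper(),
        "casefold": a.casefold() in b.casefold(),
    }

print(compare_strategies("k", "273 " + KELVIN))
print(compare_strategies("s", "\u017f"))  # small s against long s
```

The two calls disagree about which of `lower` and `upper` finds a match, so neither is uniformly the safer choice.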
