Is there any way to say ignore case with "in"?

tinnews

Is there any way in python to say

    if string1 in string2:
        <do something>

ignoring the case of string1 and string2?


I know I could use:-

    if lower(string1) in lower(string2):
        <do something>

but it somehow feels there ought to be an easier (tidier?) way.
 
Fredrik Lundh

Is there any way in python to say

    if string1 in string2:
        <do something>

ignoring the case of string1 and string2?

    if string1.lower() in string2.lower():
        ...

(there's no case-insensitive version of the "in" operator in stock Python)

</F>
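
For readers on Python 3, the idiom above can be wrapped in a small helper. This is only a sketch under stated assumptions: `contains_ci` and `contains_ci_re` are hypothetical names, not standard-library functions, and `str.casefold()` (Python 3.3+) is swapped in for `.lower()` because it is the method intended for caseless comparison.

```python
import re


def contains_ci(needle, haystack):
    """Return True if needle occurs in haystack, ignoring case.

    Hypothetical helper: casefold() is used in place of lower().
    """
    return needle.casefold() in haystack.casefold()


def contains_ci_re(needle, haystack):
    """Same test with a regular expression; re.escape keeps the needle literal."""
    return re.search(re.escape(needle), haystack, re.IGNORECASE) is not None


print(contains_ci("AND", "sPaM aNd eGgS"))      # True
print(contains_ci("ham", "sPaM aNd eGgS"))      # False
print(contains_ci_re("AND", "sPaM aNd eGgS"))   # True
```

The two helpers agree on plain ASCII input; they may differ on some non-ASCII characters, since `casefold()` and `re.IGNORECASE` do not follow identical rules.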
 
7stud

Is there any way in python to say

    if string1 in string2:
        <do something>

ignoring the case of string1 and string2?

I know I could use:-

    if lower(string1) in lower(string2):
        <do something>

but it somehow feels there ought to be an easier (tidier?) way.

Easier? You mean like some kind of mind meld?
 
Martin v. Löwis

Easier? You mean like some kind of mind meld?

Interestingly enough, it shouldn't be (but apparently is) obvious that

a.lower() in b.lower()

is a way of expressing "a is a substring of b, with case-insensitive
matching". Can we be sure that these are really the same concepts,
and if so, is

a.upper() in b.upper()

also equivalent?

It's probably a common assumption that, for any character c,
c.lower()==c.upper().lower(). Yet,

py> [i for i in range(65536) if unichr(i).upper().lower() != unichr(i).lower()]
[181, 305, 383, 837, 962, 976, 977, 981, 982, 1008, 1009, 1010, 1013,
7835, 8126]

Take, for example, U+017F, LATIN SMALL LETTER LONG S. Its .lower() is
the same character, as the character is already in lower case.
Its .upper() is U+0053, LATIN CAPITAL LETTER S. Notice that the LONG
is gone - there is no upper-case version of a "long s".
Its .upper().lower() is U+0073, LATIN SMALL LETTER S.

So should case-insensitive matching match the small s with the small
long s, as they have the same upper-case letter?

Regards,
Martin
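
The long s example can be checked directly. The sketch below assumes Python 3, where `unichr` is spelled `chr` and every `str` is Unicode; the `casefold()` call at the end is an addition that does not appear in the thread.

```python
long_s = "\u017f"  # LATIN SMALL LETTER LONG S

print(long_s.lower() == long_s)        # True: already lower case
print(long_s.upper() == "S")           # True: upper-cases to plain S
print(long_s.upper().lower() == "s")   # True: the "long" is lost on the round trip

# Lower-casing both sides does not match long s against plain s...
print(long_s.lower() in "s".lower())   # False
# ...but upper-casing both sides does.
print(long_s.upper() in "s".upper())   # True

# str.casefold() (Python 3.3+) folds long s to plain s, so it sides with upper().
print(long_s.casefold() in "s".casefold())  # True
```

So the answer to "are these equivalent?" is no: for this character, the `.lower()` and `.upper()` forms of the test disagree.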
 
ijoshua

That's right, DWIM mode Python. Rock on!

If it is common enough, define a custom type of string. I have
appended a simple version that should work for your example of `in`.
You would probably want to define all of the builtin str methods for
this class to be really useful.

Regards,
Josh

---
# cistr.py

import operator

class cistr(object):
    """A type of string that ignores character case
    for the right side of the `in` operator.

    >>> 'AND' in cistr('sPaM aNd eGgS')
    True
    """
    def __init__(self, string):
        # Lower-case once here, so each `in` test only has to lower the needle.
        self.string = str(string).lower()

    def __contains__(self, other):
        return operator.contains(self.string, other.lower())

    def __repr__(self):
        return 'cistr(%r)' % (self.string,)

    def lower(self):
        return self.string

if '__main__' == __name__:
    string1 = 'AND'
    string2 = 'sPaM aNd eGgS'
    # A parenthesised single-argument print runs on both Python 2 and 3.
    print('%r in %r ? %r' % (string1, string2, string1 in string2))
    print('%r in %r ? %r' % (string1, cistr(string2), string1 in cistr(string2)))
 
Paul McGuire

Another surprise (or maybe not so surprising) - this "upper != lower"
is not symmetric. Using the inverse of your list comp, I get
>>> [i for i in range(65536) if unichr(i).lower().upper() !=
... unichr(i).upper()]
[304, 1012, 8486, 8490, 8491]

Instead of 15 exceptions to the rule, conversion to upper has only 5
exceptions. So perhaps comparison of upper's is, while not foolproof,
less likely to encounter these exceptions? Or at least, simpler to
code explicit tests.

-- Paul
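
Both list comprehensions can be rerun on Python 3 with one helper. This is a sketch, not a reproduction of the numbers above: `roundtrip_mismatches` is a hypothetical name, `chr` replaces `unichr`, and the counts are deliberately not stated because they depend on the Unicode data shipped with the interpreter (Python 3 also applies full case mappings, which adds further exceptions).

```python
def roundtrip_mismatches(first, second, limit=65536):
    """Code points c below limit where second(first(c)) != second(c)."""
    return [i for i in range(limit)
            if second(first(chr(i))) != second(chr(i))]


# upper-then-lower versus lower alone (the first list in the thread)
lower_side = roundtrip_mismatches(str.upper, str.lower)
# lower-then-upper versus upper alone (the second list in the thread)
upper_side = roundtrip_mismatches(str.lower, str.upper)

print(len(lower_side), len(upper_side))
print(383 in lower_side)    # U+017F LATIN SMALL LETTER LONG S
print(8490 in upper_side)   # U+212A KELVIN SIGN
```

Comparing the two lengths on a given interpreter shows whether the asymmetry described above still holds there.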
 
Mel

Paul said:
[i for i in range(65536) if unichr(i).lower().upper() !=
... unichr(i).upper()]
[304, 1012, 8486, 8490, 8491]

Instead of 15 exceptions to the rule, conversion to upper has only 5
exceptions. So perhaps comparison of upper's is, while not foolproof,
less likely to encounter these exceptions? Or at least, simpler to
code explicit tests.

I don't know what meaning is carried by all those differences in
lower-case glyphs. Converting to upper seems to fold together a lot
of variant pi's and rho's which I think would be roughly a good thing.
I seem to recall that the tiny iota (ypogegrammeni) has or had
grammatical significance. The other effect would be conflating
physics' Angstrom unit and Kelvin unit signs with ring-a and K.
Application programmers beware.

Mel.
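
The conflation of the unit signs with ordinary letters is easy to see on Python 3. A minimal sketch, assuming only the standard `unicodedata` module:

```python
import unicodedata

for sign in ("\u212a", "\u212b"):   # KELVIN SIGN, ANGSTROM SIGN
    lowered = sign.lower()
    # Show which ordinary letter each unit sign turns into when lower-cased.
    print(unicodedata.name(sign), "->", unicodedata.name(lowered))

# After lower-casing, the unit signs cannot be told apart from K and ring-A.
print("\u212a".lower() == "K".lower())        # True
print("\u212b".lower() == "\u00c5".lower())   # True
```

Whether that folding is wanted depends on the application: it helps a text search, but it loses the distinction between a unit sign and a letter.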
 
