Is there any way to say ignore case with "in"?

Discussion in 'Python' started by tinnews, Apr 4, 2008.

  1. tinnews

    tinnews Guest

    Is there any way in python to say

    if string1 in string2:
        <do something>

    ignoring the case of string1 and string2?


    I know I could use:-

    if lower(string1) in lower(string2):
        <do something>

    but it somehow feels there ought to be an easier (tidier?) way.
     
    tinnews, Apr 4, 2008
    #1

  2. if string1.lower() in string2.lower():
         ...

    (there's no case-insensitive version of the "in" operator in stock Python)

    </F>
     
    Fredrik Lundh, Apr 4, 2008
    #2
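A minimal sketch of wrapping the idiom above in a helper, written for Python 3 rather than the Python 2 of this thread. The name `contains_ci` is made up for illustration. Using `str.casefold()` (Python 3.3+) instead of `lower()` is an assumption about what "ignore case" should mean; it folds a few more characters than `lower()` does.

```python
def contains_ci(needle, haystack):
    """Return True if needle occurs in haystack, ignoring case."""
    # casefold() is a more aggressive lower() intended for caseless matching.
    return needle.casefold() in haystack.casefold()


print(contains_ci('AND', 'sPaM aNd eGgS'))
print(contains_ci('ham', 'sPaM aNd eGgS'))
```

On a Python without `casefold()`, swapping in `lower()` gives the same behaviour as the one-liner above.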

  3. 7stud

    7stud Guest

    Easier? You mean like some kind of mind meld?
     
    7stud, Apr 5, 2008
    #3
  4. Steve Holden

    Steve Holden Guest

    That's right, DWIM mode Python. Rock on!

    regards
    Steve
     
    Steve Holden, Apr 5, 2008
    #4
  5. (Replying to tinnews: "I know I could use:-")
    Interestingly enough, it shouldn't be (but apparently is) obvious that

    a.lower() in b.lower()

    is a way of expressing "a is a substring of b, with case-insensitive
    matching". Can we be sure that these are really the same concepts,
    and if so, is

    a.upper() in b.upper()

    also equivalent?

    It's probably a common assumption that, for any character c,
    c.lower()==c.upper().lower(). Yet,

    py> [i for i in range(65536) if unichr(i).upper().lower() !=
    unichr(i).lower()]
    [181, 305, 383, 837, 962, 976, 977, 981, 982, 1008, 1009, 1010, 1013,
    7835, 8126]

    Take, for example, U+017F, LATIN SMALL LETTER LONG S. Its .lower() is
    the same character, as the character is already in lower case.
    Its .upper() is U+0053, LATIN CAPITAL LETTER S. Notice that the LONG
    is gone - there is no upper-case version of a "long s".
    Its .upper().lower() is U+0073, LATIN SMALL LETTER S.

    So should case-insensitive matching match the small s with the small
    long s, as they have the same upper-case letter?

    Regards,
    Martin
     
    Martin v. Löwis, Apr 6, 2008
    #5
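The long s walk-through above can be replayed in Python 3, where `str` is already Unicode and `unichr` is spelled `chr`. This is only a sketch; the variable names are made up for illustration, and the `casefold()` line is an addition that the thread's Python did not have.

```python
long_s = '\u017f'  # LATIN SMALL LETTER LONG S

lowered = long_s.lower()             # unchanged: already lower case
uppered = long_s.upper()             # the plain capital S
round_trip = long_s.upper().lower()  # the plain small s

# Whether the long s "matches" a small s depends on which folding is used.
matches_via_lower = long_s.lower() == 's'.lower()
matches_via_upper = long_s.upper() == 's'.upper()
matches_via_casefold = long_s.casefold() == 's'.casefold()

print(matches_via_lower, matches_via_upper, matches_via_casefold)
```

So comparing lowers keeps the two letters apart, while comparing uppers (or case-folded forms) treats them as the same letter.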
  6. ijoshua

    ijoshua Guest

    If it is common enough, define a custom type of string. I have
    appended a simple version that should work for your example of `in`.
    You would probably want to define all of the builtin str methods for
    this class to be really useful.

    Regards,
    Josh

    ---
    # cistr.py

    import operator

    class cistr(object):
        """A type of string that ignores character case
        for the right side of the `in` operator.

        >>> 'AND' in cistr('sPaM aNd eGgS')
        True
        """
        def __init__(self, string):
            # Store the lower-cased form once, at construction time.
            self.string = str(string).lower()

        def __contains__(self, other):
            # Lower-case the left operand so the test ignores case.
            return operator.contains(self.string, other.lower())

        def __repr__(self):
            return 'cistr(%r)' % (self.string,)

        def lower(self):
            return self.string

    if '__main__' == __name__:
        string1 = 'AND'
        string2 = 'sPaM aNd eGgS'
        print('%r in %r ? %r' % (string1, string2, string1 in string2))
        print('%r in %r ? %r' % (string1, cistr(string2),
                                 string1 in cistr(string2)))
     
    ijoshua, Apr 6, 2008
    #6
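One possible way to get "all of the builtin str methods" without writing them out is to subclass `str` and override only `__contains__`. This is a Python 3 sketch, not ijoshua's code; the class name `CIStr` is hypothetical, and the use of `casefold()` instead of `lower()` is an assumption. Only the `in` test becomes case-insensitive; `==`, `find()` and the rest keep their normal case-sensitive behaviour.

```python
class CIStr(str):
    """Hypothetical str subclass whose `in` test ignores case."""

    def __contains__(self, other):
        # self.casefold() returns a plain str, so this `in` does not recurse.
        return str(other).casefold() in self.casefold()


text = CIStr('sPaM aNd eGgS')
print('AND' in text)        # case-insensitive containment
print(text.upper())         # inherited str methods still work
print(text == 'spam and eggs')  # equality is still case-sensitive
```

The original text is kept as typed, which the wrapper class above does not do because it stores only the lower-cased form.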
  7. Paul McGuire

    Paul McGuire Guest

    Another surprise (or maybe not so surprising) - this "upper != lower"
    is not symmetric. Using the inverse of your list comp, I get

    >>> [i for i in range(65536) if unichr(i).lower().upper() !=
    ... unichr(i).upper()]
    [304, 1012, 8486, 8490, 8491]

    Instead of 15 exceptions to the rule, conversion to upper has only 5
    exceptions. So perhaps comparison of uppers is, while not foolproof,
    less likely to encounter these exceptions? Or at least, it is simpler
    to code explicit tests for them.

    -- Paul
     
    Paul McGuire, Apr 6, 2008
    #7
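Both scans can be rerun side by side in Python 3 with `chr` in place of `unichr`. This is only a sketch with made-up variable names. The lists will not necessarily match the ones quoted in this thread, because the Unicode tables shipped with Python have changed since 2008, so no expected counts are given here.

```python
LIMIT = 0x10000  # Basic Multilingual Plane only, as in the thread

# Characters whose lower-case form changes after a detour through upper().
lower_exceptions = [i for i in range(LIMIT)
                    if chr(i).upper().lower() != chr(i).lower()]

# Characters whose upper-case form changes after a detour through lower().
upper_exceptions = [i for i in range(LIMIT)
                    if chr(i).lower().upper() != chr(i).upper()]

print(len(lower_exceptions), len(upper_exceptions))
```

Printing the two lists themselves (rather than their lengths) shows which code points would need explicit tests on a given Python version.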
  8. Mel

    Mel Guest

    I don't know what meaning is carried by all those differences in
    lower-case glyphs. Converting to upper seems to fold together a lot
    of variant pi's and rho's which I think would be roughly a good thing.
    I seem to recall that the tiny iota (ypogegrammeni) has or had
    grammatical significance. The other effect would be conflating
    physics' Angstrom unit and Kelvin unit signs with ring-a and K.
    Application programmers beware.

    Mel.
     
    Mel, Apr 7, 2008
    #8
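The conflations mentioned above can be checked character by character. This is a Python 3 sketch; the helper name `same_after` is made up for illustration. It compares a few of the code points from the lists in this thread after folding both sides with `upper()` or `lower()`.

```python
def same_after(fold, a, b):
    """Return True if a and b compare equal after applying str method `fold`."""
    return getattr(a, fold)() == getattr(b, fold)()


# Upper-casing folds the variant pi and rho symbols onto the ordinary letters.
pi_folds = same_after('upper', '\u03d6', '\u03c0')      # GREEK PI SYMBOL vs small pi
rho_folds = same_after('upper', '\u03f1', '\u03c1')     # GREEK RHO SYMBOL vs small rho
iota_folds = same_after('upper', '\u0345', '\u03b9')    # ypogegrammeni vs small iota

# Lower-casing folds the unit signs onto the ordinary letters.
kelvin_folds = same_after('lower', '\u212a', 'K')       # KELVIN SIGN vs capital K
angstrom_folds = same_after('lower', '\u212b', '\u00c5')  # ANGSTROM SIGN vs A with ring

print(pi_folds, rho_folds, iota_folds, kelvin_folds, angstrom_folds)
```

Whether these matches are wanted depends on the application, which is the point of the warning above.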