Re: String comparison question

Discussion in 'Python' started by Michael Spencer, Mar 20, 2006.

  1. Olivier Langlois wrote:
    > Hi Michael!
    >
    > Your suggestion is fantastic and is doing exactly what I was looking
    > for! Thank you very much.
    > There is something that I'm wondering though. Why wouldn't the solution
    > you proposed work with Unicode strings?
    >

    Simply, that str.translate with two arguments isn't implemented for unicode
    strings. I don't know the underlying reason, or how hard it would be to change.
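For reference, here is a rough sketch of the two-argument translate form under discussion. It assumes a later interpreter than the one used in this thread: Python 2.6+ or Python 3, where the byte-string translate accepts None as the table. The names WHITESPACE_BYTES and compare_bytes are made up for illustration.

```python
# ASCII whitespace as a byte string; these are the characters to delete.
WHITESPACE_BYTES = b" \t\n\r\x0b\x0c"

def compare_bytes(a, b):
    """Compare two byte strings, disregarding ASCII whitespace -> bool"""
    # table=None means "no character mapping"; the second argument
    # lists the bytes to delete.
    return a.translate(None, WHITESPACE_BYTES) == b.translate(None, WHITESPACE_BYTES)

print(compare_bytes(b"a b\tc\n", b"abc"))
```

This form only applies to byte strings; Unicode strings need one of the other approaches in this thread.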

    If you do need the comparison functionality for unicode strings, you'll have
    to go with a different approach. For example, using regular expressions:

    import re
    def compare2(a, b):
        """Compare two basestrings, disregarding whitespace -> bool"""
        # \s+ matches each run of whitespace; raw strings keep the backslash literal
        return re.sub(r"\s+", "", a) == re.sub(r"\s+", "", b)

    This is slower than the str.translate approach, though it has the advantage that
    you could easily modify it to normalize, rather than eliminate whitespace. This
    would be a more useful comparison in many cases.

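    def compare3(a, b):
        """Compare two basestrings, normalizing whitespace -> bool"""
        # \s+ (not \s*) so that a space is substituted only where whitespace
        # actually occurs, not at every empty match between characters
        return re.sub(r"\s+", " ", a) == re.sub(r"\s+", " ", b)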
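A small sketch may make the difference between eliminating and normalizing whitespace concrete. The helper names strip_ws and normalize_ws are hypothetical, and the pattern \s+ is used so that only actual runs of whitespace are matched.

```python
import re

def strip_ws(s):
    # Remove every run of whitespace.
    return re.sub(r"\s+", "", s)

def normalize_ws(s):
    # Collapse every run of whitespace to a single space.
    return re.sub(r"\s+", " ", s)

# Eliminating whitespace treats "ab  c" and "a b c" as equal.
print(strip_ws(u"ab  c") == strip_ws(u"a b c"))          # True
# Normalizing keeps word boundaries, so they differ.
print(normalize_ws(u"ab  c") == normalize_ws(u"a b c"))  # False
# Runs of mixed whitespace still collapse to one space.
print(normalize_ws(u"a \t b") == normalize_ws(u"a b"))   # True
```

Note that normalizing this way still distinguishes leading or trailing whitespace; calling .strip() on the result first would be one way to ignore it, if that is wanted.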

    Continuing the disclaimers: none of these approaches makes any attempt to
    deal specially with quoted whitespace or any other sort of escapes.

    Michael
    Michael Spencer, Mar 20, 2006

  2. Michael Spencer wrote:

    > Olivier Langlois wrote:
    > > Hi Michael!
    > >
    > > Your suggestion is fantastic and is doing exactly what I was looking
    > > for! Thank you very much.
    > > There is something that I'm wondering though. Why wouldn't the solution
    > > you proposed work with Unicode strings?
    > >

    > Simply, that str.translate with two arguments isn't implemented for
    > unicode strings. I don't know the underlying reason, or how hard it would
    > be to change.


    A Unicode string's translate method takes a dict argument -- you delete
    characters by mapping their ord(...) to None. For example:

    >>> u'banana'.translate({ord('a'):None})
    u'bnn'

    That is in fact much handier, when all you want to do is delete some
    characters, than using string.maketrans to create a "null" translation
    table and passing the string of chars to delete as the 2nd argument.

    With unicode .translate, you can also translate a character into a
    STRING...:

    >>> u'banana'.translate({ord('a'):u'ay'})
    u'baynaynay'

    ...which is simply impossible with a plain string's .translate.
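Applied to the original whitespace question, the dict form of translate might be used roughly as follows. This is a sketch, not code from the thread: the names DELETE_WHITESPACE and compare_translate are invented here, and the table covers only the ASCII whitespace characters in string.whitespace.

```python
import string

# Map each ASCII whitespace character's code point to None, which tells
# the Unicode translate method to delete it. Other Unicode whitespace
# (for example u'\u00a0') would have to be added to the table explicitly.
DELETE_WHITESPACE = dict((ord(c), None) for c in string.whitespace)

def compare_translate(a, b):
    """Compare two Unicode strings, disregarding ASCII whitespace -> bool"""
    return a.translate(DELETE_WHITESPACE) == b.translate(DELETE_WHITESPACE)

print(compare_translate(u"a b\tc\n", u"abc"))
print(compare_translate(u"a b", u"a c"))
```

Building the table once at module level avoids reconstructing the dict on every comparison.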


    Alex
    Alex Martelli, Mar 20, 2006
