Re: Detect string has non-ASCII chars without checking each char?

Discussion in 'Python' started by Michel Claveau - MVP, Aug 22, 2010.

  1. Hi!

    Another way :

    # -*- coding: utf-8 -*-

    import unicodedata

    def test_ascii(struni):
        strasc=unicodedata.normalize('NFD', struni).encode('ascii','replace')
        if len(struni)==len(strasc):
            return True
        else:
            return False

    print test_ascii(u"abcde")
    print test_ascii(u"abcdê")



    @-salutations
    --
    Michel Claveau
     
    Michel Claveau - MVP, Aug 22, 2010
    #1

  2. John Machin (Guest)

    On Aug 22, 5:07 pm, "Michel Claveau - MVP" wrote:
    > Hi!
    >
    > Another way :
    >
    >   # -*- coding: utf-8 -*-
    >
    >   import unicodedata
    >
    >   def test_ascii(struni):
    >       strasc=unicodedata.normalize('NFD', struni).encode('ascii','replace')
    >       if len(struni)==len(strasc):
    >          return True
    >       else:
    >          return False
    >
    >   print test_ascii(u"abcde")
    >   print test_ascii(u"abcdê")


    -1

    Try your code with u"abcd\xa1" ... it says it's ASCII.

    Suggestions:
    test_ascii = lambda s: len(s.decode('ascii', 'ignore')) == len(s)
    or
    test_ascii = lambda s: all(c < u'\x80' for c in s)
    or
    use try/except
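
    The try/except suggestion could look something like the sketch below. The name is_ascii is made up for illustration, and the sketch assumes the input is a unicode string (a byte string in Python 2 would raise a different exception). It is one possible reading of the suggestion, not necessarily what was meant.

```python
def is_ascii(text):
    """Return True if every character in text is below U+0080."""
    try:
        # A strict ASCII encode raises UnicodeEncodeError on the first
        # character that has no ASCII representation.
        text.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True


print(is_ascii(u"abcde"))     # True
print(is_ascii(u"abcd\xea"))  # False (e with circumflex)
print(is_ascii(u"abcd\xa1"))  # False (inverted exclamation mark)
```

    Unlike the length comparison, this does not depend on whether a character happens to decompose under normalization.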

    Also:
        if a == b:
            return True
        else:
            return False
    is a horribly bloated way of writing
        return a == b
     
    John Machin, Aug 22, 2010
    #2

  3. Re !

    > Try your code with u"abcd\xa1" ... it says it's ASCII.


    Ah? On my computer, it says "False".

    @-salutations
    --
    MCi
     
    Michel Claveau - MVP, Aug 22, 2010
    #3
  4. John Machin (Guest)

    On Aug 23, 1:10 am, "Michel Claveau - MVP" wrote:
    > Re !
    >
    > > Try your code with u"abcd\xa1" ... it says it's ASCII.
    >
    > Ah? On my computer, it says "False".


    Perhaps your computer has a problem. Mine does this with both Python
    2.7 and Python 2.3 (which introduced the unicodedata.normalize
    function):

    >>> import unicodedata
    >>> t1 = u"abcd\xa1"
    >>> t2 = unicodedata.normalize('NFD', t1)
    >>> t3 = t2.encode('ascii', 'replace')
    >>> [t1, t2, t3]
    [u'abcd\xa1', u'abcd\xa1', 'abcd?']
    >>> map(len, _)
    [5, 5, 5]
    >>>
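
    The session above suggests why the length comparison misses some characters: it only notices a non-ASCII character when NFD normalization splits it into several code points. A small sketch of that difference follows. The helper name nfd_length_check is made up here and simply restates the logic of test_ascii from the first post.

```python
import unicodedata


def nfd_length_check(text):
    # Same logic as test_ascii from the first post, restated for comparison.
    encoded = unicodedata.normalize('NFD', text).encode('ascii', 'replace')
    return len(text) == len(encoded)


# U+00EA decomposes into "e" plus a combining circumflex (2 code points);
# U+00A1 has no canonical decomposition and stays as 1 code point.
for ch in (u"\xea", u"\xa1"):
    decomposed = unicodedata.normalize('NFD', ch)
    print("%d %s" % (len(decomposed), nfd_length_check(u"abcd" + ch)))
# prints:
# 2 False
# 1 True
```

    So the check reports u"abcd\xa1" as ASCII because the replacement "?" takes exactly one position, leaving both lengths at 5.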
     
    John Machin, Aug 22, 2010
    #4
