Alphabetic characters with respect to a given locale

Discussion in 'Python' started by candide, Apr 1, 2011.

  1. candide

    How can I retrieve the list of all characters defined as alphabetic for
    the current locale?
     
    candide, Apr 1, 2011
    #1

  2. On 4/1/2011 1:55 PM candide said...
    > How can I retrieve the list of all characters defined as alphabetic for
    > the current locale?


    I think this is supposed to work, but for whatever reason it doesn't
    for me when I test after changing my locale (I think that's a CentOS
    thing):

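    # Python 2 only: string.lowercase is locale-dependent and is
    # updated when locale.setlocale() is called.
    import locale
    import string

    locale.setlocale(locale.LC_ALL, '')  # adopt the user's default locale
    print string.lowercase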

    I don't see where else this might live in Python.

    However, you can test if something is alpha:

    >>> val = u'caf' u'\xE9'
    >>> val.isalpha()
    True


    ... and check its Unicode category:

    >>> import unicodedata
    >>> unicodedata.category(u'a')     # Letter, lowercase
    'Ll'
    >>> unicodedata.category(u'A')     # Letter, uppercase
    'Lu'
    >>> unicodedata.category(u'1')     # Number, decimal digit
    'Nd'
    >>> unicodedata.category(u'\x01')  # Other, control
    'Cc'
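
    For anyone on Python 3, where every str is Unicode and the u'' prefix is
    optional, the two checks above can be combined. This is only a minimal
    sketch; describe is a hypothetical helper name, not something from the
    standard library:

```python
import unicodedata


def describe(ch):
    """Return (is_alphabetic, general_category) for a single character."""
    return ch.isalpha(), unicodedata.category(ch)


# Same sample characters as in the session above, plus e-acute.
for ch in ("a", "A", "\xe9", "1", "\x01"):
    print(repr(ch), describe(ch))
```

    Neither str.isalpha() nor unicodedata.category() consults the locale;
    both rely on the Unicode character database.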


    HTH,

    Emile
     
    Emile van Sebille, Apr 2, 2011
    #2

  3. candide

    On 01/04/2011 22:55, candide wrote:
    > How can I retrieve the list of all characters defined as alphabetic for
    > the current locale?



    Thanks for the responses. Alas, neither solution works.

    Under Ubuntu:

    # ----------------------
    import string
    import locale

    print locale.getdefaultlocale()
    print locale.getpreferredencoding()

    locale.setlocale(locale.LC_ALL, "")

    print string.letters

    letter_class = u"[" + u"".join(
        unichr(c) for c in range(0x10000) if unichr(c).isalpha()) + u"]"

    # print letter_class
    # ----------------------

    prints the following:


    ('fr_FR', 'UTF8')
    UTF-8
    ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz


    I commented out the printing of letter_class because it outputs a flood
    of characters that do not belong to the usual French character set.
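
    To see the scale of that flood, here is a rough Python 3 equivalent of
    the commented-out step (chr replaces unichr; the variable name alphabetic
    is only illustrative). It counts the characters instead of printing them
    all:

```python
# Python 3: str is already Unicode, so chr() covers the whole range.
# Collect every character in the Basic Multilingual Plane that
# str.isalpha() accepts, regardless of the active locale.
alphabetic = [chr(c) for c in range(0x10000) if chr(c).isalpha()]

# How many characters would have been printed.
print(len(alphabetic))

# A short sample: the ASCII letters come first, then other scripts follow.
print("".join(alphabetic[:60]))
```

    The count covers Greek, Cyrillic, CJK and many other scripts, which is
    why the result is of little use as a French letter class.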


    More or less the same problem occurs under Windows: for instance,
    string.letters lists "LATIN CAPITAL LETTER ETH" as an alphabetic
    character, which is wrong for French, since we never use this letter
    in genuine French words.
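
    One plausible explanation, stated as an assumption rather than a
    confirmed diagnosis: with an 8-bit locale, Python 2 builds string.letters
    from the C library's character classification, which works per code page
    rather than per language, so every letter in the code page (cp1252 on a
    typical Western European Windows setup) counts as alphabetic. The Python 3
    sketch below lists the letters a single-byte codec can represent and
    contrasts them with a hand-written French set. letters_in_codec,
    FRENCH_EXTRA and FRENCH_LETTERS are hypothetical names, and the French
    list is typed in by hand, not taken from any locale database:

```python
import unicodedata


def letters_in_codec(codec):
    """Return the letters (Unicode category L*) a single-byte codec defines."""
    found = []
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # this byte is unassigned in the code page
        if unicodedata.category(ch).startswith("L"):
            found.append(ch)
    return "".join(found)


cp1252_letters = letters_in_codec("cp1252")

# Assumption: a hand-maintained list of accented letters and ligatures
# used in French spelling.
FRENCH_EXTRA = "àâæçéèêëîïôœùûüÿ"
FRENCH_LETTERS = set("abcdefghijklmnopqrstuvwxyz" + FRENCH_EXTRA)
FRENCH_LETTERS |= {c.upper() for c in FRENCH_LETTERS}

eth = "\u00d0"  # LATIN CAPITAL LETTER ETH
print(eth in cp1252_letters, eth in FRENCH_LETTERS)
```

    If a language-specific alphabet is what is needed, a hand-maintained set
    like this is one option, since the standard library does not appear to
    expose such a list.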
     
    candide, Apr 2, 2011
    #3
