Problems With Accented Characters

Discussion in 'Python' started by Fuzzyman, Feb 22, 2004.

  1. Fuzzyman

    Fuzzyman Guest

    I've written an anagram finder that produces anagrams from a
    dictionary of words. The user can load their own dictionary.

    ( http://www.voidspace.org.uk/atlantibots/nanagram.html )

    In order to ensure it is able to find anagrams properly I wanted to
    strip characters like punctuation etc from words in the dictionary and
    words the user entered. I test(ed) against the 26 English letters (
    string.ascii_lowercase ).

    I now have someone who wants to use a French dictionary - with words
    containing accented characters !! I have two choices - either map the
    accented characters to their unaccented equivalents (slightly
    inaccurate) or treat the accented characters as separate letters (very
    few anagrams). However - at the moment I can't experiment with either
    because my default codec is 7-bit ASCII and the program crashes
    (sometimes !!) when given accented characters.
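The two choices described above can be sketched with the standard `unicodedata` module. This is a minimal sketch, not the code Nanagram uses: it assumes Python 3, where strings are already Unicode text, and the names `strip_accents`, `anagram_key` and `fold_accents` are hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Map accented letters to their base letter, e.g. 'é' -> 'e'."""
    # NFD splits 'é' into 'e' plus a combining acute accent;
    # dropping the combining marks leaves the base letters.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def anagram_key(word, fold_accents=True):
    """Return the word's letters in sorted order, for anagram lookup."""
    word = word.lower()
    if fold_accents:
        # Choice 1: treat 'é' as a plain 'e'.
        word = strip_accents(word)
    else:
        # Choice 2: keep 'é' as its own letter (one code point after NFC).
        word = unicodedata.normalize("NFC", word)
    # Punctuation, spaces and digits are dropped by the isalpha() filter.
    return "".join(sorted(ch for ch in word if ch.isalpha()))


print(anagram_key("degré"))
print(anagram_key("degré", fold_accents=False))
```

Two words are anagrams when their keys are equal, so the same dictionary can be indexed either way depending on what the user picks. Note that ligatures such as 'œ' have no decomposition and pass through `strip_accents` unchanged.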

    Has anyone any advice - or can point me to any resources - for
    effectively handling these characters ? I guess it's a latin-1 encoding
    I want to use... I can't even work out how to change the default
    codec........

    Thanks,

    Fuzzy

    http://www.voidspace.org.uk/atlantibots/pythonutils.html
     
    Fuzzyman, Feb 22, 2004
    #1

  2. Fuzzyman

    Fuzzyman Guest

    (Fuzzyman) wrote in message news:<>...
    > I've written an anagram finder that produces anagrams from a
    > dictionary of words. The user can load their own dictionary.
    >
    > ( http://www.voidspace.org.uk/atlantibots/nanagram.html )
    >
    > In order to ensure it is able to find anagrams properly I wanted to
    > strip characters like punctuation etc from words in the dictionary and
    > words the user entered. I test(ed) against the 26 English letters (
    > string.ascii_lowercase ).
    >
    > I now have someone who wants to use a French dictionary - with words
    > containing accented characters !! I have two choices - either map the
    > accented characters to their unaccented equivalents (slightly
    > inaccurate) or treat the accented characters as separate letters (very
    > few anagrams). However - at the moment I can't experiment with either
    > because my default codec is 7-bit ASCII and the program crashes
    > (sometimes !!) when given accented characters.
    >



    It's particularly difficult for me to understand what is happening -
    because Python's behaviour *seems* intermittent.

    For example - if I run my program from IDLE and give it the word
    'degré' (containing e-acute) then I get the error :

    Exception in Tkinter callback
    Traceback (most recent call last):
    [snip..]
    File "D:\Python Projects\Nanagram1.3\Nanagram-GUI.pyw", line 123, in
    prepare
    if letter in self.valid_letters:
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position
    26: ordinal not in range(128)
    Traceback (most recent call last):

    It is testing each character of the user's input to remove invalid
    characters (like "-" and "'")... It crashes when it comes to the
    e-acute.
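One possible reading of the traceback, offered as an assumption rather than a diagnosis: in Python 2, testing a unicode character against a byte string makes the interpreter decode the byte string as ASCII, and position 26 suggests the offending byte sits right after 26 ASCII letters. The sketch below reproduces that position in Python 3 and shows the membership test working once both sides are text. The names `valid_bytes`, `first_undecodable` and `keep_valid` are hypothetical stand-ins.

```python
import string

# Hypothetical stand-in for valid_letters: 26 ASCII letters followed by
# one non-ASCII byte, held as bytes rather than as text.
valid_bytes = string.ascii_lowercase.encode("ascii") + b"\x83"


def first_undecodable(data, encoding="ascii"):
    """Return the index of the first byte the codec rejects, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def keep_valid(word, valid_letters):
    """Drop every character of word that is not in valid_letters."""
    return "".join(ch for ch in word.lower() if ch in valid_letters)


# Both sides are text here, so no implicit decoding takes place.
valid_text = string.ascii_lowercase + "é"

print(first_undecodable(valid_bytes))
print(keep_valid("degré-hello", valid_text))
```

If this reading is right, the fix is to decode the letter list once, with a known encoding, before any comparison happens.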


    *However* - if I run it by double clicking on the file then it appears
    to work fine. E.g. if I ask it to find anagrams of 'degré hello ma' then
    it strips out the e-acute (thinking it's an invalid character) and
    finds anagrams of the rest :

    gleam holder
    hallo merged

    What I'd like to do is switch by default to an 8-bit codec (latin-1 I
    think ?????) and then offer the user the choice of either mapping the
    accented characters to their nearest equivalent (e-acute to e for
    example) *or* treating them as separate characters.............
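An alternative to changing the default codec is to decode at the boundary, naming the encoding explicitly when the dictionary file is read. A minimal sketch, assuming Python 3 and a latin-1 encoded word list with one word per line; `load_words` is a hypothetical helper, and the temporary file only exists to make the example self-contained.

```python
import os
import tempfile


def load_words(path, encoding="latin-1"):
    """Read one word per line, decoding with an explicit encoding."""
    with open(path, encoding=encoding) as handle:
        return [line.strip() for line in handle if line.strip()]


# Write a tiny latin-1 word list so the example can run on its own.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("degré\nété\n".encode("latin-1"))
    demo_path = tmp.name

try:
    words = load_words(demo_path)
finally:
    os.remove(demo_path)

print(words)
```

The encoding could be exposed as a user setting alongside the dictionary path, since a word list saved on another system may not be latin-1.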


    Anyone able to help ??



    Fuzzy



    > Has anyone any advice - or can point me to any resources - for
    > effectively handling these characters ? I guess it's a latin-1 encoding
    > I want to use... I can't even work out how to change the default
    > codec........
    >
    > Thanks,
    >
    > Fuzzy
    >
    > http://www.voidspace.org.uk/atlantibots/pythonutils.html
     
    Fuzzyman, Feb 23, 2004
    #2
