Conditional for...in failing with utf-8, Spanish book translation

Discussion in 'Python' started by Hunter, Apr 21, 2008.

  1. Hunter


    Hi all,

    This is my first Usenet post!
    I've run into a wall with my first Python program. I'm writing some
    simple code that takes a UTF-8 text file in Spanish and uses online
    translation tools to convert it, word by word, into English. Then
    I generate a PDF containing both languages.

    Most of this is working great, but I get intermittent errors of the form:

    ---

    Translating coche(coche)...
    Already cached!
    English: car
    Translating ahora(ahora)...
    tw returned now
    English: now
    Translating mismo?(mismo)...
    Already cached!
    English: same
    Translating ¡A(�a)...
    iconv: illegal input sequence at position 0
    tw returned error: the required parameter "srctext" is missing
    English: error: the required parameter "srctext" is missing

    ---

    The output should look like:
    Translating Raw_Text(lowercaserawtextwithoutpunctuation)...
    tw returned englishtranslation
    English: englishtranslation

    I've narrowed the problem down to a simple test program. Check this out:

    ---

    # -*- coding: utf-8 -*-

    acceptable = "abcdefghijklmnopqrstuvwxyzóíñú" # this line will work
    acceptable = "abcdefghijklmnopqrstuvwxyzóíñúá" # this line won't
    #wtf?

    word = "¡A"
    word_key = ''.join([c for c in word.lower() if c in acceptable])
    print "word_key = " + word_key

    ---

    Any ideas? I'm really stumped!

    Thanks,
    Hunter
     
    Hunter, Apr 21, 2008
    #1

  2. Re: Conditional for...in failing with utf-8, Spanish book translation

    On Mon, 21 Apr 2008 08:33:47 +0200, Hunter wrote:

    > I've narrowed the problem down to a simple test program. Check this out:
    >
    > ---
    >
    > # -*- coding: utf-8 -*-
    >
    > acceptable = "abcdefghijklmnopqrstuvwxyzóíñú" # this line will work
    > acceptable = "abcdefghijklmnopqrstuvwxyzóíñúá" # this line won't
    > #wtf?
    >
    > word = "¡A"
    > word_key = ''.join([c for c in word.lower() if c in acceptable])
    > print "word_key = " + word_key
    >
    > ---
    >
    > Any ideas? I'm really stumped!


    You are not working with unicode strings but with UTF-8 encoded byte
    strings. Those are bytes, not letters/characters. Your `word`, for
    example, contains three bytes and not the two characters you think it
    contains:

    In [43]: word = "¡A"

    In [44]: len(word)
    Out[44]: 3

    In [45]: for c in word: print repr(c)
    ....:
    '\xc2'
    '\xa1'
    'A'

    So you are *not* testing if ¡ is in `acceptable` but the two byte values
    that are the UTF-8 representation of that character.

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Apr 21, 2008
    #2
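
To see why the second `acceptable` line breaks while the first one seems to work, it may help to look at the raw bytes. "¡" is encoded in UTF-8 as `\xc2\xa1` and "á" as `\xc3\xa1`, so once "á" is added, the byte `\xa1` counts as acceptable and half of "¡" slips through the filter. The sketch below emulates the Python 2 byte strings with explicit `.encode("utf-8")` calls so that it also runs on Python 3; the helper names `iter_bytes` and `filter_bytes` are made up for illustration and are not part of the original program.

```python
# -*- coding: utf-8 -*-
# Emulate Python 2 byte strings by encoding explicitly. In Python 2 a plain
# "..." literal in a UTF-8 source file already is such a byte string.

def iter_bytes(data):
    """Yield each byte of `data` as a length-1 byte string (hypothetical helper)."""
    for i in range(len(data)):
        yield data[i:i + 1]

def filter_bytes(word, acceptable):
    """Byte-wise equivalent of the list comprehension in the original post."""
    return b"".join(b for b in iter_bytes(word.lower()) if b in acceptable)

word = u"¡A".encode("utf-8")  # three bytes: \xc2 \xa1 A
works = u"abcdefghijklmnopqrstuvwxyzóíñú".encode("utf-8")
fails = u"abcdefghijklmnopqrstuvwxyzóíñúá".encode("utf-8")  # adds \xc3 \xa1

key_ok = filter_bytes(word, works)   # neither \xc2 nor \xa1 is acceptable
key_bad = filter_bytes(word, fails)  # \xa1 now matches the second byte of "á"

print(repr(key_ok))
print(repr(key_bad))
```

With the first list the key comes out as just `a`. With the second it is the stray byte `\xa1` followed by `a`, which is not valid UTF-8 on its own. That is presumably what shows up as `�a` in the log and what iconv rejects as an illegal input sequence.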

  3. Hunter wrote:
    > I've narrowed the problem down to a simple test program. Check this out:
    >
    > ---
    >
    > # -*- coding: utf-8 -*-
    >
    > acceptable = "abcdefghijklmnopqrstuvwxyzóíñú" # this line will work
    > acceptable = "abcdefghijklmnopqrstuvwxyzóíñúá" # this line won't


    [bad words stripped]

    This should read:

    acceptable = u"abcdefghijklmnopqrstuvwxyzóíñú"
    acceptable = u"abcdefghijklmnopqrstuvwxyzóíñúá"

    Mind the little "u" before the string, which makes it a unicode string instead
    of an encoded byte string.

    http://docs.python.org/tut/node5.html#SECTION005130000000000000000

    Stefan
     
    Stefan Behnel, Apr 21, 2008
    #3
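
Putting the two replies together, a minimal sketch of the fix could look like the following. It assumes the words arrive as UTF-8 encoded bytes (for example, read from the Spanish text file) and decodes them before filtering, so that `lower()` and `in` operate on characters instead of bytes. The function name `make_key` is hypothetical, and the code is written so that it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Compare characters, not bytes: keep the accepted letters as a unicode string.
acceptable = u"abcdefghijklmnopqrstuvwxyzóíñúá"

def make_key(raw_bytes, encoding="utf-8"):
    """Decode a raw word and keep only its accepted lowercase letters.

    `make_key` is a hypothetical helper, not part of the original program.
    """
    text = raw_bytes.decode(encoding)  # bytes -> unicode characters
    return u"".join(c for c in text.lower() if c in acceptable)

word_key = make_key(u"¡A".encode("utf-8"))
print(repr(word_key))

# If an external tool expects UTF-8 bytes, encode again at that boundary
# (an assumption about the translation tool, not something stated in the thread).
payload = word_key.encode("utf-8")
```

Here "¡" is treated as a single character that is not in `acceptable`, so it is dropped whole and the key is just `a`. Decoding once on input and encoding once on output keeps the rest of the program working on characters.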