Problem Regarding Handling of Unicode string

Discussion in 'Python' started by joy99, Aug 10, 2009.

  1. joy99

    joy99 Guest

    Dear Group,

    I am using Python 2.6 on Windows XP with Service Pack 2. My GUI is IDLE.
    I am using Hindi resources and get nice output like:
    à¤à¤•
    where I can use all the re functions and other functions without doing
    any transliteration, etc.
    I was trying to use Bengali, but it gives me output like:
    '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'
    I wanted to see Bengali output as
    অনেক
    and I would like to use all functions, including re.
    I would be grateful if anyone could help me with that.
    Best Regards,
    Subhabrata.
    joy99, Aug 10, 2009
    #1

  2. joy99 wrote:
    > [...] it is giving me output like:
    > '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'

    ^^^^^^^^^^^^

    These three bytes are the UTF-8 encoding of the byte order mark (BOM,
    U+FEFF). They are followed by the code points U+0985, U+09A8, U+09C7
    and U+0995 (look them up on unicode.org to see what they are).
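    A minimal sketch of that byte-level reading, written for Python 3
    rather than the Python 2.6 used in this thread. The variable names are
    only illustrative; codecs.BOM_UTF8 is the standard-library constant
    for the three BOM bytes.

```python
import codecs

# The byte string from the original post, as a Python 3 bytes literal.
raw = b'\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'

# Check for the UTF-8 BOM and cut it off before decoding.
has_bom = raw.startswith(codecs.BOM_UTF8)
payload = raw[len(codecs.BOM_UTF8):] if has_bom else raw

# Decode the remaining bytes into text (one str character per code point).
text = payload.decode('utf-8')

# Show which bytes each code point came from.
for ch in text:
    print('U+%04X' % ord(ch), ch.encode('utf-8'))
```

    Each Bengali character here occupies three bytes in UTF-8, which is
    why the 15-byte string holds a BOM plus only four characters.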

    In any case, if this is what your program outputs, an encoding/decoding
    step is missing somewhere. You mentioned that it works in one case but
    not in another. Since you didn't provide any information on how to
    reproduce what you saw, any further help is guesswork at best.

    Uli

    --
    Sator Laser GmbH
    Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
    Ulrich Eckhardt, Aug 10, 2009
    #2

  3. >>>>> joy99 <> (j) wrote:

    >j> Dear Group,
    >j> I am using Python26 on WindowsXP with service pack2. My GUI is IDLE.
    >j> I am using Hindi resources and get nice output like:
    >j> à¤à¤•
    >j> where I can use all the re functions and other functions without doing
    >j> any transliteration,etc.
    >j> I was trying to use Bengali but it is giving me output like:
    >j> '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'
    >j> I wanted to see Bengali output as
    >j> অনেক
    >j> and I like to use all functions including re.
    >j> If any one can help me on that.
    >j> Best Regards,
    >j> Subhabrata.


    Make sure your stdout (in case you use print) has utf-8 encoding. This
    might be problematic on Windows, however.

    >>> print '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'
    অনেক

    Or if you write to a file, open it with utf-8 encoding.

    I suggest UTF-8 because it is generally the preferred encoding for
    non-ASCII text. Bengali may have a different preferred encoding, though.
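    A rough sketch of the file route, written for Python 3 (io.open also
    exists in Python 2.6). It assumes the bytes came from a file saved
    with a BOM, which the thread does not confirm; the file names and the
    temporary directory are made up for the illustration.

```python
import io
import os
import tempfile

# The byte string from the original post, as a Python 3 bytes literal.
raw = b'\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'

# Hypothetical scratch files, used only for this illustration.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'input.txt')
dst = os.path.join(workdir, 'output.txt')

# Simulate an input file that starts with a UTF-8 BOM.
with io.open(src, 'wb') as f:
    f.write(raw)

# 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present.
with io.open(src, 'r', encoding='utf-8-sig') as f:
    text = f.read()

# Write the text back out as plain UTF-8, without a BOM.
with io.open(dst, 'w', encoding='utf-8') as f:
    f.write(text)

# Read the result back as bytes to see what landed on disk.
with io.open(dst, 'rb') as f:
    written = f.read()
```

    Naming the encoding in every open call keeps the result independent
    of the platform default, which on Windows is usually not UTF-8.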
    --
    Piet van Oostrum <>
    URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
    Piet van Oostrum, Aug 10, 2009
    #3
  4. John Machin

    John Machin Guest

    On Aug 10, 9:26 pm, joy99 <> wrote:
    > Dear Group,
    >
    > I am using Python26 on WindowsXP with service pack2. My GUI is IDLE.
    > I am using Hindi resources and get nice output like:
    > à¤à¤•
    > where I can use all the re functions and other functions without doing
    > any transliteration,etc.
    > I was trying to use Bengali but it is giving me output like:


    WHAT is giving you this output?

    > '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'


    In a very ordinary IDLE session (Win XP SP3, Python 2.6.2, locale:
    Australia/English, no "Hindi resources"):

    >>> x = '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'
    >>> ux = x.decode('utf-8')
    >>> ux
    u'\ufeff\u0985\u09a8\u09c7\u0995'
    >>> print ux
    অনেক # looks like what you wanted; please confirm
    >>> import unicodedata
    >>> for c in ux:
            print unicodedata.name(c)

    ZERO WIDTH NO-BREAK SPACE # this is a BOM
    BENGALI LETTER A
    BENGALI LETTER NA
    BENGALI VOWEL SIGN E
    BENGALI LETTER KA
    >>>


    > I wanted to see Bengali output as
    > অনেক
    > and I like to use all functions including re.
    > If any one can help me on that.


    "I am using Hindi resources" doesn't tell us much ... except to prompt
    the comment that perhaps if you want to display Bengali script, you
    may need Bengali resources. However it looks like I can display your
    Bengali data without any special resources.
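    Once the bytes are decoded, re can work on the Bengali text like on
    any other string. A minimal sketch, written for Python 3 rather than
    Python 2.6; the sample sentence and the names are made up for
    illustration. It matches an explicit Bengali block range
    (U+0980 to U+09FF) instead of relying on \w, which sidesteps the
    question of how \w treats combining vowel signs.

```python
import re

# The byte string from the original post, as a Python 3 bytes literal.
raw = b'\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'

# Decode to text and drop the leading BOM in one step.
text = raw.decode('utf-8-sig')

# Hypothetical sample: the Bengali word between two ASCII words.
sample = 'abc ' + text + ' xyz'

# Match runs of characters from the Bengali Unicode block.
bengali_run = re.compile('[\u0980-\u09FF]+')
matches = bengali_run.findall(sample)

print(matches)
```

    Matching on the decoded text rather than on the raw bytes means each
    character counts once, so the word is 4 characters long, not 12 bytes.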

    It seems like you are not doing the same with Bengali as you are doing
    with Hindi. We can't help you very much if you don't show exactly what
    you are doing.

    Have you considered asking in an Indian Python forum? Note: you will
    still need to say what you are doing that works with Hindi but not
    with Bengali.

    Cheers,
    John
    John Machin, Aug 11, 2009
    #4
  5. joy99

    joy99 Guest

    On Aug 11, 1:17 pm, John Machin <> wrote:
    > On Aug 10, 9:26 pm, joy99 <> wrote:
    >
    > > Dear Group,

    >
    > > I am using Python26 on WindowsXP with service pack2. My GUI is IDLE.
    > > I am using Hindi resources and get nice output like:
    > > à¤à¤•
    > > where I can use all the re functions and other functions without doing
    > > any transliteration,etc.
    > > I was trying to use Bengali but it is giving me output like:

    >
    > WHAT is giving you this output?
    >
    > > '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'

    >
    > In a very ordinary IDLE session (Win XP SP3, Python 2.6.2, locale:
    > Australia/English, no "Hindi resources"):
    >
    > >>> x = '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'
    > >>> ux = x.decode('utf-8')
    > >>> ux
    > u'\ufeff\u0985\u09a8\u09c7\u0995'
    > >>> print ux
    > অনেক # looks like what you wanted; please confirm
    > >>> import unicodedata
    > >>> for c in ux:
    >         print unicodedata.name(c)
    >
    > ZERO WIDTH NO-BREAK SPACE # this is a BOM
    > BENGALI LETTER A
    > BENGALI LETTER NA
    > BENGALI VOWEL SIGN E
    > BENGALI LETTER KA
    >
    >
    >
    > > I wanted to see Bengali output as
    > > অনেক
    > > and I like to use all functions including re.
    > > If any one can help me on that.

    >
    > "I am using Hindi resources" doesn't tell us much ... except to prompt
    > the comment that perhaps if you want to display Bengali script, you
    > may need Bengali resources. However it looks like I can display your
    > Bengali data without any special resources.
    >
    > It seems like you are not doing the same with Bengali as you are doing
    > with Hindi. We can't help you very much if you don't show exactly what
    > you are doing.
    >
    > Have you considered asking in an Indian Python forum? Note: you will
    > still need to say what you are doing that works with Hindi but not
    > with Bengali.
    >
    > Cheers,
    > John


    Dear Group,
    I have already worked out a solution, but every one of your answers
    helped me see the problem from a different angle. Thank you for that.
    I am building a social network program in Bengali. I posted only the
    transliteration problem that was giving me trouble; I thought that,
    as you are expert Pythoneers, it would be a matter of minutes for you.
    From your answers I saw that I was not wrong. Since I had already
    solved the problem, I checked the thread a bit late. Sorry about that.
    Best Regards,
    Subhabrata.
    joy99, Aug 16, 2009
    #5
