Problem Regarding Handling of Unicode string

joy99

Dear Group,

I am using Python 2.6 on Windows XP with Service Pack 2. My GUI is IDLE.
I am using Hindi resources and get nice output like:
à¤à¤•
where I can use all the re functions and other functions without doing
any transliteration, etc.
I was trying to use Bengali, but it is giving me output like:
'\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'
I wanted to see the Bengali output as
অনেক
and I would like to use all functions, including re.
Can anyone help me with that?
Best Regards,
Subhabrata.
 
Ulrich Eckhardt

joy99 said:
[...] it is giving me output like:
'\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'
^^^^^^^^^^^^

These three bytes encode the byte order mark (BOM, Unicode U+FEFF) as
UTF-8. The bytes after them encode the codepoints U+0985, U+09A8, U+09C7
and U+0995 (look up what those are on unicode.org).
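
As a minimal sketch of that byte breakdown (assuming Python 3, whereas the thread uses Python 2.6; the variable names are only illustrative), decoding the reported bytes shows the BOM followed by four Bengali codepoints. The standard `utf-8-sig` codec drops a leading BOM, which plain `utf-8` keeps:

```python
import codecs

# The byte string from the original post.
raw = b'\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'

# The first three bytes are the UTF-8 encoding of U+FEFF (the BOM).
has_bom = raw.startswith(codecs.BOM_UTF8)

# Plain 'utf-8' keeps the BOM as a character; 'utf-8-sig' strips it.
with_bom = raw.decode('utf-8')
without_bom = raw.decode('utf-8-sig')

# List every decoded character as a U+XXXX codepoint.
codepoints = ['U+%04X' % ord(ch) for ch in with_bom]

print(has_bom)
print(codepoints)
print(len(with_bom), len(without_bom))
```

If the data comes from a file saved by a Windows editor, decoding with `utf-8-sig` at read time is probably the simplest way to keep the BOM out of later string processing.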

In any case, if this is produced as output, an encoding or decoding step
is missing somewhere. You mentioned that it works in one case but
doesn't in another. Since you didn't provide any information on how to
reproduce what you saw, any further help is guesswork at best.
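
To illustrate why the decoding step matters for the re module, here is a hedged sketch (again assuming Python 3; the pattern and variable names are my own, not from the original post). Once the bytes are decoded, a character class over the Bengali block (U+0980 to U+09FF) matches whole characters instead of individual bytes:

```python
import re

# The byte string from the original post.
raw = b'\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'

# Decode once at the input boundary; 'utf-8-sig' also drops the BOM.
text = raw.decode('utf-8-sig')

# Illustrative pattern: runs of characters in the Bengali block.
bengali_run = re.compile(u'[\u0980-\u09ff]+')
words = bengali_run.findall(text)

# The word is 12 bytes before decoding but 4 characters after it.
print(len(raw) - len(b'\xef\xbb\xbf'), len(text))
print(words == [text])
```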

Uli
 
Piet van Oostrum

joy99 said:
j> Dear Group,
j> I am using Python 2.6 on Windows XP with Service Pack 2. My GUI is IDLE.
j> I am using Hindi resources and get nice output like:
j> à¤à¤•
j> where I can use all the re functions and other functions without doing
j> any transliteration, etc.
j> I was trying to use Bengali, but it is giving me output like:
j> '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'
j> I wanted to see the Bengali output as
j> অনেক
j> and I would like to use all functions, including re.
j> Can anyone help me with that?
j> Best Regards,
j> Subhabrata.

Make sure your stdout (in case you use print) has utf-8 encoding. This
might be problematic on Windows, however.
অনেক

Or if you write to a file, open it with utf-8 encoding.
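
A minimal sketch of the file case, assuming Python 3 (`io.open` with an `encoding` argument also exists in Python 2.6). The file name and temporary directory are hypothetical, chosen only for illustration:

```python
import io
import os
import tempfile

# The Bengali word from the post, already decoded and without a BOM.
text = u'\u0985\u09a8\u09c7\u0995'

# Hypothetical output path in a temporary directory, for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'bengali.txt')

# io.open takes an explicit encoding, so the text is encoded as UTF-8 on write.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Reading back in binary mode shows the bytes that ended up on disk.
with io.open(path, 'rb') as f:
    on_disk = f.read()

print(repr(on_disk))
```

Writing with `utf-8` does not add a BOM; the `utf-8-sig` codec would prepend one, which may be what produced the leading three bytes in the original data.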

I take utf-8 because in general this is the preferred encoding for
non-ASCII text. It could be that Bengali has a different preferred encoding.
 
John Machin

> Dear Group,
>
> I am using Python 2.6 on Windows XP with Service Pack 2. My GUI is IDLE.
> I am using Hindi resources and get nice output like:
> à¤à¤•
> where I can use all the re functions and other functions without doing
> any transliteration, etc.
> I was trying to use Bengali, but it is giving me output like:

WHAT is giving you this output?

> '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'

In a very ordinary IDLE session (Win XP SP3, Python 2.6.2, locale:
Australia/English, no "Hindi resources"):

>>> ux = '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'.decode('utf-8')
>>> ux
u'\ufeff\u0985\u09a8\u09c7\u0995'
>>> print ux
অনেক # looks like what you wanted; please confirm
>>> import unicodedata
>>> for c in ux:
...     print unicodedata.name(c)
...
ZERO WIDTH NO-BREAK SPACE # this is a BOM
BENGALI LETTER A
BENGALI LETTER NA
BENGALI VOWEL SIGN E
BENGALI LETTER KA

> I wanted to see the Bengali output as
> অনেক
> and I would like to use all functions, including re.
> Can anyone help me with that?

"I am using Hindi resources" doesn't tell us much ... except to prompt
the comment that perhaps if you want to display Bengali script, you
may need Bengali resources. However it looks like I can display your
Bengali data without any special resources.

It seems like you are not doing the same with Bengali as you are doing
with Hindi. We can't help you very much if you don't show exactly what
you are doing.

Have you considered asking in an Indian Python forum? Note: you will
still need to say what you are doing that works with Hindi but not
with Bengali.

Cheers,
John
 
joy99

John Machin said:
> WHAT is giving you this output?
>
> In a very ordinary IDLE session (Win XP SP3, Python 2.6.2, locale:
> Australia/English, no "Hindi resources"):
>
> >>> ux = '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'.decode('utf-8')
> >>> ux
> u'\ufeff\u0985\u09a8\u09c7\u0995'
> >>> print ux
> অনেক # looks like what you wanted; please confirm
> >>> import unicodedata
> >>> for c in ux:
> ...     print unicodedata.name(c)
> ...
> ZERO WIDTH NO-BREAK SPACE # this is a BOM
> BENGALI LETTER A
> BENGALI LETTER NA
> BENGALI VOWEL SIGN E
> BENGALI LETTER KA
>
> "I am using Hindi resources" doesn't tell us much ... except to prompt
> the comment that perhaps if you want to display Bengali script, you
> may need Bengali resources. However it looks like I can display your
> Bengali data without any special resources.
>
> It seems like you are not doing the same with Bengali as you are doing
> with Hindi. We can't help you very much if you don't show exactly what
> you are doing.
>
> Have you considered asking in an Indian Python forum? Note: you will
> still need to say what you are doing that works with Hindi but not
> with Bengali.
>
> Cheers,
> John

Dear Group,
I have already worked out my solution, but every one of your answers
helped me to see different solutions from different angles. Thank you
for that. I am building a social network program in Bengali. I
only posted the transliteration problem that was giving me trouble; I
thought that, as you are expert Pythoners, it would be a matter of
minutes. From your answers I saw that I was not wrong. But as I had
already solved the problem, I checked back a bit late. Sorry for that.
Best Regards,
Subhabrata.
 
