Question regarding handling of Unicode data in Devnagari

joy99

Dear Group,

As per the standard published by Unicode for the Devnagari script,
used for Hindi and some other languages of India, we have a standard
range, 0900-097F,
in which each character has a number,
like 0904 for Devnagari letter short a, etc.
Now, if I write a program

where
ch="0904"
and I would like to see the Devnagari letter short a as output, then how
should I proceed? Can codecs help me or should I use unicodedata?

If you can kindly help me.

Best Regards,
Subhabrata.
 
MRAB

That number is hexadecimal, so the character/codepoint is unichr(int(ch,
16)) in Python 2.x.
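
For readers on Python 3, here is a minimal sketch of the same conversion. It assumes the code point arrives as a hexadecimal string, as in the question; the helper name char_from_hex is made up for illustration. In Python 3, unichr() no longer exists and chr() covers the full Unicode range.

```python
import unicodedata

def char_from_hex(hex_code):
    """Return the character for a hexadecimal code point string such as "0904"."""
    # int(..., 16) parses the hex digits; chr() replaces Python 2's unichr().
    return chr(int(hex_code, 16))

ch = char_from_hex("0904")
# Printing the name is safe on any terminal; print(ch) shows the glyph itself
# only if the terminal's encoding and font support it.
print(unicodedata.name(ch))
```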
 
Mark Tolonen

Here are a number of ways to generate a Unicode character. Displaying them
is another matter. My newsreader program could display them properly, but
the interactive window in my Python editor could not.

import unicodedata

c = unichr(0x904)
print c, unicodedata.name(c)
print u'\N{DEVANAGARI LETTER SHORT A}'
print u'\u0904'
print u''.join(unichr(c) for c in range(0x900, 0x980))

OUTPUT
ऄ DEVANAGARI LETTER SHORT A
ऄ
ऄ
ऀँंःऄअआइईउऊऋऌऍऎएऐऑऒओऔकखगघङचछजझञटठडढणतथदधनऩपफबभमयरऱलळऴवशषसहऺऻ़ऽािीुूृॄॅॆेैॉॊोौ्ॎॏॐ॒॑॓॔ॕॖॗक़ख़ग़ज़ड़ढ़फ़य़ॠॡॢॣ।॥०१२३४५६७८९॰ॱॲॳॴॵॶॷॸॹॺॻॼॽॾॿ
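
A rough Python 3 counterpart of the listing above, offered as a sketch rather than as the poster's own code. It uses unicodedata.lookup(), the inverse of unicodedata.name(), and walks the same 0x900-0x97F range. The helper named_chars is a hypothetical name. Which code points carry a name depends on the Unicode version bundled with the interpreter, so unnamed ones are skipped instead of raising an error.

```python
import unicodedata

# Look a character up by its official Unicode name.
short_a = unicodedata.lookup("DEVANAGARI LETTER SHORT A")

def named_chars(start, stop):
    """Yield (code point, name) for every named character in range(start, stop)."""
    for cp in range(start, stop):
        # Passing a default avoids a ValueError for unassigned code points.
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            yield cp, name

devanagari = list(named_chars(0x900, 0x980))
# Print only code points and names, which are plain ASCII.
for cp, name in devanagari[:5]:
    print("U+%04X %s" % (cp, name))
```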

If you use an editor that can write Devnagari and save in an encoding such
as UTF-8, you can write Devnagari directly in the editor. You only need to
tell Python what encoding the source code is in. To actually see the correct
character, you'll also need a terminal that can display it, and you need to
know which encoding that terminal uses. For example, below is a program written using
Pythonwin from the pywin32 extensions (version 214). It can write programs
in most encodings and its interactive window supports UTF-8.
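
One way to check the terminal side before printing, sketched in Python 3. The function can_display is a hypothetical helper, not part of any library. It only tests whether the text can be encoded in the output encoding; whether the font has a glyph for it cannot be checked this way.

```python
import sys

def can_display(text, encoding=None):
    """Return True if `text` can be encoded in `encoding` (default: stdout's)."""
    # stdout's encoding may be missing or None when output is redirected;
    # fall back to ASCII as the most conservative assumption.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

text = "\u0904"  # DEVANAGARI LETTER SHORT A
if can_display(text):
    print(text)
else:
    # Show an escaped form instead of failing with UnicodeEncodeError.
    print(text.encode("unicode_escape").decode("ascii"))
```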

I can type Chinese and my fonts support it so I'll use that in this example.
This message is sent in UTF-8 so hopefully it displays properly for you.

# coding: gbk
encoded_text = '你好！你在干什么？'
Unicode_text = u'你好！你在干什么？'
print encoded_text
print encoded_text.decode('gbk')
print Unicode_text
print Unicode_text.encode('utf-8')

OUTPUT:
ţۃáţ՚ىʲôÿ
你好！你在干什么？
你好！你在干什么？
你好！你在干什么？

'encoded_text' is a byte string encoded in the encoding the file is saved in
(*not* what the #coding line declares... *you* have to make sure they agree!).
Since my terminal is UTF-8, the gbk-encoded line is garbage.

The 2nd line should be correct because it decoded the byte string to
Unicode. 'print' will automatically encode Unicode text in the terminal's
encoding. As long as the terminal's encoding and font support the Unicode
characters used (which in Pythonwin they do), the line will be correct.

The 3rd line works for the same reason the 2nd line does: the string is
already Unicode.

The 4th line works because it was explicitly encoded into UTF-8, and the
terminal supports it.
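
The same four cases can be reproduced in Python 3 without depending on a particular terminal. This is a sketch, not the original program. In Python 3, string literals are already Unicode, so the byte strings are created with an explicit encode(), and the "garbage" case is simulated by decoding GBK bytes as UTF-8.

```python
text = "你好！你在干什么？"

# Byte strings in two different encodings of the same text.
gbk_bytes = text.encode("gbk")
utf8_bytes = text.encode("utf-8")

# Decoding with the matching codec round-trips to the original text.
assert gbk_bytes.decode("gbk") == text
assert utf8_bytes.decode("utf-8") == text

# Interpreting GBK bytes as UTF-8 is what a UTF-8 terminal effectively does
# with the first print; errors="replace" substitutes U+FFFD for invalid bytes.
garbled = gbk_bytes.decode("utf-8", errors="replace")

print(len(gbk_bytes), len(utf8_bytes))
print(garbled == text)
```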

I hope this is useful to you.
-Mark
 
