Encoding for Devanagari Script.

Atul. · Jul 24, 2008

Hello All,

I wanted to know which encoding I should use to open files containing
Devanagari characters. I was thinking of UTF-8 but was not sure. Any
leads on this? Has anyone used it before?

Thanks in Advance.

Regards,
Atul.

Fredrik Lundh · Jul 24, 2008

Atul. wrote:

I wanted to know which encoding I should use to open files containing
Devanagari characters. I was thinking of UTF-8 but was not sure. Any
leads on this? Has anyone used it before?

Are we talking about existing files? If you don't know what encoding
the files use, you could always try using the UTF-8 codec; it's very
likely to complain if you're attempting to decode something that isn't
UTF-8.
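
A minimal sketch of that "try it and see whether it complains" check.
The helper name looks_like_utf8 and the two throwaway files are
assumptions made for illustration; they are not from the thread.

```python
import os
import tempfile


def looks_like_utf8(path):
    """Return True if the whole file decodes as UTF-8, False otherwise."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # The codec "complains" by raising when the bytes are not valid UTF-8.
        return False
    return True


# Two throwaway files (hypothetical names) so the sketch runs anywhere.
tmpdir = tempfile.mkdtemp()
good = os.path.join(tmpdir, "good.txt")
bad = os.path.join(tmpdir, "bad.txt")
with open(good, "wb") as f:
    f.write(u"\u0928\u092e\u0938\u094d\u0924\u0947".encode("utf-8"))
with open(bad, "wb") as f:
    f.write(b"\xa8\xb5\xdb")  # stray high bytes, not a valid UTF-8 sequence

print(looks_like_utf8(good))
print(looks_like_utf8(bad))
```

A clean decode is strong evidence rather than proof: pure ASCII data,
for instance, decodes as UTF-8 and as many other encodings alike.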

If that doesn't work, it's a bit trickier -- there are several ways to
encode Unicode, and then there's ISCII as well. If you cannot sort it
out, try reading the first few dozen bytes of one of your files in
binary mode and printing their repr, then post the result; chances are
that someone will be able to identify the encoding.
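
A minimal sketch of such a check, assuming it amounts to reading the
first 32 bytes in binary mode and printing their repr (the follow-up
post below reports doing that). The helper name peek and the sample
file are hypothetical.

```python
import os
import tempfile


def peek(path, count=32):
    """Return the first `count` bytes of a file, read in binary mode."""
    with open(path, "rb") as f:
        return f.read(count)


# Throwaway sample file (hypothetical name) so the sketch runs anywhere.
sample = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample, "wb") as f:
    payload = u"\u0928\u092e\u0938\u094d\u0924\u0947".encode("utf-8")
    f.write(b"\xef\xbb\xbf" + payload * 20)

head = peek(sample)
print(repr(head))  # paste this output into a post
```

Binary mode matters here: it returns the bytes exactly as stored, with
no decoding or newline translation applied.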

</F>

Terry Reedy · Jul 25, 2008

Atul. said:
Hello All,

I wanted to know which encoding I should use to open files containing
Devanagari characters. I was thinking of UTF-8 but was not sure. Any
leads on this? Has anyone used it before?

You cannot hurt your machine by giving that a try.

This is a general comment for all beginners. Before posting, open the
interactive interpreter (or IDLE) and try something(s). If the result
puzzles you, copy and paste into a post. Or if more appropriate, open
the Python manuals and search a bit, or try a search engine.

Atul. · Jul 28, 2008

Hi Fredrik and Terry,

Well, I got this in IDLE; I think I have done something wrong.

Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
f = open("C:\Documents and Settings\admin\My Documents\corpus
\dainaikAikya collected by sushant.txt","r", "utf_8")
TypeError: an integer is required

After that I opened the file in binary read mode, read the first 32
bytes, and this is what I got:
'\xef\xbb\xbf\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa5\x80
\xe0\xa4\xa6\xe0\xa4\xbf\xe0\xa4\xb2\xe0\xa5\x8d
\xe0\xa4\xb2\xe0\xa5\x80,'

Now based on my knowledge of Unicode I think this is a utf-8 file (the
first 3 bytes \xef\xbb\xbf), please correct me if I am wrong. How do I
read this?
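
The BOM reasoning above can be checked directly. This is a sketch
under the assumption that the bytes are exactly those shown, with the
wrapped lines joined; anything lost at the line wraps is not
recoverable from the post.

```python
import codecs

# Bytes as reported in the post, wrapped lines joined (an assumption).
head = (b"\xef\xbb\xbf\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa5\x80"
        b"\xe0\xa4\xa6\xe0\xa4\xbf\xe0\xa4\xb2\xe0\xa5\x8d"
        b"\xe0\xa4\xb2\xe0\xa5\x80,")

# codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
has_bom = head.startswith(codecs.BOM_UTF8)

text = head.decode("utf-8-sig")   # decodes and drops a leading BOM
with_bom = head.decode("utf-8")   # decodes but keeps the BOM as U+FEFF

print(has_bom)
print(repr(text))
print(repr(with_bom))
```

The "utf-8-sig" codec is the convenient choice for files like this one,
since plain "utf-8" leaves U+FEFF as the first character of the text.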

Atul.

PS: I wrote the above code using the information from the Library
Reference pdf section 4.8 "Codecs". Am I doing something wrong? Please
do let me know.

Tim Golden · Jul 28, 2008

Atul. said:
Hi Fredrik and Terry,

Well, I got this in IDLE; I think I have done something wrong.

Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
f = open("C:\Documents and Settings\admin\My Documents\corpus
\dainaikAikya collected by sushant.txt","r", "utf_8")
TypeError: an integer is required

PS: I wrote the above code using the information from the Library
Reference pdf section 4.8 "Codecs". Am I doing something wrong? Please
do let me know.

Only slightly. You're importing the codecs module
but you're not using it. So you're *actually* using
the built-in open function, which doesn't have an
encoding parameter. It does have a third param
which is to do with the buffer size.

Just change your code to use codecs.open ("...")
and, I suggest, either use raw strings for your
filename (r"c:\docume...") or use the other kind
of slash ("c:/documen..."). Otherwise you might
run into some problems.
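
A minimal sketch of both suggestions. The short path "c:\admin", the
temporary file, and the choice of the "utf_8_sig" codec (to drop the
BOM seen in the byte dump) are assumptions for illustration, not part
of the original code.

```python
import codecs
import os
import tempfile

# "\a" inside an ordinary string literal is the BEL control character,
# so the backslash and the "a" collapse into a single character.
escaped = "c:\admin"
raw = r"c:\admin"  # raw string: the backslash is kept as written
print(len(escaped), len(raw))

# Throwaway UTF-8 file with a BOM (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + u"\u0928\u0935\u0940".encode("utf-8"))

# codecs.open takes the encoding as its third argument.
with codecs.open(path, "r", "utf_8_sig") as f:
    text = f.read()

print(repr(text))
```

On Python 3 the built-in open accepts an encoding argument itself, for
example open(path, encoding="utf-8-sig"), so codecs.open is mainly of
interest for Python 2 code like that in this thread.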

TJG

Atul. · Jul 28, 2008

Thanks, Tim, that did work. I will proceed with my playing around now.

Thanks a ton.

Atul.
