Any way to designate the encoding of the "source" for compile?

janeaustine50 · May 14, 2005

Python's InteractiveInterpreter uses the built-in compile function.

According to the reference manual, compile doesn't seem to take the
encoding of the source string into account.

When I pass in a unicode object, it is encoded in UTF-8 automatically.
That can be a problem when I'm building an interactive environment using
"compile" with an encoding other than UTF-8. IDLE itself has the
same problem. ('<a string with non-ascii-encoding>' is handled correctly,
but u'<a string with non-ascii-encoding>' is handled incorrectly.)

Any suggestions, or any plans for future Python versions?

janeaustine50 · May 16, 2005

I've read a posting from Martin Von Loewis mentioning an attempt to build
in that feature (optionally specifying the encoding when calling "compile").
Does anyone know how that is going?

John Machin · May 16, 2005

I don't understand this. Suppose your "different encoding" is cp125x
(where x is a digit). Would you not do something like this?

compile_input = user_input.decode('cp125x')
code_object = compile(compile_input, ......
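
A minimal runnable sketch of the decode-then-compile idea above, written in Python 3 syntax rather than the Python 2.4 under discussion. It assumes cp1252 as a stand-in for "cp125x"; the sample input and the names namespace and greeting are made up for illustration.

```python
# Python 3 sketch: decode the raw user input first, then compile the text.
# Assumption: the input arrives as cp1252-encoded bytes.
user_input = b"greeting = 'caf\xe9'\n"

compile_input = user_input.decode("cp1252")   # bytes -> text
code_object = compile(compile_input, "<user input>", "exec")

namespace = {}
exec(code_object, namespace)
print(ascii(namespace["greeting"]))
```

Because compile only ever sees decoded text here, the encoding the user typed in does not affect the compiled literal.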

Firstly, it would help those who might be trying to help you if you
could post a simple example: input, output, what error message, what
you mean by 'is treated wrong' ... and when it comes to Unicode
objects (indeed any text), show us repr(text) -- "what you see is
often not what others see and often not what you've actually got".

Secondly, are any of the contents of PEP 263 of any use to you?
http://www.python.org/peps/pep-0263.html
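
For reference, a small sketch of how a PEP 263 declaration interacts with compile. It is written for Python 3, not the Python 2.4 discussed in this thread: there, when the source is passed as bytes, the coding line tells the parser how to decode it. The sample source and variable names are illustrative only.

```python
# Python 3 sketch: bytes source carrying a PEP 263 coding line.
# The two bytes \xc7\xd1 are one Hangul syllable in EUC-KR.
source = b"# -*- coding: euc-kr -*-\ntext = '\xc7\xd1'\n"

code_object = compile(source, "<euc-kr source>", "exec")

namespace = {}
exec(code_object, namespace)
print(ascii(namespace["text"]))
```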

janeaustine50 · May 17, 2005

Okay, I'll use one of the CJK codecs as the example. EUC-KR is the
default encoding.
u'\ud55c\uae00'

So I reckon that the "compile" should get a unicode object. However...

C:\Python24\Lib>python code.py

<string>(1)?()

(Pdb) c
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
Am I right to assume that the problem lies in code.py (and therefore
in codeop.py)? To correct the problem, it seems I would have to parse each
string and change the literal unicode object... Hmm... Sounds like a bad
approach.
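
One possible alternative to patching code.py, sketched in Python 3 syntax: decode each raw line before it reaches the interpreter. The helper run_encoded is hypothetical and is not part of the code module; only InteractiveInterpreter and its runsource method come from the standard library.

```python
import code


def run_encoded(interpreter, raw_line, encoding):
    """Decode raw_line with the given encoding, then run it.

    Returns True if the interpreter needs more input (incomplete statement).
    """
    return interpreter.runsource(raw_line.decode(encoding), "<console>")


namespace = {}
interpreter = code.InteractiveInterpreter(namespace)

# Assumption: the console delivers EUC-KR bytes.
needs_more = run_encoded(interpreter, b"word = '\xc7\xd1\xb1\xdb'", "euc-kr")
print(needs_more, ascii(namespace["word"]))
```

The literal is never parsed by hand; the whole line is decoded once and the interpreter only sees text.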

janeaustine50 · May 17, 2005

Oh, I forgot one more thing.

C:\Python24\Lib>python
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

John Machin · May 17, 2005

==== This is *EXACTLY* what your problem is =====================================================

==== It would have helped had you followed this ==========

===========================================================

# There's a very strong assumption that the above was originally
encoded in euc-kr but by the time I copied the 2 chars out of my
browser it was definitely Unicode. See what I mean about using repr()?
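
To illustrate why repr() matters, here is a short Python 3 sketch (ascii() plays the role that repr() had for unicode objects in Python 2). It takes the two Hangul characters from the example and shows that the same text looks very different as code points, as EUC-KR bytes, and as UTF-8 bytes.

```python
# Python 3 sketch: one piece of text, three unambiguous views of it.
text = "\ud55c\uae00"              # the two Hangul characters from the example

print(ascii(text))                 # code points, independent of any display
print(text.encode("euc-kr"))       # the bytes an EUC-KR environment would hold
print(text.encode("utf-8"))        # the bytes a UTF-8 environment would hold
```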

[big snip]

Like I said, *ALL* you have to do (like in any other Unicode-aware
app) is decode your user input into Unicode (you *don't* need to parse
bits and pieces of it) and feed it in ... like this:

HTH,
John

janeaustine50 · May 17, 2005

John said:
Like I said, *ALL* you have to do (like in any other Unicode-aware
app) is decode your user input into Unicode (you *don't* need to parse
bits and pieces of it) and feed it in ... like this:

HTH,
John

Serge.Orlov · May 17, 2005

That is the problem. Non-ascii characters in byte strings are
deprecated. Here is what I get when I run a deprecated hello
world program in Russian:
------- hello.py ---------
print "Здравствуй, мир!"
--------------------------
C:\py>c:\Python24\python.exe hello.py
sys:1: DeprecationWarning: Non-ASCII character '\xc7' in file
text.py on line 1, but no encoding declared; see
http://www.python.org/peps/pep-0263.html for details
╟фЁртёЄтєщ, ьшЁ!
--------------------------
Oops, not only is there a warning, but it doesn't even work
on Windows in a Russian locale. To correct the program I need
to switch to unicode strings:
------- hello.py ---------
# -*- coding: windows-1251 -*-
print u"Здравствуй, мир!"
--------------------------
C:\py>c:\Python24\python.exe hello.py
Здравствуй, мир!
--------------------------
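
The garbled output of the first program can be reproduced as a plain codec mismatch. This Python 3 sketch assumes the file was saved as windows-1251 and the console displayed bytes as cp866 (the usual Russian OEM code page); the console code page is an assumption, not something stated above.

```python
# Python 3 sketch: bytes written in one code page, displayed in another.
original = "Здравствуй, мир!"

raw = original.encode("cp1251")    # bytes as stored in the source file
print(hex(raw[0]))                 # first byte, the one named in the warning
print(raw.decode("cp866"))         # what a cp866 console would display
print(raw.decode("cp1251"))        # decoding with the right codec round-trips
```

With a unicode string and a declared source encoding, the interpreter knows the characters and can encode them for the console itself, which is why the second program prints correctly.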

Since non-ascii characters are deprecated in byte strings,
any non-ascii encoding for sys.getdefaultencoding() is
deprecated as well. Don't set it to 'euc-kr'.

janeaustine50 said:
Any suggestions or any plans in future python versions?

In python 3.0 byte strings will be gone. So you won't be
able to put non-ascii characters into them.
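
As Python 3 was eventually released, a separate bytes type still exists, but its literals accept only ASCII characters, while ordinary string literals are Unicode text. A small sketch of that behaviour follows; the helper compiles is hypothetical and exists only for this illustration.

```python
def compiles(source):
    """Return True if source compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


text_literal = compiles('uni = "\ud55c"')          # Hangul in a text literal
bytes_literal = compiles('raw = b"\ud55c"')        # same character in b"..."
escaped_bytes = compiles(r'raw = b"\xc7\xd1"')     # bytes spelled with escapes

print(text_literal, bytes_literal, escaped_bytes)
```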

Serge.

Serge.Orlov · May 17, 2005

janeaustine50 said:
Thank you but there is still a problem.

|>>> s='euckr="\xc7\xd1";uni=u"\xc7\xd1"'
|>>> su=s.decode('euc-kr')
|>>> su
|u'euckr="\ud55c";uni=u"\ud55c"'

su[7] is a non-ascii character inside the byte string euckr

|>>> c=compile(su,'','single')
|>>> exec c
|>>> euckr,uni
|('\xed\x95\x9c', u'\ud55c')
|>>>

As you see, the plain (byte) string's value is turned into UTF-8 encoding.

See my previous message. Non-ascii characters in byte strings
are deprecated.

Serge.
