Goran Novosel
Hi everybody.
I've played with encodings in Python for a few hours, but it's still
somewhat confusing to me. So I've written a test file (encoded as UTF-8)
and put everything I think is true in the comment at the beginning of
the script. Could you check whether it's correct? (On a side note, the
script does what I intended it to do.)
One more thing: is there some mechanism to avoid writing
'something'.decode('utf-8') all the time? Some sort of function call to
tell the Python interpreter that I'd like implicit decoding with a
specified encoding for all string constants in the script?
Here's my script:
-------------------
# vim: set encoding=utf-8 :
"""
----- encoding and py -----
- 1st (or 2nd) line tells py interpreter encoding of file
- if this line is missing, interpreter assumes 'ascii'
- it's possible to use variations of first line
- the first or second line must match the regular expression
"coding[:=]\s*([-\w.]+)" (PEP-0263)
- some variations:
'''
# coding=<encoding name>
'''
'''
#!/usr/bin/python
# -*- coding: <encoding name> -*-
'''
'''
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
'''
- this version works for my vim:
'''
# vim: set encoding=utf-8 :
'''
- constants can be given via str.decode() method or via unicode
constructor
- if locale is used, it shouldn't be set to 'LC_ALL' as it changes
encoding
"""
import datetime, locale
#locale.setlocale(locale.LC_ALL,'croatian') # changes encoding
locale.setlocale(locale.LC_TIME,'croatian') # sets correct date format, but encoding is left alone
print 'default locale:', locale.getdefaultlocale()
s='abcdef ČčĆćĐđŠšŽž'.decode('utf-8')
ss=unicode('ab ČćŠđŽ','utf-8')
# date part of string is decoded as cp1250, because that is the encoding of the default locale
all=datetime.date(2000,1,6).strftime("'%d.%m.%Y.', %x, %A, %B, ").decode('cp1250')+'%s, %s' % (s, ss)
print all
-------------------
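To sanity-check the regex claim from the comment above, here is a minimal sketch that applies the pattern quoted from PEP-0263 to the header variations listed there. The helper name declared_encoding and the sample lines are made up for illustration. It only tests the regex itself, not the interpreter's further requirement that the declaration be a comment on line 1 or 2.

```python
import re

# Pattern as quoted in the docstring above (PEP-0263).
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(line):
    """Return the encoding name declared on a source line, or None.

    Hypothetical helper: it checks the regex only, not the line position.
    """
    match = CODING_RE.search(line)
    return match.group(1) if match else None


# Illustrative sample lines mirroring the variations in the docstring.
samples = [
    "# coding=utf-8",
    "# -*- coding: utf-8 -*-",
    "# vim: set fileencoding=utf-8 :",
    "# vim: set encoding=utf-8 :",
    "#!/usr/bin/python",
]

for line in samples:
    print("%-35s -> %s" % (line, declared_encoding(line)))
```

If this is right, the plain "encoding=" vim modeline is picked up only because the word "encoding" happens to contain "coding", and the shebang line alone declares nothing.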