Python3 - encoding issues

DreiJane

Hello,

first I must apologize to those of you whose mailboxes were
flooded by my last announcement of depikt. I get no emails from
this list myself, and when I made my corrections and posted each of the
slightly improved versions, I wasn't aware of the extra emails that
produces. Sorry!

I read here recently that some regard Python 3 as worse at encoding
issues than earlier versions. For me, a German, quite the contrary is
true. The automatic conversion without an exception in versions before 3
has caused pain upon pain over the last years. Only a few weeks ago
pygtk suddenly returned UTF-8 instead of unicode, and my
software sent out a lot of garbled, automatically written emails
before I saw the mess. Python 3 would have raised exceptions. However,
the translation of my software to 3 has only just begun.
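
The difference described above can be sketched in a few lines of Python 3. This is only an illustration; the variable names are made up for the example and are not taken from the software mentioned in this post.

```python
# Python 3: text (str) and encoded data (bytes) never mix implicitly.
text = "Grüße"                  # str, a sequence of Unicode code points
data = text.encode("utf-8")     # bytes, the UTF-8 encoded form

try:
    mixed = text + data         # Python 2 would have attempted a silent conversion here
except TypeError as exc:
    error = exc                 # Python 3 refuses and raises TypeError
else:
    error = None

# Getting back to text always takes an explicit decode.
roundtrip = data.decode("utf-8")
```

The point is that the mistake surfaces at the line where the two types meet, not later in the output.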

Now there is a concept of two separate worlds, and I have decided to
use bytes for my software. That is the string representation that output
needs anyway, and with depikt and a modified apsw (and with file reads anyway)
or other database APIs (internally they all understand UTF-8) I can
get UTF-8 for all input too.

This means that I do not have the standard string methods, but
substitutes are easily made. Not as a subclass of bytes, since that
wouldn't have the b"...." initialization; thus only in the form of
functions. Here are some of my utools:

u0 = "".encode('utf-8')
def u(s):
    if type(s) in (int, float, type): s = str(s)
    if type(s) == str: return s.encode("utf-8")
    if type(s) == bytes: # we keep the two worlds cleanly separated
        raise TypeError(b"argument is bytes already")
    raise TypeError(b"Bad argument for utf-encoding")

def u_startswith(s, test):
    # compare the leading slice; always returns True or False
    return s[:len(test)] == test

def u_endswith(s, test):
    # explicit start index, so that an empty test matches as well
    return s[len(s) - len(test):] == test

def u_split(s, splitter):
    # like the original loop, empty pieces are dropped
    ret = []
    while s and splitter in s:
        if u_startswith(s, splitter):
            s = s[len(splitter):]; continue
        i = s.index(splitter)
        ret.append(s[:i])
        s = s[i:]
    if s: ret.append(s)
    return ret

def u_join(joiner, l):
    # returns bytes; an empty list gives the empty value u0
    if len(l) == 0: return u0
    ret = l[0]
    for part in l[1:]:
        ret = ret + joiner + part
    return ret

(Not all of them have the standard signatures.) Writing them is trivial. Note
u0: unfortunately b"" doesn't work as expected at all, as I had to learn
the hard way.

Looking more closely at these functions, one sees that they only use the
sequence protocol. "index" is in the sequence protocol too now; the
library reference still has to be updated there. Thus all of these, and
many more string methods, could be moved into the sequence protocol
without much work, and then nobody would have to write all this. This
doesn't only affect string-like objects: split and join for lists
could open up interesting possibilities, for example for list
representations of trees.
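
As a rough sketch of what such sequence-level helpers might look like, the functions below use only len(), slicing, comparison and concatenation, so they work on lists, tuples and bytes alike. The names seq_startswith, seq_split and seq_join are hypothetical and exist neither in Python nor in the utools above.

```python
def seq_startswith(seq, prefix):
    """True if seq begins with prefix; uses only len() and slicing."""
    return seq[:len(prefix)] == prefix

def seq_split(seq, sep):
    """Split seq at every occurrence of the non-empty sub-sequence sep."""
    if len(sep) == 0:
        raise ValueError("empty separator")
    parts, start, i = [], 0, 0
    while i <= len(seq) - len(sep):
        if seq[i:i + len(sep)] == sep:
            parts.append(seq[start:i])   # piece before the separator
            i += len(sep)
            start = i
        else:
            i += 1
    parts.append(seq[start:])            # remainder after the last separator
    return parts

def seq_join(joiner, parts):
    """Concatenate the non-empty list parts with joiner between the items."""
    parts = list(parts)
    result = parts[0]
    for part in parts[1:]:
        result = result + joiner + part
    return result

pieces = seq_split([1, 2, 0, 3, 0, 4], [0])
rebuilt = seq_join([0], pieces)
```

Unlike u_split above, this sketch keeps empty pieces, which is closer to how the built-in split behaves when given a separator.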

Does anybody want to make a PEP from this (I won't do so)?

Joost Behrends
 
Benjamin Peterson

DreiJane said:
Does anybody want to make a PEP from this (I won't do so)?

I will answer this query with a little interactive prompt session:

$ python3
Python 3.1.1 (r311:74480, Nov 14 2009, 13:56:40)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> data = b"Good, morning"
>>> data.startswith(b"Good")
True
>>> data.split(b", ")
[b'Good', b'morning']
>>> x = data.split(b", ")
>>> b", ".join(x)
b'Good, morning'

Bytes already have the basic string functions!
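
For reference, here is the same idea as a small script rather than a prompt session, with a few more of the bytes methods. It is only a sketch, and the variable names are chosen for the example.

```python
data = b"Good, morning"

starts = data.startswith(b"Good")
ends = data.endswith(b"morning")
parts = data.split(b", ")
joined = b", ".join(parts)
position = data.index(b"morning")

# The empty literal b"" is the same value as an encoded empty str.
empty_equal = (b"" == "".encode("utf-8"))

# Arguments have to be bytes as well; a str argument is rejected.
try:
    data.startswith("Good")
except TypeError:
    str_argument_rejected = True
else:
    str_argument_rejected = False
```

The last part shows the one thing to watch for: every argument passed to these methods must itself be bytes.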
 
DreiJane

No, sorry, I must correct myself. There is a paragraph further down on
the quoted page.
".index" is still under "Mutable sequence types", but bytes are
treated below.
 
Benjamin Peterson

DreiJane said:
Ohhh - that's nice. But there is no word of that in the library reference
here:
http://docs.python.org/3.1/library/stdtypes.html#sequence-types-str-bytes-bytearray-list-tuple-range

That's because it's here:
http://docs.python.org/3.1/library/stdtypes.html#bytes-and-byte-array-methods
Still this fails:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'startswith'

Still methods of this kind would have a better place in the sequence
protocol.

You are welcome to bring this idea to the python-ideas list, just know that it
has a small chance of being accepted.
 
