G
glacier
I use chinese charactors as an example here.
My first question is : what strategy does 'decode' use to tell the way
to seperate the words. I mean since s1 is an multi-bytes-char string,
how did it determine to seperate the string every 2bytes or 1byte?
My second question is: is there any one who has tested very long mbcs
decode? I tried to decode a long(20+MB) xml yesterday, which turns out
to be very strange and cause SAX fail to parse the decoded string.
However, I use another text editor to convert the file to utf-8 and
SAX will parse the content successfully.
I'm not sure if some special byte array or too long text caused this
problem. Or maybe thats a BUG of python 2.5?
My first question is : what strategy does 'decode' use to tell the way
to seperate the words. I mean since s1 is an multi-bytes-char string,
how did it determine to seperate the string every 2bytes or 1byte?
My second question is: is there any one who has tested very long mbcs
decode? I tried to decode a long(20+MB) xml yesterday, which turns out
to be very strange and cause SAX fail to parse the decoded string.
However, I use another text editor to convert the file to utf-8 and
SAX will parse the content successfully.
I'm not sure if some special byte array or too long text caused this
problem. Or maybe thats a BUG of python 2.5?