Unicode in Python

Discussion in 'Python' started by Rustom Mody, Apr 23, 2014.

  1. Rustom Mody

    Rustom Mody Guest

    Chris Angelico wrote:
    > it's impossible for most people to type (and programming with a palette
    > of arbitrary syntactic tokens isn't my idea of fun)...


    Where's the suggestion to use a "palette of arbitrary tokens" ?

    I just tried a greek keyboard; ie do
    $ setxkbmap -option "grp:switch,grp:alt_shift_toggle,grp_led:scroll" -layout "us,gr"

    Thereafter typing
    abcdefghijklmnopqrstuvwxyz
    after a Shift-Alt
    gives
    αβψδεφγηιξκλμνοπ;ρστθωςχυζ

    One more Shift-Alt and back to roman

    IOW the extra typing cost for greek letters is negligible
    over the corresponding roman ones

    Of course
    - One would need to define such a keyboard (setxkb)
    - One would have to find similar technologies for other OSes (I'm on
    Debian; even Ubuntu/Unity grabs too many keys)
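The mapping quoted above can be reproduced in Python as a translation table. This is only a sketch: it assumes the garbled character in the pasted output stands for ρ (what the stock `gr` layout produces for `r`), and the names `US_KEYS`, `GR_KEYS` and `to_greek` are made up for illustration.

```python
# Sketch of the us -> gr key mapping quoted in the post.
US_KEYS = "abcdefghijklmnopqrstuvwxyz"
GR_KEYS = "αβψδεφγηιξκλμνοπ;ρστθωςχυζ"  # assumption: 'r' yields ρ

# str.maketrans builds a per-character lookup table from two equal-length strings.
US_TO_GR = str.maketrans(US_KEYS, GR_KEYS)


def to_greek(text):
    """Return what `text` would produce with the Greek group active."""
    return text.translate(US_TO_GR)


if __name__ == "__main__":
    print(to_greek("lambda"))  # λαμβδα
```

Characters outside a-z (digits, punctuation, capitals) pass through unchanged in this sketch; the real layout also remaps some of those.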
     
    Rustom Mody, Apr 23, 2014
    #1

  2. On Wed, Apr 23, 2014 at 3:31 PM, Rustom Mody <> wrote:
    > Chris Angelico wrote:
    >> it's impossible for most people to type (and programming with a palette
    >> of arbitrary syntactic tokens isn't my idea of fun)...

    >
    > Where's the suggestion to use a "palette of arbitrary tokens" ?
    >
    > I just tried a greek keyboard; ie do
    > $ setxkbmap -option "grp:switch,grp:alt_shift_toggle,grp_led:scroll" -layout "us,gr"
    >
    > Thereafter typing
    > abcdefghijklmnopqrstuvwxyz
    > after a Shift-Alt
    > gives
    > αβψδεφγηιξκλμνοπ;ρστθωςχυζ
    >
    > One more Shift-Alt and back to roman


    Okay. Now what about your other symbols? Your alternative assignment
    operator, for instance. How do you type that?

    ChrisA
     
    Chris Angelico, Apr 23, 2014
    #2

  3. On Tue, 22 Apr 2014 22:31:41 -0700, Rustom Mody wrote:

    > Chris Angelico wrote:
    >> it's impossible for most people to type (and programming with a palette
    >> of arbitrary syntactic tokens isn't my idea of fun)...

    >
    > Where's the suggestion to use a "palette of arbitrary tokens" ?
    >
    > I just tried a greek keyboard; ie do
    > $ setxkbmap -option "grp:switch,grp:alt_shift_toggle,grp_led:scroll"
    > -layout "us,gr"
    >
    > Thereafter typing
    > abcdefghijklmnopqrstuvwxyz
    > after a Shift-Alt
    > gives
    > αβψδεφγηιξκλμνοπ;ρστθωςχυζ
    >
    > One more Shift-Alt and back to roman
    >
    > IOW the extra typing cost for greek letters is negligible over the
    > corresponding roman ones



    25 Unicode characters down, 1114000+ to go :)
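For scale, here is a quick check in Python 3. It is a sketch: `sys.maxunicode` is the largest code point the interpreter supports (0x10FFFF on Python 3.3 and later), and the Greek string assumes the garbled character in the pasted output is ρ.

```python
import sys

# Number of code points in the Unicode code space (0 .. 0x10FFFF inclusive).
total_code_points = sys.maxunicode + 1

# Output pasted in the post for a..z; 'q' yields ';', so 25 of the 26 are letters.
greek_output = "αβψδεφγηιξκλμνοπ;ρστθωςχυζ"
greek_letters = [ch for ch in greek_output if ch.isalpha()]

print(total_code_points)   # 1114112
print(len(greek_letters))  # 25
```

Note that this counts code points, not assigned characters; most of that space is unassigned.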

    There's not just the keyboard mapping. There's the mental cost of knowing
    which keyboard mapping you need ("is it Greek, Hebrew, or maths
    symbols?"), the cost of remembering the mapping from the keys you see on
    the keyboard to the keys they are mapped to ("is Ω mapped to O or W?")
    and so forth. If you know lambda-calculus, you might associate λ with
    functions, but if you don't, it's as obfuscated as associating Ч with
    raising exceptions.

    if not isinstance(obj, int):
        ЧTypeError("expected an int, got %r" % type(obj))




    --
    Steven
     
    Steven D'Aprano, Apr 23, 2014
    #3
  4. On Tue, Apr 22, 2014 at 10:52 PM, Steven D'Aprano <> wrote:
    > There's not just the keyboard mapping. There's the mental cost of knowing
    > which keyboard mapping you need ("is it Greek, Hebrew, or maths
    > symbols?"), the cost of remembering the mapping from the keys you see on
    > the keyboard to the keys they are mapped to ("is Ω mapped to O or W?")
    > and so forth. If you know lambda-calculus, you might associate λ with
    > functions, [...]


    Or if you know Python and the name of the letter ("lambda").

    But yes, typing out the special characters is annoying. I just use
    words. The only downside to using words is, how do you specify capital
    versus lowercase letters? "Gamma = ..." violates the style guide! :(

    -- Devin
     
    Devin Jeanpierre, Apr 23, 2014
    #4
  5. Rustom Mody

    Rustom Mody Guest

    On Wednesday, April 23, 2014 11:22:33 AM UTC+5:30, Steven D'Aprano wrote:

    > 25 Unicode characters down, 1114000+ to go :)


    The question would arise if there was some suggestion to add
    1114000(+) characters to the syntactic/lexical definition of python.

    IOW while it's true that unicode is a character-set, it's better to think
    of it as a repertory -- here is the universal set from which a choice is available.

    On Wednesday, April 23, 2014 11:20:35 AM UTC+5:30, Chris Angelico wrote:
    > On Wed, Apr 23, 2014 at 3:31 PM, Rustom Mody wrote:
    > > Chris Angelico wrote:
    > >> it's impossible for most people to type (and programming with a palette
    > >> of arbitrary syntactic tokens isn't my idea of fun)...

    > > Where's the suggestion to use a "palette of arbitrary tokens" ?
    > > I just tried a greek keyboard; ie do
    > > $ setxkbmap -option "grp:switch,grp:alt_shift_toggle,grp_led:scroll" -layout "us,gr"
    > > Thereafter typing
    > > abcdefghijklmnopqrstuvwxyz
    > > after a Shift-Alt
    > > gives
    > > αβψδεφγηιξκλμνοπ;ρστθωςχυζ
    > > One more Shift-Alt and back to roman


    > Okay. Now what about your other symbols? Your alternative assignment
    > operator, for instance. How do you type that?


    In case you missed it, I said:

    > Of course
    > - One would need to define such a keyboard (setxkb)
    > - One would have to find similar technologies for other OSes


    In more detail:
    In our normal use of a US-104 keyboard, every letter 'costs' something.
    eg 'a' costs 1 keystroke
    'A' costs 2 (Shift+a)
    Most people do not count that as a significant cost.
    And when kids come on this list and talk SMS-ese ("i wanna do so-n-so"),
    we chide them for saving keystrokes at the cost of readability.

    In such a (default) setup typing a ∧ or ∨ is not possible at all without
    something like a char-picker and at best has an ergonomic cost that is an
    order of magnitude higher than the 'naturally available' characters.

    On the other hand when/if a keyboard mapping is defined in which
    the characters that are commonly needed are available, it is
    reasonable to expect the ∨,∧ to cost no more than 2 strokes each
    (ie about as much as an 'A'; slightly more than an 'a'). Which means
    that '∨' is expected to cost about the same as 'or' and ∧ to cost less than an 'and'

    Readability is another question altogether.
    Random example from my machine
    calendar.py line 99
    If one finds this:

    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

    more readable than
    return year%4=0 ∧ (year%100≠0 ∨ year%400 = 0)
    then perhaps the following is the most preferred?

    COMPUTE YEAR MODULO 4 EQUALS 0 AND YEAR MODULO 100 NOT
    EQUAL TO ZERO OR YEAR MODULO 400 EQUAL TO 0

    IOW COBOL is desirable?
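For reference, the stock Python spelling runs as written, while the ∧/∨/≠ spelling is not valid Python. A minimal sketch (the function name `is_leap` is chosen here for illustration), checked against the standard library's `calendar.isleap`:

```python
import calendar


def is_leap(year):
    """Gregorian leap-year rule, spelled with Python's word operators."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


if __name__ == "__main__":
    for year in (1900, 2000, 2014, 2016):
        # The hand-written rule should agree with the standard library.
        assert is_leap(year) == calendar.isleap(year)
        print(year, is_leap(year))
```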
     
    Rustom Mody, Apr 23, 2014
    #5
  6. On Wed, Apr 23, 2014 at 4:57 PM, Rustom Mody <> wrote:
    > In such a (default) setup typing a ∧ or ∨ is not possible at all without
    > something like a char-picker and at best has an ergonomic cost that is an
    > order of magnitude higher than the 'naturally available' characters.
    >
    > On the other hand when/if a keyboard mapping is defined in which
    > the characters that are commonly needed are available, it is
    > reasonable to expect the ∨,∧ to cost no more than 2 strokes each
    > (ie about as much as an 'A'; slightly more than an 'a'. Which means
    > that '∨' is expected to cost about the same as 'or' and ∧ to cost less than an 'and'


    So how much effort are you going to go to for, effectively, the same
    end result? You can type "or" with the same keystrokes, and it takes
    zero setup work and zero memorization (you may forget which keystroke
    you set up for ∨, but I doubt you'll forget how to spell "or", even if
    you think it means gold/yellow). Where's the benefit? I'm seriously
    not seeing it.

    ChrisA
     
    Chris Angelico, Apr 23, 2014
    #6
  7. On Tue, 22 Apr 2014 23:57:46 -0700, Rustom Mody wrote:

    > perhaps the following is the most preferred?
    >
    > COMPUTE YEAR MODULO 4 EQUALS 0 AND YEAR MODULO 100 NOT EQUAL TO ZERO OR
    > YEAR MODULO 400 EQUAL TO 0
    >
    > IOW COBOL is desirable?


    If the only choices are COBOL on one hand and the mutant offspring of
    Perl and APL on the other, I'd vote for COBOL.

    But surely they aren't the only options, and it is possible to find a
    happy medium which is neither excessively verbose nor painfully,
    cryptically terse.

    Remember that we're talking about general purpose programming here. There
    are domains which favour terseness and a vast number of symbols, e.g.
    mathematics, but most programming is not in that domain, even when it
    uses tools from that domain.


    --
    Steve
     
    Steven D'Aprano, Apr 23, 2014
    #7
  8. On Tue, 22 Apr 2014 23:57:46 -0700, Rustom Mody wrote:

    > On the other hand when/if a keyboard mapping is defined in which the
    > characters that are commonly needed are available, it is reasonable to
    > expect the ∨,∧ to cost no more than 2 strokes each (ie about as much as
    > an 'A'; slightly more than an 'a'. Which means that '∨' is expected to
    > cost about the same as 'or' and ∧ to cost less than an 'and'


    Oh, a further thought...

    Consider your example:

    return year%4=0 ∧ (year%100≠0 ∨ year%400 = 0)

    vs

    return year%4=0 and (year%100!=0 or year%400 = 0)


    [aside: personally I like ≠ and if there was a platform independent way
    to type it in any editor, I'd much prefer it over != or <> ]

    Apart from the memorization problem, which I've already touched on, there
    is the mode problem. Keyboard layouts are modes, and you're swapping
    modes. Every time you swap modes, there is a small mental cost. Think of
    it as an interrupt which has to be caught, pausing the current thought
    and starting a new one. So rather than:

    char char char char char char char ...

    you have:

    char char char INTERRUPT
    char INTERRUPT
    char char char ...


    which is a heavier cost than it appears from just counting keystrokes. Of
    course, the more experienced you become, the smaller that cost will be,
    but it will never be quite as low as just a "regular" keystroke.

    Normally, when people use multiple keyboards, it's because that interrupt
    cost is amortized over a significant amount of typing:

    INTERRUPT (English layout)
    paragraph paragraph paragraph paragraph
    INTERRUPT (Greek layout)
    paragraph paragraph paragraph
    INTERRUPT (English again)
    paragraph ...

    and possibly even lost in the noise of a far greater interrupt, namely
    task-switching from one application to another. So it's manageable. But
    switching layouts for a single character is likely to be far more
    painful, especially for casual users of that layout.

    Based on an extremely generous estimate that I use "lambda" four times in
    100 lines of code, I might use λ perhaps once in a thousand non-Greek
    characters. Similarly, I might use ∧ or ∨ maybe once per hundred
    characters. That means I'm unlikely to ever get familiar enough with
    those that the cost of two interrupts per use will be negligible.
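The frequency estimate can be checked against real code. Here is a rough sketch using the standard `tokenize` module; the helper name `keyword_density` and the sample source are made up for illustration, so run it over your own files to get meaningful numbers.

```python
import io
import tokenize


def keyword_density(source, words=("and", "or", "lambda")):
    """Count occurrences of `words` as tokens; return (counts, total characters)."""
    counts = dict.fromkeys(words, 0)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        # Keywords are reported as NAME tokens; strings and comments are not.
        if tok.type == tokenize.NAME and tok.string in counts:
            counts[tok.string] += 1
    return counts, len(source)


if __name__ == "__main__":
    sample = (
        "def is_leap(year):\n"
        "    # and/or inside a comment are not counted\n"
        "    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)\n"
    )
    counts, size = keyword_density(sample)
    print(counts, size)
```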


    --
    Steven
     
    Steven D'Aprano, Apr 23, 2014
    #8
  9. Rustom Mody

    Rustom Mody Guest

    On Wednesday, April 23, 2014 1:23:00 PM UTC+5:30, Steven D'Aprano wrote:
    > On Tue, 22 Apr 2014 23:57:46 -0700, Rustom Mody wrote:


    > > On the other hand when/if a keyboard mapping is defined in which the
    > > characters that are commonly needed are available, it is reasonable to
    > > expect the ∨,∧ to cost no more than 2 strokes each (ie about as much as
    > > an 'A'; slightly more than an 'a'. Which means that '∨' is expected to
    > > cost about the same as 'or' and ∧ to cost less than an 'and'


    > Oh, a further thought...


    > Consider your example:


    > return year%4=0 ∧ (year%100≠0 ∨ year%400 = 0)


    > vs


    > return year%4=0 and (year%100!=0 or year%400 = 0)


    > [aside: personally I like ≠ and if there was a platform independent way
    > to type it in any editor, I'd much prefer it over != or <> ]


    > Apart from the memorization problem, which I've already touched on, there
    > is the mode problem. Keyboard layouts are modes, and you're swapping
    > modes. Every time you swap modes, there is a small mental cost. Think of
    > it as an interrupt which has to be caught, pausing the current thought
    > and starting a new one. So rather than:


    > char char char char char char char ...


    > you have:


    > char char char INTERRUPT
    > char INTERRUPT
    > char char char ...


    > which is a heavier cost than it appears from just counting keystrokes. Of
    > course, the more experienced you become, the smaller that cost will be,
    > but it will never be quite as low as just a "regular" keystroke.


    > Normally, when people use multiple keyboards, it's because that interrupt
    > cost is amortized over a significant amount of typing:


    > INTERRUPT (English layout)
    > paragraph paragraph paragraph paragraph
    > INTERRUPT (Greek layout)
    > paragraph paragraph paragraph
    > INTERRUPT (English again)
    > paragraph ...


    > and possibly even lost in the noise of a far greater interrupt, namely
    > task-switching from one application to another. So it's manageable. But
    > switching layouts for a single character is likely to be far more
    > painful, especially for casual users of that layout.


    > Based on an extremely generous estimate that I use "lambda" four times in
    > 100 lines of code, I might use λ perhaps once in a thousand non-Greek
    > characters. Similarly, I might use ∧ or ∨ maybe once per hundred
    > characters. That means I'm unlikely to ever get familiar enough with
    > those that the cost of two interrupts per use will be negligible.


    It's gratifying to see an argument whose framing is cognitive-based!

    More on that later.

    For now: mode/modeless

    Yes most of us prefer the Shift key to the Caps Lock even for stretches of capitals. So analogously here is a modeless solution

    Earlier I found this mode-switching version
    $ setxkbmap -option "grp:switch,grp:alt_shift_toggle,grp_led:scroll" -layout "us,gr"
    this makes Shift-Alt the mode-switcher

    This one on the other hand
    $ setxkbmap -layout "us,gr" -option "grp:switch"
    will make right-alt behave like 'Greek-Shift'

    ie typing
    abcdefghijklmnopqrstuvwxyz
    with RAlt depressed throughout, produces
    αβψδεφγηιξκλμνοπ;ρστθωςχυζ

    This makes a Greek letter's ergonomic cost identical to a capital English
    letter's: For Greek use RAlt the way one uses Shift for English.

    Notes:
    1. Tried on Debian and Ubuntu -- Recent Ubuntus are rather more ill-mannered in
    the way they appropriate keys. Still it works as far as I can see.

    2. ';' ?? ie semicolon is produced from 'q'? What's that semicolon doing there?? But then Greek is -- well -- Greek to me! (As is xkb!)
     
    Rustom Mody, Apr 23, 2014
    #9
  10. Rustom Mody

    Guest

    ==========

    I once wrote that 90 % of Python 2 apps (a generic term) that are
    supposed to process text or strings are not working.

    In Python 3, that's 100 %. It is only by chance that apps may
    give the illusion that they are working properly.

    jmf
     
    , Apr 26, 2014
    #10
  11. <> wrote in message
    news:...
    > ==========
    >
    > I wrote once 90 % of Python 2 apps (a generic term) supposed to
    > process text, strings are not working.
    >
    > In Python 3, that's 100 %. It is somehow only by chance, apps may
    > give the illusion they are properly working.
    >


    It is quite frustrating when you make these statements without explaining
    what you mean by 'not working'.

    It would be really useful if you could spell out -

    1. what you did
    2. what you expected to happen
    3. what actually happened

    Frank Millman
     
    Frank Millman, Apr 26, 2014
    #11
  12. Ian Kelly

    Ian Kelly Guest

    On Apr 26, 2014 3:46 AM, "Frank Millman" <> wrote:
    >
    >
    > <> wrote in message
    > news:...
    > > ==========
    > >
    > > I wrote once 90 % of Python 2 apps (a generic term) supposed to
    > > process text, strings are not working.
    > >
    > > In Python 3, that's 100 %. It is somehow only by chance, apps may
    > > give the illusion they are properly working.
    > >

    >
    > It is quite frustrating when you make these statements without explaining
    > what you mean by 'not working'.


    As far as anybody has been able to determine, what jmf means by "not
    working" is that strings containing the € character are handled less
    efficiently than strings that do not contain it in certain contrived test
    cases.
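The behaviour being referred to comes from the flexible string representation (PEP 393) in CPython 3.3 and later: a string is stored with 1, 2 or 4 bytes per character depending on its widest character. A small sketch, assuming CPython (exact sizes vary by version and platform, so only the comparison is meaningful):

```python
import sys

ascii_text = "a" * 1000      # every character fits in 1 byte
euro_text = "a" * 999 + "€"  # U+20AC forces 2 bytes per character

# Same length in characters, different storage width under PEP 393.
print(len(ascii_text), sys.getsizeof(ascii_text))
print(len(euro_text), sys.getsizeof(euro_text))
```

Both strings behave identically at the Python level; the difference is only in memory use and in the cost of converting between widths.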
     
    Ian Kelly, Apr 26, 2014
    #12
  13. Rustom Mody

    Guest

    On Saturday, April 26, 2014 15:38:29 UTC+2, Ian wrote:
    > On Apr 26, 2014 3:46 AM, "Frank Millman" <> wrote:
    > > <> wrote in message
    > > news:...
    > > > ==========
    > > >
    > > > I wrote once 90 % of Python 2 apps (a generic term) supposed to
    > > > process text, strings are not working.
    > > >
    > > > In Python 3, that's 100 %. It is somehow only by chance, apps may
    > > > give the illusion they are properly working.
    > >
    > > It is quite frustrating when you make these statements without explaining
    > > what you mean by 'not working'.
    >
    > As far as anybody has been able to determine, what jmf means by "not working" is that strings containing the EURO character are handled less efficiently than strings that do not contain it in certain contrived test cases.


    -----


    'EURO SIGN' ? No, it's just a character!
     
    , Apr 27, 2014
    #13
  14. Rustom Mody

    Rustom Mody Guest

    On Wednesday, April 23, 2014 11:29:13 PM UTC+5:30, Rustom Mody wrote:
    > On Wednesday, April 23, 2014 1:23:00 PM UTC+5:30, Steven D'Aprano wrote:
    > > On Tue, 22 Apr 2014 23:57:46 -0700, Rustom Mody wrote:


    > > > On the other hand when/if a keyboard mapping is defined in which the
    > > > characters that are commonly needed are available, it is reasonable to
    > > > expect the ∨,∧ to cost no more than 2 strokes each (ie about as much as
    > > > an 'A'; slightly more than an 'a'. Which means that '∨' is expected to
    > > > cost about the same as 'or' and ∧ to cost less than an 'and'


    > > Oh, a further thought...


    > > Consider your example:


    > > return year%4=0 ∧ (year%100≠0 ∨ year%400 = 0)


    > > vs


    > > return year%4=0 and (year%100!=0 or year%400 = 0)


    > > [aside: personally I like ≠ and if there was a platform independent way
    > > to type it in any editor, I'd much prefer it over != or <> ]


    I checked haskell and find the unicode support is better.

    For variables (ie identifiers) python and haskell are much the same:

    Python3:

    >>> α = 1
    >>> α
    1

    Haskell:

    Prelude> let α = 1
    Prelude> α
    1


    However in haskell one can also do this unlike python:
    *Main> 2 ≠ 3
    True
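The Python side of that comparison can be checked directly. In this sketch (`parses` is a helper name invented here), `compile` accepts a Greek identifier but rejects ≠, since Python 3 has a fixed operator set and no way to define new operator symbols:

```python
def parses(source):
    """Return True if `source` compiles as Python, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(parses("α = 1"))   # True: non-ASCII letters are allowed in identifiers
    print(parses("2 != 3"))  # True
    print(parses("2 ≠ 3"))   # False: ≠ is not a Python token
```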

    All that's needed to make this work is this set of new-in-terms-of-old definitions:

    [Lines commented out with -- are the definitions that don't work as one might wish]
    --------------
    import qualified Data.Set as Set
    -- Experimenting with Unicode in Haskell source

    -- Numbers
    x ≠ y = x /= y
    x ≤ y = x <= y
    x ≥ y = x >= y
    x ÷ y = divMod x y
    x ⇑ y = x ^ y

    x × y = x * y -- readability hmmm !!!
    π = pi

    -- ⌊ x = floor x
    -- ⌈ x = ceiling x

    -- Lists
    xs ⤚ ys = xs ++ ys
    n ↑ xs = take n xs
    n ↓ xs = drop n xs

    -- Bools
    x ∧ y = x && y
    x ∨ y = x || y
    -- ¬x = not x


    -- Sets

    x ∈ s = x `Set.member` s
    s ∪ t = s `Set.union` t
    s ∩ t = s `Set.intersection` t
    s ⊆ t = s `Set.isSubsetOf` t
    s ⊂ t = s `Set.isProperSubsetOf` t
    s ⊈ t = not (s `Set.isSubsetOf` t)
    -- ∅ = Set.null
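Python offers no way to introduce such operators, so the closest equivalent is the built-in ASCII spelling. A sketch of the correspondence for the set definitions above (the variable names are illustrative only):

```python
s = {1, 2, 3}
t = {2, 3, 4}
small = {2, 3}

member = 2 in s               # x ∈ s
union = s | t                 # s ∪ t
intersection = s & t          # s ∩ t
subset = small <= s           # s ⊆ t
proper_subset = small < s     # s ⊂ t
not_subset = not (s <= t)     # s ⊈ t
empty = set()                 # ∅ (note that {} is an empty dict, not a set)

print(member, union, intersection, subset, proper_subset, not_subset, empty)
```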
     
    Rustom Mody, Apr 27, 2014
    #14
  15. Rustom Mody

    Guest

    On Saturday, April 26, 2014 15:38:29 UTC+2, Ian wrote:
    > On Apr 26, 2014 3:46 AM, "Frank Millman" <> wrote:
    > > <> wrote in message
    > > news:...
    > > > ==========
    > > >
    > > > I wrote once 90 % of Python 2 apps (a generic term) supposed to
    > > > process text, strings are not working.
    > > >
    > > > In Python 3, that's 100 %. It is somehow only by chance, apps may
    > > > give the illusion they are properly working.
    > >
    > > It is quite frustrating when you make these statements without explaining
    > > what you mean by 'not working'.
    >
    > As far as anybody has been able to determine, what jmf means by "not working" is that strings containing the EURO character are handled less efficiently than strings that do not contain it in certain contrived test cases.


    ----

    Python 2.7 + cp1252:
    - Solid and coherent system (nothing to do with the Euro).

    Python 3:
    - It missed the unicode shift.
    - Covering the whole unicode range will not make
    Python a unicode compliant product.
    - Flexible String Representation (a problem per se),
    a mathematical absurdity which does the opposite of
    the coding schemes endorsed by Unicode.org (sheet of
    paper and pencil!)
    - Very deeply buggy (quadrature of the circle problem).

    Positive side:
    - A very nice tool to teach the coding of characters
    and unicode.

    jmf
     
    , Apr 28, 2014
    #15
  16. Rustom Mody

    Guest

    On Mon, Apr 28, 2014, at 4:57, wrote:
    > Python 3:
    > - It missed the unicode shift.
    > - Covering the whole unicode range will not make
    > Python a unicode compliant product.


    Please cite exactly what portion of the unicode standard requires
    operations with all characters to be handled in the same amount of time
    and space, and forbids optimizations that make some characters handled
    faster or in less space than others.
     
    , May 1, 2014
    #16
  17. Can't help but feed the troll... forgive me.

    On 04/28/2014 02:57 AM, wrote:
    > Python 2.7 + cp1252:
    > - Solid and coherent system (nothing to do with the Euro).


    Except that cp1252 is not unicode. Perhaps some subset of unicode can
    be encoded into bytes using cp1252. But if it works for you keep using
    it, and stop spreading nonsense about FSR.
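The "subset" point is easy to demonstrate. A minimal sketch (the helper name `encodable` is made up here): cp1252 is a single-byte encoding, so it can represent the euro sign but not, say, Greek letters.

```python
# cp1252 is a single-byte encoding that covers a small subset of Unicode.
euro_bytes = "€".encode("cp1252")
print(euro_bytes)  # b'\x80'


def encodable(text, encoding="cp1252"):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(encodable("café €"))  # True
print(encodable("αβγ"))     # False: Greek letters are not in cp1252
```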

    > Python 3:
    > - Flexible String Representation (a problem per se),
    > a mathematical absurdity which does the opposite of
    > the coding schemes endorsed by Unicode.org (sheet of
    > paper and pencil!)
    > - Very deeply buggy (quadrature of the circle problem).


    Maybe it's the language barrier, but whatever it is you are talking
    about, I certainly can't make out.

    You've been ranting about FSR for years without being able to clearly
    say what's wrong with it. Please quote unicode specifications that you
    feel Python does not implement. What unicode characters cannot be
    represented? Does Python choke on certain unicode strings or expose
    entities it should not (like Javascript does)?
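On the last question, a quick check is possible. In Python 3.3 and later a character outside the Basic Multilingual Plane is a single element of the string, whereas languages whose strings are sequences of UTF-16 code units expose it as two surrogates. A sketch (the emoji is an arbitrary example):

```python
face = "\U0001F600"  # GRINNING FACE, a code point above U+FFFF

code_points = len(face)                            # one code point, no surrogates exposed
utf16_units = len(face.encode("utf-16-le")) // 2   # UTF-16 needs a surrogate pair

print(code_points)      # 1
print(utf16_units)      # 2
print(face[0] == face)  # True: indexing yields the whole character
```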

    Why would you think that the unicode consortium's list of byte encodings
    are the only possible valid ways of encoding unicode to a byte stream?

    If you're going to continue to write this sort of stuff, please have the
    decency to answer these questions at least.

    > Positive side:
    > - A very nice tool to teach the coding of characters
    > and unicode.


    Indeed.
     
    Michael Torrie, May 2, 2014
    #17
  18. Rustom Mody

    Guest

    On Friday, May 2, 2014 05:50:40 UTC+2, Michael Torrie wrote:
    > Can't help but feed the troll... forgive me.
    >
    > On 04/28/2014 02:57 AM, wrote:
    > > Python 2.7 + cp1252:
    > > - Solid and coherent system (nothing to do with the Euro).
    >
    > Except that cp1252 is not unicode. Perhaps some subset of unicode can
    > be encoded into bytes using cp1252. But if it works for you keep using
    > it, and stop spreading nonsense about FSR.
    >
    > > Python 3:
    > > - Flexible String Representation (a problem per se),
    > > a mathematical absurdity which does the opposite of
    > > the coding schemes endorsed by Unicode.org (sheet of
    > > paper and pencil!)
    > > - Very deeply buggy (quadrature of the circle problem).
    >
    > Maybe it's the language barrier, but whatever it is you are talking
    > about, I certainly can't make out.
    >
    > You've been ranting about FSR for years without being able to clearly
    > say what's wrong with it. Please quote unicode specifications that you
    > feel Python does not implement. What unicode characters cannot be
    > represented? Does Python choke on certain unicode strings or expose
    > entities it should not (like Javascript does)?
    >
    > Why would you think that the unicode consortium's list of byte encodings
    > are the only possible valid ways of encoding unicode to a byte stream?
    >
    > If you're going to continue to write this sort of stuff, please have the
    > decency to answer these questions at least.
    >
    > > Positive side:
    > > - A very nice tool to teach the coding of characters
    > > and unicode.
    >
    > Indeed.


    ========

    -
     
    , May 3, 2014
    #18
  19. Rustom Mody

    Guest

    On Thursday, May 1, 2014 19:21:14 UTC+2, wrote:
    > On Mon, Apr 28, 2014, at 4:57, wrote:
    > > Python 3:
    > > - It missed the unicode shift.
    > > - Covering the whole unicode range will not make
    > > Python a unicode compliant product.
    >
    > Please cite exactly what portion of the unicode standard requires
    > operations with all characters to be handled in the same amount of time
    > and space, and forbids optimizations that make some characters handled
    > faster or in less space than others.


    ==========

    I missed your comment. Regression is only a side effect.

    I can make Python fail with any piece of text or
    valid sequence of characters I wish [*].

    I no longer write code (apps); I only maintain
    my interactive interpreters.

    [*] I do not count as failures, issues like cp65001,
    only "basic" text/string manipulations.

    jmf
     
    , May 8, 2014
    #19