IDE for python

Mark Lawrence

On Friday, May 30, 2014 8:36:54 PM UTC+5:30, (e-mail address removed) wrote:

It is now about time that we stop taking ASCII seriously!!

This can't happen in the Python world until there is a sensible approach
to Unicode. Ah, but wait a minute: the ball was set rolling with Python
3.0. Then came PEP 393 and the Flexible String Representation in Python
3.3, and some strings came down in size by up to 75%, and in most
cases it was faster. Just what do some people want in life, jam on it?
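The effect of PEP 393 is easy to observe directly: a string's storage now depends on the widest code point it contains. A small sketch (exact byte counts vary across CPython versions, so only the ordering is shown):

```python
import sys

# Under PEP 393 (CPython >= 3.3) each string is stored with the
# narrowest code unit that fits its widest character.
ascii_text = "a" * 1000    # 1 byte per character
bmp_text = "≠" * 1000      # 2 bytes per character
astral_text = "🐍" * 1000  # 4 bytes per character

for s in (ascii_text, bmp_text, astral_text):
    print(len(s), sys.getsizeof(s))
```

All three strings have length 1000, but each is roughly twice the size of the previous one.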
 
Rustom Mody

This can't happen in the Python world until there is a sensible approach
to unicode. Ah, but wait a minute, the ball was set rolling with Python
3.0. Then came PEP 393 and the Flexible String Representation in Python
3.3 and some strings came down in size by a factor of 75% and in most
cases it was faster. Just what do some people want in life, jam on it?

I don't see how these two are related¹

You are talking about the infrastructure needed for writing Unicode apps.
The language need not have non-ASCII lexemes for that.

I am talking about something quite different.
Think for example of a German wanting to write "Gödel".
According to some conventions (s)he can write Goedel.
But if that is forced just because of ASCII/US-104/what-have-you, it would
justifiably cause irritation/offense.

Likewise I am talking about the fact that x≠y is prettier than x != y.²

In earlier times the former was not an option.
Today the latter persists, drawn from an effectively random subset of Unicode,
only for historical reasons and not for anything technologically current.


-----------------------
¹ OK, very, very distantly related, maybe, in the sense that Python is a
key part of modern Linux system administration, and getting out of the
ASCII jail needs the infrastructure to work smoothly in the wider Unicode world.

² And probably hundreds of other such examples, a random sample of which I have listed:
http://blog.languager.org/2014/04/unicoded-python.html
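Python 3 sits halfway to what this post asks for: identifiers may be non-ASCII (PEP 3131), but operators may not. A small sketch of the current state, using the very `≠` lexeme mentioned above:

```python
# Non-ASCII identifiers are legal in Python 3 (PEP 3131)...
α, β = 3, 4
print(α + β)  # 7

# ...but non-ASCII operators are not part of the grammar:
try:
    compile("x ≠ y", "<example>", "eval")
except SyntaxError:
    print("'≠' is not a Python operator; it must be spelled 'x != y'")
```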
 
wxjmfauth

On Friday, 30 May 2014 at 18:15:09 UTC+2, Rustom Mody wrote:

Yeah :)

As my blog posts labelled Unicode will indicate, I am a fan of using
Unicode in program source:
http://blog.languager.org/search/label/Unicode

Of course it is not exactly a coincidence that I used APL a bit in my
early days. At that time it was great fun, though we did not take it
seriously.*

It is now about time that we stop taking ASCII seriously!!

And for those who don't know xetex: it's really xɘtex – a pictorial
anagram if written as XƎTEX.

However, in all fairness, I should say that I cannot seem to find my
way to that promised land yet:
- xetex does not quite work, whereas pdflatex works smoothly
- mathjax is awesome, however it is firmly latex (not xetex) based

-------------------
* And the fact that there are recent implementations, including web ones,
means it is by no means dead:
http://baruchel.hd.free.fr/apps/apl/
which I think Unicode aficionados will enjoy

=========

Ok, thanks for the answer.

"xetex does not quite work whereas pdflatex works smoothly"

?

jmf
 
Mark Lawrence

I don't see how these two are related¹

You are talking about the infrastructure needed for writing Unicode apps.
The language need not have non-ASCII lexemes for that.

I am talking about something quite different.
Think for example of a German wanting to write "Gödel".
According to some conventions (s)he can write Goedel.
But if that is forced just because of ASCII/US-104/what-have-you, it would
justifiably cause irritation/offense.

Likewise I am talking about the fact that x≠y is prettier than x != y.²

In earlier times the former was not an option.
Today the latter persists, drawn from an effectively random subset of Unicode,
only for historical reasons and not for anything technologically current.


-----------------------
¹ OK, very, very distantly related, maybe, in the sense that Python is a
key part of modern Linux system administration, and getting out of the
ASCII jail needs the infrastructure to work smoothly in the wider Unicode world.

² And probably hundreds of other such examples, a random sample of which I have listed:
http://blog.languager.org/2014/04/unicoded-python.html

I just happen to like fishing :)
 
Rustom Mody

=========
Ok, thanks for the answer.
"xetex does not quite work whereas pdflatex works smoothly"

The problem is a combination of:
1. I am a somewhat clueless noob
2. xetex is emerging technology, therefore changing fast and not stable

So when something does not work I don't know whether:
- it's 1 (I am doing something silly)
- or 2 (I have actually hit a bug)

I tried writing some small (hello-world) type text using Unicode chars rather
than the old-fashioned \alpha type of locutions. It worked.
I added a bunch more latex packages from apt.
It stopped working.
 
wxjmfauth

On Friday, 30 May 2014 at 19:30:27 UTC+2, Rustom Mody wrote:

The problem is a combination of:
1. I am a somewhat clueless noob
2. xetex is emerging technology, therefore changing fast and not stable

So when something does not work I don't know whether:
- it's 1 (I am doing something silly)
- or 2 (I have actually hit a bug)

I tried writing some small (hello-world) type text using Unicode chars rather
than the old-fashioned \alpha type of locutions. It worked.
I added a bunch more latex packages from apt.
It stopped working.

------------------------------
PS It would help all if you read
https://wiki.python.org/moin/GoogleGroupsPython
and don't double-space earlier mails.

========

This is not the place to discuss TeX.
(I actually have 16 more or less complete "distros" on
my hd on Windows, all working very well. They are on
my hd, but all run from a USB stick as well!)

jmf
 
wxjmfauth

On Friday, 30 May 2014 at 18:38:04 UTC+2, Mark Lawrence wrote:

This can't happen in the Python world until there is a sensible approach
to unicode. Ah, but wait a minute, the ball was set rolling with Python
3.0. Then came PEP 393 and the Flexible String Representation in Python
3.3 and some strings came down in size by a factor of 75% and in most
cases it was faster. Just what do some people want in life, jam on it?

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

========

A guy who understood Unicode would not even have spent his time
writing a PEP 393 proposal.

I skip the discussion(s) I read here and there about PDF.

Put this comment in relation with my Xe(La)TeX knowledge.

jmf
 
Rustom Mody

You are talking about the infrastructure needed for writing Unicode apps.
The language need not have non-ASCII lexemes for that.
I am talking about something quite different.
Think for example of a German wanting to write "Gödel".
According to some conventions (s)he can write Goedel.
But if that is forced just because of ASCII/US-104/what-have-you, it would
justifiably cause irritation/offense.

Curiously, I just saw this tex/emacs question/answer elsewhere –
particularly amusing is the first 'char' of the answer.

Question:
| I'm a new Emacs/Auctex User. Auctex for Emacs is amazing but
| there are some little things could be better. When generating a
| section with c-c c-s the label ist generated automatically. But
| if there is an german Umlaut in the section title like 'ä' this
| becomes just 'a' in the label. Is there any possibility that
| auctex will substitute the 'ä' by 'ae' and not by 'a'?

Answer:
| '�' is not possible, since latex can not handle Umlauts in references.
| For 'ae' I'm sure someone is able to provide a little patch.
 
wxjmfauth

On Sunday, 1 June 2014 at 03:48:07 UTC+2, Rustom Mody wrote:

Curiously, I just saw this tex/emacs question/answer elsewhere –
particularly amusing is the first 'char' of the answer.

Question:
| I'm a new Emacs/Auctex User. Auctex for Emacs is amazing but
| there are some little things could be better. When generating a
| section with c-c c-s the label ist generated automatically. But
| if there is an german Umlaut in the section title like 'ä' this
| becomes just 'a' in the label. Is there any possibility that
| auctex will substitute the 'ä' by 'ae' and not by 'a'?

Answer:
| '�' is not possible, since latex can not handle Umlauts in references.
| For 'ae' I'm sure someone is able to provide a little patch.

%%%%%%%%%%
\begin{document}
""" A small text, αβγ. {\label{étiquette€α}}\\
See page \pageref{étiquette€α}. """
\end{document}
.... See page 1. """
' A small text, αβγ.\nSee page 1. '

jmf
 
Marko Rauhamaa

Rustom Mody said:
Think for example of a German wanting to write "Gödel"
According to some conventions (s)he can write Goedel

[...]

| if there is an german Umlaut in the section title like 'ä' this
| becomes just 'a' in the label. Is there any possibility that auctex
| will substitute the 'ä' by 'ae' and not by 'a'?

Answer:
| '�' is not possible, since latex can not handle Umlauts in
| references. For 'ae' I'm sure someone is able to provide a little
| patch.

As a Finnish-speaker, I hope that patch doesn't become default behavior.
Too many times, we have been victimized by the German conventions. A
Finnish-speaker would much rather see

Järvenpää => Jarvenpaa
Öllölä => Ollola
Kärkkäinen => Karkkainen

than

Järvenpää => Jaervenpaeae
Öllölä => Oelloelae
Kärkkäinen => Kaerkkaeinen


Marko
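The two competing fallbacks above can be expressed as two translation tables. A sketch (the mappings cover only the umlauted vowels in these examples):

```python
# Finnish-style fallback: just strip the diacritics.
FINNISH = str.maketrans("äöÄÖ", "aoAO")

# German-style fallback: expand umlauts to digraphs.
GERMAN = {ord("ä"): "ae", ord("ö"): "oe", ord("ü"): "ue",
          ord("Ä"): "Ae", ord("Ö"): "Oe", ord("Ü"): "Ue"}

for name in ("Järvenpää", "Öllölä", "Kärkkäinen"):
    print(name.translate(FINNISH), "/", name.translate(GERMAN))
```

`str.translate` accepts either a maketrans table or a plain dict mapping code points to replacement strings, which is why one string may grow while the other keeps its length.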
 
Chris Angelico

As a Finnish-speaker, I hope that patch doesn't become default behavior.
Too many times, we have been victimized by the German conventions. A
Finnish-speaker would much rather see

Järvenpää => Jarvenpaa
Öllölä => Ollola
Kärkkäinen => Karkkainen

than

Järvenpää => Jaervenpaeae
Öllölä => Oelloelae
Kärkkäinen => Kaerkkaeinen

It's even worse than that. The rules for ASCIIfying adorned characters
vary according to context - Müller and Mueller are different names,
and in many contexts should sort and compare differently, and I
remember reading somewhere that there's a context in which it's more
useful to decompose ü to u rather than ue. There is no "safe" lossy
transformation that can be done to any language's words, and this is
no exception. ASCIIfication has to be accepted as flawed; this issue
(an inability to handle non-ASCII labels) is similar to a lot of blog
URLs - http://rosuav.blogspot.com/2013/08/20th-international-g-festival-awards.html
is talking about the "International G&S Festival" awards, but the URL
drops the "&S" part. (If you absolutely have to transmit something
losslessly in pure ASCII, you need a scheme like Punycode, which is a
lot less clean and readable than a decomposition scheme.)

Of course, the better solution is to permit the full Unicode alphabet
in identifiers...

ChrisA
 
Rustom Mody

It's even worse than that. The rules for ASCIIfying adorned characters
vary according to context - Müller and Mueller are different names,
and in many contexts should sort and compare differently, and I
remember reading somewhere that there's a context in which it's more
useful to decompose ü to u rather than ue. There is no "safe" lossy
transformation that can be done to any language's words, and this is
no exception. ASCIIfication has to be accepted as flawed; this issue
(an inability to handle non-ASCII labels) is similar to a lot of blog
URLs - http://rosuav.blogspot.com/2013/08/20th-international-g-festival-awards.html
is talking about the "International G&S Festival" awards, but the URL
drops the "&S" part. (If you absolutely have to transmit something
losslessly in pure ASCII, you need a scheme like Punycode, which is a
lot less clean and readable than a decomposition scheme.)
Of course, the better solution is to permit the full Unicode alphabet
in identifiers...

Yes, that is the real point.

Changing the current behavior, which maps [ö,ä…] → [o,a…], to a new
behavior that maps it to [oe,ae…], and then arguing over whether that
should become the default, is the wrong battle.

The more useful line is: why have this conversion at all?
Until hardly three years ago, HTML authors wrote non-ASCII characters as HTML entities.
Now the standard practice is to write the character directly and
make sure the page is explicitly UTF-8.

It's only a question of time before this becomes standard practice in
all domains.
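The shift described above is easy to demonstrate with the standard library's `html` module: the old entity-encoded spelling and the direct UTF-8 spelling denote exactly the same text:

```python
import html

# Old style: non-ASCII escaped as HTML entities.
legacy = "G&ouml;del wrote x &ne; y"

# Current style: write the characters directly in a UTF-8 page.
modern = "Gödel wrote x ≠ y"

print(html.unescape(legacy) == modern)  # True
```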
 
Steven D'Aprano

the better solution is to permit the full Unicode alphabet in
identifiers...

I'm not entirely sure about that. Full Unicode support in identifiers
such as URLs doesn't create a brand new vulnerability, but it does
increase it from a fairly minor problem to something *much* harder to
deal with. It's bad enough when somebody manages to fool you into going
to (say) app1e.com instead of apple.com, without also being at risk from
аррlе, аpрlе, арplе and аррle (to mention just a few). At least nobody
can fake .com with .соm.

To put it another way:

py> аррlе = 23
py> apple = 42
py> assert аррlе == apple
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError
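One way to expose such spoofing is to look at the Unicode character names rather than the glyphs. A small sketch (the `audit` helper is hypothetical, not a stdlib function):

```python
import unicodedata

def audit(identifier):
    """Return the Unicode name of each character, exposing homoglyphs."""
    return [unicodedata.name(ch) for ch in identifier]

# "\u0430pple" renders exactly like "apple" in most fonts:
print(audit("\u0430pple")[0])  # CYRILLIC SMALL LETTER A
print(audit("apple")[0])       # LATIN SMALL LETTER A
```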
 
Chris Angelico

I'm not entirely sure about that. Full Unicode support in identifiers
such as URLs doesn't create a brand new vulnerability, but it does
increase it from a fairly minor problem to something *much* harder to
deal with. It's bad enough when somebody manages to fool you into going
to (say) app1e.com instead of apple.com, without also being at risk from
аррlе, аpрlе, арplе and аррle (to mention just a few). At least nobody
can fake .com with .соm.

To put it another way:

py> аррlе = 23
py> apple = 42
py> assert аррlе == apple
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError

Yeah, that is a concern. But as you say, it's already possible to
confuse rn with m (in many fonts) and i/l/1, and (on a different
level) Foo, foo, _foo, _Foo, and FOO, or movement_Direction and
movement_direction. If you saw one of those in one part of a program
and another in another, you'd have to consume an annoying amount of
mindspace to keep them separate.

Note, incidentally, that I said "alphabet" rather than the entire
Unicode character set. I do *not* support the use of, for instance,
U+200B 'ZERO WIDTH SPACE' in identifiers, that's just stupid :)

ChrisA
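Python's existing rule is in fact close to what Chris describes: identifiers are restricted to Unicode's identifier character classes (PEP 3131) and are NFKC-normalized, which keeps out format characters like ZERO WIDTH SPACE while allowing real alphabets. A sketch:

```python
import unicodedata

print("αβγ".isidentifier())        # True: Greek letters are fine
print("a\u200bb".isidentifier())   # False: ZERO WIDTH SPACE is rejected

# NFKC normalization means visually distinct spellings can collide:
# U+210C BLACK-LETTER CAPITAL H normalizes to plain 'H'.
print(unicodedata.normalize("NFKC", "\u210c"))
```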
 
Wolfgang Maier

Amen.
Ite missa est. [Latin: "Go, the Mass is ended."]

Oh, why all the lamenting about Python's Unicode support, when your Latin is
so superb! An elegant solution to all your problems :)
 
Tim Golden

Oh, why all the lamenting about Python's Unicode support, when your Latin is
so superb! An elegant solution to all your problems :)

After all, if you can't use Latin-1 for Latin, what can you use it for?

TJG
 
Wolfgang Maier

Chris Angelico said:
Google Translate says:

Eusebius, et revertatur in domum perito resident.

ChrisA

Oh, the joys of Google Translate.
Round-tripping this through French (as wxjm may do) back to English I get:
Eusebius, and return to their seat in the house experience

I'd translate it roughly as:

I domum, perite omnipräsens unicodicis!

but my last (school) use of Latin is many years back.
 
