Using non-ASCII symbols

  • Thread starter Christoph Zwerschke

Claudio Grondi

Rocco said:
One of the issues that would need to be dealt with in allowing new
operators to be defined is how to work out precedence rules for the new
operators. Right now you can redefine the meaning of addition and
multiplication, but you can't change the order of operations. (Witness
%, which must have the same precedence whether it means modulo or
string formatting.)

If you allow (semi)arbitrary characters to be used as operators, some
scheme must be chosen for assigning a place in the precedence hierarchy.

Speaking maybe only for myself:
I don't like implicit rules, so I also don't like any precedence
hierarchy being in action; for safety reasons I always write even
8+6*2 (==20) as 8+(6*2), to be sure everything will go the way I expect.
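(A small illustration of Rocco's point, with a hypothetical toy class: you
can redefine what + and * *do*, but never when they *bind*.)

```python
# Toy class (hypothetical name "Op") that redefines + and * to record
# the evaluation order. Python dispatches to __add__/__mul__, but the
# parser still reduces * before + no matter what the methods do.
class Op:
    def __init__(self, name):
        self.name = name
    def __add__(self, other):
        return Op('(%s + %s)' % (self.name, other.name))
    def __mul__(self, other):
        return Op('(%s * %s)' % (self.name, other.name))

a, b, c = Op('a'), Op('b'), Op('c')
print((a + b * c).name)    # '(a + (b * c))' -- * still binds tighter
print(((a + b) * c).name)  # '((a + b) * c)' -- only parens change it
```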

Claudio
 

Christoph Zwerschke

Claudio said:
Speaking maybe only for myself:
I don't like implicit rules, so I also don't like any precedence
hierarchy being in action; for safety reasons I always write even
8+6*2 (==20) as 8+(6*2), to be sure everything will go the way I expect.

But for people who often use mathematical formulas this looks pretty
weird. If it weren't a programming language, you wouldn't even write an
asterisk, but either a middle dot or nothing at all. The latter is possible
because, contrary to programming languages, you usually use one-letter
names in formulas, so it is clear that ab means a*b and does not
designate a variable with the name "ab". x**2+y**2+(2*pi*r) looks way
uglier than x²+y²+2πr (another application for Greek letters). Maybe
providing a "formula" or "math style" mode would sometimes be helpful.
Or maybe not, because other conventions of mathematical formulas (long
fraction strokes, using subscript indices and superscript exponents
etc.) couldn't be solved so easily anyway. You would need editors with
the ability to display and input "formula sections" in Python programs
differently. Python would become something like "executable TeX" rather
than "executable pseudo code"...

-- Christoph
 

Bengt Richter

Speaking maybe only for myself:
I don't like implicit rules, so I also don't like any precedence
hierarchy being in action; for safety reasons I always write even
8+6*2 (==20) as 8+(6*2), to be sure everything will go the way I expect.
Maybe you would like the unambiguousness of
(+ 8 (* 6 2))
or
6 2 * 8 +
?
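(As an aside, the postfix form needs no precedence rules at all; a minimal
stack-evaluator sketch, not a proposal for Python syntax:)

```python
# Operands push onto a stack; operators pop two values and push the
# result, so no precedence table is ever consulted.
def rpn(tokens):
    stack = []
    for tok in tokens.split():
        if tok in ('+', '-', '*'):
            b, a = stack.pop(), stack.pop()
            stack.append({'+': a + b, '-': a - b, '*': a * b}[tok])
        else:
            stack.append(int(tok))
    return stack.pop()

print(rpn('6 2 * 8 +'))  # 20
```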

Hm, ... ISTM you could keep the concept of all objects as potential operator
objects as now, but instead of selecting methods of the objects according
to special symbols like + - * etc., allow method selection by rules applied
to a sequence of objects. E.g., say
a, X, b, Y, c
is a sequence of objects (happening to be contained in a tuple expression here).
Now let's define seqeval such that
seqeval((a, X, b, Y, c))
looks at the objects to see if they have certain methods, and then calls some of
those methods with some of the other objects as arguments, and applies rules of
precedence and association to do something useful, producing a final result.

I'm just thinking out loud here, but what I'm getting at is being able to write
8+6*2
as
seqeval((8, PLUS, 6, TIMES, 2))
with the appropriate definitions of seqeval and PLUS and TIMES. This is with a view
to having seqeval as a builtin that does standard processing, and then having
a language change to make white-space-separated expressions like
8 PLUS 6 TIMES 2
be syntactic sugar for an implicit
seqeval((8, PLUS, 6, TIMES, 2))
where PLUS and TIMES may be arbitrary user-defined objects suitable for seqeval.
I'm thinking out loud, so I anticipate syntactic ambiguities in expressions and the need to
use parens etc., but this would in effect let us define arbitrarily named operators.
Precedence might be established by looking for PLUS.__precedence__. But as usual,
parens would control precedence dominantly. E.g.,
(8 PLUS 6) TIMES 2
would be sugar for
seqeval((seqeval((8, PLUS, 6)), TIMES, 2))

IOW, we have an object sequence expression analogous to a tuple expression without commas.
I guess generator expressions might be somewhat of a problem to disambiguate sometimes, we'll see
how bad that gets ;-)

One way to detect operator objects would be to test callable(obj), which would allow
for functions and types and bound methods etc. Now there needs to be a way of
handling UNARY_PLUS vs PLUS functionality (obviously the name bindings are just mnemonic
and aren't seen by seqeval unless they're part of the operator object). ...

A sketch (the archive mangled the indentation and dropped the "def" lines
of the original post; the signatures and precedence values below are
reconstructed from the printed output):

>>> def seqeval(objseq):
...     """evaluate an object sequence. rules tbd."""
...     args = []
...     ops = []
...     for obj in objseq:
...         if callable(obj):
...             # only reduce when two operands are already available
...             if len(args) >= 2 and ops[-1:] and obj.__precedence__ <= ops[-1].__precedence__:
...                 args[-2:] = [ops.pop()(*args[-2:])]
...             ops.append(obj)
...             continue
...         elif isinstance(obj, tuple):
...             obj = seqeval(obj)
...         while len(ops) > len(args):  # unary
...             obj = ops.pop()(obj)
...         args.append(obj)
...     while ops:
...         args[-2:] = [ops.pop()(*args[-2:])]
...     return args[-1]
...
>>> def PLUS(x, y=None):
...     print 'PLUS(%s, %s)' % (x, y)
...     if y is None: return x
...     else: return x + y
...
>>> PLUS.__precedence__ = 1
>>> def MINUS(x, y=None):
...     print 'MINUS(%s, %s)' % (x, y)
...     if y is None: return -x
...     else: return x - y
...
>>> MINUS.__precedence__ = 1
>>> def TIMES(x, y):
...     print 'TIMES(%s, %s)' % (x, y)
...     return x * y
...
>>> TIMES.__precedence__ = 2
>>> seqeval((8, PLUS, 6, TIMES, 2))
TIMES(6, 2)
PLUS(8, 12)
20
>>> seqeval(((8, PLUS, 6), TIMES, 2))
PLUS(8, 6)
TIMES(14, 2)
28
>>> seqeval(((8, PLUS, 6), TIMES, MINUS, 2))
PLUS(8, 6)
MINUS(2, None)
TIMES(14, -2)
-28
>>> seqeval((MINUS, (8, PLUS, 6), TIMES, MINUS, 2))
PLUS(8, 6)
MINUS(14, None)
MINUS(2, None)
TIMES(-14, -2)
28
>>> [seqeval((a, TIMES, b, PLUS, c)) for a in (2, 3) for b in (10, 100) for c in (5, 7)]
TIMES(2, 10)
PLUS(20, 5)
TIMES(2, 10)
PLUS(20, 7)
TIMES(2, 100)
PLUS(200, 5)
TIMES(2, 100)
PLUS(200, 7)
TIMES(3, 10)
PLUS(30, 5)
TIMES(3, 10)
PLUS(30, 7)
TIMES(3, 100)
PLUS(300, 5)
TIMES(3, 100)
PLUS(300, 7)
[25, 27, 205, 207, 35, 37, 305, 307]

Regards,
Bengt Richter
 

Claudio Grondi

Bengt said:
Maybe you would like the unambiguousness of
(+ 8 (* 6 2))
or
6 2 * 8 +
?
[... the full seqeval proposal snipped ...]
At first glance I like this concept a lot and think it is very
Pythonic in the sense of the term as I understand it. I would be glad to
see it implemented if it does not result in any side effects or other
problems I can't currently anticipate.

Claudio
 

Magnus Lycka

Terry said:
That's interesting. I think many people in the West tend to
imagine han/kanji characters as archaisms that will
disappear (because to most Westerners they seem impossibly
complex to learn and use, "not suited for the modern
world").
I don't know about "the West". Isn't it more typical of the
US that people believe that "everybody really wants to be like
us"? Here in Sweden, *we* obviously want to be like you, even
if we don't admit it openly, but we don't suffer from the
misconception that this applies to all of the world. ;)
After taking a couple of semesters of Japanese, though, I've
come to appreciate why they are preferred. Getting rid of
them would be like convincing English people to kunvurt to
pur fonetik spelin'.

Which isn't happening either, I can assure you. ;-)
The Germans just had a spelling reform. Norway had a major
language reform in the mid 19th century to get rid of the old
Danish influences (and still have two completely different ways
of spelling everything). You never know what will happen. You
are also embracing the metric system, inch by inch... ;)

Actually, it seems that recent habit of sending text messages
via mobile phones is the prime driver for reformed spelling
these days.
I'm not sure I understand how this works, but surely if
Python can provide readline support in the interactive
shell, it ought to be able to handle "phrase input"/"kanji
input." Come to think of it, you probably can do this by
running the interpreter in a kanji terminal -- but Python
just doesn't know what to do with the characters yet.
I'm sure the same principles could be used to make a very fast
and less misspelling prone editing environment though. That
could actually be a reason to step away from vi or Emacs (but
I assume it would soon work in Emacs too...)
I would like to point out also, that as long as Chinese
programmers don't go "hog wild" and use obscure characters,
I suspect that I would have much better luck reading their
programs with han characters, than with, say, the Chinese
phonetic names! Possibly even better than what they thought
were the correct English words, if their English isn't that
good.
You certainly have a point there. Even when I don't work in an
English speaking environment as I do now, I try to write all
comments and variable names etc in English. You never know when
you need to show a code snippet to people who don't read Swedish.
Also, ASCII lacks three of our letters, and a name properly translated
is often better than one written with the wrong letters.

On the other hand, if the target users describe their problem
domain with e.g. a Swedish terminology, translating all terms
will take time and increase confusion. Also, there are plenty
of programmers who don't write English so well...
 

Dave Hansen

Just a couple half-serious responses to your comment...

I don't know about "the West". Isn't it more typical of the
US that people believe that "everybody really wants to be like
us"? Here in Sweden, *we* obviously want to be like you, even
if we don't admit it openly, but we don't suffer from the
misconception that this applies to all of the world. ;)

1) Actually, we don't think "everyone wants to be like us." More like
"anyone who doesn't want to be like us is weird."

2) This extends to our own fellow citizens.

Regards,
-=Dave
 

Dave Hansen

[...]
Maybe you would like the unambiguousness of
(+ 8 (* 6 2))
or
6 2 * 8 +
?

Well, I do like lisp and Forth, but would prefer Python to remain
Python.

Though it's hard to fit Python into 1k on an 8-bit microcontroller...

Regards,
-=Dave
 

Runsun Pan

The Germans just had a spelling reform. Norway had a major
language reform in the mid 19th century to get rid of the old
Danish influences (and still have two completely different ways
of spelling everything). You never know what will happen. You
are also embracing the metric system, inch by inch... ;)

Simplified Chinese exists due to the call for modernization of the
language decades ago. That involved turning almost the entire
culture upside down: nowadays people in China can't even read most of
the documents written just 70~80 years ago. Imagine the damage
to the 'historical sense' of the modern Chinese! The "anti-simplification"
force was thus imaginably huge. In fact, not only was the original
simplification plan never completed (it only proceeded to the 1st
stage; the 2nd stage was put off), there have lately been calls for a
reversal, back to the traditional forms. Obviously, language reform is not
trivial; especially for Asian countries, it is probably not as easy as it
is for Western countries.

China is still a centralized authoritarian country. Even with that
government, they were unable to push this through. If anyone even
dreamed about language reform in democratic Taiwan, I bet the proposal
wouldn't pass the first step in the congress.
Actually, it seems that recent habit of sending text messages
via mobile phones is the prime driver for reformed spelling
these days.

Well, to solve the problem you can either (1) reform the spelling
of a language to meet the limitations of mobile phones, or (2)
advance the input devices on mobile phones so that they
can input the language of your choice. For most Asian languages,
(1) is certainly out of the question.
I'm sure the same principles could be used to make a very fast
and less misspelling prone editing environment though. That
could actually be a reason to step away from vi or Emacs (but
I assume it would soon work in Emacs too...)

True. Actually Google, Answers.com and some other desktop
applications already use an 'auto-complete' feature. It might seem
impressive to most Western users but, where I am from
(Taiwan), this 'phrase input', as well as showing candidates in the
order of most frequent use for any specific user, has been
around for about 20 years.
You certainly have a point there. Even when I don't work in an
English speaking environment as I do now, I try to write all
comments and variable names etc in English. You never know when
you need to show a code snippet to people who don't read Swedish.
Also, ASCII lacks three of our letters and properly translated
is often better than written with the wrong letters.

If there is someday a programming language that can
be input in some form like Big5, I believe its intended audience
will ONLY be people using Big5. That means, if it exists, the
chance of showing it to users of other languages is practically
nil. Think about this: there are still a whole lot of people who don't
know English at all. Without such a 'Big5-specific' programming
tool around, their chance of learning programming is completely
cut off.

--
~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~
Runsun Pan, PhD
(e-mail address removed)
Nat'l Center for Macromolecular Imaging
http://ncmi.bcm.tmc.edu/ncmi/
~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~
 

Ivan Voras

Robert said:
On OS X,

≤ is Alt-,
≥ is Alt-.
≠ is Alt-=

Thumbs up on the unicode idea, but national keyboards (i.e. non-english)
have already used almost every possible
not-strictly-defined-in-EN-keyboards combination of keys for their own
characters. In particular, the key combinations above are reprogrammed
to something else in my language/keyboard.

But, the idea that Python could be made ready for when the keyboards and
editors start supporting such characters is a good one (i.e. keep both
<= and ≤ for several decades).
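A toy sketch of what keeping both spellings could look like in practice:
a naive source-to-source pass (the helper name "asciify" is made up) that
rewrites the Unicode operators to their ASCII forms before normal
compilation. A real version would have to tokenize so string literals
survive intact.

```python
# Map the Unicode spellings onto the ASCII operators Python understands.
UNICODE_OPS = {'≤': '<=', '≥': '>=', '≠': '!='}

def asciify(source):
    # Blind textual replacement -- fine for a sketch, wrong for strings
    # that happen to contain these characters.
    for uni, asc in UNICODE_OPS.items():
        source = source.replace(uni, asc)
    return source

print(asciify('1 ≤ 2 and 3 ≠ 4'))        # 1 <= 2 and 3 != 4
print(eval(asciify('1 ≤ 2 and 3 ≠ 4')))  # True
```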

It's not a far-out idea. I stumbled about a year ago on a programming
language that INSISTED on unicode characters like ≤ as well as the rest
of mathematical/logical symbols; I don't remember its name but the
source code with characters like that looked absolutely beautiful. I
suppose that one day, when Unicode becomes more widely used than 7-bit
ASCII, "old" code like current C and Python will be considered ugly and
inelegant in appearance :)
 

Rocco Moretti

Ivan said:
It's not a far-out idea. I stumbled about a year ago on a programming
language that INSISTED on unicode characters like ≤ as well as the rest
of mathematical/logical symbols; I don't remember its name but the
source code with characters like that looked absolutely beautiful.

Could it be APL?

http://en.wikipedia.org/wiki/APL_programming_language

Although saying it used "Unicode characters" is a bit of a stretch - APL
predated Unicode by some 30+ years.
 

Neil Hodgson

Having a bit of a play with some of my spam reduction code.

Original:

def isMostlyCyrillic(u):
    if type(u) != type(u""):
        u = unicode(u, "UTF-8")
    cnt = float(sum(0x400 <= ord(c) < 0x500 for c in u))
    return (cnt > 1) and ((cnt / len(u)) > 0.5)

Using more mathematical operators:

def isMostlyCyrillic(u):
    if type(u) ≠ type(u""):
        u ← unicode(u, "UTF-8")
    cnt ← float(∑(0x400 ≤ ord(c) < 0x500 ∀ c ∈ u))
    return (cnt > 1) ∧ ((cnt ÷ len(u)) > 0.5)

The biggest win for me is "≠", with "←" also an improvement. I'm so
used to "/" for division that "÷" now looks strange.
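(For comparison, a sketch of the same check in present-day Python 3, where
the str/bytes split replaces the old str/unicode test:)

```python
# Python 3 rendering of the Cyrillic check: bytes input is decoded,
# then we count characters in the Cyrillic block U+0400..U+04FF.
def is_mostly_cyrillic(u):
    if isinstance(u, bytes):
        u = u.decode('utf-8')
    cnt = sum(0x400 <= ord(c) < 0x500 for c in u)
    return cnt > 1 and cnt / len(u) > 0.5

print(is_mostly_cyrillic('привет'))  # True
print(is_mostly_cyrillic('hello'))   # False
```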

Neil
 

Terry Hancock

OMG ru kdng?

Make it stop!

Well, let's just say, I think there should be different
standards for "write once / read once" versus "write once /
read many". The mere use of written language once implied
the latter, but I suppose text messaging breaks that rule.
Well, to solve the problem you can either (1) reform the
spelling of a language to meet the limitation of mobile
phones, or (2) advancing the input device on the mobile
phones such that they can input the language of your
choice. For most asian languages, (1) is certainly out of
question.

IIRC, back in the 1990s there was a *lot* of work in Japan
on optical character recognition, and especially "digital
ink" or "stroke" recognition. With all the pen tablets out
these days, it seems like that would be an awfully good way
to handle ideograms.

First of all, they are, much more than Western alphabets,
strict about stroke order and direction (technically the
Roman alphabet is supposed to be drawn a certain way, but
many people "cheat" -- I think that's harder to get away
with with Asian characters, because they tend not to look
right when drawn wrong). And when you have the actual
stroke sequence data as input, recognition is easier and
more reliable (I think that was the point behind the
"graffiti" system for the Palm Pilot).
 

Dan Sommers

... I'm so used to "/" for division that "÷" now looks strange.

Strange, indeed, and too close to + for me (at least within my
newsreader).

Regards,
Dan
 

Runsun Pan

Well, let's just say, I think there should be different
standards for "write once / read once" versus "write once /
read many". The mere use of written language once implied
the latter, but I suppose text messaging breaks that rule.

Since we are on this, let me share with you guys a little 'ice-tip'
for how the younger generations in Taiwan communicate:

A: why did you tell av8d that I am a bmw ?
B: Well, you are just like one of those ogs or obs ...
A: oic, you think you are much q than I ?
B: ...
A: I would 3q if you stop doing so.
B: ok.
A: Orz
B: 88
A: 881

Can you guys figure out the details ?

Here is the decoded version:

A: why did you tell av8d that I am a bmw ?
[8 in our language is pronounced as "ba", so av8d = everybody]

B: Well, you are just like one of those ogs or obs ...
[ogs= oh-ji-sang, obs=oh-ba-sang, Japanese, means old guy, old
woman, respectively]

A: oic, you think you are much q than I ?
[oic=Oh I see; q = cute]

A: I would 3q if you stop doing so.
[ 3q = thank you ]

B: ok.

A: Orz
[ appreciate very much --- it looks like a guy kneeling down before an emperor ]

B: 88
[ bye-bye ]

A: 881
[ bye-bye with a tone, sometimes 886 = bye-bye-loh ]

The above example is just an extremely simple one. In the real world,
they combine all sorts of language sources --- Mandarin, Japanese,
English, Taiwanese ... as well as "shapes" like Orz.

This kind of mixture-of-everything is widely used by the younger
generations, sometimes called "net terms", sometimes called "Martian
words". It facilitates the online activities of youngsters, but
creates huge 'generation gaps' --- some dictionaries were published
for high school teachers to study in order for them to talk to and
understand their students.

IMO, a language is a living organism, it has its own life and often
evolves with unexpected turns. Maybe in the future some of those
Martian Words will become part of formal Taiwanese, who knows ? :)
First of all, they are, much more than Western alphabets,
strict about stroke order and direction (technically the
Roman alphabet is supposed to be drawn a certain way, but
many people "cheat" -- I think that's harder to get away
with with Asian characters, because they tend not to look
right when drawn wrong). And when you have the actual
stroke sequence data as input, recognition is easier and
more reliable (I think that was the point behind the
"graffiti" system for the Palm Pilot).

But ... to my knowledge, all of the input tablets that use OCR have a
training feature. You can teach the program to recognize your own
order of strokes. The ability to train (and be trained) is a very key
element of such an input device.

--
~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~
Runsun Pan, PhD
(e-mail address removed)
Nat'l Center for Macromolecular Imaging
http://ncmi.bcm.tmc.edu/ncmi/
~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~
 

Jorge Godoy

Runsun Pan said:
Can you guys figure out the details ?

Here is the decoded version:

It looks like with all my 26 years I'm too old to understand something like
that... All I can say is OMG... :)
IMO, a language is a living organism, it has its own life and often
evolves with unexpected turns. Maybe in the future some of those
Martian Words will become part of formal Taiwanese, who knows ? :)

I am extremely against that for pt_BR (Brazilian Portuguese). There's a TV
channel here that has some movies with "net terms" instead of pt_BR for the
translation...

--
Jorge Godoy <[email protected]>

"Quidquid latine dictum sit, altum sonatur."
- Qualquer coisa dita em latim soa profundo.
- Anything said in Latin sounds smart.
 

Magnus Lycka

Runsun said:
Simplified Chinese exists due to the call for modernization of the
language decades ago. That involved turning almost the entire
culture upside down
This is in some ways quite the opposite compared to Nynorsk
in Norway, which was an attempt to revive the old and pure
Norwegian, after being dominated (in politics as well as in
grammar) by Denmark from 1387-1814. (I guess it was a
complicating factor that the end of the union with Denmark
led to a union with Sweden. The Norwegians probably had some
difficulties deciding what neighbour they disliked most. When
they broke out of the union with Sweden in 1905, they actually
elected a Danish prince to be their king.) Anyway, only a
fraction of the Norwegians use Nynorsk today, and the majority
still speak the Danish-based bokmål. On the other hand, the
spelling of bokmål has also been modernized a lot, with a
series of spelling reforms of both languages.
 

Terry Hancock

But ... to my knowledge, all of the input tablets that
use OCR have a training feature. You can teach the
program to recognize your own order of strokes. The
ability to train (and be trained) is a very key element of
such an input device.

Yeah, but I would think that would be a real drawback when
there's something like 2000 to 10,000 characters to train
on! I think you'd need some kind of short cut (maybe you
could share radical information between characters?).

But I guess I assumed this would already be a solved problem
by now. Maybe it was a lot harder than expected.
 

Dave Hansen

Indeed, I don't think I've used ÷ for division since about 7th grade,
when I first started taking Algebra (over 30 years ago).
Strange, indeed, and too close to + for me (at least within my
newsreader).

FWIW, it looks closer to - than + in mine. And as you say, _too_
close. IMHO.

Regards,
-=Dave
 

Roel Schroeven

Dave Hansen schreef:
Indeed, I don't think I've used ÷ for division since about 7th grade,
when I first started taking Algebra (over 30 years ago).

I have never even used it, except that it's printed on calculators. In
school we used ":" and afterwards "/".
 
