Am I the only one who would love these extensions? - Python 3.0 proposals (long)

Georgy Pruss

Hi all,

I would like to propose some extensions to the language, probably
for version 3.0. They are mainly lexical or syntactical, and the
language will stay compatible with 2.X (even though it was
declared somewhere that this is not the main concern in the Python
community), but some of them are a bit more "radical." Yes, I know
that some of the proposals were discussed here before, but I don't
think it means that the language is frozen in its development
forever and that everybody's absolutely happy with what it is
today. Anyway, this is my wish list :)

The features I'd like to see in Python are: underscores in numbers,
binary integers, hex strings, true boolean and none constants,
enum type, optional colon, built-in regex and RE match, slices,
for-as loop, until loop, unconditional loop, one line if-else, etc.


1) Underscores in numbers. It will help to read long numbers.
E.g.
12_345_678
3.14159_26535_89793_23846


2) Binary constants. Not in great demand, just nice to have,
half an hour to implement.
E.g.
0b01110011
0b1110_0101_1100_0111


3) Hex strings. Very useful when you want to initialize long
binary data, like inline pictures.
E.g.
x'48656C6C6F 21 0D0A'
ux'0021 000D 000A'
They can be combined with other strings: 'Hello!' x'0d0a'
Now you can use hexadecimal values, but with two times
longer sequences like '\x..\x..\x..\x..', or do the translation
during run-time using '....'.decode('hex').
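
For comparison, a minimal sketch of the run-time translation mentioned above. It assumes present-day Python 3 (which this post predates), where `bytes.fromhex` plays the role of `'....'.decode('hex')` and skips the spaces between groups:

```python
# Run-time equivalent of the proposed x'48656C6C6F 21 0D0A' literal.
data = bytes.fromhex('48656C6C6F 21 0D0A')

# Concatenation stands in for the proposed 'Hello!' x'0d0a' combination.
combined = b'Hello!' + bytes.fromhex('0d0a')

print(data)              # the bytes for "Hello!" followed by CR LF
print(combined == data)
```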


4) Keywords 'none', 'false', 'true'. They should be keywords,
and they should be lowercase, like all the other keywords.
True, False and None can stay for some time as predefined
identifiers, exactly like they are now.


5) Enum type. Introduces global (module-level) constants of
number or string values. The numeric values can be assigned
automatically like in the C language. If the enum declaration
has a name, this name defines "a namespace" and it must be
specified for referring to the values.

# defines constants AXIS.X=0, AXIS.Y=1, AXIS.Z=2
enum AXIS: X, Y, Z

# defines color.red, color.green, color.blue
enum color
    red = '#FF0000', green = '#00FF00', blue = '#0000FF'

# defines consts A=0, B=1, C=2, D=10, E=11, F=12
enum
    A, B, C
    D = 10, E
    F

# the same as above
enum: A; B, C, D=10; E, F # ';' and ',' are the same here.


6) The colon is optional after for,if,try,enum etc. if it is
followed by a new line. Although it's mandatory if there are
statements on the same line. So it's like ';' -- you MUST
use it when it's needed. You CAN still use it if you like it,
like now you can put a semicolon after each statement
as well.

def abs(x)
    if x < 0
        return -x
    else
        return x


7) Built-in regex'es. It would be nice to have built-in regular
expressions. Probably, some RE functionality can be shared
with the built-in string class, like presently we can write
'x'.encode('y'). For example $str can produce a regex object
and then s==re can return true if s matches re. This would be
very good for the switch construction (see below).
E.g.
id = $ "[A-Za-z_][A-Za-z0-9_]*"
if token == id: return token
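
The `s==re` idea can be approximated in today's Python without new syntax. The sketch below is only an illustration: the `Pattern` class is hypothetical, and treating "matches" as "the whole string matches" is an assumption, since the proposal does not say.

```python
import re

class Pattern:
    """Hypothetical stand-in for the proposed $"..." literal."""

    def __init__(self, source):
        self._regex = re.compile(source)

    def __eq__(self, other):
        # s == pattern is true when the whole string matches (assumed semantics).
        if isinstance(other, str):
            return self._regex.fullmatch(other) is not None
        return NotImplemented

    # Defining __eq__ disables hashing unless __hash__ is restored.
    __hash__ = object.__hash__

ident = Pattern(r"[A-Za-z_][A-Za-z0-9_]*")

print("spam_1" == ident)   # str defers to Pattern.__eq__ via the reflected call
print("1spam" == ident)
```

Overloading `==` this way makes equality non-symmetric in spirit (a string "equals" many patterns), which may be one reason to prefer an explicit method call.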


8) Slices. They can have two external forms and they are
used in three contexts: a) in [index], b) in the 'case' clause,
c) in the 'for-as' statement. They have these forms:
a:b means a <= x < b (a:b:c -- with step c)
a..b means a <= x <= b (a..b:c -- with step c)
E.g.
1:100 == 1,2,3,...,99
1..100 == 1,2,3,...,100


9) For-as loop. Actually, 'as' is chosen because it is already
used as a keyword, and it fits here quite well. Another possible
spelling can be 'from' (as somebody proposed in 1999).

for <var> as <sequence>:
    <stmts>

The sequence (it also occurs in the case clause, see below)
is composed of one or more expressions or slices. This form
is very short and visual. It will permit better optimization
compared to the old "in range(x,y)" form.
E.g.
for n as 1:10,10:50:2,50:100:5
for item as 1..100, 201..300
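
As a point of comparison, the two examples above can be spelled with existing tools. This is a sketch in present-day Python 3; `closed` is a hypothetical helper for the proposed inclusive `a..b` form and only handles positive steps.

```python
from itertools import chain

def closed(a, b, step=1):
    """Hypothetical helper for the proposed a..b form: a <= x <= b."""
    return range(a, b + 1, step)

# for n as 1:10,10:50:2,50:100:5
ns = list(chain(range(1, 10), range(10, 50, 2), range(50, 100, 5)))

# for item as 1..100, 201..300
items = list(chain(closed(1, 100), closed(201, 300)))

print(len(ns), ns[0], ns[-1])
print(len(items), items[99], items[100])
```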

The ambiguity of constructions like "for v as a():b():c()" can be
solved with additional parentheses:
for v as a():b():c() # from a() till b() with step c(); stmts start
    stmts() # on the next line
for v as (a():b()): c() # from a() till b() with step 1; do c()
for v as [a():b()]: c() # maybe this form looks better


10) Until loop -- repeat the loop body at least one time
until the condition is true.

until <postcond>
    <stmts>

It's the same as:

<stmts>
while not <postcond>
    <stmts>
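
The same behavior is usually written today with `while True` and a trailing `break`. A minimal sketch follows; the `until` function and the countdown data are made up for illustration.

```python
def until(postcond, body):
    """Run body() at least once, then repeat until postcond() is true."""
    while True:
        body()
        if postcond():
            break

countdown = [3]
seen = []

def body():
    seen.append(countdown[0])
    countdown[0] -= 1

until(lambda: countdown[0] <= 0, body)
print(seen)
```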


11) Unconditional loop. Yes, I'm from the camp that
finds 'while 1' ugly. Besides, sometimes we don't need
a loop variable, just a repetition.

loop [<expr_times_to_repeat>]
    <stmts>

E.g.
loop 10
    print_the_form()

loop
    line = file.readline()
    if not line
        break
    process_line( line )
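
For reference, both examples have close equivalents that need no new keyword. This is a sketch assuming present-day Python 3, with an in-memory `io.StringIO` standing in for the file:

```python
import io

# "loop 10": repeat without caring about the loop variable.
calls = []
for _ in range(10):
    calls.append("form")

# "loop ... if not line: break": iter() with a sentinel stops at end of file.
source = io.StringIO("first\nsecond\n")
lines = []
for line in iter(source.readline, ""):
    lines.append(line)

print(len(calls), lines)
```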


12) Selection statement. The sequence (it can also
occur in the for-as statement, see above) is composed
of one or more expressions or slices.

switch <expr>
case <sequence>
    <stmts>
case <sequence>
    <stmts>
else
    <stmts>

The sequence can contain RE-patterns.

case $pattern,$pattern2 # if expr matches patterns
...


13) One line if-else.

if <cond>: <stmt>; else: <stmt>


14) Conditional expression. Yes, I'd love it.

cond ? yes : no


15) Better formatting for repr()

repr(1.1) == '1.1'

If the parser recognizes 1.1 as a certain number, I can't see
any reason why it should print it as 1.1000000000000001 for
the next parser run.
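
The long form appears because 1.1 has no exact binary representation, so 17 significant digits are always enough to round-trip a float but are often more than needed. Later Python releases (3.1 onward) print the shortest string that round-trips, which the sketch below relies on:

```python
x = 1.1

shortest = repr(x)          # shortest string that converts back to x
seventeen = "%.17g" % x     # 17 significant digits, the older repr() style

# Both strings denote the same float; they differ only in length.
print(shortest, seventeen, float(shortest) == float(seventeen))
```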


16) Deprecated/obsolete items:

-- No `...` as short form of repr();
-- No '\' for continuation lines -- you can always use parentheses;
-- No else for while,for,try etc -- they sound very unnatural;
-- Ellipsis -- it looks like an alien in the language.


I do believe that these changes follow the spirit of the language
and help Python to become an even better language.

And I'm ready to implement them, if they pass the PEP vote :)
If not... I love Python anyway!
 
Alex Martelli

Georgy Pruss wrote:
...
> I would like to propose some extensions to the language, probably
> for the version 3.0. They are mainly lexical or syntactical and the
> language will stay compatible with 2.X (even though it was

Not at all -- these changes would break the vast majority of significant,
well-coded existing Python programs, for example by removing the 'else'
clause on 'try' for no other reason than that it "sounds very unnatural" to you.
(I'm not sure how extensive your Python experience is, but this remark
suggests "not very").

3.0 will not guarantee 100% backwards compatibility (that is what will
make it a 3.0 rather than a 2.something), but the main way Guido will
use that extra degree of freedom is by removing some redundancies and
complications that have crept in the language and built-ins over the
years due to the need to keep compatible with past versions even while
better ways to perform many tasks were developed. I believe it is
totally out of the question that something as crucial to good Python
coding as try/except/else will be removed. I also strongly doubt the
changes will go in the direction of _introducing_ many "more than one
way to do it"'s, given that Guido's on record as advocating exactly the
reverse direction for 3.0 -- simplification, _removal_ of MTOWTDI's.

> declared somewhere that it's not the main concern in the Python
> community) but some of them are a bit more "radical." Yes, I know
> that some of the proposals were discussed here before, but I don't
> think it means that the language is frozen in its development

Of course not, but the direction of that development appears likely,
in most important ways, to go exactly the _other_ way round from
what you would want (introducing many more looping constructs, etc).

This suggests to me that, if you're truly keen on all items on your
wish list, you might want to consider _forking_ Python (or perhaps
some other language that is already closer than Python to many of
your wishes, such as Ruby). Sure, a decision to fork is considered
very extreme in the open-source world, but then, so is your list of
60 or more items grouped under 16 headings. If you can get some
substantial number of people agreeing to a majority of your desires,
and more importantly to your overall stance and philosophy on
language design, then starting your own language with Python as its
base might be more productive for y'all than wasting your efforts
lobbying for changes most of which are unlikely to ever be accepted.

E.g., you might center your new language's development on a
drastically different principle from Python's (which is that Guido,
the language architect with the consistent vision and a proven track
record of making good design trade-offs, gets absolute veto rights);
for example, "any change with 50% or more of votes that can in fact
be implemented gets in", or something like that. I'm a pessimist and
my prediction is that such an effort would rapidly degenerate into an
inconsistent mess, but, hey, I might be wrong -- here's your chance
to prove that Eric Raymond's theorizations about cathedrals and
bazaars are well-founded, by making a much more bazaar-like and less
cathedral-like language-design environment than Python's.


Alex
 
Georgy Pruss

| Georgy Pruss wrote:
| ...
| > I would like to propose some extensions to the language, probably
| > for the version 3.0. They are mainly lexical or syntactical and the
| > language will stay compatible with 2.X (even though it was
|
| Not at all -- these changes would break the vast majority of significant,
| well-coded existing Python programs, for example by removing the 'else'
| clause on 'try' for no other reason than that it "sounds very unnatural" to you.
| (I'm not sure how extensive your Python experience is, but this remark
| suggests "not very").

That particular item, 'else' for 'try' and 'for' has the lowest priority for me,
if I dare to say so. Of course I understand that it would break many many
existing programs. My main suggestions were the first ones, dealing mainly
with the lexer and the parser of Python.

I have rather little experience with Python, but very big programming
experience. I remember PL/I and IBM/360 assembler :) And I like Python a lot.

| 3.0 will not guarantee 100% backwards compatibility (that is what will
| make it a 3.0 rather than a 2.something), but the main way Guido will
| use that extra degree of freedom is by removing some redundancies and
| complications that have crept in the language and built-ins over the
| years due to the need to keep compatible with past versions even while
| better ways to perform many tasks were developed. I believe it is
| totally out of the question that something as crucial to good Python
| coding as try/except/else will be removed. I also strongly doubt the
| changes will go in the direction of _introducing_ many "more than one
| way to do it"'s, given that Guido's on record as advocating exactly the
| reverse direction for 3.0 -- simplification, _removal_ of MTOWTDI's.

I see.

| <...>
| This suggests to me that, if you're truly keen on all items on your
| wish list, you might want to consider _forking_ Python (or perhaps
| some other language that is already closer than Python to many of
| your wishes, such as Ruby). Sure, a decision to fork is considered
| very extreme in the open-source world, but then, so is your list of
| 60 or more items grouped under 16 headings. If you can get some
| substantial number of people agreeing to a majority of your desires,
| and more importantly to your overall stance and philosophy on
| language design, then starting your own language with Python as its
| base might be more productive for y'all than wasting your efforts
| lobbying for changes most of which are unlikely to ever be accepted.

No, no. I don't think that it's a good idea to fork Python. Again, I'm
quite happy with what it is now. I don't insist on introducing the switch
statement and conditional expression in 3.0. But I forget a colon for
each 'else' and I feel that I'm not the only one.

| E.g., you might center your new language's development on a
| drastically different principle from Python's (which is that Guido,
| <...>
|
|
| Alex
|

Thank you for the answer Alex.
Georgy
 
KefX

> But I forget a colon for
> each 'else' and I feel that I'm not the only one.

Funny...I always remember it on the 'else' statement and forget it on the 'if'
statement. ;)

Yet, I'm in favor of leaving the colon there. All you have to do to make sure
that your module is syntactically correct, in case you're worried about it, is
try to import it (it doesn't have to be in the context of your program). The
colon also helps programs such as text editors, code processing tools, and such
figure out where blocks begin: all blocks begin with a colon. Think of it as an
opening brace, or a 'begin' statement, and nobody has to argue over where it
belongs. ;) You'll get used to it in time and I don't think it really hurts
anything. And I do think it makes code a little more readable, by saying "Hey!
A block begins here!", which is why the colon was put into the language in the
first place.

As for the rest of your suggestions, well, from what I've heard, half of the
issues have been beaten to death, and some of the rest seem silly. Why make
"True" a keyword when it's fine as it is?

There's no ?: operator because nobody can agree on its syntax, and most of the
people who commented on the matter said that if they couldn't have their way,
they wouldn't have it at all, so nobody has it. When the primary idea in a PEP
is rejected, it has almost zero chance of ever resurfacing.

regexps aren't built into the language because, for one thing, they really
aren't pretty. Most Python code, assuming a competent programmer, is pretty.
You wouldn't want to ruin that prettiness by making a non-pretty feature too
tempting, would you? ;) Of course, you still have regexps when you need them,
so what does it cost you?

And so on. I don't think your ideas are the worst things I've ever heard or
anything, but I have to say that you might be better served in trying to figure
out why it is the way it is, rather than thinking "I don't like this, what can
I do to change it?"...and if you can't figure it out for yourself, you can
always ask here. :)

- Kef
 
Alex Martelli

Georgy said:
> | Georgy Pruss wrote:
> | ...
> | > I would like to propose some extensions to the language, probably
> | > for the version 3.0. They are mainly lexical or syntactical and the
> | > language will stay compatible with 2.X (even though it was
> |
> | Not at all -- these changes would break the vast majority of
> | significant, well-coded existing Python programs, for example by
> | removing the 'else' clause on 'try' for no other reason than that it
> | "sounds very unnatural" to you. (I'm not sure how extensive your Python
> | experience is, but this remark suggests "not very").
>
> That particular item, 'else' for 'try' and 'for' has the lowest priority
> for me, if I dare to say so. Of course I understand that it would break
> many many existing programs.

It's not just that: it would break *well-coded* programs, programs that
are coded using the _BEST, MOST APPROPRIATE_ constructs, and not give any
really good alternative way to recode them. It's not so much the 'else'
on _for_ -- removing _that_ would break a zillion good programs, but the
alternatives are almost decent, at least in most cases. But the 'else' on
_try_?!?! How would you code around _THAT_ lack?

One sign that somebody has moved from "Python newbie" to "good Python
programmer" is exactly the moment they realize why it's wrong to code:

try:
    x = could_raise_an_exception(23)
    process(x)
except Exception, err:
    deal_with_exception(err)
proceed_here()

and why the one obvious way is, instead:

try:
    x = could_raise_an_exception(23)
except Exception, err:
    deal_with_exception(err)
else:
    process(x)
proceed_here()

What WOULD sound natural to you in order to express the crucial pattern:
1. first, try doing "this and that"
2. if "this and that" raises such an exception, do something
3. else (i.e. if it doesn't raise exceptions) do s/thing else instead
4. and then in either case proceed from here
...???

Trying to kludge it around with "status-flag" variables isn't very
satisfactory, of course -- it can only complicate things and add
error-prone needs for setting and resetting "status flags". To me,
removing 'else' from 'try' and proposing to use status-flags instead
is about as sensible as making the same proposal regarding 'if' --
i.e., not very.
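
The four-step pattern above can be made concrete with a small runnable sketch. It uses present-day Python 3 syntax rather than the `except Exception, err` spelling of the thread, and the names `handle`, `log`, and the choice of `int()` as the call that may fail are all made up for illustration.

```python
log = []

def handle(text, process):
    try:
        value = int(text)        # 1. only the call that may fail is guarded
    except ValueError:
        log.append("bad input")  # 2. the guarded call raised
    else:
        process(value)           # 3. runs only if int() succeeded; an error
                                 #    raised here is NOT caught by the handler
    log.append("proceed")        # 4. reached after the handler or else branch

handle("23", log.append)
handle("oops", log.append)
print(log)
```

If `process(value)` sat inside the `try` block instead, a `ValueError` raised by `process` itself would be silently reported as "bad input", which is the bug the `else` clause avoids.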

> My main suggestions were the first ones,
> dealing mainly with the lexer and the parser of Python.

Ah, I see -- you probably weakened the effectiveness of your post
by mixing in it things you cared little about and at the same time
were such as to ensure alarm among experienced Pythonistas.

Each of your proposals requires very detailed analysis to become a
PEP. For example, consider one I do like a bit: the ability to insert
underscores in literal numbers. If we enrich the lexer to allow
that, I would _also_ want at the very least a warning to be raised
if the underscores are not inserted at "regular" intervals -- this
would, I think, enhance the usefulness, by making the lexer able to
help diagnose at least _some_ typos in number literals. However,
there seem to be arguments for at least two approaches: one, just
force the number of digits allowed between underscores to exactly
three -- probably the most common usage; or, two, allow any number
of digits between underscores as long as the usage is "consistent"
(within a single number? or in a larger lexical unit, and, if so,
which one -- a module?). In either case, the issue of number of
digits allowed before the first underscore and after the last one
should be tackled -- perhaps differently for sequences of digits
that come before a decimal point (or without any at all) or after
a decimal point.

Further, if we're able to _input_, say, 1_000_000 to indicate "a
million" to the lexer -- surely we'll want that same ability when
the same string is otherwise recovered (e.g. read from a file)
and then transformed into an integer by calling int(s). Should
int(s) just accept underscores in string s, or be more selective?
Should underscores perhaps only be accepted in int(...) [and other
type constructors such as long(...), float(...), ...] if explicitly
enabled by an optional argument?
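
One of the options above (underscores only when explicitly enabled, and only in regular groups of exactly three digits) could look roughly like this. It is a sketch, not a proposal: `parse_grouped_int` and its `allow_underscores` argument are hypothetical names, and only non-negative decimal integers are handled.

```python
import re

# A leading group of one to three digits, then groups of exactly three.
_GROUPED = re.compile(r"[0-9]{1,3}(_[0-9]{3})*")

def parse_grouped_int(text, allow_underscores=False):
    """Hypothetical int() variant with opt-in, regularly spaced underscores."""
    if "_" in text:
        if not allow_underscores:
            raise ValueError("underscores not enabled: %r" % text)
        if _GROUPED.fullmatch(text) is None:
            raise ValueError("irregular digit grouping: %r" % text)
        text = text.replace("_", "")
    return int(text)

print(parse_grouped_int("1_000_000", allow_underscores=True))
```

Rejecting `"1_00_000"` is what lets the parser catch some typos; the "any consistent grouping" alternative would need a different check.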

And what about the _output_ side of things? Surely if 1_000_000
becomes an acceptable syntax for "one million" there should be a
simple way to make the number 1000000 _print out_ that way. Should
that be through some optional argument to str() and/or repr()? Or
a new function (and/or method), and where located? How would we
get that effect with the %-formatting operator?
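
For readers on a much later Python (3.6 or newer, which postdates this thread), the output side ended up in the format mini-language: the `_` option groups digits by thousands. A short sketch, assuming such a version:

```python
million = 1000000

text = format(million, "_")        # underscore as thousands separator
same = "{:_}".format(million)      # same option through str.format

print(text, same, int(text) == million)
```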

A PEP on this single change to the language would stand no chance
of acceptance -- it would probably, and quite correctly, just get
rejected outright by the PEP-editor, as clearly under-researched --
unless each and every one of these points (and probably more yet
that I haven't thought of -- all of the above is the result of just
a couple minutes' worth of thought, after all!) was identified and
clearly addressed, indicating what design choices were considered
and why they were accepted or rejected. Considerations about how
such things are dealt with in other languages and why adopting their
solutions would be appropriate or inappropriate in Python would be
very helpful here, too.

And this one is probably the simplest out of all of your wishes,
right? Plus, as I said, it is one I might well like, so I'm definitely
not "looking for excuses" against it -- on the contrary, I'm trying to
pin down enough details so it stands a _chance_ (and it might, even
in Python 2.4, since changes that are useful and backwards-compatible
ARE under consideration for that, the next release).

> I have rather little experience with Python, but very big programming
> experience. I remember PL/I and IBM/360 assembler :) And I like Python a
> lot.

Yep, I also recall PL/I (not very fondly...) and BAL (very fondly indeed,
though I admit it was on a /370 that I was using it -- I don't think there
were any /360's left by that time, at least not in IBM Research).

> No, no. I don't think that it's a good idea to fork Python. Again, I'm

Probably not -- forking rarely _is_ a good idea, except perhaps under
extreme provocation. However, Python's conservatism might feel like
such (extreme provocation) to somebody who's truly gung-ho for change.

> quite happy with what it is now. I don't insist on introducing the switch
> statement and conditional expression in 3.0. But I forget a colon for
> each 'else' and I feel that I'm not the only one.

Yep -- as for me, I kept doing that continuously for the first (roughly)
three months of intense Python use, still a lot (just a bit less) over the
_next_ three months, and even though the frequency's been going down it's
mainly thanks to smart-editors which immediately show, typically by not
auto-formatting things as I expect, that I have _again_ forgotten.

However, I do like the colons there when I _read_ Python code, which, all
in all, I do far more often than writing it; so, the slight inconvenience
in writing is (for me, and by now) balanced out by the convenience in
reading. You may want to investigate "smart editors" to see if the balance
can be made similarly acceptable to you.


Alex
 
Alex Martelli

KefX wrote:
...
As for the rest of your suggestions, well, from what I've heard, half of
the issues have been beaten to death, and some of the rest seem silly. Why
make "True" a keyword when it's fine as it is?

True, False and None may well become keywords in the future, because that
might make things "even finer" in some respects. E.g., right now,
    while True:
        ...
has to look-up 'True' at EACH step just in case the ... code rebinds
that name. This _is_ a bit silly, when there is no real use-case for
"letting True be re-bound". Nothing major, but...:

[alex@lancelot test]$ timeit.py -c -s'import itertools as it' 'c=it.count()' 'while True:' ' if c.next()>99: break'
10000 loops, best of 3: 91 usec per loop

[alex@lancelot test]$ timeit.py -c -s'import itertools as it' 'c=it.count()' 'while 1:' ' if c.next()>99: break'
10000 loops, best of 3: 76 usec per loop

...it still seems silly to slow things down by 20% w/o good reason...

(people who wonder why suddenly some Pythonistas are _so_ interested in
efficiency might want to review
http://www.artima.com/weblogs/viewpost.jsp?thread=7589
for a possible hint...).

OTOH, the chance that the spelling of True, False and None will change
is close to that of a snowball in the _upper_ reaches of Hell (the
_lower_ ones are in fact frozen, as any reader of Alighieri well knows,
so the common idiom just doesn't apply...:).


Alex
 
Paul Rubin

Alex Martelli said:
This suggests to me that, if you're truly keen on all items on your
wish list, you might want to consider _forking_ Python (or perhaps
some other language that is already closer than Python to many of
your wishes, such as Ruby). Sure, a decision to fork is considered
very extreme in the open-source world, but then, so is your list of
60 or more items grouped under 16 headings.

I've been thinking for a while that Python could benefit from a fork,
that's not intended to replace Python for production use for any group
of users, but rather to be a testbed for improvements and extensions
that would allow more concrete experiments than can be done through
pure thought and the PEP process. Most proposals for Python
improvements could then be implemented and tried out in the
experimental platform before being folded back into the regular
implementation.
 
Alex Martelli

Paul said:
I've been thinking for a while that Python could benefit from a fork,
that's not intended to replace Python for production use for any group
of users, but rather to be a testbed for improvements and extensions
that would allow more concrete experiments than can be done through
pure thought and the PEP process. Most proposals for Python
improvements could then be implemented and tried out in the
experimental platform before being folded back into the regular
implementation.

That's one of the benefits that the pypy project is intended to
provide: a "reference implementation" that's easiest to play with
for exactly such purposes as the experiments you mention. And --
we do hope to release pypy 1.0 before Christmas...!


Alex
 
Paul Rubin

Alex Martelli said:
That's one of the benefits that the pypy project is intended to
provide: a "reference implementation" that's easiest to play with
for exactly such purposes as the experiments you mention. And --
we do hope to release pypy 1.0 before Christmas...!

Interesting and cool. During the early discussion of Pypy it sounded
as if Pypy was specifically *not* intended for such purposes. It'll
be great to see what comes out.
 
Peter Otten

Georgy said:
> I would like to propose some extensions to the language, probably
> ...
>
> 5) Enum type. Introduces global (module-level) constants of
> number or string values. The numeric values can be assigned
> automatically like in the C language. If the enum declaration
> has a name, this name defines "a namespace" and it must be
> specified for referring to the values.
>
> # defines constants AXIS.X=0, AXIS.Y=1, AXIS.Z=2
> enum AXIS: X, Y, Z
>
> # defines color.red, color.green, color.blue
> enum color
>     red = '#FF0000', green = '#00FF00', blue = '#0000FF'
>
> # defines consts A=0, B=1, C=2, D=10, E=11, F=12
> enum
>     A, B, C
>     D = 10, E
>     F
>
> # the same as above
> enum: A; B, C, D=10; E, F # ';' and ',' are the same here.

enum WhosAfraidOf:
    red
    yellow
    blue

for color in WhosAfraidOf:
    barnetPaintsItIn(color)

I would like something along these lines. I think you omitted the most
important benefit, getting the names rather than the values, which I'm
looking forward to, so I will shamelessly copy from a recent conversation
on Python-Dev:

I've had this idea too. I like it, I think. The signal module could
use it too...

Yes, that would be cool for many enums.
[end quote]

(I do not follow the list closely, but I think there was no talk of a new
syntax for enums, though)
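
The "names rather than values" benefit can be illustrated with the `enum` module that the standard library gained much later (Python 3.4), without any new syntax. A minimal sketch, reusing the `WhosAfraidOf` example; the numeric values are arbitrary:

```python
from enum import Enum

class WhosAfraidOf(Enum):
    red = 1
    yellow = 2
    blue = 3

# Iteration yields members in definition order; each carries its own name.
names = [color.name for color in WhosAfraidOf]

print(names)
print(WhosAfraidOf.red)          # shows the member by name, not as a bare 1
```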

> 6) The colon is optional after for,if,try,enum etc. if it is
> followed by a new line. Although it's mandatory if there are
> statements on the same line. So it's like ';' -- you MUST
> use it when it's needed. You CAN still use it if you like it,
> like now you can put a semicolon after each statement
> as well.
>
> def abs(x)
>     if x < 0
>         return -x
>     else
>         return x

While I forget the colon more often than I'd like to, this is mostly a
non-issue as it is caught by the compiler and does no runtime harm. Also,
the colon would make the above example slightly more readable.

> 11) Unconditional loop. Yes, I'm from the camp that
> finds 'while 1' ugly. Besides, sometimes we don't need
> a loop variable, just a repetition.
>
> loop [<expr_times_to_repeat>]
>     <stmts>

Not sure. Maybe

while:
    repeatThisForever()

> 16) Deprecated/obsolete items:
>
> -- No `...` as short form of repr();
> -- No '\' for continuation lines -- you can always use parentheses;

Yes, away with these!

> -- No else for while,for,try etc -- they sound very unnatural;

I'm just starting to use else in try ... except, so my resistance to the
extra else clauses is not as strong as it used to be...

> -- Ellipsis -- it looks like an alien in the language.

What the heck is this? The Nutshell index does not have it, and I couldn't
make any sense of the section in the language reference.


Peter
 
Thomas Bellman

Alex Martelli said:
True, False and None may well become keywords in the future, because that
might make things "even finer" in some respects. E.g., right now,
while True:
...
has to look-up 'True' at EACH step just in case the ... code rebinds
that name. This _is_ a bit silly, when there is no real use-case for
"letting True be re-bound". Nothing major, but...:

That's a silly reason to make them keywords. A much better way
to achieve the same goal would be to make the optimizer recognize
that True isn't re-bound within the loop. Making the optimizer
better would improve the performance of much more code than just
'while True' loops.
 
Alex Martelli

Peter Otten wrote:
...
What the heck is this? The Nutshell index does not have it, and I couldn't
make any sense of the section in the language reference.

Ooops, sorry -- I guess I shouldn't have skipped it in the Nutshell.

Ellipsis is the singleton object that represents ... (three dots) used
in an indexing. Check this out:
>>> class showindex:
...     def __getitem__(self, index): return index
...
>>> si = showindex()
>>> si[1, ..., 7]
(1, Ellipsis, 7)

see? in this case, the index is a 3-tuple, and Ellipsis is its middle
item. As far as I know, the only USE of this notation is in package
Numeric (p. 307-308 in the Nutshell), to indicate that some indices or
slices apply to the rightmost axes of a multi-dimensional array.
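
To make that use concrete, here is a sketch of how an array library might expand `...` into full slices for the axes that were left out. This is not Numeric's actual code: `expand_ellipsis` and its `ndim` parameter are hypothetical, and only a single Ellipsis is handled.

```python
def expand_ellipsis(index, ndim):
    """Replace a single Ellipsis in index with full slices (sketch only)."""
    if not isinstance(index, tuple):
        index = (index,)
    if Ellipsis not in index:
        return index
    pos = index.index(Ellipsis)
    # Every entry except the Ellipsis itself consumes one axis.
    missing = ndim - (len(index) - 1)
    return index[:pos] + (slice(None),) * missing + index[pos + 1:]

# a[..., 0] on a 3-dimensional array means a[:, :, 0]:
print(expand_ellipsis((Ellipsis, 0), 3))
```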

Few Pythonistas outside of the ranks of numeric-array mavens may have
any interest in multi-dimensional arrays and the like. However, such
numeric-array mavens are an important part of the Python community (thanks
to such wonderful add-ons as Numeric, its future successor numarray, and
the many splendid packages built on top), so you can judge for yourself
what the chance may be that Guido would want to take away from under them
the "..." syntax that is so useful and habitual for them.


Alex
 
Alex Martelli

Paul said:
Interesting and cool. During the early discussion of Pypy it sounded
as if Pypy was specifically *not* intended for such purposes. It'll
be great to see what comes out.

At http://codespeak.net/pypy/index.cgi?doc you can already see a
lot of what comes out...!-)

In particular, re a "reference implementation", at:
http://codespeak.net/pypy/index.cgi?doc/funding/B3.impact.html

you can read...:

"""
Guido van Rossum has expressed interest in giving PyPy the status of
'implementation standard' (executable specification) of the Python
programming language.
"""

and the quotation from Guido:

"""
It is even possible that PyPy will eventually serve as an "executable
specification" for the language, and the behavior of one of PyPy's object
spaces will be deemed the correct one.
"""

Whether or not one of pypy's object spaces is ever officially blessed
as the "executable specification", aka "implementation standard", you
can see that many of the work-packages under B6.7 have to do exactly
with the "easiest to play with ... for such purposes as experiments"
nature of pypy. Of course, we'll be able to develop the project's
ambitious objectives only if we obtain the EU funding we have requested
under "IST-2002-2.3.2.3 - Open development platforms for software and
services" (indeed, all the general and detailed plans & projections
you'll see on the webpage were developed mostly to submit to the EU a
good proposal -- it IS, however, a nice side effect that our plans and
ambitions are now spelled out with such extreme precision:).

Still, we _have_ gone farther than most would have expected without yet a
scrap of funding from anybody -- each participant covering his or her own
travel & accommodation expenses to the Sprints... -- and we'll keep trying
to do our best come what may, in typical open-source spirit!


Alex
 
Alex Martelli

Thomas said:
That's a silly reason to make them keywords. A much better way
to achieve the same goal would be to make the optimizer recognize
that True isn't re-bound within the loop. Making the optimizer
better would improve the performance of much more code than just
'while True' loops.

You are heartily welcome to perform the dynamic whole-code analysis
needed to prove whether True (or any other built-in identifier) may
or may not be re-bound as a side effect of functions called within the
loop -- including the calls to methods that can be performed by just
about every operation in the language. It seems a very worthy
research project, which might take good advantage of the results
of the pypy project (there is plenty of such advanced research that
is scheduled to take place as part of pypy, but there is also surely
space for a lot of other such projects; few are going to bear fruit,
after all, so, the more are started, the merrier).

Meanwhile, during the years in which this advanced research project
of yours matures and (with luck) produces usable results, Python's
mainstream implementation will stick to the general strategy that has
served it so well and for so long: _simplicity_ first and foremost,
optimization (and thus complication) only where it has provable great
benefit/cost ratios (e.g., dictionary access and list-sorting).

As there is really no good use case for letting user code rebind the
built-in names None, True, and False, making them keywords has almost
only pluses (the "almost" being related to backwards compatibility
issues), as does "nailing down" many other popular built-in names (by
techniques different from making them into keywords, most likely, as
the tradeoffs are quite different in those cases).

The process has already begun in Python 2.3, mind you...:

[alex@lancelot bo]$ python2.3 -c 'None=23'
<string>:1: SyntaxWarning: assignment to None

(True and False will take more time, because a program may try to
assign them [if not already known] in order to be compatible with
relatively recent Python releases which did not have them as
built-in names).


Alex
 

John Roth

Thomas Bellman said:
That's a silly reason to make them keywords. A much better way
to achieve the same goal would be to make the optimizer recognize
that True isn't re-bound within the loop. Making the optimizer
better would improve the performance of much more code than just
'while True' loops.

Making them keywords isn't exactly correct. There's a movement
to make just about everything in the built-in scope immutable and
not rebindable at any lower scope for performance reasons. The
usual example is the len() built-in function. All this function does is
call the __len__() method on the object; the extra function call
is a complete waste of time, and could be eliminated if the
compiler could depend on len() never being modified or
rebound at any level.

John Roth
 

David Eppstein

John Roth said:
Making them keywords isn't exactly correct. There's a movement
to make just about everything in the built-in scope immutable and
not rebindable at any lower scope for performance reasons. The
usual example is the len() built-in function. All this function does is
call the __len__() method on the object; the extra function call
is a complete waste of time, and could be eliminated if the
compiler could depend on len() never being modified or
rebound at any level.

This would also have the advantage of more quickly catching certain
common programming errors:

idSequence = 0
def makeUniqueID(str):
    global idSequence
    n = str(idSequence)
    idSequence += 1
    return str + n

Here of course the argument "str" should be named differently than the
built in type "str"... if it's a syntax error, you'd find out about it
at compile time instead of run time.
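
A minimal sketch of how that mistake surfaces at run time, assuming a current Python 3 interpreter (the thread predates it). The names make_unique_id_fixed and prefix are illustrative and not from the post:

```python
id_sequence = 0

def make_unique_id(str):
    # The parameter shadows the built-in type, so str(...) below tries to
    # call the string that was passed in. Nothing complains until the call.
    global id_sequence
    n = str(id_sequence)
    id_sequence += 1
    return str + n

def make_unique_id_fixed(prefix):
    # Renaming the parameter leaves the built-in str reachable.
    global id_sequence
    n = str(id_sequence)
    id_sequence += 1
    return prefix + n

try:
    make_unique_id("node")
except TypeError as exc:
    print("run-time failure:", exc)

print(make_unique_id_fixed("node"))
```

The shadowed version compiles without complaint; the error only appears when the function is actually called.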

Of course such explicit shadowing is easy for the compiler to detect;
the harder part is when some module changes e.g. the "cmp" global of
some other module. (I did this intentionally recently but later thought
better of it...)
 

Ron Adam

[alex@lancelot test]$ timeit.py -c -s'import itertools as it' 'c=it.count()'
'while True:' ' if c.next()>99: break'
10000 loops, best of 3: 91 usec per loop

[alex@lancelot test]$ timeit.py -c -s'import itertools as it' 'c=it.count()'
'while 1:' ' if c.next()>99: break'
10000 loops, best of 3: 76 usec per loop

...it still seems silly to slow things down by 20% w/o good reason...
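
For anyone wanting to repeat the quoted measurement without the timeit.py script, here is a rough sketch using the timeit module. It assumes a current Python 3 interpreter, so c.next() becomes next(c), and the helper name best_usec is made up here. No timings are claimed, since the numbers depend on the interpreter and the machine:

```python
import timeit

SETUP = "import itertools as it"
LOOP = "c = it.count()\nwhile {cond}:\n    if next(c) > 99: break"

def best_usec(cond, number=10000, repeat=3):
    """Best time per run of the whole loop, in microseconds."""
    timer = timeit.Timer(LOOP.format(cond=cond), setup=SETUP)
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

for cond in ("True", "1"):
    print("while %s: %.1f usec per loop" % (cond, best_usec(cond)))
```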

Is it possible to make the argument optional for while? That may
allow for an even faster time?

while:
    <instructions>

It would be equivalent to making the default argument for while
equal to 1 or True. Could it optimize to a single CPU instruction when
that format is used? No checks or look ups at all?

_Ron Adam
 

Alex Martelli

Ron Adam wrote:
...
Is it possible to make the argument optional for while? That may

Yes, it is technically possible -- you'd need to play around with the
Dreaded Grammar File, but as Python grammar tasks go this one seems
simple.

allow for an even faster time?

No, there is no reason why "while 1:", "while True:" once True is
a keyword, and "while:" were the syntax extended to accept it, could
or should compile down to code that is at all different.

while:
    <instructions>

It would be equivalent to making the default argument for while
equal to 1 or True. Could it optimize to a single CPU instruction when
that format is used? No checks or look ups at all?

"while 1:" is _already_ compiled to a single (bytecode, of course)
instruction with "no checks or look ups at all". Easy to check:
  1   0 SETUP_LOOP       19 (to 22)
      3 JUMP_FORWARD      4 (to 10)
      6 JUMP_IF_FALSE    11 (to 20)
      9 POP_TOP
     13 CALL_FUNCTION     0
     16 POP_TOP
     17 JUMP_ABSOLUTE    10
     25 RETURN_VALUE

the four bytecodes from 10 to 17 are the loop: name 'foop'
is loaded, it's called with no argument, its result is
discarded, and an unconditional jump back to the first of
these steps is taken (this loop, of course, will only get
out when function foop [or the lookup for its name] cause
an exception). (The 2 opcodes at 6 and 9 never execute,
and the opcode at 3 could be eliminated if those two were,
but that's the typical kind of job for a peephole optimizer,
an easy but low-returns project).

Note the subtle difference when we use True rather than 1:
      6 JUMP_IF_FALSE    11 (to 20)
      9 POP_TOP
     10 LOAD_NAME         1 (foop)
     13 CALL_FUNCTION     0
     16 POP_TOP
     17 JUMP_ABSOLUTE     3
     25 RETURN_VALUE

_Now_, the loop runs all the way through the bytecodes
from 3 to 20 included (the opcodes at 0 and 21 surround
it just like they surrounded the unconditional loop we
just examined). Before we can get to the "real job" of
bytecodes 10 to 17, each time around the loop, we need
to load the value of name True (implying a lookup), do
a conditional jump on it, otherwise discard its value.

If True was a keyword, the compiler could recognize it and
generate just the same code as it does for "while 1:" --
or as it could do for "while:", were that extension of
the syntax accepted into the language.
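
The comparison is easy to repeat with the dis module. The sketch below assumes a current Python 3 interpreter, where True is already a keyword, so the opcode names and offsets will not match the listings above. The helpers opcodes and loads_name are names made up for this example:

```python
import dis

def opcodes(source):
    """Return (opname, argval) pairs for the compiled source."""
    code = compile(source, "<sketch>", "exec")
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(code)]

def loads_name(source, name):
    """True if the compiled code looks the given name up at run time."""
    return any(op == "LOAD_NAME" and arg == name for op, arg in opcodes(source))

for source in ("while 1: foop()", "while True: foop()"):
    print(source)
    for opname, argval in opcodes(source):
        print("   ", opname, argval)

# foop is still looked up by name; True, being a keyword, is not.
print(loads_name("while True: foop()", "foop"))
print(loads_name("while True: foop()", "True"))
```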


As to the chances that a patch, implementing "while:" as
equivalent to "while 1:", might be accepted, I wouldn't
be particularly optimistic. Still, one never knows!


Alex
 

Andrew Dalke

Georgy Pruss:
1) Underscores in numbers. It will help to read long numbers.
E.g.
12_345_678
3.14159_26535_89793_23846

Perl has this. How often do long numbers occur in your code?

When is the workaround of
int("12_345_678".replace("_", ""))
float("3.14159_26535_89793_23846".replace("_",""))
inappropriate? Note also that this allows the more readable
(to some)
float("3.14159 26535 89793 23846".replace(" ",""))

2) Binary constants. Not in great demand, just nice to have,
half an hour to implement.
E.g.
0b01110011
0b1110_0101_1100_0111

Why is it nice enough to make it be a syntax addition,
as compared to
?
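
The run-time workaround for point 1) extends naturally to binary text if int() is given an explicit base; whether that is the comparison meant for point 2) is an assumption. A small sketch, with parse_grouped as a made-up helper name:

```python
def parse_grouped(text, base=10):
    """Parse an integer written with '_' or ' ' as digit separators."""
    cleaned = text.replace("_", "").replace(" ", "")
    return int(cleaned, base)

print(parse_grouped("12_345_678"))                   # 12345678
print(parse_grouped("0111 0011", base=2))            # 115
print(parse_grouped("1110_0101_1100_0111", base=2))  # 58823
print(float("3.14159 26535 89793 23846".replace(" ", "")))
```

The cost is one string scan per call, which matters only if the conversion sits inside a hot loop.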

3) Hex strings. Very useful when you want to initialize long
binary data, like inline pictures.
E.g.
x'48656C6C6F 21 0D0A'
ux'0021 000D 000A'
They can be combined with other strings: 'Hello!' x'0d0a'
Now you can use hexadecimal values, but with two times
longer sequences like '\x..\x..\x..\x..', or do the translation
during run-time using '....'.decode('hex').

I very rarely include encoded binary data in my data files.
Images should usually be external resources since it's hard
to point an image viewer/editor at a chunk of Python code.

A counter example is some of the PyQt examples, which
have pbm (I think) encoded images, which are easy to see
in ASCII. In that case, there's a deliberate choice to use
a highly uncompressed format (one byte per pixel).

The run-time conversion you don't like is only done once.
In addition, another solution is to have the Python spec
require that a few encodings (like 'hex') cannot be changed,
and allow the implementation to preprocess those cases.

So why is it useful enough to warrant a new special-case
syntax?

4) Keywords 'none', 'false', 'true'. They should be keywords,
and they should be lowercase, like all the rest keywords.
True, False and None can stay for some time as predefined
identifiers, exactly like they are now.

Why *should* they be lower case? There was a big discussion
when True/False came out, which resulted in that "casing".

You argue consistency with other keywords. What other
keywords refer to objects?
['and', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif',
 'else', 'except', 'exec', 'finally', 'for', 'from', 'global', 'if',
 'import', 'in', 'is', 'lambda', 'not', 'or', 'pass', 'print', 'raise',
 'return', 'try', 'while', 'yield']
None that I can see.

5) Enum type.

There are approximations to this, as in
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/67107

# defines constants AXIS.X=0, AXIS.Y=1, AXIS.Z=2
enum AXIS: X, Y, Z

For that case, I would prefer the following

class AXIS:
    X, Y, Z = range(3)

If there are more terms then here's an alternative. I've not
seen it before, so I think it's a novel one:

class Keywords:
    AND, ASSERT, BREAK, CLASS, CONTINUE, DEF, \
    DEL, ELIF, YIELD, ... = itertools.count()

That is, the ellipsis at the end of a sequence assignment means
to ignore RHS terms at that point and beyond.

# defines color.red, color.green, color.blue
enum color
    red = '#FF0000', green = '#00FF00', blue = '#0000FF'

This can be done already as

class color:
    red = '#FF0000'
    green = '#00FF00'
    blue = '#0000FF'

# defines consts A=0, B=1, C=2, D=10, E=11, F=12
enum
    A, B, C
    D = 10, E
    F

While I know C allows this, I've found it more confusing
than useful. What about

class consts:
    A, B, C, ... = itertools.count()
    D, E, F, ... = itertools.count(10)
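
The trailing ellipsis is proposed syntax and will not run as written. A rough approximation that does run, assuming a current Python 3 interpreter, is to cut the counter off explicitly with itertools.islice; the price is that the number of names has to be repeated by hand:

```python
import itertools

class AXIS:
    X, Y, Z = range(3)

class consts:
    # islice stops the endless counter after exactly three values.
    A, B, C = itertools.islice(itertools.count(), 3)
    D, E, F = itertools.islice(itertools.count(10), 3)

print(AXIS.X, AXIS.Y, AXIS.Z)                                      # 0 1 2
print(consts.A, consts.B, consts.C, consts.D, consts.E, consts.F)  # 0 1 2 10 11 12
```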

6) The colon is optional after for,if,try,enum etc. if it is
followed by a new line. Although it's mandatory if there are
statements on the same line. So it's like ';' -- you MUST
use it when it's needed. You CAN still use it if you like it,
like now you can put a semicolon after each statement
as well.

def abs(x)
    if x < 0
        return -x
    else
        return x

The reason for the ":" is to make the block more visible. This
came out during human factors testing in ABC (as I recall). It
also simplifies tool development since software can assume
that if the previous line ends in a : then there should be an indent.

BTW, what advantage is there in having an optional syntax
for this?

7) Built-in regex'es. It would be nice to have built-in regular
expressions. Probably, some RE functionality can be shared
with the built-in string class, like presently we can write
'x'.encode('y'). For example $str can produce a regex object
and then s==re can return true if s matches re. This would be
very good for the switch construction (see below).
E.g.
id = $ "[A-Za-z_][A-Za-z0-9_]*"
if token == id: return token

Regexps play a much smaller role in Python than in languages like Perl and
Ruby. Why should this be a syntax-level feature for Python?
Eg, your example can already be done as

id = re.compile("[A-Za-z_][A-Za-z0-9_]*")
if id.match(token): return token

Note that you likely want the id pattern to end in a $.

A downside of the == is that it isn't obvious that ==
maps to 'match' as compared to 'search'.
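
A short sketch of the difference, assuming a current Python 3 interpreter; the variable names ident and anchored are made up for the example:

```python
import re

ident = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
anchored = re.compile(r"[A-Za-z_][A-Za-z0-9_]*$")

# match() only anchors at the start, so a valid prefix is enough.
print(bool(ident.match("spam!")))      # True
# With the trailing $ the whole token has to be an identifier.
print(bool(anchored.match("spam!")))   # False
print(bool(anchored.match("spam")))    # True
# search() scans forward, so it succeeds where match() does not.
print(bool(ident.match("!spam")))      # False
print(bool(ident.search("!spam")))     # True
# fullmatch() is another way to require the whole string.
print(bool(ident.fullmatch("spam!")))  # False
```

An overloaded == would have to pick one of these behaviours silently.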

Suppose in 5 years we decide that Larry Wall is right
and we should be using Perl6 regexp language. (Just
like we did years ago with the transition from the regex
module to the re module.) What's the viable migration
path?

Does your new regexp builtin allow addition

$"[A-Za-z_]" + $"[A-Za-z0-9_]*"

or string joining, like

$"[A-Za-z_]" $"[A-Za-z0-9_]*"

or addition with strings, as in

$"[A-Za-z_]" + "[A-Za-z0-9_]*"

What about string interpolation?

$"[%s]" % "ABCDEF"

8) Slices. They can have two external forms and they are
used in three contexts: a) in [index], b) in the 'case' clause,
c) in the 'for-as' statement. They have these forms:
a:b means a <= x < b (a:b:c -- with step c)
a..b means a <= x <= b (a..b:c -- with step c)
E.g.
1:100 == 1,2,3,...,99
1..100 == 1,2,3,...,100

See http://python.org/peps/pep-0204.html . Proposed and
rejected.

9) For-as loop.

See also http://python.org/peps/pep-0284.html which is for
integer loops. Since your 8) is rejected your 9) syntax
won't be valid.

10) Until loop -- repeat the loop body at least one time
until the condition is true.

until <postcond>
    <stmts>

It's the same as:

<stmts>
while not <postcond>
    <stmts>

Actually, it's the same as

while True:
    <stmts>
    if postcond:
        break
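
A runnable instance of that idiom, as a sketch only; the function digits is a made-up example and not something from the proposal:

```python
def digits(n):
    """Return the decimal digits of a non-negative int, lowest first."""
    out = []
    while True:
        out.append(n % 10)   # the body runs at least once, so 0 gives [0]
        n //= 10
        if n == 0:           # post-condition, tested after the body
            break
    return out

print(digits(0))     # [0]
print(digits(123))   # [3, 2, 1]
```

A plain "while n:" loop would return an empty list for 0, which is the case a test-at-the-bottom loop is meant to cover.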

Why is this new loop construct of yours useful enough
to warrant a new keyword?

11) Unconditional loop. Yes, I'm from the camp that
finds 'while 1' ugly. Besides, sometimes we don't need
a loop variable, just a repetition.

loop [<expr_times_to_repeat>]
    <stmts>

E.g.
loop 10
    print_the_form()

loop
    line = file.readline()
    if not line
        break
    process_line( line )

Do you find 'while True' to be ugly?

How much code will you break which uses the
word "loop" as a variable? I know you expect it
for Python 3 which can be backwards incompatible,
but is this really worthwhile? How many times do
you loop where you don't need the loop variable?
Is this enough to warrant a special case construct?

I don't think so.

Note that the case you gave is no longer appropriate.
The modern form is

for line in file:
    process_line(line)
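
A self-contained sketch of both cases, assuming a current Python 3 interpreter. io.StringIO stands in for an open file, and the body of process_line is a hypothetical placeholder:

```python
import io

def process_line(line):
    # Hypothetical stand-in for the process_line() named in the post.
    return line.rstrip("\n").upper()

# io.StringIO plays the role of an open file; file objects iterate by line.
source = io.StringIO("spam\neggs\n")

results = []
for line in source:
    results.append(process_line(line))

# Plain repetition without using the loop variable, in place of "loop 3".
beeps = []
for _ in range(3):
    beeps.append("beep")

print(results)   # ['SPAM', 'EGGS']
print(beeps)     # ['beep', 'beep', 'beep']
```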
12) Selection statement. The sequence (it can also
occur in the for-as statement, see above) is composed
of one or more expressions or slices.

switch <expr>
case <sequence>
<stmts>
case <sequence>
<stmts>
else
<stmts>

The sequence can contain RE-patterns.

case $pattern,$pattern2 # if expr matches patterns
...

See http://python.org/peps/pep-0275.html which has
not yet been decided

13) One line if-else.

if <cond>: <stmt>; else: <stmt>

Why? What's the advantage to cramming things on a line
compared to using two lines?

if a: print "Andrew"
else: print "Dalke"

Even this is almost always too compact for readability.

14) Conditional expression. Yes, I'd love it.

cond ? yes : no

Rejected. See
http://python.org/peps/pep-0308.html

15) Better formatting for repr()

repr(1.1) == '1.1'

If the parser recognizes 1.1 as a certain number, I can't see
any reason why it should print it as 1.1000000000000001 for
the next parser run.

Then use 'str', or "%2.1f" % 1.1

What you want (a parser level change) makes
a = 1.1
repr(a)

different from
repr(1.1)

That's not good.

Note that 1.1 cannot be represented exactly in IEEE 754
math. Following IEEE 754 is a good thing. What is your
alternative math proposal?
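
A small sketch of what is stored for 1.1, assuming a current Python 3 interpreter. What repr() itself prints depends on the interpreter version, so the example uses explicit formats instead:

```python
from decimal import Decimal

a = 1.1
print("%2.1f" % a)    # 1.1
print("%.17g" % a)    # 1.1000000000000001
# Decimal(float) shows the exact binary value the float holds.
print(Decimal(a))
# The stored value is close to, but not equal to, the decimal 1.1.
print(Decimal(a) == Decimal("1.1"))   # False
# Whatever repr() prints must read back as the same float.
print(float(repr(a)) == a)            # True
```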
16) Deprecated/obsolete items:

-- No `...` as short form of repr();

I agree. I believe this is one of the things to be removed
for Python 3.

-- No '\' for continuation lines -- you can always use parentheses;

I agree. I believe this is one of the things to be removed
for Python 3.

-- No else for while,for,try etc -- they sound very unnatural;

I've found 'else' very useful for a few cases which once required
ugly flags.
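
One such case, as a minimal sketch; first_negative is a made-up example:

```python
def first_negative(values):
    """Return the first negative value, or None when there is none."""
    for value in values:
        if value < 0:
            result = value
            break
    else:
        # Runs only when the loop finished without hitting break.
        result = None
    return result

print(first_negative([3, 1, -4, -1]))   # -4
print(first_negative([3, 1, 4]))        # None
```

Without the else clause, the usual substitute is a "found" flag that is set before the break and tested after the loop.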
-- Ellipsis -- it looks like an alien in the language.

As you can see, I proposed a new way to use ellipsis. In real
life I almost never use the ....

I do believe that these changes follow the spirit of the language
and help Python to become an even better language.

Given how many new looping constructs you want, I disagree
with your understanding of the spirit of the language.

Andrew
 

David M. Wilson

-- No `...` as short form of repr();

I have to agree here, the backtick syntax makes no sense to me at all.
Distinguishing between backticks and single quotes can be hard,
especially at 4am. On top of that, the syntax does not hint at what it
does, unless the reader is a unix type who has used backward shells
and languages all his life. :)

Even then, why repr()? Why not eval() or <insert crazy os.popen
shortcut>?

-- No '\' for continuation lines -- you can always use parentheses;

When reading complex expressions at speed, I have found that bumping
into a '\' in the absence of another continuation indicator (eg.
indentation) improves my ability to comprehend what I am reading. Once
again, this is one of those "4am'ers":

a = x/y+(1^2+(3+u(b(4**5)))//3
...

Quickly now! Without counting the parens, is the expression complete?
Time's up. Now:

a = x/y+(1^2+(3+u(b(4**5)))//3 \
...


David.
 
