From D

Ben Finney

On Jul 25, 9:04 pm, Steven D'Aprano
Why does it make no sense? Have you never had to scrape a web page
or read a CSV file?

Again, unrelated to the way the Python compiler syntactically treats
the source code.
So this proposal would only apply to string literals at compile
time, not running programs?

Exactly the same way that it works for string literals in source code:
once the source code is compiled, the literal is indistinguishable
from the same value written a different way.
And I want the same error to occur if my CSV parser tries to convert
'123 456' into a single number. I don't want it to assume the
number is '123456'.

Once again, this is a discussion about Python syntax, not the
behaviour of the csv module.
 
bearophileHUGS

Sorry for the slow feedback.

Stargaming>Sounds like a good thing to be but the arbitrary
positioning doesn't make any sense.<

The arbitrary positioning allows you to denote 4-digit groups too in
binary/hex literals, like in my example:
auto x = 0b0100_0011;
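
For comparison, Python 3.6 and later accept the same underscore grouping in numeric literals, so the D line above carries over almost unchanged. A minimal sketch, assuming a Python 3.6+ interpreter (the variable names are only illustrative):

```python
# Assumes Python 3.6 or later, where underscores are allowed between digits.
x = 0b0100_0011   # two 4-bit groups
y = 0b01000011    # the same value written without grouping

print(x == y)     # the underscore does not change the value
print(x)
```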


Stargaming>fits into the current movement towards generator'ing
everything. But (IIRC) this idea came up earlier and there has been a
patch, too.<

Python is old so most simple ideas aren't new :)


Steven D'Aprano>Underscores in numerics are UGLY.<

I presume it's a matter of taste too. I use them often in D code, and
the _ symbol is very different from the 0..F/0..f digits so you can
tell them apart with no problems.


Steven D'Aprano>Why not take a leaf out of implicit string
concatenation and allow numeric literals to implicitly concatenate?<

The "_" helps my eyes see that those digit groups are part of the same
number. With spaces I think my eyes may need a bit of extra time to
decide if they are parts of the same number literal.


Eric Dexter>I think there is a language bridge so that you can compile
d for python.. looks really easy but I have python 2.5 and panda and
it tries to go for the panda installation. It looks much easier than c
to use with python in fact..<

Are you talking about "Pyd"? It's a good bridge, and I like it. It's
actively updated, soon in version 1.0.

Bye,
bearophile
 
mensanator

Again, unrelated to the way the Python compiler syntactically treats
the source code.

That's what I was enquiring about.

So, just as
123456

is not an error, the proposal is that
SyntaxError: invalid syntax

will not be an error either.

Yet,
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
a = int('123 456')
ValueError: invalid literal for int() with base 10: '123 456'

will still be an error. Just trying to be clear on this. Wouldn't
want that syntax behavior to carry over into run-time.
Exactly the same way that it works for string literals in source code:
once the source code is compiled, the literal is indistinguishable
from the same value written a different way.


Once again, this is a discussion about Python syntax, not the
behaviour of the csv module.

Who said I was using the csv module?
 
mensanator

IDLE 1.2c1
s = '123 456'
s.split()
['123', '456']

The str.split method has no bearing on this discussion,

It most certainly does. To make '123 456' into an integer,
you split it and then join it.

123456

Just wanted to be sure that this must still be done explicitly
and that the language won't do it for me behind my back.
which is about
the Python language syntax,

Provided it is confined to the language syntax.
and numeric literal values in particular.

Fine, as long as int('123 456') continues to be an error.
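
The explicit split-and-join conversion described above might look like the following. This is only a sketch: the names `digits`, `value` and `direct_ok` are illustrative and do not come from the original session.

```python
s = '123 456'

# Explicit conversion: split on whitespace, join the pieces, then convert.
digits = ''.join(s.split())
value = int(digits)

# The direct conversion is rejected at run time with ValueError.
try:
    int(s)
    direct_ok = True
except ValueError:
    direct_ok = False

print(value, direct_ok)
```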
 
Kay Schluehr

So, spaces will no longer be delimiters? Won't that cause
much wailing and gnashing of teeth?

Nope. Just replace the current grammar rule

atom: ... NAME | STRING+ | NUMBER

by

atom: ... NAME | STRING+ | NUMBER+

The resulting grammar is still free of ambiguities. The tokenizer
doesn't complain about adjacent numbers even today.
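
The split between tokenizer and grammar can be checked with the standard `tokenize` module. A small sketch, assuming the sample source line `x = 123 456` (chosen here for illustration): the tokenizer produces two NUMBER tokens, and only the parser rejects them.

```python
import io
import tokenize

source = "x = 123 456\n"

# The tokenizer emits two separate NUMBER tokens without raising.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
numbers = [tok.string for tok in tokens if tok.type == tokenize.NUMBER]

# The parser, applying the grammar, rejects the adjacent NUMBER tokens.
try:
    compile(source, "<example>", "exec")
    parsed = True
except SyntaxError:
    parsed = False

print(numbers, parsed)
```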
 
Ryan Ginstrom

On Behalf Of Leo Petr
Digits are grouped in 2s in India and in 4s in China and Japan.

This is not entirely true in Japan's case. When written without Japanese
characters, Japan employs the same format as the US, for example:

1,000,000
(However, they would read this as 百万 (hyaku man), literally 100 ten
thousands.)

Raymond is correct in that Japan traditionally groups in fours (and still
reads it that way regardless, as shown above), but in an ordinary
programming context, this almost never comes into play.

On the original topic of the thread, I personally like the underscore idea
from D, and I like it better than the "concatenation" idea, even though I
agree that it is more consistent with Python's string-format rules.

Regards,
Ryan Ginstrom
 
Ben Finney

So, just as

123456

is not an error, the proposal is that

SyntaxError: invalid syntax

will not be an error either.

More directly: Just as these three statements create the same literal
value:
'abcdef'

the proposal is that these three statements create the same literal
value:
12345.67890

and not be a syntax error.
Yet,

Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
a = int('123 456')
ValueError: invalid literal for int() with base 10: '123 456'

will still be an error.

Since that value, '123 456', is one that is rejected by the 'int'
constructor. Nothing to do with this proposal.
Just trying to be clear on this. Wouldn't want that syntax behavior
to carry over into run-time.

The distinction you need to be clear on is between the Python syntax
for writing literal values in code (which is proposed to change by
this), and the behaviour of operations on arbitrary values at runtime
(which is outside the scope of this proposal).
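
The string-literal half of this comparison can be checked directly. The sketch below assumes three spellings of 'abcdef' as stand-ins for the statements in the post, which did not survive here. Parsing each one gives the same syntax tree, so the way the literal was written leaves no trace after compilation.

```python
import ast

# Assumed spellings of one string literal; adjacent string literals are
# concatenated when the source is parsed.
spellings = ["'abcdef'", "'abc' 'def'", "'a' 'bc' 'def'"]

values = [ast.literal_eval(src) for src in spellings]

# Dump each parsed expression (without position info) and collect the distinct results.
dumps = {ast.dump(ast.parse(src, mode="eval")) for src in spellings}

print(values, len(dumps))
```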
 
fdu.xiaojf

Gabriel said:
Why not? Because in English major numbers are labeled in thousands?
(thousand, million, billion...)
In India, they're grouped by two after the first thousand; in China,
they're grouped each 4 digits (that is, there is a single word for "ten
thousands" = wan4 = 万, and the next required word is for 10**8 = yi4 = 亿)

Yes, in China numbers are grouped every 4 digits, while other countries
group them differently, so I think it would be better if we could put
arbitrary whitespace inside number literals.
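
To make the difference in grouping concrete, here is a small sketch. `group_digits` is a hypothetical helper written for this illustration, not part of Python or of the proposal. It splits the decimal digits of a non-negative integer into fixed-size groups from the right.

```python
def group_digits(n, size, sep=" "):
    """Hypothetical helper: group the digits of a non-negative int from the right."""
    digits = str(n)
    groups = []
    while digits:
        groups.append(digits[-size:])   # take the last `size` digits
        digits = digits[:-size]         # and drop them from the remainder
    return sep.join(reversed(groups))

threes = group_digits(10**8, 3)   # grouping in threes
fours = group_digits(10**8, 4)    # grouping in fours, as in Chinese usage
print(threes)
print(fours)
```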
 
Alex Martelli

code files? What's the regular expression for
locating a number with an arbitrary number of digits
separated into an arbitrary number of blocks of an
arbitrary number of digits with an arbitrary number
of whitespace characters between each block?

For a decimal integer (or octal) number, I'd use something similar to:
r'\d[\d\s]+'

This also gets trailing whitespace, but that shouldn't be much of a
problem in most practical cases. Of course, just like today, it becomes
a bit hairier if you also want to find hex, oct (to be 0o777 in the
future), other future notations such as binary, floats, complex numbers,
&c:) -- but the simple fact that a [\d\s] is accepted where today only
a \d would be, per se, would not contribute to that hair in any
significant way, it seems to me.
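
A quick sketch of how that pattern behaves with the standard `re` module. The sample text is made up for illustration. Each match includes any trailing whitespace, as noted above, so the whitespace is stripped before conversion.

```python
import re

# Pattern from the post: a digit followed by one or more digits or whitespace.
pattern = re.compile(r'\d[\d\s]+')

text = "total = 123 456 + 7"
matches = pattern.findall(text)

# Remove all whitespace from each match before converting it to an int.
values = [int(''.join(m.split())) for m in matches]
print(matches, values)
```

Because of the `+`, a lone single digit such as the final `7` is not matched. Using `*` in its place would pick those up as well.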


Alex
 
