Underscores in Python numbers

  • Thread starter Gustav Hållberg

Gustav Hållberg

I tried to find a discussion about adding optional underscores inside
numbers in Python. This is a popular option available in several
"competing" scripting languages, and one that I would love to see in
Python.

Examples:
1_234_567
0xdead_beef
3.141_592

Would appreciate if someone could find a pointer to a previous
discussion on this topic, or add it to a Python-feature-wishlist.

- Gustav
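
A note for later readers: to the best of my knowledge, this request was eventually addressed by PEP 515, which allows underscores between digits in numeric literals from Python 3.6 onward. A minimal sketch, assuming such an interpreter and reusing the three examples from the post (the variable names are made up for illustration):

```python
# Assumes Python 3.6 or newer, where underscores are allowed between
# digits in numeric literals. Variable names are illustrative only.
grouped_int = 1_234_567
grouped_hex = 0xdead_beef
grouped_float = 3.141_592

# The underscores are purely visual; the values are unchanged.
print(grouped_int == 1234567)
print(grouped_hex == 0xdeadbeef)
print(grouped_float == 3.141592)
```

On older interpreters these literals are a SyntaxError, so the sketch only applies where that assumption holds.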
 
Devan L

Gustav said:
I tried finding a discussion around adding the possibility to have
optional underscores inside numbers in Python. This is a popular option
available in several "competing" scripting languages, that I would love
to see in Python.

Examples:
1_234_567
0xdead_beef
3.141_592

Would appreciate if someone could find a pointer to a previous
discussion on this topic, or add it to a Python-feature-wishlist.

- Gustav

I'm not sure what the _s are for, but I'm guessing they serve as
separators ("." or "," depending on where you're from). I think the _s
look ugly; besides, underscores look more like spaces than
separators.
 
Peter Hansen

Gustav said:
I tried finding a discussion around adding the possibility to have
optional underscores inside numbers in Python. This is a popular option
available in several "competing" scripting languages, that I would love
to see in Python.

Examples:
1_234_567
0xdead_beef
3.141_592

Would appreciate if someone could find a pointer to a previous
discussion on this topic, or add it to a Python-feature-wishlist.

Perhaps these threads, via Google?

http://groups.google.com/group/comp...mp.lang.python&q=numeric+literals+underscores


-Peter
 
Dave Hansen

I tried finding a discussion around adding the possibility to have
optional underscores inside numbers in Python. This is a popular option
available in several "competing" scripting languages, that I would love
to see in Python.

Examples:
1_234_567
0xdead_beef
3.141_592

Would appreciate if someone could find a pointer to a previous
discussion on this topic, or add it to a Python-feature-wishlist.

I've never needed them in Python, but I've very often wished for them
in C. Along with 0b(0|1)* for binary numbers, where they'd be even
more useful.

Of course, I write _far_ more code in C than Python. But I've seen
enough bugs of the sort where someone wrote 1200000 when they meant
12000000, that I see great value in being able to specify 12_000_000.

Regards,
-=Dave
 
Roy Smith

Dave Hansen said:
Of course, I write _far_ more code in C than Python. But I've seen
enough bugs of the sort where someone wrote 1200000 when they meant
12000000, that I see great value in being able to specify 12_000_000.

I'll admit that being able to write 12_000_000 would be convenient.
On the other hand, writing 12 * 1000 * 1000 is almost as clear. In C,
the multiplication would be done at compile time, so it's not even any
less efficient. I'm not sure how Python handles that, but if it
turned out to be a serious run-time performance issue, it's easy
enough to factor it out into something that's done once and stored.
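
On the question of how Python handles that: one way to check is to compile the expression and inspect the result. This is only a sketch, and what it shows is specific to the interpreter in use. CPython has folded simple constant expressions at compile time for a long time, but that is an implementation detail rather than a language guarantee.

```python
import dis

# Compile the expression without running it, then look at the constants
# stored in the resulting code object.
code = compile("12 * 1000 * 1000", "<example>", "eval")
consts = code.co_consts

# If the interpreter folded the expression, the product itself shows up
# among the constants and no multiplication remains in the bytecode.
print(consts)
dis.dis(code)

# Either way, evaluating the code object gives the same value.
product = eval(code)
print(product)
```

If the multiplications do survive into the bytecode on some interpreter, binding the value to a module-level name once has the same effect as the "done once and stored" suggestion.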

Bottom line, embedded no-op underscores in numbers would be nice (and,
IMHO, should be added), but the lack of such a feature should not be
used as an excuse to write such unreadable monstrosities as 12000000
in source code.

Semi-related: see Jakob Nielsen's complaint about having to enter
credit card numbers as 16-digit strings with no breaks on web forms
(http://www.useit.com/alertbox/designmistakes.html, item #7, last
bullet point).
 
Dave Hansen

Sorry for the delayed response. I somehow missed this earlier.

Of course, I write _far_ more code in C than Python. But I've seen
enough bugs of the sort where someone wrote 1200000 when they meant

Digression: 1 was enough.

I'll admit that being able to write 12_000_000 would be convenient.
On the other hand, writing 12 * 1000 * 1000 is almost as clear. In C,

Perhaps, but it's pretty obvious that something's wrong when you have
to resort to ugly tricks like this to make the value of a simple
integer constant "clear."

And think about 64 (or longer) -bit unsigned long long hexadecimal
values. How much nicer is 0xFFF0_FF0F_F0FF_0FFF_ULL than
0xFFF0FF0FF0FF0FFFULL? I guess we could do something like
((((((0xFFF0ULL<<16)|0xFF0FULL)<<16)|0xF0FFULL)<<16)|0x0FFFULL), but I'm
not sure it's any better.
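
For comparison, the same three spellings can be checked in Python. This is a sketch under two assumptions: Python 3.6 or newer (for the underscore literal), and that dropping the C `ULL` suffix is acceptable, since Python integers have arbitrary precision. The variable names are illustrative.

```python
# Assumes Python 3.6+ for the underscore form. Python has no ULL suffix.
grouped = 0xFFF0_FF0F_F0FF_0FFF
plain = 0xFFF0FF0FF0FF0FFF

# The shift-and-or construction, built 16 bits at a time.
shifted = (((((0xFFF0 << 16) | 0xFF0F) << 16) | 0xF0FF) << 16) | 0x0FFF

# All three spellings denote the same 64-bit value.
print(grouped == plain == shifted)
print(hex(shifted))
```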

Regards,
-=Dave
 
bonono

Personally, I would rather see the int() and float() functions be
smart enough to accept what is commonly used for this, i.e.:

a = int("1,234,567")

Of course, also support the locale variant where the meaning of "," and
"." is swapped in most European countries.
 
Steven D'Aprano

Personally, I would rather see the int() and float() function be
smarter to take what is used for this, i.e. :

a = int("1,234,567")

But the problem isn't just with conversion of strings. It is also
with literals.

n = 99999999999

Without counting, how many nines?

Obviously repeated digits is an extreme case, but even different digits
are easier to process if grouped. That's why we write phone numbers like
62 3 9621 2377 instead of 62396212377.

Here is a thought: Python already concatenates string literals:

"abc" "def" is the same as "abcdef".

Perhaps Python should concatenate numeric literals at compile time:

123 456 is the same as 123456.

Off the top of my head, I don't think this should break any older code,
because 123 456 is not currently legal in Python.
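
Both halves of that claim are easy to check. A small sketch, where `is_valid_expression` is a made-up helper that simply asks the compiler whether a piece of source is acceptable:

```python
# Adjacent string literals are joined by the compiler.
joined = "abc" "def"


def is_valid_expression(source):
    """Hypothetical helper: True if source compiles as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


print(joined == "abcdef")
print(is_valid_expression('"abc" "def"'))   # adjacent strings are legal
print(is_valid_expression("123 456"))       # adjacent numbers are not
```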
 
bonono

Steven said:
But the problem isn't just with conversion of strings. It is also
with literals.

n = 99999999999

Without counting, how many nines?

For readability, I don't see why it cannot be written as:

n = int("99,999,999,999")

We already need to do this for Decimal("9.9").
 
Stefan Rank

on 19.11.2005 06:56 Steven D'Aprano said the following:
[snip]
Perhaps Python should concatenate numeric literals at compile time:

123 456 is the same as 123456.

Off the top of my head, I don't think this should break any older code,
because 123 456 is not currently legal in Python.

+1

but only allow (a single ?) space(s), otherwise readability issues ensue.

The other idea of teaching int() about separator characters has
internationalis/zation issues:
In many European countries, one would naturally try::

int('500.000,23')

instead of::

int('500,000.23')
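
One way around the ambiguity is to make the caller name both separators explicitly instead of guessing them from the locale. A sketch under that assumption; `parse_decimal` and its parameters are hypothetical, and Decimal is used because the example values have a fractional part that int() could not hold:

```python
from decimal import Decimal


def parse_decimal(text, group_sep, decimal_sep):
    """Hypothetical helper: parse a number given explicit separators."""
    # Drop grouping characters first, then normalise the decimal mark to ".".
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


european = parse_decimal("500.000,23", group_sep=".", decimal_sep=",")
american = parse_decimal("500,000.23", group_sep=",", decimal_sep=".")
print(european == american)
```

This works for strings read at run time, where the convention can be passed along with the data. It does not help with literals in source code, which have no place to carry that information.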
 
bearophileHUGS

Steven D'Aprano:
Perhaps Python should concatenate numeric literals at compile time:
123 456 is the same as 123456.

I think using the underscore is more explicit:
n = 123_456

Alternatively the underscore syntax may be used to separate the number
from its base:
22875 == 22875_10 == 595b_16 == 123456_7
But this is probably less commonly useful (and not very explicit).
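
The equalities in that line can be checked today with the two-argument form of int(), which takes the digits as a string and the base as a number. The variable names below are illustrative.

```python
# int(text, base) accepts bases 2 through 36.
from_decimal = int("22875", 10)
from_hex = int("595b", 16)
from_base7 = int("123456", 7)

# All three strings denote the same integer.
print(from_decimal == from_hex == from_base7 == 22875)
```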

Bye,
bearophile
 
bonono

Stefan said:
The other idea of teaching int() about separator characters has
internationalis/zation issues:
In many European countries, one would naturally try::

int('500.000,23')

instead of::

int('500,000.23')

That is why I said

"Of course, also support the locale variant where the meaning of ","
and
"." is swapped in most European countries. "

We are seeing the same about base 2, 8, 10, 16.

Maybe:

int("E500.000,23")

as we are already using:

0xffff

for hex numbers.
 
Sybren Stuvel

(e-mail address removed) enlightened us with:
Of course, also support the locale variant where the meaning of ","
and "." is swapped in most European countries.

This is exactly why I wouldn't use that notation. What happens if it
is hardcoded into the source? I mean, that's what we're talking about.
Then the program would have to have an indication of which locale is
used for which source file. Without that, a program would be
interpreted in a different way on different computers. I think that
would be rather messy.

I'm in favour of using spaces or underscores.

Sybren
 
bonono

Sybren said:
(e-mail address removed) enlightened us with:

This is exactly why I wouldn't use that notation. What happens if it
is hardcoded into the source? I mean, that's what we're talking about.
Then the program would have to have an indication of which locale is
used for which source file. Without that, a program would be
interpreted in a different way on different computers. I think that
would be rather messy.

As mentioned in another post, we have that situation in all other
places. Such as

mm/dd/yyyy vs dd/mm/yyyy
decimal("10.23") - would European people expect decimal("10,23") to
work?
0xffff - a notation for base 16

Why can't I have "E100.000,23" mean "100,000.23"? It is nothing but
notation.
 
Roy Smith

Alternatively the underscore syntax may be used to separate the number
from its base:
22875 == 22875_10 == 595b_16 == 123456_7
But probably this is less commonly useful (and not much explicit).

We already have a perfectly good syntax for entering octal and hex
integers, because those are commonly used in many applications. There is,
on occasion, a need for other bases, but they are so rare, specialized, and
non-standard (RFC-1924, for example, uses an interesting flavor of base-85)
that having syntax built into the language to support them would be
completely unjustified.
 
Steve Holden

That is why I said

"Of course, also support the locale variant where the meaning of ","
and
"." is swapped in most European countries. "

We are seeing the same about base 2, 8, 10, 16.

May be :

int("E500.000,23")

as we are using :

0xffff

already for hex number

I really wouldn't want it to become possible to write Python code in one
locale that had to be edited before the numeric literals were valid in
another locale. That way madness lies.

regards
Steve
 
bearophileHUGS

Roy Smith>We already have a perfectly good syntax for entering octal
and hex integers,

There is this syntax:
1536 == int("600", 16)
that accepts strings only, up to a base of 36.
There are the hex() and oct() functions.
There is the %x and %o syntax, which isn't easy to remember.
There are the 0x600 and 0600 syntaxes that probably look good only from
the point of view of a C programmer.
I think some cleaning up, with a simpler and more consistent and
general way of converting bases, can be positive. But probably no one
shares this point of view, and compatibility with C syntax is probably
positive, so you are right. I am still learning the correct way of
thinking in python.
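
To illustrate the asymmetry: int(text, base) parses any base from 2 to 36, but the standard formatting tools only cover a few fixed bases. A sketch of the missing inverse follows. `to_base` and `DIGITS` are hypothetical names, not an existing Python API, and the helper handles non-negative integers only.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # 36 digit symbols: 0-9, a-z


def to_base(number, base):
    """Hypothetical inverse of int(text, base) for non-negative integers."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        # Peel off the least significant digit on each pass.
        number, remainder = divmod(number, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


n = int("600", 16)
print(n, to_base(n, 16), to_base(n, 7), "%x" % n)
```

Note that the 0600 octal spelling mentioned above is Python 2 syntax. Python 3 writes octal literals with a 0o prefix.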

Bye,
bearophile
 
Peter Hansen

Steven D'Aprano:

I think using the underscore it is more explicit:
n = 123_456

Alternatively the underscore syntax may be used to separate the number
from its base:
22875 == 22875_10 == 595b_16 == 123456_7
But probably this is less commonly useful (and not much explicit).

Umm... in other words, "the underscore is under-used so let's assign
some arbitrary meaning to it" (to make the language more like Perl
perhaps?).

Or maybe one should instead interpret this as "numeric literals need
more bells and whistles, and I don't care which of these two we add, but
we have to do *something*!". :)

-Peter
 
bonono

Steve said:
I really wouldn't want it to become possible to write Python code in one
locale that had to be edited before the numeric literals were valid in
another locale. That way madness lies.

That has been the case from the very beginning. 1.234, strictly speaking,
can have different meanings. So if you don't want that, don't support it and
always use the non-European notation.
 
