Underscores in Python numbers

Steve Holden · Nov 19, 2005

Steve Holden wrote: [...]

I really wouldn't want it to become possible to write Python code in one
locale that had to be edited before the numeric literals were valid in
another locale. That way madness lies.


That is the fact, from the very beginning. 1.234, strictly speaking,
can have different meanings. So if you don't want that, don't support it and
always use the non-European notation.

Being European myself I am well aware of the notational differences of
the different locales, and I am perfectly happy that users can enter
numbers in their preferred format when they execute a program.

However, I am not happy about the idea that a program source would need
to be edited before it would work after being moved to another locale.

regards
Steve

Steven D'Aprano · Nov 20, 2005

Umm... in other words, "the underscore is under-used so let's assign
some arbitrary meaning to it" (to make the language more like Perl
perhaps?).

+1

I *really* don't like the idea of allowing underscores in numeric
literals. Firstly, for aesthetic reasons: I think 123_456 is seriously
ugly. Secondly, for pragmatic reasons, I think it is too easy to mistype
as 123-456. I know that Python can't protect you from typing 9-1 instead
of 901, but why add special syntax that makes that sort of error MORE
common?

Or maybe one should instead interpret this as "numeric literals need
more bells and whistles, and I don't care which of these two we add, but
we have to do *something*!".

-1

That's a tad unfair. Dealing with numeric literals with lots of digits is
a real (if not earth-shattering) human interface problem: it is hard for
people to parse long numeric strings. In the wider world outside of IT,
people deal with long strings of digits by grouping them. This is *exceedingly*
common: mathematicians do it, economists do it, everybody who handles long
numeric literals does it *except* computer language designers.

Depending on personal preference and context, we use any of comma, period,
dash or space as a separator. Underscore is never used. Of these, the
comma clashes with tuples, the period opens a rather large can of worms
vis-a-vis internationalisation, and the dash clashes with the minus sign.
Allowing spaces to group digits is subtle but effective, doesn't clash
with other syntax, and is analogous to string concatenation.

I don't believe it is either practical or desirable for a computer
language to accept every conceivable digit separator in literals. If you
need full support for internationalised numbers, that should go into a
function. But the question of including a digit separator for numeric
literals does solve a real problem; it isn't just meaningless bells and
whistles.

Likewise, base conversion into arbitrary bases is not, in my opinion,
common enough a task that support for it needs to be built into the syntax
for literals. If somebody cares enough about it, write a module to handle
it and try to get it included with the Python standard modules.

Steve Holden · Nov 20, 2005

Steven D'Aprano wrote:
[...]

Likewise, base conversion into arbitrary bases is not, in my opinion,
common enough a task that support for it needs to be built into the syntax
for literals. If somebody cares enough about it, write a module to handle
it and try to get it included with the Python standard modules.

In fact Icon managed to offer a syntax that allowed every base up to 36
to be used: an "r" was used to indicate the radix of the literal, so hex
453FF would be represented as "16r453FF". This worked fine. Upper- and
lower-case letters were regarded as equivalent.

regards
Steve
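A minimal Python sketch of reading an Icon-style radix literal at run time instead of in the compiler. `parse_icon_radix` is a hypothetical helper; it relies on the built-in `int(text, base)`, which accepts bases 2 to 36 and treats upper- and lower-case letter digits as equivalent, much as described for Icon. Accepting both `r` and `R` as the radix marker is an assumption.

```python
import re

# Hypothetical Icon-style form: <base>r<digits>, e.g. "16r453FF".
_RADIX_LITERAL = re.compile(r"([0-9]+)[rR]([0-9A-Za-z]+)")


def parse_icon_radix(text):
    """Parse a radix literal such as '16r453FF' into an int."""
    match = _RADIX_LITERAL.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a radix literal: {text!r}")
    base = int(match.group(1))
    if not 2 <= base <= 36:
        raise ValueError(f"base must be between 2 and 36, got {base}")
    # int() rejects digits that are invalid for the base and ignores letter case.
    return int(match.group(2), base)


print(parse_icon_radix("16r453FF") == 0x453FF)  # True
```

The case-insensitivity comes for free from `int()`; anything beyond base 36 would need a hand-written digit table.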

bonono · Nov 20, 2005

Steve said:
Being European myself I am well aware of the notational differences of
the different locales, and I am perfectly happy that users can enter
numbers in their preferred format when they execute a program.

However, I am not happy about the idea that a program source would need
to be edited before it would work after being moved to another locale.

Huh?

Up to now, all I have been talking about is making the three init
functions (int/float/decimal) smarter at converting strings to their
types. It doesn't affect the code in any way if you don't need or want
to use it. It is more like a helper function for the issue people
are so concerned about: separators in big numbers. It introduces no new
syntax to the language at all. And should you use, say, the imaginary
format "E500.000,23", it still works no matter where your program is
running or what the hosting locale is. I don't understand what changes
you are referring to.

We are facing a similar issue today. A typical case is the MM/DD/YYYY date
format. Or maybe I need to import a text file (CSV, for example) which may
already contain numbers in this format.
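A sketch of the kind of "smarter converter" described above, assuming the caller states the separators explicitly rather than consulting the locale, so the same call behaves identically wherever the program runs. `parse_grouped` and its parameters are hypothetical names; it returns a `decimal.Decimal` so decimal fractions are kept exactly.

```python
from decimal import Decimal


def parse_grouped(text, group_sep=",", decimal_sep="."):
    """Convert a grouped numeric string to Decimal (hypothetical helper).

    The separators come from the caller, not from the current locale.
    """
    if group_sep == decimal_sep:
        raise ValueError("group and decimal separators must differ")
    # Drop the grouping characters first, then normalise the decimal mark.
    cleaned = text.strip().replace(group_sep, "")
    if decimal_sep != ".":
        cleaned = cleaned.replace(decimal_sep, ".")
    return Decimal(cleaned)


print(parse_grouped("1,234,567.89"))                                 # 1234567.89
print(parse_grouped("1.234.567,89", group_sep=".", decimal_sep=","))  # 1234567.89
```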

bonono · Nov 20, 2005

Steven said:
That's a tad unfair. Dealing with numeric literals with lots of digits is
a real (if not earth-shattering) human interface problem: it is hard for
people to parse long numeric strings. In the wider world outside of IT,
people deal with long strings of digits by grouping them. This is *exceedingly*
common: mathematicians do it, economists do it, everybody who handles long
numeric literals does it *except* computer language designers.

However, what percentage of these big-number literals appear in
source code? I believe most of them either appear in some data
file (and thus are nothing but strings) or during data input (again strings).
Why change the language when we just want a smarter string converter?

Steven D'Aprano · Nov 20, 2005

Steven D'Aprano wrote:
[...]

Likewise, base conversion into arbitrary bases is not, in my opinion,
common enough a task that support for it needs to be built into the syntax
for literals. If somebody cares enough about it, write a module to handle
it and try to get it included with the Python standard modules.


In fact Icon managed to offer a syntax that allowed every base up to 36
to be used: an "r" was used to indicate the radix of the literal, so hex
453FF would be represented as "16r453FF". This worked fine. Upper- and
lower-case letters were regarded as equivalent.

Forth goes significantly further than that: you can tell the Forth
interpreter what base you are using, and all numbers are then read and
displayed using that base. Numbers are case sensitive, which means Forth
understands bases up to at least 62. I don't remember whether it allows
non-alphanumeric digits, and therefore higher bases -- I think it does,
but am not sure.

Nevertheless, I don't believe that sort of functionality belongs in the
language itself. It is all well and good to be able to write 32r37gm, but
how often do you really need to write numbers in base 32?
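For comparison, a minimal sketch of the "put it in a module" approach for bases up to 62 with case-sensitive digits, as in the Forth description. `to_base`, `from_base` and the digit ordering (0-9, then A-Z, then a-z) are assumptions for illustration; unlike `int()`, upper- and lower-case letters are distinct digits here.

```python
import string

# Case-sensitive digit alphabet: 0-9, A-Z, then a-z (62 symbols).
# This ordering is an illustrative assumption.
DIGITS = string.digits + string.ascii_uppercase + string.ascii_lowercase


def to_base(n, base):
    """Render an int in any base from 2 to 62."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError(f"base must be between 2 and {len(DIGITS)}")
    if n < 0:
        return "-" + to_base(-n, base)
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, base)
        out.append(DIGITS[rem])
    # Digits were produced least-significant first.
    return "".join(reversed(out))


def from_base(text, base):
    """Parse a string produced by to_base() back into an int."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError(f"base must be between 2 and {len(DIGITS)}")
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if not digits:
        raise ValueError("no digits")
    value = 0
    for ch in digits:
        # str.find() is case-sensitive, so 'a' and 'A' are different digits.
        digit = DIGITS.find(ch)
        if not 0 <= digit < base:
            raise ValueError(f"invalid digit {ch!r} for base {base}")
        value = value * base + digit
    return -value if negative else value


encoded = to_base(123456789, 62)
print(encoded, from_base(encoded, 62))
```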

Peter Hansen · Nov 20, 2005

Steven said:
Dealing with numeric literals with lots of digits is
a real (if not earth-shattering) human interface problem: it is hard for
people to parse long numeric strings.

I'm totally unconvinced that this _is_ a real problem, if we define
"real" as being even enough to jiggle my mouse, let alone shattering the
planet.

What examples does anyone have of where it is necessary to define a
large number of large numeric literals? Isn't it the case that other
than the odd constants in various programs, defining a large number of
such values would be better done by creating a data file and parsing it?

And if that's the case, one could easily define any convention one
desired for formatting the raw data.

And for the odd constant, either take a moment to verify the value, or
define it in parts (e.g. 24*60*60*1000*1000 microseconds per day), or
write a nice little variant on int() that can do exactly what you would
have done for the external data file if you had more values.

-Peter
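The "define it in parts" suggestion, sketched in Python with a cross-check against the standard library. The name `MICROSECONDS_PER_DAY` is illustrative; dividing one `datetime.timedelta` by another with `//` returns an `int`.

```python
from datetime import timedelta

# Build the constant from readable factors instead of a long digit string.
MICROSECONDS_PER_DAY = 24 * 60 * 60 * 1000 * 1000

# Cross-check: timedelta // timedelta yields an int.
assert MICROSECONDS_PER_DAY == timedelta(days=1) // timedelta(microseconds=1)
print(MICROSECONDS_PER_DAY)  # 86400000000
```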

Roy Smith · Nov 20, 2005

Steven D'Aprano said:
That's a tad unfair. Dealing with numeric literals with lots of digits is
a real (if not earth-shattering) human interface problem: it is hard for
people to parse long numeric strings.

There are plenty of ways to make numeric literals easier to read without
resorting to built-in language support. One way is:

sixTrillion = 6 * 1000 * 1000 * 1000 * 1000

Or, a more general solution might be to write a little factory function
which took a string, stripped out the underscores (or spaces, or commas, or
whatever bit of punctuation turned you on), and then converted the
remaining digit string to an integer. You could then write:

creditCardNumber = myInt ("1234 5678 9012 3456 789")

Perhaps not as convenient as having it built into the language, but
workable in those cases which justify the effort.
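One possible shape for the `myInt` factory function, sketched in Python. `my_int` and its `separators` parameter are hypothetical; it deletes the listed grouping characters with `str.translate` and leaves all other validation to `int()`.

```python
def my_int(text, separators=" _,"):
    """Convert a grouped digit string to int (hypothetical myInt helper)."""
    # The third argument to str.maketrans lists characters to delete.
    cleaned = text.translate(str.maketrans("", "", separators))
    # int() still rejects anything that is not a valid integer.
    return int(cleaned)


credit_card_number = my_int("1234 5678 9012 3456 789")
six_trillion = my_int("6,000,000,000,000")
print(credit_card_number, six_trillion)  # 1234567890123456789 6000000000000
```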

Mike Meyer · Nov 20, 2005

Steven D'Aprano said:
+1

I *really* don't like the idea of allowing underscores in numeric
literals. Firstly, for aesthetic reasons: I think 123_456 is seriously
ugly. Secondly, for pragmatic reasons, I think it is too easy to mistype
as 123-456. I know that Python can't protect you from typing 9-1 instead
of 901, but why add special syntax that makes that sort of error MORE
common?

I've seen at least one language (I forget which one) that allowed such
separators, but only for groups of three. So 123_456 would be valid,
but 9_1 would be a syntax error. This kind of thing might help with
the detecting typos issue, and probably won't be noticed by most
users.

<mike
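A sketch of the "groups of three only" rule, written as a run-time check rather than a parser change. `check_grouped_literal` is a hypothetical helper, and the exact rule (either a plain run of digits, or a first group of one to three digits followed by groups of exactly three, each after a single underscore) is an assumption about how such a language might behave.

```python
import re

# Either a plain run of ASCII digits, or a first group of one to three
# digits followed by groups of exactly three, each after one underscore.
# [0-9] is used instead of \d, which would also match non-ASCII digits.
_GROUPED_IN_THREES = re.compile(r"[0-9]+|[0-9]{1,3}(?:_[0-9]{3})+")


def check_grouped_literal(text):
    """Return int(text), rejecting irregular underscore grouping."""
    if _GROUPED_IN_THREES.fullmatch(text) is None:
        raise ValueError(f"underscores must separate groups of three: {text!r}")
    return int(text.replace("_", ""))


print(check_grouped_literal("123_456"))  # 123456
for typo in ("9_1", "1234_567"):
    try:
        check_grouped_literal(typo)
    except ValueError as exc:
        print(exc)
```

Under this rule the mistyped `9_1` is caught, while ungrouped literals such as `901` remain valid.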

Roy Smith · Nov 20, 2005

Mike Meyer said:
I've seen at least one language (I forget which one) that allowed such
separators, but only for groups of three.

That seems a bit silly. Not all numbers are naturally split into groups of
three. Credit card numbers are (typically) split into groups of four.
Account numbers are often split into all sorts of random groupings.

Raymond Hettinger · Nov 20, 2005

Gustav said:
I tried finding a discussion around adding the possibility to have
optional underscores inside numbers in Python. This is a popular option
available in several "competing" scripting languages, that I would love
to see in Python.

Examples:
1_234_567
0xdead_beef
3.141_592

I suppose it could be done. OTOH, one could argue that most production
code has no business hardwiring-in numerical constants greater than 999
;-)

Mike Meyer · Nov 20, 2005

Roy Smith said:
That seems a bit silly. Not all numbers are naturally split into groups of
three. Credit card numbers are (typically) split into groups of four.
Account numbers are often split into all sorts of random groupings.

True. But how often do you want to add two account numbers, or
multiply two credit card numbers? Or display them in hex, or otherwise
treat them as something other than a string that happens to be
composed of digits?

<mike

David M. Cooke · Nov 20, 2005

Peter Hansen said:
I'm totally unconvinced that this _is_ a real problem, if we define
"real" as being even enough to jiggle my mouse, let alone shattering the
planet.

What examples does anyone have of where it is necessary to define a
large number of large numeric literals? Isn't it the case that other
than the odd constants in various programs, defining a large number of
such values would be better done by creating a data file and parsing
it?

One example I can think of is a large number of float constants used
for some math routine. In that case they would usually be a full 16 or 17
digits. It'd be handy in that case to split into smaller groups to
make it easier to match with tables where these constants may come
from. Ex:

def sinxx(x):
    "computes sin x/x for 0 <= x <= pi/2 to 2e-9"
    a2 = -0.16666 66664
    a4 = 0.00833 33315
    a6 = -0.00019 84090
    a8 = 0.00000 27526
    a10 = -0.00000 00239
    x2 = x**2
    return 1. + x2*(a2 + x2*(a4 + x2*(a6 + x2*(a8 + x2*a10))))

(or at least that's what I like to write). Now, if I were going to higher
precision, I'd have more digits of course.
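The spaced literals above are not valid Python. For later readers: Python 3.6 adopted underscores as digit separators (PEP 515), so on a current interpreter the same routine could be written roughly as follows. The coefficients are copied from the post; only the separator and layout differ.

```python
import math


def sinxx(x):
    """Compute sin(x)/x for 0 <= x <= pi/2 to about 2e-9."""
    # Coefficients grouped in fives using PEP 515 underscores (Python 3.6+).
    a2 = -0.16666_66664
    a4 = 0.00833_33315
    a6 = -0.00019_84090
    a8 = 0.00000_27526
    a10 = -0.00000_00239
    x2 = x * x
    # Horner's rule for the even polynomial in x.
    return 1.0 + x2 * (a2 + x2 * (a4 + x2 * (a6 + x2 * (a8 + x2 * a10))))


print(sinxx(0.5), math.sin(0.5) / 0.5)
```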

Dan Bishop · Nov 20, 2005

Roy said:
There are plenty of ways to make numeric literals easier to read without
resorting to built-in language support. One way is:

sixTrillion = 6 * 1000 * 1000 * 1000 * 1000

Or, a more general solution might be to write a little factory function
which took a string, stripped out the underscores (or spaces, or commas, or
whatever bit of punctuation turned you on), and then converted the
remaining digit string to an integer. You could then write:

creditCardNumber = myInt ("1234 5678 9012 3456 789")

Or alternatively, you could write:

creditCardNumber = int('1234''5678''9012''3456''789')

Peter Hansen · Nov 20, 2005

Dan said:
Or alternatively, you could write:

creditCardNumber = int('1234''5678''9012''3456''789')

Or creditCardNumber = int("1234 5678 9012 3456 789".replace(' ',''))

Or make a little function that does the same job and looks cleaner, if
you need this more than once.

But why would anyone want to create numeric literals for credit card
numbers?

-Peter

bonono · Nov 20, 2005

Peter said:
But why would anyone want to create numeric literals for credit card
numbers?

Maybe for saving space? Storage space is so cheap that this is not
a very good reason, but it is still a reason.

Eric Jacoboni · Nov 20, 2005

Mike Meyer said:
I've seen at least one language (forget which one) that allowed such
separators, but only for groups of three. So 123_456 would be valid,
but 9_1 would be a syntax error.

Ada has allowed underscores in numeric literals since 1983, without
enforcing any grouping. The Ruby language also allows this
notation. You may write 1_000_001 or 1000_001 or 10_00_001, etc. (the
same goes for real numbers...).

Once you are in the habit of writing literals like that, all other
big numeric literals, and the workarounds for grouping them, seem cryptic.
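For later readers, a short demonstration that Python 3.6 and newer (PEP 515) behave much as described for Ada and Ruby: single underscores may sit between any digits with no grouping rule, and the `int()`, `float()` and `decimal.Decimal()` constructors accept the same form in strings.

```python
from decimal import Decimal

# Any grouping is accepted, as long as each underscore sits between digits.
assert 1_000_001 == 1000_001 == 10_00_001 == 1000001

# The string constructors follow the same rule.
assert int("10_00_001") == 1000001
assert float("3.141_592") == 3.141592
assert Decimal("1_000.5") == Decimal("1000.5")

# Leading, trailing or doubled underscores are still rejected.
for bad in ("_1000", "1000_", "1__000"):
    try:
        int(bad)
    except ValueError:
        print(f"{bad!r} is rejected")
    else:
        raise AssertionError(f"{bad!r} was unexpectedly accepted")
```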

Roy Smith · Nov 20, 2005

Dan Bishop said:
creditCardNumber = int('1234''5678''9012''3456''789')

Wow, I didn't know you could do that. That's better than my idea.

Steve Holden · Nov 20, 2005

David said:
One example I can think of is a large number of float constants used
for some math routine. In that case they would usually be a full 16 or 17
digits. It'd be handy in that case to split into smaller groups to
make it easier to match with tables where these constants may come
from. Ex:

def sinxx(x):
    "computes sin x/x for 0 <= x <= pi/2 to 2e-9"
    a2 = -0.16666 66664
    a4 = 0.00833 33315
    a6 = -0.00019 84090
    a8 = 0.00000 27526
    a10 = -0.00000 00239
    x2 = x**2
    return 1. + x2*(a2 + x2*(a4 + x2*(a6 + x2*(a8 + x2*a10))))

(or at least that's what I like to write). Now, if I were going to higher
precision, I'd have more digits of course.

Right, this is clearly such a frequent use case it's worth changing the
compiler for.

regards
Steve

D H · Nov 20, 2005

Steve said:
Right, this is clearly such a frequent use case it's worth changing the
compiler for.

Yes it is.
In that one example he used digit grouping 5 more times than I've
used lambda in my life. Remember people use python as a data format as
well (see for example JSON).
It's a simple harmless change to the parser: ignore underscores or
spaces in numeric literals. As others have mentioned, Ruby supports
this already, as do Ada, Perl, ML variants, VHDL, boo, nemerle, and others.
