Rough draft: Proposed format specifier for a thousands separator

Raymond Hettinger

If anyone here is interested, here is a proposal I posted on the
python-ideas list.

The idea is to make number formatting a little easier with the new
format() builtin in Py2.6 and Py3.0:
http://docs.python.org/library/string.html#formatspec


-------------------------------------------------------------


Motivation:

Provide a simple, non-locale aware way to format a number
with a thousands separator.

Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of
output exposed to end users.

In the finance world, output with commas is the norm. Finance
users and non-professional programmers find the locale approach
to be frustrating, arcane and non-obvious.

It is not the goal to replace locale or to accommodate every
possible convention. The goal is to make a common task easier
for many users.


Research so far:

Scanning the web, I've found that thousands separators are
usually one of COMMA, PERIOD, SPACE, or UNDERSCORE. The
COMMA is used when a PERIOD is the decimal separator.

James Knight observed that Indian/Pakistani numbering systems
group by hundreds. Ben Finney noted that Chinese group by
ten-thousands.

Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format specifiers
like: "_($* #,##0_)".



Proposal I (from Nick Coghlan):

A comma will be added to the format() specifier mini-language:

[[fill]align][sign][#][0][minimumwidth][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

The proposal works well with floats, ints, and decimals. It also
allows easy substitution for other separators. For example:

format(n, "6,f").replace(",", "_")

This technique is completely general but it is awkward in the one
case where the commas and periods need to be swapped.

format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")


Proposal II (to meet Antoine Pitrou's request):

Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the
choices to a comma, period, space, or underscore.

[[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]

Examples:

format(1234, "8.1f")    -->     '  1234.0'
format(1234, "8,1f")    -->     '  1234,0'
format(1234, "8T.,1f")  -->     ' 1.234,0'
format(1234, "8T .f")   -->     ' 1 234,0'
format(1234, "8d")      -->     '    1234'
format(1234, "8T,d")    -->     '   1,234'

This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember. Also, it makes it more challenging to write custom
__format__ methods that follow the format specification
mini-language.

For the locale module, just the "T" is necessary in a
formatting string since the tool already has procedures for
figuring out the actual separators from the local context.



Comments and suggestions are welcome but I draw the line at supporting
Mayan numbering conventions ;-)


Raymond
 
Raymond Hettinger

If anyone here is interested, here is a proposal I posted on the
python-ideas list.

The idea is to make number formatting a little easier with
the new format() builtin:
http://docs.python.org/library/string.html#formatspec

Here's a re-post (hopefully without the line wrapping problems
in the previous post).

Raymond

-------------------------------------------------------------



Motivation:
-----------

Provide a simple, non-locale aware way to format a number
with a thousands separator.

Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of output
exposed to end users.

In the finance world, output with commas is the norm. Finance
users and non-professional programmers find the locale
approach to be frustrating, arcane and non-obvious.

It is not the goal to replace locale or to accommodate every
possible convention. The goal is to make a common task easier
for many users.


Research so far:
----------------

Scanning the web, I've found that thousands separators are
usually one of COMMA, PERIOD, SPACE, or UNDERSCORE. The
COMMA is used when a PERIOD is the decimal separator.

James Knight observed that Indian/Pakistani numbering systems
group by hundreds. Ben Finney noted that Chinese group by
ten-thousands.

Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format
specifiers like: "_($* #,##0_)".



Proposal I (from Nick Coghlan):
-------------------------------

A comma will be added to the format() specifier mini-language:

[[fill]align][sign][#][0][minimumwidth][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example:

format(n, "6,f").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped:

format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
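
As a hedged illustration of the replace() technique: the comma option described in Proposal I is available in Python 2.7 and 3.1 onward, so the swap can be tried directly. The helper name swap_separators is hypothetical, and str.translate is shown only as one possible alternative to the three chained replace() calls (it needs Python 3).

```python
# A sketch, assuming a Python 3 version that implements the ',' option.
n = 1234567.891

with_commas = format(n, ",.2f")                   # '1,234,567.89'
with_underscores = with_commas.replace(",", "_")  # '1_234_567.89'


def swap_separators(text):
    """Swap commas and periods in a single pass (hypothetical helper)."""
    return text.translate(str.maketrans(",.", ".,"))


european = swap_separators(with_commas)           # '1.234.567,89'
print(with_commas, with_underscores, european)
```

The single-pass translation avoids the temporary "X" placeholder, which would misbehave if the formatted text could itself contain an "X".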


Proposal II (to meet Antoine Pitrou's request):
-----------------------------------------------

Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the
choices to a comma, period, space, or underscore.

[[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]

Examples:

format(1234, "8.1f")    -->     '  1234.0'
format(1234, "8,1f")    -->     '  1234,0'
format(1234, "8T.,1f")  -->     ' 1.234,0'
format(1234, "8T .f")   -->     ' 1 234,0'
format(1234, "8d")      -->     '    1234'
format(1234, "8T,d")    -->     '   1,234'

This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember. Also, it makes it more challenging to write custom
__format__ methods that follow the format specification
mini-language.

For the locale module, just the "T" is necessary in a
formatting string since the tool already has procedures for
figuring out the actual separators from the local context.
 
Ulrich Eckhardt

Raymond said:
The idea is to make number formatting a little easier with
the new format() builtin:
http://docs.python.org/library/string.html#formatspec
[...]
Scanning the web, I've found that thousands separators are
usually one of COMMA, PERIOD, SPACE, or UNDERSCORE. The
COMMA is used when a PERIOD is the decimal separator.

James Knight observed that Indian/Pakistani numbering systems
group by hundreds. Ben Finney noted that Chinese group by
ten-thousands.

IIRC, some cultures use a non-uniform grouping, e.g. "123 456 78.9".
For that, there is also a grouping setting reserved in the locale (at
least in those of C++ IOStreams). Further, and that also seems to be one
of your concerns, there are different ways to represent negative numbers,
e.g. "(123)" or "-456".

Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the
choices to a comma, period, space, or underscore.

[[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]

Examples:

format(1234, "8.1f")    -->     '  1234.0'
format(1234, "8,1f")    -->     '  1234,0'
format(1234, "8T.,1f")  -->     ' 1.234,0'
format(1234, "8T .f")   -->     ' 1 234,0'
format(1234, "8d")      -->     '    1234'
format(1234, "8T,d")    -->     '   1,234'


How about this?
format(1234, "8.1", tsep=",")
--> ' 1,234.0'
format(1234, "8.1", tsep=".", dsep=",")
--> ' 1.234,0'
format(123456, tsep=" ", grouping=(3, 2,))
--> '1 234 56'

IOW, why not explicitly say what you want using keyword arguments with
defaults instead of inventing an IMHO cryptic, read-only mini-language?
Seriously, the problem I see with this proposal is that its aim to be as
short as possible actually makes the resulting format specifications
unreadable. Could you even guess what "8T.,1f" should mean if you had not
written this?
This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember.

Too expensive for my taste.

Uli
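
The keyword-argument idea in the post above can be approximated today with an ordinary helper function. This is only a sketch under stated assumptions: format_number and group_digits are hypothetical names, not part of any proposal, and the grouping tuple here lists group sizes from right to left with the last size repeating (the locale-style order, which may differ from the ordering intended in the grouping=(3, 2,) example).

```python
def group_digits(digits, sep=",", grouping=(3,)):
    """Insert sep into a string of digits.

    grouping lists group sizes from right to left; the last size repeats.
    """
    sizes = list(grouping)
    groups = []
    while digits:
        # Consume the listed sizes in order, then keep reusing the last one.
        size = sizes.pop(0) if len(sizes) > 1 else sizes[0]
        groups.append(digits[-size:])
        digits = digits[:-size]
    return sep.join(reversed(groups))


def format_number(value, precision=0, tsep=",", dsep=".", grouping=(3,)):
    """Format value with explicit separators, ignoring the locale."""
    text = format(abs(value), ".{}f".format(precision))
    whole, _, frac = text.partition(".")
    result = group_digits(whole, tsep, grouping)
    if frac:
        result += dsep + frac
    return ("-" if value < 0 else "") + result


print(format_number(1234, 1))                          # 1,234.0
print(format_number(1234, 1, tsep=".", dsep=","))      # 1.234,0
print(format_number(1234567, grouping=(3, 2)))         # 12,34,567
```

Width, fill and alignment are left out on purpose; the result is a plain string, so it can be padded afterwards with format(result, ">12") or similar.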
 
Raymond Hettinger

[Ulrich Eckhardt]
IOW, why not explicitly say what you want using keyword arguments with
defaults instead of inventing an IMHO cryptic, read-only mini-language?

That makes sense to me but I don't think that's the way the format()
builtin was implemented (see PEP 3101, which was implemented in Py2.6
and 3.0). It is a simple pass-through to a __format__ method for each
formattable object. I don't see how keywords would fit in that
framework. What is proposed is similar to the locale module's existing
"n" specifier except that this lets you say exactly what you want
instead of deferring to the locale settings.

The mini-language seems to already be the way of things (just as it is
in many other languages including PHP, C, Fortran, and whatnot). I'm
just proposing an addition "T," so you add commas as a thousands
separator.
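
To make the pass-through point concrete, here is a minimal sketch. The Money class is hypothetical and exists only to show that format() hands the spec string to __format__ and accepts nothing else.

```python
class Money:
    """Hypothetical class used only to show how format() dispatches."""

    def __init__(self, amount):
        self.amount = amount

    def __format__(self, spec):
        # format(obj, spec) passes the spec string to this method unchanged;
        # here the real work is delegated to float's own __format__.
        return "$" + format(self.amount, spec)


price = Money(1234.5)
print(format(price, "10.2f"))       # '$   1234.50'
print("{0:.1f}".format(price))      # '$1234.5'

try:
    format(price, tsep=",")         # keyword arguments are not accepted
except TypeError as exc:
    print("TypeError:", exc)
```

Because the spec string is the only channel between format() and the object, any new option has to be expressed inside that string, or else the builtin's signature and every __format__ method would have to change.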


Raymond
 
John Machin

[Ulrich Eckhardt]
IOW, why not explicitly say what you want using keyword arguments with
defaults instead of inventing an IMHO cryptic, read-only mini-language?

[Raymond Hettinger]
That makes sense to me but I don't think that's the way the format()
builtin was implemented (see PEP 3101 which was implemented Py2.6 and
3.0). It is a simple pass-through to a __format__ method for each
formattable object. I don't see how keywords would fit in that
framework. What is proposed is similar to locale module's existing "n"
specifier except that this lets you say exactly what you want instead
of deferring to the locale settings.

The mini-language seems to already be the way of things (just as it is
many other languages including PHP, C, Fortran, and whatnot). I'm just
proposing an addition "T," so you add commas as a thousands separator.

.... and why not C (centum) for hundreds (can't have H(ollerith)) and W
for wan (the Chinese word for 10 thousand)?
 
Hendrik van Rooyen

Ulrich Eckhardt said:
IOW, why not explicitly say what you want using keyword arguments with
defaults instead of inventing an IMHO cryptic, read-only mini-language?
Seriously, the problem I see with this proposal is that its aim to be as
short as possible actually makes the resulting format specifications
unreadable. Could you even guess what "8T.,1f" should mean if you had not
written this?

+1

Look back in history, and see how COBOL did it with the
PICTURE - dead easy and easily understandable.
Compared to that, even the C printf stuff and python's %
are incomprehensible.

- Hendrik
 
MRAB

Raymond Hettinger wrote:
[snip]
Proposal I (from Nick Coghlan):
-------------------------------

A comma will be added to the format() specifier mini-language:

[[fill]align][sign][#][0][minimumwidth][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example:

format(n, "6,f").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped:

format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")


Proposal II (to meet Antoine Pitrou's request):
-----------------------------------------------

Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the
choices to a comma, period, space, or underscore.

[[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]

Examples:

format(1234, "8.1f")    -->     '  1234.0'
format(1234, "8,1f")    -->     '  1234,0'
format(1234, "8T.,1f")  -->     ' 1.234,0'
format(1234, "8T .f")   -->     ' 1 234,0'
format(1234, "8d")      -->     '    1234'
format(1234, "8T,d")    -->     '   1,234'

This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember. Also, it makes it more challenging to write custom
__format__ methods that follow the format specification
mini-language.

For the locale module, just the "T" is necessary in a
formatting string since the tool already has procedures for
figuring out the actual separators from the local context.
[snip]
I'd probably prefer Proposal I with "." representing the decimal point
and "," representing the grouping (thousands) separator, although I'd
add an "L" flag to indicate that it should use the locale to provide the
actual characters to be used and even the number of digits for the
grouping:

[[fill]align][sign][#][0][minimumwidth][,][.precision][L][type]

Examples:

Assuming the locale has:

decimal point: ","
grouping separator: "."
grouping spacing: 3

format(123456, "10.1f") --> '  123456.0'
format(123456, "10.1Lf") --> ' 123.456,0'
format(123456, "10,.1f") --> ' 123,456.0'
format(123456, "10,.1Lf") --> ' 123.456,0'
 
pruebauno

Raymond Hettinger said:
[snip]
Comments and suggestions are welcome but I draw the line at supporting
Mayan numbering conventions ;-)

As far as I am concerned the most simple version plus a way to swap
around commas and period is all that is needed. The rest can be done
using one replace (because the decimal separator is always one of two
options). This should cover everywhere but the far east. 80% of cases
for 20% of implementation complexity.

For example:

[[fill]align][sign][#][0][,|.][minimumwidth][.precision][type]
 
Raymond Hettinger

[pruebauno]
As far as I am concerned the most simple version plus a way to swap
around commas and period is all that is needed.

Thanks for the feedback.

FWIW, posted a cleaned-up version of the proposal at
http://www.python.org/dev/peps/pep-0378/


Raymond
 
Paul Rubin

Raymond Hettinger said:
FWIW, posted a cleaned-up version of the proposal at
http://www.python.org/dev/peps/pep-0378/

It would be nice if the PEP included a comparison between the proposed
scheme and how it is done in other programs and languages. For
example, I think Common Lisp has a feature for formatting thousands.
Spreadsheets like Excel probably have something similar. Those
programs are pretty well evolved and probably address the important
real use cases by now. It might be best to follow an existing example
(with adjustments for Pythonification as necessary) to the extent
possible.
 
Raymond Hettinger

[Paul Rubin]
It would be nice if the PEP included a comparison between the proposed
scheme and how it is done in other programs and languages.

Good idea. I'm hoping that people will post those here.
In my quick research, it looks like many languages offer
nothing more than the usual C-style % formatting and defer
the rest to a locale-aware module.

For example, I think Common Lisp has a feature for formatting thousands.

Do you have more detail?

Spreadsheets like Excel probably have something similar.

I addressed that in the PEP in the section on VB and relatives. Their
approach doesn't graft onto our existing approach. They use format
specifiers like: "_($* #,##0_)".


Raymond
 
Paul Rubin

Raymond Hettinger said:
In my quick research, it looks like many languages offer
nothing more than the usual C-style % formatting and defer
the rest to a locale-aware module.

Hendrik van Rooyen's mention of Cobol's "picture" (aka PIC)
specifications might be added to the list. Cautionary tale: I once
had a similar idea and suggested including a bastardized version of
PIC in an extension language for something I worked on once. Another
programmer then coded a reasonable PIC subset and we shipped it.
Turned out that a number of our users were Cobol experts and once we
had anything like PIC, they expected the weirdest and most obscure
features (of which there were quite a few) of real Cobol PIC to work.
We ended up having to assign someone a fairly lengthy task of figuring
out the Cobol spec and implementing every last damn PIC feature. But
I digress.

Do you have more detail?

http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node200.html

gives as an example:

(format nil "The answer is ~:D." (expt 47 x))
=> "The answer is 229,345,007."
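
For comparison, a rough Python counterpart to the Common Lisp ~:D directive, assuming a Python version that implements the comma option (2.7 and 3.1 onward). The value x = 5 is an assumption inferred from the quoted output, since 47 ** 5 is 229345007.

```python
x = 5  # assumption: chosen so that 47 ** x matches the quoted output
message = "The answer is {0:,d}.".format(47 ** x)
print(message)  # The answer is 229,345,007.
```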
 
Lie Ryan

Hendrik said:
+1

Look back in history, and see how COBOL did it with the
PICTURE - dead easy and easily understandable.
Compared to that, even the C printf stuff and python's %
are incomprehensible.

- Hendrik

Seeing how many people complained that the proposal is unreadable
(although it tries to stay simple by not including too many features), why
not go all the way to unreadability and teach people to always use some
sort of convenience function, never touching the microlanguage except in
very simple cases (or extremely complex cases, in which case you might
actually be better served by writing your own formatting function)?

Hypothetical code using a conv function and the microlanguage could look
like this:

 >>> num = 213210.3242
 >>> fmt = create_format(sep='-', decsep='@')
 >>> print fmt
50|\/|3_v3ry_R34D4|3L3_C0D3
 >>> '{0!{1}}'.format(num, fmt)
'213-210@3242'
 
Raymond Hettinger

[Lie Ryan]
Hypothetical code using a conv function and the microlanguage could look
like this:

 >>> num = 213210.3242
 >>> fmt = create_format(sep='-', decsep='@')
 >>> print fmt
50|\/|3_v3ry_R34D4|3L3_C0D3
 >>> '{0!{1}}'.format(num, fmt)
'213-210@3242'

LOL, it's like APL all over again ;-)

FWIW, the latest version of the proposal is dirt simple:
'1,234.50'


The proposal is roughly:
If you want commas in the output,
put a comma in the format string.
It's not rocket science.

What is rocket science is what you have to do now
to achieve the same effect. If someone finds the
above to be baffling, how the heck are they going
to do the same thing using the locale module?
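
A hedged side-by-side of the two routes, assuming a Python version that implements the comma option (2.7 and 3.1 onward). The helper format_with_locale is a hypothetical name, and the locale name "en_US.UTF-8" is an assumption that may not be installed on a given system, in which case the helper returns None.

```python
import locale


def format_with_locale(value, locale_name="en_US.UTF-8"):
    """Group digits via the locale module; return None if the locale is missing."""
    saved = locale.setlocale(locale.LC_NUMERIC)       # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return locale.format_string("%.2f", value, grouping=True)
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)    # restore process-wide state


# The comma in the format string: no setup, no global state.
print(format(1234.5, ",.2f"))        # '1,234.50'

# The locale route: depends on what the operating system provides.
print(format_with_locale(1234.5))
```

The locale version has to save and restore a process-wide setting, which is part of what makes it awkward for a one-off formatting task.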


Raymond
 
Tim Rowe

2009/3/12 Raymond Hettinger said:
If anyone here is interested, here is a proposal I posted on the
python-ideas list.

The idea is to make number formatting a little easier with the new
format() builtin in Py2.6 and Py3.0:
http://docs.python.org/library/string.html#formatspec

As far as I can see you're proposing an amendment to *encourage*
writing code that is not locale aware, with the amendment itself being
locale specific, which surely has to be a regressive move in the 21st
century. Frankly, I'd sooner see it made /harder/ to write code that
is not locale aware (warnings, like FxCop gives on .net code?) than
/easier/. Perhaps that's because I'm British, not American and I'm
sick of having date fields get the date wrong because the programmer
thinks the USA is the world. It makes me sympathetic to the problems
caused to others by programmers who think the English-speaking world
is the world.

By the way, to others who think that 123,456.7 and 123.456,7 are the
only conventions in common use in the West, no they're not. 123 456.7
is in common use in engineering, at least in Europe, precisely to
reduce (though not eliminate) problems caused by dot and comma
confusion.
 
Paul Rubin

Raymond Hettinger said:
The proposal is roughly:
If you want commas in the output,
put a comma in the format string.
It's not rocket science.

What if you want to change the separator? Europeans usually
use periods instead of commas: one thousand = 1.000.
 
Raymond Hettinger

[andrew cooke]
would it break anything to also allow

Yes, that's allowed too! The separators can be any one of COMMA,
SPACE, DOT, UNDERSCORE, or NON-BREAKING-SPACE.
 
Raymond Hettinger

[Paul Rubin]
What if you want to change the separator?  Europeans usually
use periods instead of commas: one thousand = 1.000.

That is supported also.
 
