Format specification mini-language for list joining

Tobia Conforto

Hello

Lately I have been writing a lot of list join() operations variously including (and included in) string format() operations.

For example:

temps = [24.369, 24.550, 26.807, 27.531, 28.752]

out = 'Temperatures: {0} Celsius'.format(
    ', '.join('{0:.1f}'.format(t) for t in temps)
)

# => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'

This is just a simple example; my actual code has many more join and format operations, split into local variables as needed for clarity.

Then I remembered that Ye Old Common Lisp's format operator had built-in list traversing capabilities[1]:

(format t "Temperatures: ~{~1$~^, ~} Celsius" temps)

That format string (the part in the middle that looks like line noise) is admittedly arcane, but it's parsed like this:

~{ take next argument (temps) and start iterating over its contents
~1$ output a floating point number with 1 digit precision
~^ break the loop if there are no more items available
", " (otherwise) output a comma and space
~} end of the loop body

Now, as much as I appreciate the heritage of Lisp, I won't deny that its format string mini-language is EVIL. As a rule, format string placeholders should not include *imperative statements* such as for, break, continue, and if. We don't need a Turing-complete language in our format strings. Still, this is the grand^n-father of Python's format strings, so it's interesting to look at how it used to approach the list joining issue.

Then I asked myself: can I take the list joining capability and port it over to Python's format(), doing away with the overall ugliness?

Here is what I came up with:

out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)

# => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'

Here ", " is the joiner between the items and <.1f> is the format string for each item.

The way this would work is by defining a specific Format Specification Mini-Language for sequences (such as lists, tuples, and iterables).

A Format Specification Mini-Language (format_spec) is whatever follows the first colon in a curly brace placeholder, and is defined by the argument's class, so that it can vary wildly among different types.[2]
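
To make the "defined by the argument's class" point concrete, here is a small sketch. The datetime, float and str lines use the standard library as it exists today; the Celsius class and its 'C'/'F' codes are purely hypothetical, made up for this illustration.

```python
import datetime

# Each value's own type interprets whatever follows the first colon.
date_text = '{0:%Y-%m-%d}'.format(datetime.date(2013, 5, 1))  # strftime codes
float_text = '{0:.1f}'.format(24.369)                         # precision and type
str_text = '{0:>8}'.format('abc')                             # alignment and width


class Celsius:
    """Hypothetical type with its own tiny format_spec: 'F' or anything else."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # 'F' converts to Fahrenheit; every other spec prints Celsius.
        if spec == 'F':
            return '{0:.1f} F'.format(self.degrees * 9 / 5 + 32)
        return '{0:.1f} C'.format(self.degrees)


custom_text = '{0:F}'.format(Celsius(25))

print(date_text)    # 2013-05-01
print(float_text)   # 24.4
print(custom_text)  # 77.0 F
```

The same placeholder syntax ends up meaning three unrelated things, because str.format() simply hands the text after the colon to the value's __format__ method.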

The root class (object) defines the generic format_spec we are accustomed to[3]:

[[fill]align][sign][#][0][width][,][.precision][type]

But that doesn't mean that more complex types should not define extensions or replacements. I propose this extended format_spec for sequences:

seq_format_spec ::= join_string [":" item_format_spec] | format_spec
join_string ::= '"' join_string_char* '"' | "'" join_string_char* "'"
join_string_char ::= <any character except "{", "}", newline, or the quote>
item_format_spec ::= format_spec

That is, if the format_spec for a sequence starts with ' or " it would be interpreted as a join operation (eg. {0:", "} or {0:', '}) optionally followed by a format_spec for the single items: {0:", ":.1f}

If the format_spec does not start with ' or ", or if the quote is not balanced (does not appear again in the format_spec), then it's assumed to be a generic format string and the implementation would call super(). This is meant for backwards compatibility with existing code that may be using the generic format_spec over various sequences.
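
A minimal sketch of how the rules above might look in code, assuming a list subclass rather than a change to list itself. The name JoinList is hypothetical and not part of any existing module; the parsing is deliberately naive (it just looks for the next matching quote).

```python
class JoinList(list):
    """Hypothetical list subclass sketching the proposed seq_format_spec."""

    def __format__(self, spec):
        # A join spec starts with a quote that appears again later on.
        if spec[:1] in ('"', "'"):
            end = spec.find(spec[0], 1)
            if end != -1:
                joiner = spec[1:end]
                rest = spec[end + 1:]
                # After the closing quote: nothing, or ':' plus an item spec.
                if rest == '' or rest.startswith(':'):
                    item_spec = rest[1:]
                    return joiner.join(format(item, item_spec) for item in self)
        # Anything else is handed to the inherited implementation.
        return super().__format__(spec)


temps = JoinList([24.369, 24.550, 26.807, 27.531, 28.752])
out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)
print(out)  # Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius
```

Note that str.format() itself needs no changes for this: it splits the placeholder at the first colon and passes the remainder, quotes and all, to __format__.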

I do think that would be quite readable and useful. Look again at the example:

out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)

As a bonus, it allows nested joins, albeit only for simple cases. For example we could format a dictionary's items:

temps = {'Rome': 26, 'Paris': 21, 'New York': 18}

out = 'Temperatures: {0:", ":" ":s}'.format(temps.items())

# => 'Temperatures: Rome 26, Paris 21, New York 18'

Here the format_spec for temps.items() is <", ":" ":s>. Then ", " would be used as a joiner between the item tuples and <" ":s> would be passed over as the format_spec for each tuple. This in turn would join the tuple's items using a single space and output each item with its simple string format. This could go on and on as needed, adding a colon and joiner string for each nested join operation.
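
The nesting can be sketched with a plain recursive function, so that tuples and dict views work without subclassing. The name format_joined is hypothetical. One assumption to flag: in today's Python, format(26, 's') raises ValueError because int does not accept the 's' code, so this sketch leaves the innermost item spec empty instead of using :s.

```python
def format_joined(value, spec):
    """Hypothetical helper: apply the proposed join spec, recursing per level."""
    if spec[:1] in ('"', "'"):
        end = spec.find(spec[0], 1)
        # The closing quote must be followed by nothing or by a colon.
        if end != -1 and spec[end + 1:end + 2] in ('', ':'):
            joiner, item_spec = spec[1:end], spec[end + 2:]
            return joiner.join(format_joined(item, item_spec) for item in value)
    # No (balanced) leading quote: ordinary formatting of a single item.
    return format(value, spec)


temps = {'Rome': 26, 'Paris': 21, 'New York': 18}
out = 'Temperatures: ' + format_joined(temps.items(), '", ":" "')
print(out)  # Temperatures: Rome 26, Paris 21, New York 18
```

Each level of quoting peels off one joiner and passes the remainder down, which is exactly the "adding a colon and joiner string" behaviour described above.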

A more complicated mini-language would be needed to output dicts using different format strings for keys and values, but I think that would be veering over to unreadable territory.

What do you think?

I plan to write this as a module and propose it to Python's devs for inclusion in the main tree, but any criticism is welcome before I do that.

-Tobia

[1] http://www.gigamonkeys.com/book/a-few-format-recipes.html
[2] http://docs.python.org/3/library/string.html#formatstrings
[3] http://docs.python.org/3/library/string.html#formatspec
 
Paul Rubin

Tobia Conforto said:
Now, as much as I appreciate the heritage of Lisp, I won't deny than
its format string mini-language is EVIL. ... Still, this is the
grand^n-father of Python's format strings...

Without having yet read the rest of your post carefully, I wonder whether the
particular historical point above is correct. Python's format strings
are pretty much the same as C's format strings, which go back to the
beginnings of C in the 1970's, maybe even to some forerunner of C, like
maybe FOCAL or something like that. It's possible that Common Lisp's
format strings came from some earlier Lisp, but Common Lisp itself was a
1980's thing. Maybe some Lisp historian would know.
 
Steven D'Aprano

Hello

Lately I have been writing a lot of list join() operations variously
including (and included in) string format() operations.

For example:

temps = [24.369, 24.550, 26.807, 27.531, 28.752]

out = 'Temperatures: {0} Celsius'.format(
', '.join('{0:.1f}'.format(t) for t in temps)
)

# => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'

This is just a simple example, my actual code has many more join and
format operations, split into local variables as needed for clarity.

Good plan! But then you suggest:

Here is what I came up with:
out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)
# => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'

Here ", " is the joiner between the items and <.1f> is the format string
for each item.

And there goes all the clarity.

Is saving a few words of Python code so important that you would prefer
to read and write an overly-terse, cryptic mini-language?

If you're worried about code re-use, write a simple helper function:

def format_items(format, items):
    # Build a per-item template such as '{0:.1f}', then join the results.
    template = '{0:%s}' % format
    return ', '.join(template.format(item) for item in items)

out = 'Temperatures: {0} Celsius'.format(format_items('.1f', temps))
 
Kwpolska

[…] Python's format strings are pretty much the same as C's format strings […]

You’re thinking about the old % syntax, 'Hello %s!' % 'world'. The OP
meant the new str.format syntax ('Hello {}!'.format('world')).
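
For anyone following along, a quick side-by-side of the two syntaxes, using only built-in behaviour. The variable names are just for illustration.

```python
temp = 24.369

# Old %-style formatting, modelled on C's printf.
old_style = 'Temperature: %.1f Celsius' % temp

# str.format: the spec after the colon is passed to the value's __format__.
new_style = 'Temperature: {0:.1f} Celsius'.format(temp)

print(old_style)  # Temperature: 24.4 Celsius
print(new_style)  # Temperature: 24.4 Celsius
```

The output is identical here; the difference that matters for this thread is that only the second form lets a class define what the spec means.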
---

IMO, the idea is useless. First off, format() has existed since 2.6, which
was released in 2008. So, it would be hard to use it anyways. Second,
which is more readable:

out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)

or

out = 'Temperatures: {} Celsius'.format(', '.join(temps))

101% of the Python community would opt for the second format. Because
your format is cryptic. The current thing is already
not-quite-easy-to-understand when you use magic (aligning, type
converting etc.), but your proposition is much worse. And I hate to
consult the docs while working on something. As I said, it’s hard to
even get this one changed because str.format is 4 years old.
 
Tobia Conforto

Kwpolska said:
out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)

[...] your format is cryptic.

Thank you for your criticism, I'll think it over. The reason I find it readable (-enough) is because even without knowing what format language is supported by the temps object, you can tell that "it" (the 0th argument in this case) is what's going to be serialized in that place.

Everything after the first colon is fair game anyway, meaning you'll have to look it up in the docs, because it's defined somewhere in the class hierarchy of the object being serialized. The fact that 99% of classes don't define a __format__ method and thus fall back on object's implementation, with its alignment and padding operators, is IMHO irrelevant. It's still something you can't pretend to know out of the box, because it's supposed to be customizable by classes.

Knowing this, if you know that the temps object is a list of floats, then I think it'd be pretty obvious what the ", " and the :.1f should do.

As I said, it’s hard to even get this one changed
because str.format is 4 years old.

Again, I beg to differ. I'm not proposing any change to format (that would be madness). What I'm proposing is the addition of a customized __format__ method to a few types, namely lists and sequences, that currently lack it (as do 99% of classes) and fall back to object's implementation. Which is kind of pointless with lists, as joining is by far the thing most often done to them when formatting.
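
A small check of the fallback described above. One assumption to flag: the TypeError shown here is the behaviour of Python 3.4 and later, where object.__format__ rejects any non-empty spec; older releases behaved differently.

```python
temps = [24.369, 24.550]

# list does not define __format__ itself; it inherits object's version.
list_defines_format = '__format__' in vars(list)

# An empty spec is accepted and gives the same text as str().
plain = format(temps, '')

# A non-empty spec is rejected by the inherited implementation (3.4+).
try:
    format(temps, '.1f')
    rejected = False
except TypeError:
    rejected = True

print(list_defines_format)  # False
print(plain)                # [24.369, 24.55]
print(rejected)             # True
```

So on current interpreters there is little existing behaviour for a list-specific __format__ to collide with, since non-empty specs on lists already fail.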

Tobia
 
