String concatenation vs. string formatting


Andrew Berg

Is it bad practice to use this

logger.error(self.preset_file + ' could not be stored - ' +
             str(sys.exc_info()[1]))

instead of this?

logger.error('{file} could not be stored - {error}'.format(
    file=self.preset_file, error=sys.exc_info()[1]))


Other than the case where a variable isn't a string (format() converts
variables to strings, automatically, right?) and when a variable is used
a bunch of times, concatenation is fine, but somehow, it seems wrong.
Sorry if this seems a bit silly, but I'm a novice when it comes to
design. Plus, there's not really supposed to be "more than one way to do
it" in Python.
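A minimal Python 3 sketch of the difference the question hints at. It uses a hypothetical file name and a stand-in `OSError` in place of the real `self.preset_file` and `sys.exc_info()[1]`. `+` refuses to join a `str` to a non-`str` object, while `str.format()` with no format spec formats each argument itself, which for an exception gives the same text as `str()`.

```python
# Hypothetical stand-ins for self.preset_file and sys.exc_info()[1].
preset_file = "presets.ini"
error = OSError("disk full")

# Concatenation only accepts str operands, so the exception object fails...
try:
    message = preset_file + " could not be stored - " + error
    concat_needs_str = False
except TypeError:
    concat_needs_str = True
    # ...unless it is converted explicitly.
    message = preset_file + " could not be stored - " + str(error)

# str.format() formats each argument itself; with no format spec this
# produces the same text as str(error).
formatted = "{file} could not be stored - {error}".format(
    file=preset_file, error=error)

print(message)
print(formatted)
```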
 

John Gordon


Concatenation feels ugly/clunky to me.

I prefer this usage:

logger.error('%s could not be stored - %s' %
             (self.preset_file, sys.exc_info()[1]))
 

Billy Mays

If it means anything, I think concatenation is faster.

__TIMES__
a() - 0.09s
b() - 0.09s
c() - 54.80s
d() - 5.50s

Code is below:

# Python 2 code, as posted (xrange, print statement).
# a() and b() grow the string with +=; c() and d() rebuild it with %.
def a(v):
    out = ""
    for i in xrange(1000000):
        out += v
    return len(out)

def b(v):
    out = ""
    for i in xrange(100000):
        out += v+v+v+v+v+v+v+v+v+v
    return len(out)

def c(v):
    out = ""
    for i in xrange(1000000):
        out = "%s%s" % (out, v)
    return len(out)

def d(v):
    out = ""
    for i in xrange(100000):
        out = "%s%s%s%s%s%s%s%s%s%s%s" % (out,v,v,v,v,v,v,v,v,v,v)
    return len(out)

print "a", a('xxxxxxxxxx')
print "b", b('xxxxxxxxxx')
print "c", c('xxxxxxxxxx')
print "d", d('xxxxxxxxxx')

import profile

# Profile each function once.
profile.run("a('xxxxxxxxxx')")
profile.run("b('xxxxxxxxxx')")
profile.run("c('xxxxxxxxxx')")
profile.run("d('xxxxxxxxxx')")
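For readers on Python 3, here is a rough port of a() and c() using `timeit`, plus a join-based variant that is not part of the original post. The function names are hypothetical, and the iteration count is far smaller than 1,000,000 because rebuilding the string with % on every pass is quadratic. No timings are claimed; they will vary by interpreter, version and machine.

```python
import timeit

def concat_append(v, n):
    """Like a(): grow a string with += (CPython may resize it in place)."""
    out = ""
    for _ in range(n):
        out += v
    return len(out)

def percent_append(v, n):
    """Like c(): build a brand-new string with % on every iteration."""
    out = ""
    for _ in range(n):
        out = "%s%s" % (out, v)
    return len(out)

def join_append(v, n):
    """Not in the original post: collect pieces, join once at the end."""
    parts = []
    for _ in range(n):
        parts.append(v)
    return len("".join(parts))

if __name__ == "__main__":
    v = "x" * 10
    n = 10000  # much smaller than 1,000,000: percent_append is quadratic
    for func in (concat_append, percent_append, join_append):
        seconds = timeit.timeit(lambda: func(v, n), number=3)
        print("%-15s %.4fs" % (func.__name__, seconds))
```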
 

Andrew Berg

With the caveat that the formatting of that line should be using PEP 8
indentation for clarity:
PEP 8 isn't bad, but I don't agree with everything in it. Certain lines
look good in chunks, some don't, at least to me. It's quite likely I'm
going to be writing 98%, if not more, of this project's code, so what
looks good to me matters more than a standard (as long as the code
works). Obviously, if I need to work in a team, then things change.

I prefaced that sentence with "Other than the case", as in "except for
the following case(s)".
There is often more than one way to do it. The Zen of Python is explicit
that there should be one obvious way to do it (and preferably only one).
I meant in contrast to the idea of intentionally having multiple ways to
do something, all with roughly equal merit.



Also, string formatting (especially using the new syntax like you are)
is much clearer because there's less noise (the quotes all over the
place and the plusses)
I don't find it that much clearer unless there are a lot of chunks.
and it's better for dealing with internationalization if you need to
do that.
I hadn't thought of that. That's probably the best reason to use string
formatting.


Thanks, everyone.
 

Thorsten Kampe

* John Gordon (Fri, 8 Jul 2011 20:23:52 +0000 (UTC))
I prefer this usage:

logger.error('%s could not be stored - %s' %
             (self.preset_file, sys.exc_info()[1]))

The syntax for formatting logging messages according to the
documentation is:

Logger.error(msg, *args)

NOT

Logger.error(msg % args)

Thorsten
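A minimal Python 3 sketch of the documented calling convention. The logger name, file name and error are hypothetical, and output goes to an in-memory stream only so it can be inspected. Because the arguments are passed separately, the `logging` module does the `msg % args` interpolation itself, and skips it entirely for records below the logger's level.

```python
import io
import logging

# Hypothetical logger name and values, for illustration only.
stream = io.StringIO()
logger = logging.getLogger("preset_demo")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.ERROR)

preset_file = "presets.ini"
error = OSError("disk full")

# Pass the arguments separately: logging applies msg % args itself,
# and only when the record is actually emitted.
logger.error("%s could not be stored - %s", preset_file, error)

# DEBUG is below the logger's level, so this message is never formatted.
logger.debug("expensive detail: %s", preset_file)

print(stream.getvalue(), end="")
```

Inside an `except` block, `logger.exception(msg, *args)` logs at ERROR level and also appends the current traceback, which may make `sys.exc_info()[1]` unnecessary.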
 

Steven D'Aprano

Andrew said:
Is it bad practice to use this

logger.error(self.preset_file + ' could not be stored - ' +
             str(sys.exc_info()[1]))

instead of this?

logger.error('{file} could not be stored - {error}'.format(
    file=self.preset_file, error=sys.exc_info()[1]))


Other than the case where a variable isn't a string (format() converts
variables to strings, automatically, right?)

Not exactly, but more or less. format() has type codes, just like % string
interpolation:

Traceback (most recent call last):
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: %d format: a number is required, not NoneType

If you don't give a type code, format converts any object to string (if
possible).
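A small Python 3 sketch of the point about type codes. The helper name `attempt` is hypothetical. With no type code, both `%s` and `{}` fall back to the object's string form; with `d`, the object must actually be a number. The exact exception raised by `'{:d}'.format(None)` has differed between Python versions, so the sketch records only the exception's class name.

```python
def attempt(build):
    """Call build(); return its result, or the exception class name."""
    try:
        return build()
    except (TypeError, ValueError) as exc:
        # The exact exception type and message vary between Python versions.
        return type(exc).__name__

results = {
    "'%s' % None": attempt(lambda: "%s" % None),
    "'{}'.format(None)": attempt(lambda: "{}".format(None)),
    "'%d' % 42": attempt(lambda: "%d" % 42),
    "'{:d}'.format(42)": attempt(lambda: "{:d}".format(42)),
    "'%d' % None": attempt(lambda: "%d" % None),
    "'{:d}'.format(None)": attempt(lambda: "{:d}".format(None)),
}

for expr, outcome in results.items():
    print(expr, "->", outcome)
```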

and when a variable is used
a bunch of times, concatenation is fine, but somehow, it seems wrong.

I don't like long chains of string concatenation, but short chains seem okay
to me. One or two plus signs seems fine to my eyes, three at the most. Any
more than that, I'd look at replacing it with % interpolation, the
str.join() idiom, the string.Template class, or str.format.

That's five ways of building strings.
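The five ways side by side, as a Python 3 sketch with hypothetical values. Each line builds the same string; note that concatenation and `str.join()` need an explicit `str()` for the integer, while `%`, `string.Template` and `str.format` convert it themselves.

```python
import string

name, count = "presets.ini", 3  # hypothetical values

ways = [
    name + " has " + str(count) + " entries",           # concatenation
    "%s has %d entries" % (name, count),                 # % interpolation
    " ".join([name, "has", str(count), "entries"]),      # str.join()
    string.Template("$name has $count entries").substitute(
        name=name, count=count),                         # string.Template
    "{} has {} entries".format(name, count),             # str.format
]

assert len(set(ways)) == 1  # all five produce the same text
print(ways[0])
```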

Of course, *repeated* string concatenation risks being slow -- not just a
little slow, but potentially MASSIVELY slow, hundreds or thousands of times
slower than alternatives. Fortunately, recent versions of CPython tend to
avoid this (which makes it all the more mysterious when the slow-down does
strike), but other Pythons like Jython and IronPython may not. So it's best
to limit string concatenation to one or two strings.
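The usual way around repeated concatenation, sketched in Python 3 with a hypothetical `render_settings` function: collect the pieces in a list and join once at the end. This does not rely on any interpreter-specific in-place resizing; CPython's `str.join` sizes the result from the pieces and copies each piece into it once.

```python
def render_settings(pairs):
    """Build one string from many small pieces without repeated +=."""
    parts = []
    for key, value in pairs:
        parts.append("%s=%s\n" % (key, value))
    # A single join at the end instead of growing a string in the loop.
    return "".join(parts)

print(render_settings([("fps", 24), ("trim", "0,100")]), end="")
```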

And finally, if you're concatenating string literals, you can use implicit
concatenation (*six* ways):

>>> ("hello "
... "world"
... "!")
'hello world!'

Sorry if this seems a bit silly, but I'm a novice when it comes to
design. Plus, there's not really supposed to be "more than one way to do
it" in Python.

On the contrary -- there are many different examples of "more than one way
to do it". The claim that Python has "only one way" to do things comes from
the Perl community, and is wrong.

It is true that Python doesn't deliberately add multiple ways of doing
things just for the sake of being different, or because they're cool,
although of course that's a subjective judgement. (Some people think that
functional programming idioms such as map and filter fall into that
category, wrongly in my opinion.) In any case, it's clear that Python
supports many ways of doing "the same thing", not all of which are exactly
equivalent:

import copy

# e.g. copy a list
blist = list(alist)
blist = alist[:]
blist[:] = alist  # assumes blist already exists
blist = copy.copy(alist)
blist = copy.deepcopy(alist)
blist = []; blist.extend(alist)
blist = [x for x in alist]  # don't do this


Hardly "only one way" :)
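To make "not all of which are exactly equivalent" concrete, here is a Python 3 sketch with a hypothetical nested list. Slicing (like `list()` and `copy.copy()`) makes a shallow copy that shares inner objects, `copy.deepcopy()` copies those too, and slice assignment changes an existing list in place, so other references to it see the new contents.

```python
import copy

alist = [1, [2, 3]]            # hypothetical list containing a nested list

shallow = alist[:]             # new outer list, same inner list
deep = copy.deepcopy(alist)    # new outer list and new inner list

blist = ["old"]
same_object = blist            # a second reference to blist
blist[:] = alist               # replaces blist's contents in place

alist[1].append(4)             # mutate the shared nested list

print(shallow)                 # shares the inner list, so it sees the 4
print(deep)                    # fully independent
print(same_object is blist)    # slice assignment kept blist's identity
```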
 

Steven D'Aprano

Billy said:
If it means anything, I think concatenation is faster.

You are measuring the speed of an implementation-specific optimization.
You'll likely get *very* different results with Jython or IronPython, or
old versions of CPython, or even if you use instance attributes instead of
local variables.

It also doesn't generalise: only appends are optimized, not prepends.

Worse, the optimization can be defeated by accidents of your operating
system's memory management, so code that runs snappily and fast on one
machine will run SLLLOOOOOOWWWWWWWWWWWWWWWLY on another.

This is not a hypothetical risk. People have been burned by this in real
life:

http://www.gossamer-threads.com/lists/python/dev/771948

If you're interested in learning about the optimization:

http://utcc.utoronto.ca/~cks/space/blog/python/ExaminingStringConcatOpt
 

Chris Angelico

It also doesn't generalise: only appends are optimized, not prepends.

If you're interested in learning about the optimization:

http://utcc.utoronto.ca/~cks/space/blog/python/ExaminingStringConcatOpt
From that page:
"Also, this is only for plain (byte) strings, not for Unicode strings;
as of Python 2.4.2, Unicode string concatenation remains
un-optimized."

Has the same optimization been implemented for Unicode? The page
doesn't mention Python 3 at all, and I would guess that the realloc
optimization would work fine for both types of string.

ChrisA
 

Ian Kelly

Has the same optimization been implemented for Unicode? The page
doesn't mention Python 3 at all, and I would guess that the realloc
optimization would work fine for both types of string.

Seems to be implemented for strs in 3.2, but not unicode in 2.7.
 

Andrew Berg

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

How should I go about switching from concatenation to string formatting
for this?

avs.write(demux_filter + field_filter + fpsin_filter + i2pfilter +
          dn_filter + fpsout_filter + trim_filter + info_filter)

I can think of a few ways, but none of them are pretty.

- --
CPython 3.2 | Windows NT 6.1.7601.17592 | Thunderbird 5.0
PGP/GPG Public Key ID: 0xF88E034060A78FCB
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAwAGBQJOGUp7AAoJEPiOA0Bgp4/L3koIAMntYStREGjww6yKGKE/xI0W
ecAg2BHdqBxTFsPT6NMrSRyrNbdfnWRQcRi/0Z+Hhbwqp4qsz5hDFgsoVPkT5gyj
6q0TeJqaSE+Uoj5g2BofqVlWydyQ7fW34KaANbj7V71/UqXXgb+fl8TYvVRJbg0A
KlfytOO0HBrDW8f6dzGZuxLxCb3EONt7buIUV3Pa7b9jQZNTTiOKktLtWAteMMiC
CHivQhqzB8/cNVddpyk5LaMEDzJ9yz8a83fjuK8F5E/wrYk22t6Fad6PKgDEivaj
hAiE5HMeUw+gQ7xFhJGkK31/KyHRqAaFR4mUh16u9GHMTaGPobk8NEj81LwCbvg=
=g3kL
-----END PGP SIGNATURE-----
 

Steven D'Aprano

Andrew said:
How should I go about switching from concatenation to string formatting
for this?

avs.write(demux_filter + field_filter + fpsin_filter + i2pfilter +
          dn_filter + fpsout_filter + trim_filter + info_filter)

I can think of a few ways, but none of them are pretty.

fields = (demux_filter, field_filter, fpsin_filter, i2pfilter,
          dn_filter, fpsout_filter, trim_filter, info_filter)
avs.write("%s"*len(fields) % fields)

works for me.
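A Python 3 sketch of why this works, using hypothetical filter strings in place of the real `demux_filter` and friends. `*` and `%` share a precedence level and group left to right, so the expression builds `"%s%s%s"` first and then interpolates. It relies on `fields` being a tuple: `%` treats a list as a single argument and raises `TypeError`.

```python
# Hypothetical filter strings standing in for demux_filter, field_filter, ...
fields = ("demux()\n", "", "fps(24)\n")

via_percent = "%s" * len(fields) % fields  # evaluates as ("%s%s%s") % fields
via_join = "".join(fields)

# With a list instead of a tuple, % sees one argument, not three.
try:
    "%s" * len(fields) % list(fields)
    list_accepted = True
except TypeError:
    list_accepted = False

print(repr(via_percent))
print(list_accepted)
```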
 

Roy Smith

Andrew Berg said:
How should I go about switching from concatenation to string formatting
for this?

avs.write(demux_filter + field_filter + fpsin_filter + i2pfilter +
          dn_filter + fpsout_filter + trim_filter + info_filter)

I can think of a few ways, but none of them are pretty.

The canonical way to do that would be something like

fields = [demux_filter,
          field_filter,
          fpsin_filter,
          i2pfilter,
          dn_filter,
          fpsout_filter,
          trim_filter,
          info_filter]
avs.write(''.join(fields))
 

Steven D'Aprano

Roy said:
The canonical way to do that would be something like

fields = [demux_filter,
          field_filter,
          fpsin_filter,
          i2pfilter,
          dn_filter,
          fpsout_filter,
          trim_filter,
          info_filter]
avs.write(''.join(fields))

I can't believe I didn't think of that. I must be getting sick. (The sore
throat, stuffy nose and puffy eyes may also be a sign.)

Yes, ''.join() is far to be preferred over my solution using "%s".
 

Andrew Berg


The canonical way to do that would be something like

fields = [demux_filter, field_filter, fpsin_filter, i2pfilter,
          dn_filter, fpsout_filter, trim_filter, info_filter]
avs.write(''.join(fields))
That would look really awful (IMO) if the strings weren't intended to be
on separate lines (I use embedded newlines instead of joining them with
newlines in order to prevent blank lines whenever a certain filter isn't
used). In this particular case, they are, so even though it uses a lot
of whitespace, it does match the layout of its output.
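A Python 3 sketch of the blank-line problem described here, with hypothetical filter strings in which an unused filter is an empty string. Joining bare filters with "\n" leaves a blank line where the unused one was. Embedding the newline in each used filter, as described above, avoids that. A different technique, not from the thread, is to drop the empty entries before joining with "\n".

```python
# Hypothetical filter strings; an unused filter is just an empty string.
demux_filter = "demux()"
field_filter = ""              # not used this time
trim_filter = "trim(0, 100)"
fields = [demux_filter, field_filter, trim_filter]

# Joining bare filters with "\n" leaves a blank line for the unused one.
with_blank = "\n".join(fields)

# Embedded newlines: each used filter carries its own trailing "\n".
embedded_fields = ["demux()\n", "", "trim(0, 100)\n"]
embedded = "".join(embedded_fields)

# Alternative: keep filters bare, skip the empty ones, then join.
skipped = "\n".join(f for f in fields if f)

for text in (with_blank, embedded, skipped):
    print(repr(text))
```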

 
