Why doesn't join() call str() on its arguments?

Leo Breebaart

I've tried Googling for this, but practically all discussions on
str.join() focus on the yuck-ugly-shouldn't-it-be-a-list-method?
issue, which is not my problem/question at all.

What I can't find an explanation for is why str.join() doesn't
automatically call str() on its arguments, so that e.g.
str.join([1,2,4,5]) would yield "1245", and ditto for e.g.
user-defined classes that have a __str__() defined.

All I've been able to find is a 1999 python-dev post by Tim
Peters which would seem to indicate he doesn't understand it
either:

"string.join(seq) doesn't currently convert seq elements to
string type, and in my vision it would. At least three of us
admit to mapping str across seq anyway before calling
string.join, and I think it would be a nice convenience
[...]"

But now it's 2005, and both string.join() and str.join() still
explicitly expect a sequence of strings rather than a sequence of
stringifiable objects.

I'm not complaining as such -- sep.join(str(i) for i in seq) is
not *that* ugly, but what annoys me is that I don't understand
*why* this was never changed. Presumably there is some
counter-argument involved, some reason why people preferred the
existing semantics after all. But for the life of me I can't
think what that counter-argument might be...
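
Here is a minimal sketch of the explicit-conversion idioms this thread keeps returning to. It assumes Python 3 (the thread itself is about Python 2.4), and the list `values` is made up for illustration. All three spellings build the same string, and leaving the conversion out raises TypeError.

```python
values = [1, 2, 4, 5]  # hypothetical sample data

# Three explicit spellings of "stringify, then join".
with_genexp = "".join(str(i) for i in values)
with_listcomp = "".join([str(i) for i in values])
with_map = "".join(map(str, values))

print(with_genexp, with_listcomp, with_map)  # 1245 1245 1245

# Without the conversion, str.join refuses the non-string items.
try:
    "".join(values)
except TypeError as exc:
    print("TypeError:", exc)
```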
 
Jeff Shannon

Leo said:
What I can't find an explanation for is why str.join() doesn't
automatically call str() on its arguments [...]
[...] Presumably there is some
counter-argument involved, some reason why people preferred the
existing semantics after all. But for the life of me I can't
think what that counter-argument might be...

One possibility I can think of would be Unicode. I don't think that
implicitly calling str() on Unicode strings is desirable. (But then
again, I know embarrassingly little about unicode, so this may or may
not be a valid concern.)

Of course, one could ensure that unicode.join() used unicode() and
str.join() used str(), but I can conceive of the possibility of
wanting to use a plain-string separator to join a list that might
include unicode strings. Whether this is a realistic use-case is, of
course, a completely different question...

Jeff Shannon
Technician/Programmer
Credit International
 
Fredrik Lundh

Jeff said:
One possibility I can think of would be Unicode. I don't think that implicitly calling str() on
Unicode strings is desirable.

it's not. but you could make an exception for basestring types.
Of course, one could ensure that unicode.join() used unicode() and str.join() used str(), but I
can conceive of the possibility of wanting to use a plain-string separator to join a list that
might include unicode strings.
yes.

Whether this is a realistic use-case

it is. mixing 8-bit ascii strings with unicode works perfectly fine, and is a good
way to keep memory use down in programs that use ascii in most cases (or
for most strings), but still needs to support non-ascii text.

I've proposed adding a "join" built-in that knows about the available string types,
and does the right thing for non-string objects. unfortunately, the current crop of
py-dev:ers don't seem to use strings much, so they prioritized really important stuff
like sum() and reversed() instead...

</F>
 
Aahz

I've proposed adding a "join" built-in that knows about the available
string types, and does the right thing for non-string objects.
unfortunately, the current crop of py-dev:ers don't seem to use strings
much, so they prioritized really important stuff like sum() and
reversed() instead...

You know where the patch tracker is.... ;-)
--
Aahz <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
 
Steven Bethard

Leo said:
I'm not complaining as such -- sep.join(str(i) for i in seq) is
not *that* ugly, but what annoys me is that I don't understand
*why* this was never changed.

py> chars = [u'ä', u'å']
py> ', '.join(chars)
u'\xe4, \xe5'
py> ', '.join(str(c) for c in chars)
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "<interactive input>", line 1, in <generator expression>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
py> u', '.join(chars)
u'\xe4, \xe5'
py> u', '.join(unicode(c) for c in chars)
u'\xe4, \xe5'

Currently, str.join will return a unicode object if any of the items to
be joined are unicode. That means that str.join accepts unicode objects
as well as str objects. So you couldn't just call str on all the objects...

Maybe you could call str or unicode on each object as appropriate
though... If str.join already determines that it must return a unicode
object, it could call unicode on all the items instead of str... I
don't know the code well enough though to know if this is feasible...
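
One way to picture this suggestion is the sketch below. It is a hypothetical helper, `typed_join`, not anything in the standard library. Because Python 3 no longer has separate str and unicode types, it uses bytes as a stand-in for the old 8-bit str and str as a stand-in for unicode. It decides the result type first and then converts every item with the matching converter; treating bytes as ASCII is an assumption that mirrors Python 2's default codec.

```python
def typed_join(sep, items):
    """Hypothetical helper: pick the result type, then convert each item.

    bytes stands in for Python 2's 8-bit str, and str stands in for
    Python 2's unicode.
    """
    items = list(items)
    # If the separator or any item is text, the result must be text.
    text_result = isinstance(sep, str) or any(isinstance(x, str) for x in items)

    if text_result:
        def convert(x):
            # Assumption: bytes hold ASCII, like Python 2's default codec.
            return x.decode("ascii") if isinstance(x, bytes) else str(x)
    else:
        def convert(x):
            # Non-bytes objects are stringified, then encoded as ASCII.
            return x if isinstance(x, bytes) else str(x).encode("ascii")

    return convert(sep).join(convert(x) for x in items)


print(typed_join(b", ", [b"a", "\xe4"]))  # a, ä
print(typed_join(b"-", [b"a", 1]))        # b'a-1'
print(typed_join(", ", [1, 2.5]))         # 1, 2.5
```

With a bytes separator and a text item, the result comes back as text, which is roughly the mixed-separator case Jeff raised earlier.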

STeVe
 
John Machin

What I can't find an explanation for is why str.join() doesn't
automatically call str() on its arguments, so that e.g.
str.join([1,2,4,5]) would yield "1245", and ditto for e.g.
user-defined classes that have a __str__() defined.

For a start, I think you meant ''.join([1,2,4,5]) to yield "1245".

Secondly, concatenating arbitrary types with a null separator doesn't
appear to be a good use case.
E.g.
''.join([str(x) for x in [1,2,1./3,4]])
'120.3333333333334'

Some possible explanations:

1. Explicit is better than implicit.
2. It would only be a good "trick" IMHO with a non-null separator and
types with a 'clean' str() result (unlike float) -- like int; I can't
think of more at the moment.
3. It would be one step on the slippery downwards path to perlishness.
4. For consistency, would you like "1" + 2 to produce "12"?
 
Skip Montanaro

John> 4. For consistency, would you like "1" + 2 to produce "12"?

No, the correct answer is obviously 3. ;-)

S
 
Leo Breebaart

What I can't find an explanation for is why str.join() doesn't
automatically call str() on its arguments, so that e.g.
str.join([1,2,4,5]) would yield "1245", and ditto for e.g.
user-defined classes that have a __str__() defined.

For a start, I think you meant ''.join([1,2,4,5]) to yield
"1245".

Yep. Sorry. Bad example.

Secondly, concatenating arbitrary types with a null separator doesn't
appear to be a good use case.
E.g.
''.join([str(x) for x in [1,2,1./3,4]])
'120.3333333333334'

Okay:
', '.join(str(x) for x in [1,2,1./3,4])
'1, 2, 0.333333333333, 4'

Isn't that better?

I am not claiming that *all* uses of calling join() on arbitrary
types are useful. But then neither are *all* uses of calling
join() on actual strings, or all uses of adding arbitrarily typed
elements to a list, or...

It's just that in my experience so far, whenever I have felt a
need for the 'join()' function, it has *always* been in a
situation where I also have to do the str(x) thing. That suggests
to me an "obvious default" of the kind that exists elsewhere in
Python as well.

It is entirely possible that my experience is not shared (or not
to the same extent) by others, but part of my reason for asking
this question here is investigating precisely that.

Some possible explanations:

1. Explicit is better than implicit.

Sure. But it's always a tradeoff -- list comprehensions are more
implicit than for loops...

2. It would only be a good "trick" IMHO with a non-null
separator and types with a 'clean' str() result (unlike float)
-- like int; I can't think of more at the moment.

I don't agree that the str() result for a float isn't clean.
Sometimes it can be exactly what you need, or even just
sufficient. If you want more control, then pure join() isn't what
you need, anyway. Right?
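
To illustrate the float point under Python 3 (an assumption; the outputs quoted in this thread come from Python 2.4, where str(1./3) was '0.333333333333'): plain str() now gives the shortest repr that round-trips. format() with a format spec is one way to take control when str() is not what you want.

```python
values = [1, 2, 1 / 3, 4]  # the example list from this thread

# Plain str() conversion; in Python 3 str(1/3) is the shortest repr.
plain = ", ".join(str(x) for x in values)

# Explicit formatting: three significant digits per item.
three_sig = ", ".join(format(x, ".3g") for x in values)

print(plain)      # 1, 2, 0.3333333333333333, 4
print(three_sig)  # 1, 2, 0.333, 4
```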

3. It would be one step on the slippery downwards path to
perlishness.

I think you're exaggerating, and I really would prefer to have
this discussion without gratuitous swipes against other
languages, please?

4. For consistency, would you like "1" + 2 to produce "12"?

No. A foolish consistency etc. etc. I sincerely do like the fact
that Python does not try to second-guess the programmer, I do
value explicit over implicit, and I have no desire to open the
can of worms of changing the semantics of +.


I think my main dissatisfaction with your four possible
explanations stems from the fact that a join() function that
would be a bit more type-agnostic strikes me as *more* Pythonic,
not *less*. Isn't that what duck typing is about? Why does join()
care that its arguments should be actual strings only? If the
function of join() is to produce a string, why isn't it
sufficient for its arguments to have a string representation --
why do they have to *be* strings? Isn't that sort of thing
exactly how we are taught *not* to write our own argument
handling when we learn Python?
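
Here is a small sketch of the distinction being asked about, assuming CPython 3. str.join checks that each item actually is a str instance (subclasses included) rather than checking for a __str__ method. `Label` and `Tag` are hypothetical classes made up for the example.

```python
class Label:
    """Hypothetical class that only knows how to *become* a string."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return "<" + self.name + ">"


class Tag(str):
    """Hypothetical str subclass: it *is* a string."""


# A str subclass passes the type check and is joined as-is.
tags = "-".join([Tag("a"), Tag("b")])
print(tags)  # a-b

# An object that merely has a __str__ method is rejected...
try:
    "-".join([Label("a"), Label("b")])
except TypeError as exc:
    print("rejected:", exc)

# ...until the conversion is spelled out.
labels = "-".join(map(str, [Label("a"), Label("b")]))
print(labels)  # <a>-<b>
```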
 
Thomas Heller

Skip Montanaro said:
John> 4. For consistency, would you like "1" + 2 to produce "12"?

No, the correct answer is obviously 3. ;-)

S

No, '"1"2' is correct. Or '"1"+2'.
 
John Roth

Leo Breebaart said:
I've tried Googling for this, but practically all discussions on
str.join() focus on the yuck-ugly-shouldn't-it-be-a-list-method?
issue, which is not my problem/question at all.

What I can't find an explanation for is why str.join() doesn't
automatically call str() on its arguments, so that e.g.
str.join([1,2,4,5]) would yield "1245", and ditto for e.g.
user-defined classes that have a __str__() defined.

All I've been able to find is a 1999 python-dev post by Tim
Peters which would seem to indicate he doesn't understand it
either:

"string.join(seq) doesn't currently convert seq elements to
string type, and in my vision it would. At least three of us
admit to mapping str across seq anyway before calling
string.join, and I think it would be a nice convenience
[...]"

But now it's 2005, and both string.join() and str.join() still
explicitly expect a sequence of strings rather than a sequence of
stringifiable objects.

I'm not complaining as such -- sep.join(str(i) for i in seq) is
not *that* ugly, but what annoys me is that I don't understand
*why* this was never changed. Presumably there is some
counter-argument involved, some reason why people preferred the
existing semantics after all. But for the life of me I can't
think what that counter-argument might be...

I was originally going to say performance, but I don't think
that's all that much of an issue.

For me, at least, I've already got a way of taking just about
anything and turning it into a string: the % operator. It's
powerful enough that there have been attempts to make
a less powerful and simpler to understand version.

The limitation here is that you have to know how many
elements you want to join, although even that isn't the
world's hardest issue. Consider:

(untested)
result = ("%s" * len(items)) % tuple(items)

Not the most obvious code in the world, but it might
work.

And as someone else (you?) pointed out, this will
also work:

(untested)
result = "".join([str(x) for x in items])

and it's got the advantage that it will handle separators
properly.
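
Here is a sketch of the %-based approach under Python 3, with a made-up list `items`. The right-hand operand has to be a tuple, since a list is treated as a single argument. If the format string itself is built with join, the same trick can also put a separator between items.

```python
items = [1, 2.5, "x"]  # hypothetical mixed list

# One "%s" per item, applied to a tuple of the items.
no_sep = ("%s" * len(items)) % tuple(items)

# Same idea with a separator: build the format string with join first.
fmt = ", ".join(["%s"] * len(items))
with_sep = fmt % tuple(items)

print(no_sep)    # 12.5x
print(with_sep)  # 1, 2.5, x
```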

John Roth
 
Nick Vargish

Leo Breebaart said:
That suggests
to me an "obvious default" of the kind that exists elsewhere in
Python as well.

I feel pretty much the opposite... If a non-string-type has managed to
get into my list-of-strings, then something has gone wrong and I would
like to know about this potential problem.

If you want to force a conversion before the join, you can use a
list comp:

', '.join([str(x) for x in l])


Nick "Explicit is better than Implicit"
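
To illustrate this point, here is a sketch with a hypothetical `build_header` function, assuming CPython 3. Because str.join refuses non-strings, a stray value fails immediately, and the error message names the position of the first offending item.

```python
def build_header(fields):
    """Hypothetical helper: fields is expected to be a list of strings."""
    return ", ".join(fields)


print(build_header(["name", "age"]))  # name, age

# A stray non-string (say, a conversion forgotten upstream) fails loudly.
try:
    build_header(["name", 42])
except TypeError as exc:
    message = str(exc)
print(message)
```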
 
David Eppstein

Leo Breebaart said:
What I can't find an explanation for is why str.join() doesn't
automatically call str() on its arguments, so that e.g.
str.join([1,2,4,5]) would yield "1245", and ditto for e.g.
user-defined classes that have a __str__() defined.

That would be the wrong thing to do when the arguments are unicodes.
 
John Machin

John> 4. For consistency, would you like "1" + 2 to produce "12"?

No, the correct answer is obviously 3. ;-)

Obviously, in awk. Bletch! I once had to help out some users of a
system where software development had been outsourced and upstuffed
and they needed some data file fixups done but their system was so
locked down even the manufacturer-supplied free pre-ANSI C compiler
had been deleted and one couldn't download stuff off the net but the
thought police had overlooked awk ... I even had to implement proper
CSV-reading routines in awk. No thanks for reminding me :-(
 
Michael Hoffman

Fredrik said:
I've proposed adding a "join" built-in that knows about the available string types,
and does the right thing for non-string objects.

That would be *so* useful. I frequently have to use the

"".join(map(str, mylist))

idiom, which is a wart.
 
Roy Smith

David Eppstein said:
Leo Breebaart said:
What I can't find an explanation for is why str.join() doesn't
automatically call str() on its arguments, so that e.g.
str.join([1,2,4,5]) would yield "1245", and ditto for e.g.
user-defined classes that have a __str__() defined.

That would be the wrong thing to do when the arguments are unicodes.

Why would it be wrong? I ask this with honest naivete, being quite
ignorant of unicode issues.
 
Jeff Shannon

Roy said:
What I can't find an explanation for is why str.join() doesn't
automatically call str() on its arguments, so that e.g.
str.join([1,2,4,5]) would yield "1245", and ditto for e.g.
user-defined classes that have a __str__() defined.

That would be the wrong thing to do when the arguments are unicodes.

Why would it be wrong? I ask this with honest naivete, being quite
ignorant of unicode issues.

As someone else demonstrated earlier...
Traceback (most recent call last):

Using str() on a unicode object works... IF all of the unicode
characters are also in the ASCII charset. But if you're using
non-ASCII unicode characters (and there's no point to using Unicode
unless you are, or might be), then str() will throw an exception.
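
Here is a sketch of the same failure in Python 3 terms (an assumption; the thread is about Python 2). In Python 2, str(u'\xe4') implicitly encoded the text with the default ASCII codec. Python 3's str is already Unicode, so str() on text is a no-op. The identical error still appears as soon as non-ASCII text is explicitly encoded to ASCII.

```python
chars = ["\xe4", "\xe5"]  # the characters from Steven's example

# In Python 3, str() on text changes nothing, so this simply joins.
joined = ", ".join(str(c) for c in chars)
print(joined)  # ä, å

# The Python 2 failure was an implicit ASCII encode; made explicit it is:
try:
    "\xe4".encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
print(type(error).__name__)  # UnicodeEncodeError

# Pure-ASCII text encodes fine, which is why the problem only shows up
# once non-ASCII characters appear.
print("abc".encode("ascii"))  # b'abc'
```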

The Effbot mentioned a join() implementation that would be smart
enough to do the right thing in this case, but it's not as simple as
just implicitly calling str().

Jeff Shannon
Technician/Programmer
Credit International
 
news.sydney.pipenetworks.com

Nick said:
That suggests
to me an "obvious default" of the kind that exists elsewhere in
Python as well.


I feel pretty much the opposite... If a non-string-type has managed to
get into my list-of-strings, then something has gone wrong and I would
like to know about this potential problem.

If you want to force a conversion before the join, you can use a
list comp:

', '.join([str(x) for x in l])


Nick "Explicit is better than Implicit"

Really? Then why are you using Python? Python and most dynamic languages
are so great because of their common sense towards the "implicit".
You must have heard of "never say never" but "never say always" (as in
"always better") is more appropriate here. There are many cases of
python's implicitness.

What about

a = "string"
b = 2
c = "%s%s" % (a, b)

There is an implicit str(b) here.

''.join(["string", 2]) to me is no different than the example above.


Huy
 
Fredrik Lundh

news.sydney.pipenetworks.com said:
Really? Then why are you using Python? Python and most dynamic languages are so great because
of their common sense towards the "implicit". You must have heard of "never say never" but "never
say always" (as in "always better") is more appropriate here. There are many cases of python's
implicitness.

a certain "Princess Bride" quote would fit here, I think.
What about

a = "string"
b = 2
c = "%s%s" % (a, b)

There is an implicit str(b) here.

nope. it's explicit: %s means "convert using str()".

from the documentation:

%s String (converts any python object using str()).

''.join(["string", 2]) to me is no different than the example above.

so where's the "%s" in your second example?

</F>
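
Here is Fredrik's distinction, sketched in Python 3 with a hypothetical `Celsius` class. The %s directive is itself the request to call str(), which is why the % example needs no explicit str(b). str.join has no such directive, so there the conversion has to be written out.

```python
class Celsius:
    """Hypothetical class with a custom __str__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return "%d C" % self.degrees


a = "string"
b = 2

# "%s" asks for str() conversion, so no explicit call is needed here.
c = "%s%s" % (a, b)
temp = "now: %s" % Celsius(21)  # %s invokes Celsius.__str__

# str.join has no conversion directive, so str() is written out.
d = "".join([a, str(b)])

print(c, d, temp)  # string2 string2 now: 21 C
```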
 
Nick Craig-Wood

Nick Vargish said:
I feel pretty much the opposite... If a non-string-type has managed to
get into my list-of-strings, then something has gone wrong and I would
like to know about this potential problem.

This is a good argument.

Why not have another method to do this? I propose joinany, which would
join any type of object together, not just strings.

That way it becomes less of a poke in the eye to backwards
compatibility too.
Nick "Explicit is better than Implicit"

Aye!
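
Here is a minimal sketch of what a joinany could look like. Python code cannot add methods to the built-in str type, so this is written as a plain function, and the name and signature are hypothetical. The point of the design is that str.join itself stays strict, so the error-catching behaviour Nick Vargish wants is preserved.

```python
def joinany(sep, items):
    """Hypothetical joinany: like sep.join(), but stringifies every item."""
    return sep.join(map(str, items))


print(joinany(", ", [1, "two", 3.0, None]))  # 1, two, 3.0, None
```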
 
