Why is " ".some_string often used?

  • Thread starter Stéphane Ninin

Stéphane Ninin

Hi all,

This is not the first time I have seen this way of coding in Python, and
I wonder why it is written this way:

The PyXML HOWTO
(http://pyxml.sourceforge.net/topics/howto/node14.html)
uses it in this function, but I have seen it in many other pieces of code:

def normalize_whitespace(text):
    "Remove redundant whitespace from a string"
    return ' '.join(text.split())

Is there a reason to do this instead of just returning join(text.split())?
Why concatenate " " to the string instead of just returning the string?

Thanks in advance for your explanations.

Regards,
 
Erik Max Francis

Stéphane Ninin said:
Is there a reason to do this instead of just returning join(text.split())?
Why concatenate " " to the string instead of just returning the string?

Because they're not the same thing unless you've already done

from string import join

first. join is not a builtin function.
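Erik's point, as a minimal Python 3 sketch. One thing to flag: `string.join` (the function he imports) existed in Python 2's `string` module but was removed in Python 3, so there only the method form is available. The sample text is made up for illustration:

```python
# Python 3 sketch: there is no built-in function called join.
words = "  the   quick\tbrown  fox ".split()

try:
    join(words)                # looks for a global or built-in named join
    has_builtin_join = True
except NameError as exc:
    has_builtin_join = False
    print("NameError:", exc)

# ' '.join(...) calls the join method of the separator string ' '.
print(' '.join(words))         # the quick brown fox
# The same call spelled with the unbound method: separator first.
print(str.join(' ', words))    # the quick brown fox
```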
 
Stéphane Ninin

Thus spoke Erik Max Francis:
Because they're not the same thing unless you've already done

from string import join

first. join is not a builtin function.

OK, thanks.
I also just realized that the "." has nothing to do with concatenation here.
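To make the distinction concrete, a short Python 3 sketch contrasting concatenation (`+`) with attribute access and a method call (`.`); the values are illustrative:

```python
sep = ' '
# "+" concatenates: the separator is simply glued onto the front.
concatenated = sep + 'abc'
# "." looks up the join method on sep; join iterates over 'abc'
# and puts sep between the characters.
joined = sep.join('abc')
print(repr(concatenated))  # ' abc'
print(repr(joined))        # 'a b c'
```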
 
Steve Lamb

Because they're not the same thing unless you've already done

from string import join

first. join is not a builtin function.

You know, given the volumes of text Pythonistas write about Python not
falling into the Perlish trap of magic line noise, this certainly smacks of it,
don'tcha think? I wonder how this idiom slipped in. To think that all this time
I have been doing:

import string
string.join(text.split())
 
John Roth

Stéphane Ninin said:
Hi all,

This is not the first time I see this way of coding in Python and
I wonder why this is coded this way:

Howto on PyXML
(http://pyxml.sourceforge.net/topics/howto/node14.html)
shows it on this function, but I saw that in many other pieces of code:

def normalize_whitespace(text):
    "Remove redundant whitespace from a string"
    return ' '.join(text.split())

Is there a reason to do this instead of just returning join(text.split())?
Why concatenate " " to the string instead of just returning the string?

This particular idiom replaces sequences of multiple whitespace
characters with a single blank.
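A quick check of what that means in practice (Python 3; the sample string is made up). Calling `split()` with no argument splits on any run of whitespace, including tabs and newlines, and drops leading and trailing whitespace, so joining the pieces with one space normalizes the text. Splitting on an explicit `' '` behaves differently:

```python
def normalize_whitespace(text):
    "Remove redundant whitespace from a string."
    return ' '.join(text.split())

sample = "  Hello,\t\tworld!\n  How   are you?  "
print(repr(normalize_whitespace(sample)))  # 'Hello, world! How are you?'

# Splitting on an explicit ' ' does NOT collapse runs: empty strings appear.
print("a  b".split(' '))  # ['a', '', 'b']
```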

And I agree, it's not entirely obvious why it's a string
method rather than a list method, since it operates on
a list, not on a string. The only explanation that makes
sense is that, as a list method, it would fail if the list
contained something other than a string. That's still
not very friendly, though.
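Whichever type owns the method, the failure described here is real: `str.join` rejects non-string items rather than converting them. A Python 3 sketch with made-up data, plus the usual workaround of converting the items first:

```python
items = ['a', 1, 'b']

try:
    ','.join(items)
    join_failed = False
except TypeError as exc:
    # join does not convert non-string items implicitly
    join_failed = True
    print("TypeError:", exc)

# Convert every item explicitly before joining.
print(','.join(map(str, items)))  # a,1,b
```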

John Roth
 
Erik Max Francis

John said:
And I agree, it's not entirely obvious why it's a string
method rather than a list method, since it operates on
a list, not on a string. The only explanation that makes
sense is that, as a list method, it would fail if the list
contained something other than a string. That's still
not very friendly, though.

On the contrary, I think that's the best reason. Lists have nothing to
do with strings, and so very string-specific methods (discounting
system-wide things such as str or repr) being included in lists is not
the right approach. Furthermore, the methods associated with a list
tend to become the "pattern" that sequence types must fulfill, and it
sets a terribly bad precedent to attach whatever domain-specific
operation is needed to a sequence type just because it's easiest
on the eyes at the moment.

The .join method is inherently string specific, and belongs on strings,
not lists. There's no doubting that seeing S.join(...) for the first
time is a bit of a surprise, but once you understand the reasoning
behind it, it makes perfect sense and makes it clear just how much it
deserves to stay that way.

And above all, of course, if you think it personally looks ugly, you can

from string import join

or write your own join function that operates over sequences and does
whatever else you might wish. That's what the flexibility is there for.
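One possible take on that suggestion, as a Python 3 sketch: a standalone function with the sequence-first argument order of Python 2's `string.join(words, sep=' ')`. The name `join_words` is hypothetical, and as an example of "whatever else you might wish" it converts each item with `str()`, which `str.join` itself does not do:

```python
from collections.abc import Iterable


def join_words(words: Iterable, sep: str = ' ') -> str:
    """Join any iterable of items with sep, sequence first.

    Hypothetical helper mirroring the argument order of Python 2's
    string.join(words, sep=' '); unlike str.join, it calls str() on
    each item, so non-string items are accepted.
    """
    return sep.join(str(w) for w in words)


print(join_words(['the', 'quick', 'fox']))  # the quick fox
print(join_words((1, 2, 3), sep='-'))       # 1-2-3
```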
 
Peter Hansen

John said:
And I agree, it's not entirely obvious why it's a string
method rather than a list method, since it operates on
a list, not on a string. The only explanation that makes
sense is that, as a list method, it would fail if the list
contained something other than a string. That's still
not very friendly, though.

One could about as easily argue (and I believe several have done
this quite well in the past, better than I anyway) that you are
actually operating on the *string*, not the list. You are in
effect asking the string to act as a joiner for the elements in the
list, not asking the list to join itself using the specified
string.

At least, if you look at it that way, it might be easier to swallow.
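Peter's reading can be made literal: `', '.join` is a bound method of the separator string, so a separator can be stored once and reused as a "joiner". A Python 3 sketch; the variable names are illustrative:

```python
comma_joiner = ', '.join   # a bound method: the string ', ' does the joining
line_joiner = '\n'.join

print(comma_joiner(['one', 'two', 'three']))  # one, two, three
print(line_joiner(['first line', 'second line']))
```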

-Peter
 
Syver Enstad

Peter Hansen said:
One could about as easily argue (and I believe several have done
this quite well in the past, better than I anyway) that you are
actually operating on the *string*, not the list. You are in
effect asking the string to act as a joiner for the elements in the
list, not asking the list to join itself using the specified
string.

At least, if you look at it that way, it might be easier to swallow.

Can't we have both? This is called a reversing method (Beck, Smalltalk
Best Practice Patterns) because it allows you to send several messages
to the same object instead of switching between different instances,
allowing the code to be more regular.

class MyList(list):
    def join(self, aString):
        return aString.join(self)


Like this:

lst = MyList(['one', 'two', 'three'])
print(lst)
print(lst.join('\n'))

I'd also like a reversing method for len

class MyList(list):
    def len(self):
        return len(self)

Often when I program against an instance I intuitively start each line
of code by writing the variable name and then a dot and then the
operation. The lack of a reversing method for len and join means that
my concentration is broken a tiny fraction of a second when I have to
remember to use another object or the global scope to find the
operation that I am after. Not a showstopper by any definition, but
irritating nonetheless.
 
Christos TZOTZIOY Georgiou

[' '.join discussion]
And above all, of course, if you think it personally looks ugly, you can

from string import join

or write your own join function that operates over sequences and does
whatever else you might wish. That's what the flexibility is there for.

I believe str.join(string, sequence) works best for the functional-programming
types (no need to rely on the string module).
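For example, because `str.join` takes the separator as its first argument, it can be passed straight to `map` to pair separators with sequences (Python 3 sketch, illustrative data):

```python
separators = [',', '-', '']
sequences = [['a', 'b'], ['c', 'd', 'e'], ['f', 'g']]

# str.join(sep, seq) is the unbound form of sep.join(seq)
results = list(map(str.join, separators, sequences))
print(results)  # ['a,b', 'c-d-e', 'fg']
```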
 
Christos TZOTZIOY Georgiou

I'd also like a reversing method for len

class MyList(list):
    def len(self):
        return len(self)

You can always use the __len__ attribute in this specific case.

And now for the hack value:

class MyList(list):
    import new as _new, __builtin__
    def __getattr__(self, attr):
        try:
            return self._new.instancemethod(
                getattr(self.__builtin__, attr), self, None)
        except AttributeError:
            raise AttributeError, \
                  "there is no '%s' builtin" % attr

allowing:
24

It works for all builtins that can take a list as a first argument.
Of course it should not be taken seriously :)
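The `new` module and `__builtin__` used in that hack exist only in Python 2. A rough Python 3 adaptation of the same (equally unserious) idea might swap in the `builtins` module and `functools.partial`; this is a sketch, not Christos's code:

```python
import builtins
import functools


class MyList(list):
    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails, so real list
        # methods such as append() are unaffected.
        try:
            func = getattr(builtins, attr)
        except AttributeError:
            raise AttributeError("there is no '%s' builtin" % attr) from None
        # Bind the list as the first argument of the builtin.
        return functools.partial(func, self)


lst = MyList([3, 1, 2])
print(lst.len())     # 3
print(lst.sorted())  # [1, 2, 3]
print(lst.sum())     # 6
```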
 
Terry Reedy

John Roth said:
And I agree, it's not entirely obvious why it's a string
method rather than a list method,

Because, as I and others have posted several times in previous threads, and
explicated with several examples, <str,unicode>.join is NOT, NOT, NOT a
list method, any more than it is a tuple, dict, array, generator-iterator,
or any other iterable method. Taking 'string' generically (as either type
str or unicode), .join joins a sequence (iterable) of strings with a
string.
since it operates on a list, not on a string.

Huh? It operates on a sequence of strings. It has nothing to do with
lists in particular. The builtinness and mutability of lists is irrelevant
to this generic read-only operation.
join(...)
S.join(sequence) -> string
Return a string which is the concatenation of the strings in the
sequence. The separator between elements is S.

Notice the absence of 'list'. Please do not confuse newbies with
misinformation.
The only explanation that makes
sense is that, as a list method, it would fail if the list
contained something other than a string.

This is true for any iterable that contains or yields something other than
a string.
Again, this function/method has nothing in particular to do with lists
other than the fact that lists are one of several types of iterables. That
is why join cannot be a list method, and certainly not just a list method.
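To illustrate that join is indifferent to the kind of iterable it is given, a Python 3 sketch (note that iterating over a dict yields its keys):

```python
sep = ','
joined = [
    sep.join(['a', 'b', 'c']),            # list
    sep.join(('a', 'b', 'c')),            # tuple
    sep.join(c for c in 'abc'),           # generator expression
    sep.join({'a': 1, 'b': 2, 'c': 3}),   # dict: iterating yields the keys
    sep.join('abc'),                      # a str is itself an iterable of strings
]
print(joined)  # every entry is 'a,b,c'
```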

If 'iterable' were a concrete type/class/interface that all iterables had
to inherit from in order to be recognized as an iterable, rather than an
abstract protocol to be implemented, then one might suggest that join be an
iterable method. But the statement above, with 'list' replaced by
'iterable', would still be true. Given that only a small finite subset of
the unbounded set of iterable functions could be designated as basic by
being made a method, one could easily argue that such designation should be
restricted to functions potentially applicable to any iterable. Count,
filter, map, reduce, iterate (apply for-loop body), and others in itertools
would be such candidates.

If there were an iterable-of-basestrings type subclassing the hypothetical
iterable type, then join might be an appropriate method for that. But that
is not the Python universe we have. Nor do I necessarily wish it. The
beauty of the abstract iterable/iterator interfaces, to me, is that they
are so simple, clean, and generically useful, without having to privilege
anyone's idea of which sequence functions are 'basic'.

Terry J. Reedy
 
Dave Benjamin

Because, as I and others have posted several times in previous threads, and
explicated with several examples, <str,unicode>.join is NOT, NOT, NOT a
list method, any more than it is a tuple, dict, array, generator-iterator,
or any other iterable method. Taking 'string' generically (as either type
str or unicode), .join joins a sequence (iterable) of strings with a
string.

It's not a list method because it's not a list method or any other kind of
iterable method? That seems like circular reasoning.

Consider the following two pieces of data:

1. 'the,quick,brown,fox'
2. ['the', 'quick', 'brown', 'fox']

They are both lists of words. Perhaps the first is not a Python-list of
words, but it's a list of words nonetheless. #1 can be converted into #2 by
calling ".split(',')" on it. Doesn't it seem natural that #2 be converted to
#1 by calling ".join(',')"? It works this way in JavaScript and Ruby, at
least.

The argument is more of a technical issue. There are only two kinds of
strings. There are many kinds of "iterables". So, it's easier to define
"join" on the string, and force implementers of custom string types to
implement "join" as well (since this is more rare) than to define "join" on
an iterable and force implementers of the many kinds of iterables to define
"join" as well. Conceptually, I'm not sure that the case is so strong that
"join" is a string method.

In reality, "join" isn't really a string method any more than it's an
iterable method. It's a string-iterable<string> method; it operates on the
relationship between a string and an iterable of strings. If we had a
class that could represent that relationship, "join" would be a method of
that class, i.e.:

seq = ['the', 'quick', 'brown', 'fox']
sep = ','

ssi = StringStringIterable(sep, seq)
result = ssi.join()

But this would be somewhat pointless because:

1. This would be a pain to type.
2. The class probably wouldn't pull its weight.
3. The elegance Python has with text processing is lost.

Another solution might be to use a mixin class that provides StringIterable
methods, and have the built-in list include this mixin. Then, you could
always mix it into your own iterable classes if you wanted "join" to be
available. But then, you've still got issues trying to integrate it with
tuples and generators.
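A minimal sketch of the mixin idea in Python 3 (`JoinMixin`, `JoinableList`, and `JoinableTuple` are hypothetical names). It works for subclasses you define, but, as noted, the built-in list and tuple types themselves and generator objects still lack the method:

```python
class JoinMixin:
    """Adds a join(sep) method to any iterable class it is mixed into."""

    def join(self, sep):
        # Delegate to str.join, so the items must still be strings.
        return sep.join(self)


class JoinableList(JoinMixin, list):
    pass


class JoinableTuple(JoinMixin, tuple):
    pass


print(JoinableList(['the', 'quick', 'fox']).join(' '))  # the quick fox
print(JoinableTuple(('a', 'b')).join('-'))               # a-b
```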

Sometimes, object-orientedness gets in the way, and I think this is one of
those cases. "str.join" is probably the winner here, but since it's really
just a string method being used "out of context", the delimiter is the first
argument, and this doesn't read well to me. I think that "string.join" makes
more sense; it says "join this sequence using this delimiter" instead of
str.join's "join using this delimiter this sequence".

Huh? It operates on a sequence of strings. It has nothing to do with
lists in particular. The builtinness and mutability of lists is irrelevant
to this generic read-only operation.

Only because it is defined as such. Ruby and JavaScript define the "join"
method on built-in arrays. Newcomers to Python who have programmed in those
languages will naturally associate "join" with lists, even though
technically, in the Python world, it's really something associated with
the relationship between a string and an iterable of strings. Which is an
awful lot of semantics to digest when you just want to stick some commas
between words in a list.
 
Gary D. Duzan

The argument is more of a technical issue. There are only two kinds of
strings. There are many kinds of "iterables". So, it's easier to define
"join" on the string, and force implementers of custom string types to
implement "join" as well (since this is more rare) than to define "join" on
an iterable and force implementers of the many kinds of iterables to define
"join" as well. Conceptually, I'm not sure that the case is so strong that
"join" is a string method.

[ ... ]

Sometimes, object-orientedness gets in the way, and I think this is one of
those cases. "str.join" is probably the winner here, but since it's really
just a string method being used "out of context", the delimiter is the first
argument, and this doesn't read well to me. I think that "string.join" makes
more sense; it says "join this sequence using this delimiter" instead of
str.join's "join using this delimiter this sequence".

Why not something really simple which does something like this?

from functools import reduce  # reduce is no longer a builtin in Python 3

def myjoin(seq, sep):
    def _addsep(l, r, s=sep): return l+s+r
    return reduce(_addsep, seq)

>>> myjoin(['a','b','c'], ",")
'a,b,c'
>>> myjoin(['a','b','c'], "")
'abc'
>>> myjoin([1,2,3,4], 0)
10
>>> myjoin("abcd", ',')
'a,b,c,d'

It might not be the fastest, but it is straightforward and generic,
and could be optimized in C, if desired.

Gary Duzan
BBN Technologies
A Verizon Company
 
