Small inconsistency between string.split and "".split

Carlos Ribeiro

Hi all,

While writing a small program to help another poster on c.l.py, I found
a small inconsistency between the handling of keyword parameters by
string.split() and by the split() method of strings. I wonder whether
anyone else has stumbled on it before, and whether there is a good
reason for it to work this way.

Both implementations take two optional parameters: the separator (sep)
and the maximum number of splits (maxsplit). However, string.split()
accepts maxsplit as a keyword parameter, while mystring.split() doesn't.
In my case, that meant I had to resort to string.split() in my example
to avoid having to specify the separator.

** BTW, I had to avoid dealing with the separator for another annoying
reason: I thought that I could do something like this:

mystring.split(string.whitespace, 2)

to preserve the default whitespace-detecting behavior. But that doesn't
work with either implementation of split().
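Why passing string.whitespace fails: split() treats sep as one literal delimiter string, not as a set of alternative characters. A short sketch in current Python 3 (the sample string is made up) contrasting it with sep=None and, as one possible alternative, re.split():

```python
import re
import string

s = "this  is\tmy string"

# string.whitespace is the single literal string " \t\n\r\x0b\x0c"; it never
# occurs in s as a whole, so nothing is split.
print(s.split(string.whitespace, 2))    # ['this  is\tmy string']

# sep=None keeps the default behavior: split on runs of any whitespace.
print(s.split(None, 2))                 # ['this', 'is', 'my string']

# If "split on any of these characters" is really what you want, a regex
# character class expresses it directly.
print(re.split(r"\s+", s, maxsplit=2))  # ['this', 'is', 'my string']
```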

----
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: (e-mail address removed)
mail: (e-mail address removed)
 
Peter Hansen

Carlos said:
> While writing a small program to help other poster at c.l.py, I found
> a small inconsistency between the handling of keyword parameters of
> string.split() and the split() method of strings. I wonder if someone
> else had ever stumbled on it before, and if it has a good reason to
> work like it is.
>
> Both implementations take two parameters: the separator character and
> the max number of splits (maxsplit). However, string.split() accept
> maxsplit as a keyword parameter, while mystring.split() doesn't. In my
> case, it meant that I had to resort to string.split() in my example,
> in order to avoid having to deal with the separator.

Works here:

c:\>python
Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = 'this is my string'
>>> s.split()
['this', 'is', 'my', 'string']
>>> s.split('s')
['thi', ' i', ' my ', 'tring']
>>> s.split('s', 1)
['thi', ' is my string']
>>> s.split('s', 2)
['thi', ' i', ' my string']

> ** BTW, I had to avoid dealing with the separator for another annoying
> reason: I thought that I could do something like this:
>
> mystring.split(string.whitespace, 2)
>
> to preserve the default whitespace detecting behavior. But it won't
> work this way with neither implementation of split().

I think this works though:

>>> s.split(None, 2)
['this', 'is', 'my string']
>>> s.split(None, 1)
['this', 'is my string']

-Peter
 
Carlos Ribeiro

> Works here:
> >>> s.split('s', 1)
> ['thi', ' is my string']
> >>> s.split('s', 2)

Unfortunately, this is *not* what I meant to ask about. What I am
saying is that:

import string
string.split(mystring, maxsplit=1)

works, while

mystring.split(maxsplit=1)

doesn't. In short, the built-in string method doesn't accept keyword
parameters, while the string.split() function does. Alas, the "None"
trick is not documented -- and without knowing about it, I had no
other way around it.
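The difference comes from how the two are implemented: as far as I know, string.split() in the Python 2 standard library was a thin Python-coded wrapper that forwards to the method, and Python-coded functions accept keyword arguments automatically. A minimal sketch of such a wrapper (split_kw is a hypothetical name, not the actual string module code):

```python
def split_kw(s, sep=None, maxsplit=-1):
    """Keyword-friendly wrapper that forwards positionally to str.split.

    The defaults mirror the method's: sep=None means "split on runs of
    whitespace" and maxsplit=-1 means "no limit".
    """
    return s.split(sep, maxsplit)


mystring = "this is my string"
print(split_kw(mystring, maxsplit=1))  # ['this', 'is my string']
```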


 
Inyeol Lee

> ... Alas, the "None"
> trick is not documented -- and without knowing about it, I had no
> other way around.

In the Python 2.3.4 Library Reference, section 2.3.6.1 "String Methods":

"""
split([sep [,maxsplit]])

Return a list of the words in the string, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or None, any
whitespace string is a separator.
"""

I think the "None" trick has been documented here ever since string
methods were introduced.

-Inyeol
 
Carlos Ribeiro

> I think "None" trick was documented here since string method was
> introduced.

I get it now. The problem is that I had only read the docstring --
yes, not the manual, and I admit it was laziness on my part ;-) But
anyway... the keyword parameter handling is inconsistent, *and* the
docstring could mention something about sep=None. Here it is:

split(s [,sep [,maxsplit]]) -> list of strings

Return a list of the words in the string s, using sep as the
delimiter string. If maxsplit is given, splits at no more than
maxsplit places (resulting in at most maxsplit+1 words). If sep
is not specified, any whitespace string is a separator.

(split and splitfields are synonymous)

It seems that sep=None can safely be read as "sep is not specified".
The other way round (that you may pass None explicitly to get the
default behavior) is not so clear from the docstring.
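What "sep is not specified" means in practice can be checked directly. A short sketch in current Python 3 (the sample string is made up): omitting sep and passing None are the same request, while an explicit single space is a different one:

```python
s = "  a  b c  "

# Omitting sep and passing sep=None request the same thing: split on runs
# of whitespace, ignoring leading and trailing whitespace.
print(s.split())      # ['a', 'b', 'c']
print(s.split(None))  # ['a', 'b', 'c']

# An explicit single space is a different request: every space is a
# delimiter, so adjacent spaces produce empty strings.
print(s.split(" "))   # ['', '', 'a', '', 'b', 'c', '', '']
```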

 
Walter Dörwald

Carlos said:
> I got it now. The problem is that I had just read the docstring --
> yes, not the manual, and admit it, it was lazyness of my part ;-) But
> anyway... the keyword parameter handling is inconsistent, *and* the
> docstring could mention something about sep="None".

I've fixed the docstring for both unicode.split() and
string.split() to give a hint about the None default. Note
that the docstring for str.split() already *did* mention
the None option.

Bye,
Walter Dörwald
 
Carlos Ribeiro

Walter,

Walter Dörwald wrote:
> I've fixed the docstring for both unicode.split() and
> string.split() to give a hint about the None default. Note
> that the docstring for str.split() already *did* mention
> the None option.

I don't know if you can do it, but isn't it easy to modify the split
method to accept maxsplit as a keyword parameter? It would make it
consistent with string.split(), and as far as I'm aware, it should not
cause any sizeable performance penalty. But the most important reason
is that keyword parameters for often-unused options make code more
readable; for example,

mystring.split(maxsplit=2)

reads better than:

mystring.split(None, 2)

That's my opinion, anyway...
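As a later footnote, outside the 2.3/2.4 timeframe of this thread: current Python 3 releases do accept sep and maxsplit as keywords for str.split(), so the two spellings can be compared directly. A minimal check with a made-up string:

```python
s = "this is my string"

# Positional form: None has to be spelled out just to reach maxsplit.
by_position = s.split(None, 2)

# Keyword form: only the option actually being changed is mentioned.
by_keyword = s.split(maxsplit=2)

print(by_position)  # ['this', 'is', 'my string']
print(by_keyword)   # ['this', 'is', 'my string']
```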

 
Alex Martelli

Carlos Ribeiro said:
> Walter,
>
> I don't know if you can do it, but isn't easy to modify the split
> method to accept maxsplit as a keyword parameter? It would make it

Feasible, not hard, not trivial. The problem is different...:

kallisti:~/downloads/Python-2.4a3 alex$ find . -name '*.c' | xargs cat | grep -c 'METH_KEYWORDS'
92
kallisti:~/downloads/Python-2.4a3 alex$ find . -name '*.c' | xargs cat | grep -c 'METH_VARARGS'
1272
kallisti:~/downloads/Python-2.4a3 alex$ find . -name '*.c' | xargs cat | grep -c 'METH_'
2429

In other words: throughout the current C sources for Python (across all
platforms etc.) there are about 2429 specifications of how various
functions (methods included, of course) take their parameters. Of
these, about half are METH_VARARGS (400 are METH_NOARGS, i.e. functions
and methods accepting no explicit arguments, and 739 are METH_O,
accepting just one), and less than 4% accept keyword-style arguments.
Many of those are pretty recent additions, too, and some play special
roles which you just couldn't fulfil otherwise (e.g. consider the
optional key= vs cmp= arguments that 2.4 accepts for the list.sort
method -- they are mutually exclusive...).
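Alex's grep counts METH_ flags in the C sources. A rough Python-level analogue, sketched for current Python 3 and not something from the thread, asks inspect.signature() which callables let at least one parameter be passed by keyword. The helper names accepts_keywords and tally are made up for illustration, and callables without an introspectable signature are counted as unknown rather than guessed:

```python
import inspect

_KEYWORD_KINDS = {
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
    inspect.Parameter.VAR_KEYWORD,
}


def accepts_keywords(func):
    """True if some parameter can be passed by keyword, False if none can,
    None if no signature can be introspected."""
    try:
        sig = inspect.signature(func)
    except (ValueError, TypeError):
        return None  # some C-coded callables expose no signature
    return any(p.kind in _KEYWORD_KINDS for p in sig.parameters.values())


def tally(callables):
    """Count keyword-capable, positional-only and unknown callables."""
    labels = {True: "keywords", False: "positional-only", None: "unknown"}
    counts = {"keywords": 0, "positional-only": 0, "unknown": 0}
    for func in callables:
        counts[labels[accepts_keywords(func)]] += 1
    return counts


# Public methods of str, as a small sample of C-coded callables.
str_methods = [getattr(str, name) for name in dir(str)
               if not name.startswith("_") and callable(getattr(str, name))]
print(tally(str_methods))
```

The exact counts depend on the Python version, since more built-ins gain keyword support and text signatures over time.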

Having ALL C-coded functions and methods that accept any argument accept
keyword-style arguments in particular would surely lead to a more
consistent language, once the impact of thousands of modifications to
the source stabilizes again -- a slightly bigger and slower interpreter,
no doubt, but probably only slightly. But these thousands of changes
will require very substantial and disruptive editing -- substantial
manpower to perform them all, AND ensure they're all well tested (I
suspect the set of unit tests would have to more than double to do a
halfway decent job). It would have to be among the major targets of a
given Python release, I suspect, and raising enthusiasm for such a job
might not be easy, even though Python would be a better language in
consequence. Maybe it will be feasible as part of the 3.0 release,
which is slated to be incompatible anyway... remove METH_VARARGS
altogether, breaking compatibility with all existing extensions, so
EVERY C-coded function in the future, if it takes any argument at all,
will HAVE to take them in keyword form, too.

Until it's feasible to perform such a sweeping change, justifying
changes to ONE specific method of an object which has dozens is going to
be pretty hard. Perhaps, if someone volunteered a patch to make ALL
methods of string and unicode objects specifically accept arguments in
keyword form as well as positionally, with all the needed tests & docs,
in time for Python 2.4's first beta in a couple of weeks, it might be
accepted (if separate but similar patches also existed for methods of
other built-in types, that would help all of their acceptance chances,
IMHO). But a patch to change ONE method out of dozens, I suspect, would
be shot down -- the slight, useful extra functionality might be judged
to not be worth the increase in inconsistency in this area (which IMHO
must, sadly, count as a wart in today's Python, sigh).


Alex
 
Michael Hudson

> Having ALL C-coded functions and methods that accept any argument
> accept keyword-style arguments in particular would surely lead to a
> more consistent language,

[...]

This whole area isn't particularly pretty. In general it would be
better to expose more of an extension function's signature *outside*
the function, for efficiency, introspection and even things like
psyco. METH_O, METH_NOARGS are a step in this direction -- but you
can't pass a keyword argument to a METH_O function (not that one would
want to, very often, but it's still a potential inconsistency).
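At the Python level, this distinction can nowadays be written into a function's signature: since Python 3.8 (PEP 570), a "/" marks the preceding parameters as positional-only, which behaves like the METH_O case described above. A small sketch with hypothetical function names:

```python
def meth_o_like(obj, /):
    """One argument, positional only: obj=... is rejected."""
    return type(obj).__name__


def keyword_friendly(obj):
    """Same body, but the argument may also be passed by name."""
    return type(obj).__name__


print(meth_o_like(42))           # int
print(keyword_friendly(obj=42))  # int

try:
    meth_o_like(obj=42)
except TypeError as exc:
    # The exact message wording varies between Python versions.
    print("TypeError:", exc)
```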

I wonder what Pyrex does...

My thoughts on this area, like many others, can probably be summarized
as "I hate C".

Cheers,
mwh
 
Alex Martelli

Michael Hudson said:
> > Having ALL C-coded functions and methods that accept any argument
> > accept keyword-style arguments in particular would surely lead to a
> > more consistent language,
>
> [...]
>
> This whole area isn't particularly pretty. In general it would be

Indeed, it isn't.

> better to expose more of an extension functions signature *outside*
> the function, for efficiency, introspection and even things like

...and consistency with the way Python-coded functions work.

> psyco. METH_O, METH_NOARGS are a step in this direction -- but you
> can't pass a keyword argument to a METH_O function (not that one would
> want to, very often, but it's still a potential inconsistency).

Right; it could be remedied by letting a macro otherwise equivalent to
METH_O know about that one argument's name.

> I wonder what Pyrex does...

for:
def example(aa, bb):
    pass

it generates (name mangling apart, I'm demangling for legibility):

static PyObject* example(PyObject *self, PyObject *args, PyObject *kwds)
{
    PyObject *aa = 0;
    PyObject *bb = 0;
    static char *argnames[] = {"aa", "bb", 0};

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "OO", argnames, &aa, &bb))
        return 0;

etc, etc, and METH_VARARGS|METH_KEYWORDS in the PyMethodDef array. IOW,
nothing strange, and all correct, it seems to me.
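What the generated call buys is that each argument can be matched either by position or by one of the names in argnames. A toy Python model of that binding step, for a format like "OO" where both arguments are required objects (parse_tuple_and_keywords is a made-up name; this sketches the idea only, not how the real C function is written, and it ignores optional arguments and type conversion):

```python
def parse_tuple_and_keywords(args, kwds, argnames):
    """Bind required arguments by position or by name, in argnames order."""
    if len(args) > len(argnames):
        raise TypeError(
            f"takes at most {len(argnames)} arguments ({len(args)} given)")
    # Positional arguments fill the leading names first.
    bound = dict(zip(argnames, args))
    for name, value in (kwds or {}).items():
        if name not in argnames:
            raise TypeError(f"unexpected keyword argument {name!r}")
        if name in bound:
            raise TypeError(f"argument {name!r} given by name and position")
        bound[name] = value
    missing = [name for name in argnames if name not in bound]
    if missing:
        raise TypeError(f"missing required argument(s): {', '.join(missing)}")
    return tuple(bound[name] for name in argnames)


argnames = ("aa", "bb")
print(parse_tuple_and_keywords((1,), {"bb": 2}, argnames))           # (1, 2)
print(parse_tuple_and_keywords((), {"bb": 2, "aa": 1}, argnames))    # (1, 2)
```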


Alex
 
Michael Hudson

> Michael Hudson said:
> > > Having ALL C-coded functions and methods that accept any argument
> > > accept keyword-style arguments in particular would surely lead to a
> > > more consistent language,
> >
> > [...]
> >
> > This whole area isn't particularly pretty. In general it would be
>
> Indeed, it isn't.
>
> > better to expose more of an extension functions signature *outside*
> > the function, for efficiency, introspection and even things like
>
> ...and consistency with the way Python-coded functions work.

Heh, yes, that too :)

> Right; it could be remedied by letting a macro otherwise equivalent to
> METH_O know about that one argument's name.

But... how? I guess the PyMethodDef struct could grow an ml_signature
field... wouldn't it be nice if you could do:

static PyObject*
foo(PyObject* ob, int index)
{
    ...;
}

PyMethodDef methods[] = {
    {"foo", foo, "O[ob]i[index]", "docstring"},
    {NULL, NULL}
};

? Even nicer if you didn't have to write the signature by hand.

Unfortunately, I don't think you can do this in standard C.
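The hypothetical "O[ob]i[index]" string pairs a PyArg_ParseTuple-style format code with the argument's name in brackets. Purely to illustrate what an interpreter could recover from such a string (the notation is Michael's proposal, not an existing CPython feature, and parse_signature is a made-up name), a Python sketch of the parsing:

```python
import re

# One item of the hypothetical notation: a single-letter format code
# followed by the argument name in square brackets, e.g. "i[index]".
_ITEM = re.compile(r"([A-Za-z])\[([A-Za-z_][A-Za-z0-9_]*)\]")


def parse_signature(sig):
    """Turn e.g. "O[ob]i[index]" into [("O", "ob"), ("i", "index")]."""
    items = []
    pos = 0
    while pos < len(sig):
        match = _ITEM.match(sig, pos)
        if match is None:
            raise ValueError(f"cannot parse {sig[pos:]!r} at offset {pos}")
        items.append((match.group(1), match.group(2)))
        pos = match.end()
    return items


print(parse_signature("O[ob]i[index]"))  # [('O', 'ob'), ('i', 'index')]
```

Having the names available outside the C function is what would let keyword calls and introspection work without each function parsing its own arguments.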

> > I wonder what Pyrex does...
>
> for:
> def example(aa, bb):
>     pass
>
> it generates (name mangling apart, I'm demangling for legibility):
>
> static PyObject* example(PyObject *self, PyObject *args, PyObject *kwds)
> {
>     PyObject *aa = 0;
>     PyObject *bb = 0;
>     static char *argnames[] = {"aa", "bb", 0};
>
>     if (!PyArg_ParseTupleAndKeywords(args, kwds, "OO", argnames, &aa, &bb))
>         return 0;
>
> etc, etc, and METH_VARARGS|METH_KEYWORDS in the PyMethodDef array. IOW,
> nothing strange, and all correct, it seems to me.

Cool. I should use pyrex more, I suspect.

Cheers,
mwh
 
Alex Martelli

Michael Hudson said:
> But... how? I guess the PyMethodDef struct could grow an ml_signature
> field... wouldn't it be nice if you could do:

Right, something like that. As long as we need backwards compatibility
(==all the way to 3.0) that needs to be handled with care, of course...

> static PyObject*
> foo(PyObject* ob, int index)
> {
>     ...;
> }
>
> PyMethodDef methods[] = {
>     {"foo", foo, "O[ob]i[index]", "docstring"},
>     {NULL, NULL}
> }
>
> ? Even nicer if you didn't have to write the signature by hand.
>
> Unfortunately, I don't think you can do this in standard C.

I don't think so, either -- unless you put macros in TWO places,
perhaps:

DEF_PYFUN(foo, (PyObject* ob, int index))
{
    ...
}

PyMethodDef methods[] = {
    REF_PYFUN(foo, "docstring"),
    {0}
};

This, I suspect, might be possible, with DEF_PYFUN stashing the sig
string someplace (e.g. in a __def_pyfun__foo global) and REF_PYFUN
pulling out a reference to it...

> Cool. I should use pyrex more, I suspect.

Me too, I suspect -- it's really a cool way to write extensions for
Python.


Alex
 
