Small inconsistency between string.split and "".split

Discussion in 'Python' started by Carlos Ribeiro, Sep 13, 2004.

  1. Hi all,

    While writing a small program to help another poster at c.l.py, I found
    a small inconsistency between the handling of keyword parameters of
    string.split() and the split() method of strings. I wonder if someone
    else has stumbled on it before, and whether there is a good reason for
    it to work the way it does.

    Both implementations take two parameters: the separator and the
    maximum number of splits (maxsplit). However, string.split() accepts
    maxsplit as a keyword parameter, while mystring.split() doesn't. In my
    case, it meant that I had to resort to string.split() in my example,
    in order to avoid having to deal with the separator.

    ** BTW, I had to avoid dealing with the separator for another annoying
    reason: I thought that I could do something like this:

    mystring.split(string.whitespace, 2)

    to preserve the default whitespace detecting behavior. But it won't
    work this way with either implementation of split().

    ----
    Carlos Ribeiro
    Consultoria em Projetos
    blog: http://rascunhosrotos.blogspot.com
    blog: http://pythonnotes.blogspot.com
    Carlos Ribeiro, Sep 13, 2004
    #1

  2. Carlos Ribeiro

    Peter Hansen Guest

    Carlos Ribeiro wrote:

    > While writing a small program to help another poster at c.l.py, I found
    > a small inconsistency between the handling of keyword parameters of
    > string.split() and the split() method of strings. I wonder if someone
    > else has stumbled on it before, and whether there is a good reason for
    > it to work the way it does.
    >
    > Both implementations take two parameters: the separator and the
    > maximum number of splits (maxsplit). However, string.split() accepts
    > maxsplit as a keyword parameter, while mystring.split() doesn't. In my
    > case, it meant that I had to resort to string.split() in my example,
    > in order to avoid having to deal with the separator.


    Works here:

    c:\>python
    Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on
    win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> s = 'this is my string'
    >>> s.split()
    ['this', 'is', 'my', 'string']
    >>> s.split('s')
    ['thi', ' i', ' my ', 'tring']
    >>> s.split('s', 1)
    ['thi', ' is my string']
    >>> s.split('s', 2)
    ['thi', ' i', ' my string']

    > ** BTW, I had to avoid dealing with the separator for another annoying
    > reason: I thought that I could do something like this:
    >
    > mystring.split(string.whitespace, 2)
    >
    > to preserve the default whitespace detecting behavior. But it won't
    > work this way with either implementation of split().


    I think this works though:

    >>> s.split(None, 2)
    ['this', 'is', 'my string']
    >>> s.split(None, 1)
    ['this', 'is my string']
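
    A minimal sketch of why string.whitespace does not work as a separator
    while None does: an explicit sep is matched as one literal delimiter
    string, not as a set of characters. The variable names below are only
    illustrative.

```python
import string

s = 'this is my string'

# An explicit sep is searched for as a single literal substring. The whole
# string.whitespace sequence never occurs inside s, so nothing is split.
literal = s.split(string.whitespace, 2)

# sep=None selects the default "runs of whitespace" behaviour while still
# allowing maxsplit to be passed positionally.
default = s.split(None, 2)

print(literal)  # ['this is my string']
print(default)  # ['this', 'is', 'my string']
```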

    -Peter
    Peter Hansen, Sep 13, 2004
    #2

  3. On Mon, 13 Sep 2004 13:09:26 -0400, Peter Hansen <> wrote:
    > Works here:
    > <snip>
    > >>> s.split('s', 1)

    > ['thi', ' is my string']
    > >>> s.split('s', 2)


    Unfortunately, this is *not* what I had meant to ask for. What I am
    saying is that:

    import string
    string.split(mystring, maxsplit=1)

    works, while

    mystring.split(maxsplit=1)

    doesn't. In short, the builtin string method doesn't accept keyword
    parameters while the string.split() function does. Alas, the "None"
    trick is not documented -- and without knowing about it, I had no
    other way around.
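
    One possible workaround, sketched under the assumption that a small
    wrapper is acceptable: a helper (split_kw is a made-up name, not part of
    any library) that takes keyword arguments and forwards them positionally
    to the method. Later Python 3 releases accept mystring.split(maxsplit=1)
    directly, so the wrapper only matters where the method rejects keywords.

```python
def split_kw(s, sep=None, maxsplit=-1):
    """Hypothetical helper: accept keywords, forward them positionally."""
    # sep=None means "split on runs of whitespace";
    # maxsplit=-1 means "no limit on the number of splits".
    return s.split(sep, maxsplit)


mystring = 'this is my string'
result = split_kw(mystring, maxsplit=1)
print(result)  # ['this', 'is my string']
```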


    --
    Carlos Ribeiro
    Consultoria em Projetos
    blog: http://rascunhosrotos.blogspot.com
    blog: http://pythonnotes.blogspot.com
    Carlos Ribeiro, Sep 13, 2004
    #3
  4. Carlos Ribeiro

    Inyeol Lee Guest

    On Mon, Sep 13, 2004 at 02:41:33PM -0300, Carlos Ribeiro wrote:
    ....
    > ... Alas, the "None"
    > trick is not documented -- and without knowing about it, I had no
    > other way around.


    In 2.3.4 Python Library Reference section 2.3.6.1 String Methods,

    """
    split([sep [,maxsplit]])

    Return a list of the words in the string, using sep as the
    delimiter string. If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or None, any
    whitespace string is a separator.
    """

    I think "None" trick was documented here since string method was
    introduced.

    -Inyeol
    Inyeol Lee, Sep 13, 2004
    #4
  5. On Mon, 13 Sep 2004 10:59:27 -0700, Inyeol Lee <> wrote:
    > I think "None" trick was documented here since string method was
    > introduced.


    I got it now. The problem is that I had just read the docstring --
    yes, not the manual, and I admit it, it was laziness on my part ;-) But
    anyway... the keyword parameter handling is inconsistent, *and* the
    docstring could mention something about sep=None. Here it is:

    split(s [,sep [,maxsplit]]) -> list of strings

    Return a list of the words in the string s, using sep as the
    delimiter string. If maxsplit is given, splits at no more than
    maxsplit places (resulting in at most maxsplit+1 words). If sep
    is not specified, any whitespace string is a separator.

    (split and splitfields are synonymous)

    It seems that sep=None can be safely understood as "sep is not
    specified". The other way round is not so clear.
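
    A short illustration of the point, with made-up variable names: passing
    sep=None behaves exactly like omitting sep, whereas an explicit
    single-space separator is a different operation altogether.

```python
s = ' a  b '

omitted = s.split()       # sep not specified
explicit_none = s.split(None)  # sep given as None: same behaviour
single_space = s.split(' ')    # literal one-space delimiter

print(omitted)        # ['a', 'b']
print(explicit_none)  # ['a', 'b']
print(single_space)   # ['', 'a', '', 'b', '']
```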

    --
    Carlos Ribeiro
    Consultoria em Projetos
    blog: http://rascunhosrotos.blogspot.com
    blog: http://pythonnotes.blogspot.com
    Carlos Ribeiro, Sep 13, 2004
    #5
  6. Carlos Ribeiro wrote:
    > On Mon, 13 Sep 2004 10:59:27 -0700, Inyeol Lee <> wrote:
    >
    >>I think "None" trick was documented here since string method was
    >>introduced.

    >
    > I got it now. The problem is that I had just read the docstring --
    > yes, not the manual, and I admit it, it was laziness on my part ;-) But
    > anyway... the keyword parameter handling is inconsistent, *and* the
    > docstring could mention something about sep=None.


    I've fixed the docstring for both unicode.split() and
    string.split() to give a hint about the None default. Note
    that the docstring for str.split() already *did* mention
    the None option.

    Bye,
    Walter Dörwald
    Walter Dörwald, Sep 14, 2004
    #6
  7. Walter,

    On Tue, 14 Sep 2004 12:01:29 +0200, Walter Dörwald
    <> wrote:
    > Carlos Ribeiro wrote:
    > I've fixed the docstring for both unicode.split() and
    > string.split() to give a hint about the None default. Note
    > that the docstring for str.split() already *did* mention
    > the None option.


    I don't know if you can do it, but isn't it easy to modify the split
    method to accept maxsplit as a keyword parameter? It would make it
    consistent with string.split(), and as far as I'm aware, it should not
    cause any sizeable performance penalty. But the most important reason
    is that keyword parameters for often-unused options make code more
    readable; for example,

    mystring.split(maxsplit=2)

    reads better than:

    mystring.split(None, 2)

    That's my opinion, anyway...

    --
    Carlos Ribeiro
    Consultoria em Projetos
    blog: http://rascunhosrotos.blogspot.com
    blog: http://pythonnotes.blogspot.com
    Carlos Ribeiro, Sep 14, 2004
    #7
  8. Carlos Ribeiro <> wrote:

    > Walter,
    >
    > On Tue, 14 Sep 2004 12:01:29 +0200, Walter Dörwald
    > <> wrote:
    > > Carlos Ribeiro wrote:
    > > I've fixed the docstring for both unicode.split() and
    > > string.split() to give a hint about the None default. Note
    > > that the docstring for str.split() already *did* mention
    > > the None option.

    >
    > I don't know if you can do it, but isn't it easy to modify the split
    > method to accept maxsplit as a keyword parameter? It would make it


    Feasible, not hard, not trivial. The problem is different...:

    kallisti:~/downloads/Python-2.4a3 alex$ find . -name '*.c' | xargs cat |
    grep -c 'METH_KEYWORDS'
    92
    kallisti:~/downloads/Python-2.4a3 alex$ find . -name '*.c' | xargs cat |
    grep -c 'METH_VARARGS'
    1272
    kallisti:~/downloads/Python-2.4a3 alex$ find . -name '*.c' | xargs cat |
    grep -c 'METH_'
    2429

    In other words: throughout the current C sources for Python (across all
    platforms etc) there are about 2429 specifications of how various
    functions (methods, of course, included) take their parameters. Of
    these, about half are METH_VARARGS (400 are METH_NOARGS, i.e. functions
    and methods accepting no explicit arguments, and 739 are METH_O,
    accepting just one), and less than 4% accept keyword-style arguments.
    Many of those are pretty recent additions, too, and some play special
    roles which you just couldn't fulfil otherwise (e.g. consider the
    optional key= vs cmp= arguments that 2.4 accepts for the list.sort
    method -- they are mutually exclusive...).

    Having ALL C-coded functions and methods that accept any argument accept
    keyword-style arguments in particular would surely lead to a more
    consistent language, once the impact of thousands of modifications to
    the source stabilizes again -- a slightly bigger and slower interpreter,
    no doubt, but probably only slightly. But these thousands of changes
    will require very substantial and disruptive editing -- substantial
    manpower to perform them all, AND ensure they're all well tested (I
    suspect the set of unit tests would have to more than double to do a
    halfway decent job). It would have to be among the major targets of a
    given Python release, I suspect, and raising enthusiasm for such a job
    might not be easy, even though Python would be a better language in
    consequence. Maybe it will be feasible as part of the 3.0 release,
    which is slated to be incompatible anyway... remove the METH_VARARGS
    altogether, breaking compatibility with all existing extensions, so
    EVERY C-coded function in the future, if it takes any argument at all,
    will HAVE to take them in keyword form, too.

    Until it's feasible to perform such a sweeping change, justifying
    changes to ONE specific method of an object which has dozens is going to
    be pretty hard. Perhaps, if someone volunteered a patch to make ALL
    methods of string and unicode objects specifically accept arguments in
    keyword form as well as positionally, with all the needed tests & docs,
    in time for Python 2.4's first beta in a couple of weeks, it might be
    accepted (if separate but similar patches also existed for methods of
    other built-in types, that would help all of their acceptance chances,
    IMHO). But a patch to change ONE method out of dozens, I suspect, would
    be shot down -- the slight, useful extra functionality might be judged
    to not be worth the increase in inconsistency in this area (which IMHO
    must, sadly, count as a wart in today's Python, sigh).


    Alex


    > consistent with string.split(), and as far as I'm aware, it should not
    > cause any sizeable performance penalty. But the most important reason
    > is that keyword parameters for often-unused options make code more
    > readable; for example,
    >
    > mystring.split(maxsplit=2)
    >
    > reads better than:
    >
    > mystring.split(None, 2)
    >
    > That's my opinion, anyway...
    Alex Martelli, Sep 17, 2004
    #8
  9. (Alex Martelli) writes:

    > Having ALL C-coded functions and methods that accept any argument
    > accept keyword-style arguments in particular would surely lead to a
    > more consistent language,


    [...]

    This whole area isn't particularly pretty. In general it would be
    better to expose more of an extension function's signature *outside*
    the function, for efficiency, introspection and even things like
    psyco. METH_O, METH_NOARGS are a step in this direction -- but you
    can't pass a keyword argument to a METH_O function (not that one would
    want to, very often, but it's still a potential inconsistency).
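
    The inconsistency can be observed from Python itself. The sketch below
    assumes CPython, where len is a single-argument C-coded builtin;
    accepts_keyword and py_len are hypothetical names introduced only for
    the demonstration.

```python
def accepts_keyword(func, **kwargs):
    """Return True if func can be called with the given keyword arguments."""
    try:
        func(**kwargs)
    except TypeError:
        return False
    return True


def py_len(obj):
    """Python-coded equivalent of len(); keywords work automatically."""
    return len(obj)


c_coded = accepts_keyword(len, obj='abc')
py_coded = accepts_keyword(py_len, obj='abc')

print(c_coded)   # False
print(py_coded)  # True
```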

    I wonder what Pyrex does...

    My thoughts on this area, like many others, can probably be summarized
    as "I hate C".

    Cheers,
    mwh

    --
    Enlightenment is probably antithetical to impatience.
    -- Erik Naggum, comp.lang.lisp
    Michael Hudson, Sep 17, 2004
    #9
  10. Michael Hudson <> wrote:

    > (Alex Martelli) writes:
    >
    > > Having ALL C-coded functions and methods that accept any argument
    > > accept keyword-style arguments in particular would surely lead to a
    > > more consistent language,

    >
    > [...]
    >
    > This whole area isn't particularly pretty. In general it would be


    Indeed, it isn't.

    > better to expose more of an extension function's signature *outside*
    > the function, for efficiency, introspection and even things like


    ....and consistency with the way Python-coded functions work.

    > psyco. METH_O, METH_NOARGS are a step in this direction -- but you
    > can't pass a keyword argument to a METH_O function (not that one would
    > want to, very often, but it's still a potential inconsistency).


    Right; it could be remedied by letting a macro otherwise equivalent to
    METH_O know about that one argument's name.


    > I wonder what Pyrex does...


    for:

        def example(aa, bb):
            pass

    it generates (name mangling apart, I'm demangling for legibility):

        static PyObject* example(PyObject *self, PyObject *args, PyObject *kwds)
        {
            PyObject *aa = 0;
            PyObject *bb = 0;
            static char *argnames[] = {"aa", "bb", 0};

            if (!PyArg_ParseTupleAndKeywords(args, kwds, "OO", argnames, &aa, &bb))
                return 0;

    etc, etc, and METH_VARARGS|METH_KEYWORDS in the PyMethodDef array. IOW,
    nothing strange, and all correct, it seems to me.
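
    For comparison, a rough sketch of what the same definition gives in
    plain Python, where positional and keyword calls both work and the
    argument names are visible to introspection without any extra code. The
    return value is added here only so that the calls have something to
    show.

```python
import inspect


def example(aa, bb):
    # Return the arguments so each call style can be compared.
    return (aa, bb)


positional = example(1, 2)
by_keyword = example(bb=2, aa=1)
names = list(inspect.signature(example).parameters)

print(positional)  # (1, 2)
print(by_keyword)  # (1, 2)
print(names)       # ['aa', 'bb']
```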


    Alex



    >
    > My thoughts on this area, like many others, can probably be summarized
    > as "I hate C".
    >
    > Cheers,
    > mwh
    Alex Martelli, Sep 17, 2004
    #10
  11. (Alex Martelli) writes:

    > Michael Hudson <> wrote:
    >
    > > (Alex Martelli) writes:
    > >
    > > > Having ALL C-coded functions and methods that accept any argument
    > > > accept keyword-style arguments in particular would surely lead to a
    > > > more consistent language,

    > >
    > > [...]
    > >
    > > This whole area isn't particularly pretty. In general it would be

    >
    > Indeed, it isn't.
    >
    > > better to expose more of an extension function's signature *outside*
    > > the function, for efficiency, introspection and even things like

    >
    > ...and consistency with the way Python-coded functions work.


    Heh, yes, that too :)

    > > psyco. METH_O, METH_NOARGS are a step in this direction -- but you
    > > can't pass a keyword argument to a METH_O function (not that one would
    > > want to, very often, but it's still a potential inconsistency).

    >
    > Right; it could be remedied by letting a macro otherwise equivalent to
    > METH_O know about that one argument's name.


    But... how? I guess the PyMethodDef struct could grow an ml_signature
    field... wouldn't it be nice if you could do:

        static PyObject*
        foo(PyObject* ob, int index)
        {
            ...;
        }

        PyMethodDef methods[] = {
            {"foo", foo, "O[ob]i[index]", "docstring"},
            {NULL, NULL}
        };

    ? Even nicer if you didn't have to write the signature by hand.

    Unfortunately, I don't think you can do this in standard C.
    > > I wonder what Pyrex does...

    >
    > for:
    >
    >     def example(aa, bb):
    >         pass
    >
    > it generates (name mangling apart, I'm demangling for legibility):
    >
    >     static PyObject* example(PyObject *self, PyObject *args, PyObject *kwds)
    >     {
    >         PyObject *aa = 0;
    >         PyObject *bb = 0;
    >         static char *argnames[] = {"aa", "bb", 0};
    >
    >         if (!PyArg_ParseTupleAndKeywords(args, kwds, "OO", argnames, &aa, &bb))
    >             return 0;
    >
    > etc, etc, and METH_VARARGS|METH_KEYWORDS in the PyMethodDef array. IOW,
    > nothing strange, and all correct, it seems to me.


    Cool. I should use pyrex more, I suspect.

    Cheers,
    mwh

    --
    As it seems to me, in Perl you have to be an expert to correctly make
    a nested data structure like, say, a list of hashes of instances. In
    Python, you have to be an idiot not to be able to do it, because you
    just write it down. -- Peter Norvig, comp.lang.functional
    Michael Hudson, Sep 17, 2004
    #11
  12. Michael Hudson <> wrote:
    ...
    > > Right; it could be remedied by letting a macro otherwise equivalent to
    > > METH_O know about that one argument's name.

    >
    > But... how? I guess the PyMethodDef struct could grow an ml_signature
    > field... wouldn't it be nice if you could do:


    Right, something like that. As long as we need backwards compatibility
    (==all the way to 3.0) that needs to be handled with care, of course...

    >
    > static PyObject*
    > foo(PyObject* ob, int index)
    > {
    > ...;
    > }
    >
    > PyMethodDef methods[] = {
    > {"foo", foo, "O[ob]i[index]", "docstring"},
    > {NULL, NULL}
    > }
    >
    > ? Even nicer if you didn't have to write the signature by hand.
    >
    > Unfortunately, I don't think you can do this in standard C.


    I don't think so, either -- unless you put macros in TWO places,
    perhaps:

        DEF_PYFUN(foo, (PyObject* ob, int index))
        {
            ...
        }

        PyMethodDef methods[] = {
            REF_PYFUN(foo, "docstring"),
            {0}
        };

    This, I suspect, might be possible, with DEF_PYFUN stashing the sig
    string someplace (e.g. in a __def_pyfun__foo global) and REF_PYFUN
    pulling out a reference to it...

    > > nothing strange, and all correct, it seems to me.

    >
    > Cool. I should use pyrex more, I suspect.


    Me too, I suspect -- it's really a cool way to write extensions for
    Python.


    Alex
    Alex Martelli, Sep 17, 2004
    #12
