str.split() with empty separator

Discussion in 'Python' started by Ulrich Eckhardt, Sep 15, 2009.

  1. Hi!

    "'abc'.split('')" gives me a "ValueError: empty separator".
    However, "''.join(['a', 'b', 'c'])" gives me "'abc'".

    Why this asymmetry? I was under the impression that the two would be
    complementary.

    Uli

    --
    Sator Laser GmbH
    Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
     
    Ulrich Eckhardt, Sep 15, 2009
    #1

  2. 2009/9/15 Ulrich Eckhardt <>:
    > Hi!
    >
    > "'abc'.split('')" gives me a "ValueError: empty separator".
    > However, "''.join(['a', 'b', 'c'])" gives me "'abc'".
    >
    > Why this asymmetry? I was under the impression that the two would be
    > complementary.
    >
    > Uli
    >


    Maybe it isn't quite obvious what the behaviour in this case should be;
    re.split also works with an empty delimiter (and returns the original string):
    >>> re.split("", "abcde")
    ['abcde']

    If you need to split the string into a list of single characters,
    as in your example, list() is one way to do it:
    >>> list("abcde")
    ['a', 'b', 'c', 'd', 'e']


    vbr
     
    Vlastimil Brom, Sep 15, 2009
    #2

  3. Dave Angel

    Ulrich Eckhardt wrote:
    > Hi!
    >
    > "'abc'.split('')" gives me a "ValueError: empty separator".
    > However, "''.join(['a', 'b', 'c'])" gives me "'abc'".
    >
    > Why this asymmetry? I was under the impression that the two would be
    > complementary.
    >
    > Uli
    >
    >

    I think the problem is that join() is lossy; if you try "".join(['a',
    'bcd', 'e']) then there's no way to reconstruct the original list with
    split(). Now that can be true even with actual separators, but perhaps
    this was the reasoning.

    Anyway, if you want to turn a string into a list of single-character
    strings, then use
    list("abcde")

    DaveA
     
    Dave Angel, Sep 15, 2009
    #3
  4. jeffunit

    Python 3.1 Unicode question

    I wrote a program that diffs files and prints out matching file names.
    I will be executing the output with sh, to delete select files.

    Most of the file names are plain ASCII, but about 10% of them contain
    Unicode characters. When I try to print the string containing the name,
    I get an exception:

    'ascii' codec can't encode character '\udce9' in position 37: ordinal not in range(128)

    The string is:

    './Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3'

    This is on a Windows XP system, using Python 3.1, which I compiled
    with the Cygwin Linux compatibility layer.

    Can you tell me what encoding I need to print \udce9 and how to set
    Python to use that encoding?

    thanks,
    jeff
     
    jeffunit, Sep 15, 2009
    #4
  5. MRAB

    Vlastimil Brom wrote:
    > 2009/9/15 Ulrich Eckhardt <>:
    >> Hi!
    >>
    >> "'abc'.split('')" gives me a "ValueError: empty separator".
    >> However, "''.join(['a', 'b', 'c'])" gives me "'abc'".
    >>
    >> Why this asymmetry? I was under the impression that the two would be
    >> complementary.
    >>
    >> Uli
    >>
    >
    > maybe it isn't quite obvious, what the behaviour in this case should be;
    > re.split also works with empty delimiter (and returns the original string)
    >>>> re.split("", "abcde")
    > ['abcde']
    >
    > If you need to split the string into the list of single characters
    > like in your example, list() is the possible way:
    >>>> list("abcde")
    > ['a', 'b', 'c', 'd', 'e']
    >

    I'd prefer it to split into characters. As for re.split, there are times
    when it would be nice to be able to split on a zero-width match such as
    r"\b" (word boundary).
     
    MRAB, Sep 15, 2009
    #5
  6. On Tuesday 15 September 2009 14:50:11 Xavier Ho wrote:
    > On Tue, Sep 15, 2009 at 10:31 PM, Ulrich Eckhardt <> wrote:
    > > "'abc'.split('')" gives me a "ValueError: empty separator".
    > > However, "''.join(['a', 'b', 'c'])" gives me "'abc'".
    > >
    > > Why this asymmetry? I was under the impression that the two would be
    > > complementary.

    >
    > I'm not sure about asymmetry, but how would you implement a split method
    > with an empty delimiter to begin with? It doesn't make much sense anyway.


    I fell into this trap some time ago too.
    There is no such string method.

    The opposite of "".join(aListOfChars) is
    list(aString)

    - Hendrik
     
    Hendrik van Rooyen, Sep 16, 2009
    #6
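Ulrich's observation still holds in Python 3. `'abc'.split('')` raises `ValueError` with the message `empty separator`. As Vlastimil, Dave and Hendrik say, `list()` is the usual inverse of `''.join()` when every element is a single character. Below is a minimal sketch of both behaviours. `explode` and `try_empty_split` are hypothetical helper names, not standard functions.

```python
def explode(s):
    """Split a string into single-character strings (hypothetical helper)."""
    return list(s)


def try_empty_split(s):
    """Return the error message str.split raises for an empty separator."""
    try:
        s.split("")
    except ValueError as exc:
        return str(exc)
    return "no error"


chars = explode("abc")
print(chars)                    # ['a', 'b', 'c']
print("".join(chars) == "abc")  # True: join undoes list() here
print(try_empty_split("abc"))   # empty separator
```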

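Dave's remark that `join()` is lossy can be made concrete. Different lists can join to the same string, so a hypothetical `split('')` would have no way to know which list to return. With a real separator, `split()` inverts `join()` only as long as no element contains the separator. Here is a short sketch:

```python
# Two different lists that produce the same joined string
a = ["a", "bcd", "e"]
b = ["ab", "c", "de"]
print("".join(a) == "".join(b))  # True: both give 'abcde'

# With a non-empty separator, split() inverts join() only if
# no element contains the separator itself.
ok = ["x", "y", "z"]
print(",".join(ok).split(",") == ok)    # True

bad = ["x,y", "z"]
print(",".join(bad).split(","))         # ['x', 'y', 'z'], not the original list
```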
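
The `re.split` output quoted in the thread reflects the Python versions of that time. Python 3.5 started rejecting patterns that can only match an empty string. Python 3.7 then added support for splitting on empty matches. Since then, an empty pattern yields the single characters plus an empty string at each end, and MRAB's word-boundary split on `r"\b"` works. The sketch below assumes Python 3.7 or later.

```python
import re

# Python 3.7+: an empty pattern matches at every position, including both ends
parts = re.split("", "abcde")
print(parts)  # ['', 'a', 'b', 'c', 'd', 'e', '']

# Dropping the empty edge strings leaves the single characters
chars = [p for p in parts if p]
print(chars)  # ['a', 'b', 'c', 'd', 'e']

# Zero-width split on word boundaries, as MRAB wished for
print(re.split(r"\b", "Words, words."))  # ['', 'Words', ', ', 'words', '.']
```

For plain character splitting, `list()` is still the simpler choice. The regex form is mainly useful for zero-width patterns like `\b`.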

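Jeff's `'\udce9'` is a lone surrogate. Python 3 produces one through the `surrogateescape` error handler (PEP 383, new in 3.1) when a file name contains a byte the file-system encoding cannot decode. Here that byte is `0xE9`, which is `é` in Latin-1. No output encoding can print the surrogate as-is. One option is sketched below, assuming the original bytes are what the generated `sh` script needs: turn the name back into bytes with `surrogateescape` and write them to the binary stdout buffer. The Latin-1 decode at the end is only an assumption about how these particular names were encoded.

```python
import sys

# The file name as Python 3 returned it: the byte 0xE9 could not be decoded,
# so the surrogateescape handler (PEP 383) stored it as the lone surrogate U+DCE9.
name = "./Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3"

# Reverse the escape to get the original bytes back.
raw = name.encode("utf-8", "surrogateescape")
print(raw[35:40])  # b'Qu\xe9_n'

# Option 1: emit the raw bytes unchanged, e.g. into a generated sh script.
sys.stdout.flush()  # keep text printed so far ahead of the raw bytes
out = getattr(sys.stdout, "buffer", None)  # missing if stdout is a text-only object
if out is not None:
    out.write(raw + b"\n")
    out.flush()

# Option 2 (assumption: these names were Latin-1 encoded): decode for display.
readable = raw.decode("latin-1")
print(ascii(readable[35:40]))  # 'Qu\xe9_n'
```

Writing the bytes back unchanged keeps the shell commands pointing at the exact files on disk. The Latin-1 decode only affects how the names look, and it is a guess.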
Similar Threads
  1. David
    Replies: 2, Views: 503
    Thomas G. Marshall, Aug 3, 2003
  2. sizeof(str) or sizeof(str) - 1 ?
    Trevor, Apr 3, 2004, in forum: C Programming
    Replies: 9, Views: 660
    CBFalconer, Apr 10, 2004
  3. It is fun.the result of str.lower(str())
    Sullivan WxPyQtKinter, Mar 7, 2006, in forum: Python
    Replies: 5, Views: 354
    Tim Roberts, Mar 9, 2006
  4. str.equals(null) or str==null ?
    Stefan Ram, Jul 31, 2006, in forum: Java
    Replies: 21, Views: 14,834
    Oliver Wong, Aug 3, 2006
  5. maestro
    Replies: 1, Views: 324
    Chris, Aug 11, 2008