str.split() with empty separator


Ulrich Eckhardt

Hi!

"'abc'.split('')" gives me a "ValueError: empty separator".
However, "''.join(['a', 'b', 'c'])" gives me "'abc'".

Why this asymmetry? I was under the impression that the two would be
complementary.

Uli
 

Vlastimil Brom

2009/9/15 Ulrich Eckhardt said:
Hi!

"'abc'.split('')" gives me a "ValueError: empty separator".
However, "''.join(['a', 'b', 'c'])" gives me "'abc'".

Why this asymmetry? I was under the impression that the two would be
complementary.

Uli

Maybe it isn't quite obvious what the behaviour in this case should be;
re.split also works with an empty delimiter (and returns the original string):
>>> re.split("", "abcde")
['abcde']

If you need to split the string into a list of single characters, as in
your example, list() is one possible way:
>>> list("abcde")
['a', 'b', 'c', 'd', 'e']
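A minimal sketch of the round trip discussed above, assuming a current Python 3 rather than the 2009-era versions in this thread: list() undoes ''.join() for a string, str.split('') still raises ValueError, and since Python 3.7 re.split() with an empty pattern no longer returns the original string but splits between every character, leaving an empty string at each end. The sample string is arbitrary.

```python
import re

s = "abcde"

# list() splits a string into single characters...
chars = list(s)
# ...and ''.join() puts them back together.
assert "".join(chars) == s

# str.split() still rejects an empty separator.
try:
    s.split("")
except ValueError as exc:
    print("str.split:", exc)  # str.split: empty separator

# Since Python 3.7, re.split() can split on empty matches, so an empty
# pattern splits between every character, with '' at both ends.
parts = re.split("", s)
print("re.split:", parts)  # re.split: ['', 'a', 'b', 'c', 'd', 'e', '']
```

Filtering out the empty strings from the re.split() result gives the same list as list(s).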

vbr
 

Dave Angel

Ulrich said:
Hi!

"'abc'.split('')" gives me a "ValueError: empty separator".
However, "''.join(['a', 'b', 'c'])" gives me "'abc'".

Why this asymmetry? I was under the impression that the two would be
complementary.

Uli
I think the problem is that join() is lossy; if you try
"".join(['a', 'bcd', 'e']), there's no way to reconstruct the original
list with split(). That can happen with real separators too (whenever an
element itself contains the separator), but perhaps this was the reasoning.
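A short illustration of why join() is lossy, as a plain Python 3 sketch (the example lists are arbitrary): different lists can join to the same string, and even a non-empty separator loses information once an element contains it.

```python
# Different lists can join to the same string, so nothing that sees only
# the result could tell which list it came from.
a = "".join(["a", "bcd", "e"])
b = "".join(["a", "b", "c", "d", "e"])
print(a, b, a == b)  # abcde abcde True

# The same loss happens with a real separator once an element contains it.
joined = ",".join(["a", "b,c"])
print(joined)             # a,b,c
print(joined.split(","))  # ['a', 'b', 'c'], not ['a', 'b,c']
```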

Anyway, if you want to turn a string into a list of single-character
strings, then use
list("abcde")

DaveA
 

jeffunit

I wrote a program that diffs files and prints out matching file names.
I will be executing the output with sh, to delete select files.

Most of the file names are plain ASCII, but about 10% of them have
Unicode characters in them. When I try to print the string containing
the name, I get an exception:

'ascii' codec can't encode character '\udce9' in position 37: ordinal not in range(128)

The string is:

'./Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3'

This is on a Windows XP system, using Python 3.1, which I compiled with
the Cygwin Linux compatibility layer.

Can you tell me what encoding I need to print \udce9, and how to set
Python to use that encoding?

thanks,
jeff
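The '\udce9' in that name is a lone surrogate: it is how Python 3's 'surrogateescape' error handler (PEP 383, introduced in Python 3.1) represents the raw byte 0xE9 when that byte cannot be decoded with the file-system encoding, so changing the output encoding alone will not print it; the original bytes have to be recovered instead. A minimal sketch, assuming the name came from something like os.listdir() with an ASCII file-system encoding; write_raw_line() is a hypothetical helper, and reading the byte as Latin-1 (where 0xE9 is 'é') is only a guess.

```python
import sys

# File name from the post: U+DCE9 stands in for the undecodable byte 0xE9.
name = './Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3'

# 'surrogateescape' maps U+DC80..U+DCFF back to bytes 0x80..0xFF, giving the
# exact bytes of the name on disk (ASCII is assumed as the base encoding).
raw = name.encode('ascii', 'surrogateescape')


def write_raw_line(data):
    """Hypothetical helper: write bytes to stdout unchanged (e.g. for sh)."""
    sys.stdout.flush()                  # flush any pending text output first
    sys.stdout.buffer.write(data + b'\n')
    sys.stdout.buffer.flush()


# Guess (assumption): the byte is Latin-1, where 0xE9 is 'é'.
guess = raw.decode('latin-1')
print(ascii(raw))    # ASCII-safe view of the raw bytes
print(ascii(guess))  # ASCII-safe view of the Latin-1 reading
```

Calling write_raw_line(raw) instead of print(name) emits the name's original bytes, so the shell script sees the same file name that is on disk.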
 

MRAB

Vlastimil said:
2009/9/15 Ulrich Eckhardt said:
Hi!

"'abc'.split('')" gives me a "ValueError: empty separator".
However, "''.join(['a', 'b', 'c'])" gives me "'abc'".

Why this asymmetry? I was under the impression that the two would be
complementary.

Uli

Maybe it isn't quite obvious what the behaviour in this case should be;
re.split also works with an empty delimiter (and returns the original string):
>>> re.split("", "abcde")
['abcde']

If you need to split the string into a list of single characters, as in
your example, list() is one possible way:
>>> list("abcde")
['a', 'b', 'c', 'd', 'e']

I'd prefer it to split into characters. As for re.split, there are times
when it would be nice to be able to split on a zero-width match such as
r"\b" (word boundary).
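For what it's worth, Python 3.7 later added this: re.split() now accepts patterns that can match the empty string, so splitting on r"\b" works. A minimal sketch, assuming Python 3.7 or later (the sample sentence is only illustrative):

```python
import re

text = "Words, words, words."

# Split at every word boundary; \b is a zero-width match.
parts = re.split(r"\b", text)
print(parts)  # ['', 'Words', ', ', 'words', ', ', 'words', '.']
```

Because a boundary falls at the very start of the text, the result begins with an empty string.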
 

Hendrik van Rooyen

"'abc'.split('')" gives me a "ValueError: empty separator".
However, "''.join(['a', 'b', 'c'])" gives me "'abc'".

Why this asymmetry? I was under the impression that the two would be
complementary.

I'm not sure about the asymmetry, but how would you implement a split
method with an empty delimiter to begin with? It doesn't make much sense anyway.

I fell into this trap some time ago too.
There is no such string method.

The opposite of "".join(aListOfChars) is
list(aString)
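To make the "opposite" concrete, here is a sketch of a hypothetical helper, split_or_chars() (not part of Python), that behaves like str.split() with an explicit separator but falls back to list() when the separator is empty. With that rule, sep.join() undoes the split for each separator and string tried below, including the empty separator.

```python
def split_or_chars(s, sep):
    """Hypothetical helper: like s.split(sep), but an empty separator
    splits s into single characters, as list(s) does."""
    if sep == "":
        return list(s)
    return s.split(sep)


# For a non-empty separator, sep.join(s.split(sep)) == s always holds;
# the list() fallback extends the same round trip to sep == "".
for sep in ["", ",", "--"]:
    for s in ["abc", "a,b,,c", "x--y", ""]:
        assert sep.join(split_or_chars(s, sep)) == s

print(split_or_chars("abc", ""))     # ['a', 'b', 'c']
print(split_or_chars("a,b,,c", ","))  # ['a', 'b', '', 'c']
```

The reverse direction still does not round-trip in general, which is the lossiness described earlier in the thread.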

- Hendrik
 
