newby question: Splitting a string - separator

Thomas Liesner · Dec 8, 2005

Hi all,

I have a text file which contains a single string with names.
I want to split this string into its records and put them into a list.
In "normal" cases I would do something like:

#!/usr/bin/python
inp = open("file")
data = inp.read()
names = data.split()
inp.close()

The problem is that the names contain spaces and the records are also
just separated by spaces. The only thing I can rely on is that the
record separator is always more than a single whitespace character.

I thought of something like defining the separator for split() by using
a regex for "more than one whitespace". The regex for whitespace is \s, but
what would I use for "more than one"? \s+?

TIA,
Tom

Michael Spencer · Dec 8, 2005

\s+ matches one or more; you need \s{2,} for two or more:

>>> import re
>>> re.split(r"\s{2,}", "Guido van Rossum  Tim Peters  Thomas Liesner")
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

Michael
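
To make the difference between the two quantifiers concrete, here is a small sketch. The sample string (two spaces between records) and the variable names are assumptions for illustration, not taken from Tom's actual file:

```python
import re

# Assumed sample: records separated by two spaces, names contain single spaces.
data = "Guido van Rossum  Tim Peters  Thomas Liesner"

# \s+ matches ONE or more whitespace characters, so it also splits inside names.
one_or_more = re.split(r"\s+", data)

# \s{2,} matches TWO or more, so a single space inside a name is left alone.
two_or_more = re.split(r"\s{2,}", data)

print(one_or_more)
print(two_or_more)
```

The raw-string prefix (r"...") keeps Python from interpreting the backslash itself before the regex engine sees it.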

Noah · Dec 8, 2005

For your split regex you could use
r"\s\s+"
or
r"\s{2,}"

This should work for you:
YOUR_SPLIT_LIST = re.split(r"\s{2,}", YOUR_STRING)

Yours,
Noah
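
Combined with the file-reading script from the first post, this might look like the sketch below. The helper name read_names and the temporary sample file are hypothetical stand-ins; the strip() call is an added assumption so that a trailing newline or padding does not end up in the result:

```python
import os
import re
import tempfile

def read_names(path):
    """Read a file holding one string of names and split it on runs of 2+ whitespace."""
    with open(path) as inp:
        data = inp.read()
    # strip() first, so leading/trailing whitespace does not create stray entries.
    return re.split(r"\s{2,}", data.strip())

# Hypothetical sample file standing in for the real input.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Guido van Rossum  Tim Peters   Thomas Liesner\n")

names = read_names(tmp.name)
os.remove(tmp.name)
print(names)
```

Using a with block closes the file even if reading fails, which the explicit inp.close() in the original script would not do.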

Jim · Dec 8, 2005

Hi Tom,

For more than one, I'd use

\s\s+

-Jim

James Stroud · Dec 10, 2005

The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.

James

Kent Johnson · Dec 10, 2005

James said:
The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.

Unfortunately it gives the wrong result.

Kent

Tim Peters · Dec 10, 2005

[James Stroud]

The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.

[Kent Johnson]

Unfortunately it gives the wrong result.

Still, it gets extra points for being such a pleasing example ;-)

bonono · Dec 10, 2005

Can I just use "two spaces" as the separator?

[ x.strip() for x in data.split("  ") ]

James Stroud · Dec 10, 2005

Kent said:
Unfortunately it gives the wrong result.

Kent

Just an example. Here is the "correct version":

names = [n for n in data.split("  ") if n]

James

Michael Spencer · Dec 10, 2005

bonono said:
Can I just use "two spaces" as the separator?

[ x.strip() for x in data.split("  ") ]

If you like, but it will create dummy entries if there are more than two spaces:

>>> data = "Guido van Rossum  Tim Peters    Thomas Liesner"
>>> [ x.strip() for x in data.split("  ") ]
['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']

You could add a condition to the listcomp:

>>> [name.strip() for name in data.split("  ") if name]
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

but what if there is some other whitespace character?

>>> data = "Guido van Rossum  Tim Peters  \t  Thomas Liesner"
>>> [name.strip() for name in data.split("  ") if name]
['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']

perhaps a smarter condition?

>>> [name.strip() for name in data.split("  ") if name.strip(" \t")]
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

but this is beginning to feel like hard work.

I think this is a case where it's not worth the effort to try to avoid the regexp:

>>> import re
>>> re.split(r"\s{2,}", data)
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

Michael
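
The progression above can be condensed into a side-by-side comparison. This is only a sketch: the function names are hypothetical, and the tab-only separator in the second test string is an assumed extra case that the session did not try:

```python
import re

def split_two_spaces(data):
    """Split on a literal two-space separator, dropping whitespace-only pieces."""
    return [name.strip() for name in data.split("  ") if name.strip()]

def split_regex(data):
    """Split on any run of two or more whitespace characters."""
    return re.split(r"\s{2,}", data)

data = "Guido van Rossum  Tim Peters  \t  Thomas Liesner"
print(split_two_spaces(data))
print(split_regex(data))

# A separator made only of tabs contains no two-space run at all,
# so only the regex version still finds the record boundary.
tabbed = "Guido van Rossum\t\tTim Peters"
print(split_two_spaces(tabbed))
print(split_regex(tabbed))
```

Both functions agree on the first string, but the string-method version silently returns a single record for the tab-separated one.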

Steven D'Aprano · Dec 10, 2005

James said:
The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.

Yes, but the correct result would be:

['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

Your code is short, elegant but wrong.

It could also be shorter and more elegant:

# your version
py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> [n for n in data.split() if n]
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

# my version
py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> data.split()
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

The "if n" in the list comp is superfluous, and without that, the whole
list comp is unnecessary.
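
A quick way to check the point about the superfluous filter is the sketch below. The padded sample string is an assumption chosen to provoke empty strings if any were possible:

```python
data = "  Guido van Rossum   Tim Peters  "

# With no argument, split() treats any run of whitespace as one separator
# and ignores leading/trailing whitespace, so no empty strings can appear.
no_arg = data.split()
filtered = [n for n in data.split() if n]

# With an explicit separator, empty strings do appear and a filter matters.
explicit = data.split(" ")

print(no_arg == filtered)
print("" in no_arg)
print("" in explicit)
```

So the "if n" test only earns its keep once a separator argument is passed to split().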

James Stroud · Dec 10, 2005

see my post from 1 hr before this one.

Tim Roberts · Dec 10, 2005

James Stroud said:
The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.

But it is slower than this, which produces EXACTLY the same (incorrect)
result:

data = "Guido van Rossum  Tim Peters  Thomas Liesner"
names = data.split()

Fredrik Lundh · Dec 10, 2005

James said:
py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using
regexes.

Unfortunately it gives the wrong result.

Just an example. Here is the "correct version":

names = [n for n in data.split("  ") if n]

where "correct" is "still wrong", and "theoretically faster" means "slightly
slower" (at least if you fix your version and precompile the pattern).

</F>
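
For anyone who wants to check both claims, a rough sketch follows. The sample string (with three spaces before the last name) and the iteration count are assumptions, and timings depend on machine and Python version, so none are quoted here:

```python
import re
import timeit

# Assumed sample: note the THREE spaces before "Thomas".
data = "Guido van Rossum  Tim Peters   Thomas Liesner"

# The two-space split leaves a stray leading space when a separator has odd length.
two_space = [n for n in data.split("  ") if n]

# Precompiled pattern, so compilation cost is not part of the measurement.
pattern = re.compile(r"\s{2,}")
regex = pattern.split(data)

print(two_space)
print(regex)

# Time both approaches; compare the numbers on your own setup.
t_split = timeit.timeit(lambda: [n for n in data.split("  ") if n], number=10_000)
t_regex = timeit.timeit(lambda: pattern.split(data), number=10_000)
print(f"two-space split: {t_split:.4f}s  regex: {t_regex:.4f}s")
```

Whatever the timings turn out to be, only the regex version returns clean names for this input.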
