Newbie question regarding string.split()

kevinliu23

Hey guys,

So I have a question regarding the split() function in the string
module. Let's say I have a string...

input = "2b 3 4bx 5b 2c 4a 5a 6"
projectOptions = (input.replace(" ", "")).split('2')
print projectOptions

['', 'b34bx5b', 'c4a5a6']

My question is, why is the first element of projectOptions an empty
string? What can I do so that the first element is not an empty
string, but the 'b34bx5b' string as I expected?

Thanks so much guys. :)
 
Grant Edwards

> My question is, why is the first element of projectOptions an
> empty string?

The presence of a delimiter indicates that there is a field
both before and after the delimiter. If it didn't work that
way, then you'd get the same results for

input = "2b 3 4bx 5b 2c 4a 5a 6"

as you would for

input = "b 3 4bx 5b 2c 4a 5a 6"

and you would get the same results for

input = "2222b22222"

as you would for

input = "b"

> What can I do so that the first element is not an empty
> string, but the 'b34bx5b' string as I expected?

projectOptions = (input.replace(" ", "")).split('2')
if projectOptions[0] == '':
    del projectOptions[0]
print projectOptions
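
As a rough illustration of the delimiter argument above (a sketch added for clarity, not code from the thread): a string containing n delimiters always splits into n + 1 fields, and joining the fields with the delimiter gives back the original string. The variable names are made up for the example.

```python
# Illustrative sketch; variable names are assumptions for this example.
text = "2b34bx5b2c4a5a6"
fields = text.split('2')

# Two delimiters give three fields; the first one is empty because
# nothing precedes the leading '2'.
assert fields == ['', 'b34bx5b', 'c4a5a6']
assert len(fields) == text.count('2') + 1

# Joining with the delimiter restores the original string. That would
# not be possible if the leading empty field were silently discarded.
assert '2'.join(fields) == text

# The other pair of inputs stays distinguishable as well:
# nine delimiters around a single 'b' give ten fields.
many = "2222b22222".split('2')
single = "b".split('2')
```

Here many has four empty strings before 'b' and five after it, while single is just ['b'].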
 
Steve Holden

kevinliu23 said:
> So I have a question regarding the split() function in the string
> module. Let's say I have a string...

First of all, the functions in the string module are pretty much
deprecated nowadays. What you are actually using, the .split() method
of a string, is the preferred way to do it. If you are importing
string, don't bother!

> My question is, why is the first element of projectOptions an empty
> string? What can I do so that the first element is not an empty
> string, but the 'b34bx5b' string as I expected?

Because .split() returns a list of the strings surrounding each
occurrence of the split argument. Because the string begins with the
split argument, it returns an empty string as the first element (since
the assumption is you are interested in both sides of the separator).

You can easily throw the first element away:

del projectOptions[0]

for example, or

projectOptions = projectOptions[1:]

But what do you want to do if the string *doesn't* begin with a 2?

regards
Steve
 
kyosohma

> My question is, why is the first element of projectOptions an empty
> string? What can I do so that the first element is not an empty
> string, but the 'b34bx5b' string as I expected?

The reason you have an empty string at the beginning is that you
are "splitting" on a character that happens to be the first
character in your string. So what you are telling Python to do is to
split the beginning from itself, or to insert a blank so that it is
split.

Also, you shouldn't use "input" as a variable name since it is a
built-in function.

One hack to make it work is to add the following line right before you
print "projectOptions":

projectOptions.pop(0) # pop the first element out of the list



Mike
 
Stephen Lewitowski

kevinliu23 said:
> What can I do so that the first element is not an empty
> string, but the 'b34bx5b' string as I expected?

split on c instead
 
Tommy Grav

kyosohma said:
> The reason you have an empty string at the beginning is that you
> are "splitting" on a character that happens to be the first
> character in your string. So what you are telling Python to do is to
> split the beginning from itself, or to insert a blank so that it is
> split.

So why does this not happen when you use the empty split() function?

[tgrav@Thrym] /Users/tgrav --> python
Python 2.4.4 (#1, Oct 18 2006, 10:34:39)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = " 456 556 556"
>>> a.split()
['456', '556', '556']
>>> a.split(" ")
['', '456', '556', '556']
>>>

What exactly does .split() use to do the splitting?

Cheers
Tommy
 
Steve Holden

Tommy said:
> So why does this not happen when you use the empty split() function?
>
> >>> a = " 456 556 556"
> >>> a.split()
> ['456', '556', '556']
> >>> a.split(" ")
> ['', '456', '556', '556']
>
> What exactly does .split() use to do the splitting?

Any sequence of one or more whitespace characters. This is a rather
special case, quite different from .split(" ").
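
To make the difference concrete, here is a small sketch (added for illustration, with a made-up sample string) that mixes a doubled space, a tab, and leading and trailing spaces:

```python
# Sample string chosen for this example: leading space, doubled space,
# a tab, and a trailing space.
text = " 456  556\t556 "

# No argument: any run of whitespace acts as one separator, and
# leading/trailing whitespace produces no empty strings.
by_whitespace = text.split()

# Explicit " ": every single space is a separator, so the leading,
# doubled and trailing spaces all produce empty strings, and the tab
# is not treated as a separator at all.
by_space = text.split(" ")
```

With this input, by_whitespace is ['456', '556', '556'], while by_space is ['', '456', '', '556\t556', ''].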

regards
Steve
 
Steven D'Aprano

kyosohma said:
> One hack to make it work is to add the following line right before you
> print "projectOptions":
>
> projectOptions.pop(0) # pop the first element out of the list

Which will introduce a nice bug into the Original Poster's code when the
input string doesn't start with a "2".
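
A short sketch of that bug, and of one way to guard against it. This is an illustration only; the helper name split_options and its marker parameter are hypothetical, not something from the thread.

```python
def split_options(text, marker='2'):
    """Split on marker, dropping the first field only if it is empty.

    Hypothetical helper written for this example.
    """
    fields = text.replace(" ", "").split(marker)
    if fields and fields[0] == '':
        del fields[0]
    return fields

# Input that does NOT start with the marker.
text = "b 3 4bx 5b 2c 4a 5a 6"

# Unconditional pop(0) throws away real data here.
naive = text.replace(" ", "").split('2')
lost = naive.pop(0)

# The guarded version keeps the first field because it is not empty.
guarded = split_options(text)
```

In this run lost is 'b34bx5b', which is exactly the data the unconditional pop discards, while guarded still contains it.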
 
Bruno Desthuilliers

kevinliu23 wrote:
> input = "2b 3 4bx 5b 2c 4a 5a 6"
> projectOptions = (input.replace(" ", "")).split('2')

The parens around the call to input.replace are useless:

projectOptions = input.replace(" ", "").split('2')

> print projectOptions
>
> ['', 'b34bx5b', 'c4a5a6']

(snip)

> What can I do so that the first element is not an empty
> string, but the 'b34bx5b' string as I expected?

projectOptions = filter(None, input.replace(" ", "").split('2'))
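
One thing worth knowing about the filter(None, ...) approach, shown here as a small sketch with a made-up input: it drops every empty string, not just a leading one. Also, on Python 3 filter returns an iterator rather than a list (on the Python 2 of this thread it returns a list), so the sketch wraps it in list().

```python
# Made-up input with a leading marker, a doubled marker in the middle,
# and a trailing marker.
text = "2b 3 22 4bx 2c 2"
fields = text.replace(" ", "").split('2')

# filter(None, ...) keeps only truthy items, i.e. non-empty strings.
# list() is needed on Python 3, where filter returns an iterator.
filtered = list(filter(None, fields))

# An equivalent list comprehension, which some find easier to read.
comprehension = [field for field in fields if field]
```

Here fields is ['', 'b3', '', '4bx', 'c', ''] and both filtered and comprehension are ['b3', '4bx', 'c'], so the empty field from the doubled marker in the middle is gone as well.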
 
kevinliu23

Thanks for all your help everyone. :)
 
Dennis Lee Bieber

Bruno Desthuilliers said:
> kevinliu23 wrote:
>> What can I do so that the first element is not an empty
>> string, but the 'b34bx5b' string as I expected?
>
> projectOptions = filter(None, input.replace(" ", "").split('2'))

>>> inp = "2b 3 4bx 5b 2c 4a 5a 6"
>>> marker = "2"
>>> po = inp.replace(" ", "").strip(marker).split(marker)
>>> po
['b34bx5b', 'c4a5a6']

.split() [no arguments] splits on (blocks of) white-space, and does an
implicit .strip() [no arguments] to remove leading and trailing white
space before splitting.

.split(achar) splits on /each/ occurrence of "achar"; it does not treat
adjacent copies as one split point.

The behavior can be seen if one uses find/replace in a text editor.
Start with (including the quotes)

"2b 3 4bx 5b 2c 4a 5a 6"

find <space> replace <none>

"2b34bx5b2c4a5a6"

find <2> replace <",">

"","b34bx5b","c4a5a6"

Look familiar? wrap some [ ] around it...
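
The editor exercise above can be reproduced in a few lines. This is only a sketch for illustration; the names compact, as_text and adjacent are made up here, while inp and marker follow the session earlier in this post.

```python
inp = "2b 3 4bx 5b 2c 4a 5a 6"
marker = "2"

# Step 1: find <space>, replace with nothing.
compact = inp.replace(" ", "")

# Step 2: find the marker, replace with '","', keeping the outer quotes.
as_text = '"' + compact.replace(marker, '","') + '"'

# split() yields the same three fields that appear between the commas.
fields = compact.split(marker)

# Stripping the marker from both ends first removes the empty edge field.
stripped_fields = compact.strip(marker).split(marker)

# Adjacent markers are NOT merged: an empty field appears between them.
adjacent = "b22c".split(marker)
```

as_text comes out as "","b34bx5b","c4a5a6" (quotes included), matching the hand-edited line, and adjacent is ['b', '', 'c'].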

--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
 
Karthik Gurusamy

> input = "2b 3 4bx 5b 2c 4a 5a 6"
> projectOptions = (input.replace(" ", "")).split('2')
> print projectOptions
>
> ['', 'b34bx5b', 'c4a5a6']

The confusion, as you can see from other posts, is because the
behavior is different from that of the default split().
The default split works on whitespace, and we don't get leading/trailing
empty list items.

So just add input = input.strip('2') after the input assignment (BTW,
someone pointed out that input is the name of a built-in function, so
it is best avoided as a variable name). Note this solution will work
for splitting on any single character: just strip it first. For a
longer separator, keep in mind that strip() treats its argument as a
set of characters rather than as a substring. Note that we still get
empty elements in the middle of the string, which is probably what we
want in most cases.
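
A sketch of the multi-character caveat, for illustration only. The helper split_trimmed is hypothetical (it is not a standard library function); it removes only whole copies of the separator from each end before splitting.

```python
def split_trimmed(text, sep):
    """Split text on sep, ignoring whole copies of sep at either end.

    Hypothetical helper written for this example.
    """
    if not sep:
        raise ValueError("empty separator")
    while text.startswith(sep):
        text = text[len(sep):]
    while text.endswith(sep):
        text = text[:-len(sep)]
    return text.split(sep)

text = "--a--b-"
sep = "--"

# strip("--") strips every leading and trailing '-' character, so the
# single trailing '-' that belongs to the last field is lost.
with_strip = text.strip(sep).split(sep)

# Removing only whole separators keeps that trailing '-'.
with_helper = split_trimmed(text, sep)

# For a single-character separator the two approaches agree.
single = split_trimmed("2b34bx5b2c4a5a6", "2")
```

Here with_strip is ['a', 'b'] but with_helper is ['a', 'b-'], while single matches the strip('2') result, ['b34bx5b', 'c4a5a6'].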

Karthik
 
