Newbie question regarding string.split()

Discussion in 'Python' started by kevinliu23, Apr 20, 2007.

  1. kevinliu23

    kevinliu23 Guest

    Hey guys,

    So I have a question regarding the split() function in the string
    module. Let's say I have a string...

    input = "2b 3 4bx 5b 2c 4a 5a 6"
    projectOptions = (input.replace(" ", "")).split('2')
    print projectOptions

    ['', 'b34bx5b', 'c4a5a6']

    My question is, why is the first element of projectOptions an empty
    string? What can I do so that the first element is not an empty
    string, but the 'b34bx5b' string as I expected?

    Thanks so much guys. :)
    kevinliu23, Apr 20, 2007
    #1

  2. On 2007-04-20, kevinliu23 <> wrote:
    > Hey guys,
    >
    > So I have a question regarding the split() function in the string
    > module. Let's say I have an string...
    >
    > input = "2b 3 4bx 5b 2c 4a 5a 6"
    > projectOptions = (input.replace(" ", "")).split('2')
    > print projectOptions
    >
    > ['', 'b34bx5b', 'c4a5a6']
    >
    > My question is, why is the first element of projectOptions an
    > empty string?


    The presence of a delimiter indicates that there is a field
    both before and after the delimiter. If it didn't work that
    way, then you'd get the same results for

    input = "2b 3 4bx 5b 2c 4a 5a 6"

    as you would for

    input = "b 3 4bx 5b 2c 4a 5a 6"

    and you would get the same results for

    input = "2222b22222"

    as you would for

    input = "b"
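
    A minimal sketch of this point, assuming Python 3 syntax (the thread
    itself uses Python 2) and with the spaces already removed from the
    sample strings. The names samples and results are illustrative only.
    Because every delimiter keeps a field on both sides, joining the
    fields with the same delimiter gives back the original string:

```python
# Splitting on a separator and joining on it again reproduces the original
# string, which only works because empty fields are preserved.
samples = ["2b34bx5b2c4a5a6", "b34bx5b2c4a5a6", "2222b22222", "b"]

results = {}
for s in samples:
    fields = s.split("2")
    results[s] = fields
    # Round trip: no information about the delimiters is lost.
    assert "2".join(fields) == s
    print(repr(s), "->", fields)
```

    If the empty fields were dropped, the first two samples would produce
    the same list, and so would the last two.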

    > What can I do so that the first element is not an empty
    > string? but the 'b34bx5b' string as I expected?


    projectOptions = (input.replace(" ", "")).split('2')
    if projectOptions[0] == '':
        del projectOptions[0]
    print projectOptions

    --
    Grant Edwards                   grante at visi.com
    Yow! I feel like a wet parking meter on Darvon!
    Grant Edwards, Apr 20, 2007
    #2

  3. Steve Holden

    Steve Holden Guest

    kevinliu23 wrote:
    > Hey guys,
    >
    > So I have a question regarding the split() function in the string
    > module. Let's say I have an string...
    >

    First of all, the string module is pretty much deprecated nowadays. What
    you are actually using, the .split() method of a string, is the
    preferred way to do it. If you are importing string, don't bother!


    > input = "2b 3 4bx 5b 2c 4a 5a 6"
    > projectOptions = (input.replace(" ", "")).split('2')
    > print projectOptions
    >
    > ['', 'b34bx5b', 'c4a5a6']
    >
    > My question is, why is the first element of projectOptions an empty
    > string? What can I do so that the first element is not an empty
    > string? but the 'b34bx5b' string as I expected?
    >

    Because .split() returns a list of the strings surrounding each
    occurrence of the split argument. Because the string begins with the
    split argument it returns an empty string as the first element (since
    the assumption is you are interested in both sides of the separator).

    You can easily throw the first element away:

    del projectOptions[0]

    for example, or

    projectOptions = projectOptions[1:]

    But what do you want to do if the string *doesn't* begin with a 2?

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://del.icio.us/steve.holden
    Recent Ramblings http://holdenweb.blogspot.com
    Steve Holden, Apr 20, 2007
    #3
  4. kevinliu23

    Guest

    On Apr 20, 1:51 pm, kevinliu23 <> wrote:
    > Hey guys,
    >
    > So I have a question regarding the split() function in the string
    > module. Let's say I have an string...
    >
    > input = "2b 3 4bx 5b 2c 4a 5a 6"
    > projectOptions = (input.replace(" ", "")).split('2')
    > print projectOptions
    >
    > ['', 'b34bx5b', 'c4a5a6']
    >
    > My question is, why is the first element of projectOptions an empty
    > string? What can I do so that the first element is not an empty
    > string? but the 'b34bx5b' string as I expected?
    >
    > Thanks so much guys. :)


    The reason you have an empty string at the beginning is that you are
    splitting on a character that also happens to be the first character
    in your string. Nothing comes before that first separator, so the
    field in front of it is the empty string.

    Also, you shouldn't use "input" as a variable name, since that shadows
    the built-in function input().

    One hack to make it work is to add the following line right before you
    print "projectOptions":

    projectOptions.pop(0) # pop the first element out of the list



    Mike
    , Apr 20, 2007
    #4
  5. kevinliu23 wrote:
    > Hey guys,
    >
    > So I have a question regarding the split() function in the string
    > module. Let's say I have an string...
    >
    > input = "2b 3 4bx 5b 2c 4a 5a 6"
    > projectOptions = (input.replace(" ", "")).split('2')
    > print projectOptions
    >
    > ['', 'b34bx5b', 'c4a5a6']
    >
    > My question is, why is the first element of projectOptions an empty
    > string? What can I do so that the first element is not an empty
    > string? but the 'b34bx5b' string as I expected?
    >
    > Thanks so much guys. :)
    >

    split on c instead
    Stephen Lewitowski, Apr 20, 2007
    #5
  6. Tommy Grav

    Tommy Grav Guest

    On Apr 20, 2007, at 3:15 PM, wrote:
    > On Apr 20, 1:51 pm, kevinliu23 <> wrote:
    >> ['', 'b34bx5b', 'c4a5a6']
    >>
    >> My question is, why is the first element of projectOptions an empty
    >> string? What can I do so that the first element is not an empty
    >> string? but the 'b34bx5b' string as I expected?
    >>
    >> Thanks so much guys. :)

    >
    > The reason you have an empty string at the beginning is because you
    > are "splitting" on a character that happens to include the first
    > character in your string. So what you are telling Python to do is to
    > split the beginning from itself, or to insert a blank so that it is
    > split.


    So why does this not happen when you use the empty split() function?

    [tgrav@Thrym] /Users/tgrav --> python
    Python 2.4.4 (#1, Oct 18 2006, 10:34:39)
    [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> a = " 456 556 556"
    >>> a.split()
    ['456', '556', '556']
    >>> a.split(" ")
    ['', '456', '556', '556']
    >>>


    What exactly does .split() use to do the splitting?

    Cheers
    Tommy
    Tommy Grav, Apr 20, 2007
    #6
  7. Steve Holden

    Steve Holden Guest

    Tommy Grav wrote:
    > On Apr 20, 2007, at 3:15 PM, wrote:
    >> On Apr 20, 1:51 pm, kevinliu23 <> wrote:
    >>> ['', 'b34bx5b', 'c4a5a6']
    >>>
    >>> My question is, why is the first element of projectOptions an empty
    >>> string? What can I do so that the first element is not an empty
    >>> string? but the 'b34bx5b' string as I expected?
    >>>
    >>> Thanks so much guys. :)

    >> The reason you have an empty string at the beginning is because you
    >> are "splitting" on a character that happens to include the first
    >> character in your string. So what you are telling Python to do is to
    >> split the beginning from itself, or to insert a blank so that it is
    >> split.

    >
    > So why does this not happen when you use the empty split() function?
    >
    > [tgrav@Thrym] /Users/tgrav --> python
    > Python 2.4.4 (#1, Oct 18 2006, 10:34:39)
    > [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
    > Type "help", "copyright", "credits" or "license" for more information.
    > >>> a = " 456 556 556"
    > >>> a.split()

    > ['456', '556', '556']
    > >>> a.split(" ")

    > ['', '456', '556', '556']
    > >>>

    >
    > What exactly does .split() use to do the splitting?
    >

    Any sequence of one or more whitespace characters. This is a rather
    special case, quite different from .split(" ").
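
    A small sketch of the difference, assuming Python 3 syntax. The sample
    string is made up for illustration and deliberately mixes a double
    space and a tab:

```python
# Whitespace-mode split() versus splitting on a literal single space.
a = " 456  556\t556 "   # leading space, double space, tab, trailing space

no_arg = a.split()        # runs of any whitespace act as one separator
one_space = a.split(" ")  # every single space is its own separator

print(no_arg)
print(one_space)
```

    With no argument, leading and trailing whitespace produce no empty
    fields and the tab counts as a separator. With " " as the argument,
    only the space character separates fields, and each one does so on
    its own.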

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://del.icio.us/steve.holden
    Recent Ramblings http://holdenweb.blogspot.com
    Steve Holden, Apr 20, 2007
    #7
  8. On Fri, 20 Apr 2007 12:15:33 -0700, kyosohma wrote:

    > One hack to make it work is to add the following line right before you
    > print "projectOptions":
    >
    > projectOptions.pop(0) # pop the first element out of the list


    Which will introduce a nice bug into the Original Poster's code when the
    input string doesn't start with a "2".
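
    To make the bug concrete, here is a sketch assuming Python 3 syntax.
    The helper split_options and the variable names are hypothetical, not
    from the original code; the helper simply applies the conditional
    check suggested earlier in the thread:

```python
def split_options(text, marker="2"):
    """Split on marker, dropping the leading empty field only if present."""
    fields = text.replace(" ", "").split(marker)
    if fields and fields[0] == "":
        del fields[0]
    return fields

starts_with_marker = "2b 3 4bx 5b 2c 4a 5a 6"
no_leading_marker = "b 3 4bx 5b 2c 4a 5a 6"

# Unconditional pop(0) silently discards real data in the second case.
naive = no_leading_marker.replace(" ", "").split("2")
lost = naive.pop(0)
print(lost, naive)

# The conditional version gives the same fields for both inputs.
print(split_options(starts_with_marker))
print(split_options(no_leading_marker))
```

    Here the unconditional pop(0) throws away 'b34bx5b', while the
    conditional version keeps it.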



    --
    Steven.
    Steven D'Aprano, Apr 21, 2007
    #8
  9. kevinliu23 wrote:
    > Hey guys,
    >
    > So I have a question regarding the split() function in the string
    > module. Let's say I have an string...
    >
    > input = "2b 3 4bx 5b 2c 4a 5a 6"
    > projectOptions = (input.replace(" ", "")).split('2')

    The parens around the call to input.replace are useless:
    projectOptions = input.replace(" ", "").split('2')

    > print projectOptions
    >
    > ['', 'b34bx5b', 'c4a5a6']


    (snip)

    > What can I do so that the first element is not an empty
    > string? but the 'b34bx5b' string as I expected?



    projectOptions = filter(None, input.replace(" ", "").split('2'))
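
    A note on this one-liner, sketched in Python 3 syntax (an assumption;
    the line above was written for Python 2, where filter() returns a
    list). The names text, project_options, same and middle are
    illustrative only:

```python
text = "2b 3 4bx 5b 2c 4a 5a 6"

# On Python 3, filter() returns a lazy iterator, so wrap it in list().
project_options = list(filter(None, text.replace(" ", "").split("2")))
print(project_options)

# An equivalent list comprehension.
same = [field for field in text.replace(" ", "").split("2") if field]

# Caveat: empty fields in the middle are removed as well.
middle = list(filter(None, "a22b".split("2")))
print(middle)
```

    Filtering drops every empty string, not just a leading one, which may
    or may not be what is wanted when two separators are adjacent.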
    Bruno Desthuilliers, Apr 21, 2007
    #9
  10. kevinliu23

    kevinliu23 Guest

    On Apr 21, 3:30 pm, Bruno Desthuilliers
    <> wrote:
    > kevinliu23 wrote:
    > > Hey guys,
    >
    > > So I have a question regarding the split() function in the string
    > > module. Let's say I have an string...

    >
    > > input = "2b 3 4bx 5b 2c 4a 5a 6"
    > > projectOptions = (input.replace(" ", "")).split('2')

    Thanks for all your help everyone. :)

    > The parens around the call to input.replace are useless:
    > projectOptions = input.replace(" ", "").split('2')
    >
    > > print projectOptions

    >
    > > ['', 'b34bx5b', 'c4a5a6']

    >
    > (snip)
    >
    > > What can I do so that the first element is not an empty
    > > string? but the 'b34bx5b' string as I expected?

    >
    > projectOptions = filter(None, input.replace(" ", "").split('2'))
    kevinliu23, Apr 21, 2007
    #10
  11. On Sat, 21 Apr 2007 21:30:10 +0200, Bruno Desthuilliers
    <> declaimed the following in
    comp.lang.python:

    > kevinliu23 wrote:

    <snip>
    > > What can I do so that the first element is not an empty
    > > string? but the 'b34bx5b' string as I expected?

    >
    >
    > projectOptions = filter(None, input.replace(" ", "").split('2'))
    >


    >>> inp = "2b 3 4bx 5b 2c 4a 5a 6"
    >>> marker = "2"
    >>> po = inp.replace(" ", "").strip(marker).split(marker)
    >>> po
    ['b34bx5b', 'c4a5a6']
    >>>


    .split() [no arguments] splits on (blocks of) whitespace, and does an
    implicit .strip() [no arguments] to remove leading and trailing
    whitespace before splitting.

    .split(achar) splits on /each/ occurrence of "achar"; adjacent copies
    are not treated as one split point.

    The behavior can be seen if one uses find/replace in a text editor.
    Start with (including the quotes)

    "2b 3 4bx 5b 2c 4a 5a 6"

    find <space> replace <none>

    "2b34bx5b2c4a5a6"

    find <2> replace <",">

    "","b34bx5b","c4a5a6"

    Look familiar? wrap some [ ] around it...
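
    The same find/replace exercise can be sketched in code, assuming
    Python 3 syntax. The names text, squeezed, quoted and fields are
    illustrative only:

```python
text = "2b 3 4bx 5b 2c 4a 5a 6"
squeezed = text.replace(" ", "")   # find <space> replace <none>

# find <2> replace <","> and keep the surrounding quotes.
quoted = '"' + squeezed.replace("2", '","') + '"'
print(quoted)

# The same fields, as produced by split().
fields = squeezed.split("2")
print(fields)
```

    The quoted string lists exactly the fields that split('2') returns,
    including the empty one at the front.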

    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
    Dennis Lee Bieber, Apr 21, 2007
    #11
  12. On Apr 20, 11:51 am, kevinliu23 <> wrote:
    > Hey guys,
    >
    > So I have a question regarding the split() function in the string
    > module. Let's say I have an string...
    >
    > input = "2b 3 4bx 5b 2c 4a 5a 6"
    > projectOptions = (input.replace(" ", "")).split('2')
    > print projectOptions
    >
    > ['', 'b34bx5b', 'c4a5a6']
    >


    The confusion, as you can see from the other posts, arises because the
    behavior is different from the default split().
    The default split works on whitespace, and we don't get leading or
    trailing empty list items.

    So just add input = input.strip('2') after the input assignment (BTW,
    as someone pointed out, input is the name of a built-in function, so
    it is better not to reuse it). Note that this solution works for
    splitting on any single character: just strip it first. (For a longer
    separator, be careful: strip() treats its argument as a set of
    characters, not as a substring.) Note that we still get empty elements
    for adjacent separators in the middle of the string, which is probably
    what we want in most cases.
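
    A short sketch of the strip-then-split approach, assuming Python 3
    syntax. The sample strings and variable names are made up for
    illustration:

```python
# strip() removes leading/trailing markers; interior empty fields survive.
single = "2a22b2".strip("2").split("2")
print(single)

# Caveat: strip() treats its argument as a set of characters, not as a
# substring, so it can remove more than a multi-character separator.
sep = "ab"
text = "abxabyba"
stripped = text.strip(sep)      # also eats the trailing "ba"
multi = stripped.split(sep)
print(stripped, multi)

# For comparison, splitting without stripping keeps the trailing "ba".
plain = text.split(sep)
print(plain)
```

    For a one-character marker the two steps compose cleanly. For a longer
    separator the strip() call can remove characters that were never part
    of a separator, as the trailing "ba" shows.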

    Karthik

    > My question is, why is the first element of projectOptions an empty
    > string? What can I do so that the first element is not an empty
    > string? but the 'b34bx5b' string as I expected?
    >
    > Thanks so much guys. :)
    Karthik Gurusamy, Apr 22, 2007
    #12