Fixed length lists from .split()?

Discussion in 'Python' started by Bob Greschke, Jan 26, 2007.

  1. Bob Greschke

    Bob Greschke Guest

    I'm reading a file that has lines like

    bcsn; 1000000; 1223
    bcsn; 1000001; 1456
    bcsn; 1000003
    bcsn; 1000010; 4567

    The problem is the line with only the one semi-colon.
    Is there a fancy way to get Parts=Line.split(";") to make Parts always
    have three items in it, or do I just have to check the length of Parts
    and loop to add the required missing items (this one would just take
    Parts+=[""], but there are other types of lines in the file that have
    about 10 "fields" that also have this problem)?

    Thanks!

    Bob
    Bob Greschke, Jan 26, 2007
    #1

  2. Duncan Booth

    Duncan Booth Guest

    Bob Greschke wrote:

    > Is there a fancy way to get Parts=Line.split(";") to make Parts always
    > have three items in it, or do I just have to check the length of Parts
    > and loop to add the required missing items (this one would just take
    > Parts+=[""], but there are other types of lines in the file that have
    > about 10 "fields" that also have this problem)?


    >>> def nsplit(s, sep, n):
    ...     return (s.split(sep) + [""]*n)[:n]
    ...
    >>> nsplit("bcsn; 1000001; 1456", ";", 3)
    ['bcsn', ' 1000001', ' 1456']
    >>> nsplit("bcsn; 1000001", ";", 3)
    ['bcsn', ' 1000001', '']
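
A minimal sketch of the same pad-and-slice idea as a plain function, for the case where the fields should also lose their surrounding spaces. The `fill` parameter and the `strip()` step are additions for illustration only and are not part of the function posted above.

```python
def nsplit(s, sep, n, fill=""):
    """Split s on sep and return exactly n fields."""
    # [fill] * n builds a list of n fill values; appending it guarantees
    # at least n items, and the slice trims the result to exactly n.
    return (s.split(sep) + [fill] * n)[:n]


lines = ["bcsn; 1000001; 1456", "bcsn; 1000003"]
# Strip the whitespace that follows each ";" in the sample file.
rows = [[field.strip() for field in nsplit(line, ";", 3)] for line in lines]
print(rows)
```

Padding with a full `n` extra items is more than strictly needed, but it avoids computing how many fields are missing.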
    Duncan Booth, Jan 26, 2007
    #2

  3. Bob Greschke

    Bob Greschke Guest

    On 2007-01-26 11:13:56 -0700, Duncan Booth said:

    > Bob Greschke wrote:
    >
    >> Is there a fancy way to get Parts=Line.split(";") to make Parts always
    >> have three items in it, or do I just have to check the length of Parts
    >> and loop to add the required missing items (this one would just take
    >> Parts+=[""], but there are other types of lines in the file that have
    >> about 10 "fields" that also have this problem)?
    >
    > >>> def nsplit(s, sep, n):
    > ...     return (s.split(sep) + [""]*n)[:n]
    > ...
    > >>> nsplit("bcsn; 1000001; 1456", ";", 3)
    > ['bcsn', ' 1000001', ' 1456']
    > >>> nsplit("bcsn; 1000001", ";", 3)
    > ['bcsn', ' 1000001', '']


    That's fancy enough. :) I didn't know you could do [""]*n. I never
    thought about it before.

    Thanks!

    Bob
    Bob Greschke, Jan 26, 2007
    #3
  4. On Fri, 26 Jan 2007 11:26:46 -0700, Bob Greschke
    declaimed the following in comp.lang.python:

    >
    > That's fancy enough. :) I didn't know you could do [""]*n. I never
    > thought about it before.
    >

    My first thought was getting it from the other side...

    >>> def nsplit(st, sp, n):
    ...     return (st + (sp*n)).split(sp)[:n]
    ...
    >>> nsplit("this;is;a;sample", ";", 10)
    ['this', 'is', 'a', 'sample', '', '', '', '', '', '']

    To the string to be split, append enough separators to ensure the
    desired number of fields, perform the split, and return the desired
    number of resultant parts.

    Of course, if the string contains more than "n" fields, it will only
    return the leftmost "n" parts.

    >>> nsplit("this;is;a;sample", ";", 4)
    ['this', 'is', 'a', 'sample']
    >>> nsplit("this;is;a;sample", ";", 3)
    ['this', 'is', 'a']
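
If silently dropping the extra fields is not wanted, one possible variation (an assumption, not something posted in this thread) is to pass a maxsplit argument to `split()` so that the remainder stays in the last field. The name `nsplit_keep_tail` is made up for this sketch.

```python
def nsplit_keep_tail(s, sep, n):
    """Return exactly n fields; any surplus text stays in the last one."""
    # Split at most n-1 times, so everything after the (n-1)th separator
    # is left unsplit in the final field instead of being discarded.
    parts = s.split(sep, n - 1)
    # Pad short results with empty strings, as in the versions above.
    return (parts + [""] * n)[:n]


print(nsplit_keep_tail("this;is;a;sample", ";", 3))
print(nsplit_keep_tail("this;is", ";", 3))
```

The first call keeps "a;sample" together as the third field; the second call is padded with one empty string.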

    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
    Dennis Lee Bieber, Jan 27, 2007
    #4
  5. bearophile

    bearophile Guest

    Duncan Booth:
    > def nsplit(s, sep, n):
    >     return (s.split(sep) + [""]*n)[:n]


    Another version, longer:

    from itertools import repeat

    def nsplit(text, sep, n):
        """
        >>> nsplit("bcsn; 1000001; 1456", ";", 3)
        ['bcsn', ' 1000001', ' 1456']
        >>> nsplit("bcsn; 1000001", ";", 3)
        ['bcsn', ' 1000001', '']
        >>> nsplit("bcsn", ";", 3)
        ['bcsn', '', '']
        >>> nsplit("", ".", 4)
        ['', '', '', '']
        >>> nsplit("ab.ac.ad.ae", ".", 2)
        ['ab', 'ac', 'ad', 'ae']
        """
        result = text.split(sep)
        nparts = len(result)
        # repeat() yields nothing for a count <= 0, so longer lists are left as-is
        result.extend(repeat("", n-nparts))
        return result

    if __name__ == "__main__":
        import doctest
        doctest.testmod()

    Bye,
    bearophile
    bearophile, Jan 27, 2007
    #5
  6. On Jan 26, 11:07 am, Bob Greschke wrote:
    > I'm reading a file that has lines like
    >
    > bcsn; 1000000; 1223
    > bcsn; 1000001; 1456
    > bcsn; 1000003
    > bcsn; 1000010; 4567
    >
    > The problem is the line with only the one semi-colon.
    > Is there a fancy way to get Parts=Line.split(";") to make Parts always
    > have three items in it


    In Python 2.5 and later you can use the .partition() method, which
    always returns a three-item tuple:

    >>> text = '''\
    ... bcsn; 1000000; 1223
    ... bcsn; 1000001; 1456
    ... bcsn; 1000003
    ... bcsn; 1000010; 4567
    ... '''
    >>> for line in text.splitlines():
    ...     bcsn, _, rest = line.partition(';')
    ...     num1, _, num2 = rest.partition(';')
    ...     print (bcsn, num1, num2)
    ...
    ('bcsn', ' 1000000', ' 1223')
    ('bcsn', ' 1000001', ' 1456')
    ('bcsn', ' 1000003', '')
    ('bcsn', ' 1000010', ' 4567')
    >>> help(str.partition)
    Help on method_descriptor:

    partition(...)
        S.partition(sep) -> (head, sep, tail)

        Searches for the separator sep in S, and returns the part before it,
        the separator itself, and the part after it.  If the separator is not
        found, returns S and two empty strings.
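
The same two-step partition can be wrapped in a small helper. This is only a sketch: the name `split3` and the whitespace stripping are assumptions added for illustration, and the approach suits exactly three fields rather than the ten-field lines mentioned in the original question.

```python
def split3(line, sep=";"):
    """Return exactly three stripped fields from line."""
    # partition() always returns (head, sep, tail); when sep is missing,
    # tail is the empty string, so short lines need no special casing.
    first, _, rest = line.partition(sep)
    second, _, third = rest.partition(sep)
    return first.strip(), second.strip(), third.strip()


for line in ["bcsn; 1000001; 1456", "bcsn; 1000003"]:
    print(split3(line))
```

Any separators beyond the second one stay inside the third field, since only two partitions are performed.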


    STeVe
    Steven Bethard, Jan 30, 2007
    #6
  7. Bob Greschke

    Bob Greschke Guest

    This idiom is what I ended up using (a lot it turns out!):

    Parts = Line.split(";")
    Parts += (x-len(Parts))*[""]

    where x is the number of fields the line should have. If the line
    already has more parts than x (i.e. [""] gets multiplied by a negative
    number), the result is an empty list and nothing is added, which is
    just fine in this program's case.
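
A short check of that behaviour: multiplying a list by zero or by a negative number gives an empty list, so the `+=` pads short lines and leaves long ones alone. The helper name `pad_fields` is hypothetical and used only for this sketch.

```python
def pad_fields(line, x, sep=";"):
    """Split line and pad with empty strings up to x fields (never truncates)."""
    parts = line.split(sep)
    # A non-positive multiplier yields [], so nothing is appended
    # when the line already has x or more fields.
    parts += (x - len(parts)) * [""]
    return parts


print(pad_fields("bcsn; 1000003", 3))
print(pad_fields("a;b;c;d", 3))
```

Unlike the slicing versions earlier in the thread, this keeps every field of an over-long line.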

    Bob
    Bob Greschke, Feb 1, 2007
    #7
  8. On Feb 1, 2:40 pm, Bob Greschke wrote:

    > This idiom is what I ended up using (a lot it turns out!):
    >
    > Parts = Line.split(";")
    > Parts += (x-len(Parts))*[""]
    >
    > where x knows how long the line should be. If the line already has
    > more parts than x (i.e. [""] gets multiplied by a negative number)
    > nothing seems to happen which is just fine in this program's case.
    >
    > Bob


    Here's a more generic padding one liner:

    from itertools import chain, repeat

    def ipad(seq, minlen, fill=None):
        return chain(seq, repeat(fill, minlen-len(seq)))

    >>> list(ipad('one;two;three;four'.split(";"), 7, ''))
    ['one', 'two', 'three', 'four', '', '', '']
    >>> tuple(ipad(xrange(1,5), 7))
    (1, 2, 3, 4, None, None, None)
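
Since `ipad` calls `len()`, it needs a sized sequence. A possible variation for arbitrary iterables, sketched here as an assumption and not taken from the post above, chains an endless `repeat()` and cuts the result with `islice()`. Note that this version, named `ipad_exact` for illustration, also truncates inputs longer than `n`.

```python
from itertools import chain, islice, repeat


def ipad_exact(iterable, n, fill=None):
    """Yield exactly n items: the input, padded with fill or cut short."""
    # repeat(fill) with no count is infinite; islice stops after n items,
    # so no len() call is needed and generators work as input.
    return islice(chain(iterable, repeat(fill)), n)


print(list(ipad_exact("one;two;three;four".split(";"), 7, "")))
print(tuple(ipad_exact(iter(range(1, 5)), 7)))
```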


    George
    George Sakkis, Feb 2, 2007
    #8