Remove empty strings from list

Discussion in 'Python' started by Helvin, Sep 15, 2009.

  1. Helvin

    Hi,

    Sorry, I did not want to bother the group, but I really do not
    understand this seemingly trivial problem.
    I am reading from a text file, where each line has 2 values, with
    spaces before and between the values.
    I would like to read in these values, but of course, I don't want the
    whitespace between them.
    I have looked at documentation, and how strings and lists work, but I
    cannot understand the behaviour of the following:
    line = f.readline()
    line = line.lstrip() # take away whitespace at the beginning of the readline.
    list = line.split(' ') # split the str line into a list

    # the list has empty strings in it, so now, remove these empty strings
    for item in list:
        if item is ' ':
            print 'discard these: ',item
            index = list.index(item)
            del list[index] # remove this item from the list
        else:
            print 'keep this: ',item
    The problem is, when my list is: ['44', '', '', '', '', '',
    '0.000000000\n']
    The output is:
    len of list: 7
    keep this: 44
    discard these:
    discard these:
    discard these:
    So finally the list is: ['44', '', '', '0.000000000\n']
    The code above removes all the empty strings in the middle, all except
    two. My code seems to miss two of the empty strings.

    Would you know why this is occurring?

    Regards,
    Helvin
    Helvin, Sep 15, 2009
    #1

  2. Chris Rebert

    On Mon, Sep 14, 2009 at 6:49 PM, Helvin <> wrote:
    > [snip]
    >
    > Would you know why this is occurring?


    Block quoting from http://effbot.org/zone/python-list.htm
    """
    Note that the for-in statement maintains an internal index, which is
    incremented for each loop iteration. This means that if you modify the
    list you’re looping over, the indexes will get out of sync, and you
    may end up skipping over items, or process the same item multiple
    times.
    """

    That is why your code skips over some elements instead of removing them.
    Moral: Don't modify a list while iterating over it. Use the loop to
    create a separate, new list from the old one instead.
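    A minimal Python 3 sketch of that effect (the thread's code is Python 2; the function names are hypothetical). items.remove(item) stands in for the original list.index() plus del, which deletes the same element. Each deletion shifts the later items one slot left while the loop's hidden index still moves forward, so the element right after each deleted one is never examined:

```python
def remove_empty_while_iterating(items):
    """Mimics the original loop: deletes from the list it is iterating over."""
    for item in items:
        if item == '':
            # Deleting shifts every later item one position to the left,
            # but the for loop's internal index still advances by one.
            items.remove(item)
    return items


def remove_empty(items):
    """Builds a separate, new list instead, leaving the input untouched."""
    return [item for item in items if item != '']


data = ['44', '', '', '', '', '', '0.000000000\n']
print(remove_empty_while_iterating(list(data)))  # ['44', '', '', '0.000000000\n']
print(remove_empty(data))                        # ['44', '0.000000000\n']
```

    The first call reproduces the list reported in the original post; the second builds a new list as suggested above.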

    Cheers,
    Chris
    --
    http://blog.rebertia.com
    Chris Rebert, Sep 15, 2009
    #2

  3. Dave Angel

    Helvin wrote:
    > [snip]
    > list = line.split(' ') # split the str line into a list
    > [snip]

    (list is already a built-in name, so you really should call your variable
    something else.)


    As Chris says, you're modifying the list while you're iterating through
    it, and that's undefined behavior. Why not do the following?

    mylist = line.strip().split(' ')
    mylist = [item for item in mylist if item]

    DaveA
    Dave Angel, Sep 15, 2009
    #3
  4. On Mon, 14 Sep 2009 18:49:58 -0700 (PDT), Helvin <>
    declaimed the following in gmane.comp.python.general:

    > Hi,
    >
    > Sorry I did not want to bother the group, but I really do not
    > understand this seeming trivial problem.
    > I am reading from a textfile, where each line has 2 values, with
    > spaces before and between the values.
    > I would like to read in these values, but of course, I don't want the
    > whitespaces between them.
    > I have looked at documentation, and how strings and lists work, but I
    > cannot understand the behaviour of the following:


    <snip>

    All of which can be condensed into a simple loop:

    for ln in f:
        wrds = ln.strip()
        # do something with the words -- no whitespace to be seen
    --
    Wulfraed Dennis Lee Bieber KD6MOG
    HTTP://wlfraed.home.netcom.com/
    Dennis Lee Bieber, Sep 15, 2009
    #4
  5. On Mon, 14 Sep 2009 18:55:13 -0700, Chris Rebert wrote:

    > On Mon, Sep 14, 2009 at 6:49 PM, Helvin <> wrote:

    ....
    > > I have looked at documentation, and how strings and lists work, but I
    > > cannot understand the behaviour of the following:

    ....
    > >                        for item in list:
    > >                                if item is ' ':
    > >                                        print 'discard these: ',item
    > >                                        index = list.index(item)
    > >                                        del list[index]


    ....

    > Moral: Don't modify a list while iterating over it. Use the loop to
    > create a separate, new list from the old one instead.



    This doesn't just apply to Python; it is good advice in every language
    I'm familiar with. At the very least, if you have to modify a list
    in place and you are deleting or inserting items, work *backwards*:

    for i in xrange(len(alist) - 1, -1, -1):
        item = alist[i]
        if item == 'delete me':
            del alist[i]


    This is almost never the right solution in Python, but as a general
    technique, it works in all sorts of situations. (E.g. when varnishing a
    floor, don't start at the doorway and varnish towards the end of the
    room, because you'll be walking all over the fresh varnish. Do it the
    other way, starting at the end of the room, and work backwards towards
    the door.)
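    A Python 3 version of the backwards loop, as a minimal sketch (the function name and sample data are placeholders). Walking from the end is safe because deleting index i only shifts the items after i, and those have already been checked:

```python
def delete_matching_backwards(alist, unwanted):
    """Deletes every occurrence of `unwanted` in place, walking from the end."""
    # Indices run len(alist) - 1, ..., 1, 0.
    for i in range(len(alist) - 1, -1, -1):
        if alist[i] == unwanted:
            # Only items at positions > i move; they have already been checked.
            del alist[i]
    return alist


words = ['delete me', 'keep', 'delete me', 'delete me', 'also keep']
print(delete_matching_backwards(words, 'delete me'))  # ['keep', 'also keep']
```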

    In Python, the right solution is almost always to make a new copy of the
    list. Here are three ways to do that:


    newlist = []
    for item in alist:
        if item != 'delete me':
            newlist.append(item)


    newlist = [item for item in alist if item != 'delete me']

    newlist = filter(lambda item: item != 'delete me', alist)



    Once you have newlist, you can then bind the name alist to it:

    alist = newlist

    or you can replace the contents of alist with the contents of newlist:

    alist[:] = newlist


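    The two have a subtle difference in behavior that may not be apparent
    unless you have other names bound to the same list as alist: rebinding
    only changes what the name alist refers to, while slice assignment
    replaces the contents of the existing list object, so every name bound
    to it sees the change.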
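    A small Python 3 sketch of that difference, using a hypothetical second name, other, bound to the same list:

```python
alist = ['keep', 'delete me', 'keep too']
other = alist  # a second name bound to the same list object

newlist = [item for item in alist if item != 'delete me']

alist = newlist  # rebinding: only the name `alist` changes
print(other)     # ['keep', 'delete me', 'keep too']

alist = other       # bind `alist` to the original object again
alist[:] = newlist  # slice assignment: mutates that shared object in place
print(other)        # ['keep', 'keep too']
```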



    --
    Steven
    Steven D'Aprano, Sep 15, 2009
    #5
  6. Helvin a écrit :
    > Hi,
    >
    > Sorry I did not want to bother the group, but I really do not
    > understand this seeming trivial problem.
    > I am reading from a textfile, where each line has 2 values, with
    > spaces before and between the values.
    > I would like to read in these values, but of course, I don't want the
    > whitespaces between them.
    > I have looked at documentation, and how strings and lists work, but I
    > cannot understand the behaviour of the following:

    > line = f.readline()
    > line = line.lstrip() # take away whitespace at the beginning of the readline.


    file.readline() returns the line with its trailing newline character (which
    is considered whitespace by the str.strip method), so you may want to
    use line.strip() instead of line.lstrip().

    > list = line.split(' ')


    Slightly OT, but: don't use built-in type or function names as
    identifiers; this shadows the built-in object.
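    A Python 3 sketch of the shadowing problem, assuming the code runs at module level (the sample string is made up). After the assignment, calling list(...) calls the list instance instead of the built-in type:

```python
line = '44 0.000000000'
list = line.split(' ')  # `list` now names a list instance, hiding the built-in

caught = None
try:
    list('abc')  # this tries to call the list instance, not the built-in type
except TypeError as exc:
    caught = exc
    print('TypeError:', exc)

del list  # deleting the module-level name makes the built-in visible again
print(list('abc'))  # ['a', 'b', 'c']
```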

    Also, when called without a separator, str.split splits on runs of
    whitespace and ignores leading and trailing whitespace, so it never
    produces empty strings. You would get better results by not specifying
    the delimiter here:

    >>> " a  a  a  a ".split(' ')
    ['', 'a', '', 'a', '', 'a', '', 'a', '']
    >>> " a  a  a  a ".split()
    ['a', 'a', 'a', 'a']
    >>>
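    The same comparison on a line shaped like the one in the original post, as a Python 3 sketch. The exact spacing is an assumption reconstructed from the reported list: six spaces between the values, plus some leading indentation:

```python
# Assumed layout: 4 leading spaces, 6 spaces between the two values.
line = ' ' * 4 + '44' + ' ' * 6 + '0.000000000\n'

print(line.lstrip().split(' '))  # ['44', '', '', '', '', '', '0.000000000\n']
print(line.split())              # ['44', '0.000000000']
```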


    > # the list has empty strings in it, so now, remove these empty strings


    A problem you could have avoided right from the start !-)

    > for item in list:
    > if item is ' ':


    Don't use identity comparison when you want to test for equality. It
    happens to sort of work in your example above, but only because CPython
    caches _some_ small strings, and you should _never_ rely on such
    implementation details. A string containing accented characters
    would not have been cached:

    >>> s = 'ééé'
    >>> s is 'ééé'
    False
    >>>
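    A Python 3 sketch of the general point (the strings are arbitrary): == compares contents, while is compares object identity, and whether two equal strings are the same object is up to the implementation, so only the first result below is guaranteed:

```python
a = ''.join(['sp', 'am'])  # built at run time
b = 'spam'                 # a literal

print(a == b)  # True: same characters, which is what the test should check
print(a is b)  # implementation-dependent: object identity, not content
```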



    Also, this is surely not your actual code: ' ' is not an empty string,
    it's a string with a single space character. The empty string is ''. And
    FWIW, empty strings (like most empty sequences and collections, all
    numerical zeros, and the None object) have a false value in a boolean
    context, so you can just test the string directly:

    for s in ['', 0, 0.0, [], {}, (), None]:
        if not s:
            print "'%s' is empty, so it's false" % str(s)


    > print 'discard these: ',item
    > index = list.index(item)
    > del list[index] # remove this item from the list


    And here you do have a big problem: the internal index used by the list
    iterator is no longer in sync with the list, so the next iteration
    skips one item.

    As a general rule: *don't* add or remove elements to/from a sequence while
    iterating over it. If you really need to modify the sequence while
    iterating over it, iterate in reverse, but there are usually better
    solutions.

    > else:
    > print 'keep this: ',item
    > The problem is,


    Make that plural: there's more than one problem here !-)

    > when my list is : ['44', '', '', '', '', '',
    > '0.000000000\n']
    > The output is:
    > len of list: 7
    > keep this: 44
    > discard these:
    > discard these:
    > discard these:
    > So finally the list is: ['44', '', '', '0.000000000\n']
    > The code above removes all the empty strings in the middle, all except
    > two. My code seems to miss two of the empty strings.
    >
    > Would you know why this is occuring?



    cf above... and below:

    >>> alist = ['44', '', '', '', '', '', '0.000000000']
    >>> for i, it in enumerate(alist):
    ...     print 'i : %s - it : "%s"' % (i, it)
    ...     if not it:
    ...         del alist[i]
    ...     print "alist is now %s" % alist
    ...
    i : 0 - it : "44"
    alist is now ['44', '', '', '', '', '', '0.000000000']
    i : 1 - it : ""
    alist is now ['44', '', '', '', '', '0.000000000']
    i : 2 - it : ""
    alist is now ['44', '', '', '', '0.000000000']
    i : 3 - it : ""
    alist is now ['44', '', '', '0.000000000']
    >>>



    Ok, now for practical answers:

    1/ In the above case, use line.strip().split() and you'll have no more
    problems !-)

    2/ As a general rule, if you need to filter a sequence, don't try to do
    it in place (unless it's a *very* big sequence and you run into memory
    problems, but then there are probably better solutions).

    The common idioms for filtering a sequence are:

    * filter(predicate, sequence):

    The 'predicate' param is a callback function which takes an item from the
    sequence and returns a boolean value (True to keep the item, False to
    discard it). The following example will filter out even integers:

    def is_odd(n):
        return n % 2

    alist = range(10)
    odds = filter(is_odd, alist)
    print alist
    print odds

    Alternatively, filter() can take None as its first param, in which case
    it will filter out items that have a false value in a boolean context, i.e.:

    alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
    result = filter(None, alist)
    print result


    * list comprehensions

    Here you directly build the result list:

    alist = range(10)
    odds = [n for n in alist if n % 2]

    alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
    result = [item for item in alist if item]
    print result
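    The same two idioms in Python 3, as a minimal sketch. There, range() and filter() return lazy objects rather than lists, so list() is needed to materialize the results:

```python
def is_odd(n):
    return n % 2

odds = list(filter(is_odd, range(10)))
print(odds)  # [1, 3, 5, 7, 9]

alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
truthy = list(filter(None, alist))  # None as predicate: keep truthy items
print(truthy)  # ['a', 1, [1], <class 'object'>, True]

# The equivalent list comprehensions build the same lists directly.
assert odds == [n for n in range(10) if n % 2]
assert truthy == [item for item in alist if item]
```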



    HTH
    Bruno Desthuilliers, Sep 15, 2009
    #6
  7. Dave Angel a écrit :
    (snip)
    >
    > As Chris says, you're modifying the list while you're iterating through
    > it, and that's undefined behavior. Why not do the following?
    >
    > mylist = line.strip().split(' ')
    > mylist = [item for item in mylist if item]


    Mmmm... because the second line is useless when str.split() is called
    without a delimiter ?-)

    mylist = line.strip().split()


    will already do the RightThing(tm).
    Bruno Desthuilliers, Sep 15, 2009
    #7
  8. Dennis Lee Bieber a écrit :
    (snip)
    > All of which can be condensed into a simple
    >
    > for ln in f:
    >     wrds = ln.strip()
    >     # do something with the words -- no whitespace to be seen



    I assume you meant:
    wrds = ln.strip().split()

    ?-)
    Bruno Desthuilliers, Sep 15, 2009
    #8
  9. Bruno Desthuilliers <> wrote:
    > >> mylist = line.strip().split()

    >
    >will already do the RightThing(tm).


    So will

    mylist = line.split()

    --
    \S

    under construction
    Sion Arrowsmith, Sep 15, 2009
    #9
  10. On Tue, 15 Sep 2009 11:21:30 +0200, Bruno Desthuilliers
    <> declaimed the following in
    gmane.comp.python.general:

    >
    > I assume you meant:
    > wrds = ln.strip().split()
    >

    Whoops... yes... And actually, without the strip() in there.
    Dennis Lee Bieber, Sep 15, 2009
    #10
  11. Rhodri James

    On Tue, 15 Sep 2009 02:55:13 +0100, Chris Rebert <> wrote:

    > On Mon, Sep 14, 2009 at 6:49 PM, Helvin <> wrote:
    >> list = line.split(' ') # split the str line into a list
    [snip]
    >
    > Moral: Don't modify a list while iterating over it. Use the loop to
    > create a separate, new list from the old one instead.


    In this case, your life would be improved by using

    l = line.split()

    instead of

    l = line.split(' ')

    and not getting the empty strings in the first place.
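    Putting the thread's advice together for the original task, a Python 3 sketch that reads two values per line. io.StringIO stands in for the real file, and the sample contents and the conversion to float are assumptions, since the post does not say what the values represent:

```python
import io

# Hypothetical file contents: leading spaces and runs of spaces between values.
sample = io.StringIO(
    '    44      0.000000000\n'
    '    45      0.125000000\n'
)

rows = []
for line in sample:        # iterate over the file object line by line
    fields = line.split()  # no separator: split on whitespace runs, no empty strings
    if not fields:         # skip blank lines
        continue
    first, second = fields  # each line is expected to hold exactly two values
    rows.append((float(first), float(second)))

print(rows)  # [(44.0, 0.0), (45.0, 0.125)]
```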

    --
    Rhodri James *-* Wildebeest Herder to the Masses
    Rhodri James, Sep 16, 2009
    #11
  12. Sion Arrowsmith a écrit :
    > Bruno Desthuilliers <> wrote:
    >>>> mylist = line.strip().split()

    >> will already do the RightThing(tm).

    >
    > So will
    >
    > mylist = line.split()
    >

    Yep, it's at least the second time someone has reminded me that the call to
    str.strip is just useless here... Pity my poor old neuron :(
    Bruno Desthuilliers, Sep 16, 2009
    #12