Is there a better/simpler way to filter blank lines?

Discussion in 'Python' started by tmallen, Nov 4, 2008.

  1. tmallen

    tmallen Guest

    I'm parsing some text files, and I want to strip blank lines in the
    process. Is there a simpler way to do this than what I have here?

    lines = filter(lambda line: len(line.strip()) > 0, lines)

    Thomas
    tmallen, Nov 4, 2008
    #1

  2. tmallen

    Guest

    tmallen:
    > I'm parsing some text files, and I want to strip blank lines in the
    > process. Is there a simpler way to do this than what I have here?
    > lines = filter(lambda line: len(line.strip()) > 0, lines)


    xlines = (line for line in open(filename) if line.strip())

    Bye,
    bearophile
    , Nov 4, 2008
    #2

  3. tmallen

    Larry Bates Guest

    wrote:
    > tmallen:
    >> I'm parsing some text files, and I want to strip blank lines in the
    >> process. Is there a simpler way to do this than what I have here?
    >> lines = filter(lambda line: len(line.strip()) > 0, lines)

    >
    > xlines = (line for line in open(filename) if line.strip())
    >
    > Bye,
    > bearophile


    Or if you want to filter/loop at the same time, or if you don't want all the
    lines in memory at the same time:

    fp = open(filename, 'r')
    for line in fp:
        if not line.strip():
            continue

        #
        # Do something with the non-blank line:
        #


    fp.close()

    -Larry
    Larry Bates, Nov 4, 2008
    #3
  4. tmallen

    tmallen Guest

    On Nov 4, 4:30 pm, wrote:
    > tmallen:
    >
    > > I'm parsing some text files, and I want to strip blank lines in the
    > > process. Is there a simpler way to do this than what I have here?
    > > lines = filter(lambda line: len(line.strip()) > 0, lines)

    >
    > xlines = (line for line in open(filename) if line.strip())
    >
    > Bye,
    > bearophile


    I must be missing something:

    >>> xlines = (line for line in open("new.data") if line.strip())
    >>> xlines
    <generator object at 0x6b648>
    >>> xlines.sort()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'generator' object has no attribute 'sort'

    What do you think?

    Thomas
    tmallen, Nov 4, 2008
    #4
  5. On Tue, 04 Nov 2008 13:27:00 -0800, tmallen wrote:

    > I'm parsing some text files, and I want to strip blank lines in the
    > process. Is there a simpler way to do this than what I have here?
    >
    > lines = filter(lambda line: len(line.strip()) > 0, lines)
    >
    > Thomas



    lines = filter(lambda line: line.strip(), lines)
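
A minimal sketch of why the shorter predicate gives the same result. It assumes Python 3, where filter() returns a lazy iterator rather than a list (hence the list() calls), and a made-up sample list named `lines`:

```python
# Hypothetical sample data; any list of strings read from a file would do.
lines = ["first\n", "\n", "   \n", "second\n", "\t\n", "third"]

# Original predicate: explicit length check on the stripped line.
verbose = list(filter(lambda line: len(line.strip()) > 0, lines))

# Shorter predicate: an empty string is false, a non-empty one is true.
short = list(filter(lambda line: line.strip(), lines))

# Same idea without a lambda, passing the str.strip method directly.
no_lambda = list(filter(str.strip, lines))

print(short)
```

All three keep the original line endings; only the whitespace-only lines are dropped.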


    --
    Steven
    Steven D'Aprano, Nov 4, 2008
    #5
  6. tmallen

    Chris Rebert Guest

    On Tue, Nov 4, 2008 at 2:30 PM, tmallen <> wrote:
    > On Nov 4, 4:30 pm, wrote:
    >> tmallen:
    >>
    >> > I'm parsing some text files, and I want to strip blank lines in the
    >> > process. Is there a simpler way to do this than what I have here?
    >> > lines = filter(lambda line: len(line.strip()) > 0, lines)

    >>
    >> xlines = (line for line in open(filename) if line.strip())
    >>
    >> Bye,
    >> bearophile

    >
    > I must be missing something:
    >
    >>>> xlines = (line for line in open("new.data") if line.strip())
    >>>> xlines

    > <generator object at 0x6b648>
    >>>> xlines.sort()

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in <module>
    > AttributeError: 'generator' object has no attribute 'sort'
    >
    > What do you think?


    xlines is a generator, not a list. If you don't know what a generator
    is, see the relevant parts of the Python tutorial/manual (Google is
    your friend).
    To sort the generator, you can use 'sorted(xlines)'
    If you need it to actually be a list, you can do 'list(xlines)'
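
A small sketch of both options, assuming Python 3 and using io.StringIO as a hypothetical stand-in for the real file so the snippet runs on its own:

```python
import io

# Hypothetical stand-in for open("new.data"); StringIO also iterates line by line.
data = io.StringIO("pear\n\napple\n   \nfig\n")

xlines = (line for line in data if line.strip())

# sorted() accepts any iterable and returns a new list.
ordered = sorted(xlines)
print(ordered)

# The generator is now exhausted, so a second pass yields nothing.
leftover = list(xlines)
print(leftover)
```

Note that a generator can only be walked once, so call list() or sorted() on it, not both.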

    Cheers,
    Chris
    --
    Follow the path of the Iguana...
    http://rebertia.com

    >
    > Thomas
    > --
    > http://mail.python.org/mailman/listinfo/python-list
    >
    Chris Rebert, Nov 4, 2008
    #6
  7. tmallen

    Guest

    tmallen
    > I must be missing something:
    >
    > >>> xlines = (line for line in open("new.data") if line.strip())
    > >>> xlines

    > <generator object at 0x6b648>
    > >>> xlines.sort()

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in <module>
    > AttributeError: 'generator' object has no attribute 'sort'
    >
    > What do you think?


    Congratulations, you have just met your first lazy construct ^_^
    That's a generator, it yields nonblank lines one after the other. This
    can be really useful.
    If you want a real array of items, then you can do this:
    lines = list(xlines)
    Or use a list comp.:
    lines = [line for line in open("new.data") if line.strip()]

    Bye,
    bearophile
    , Nov 4, 2008
    #7
  8. tmallen

    Falcolas Guest

    On Nov 4, 3:30 pm, tmallen <> wrote:
    > On Nov 4, 4:30 pm, wrote:
    >
    > > tmallen:

    >
    > > > I'm parsing some text files, and I want to strip blank lines in the
    > > > process. Is there a simpler way to do this than what I have here?
    > > > lines = filter(lambda line: len(line.strip()) > 0, lines)

    >
    > > xlines = (line for line in open(filename) if line.strip())

    >
    > > Bye,
    > > bearophile

    >
    > I must be missing something:
    >
    > >>> xlines = (line for line in open("new.data") if line.strip())
    > >>> xlines

    >
    > <generator object at 0x6b648>
    > >>> xlines.sort()
    >
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    > AttributeError: 'generator' object has no attribute 'sort'
    >
    > What do you think?
    >
    > Thomas


    Using the surrounding parentheses creates a generator object, whereas
    using square brackets would create a list. So, if you want to run list
    operations on the resulting object, you'll want to use the list
    comprehension instead.

    i.e.

    list_o_lines = [line for line in open(filename) if line.strip()]

    Downside is the increased memory usage and processing time as you dump
    the entire file into memory, whereas if you plan to do a "for line in
    xlines:" operation, it would be faster to use the generator.
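
A rough sketch of the difference, assuming Python 3. The `source` function and the `produced` list are made up for illustration; they only record how many lines have been read so far:

```python
produced = []

def source(n):
    """Hypothetical line source; records each line as it is handed out."""
    for i in range(n):
        line = "\n" if i % 2 == 0 else "line %d\n" % i
        produced.append(line)
        yield line

# Square brackets: the whole input is consumed and stored immediately.
list_o_lines = [line for line in source(6) if line.strip()]
consumed_by_list = len(produced)

produced.clear()

# Parentheses: nothing is read until something asks for a value.
xlines = (line for line in source(6) if line.strip())
consumed_before_use = len(produced)

# Asking for one non-blank line reads only as far as needed.
first = next(xlines)
consumed_after_first = len(produced)
```

The list comprehension reads everything up front, while the generator expression reads just enough input to produce each requested line.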
    Falcolas, Nov 4, 2008
    #8
  9. tmallen

    tmallen Guest

    Between this info and http://www.python.org/doc/2.5.2/tut/node11.html#SECTION00111000000000000000000
    , I'm starting to understand how I'll use generators (I've seen them
    mentioned before, but never used them knowingly).

    > list_o_lines = [line for line in open(filename) if line.strip()]


    +1 for "list_o_lines"

    Thanks for the help!
    Thomas

    On Nov 4, 6:36 pm, Falcolas <> wrote:
    > On Nov 4, 3:30 pm, tmallen <> wrote:
    >
    >
    >
    > > On Nov 4, 4:30 pm, wrote:

    >
    > > > tmallen:

    >
    > > > > I'm parsing some text files, and I want to strip blank lines in the
    > > > > process. Is there a simpler way to do this than what I have here?
    > > > > lines = filter(lambda line: len(line.strip()) > 0, lines)

    >
    > > > xlines = (line for line in open(filename) if line.strip())

    >
    > > > Bye,
    > > > bearophile

    >
    > > I must be missing something:

    >
    > > >>> xlines = (line for line in open("new.data") if line.strip())
    > > >>> xlines

    >
    > > <generator object at 0x6b648>
    > > >>> xlines.sort()

    >
    > > Traceback (most recent call last):
    > >   File "<stdin>", line 1, in <module>
    > > AttributeError: 'generator' object has no attribute 'sort'

    >
    > > What do you think?

    >
    > > Thomas

    >
    > Using the surrounding parentheses creates a generator object, whereas
    > using square brackets would create a list. So, if you want to run list
    > operations on the resulting object, you'll want to use the list
    > comprehension instead.
    >
    > i.e.
    >
    > list_o_lines = [line for line in open(filename) if line.strip()]
    >
    > Downside is the increased memory usage and processing time as you dump
    > the entire file into memory, whereas if you plan to do a "for line in
    > xlines:" operation, it would be faster to use the generator.
    tmallen, Nov 4, 2008
    #9
  10. tmallen

    Steve Holden Guest

    tmallen wrote:
    > On Nov 4, 4:30 pm, wrote:
    >> tmallen:
    >>
    >>> I'm parsing some text files, and I want to strip blank lines in the
    >>> process. Is there a simpler way to do this than what I have here?
    >>> lines = filter(lambda line: len(line.strip()) > 0, lines)

    >> xlines = (line for line in open(filename) if line.strip())
    >>
    >> Bye,
    >> bearophile

    >
    > I must be missing something:
    >
    >>>> xlines = (line for line in open("new.data") if line.strip())
    >>>> xlines

    > <generator object at 0x6b648>
    >>>> xlines.sort()

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in <module>
    > AttributeError: 'generator' object has no attribute 'sort'
    >
    > What do you think?
    >

    I think there'd be no advantage to a sort method on a generator, since
    theoretically the last item could be the first required in the sorted
    sequence, so it's necessary to hold all items in memory to ensure the
    sort is correct. So there's no point using a generator in the first place.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
    Steve Holden, Nov 5, 2008
    #10
  11. On Wed, 05 Nov 2008 12:06:42 +1100, Ben Finney wrote:

    > Falcolas <> writes:
    >
    >> Using the surrounding parentheses creates a generator object

    >
    > No. Using the generator expression syntax creates a generator object.
    >
    > Parentheses are irrelevant to whether the expression is a generator
    > expression. The parentheses merely group the expression from surrounding
    > syntax.


    No, they are important:

    In [270]: a = x for x in xrange(10)
    ------------------------------------------------------------
    File "<ipython console>", line 1
    a = x for x in xrange(10)
    ^
    <type 'exceptions.SyntaxError'>: invalid syntax


    In [271]: a = (x for x in xrange(10))

    Ciao,
    Marc 'BlackJack' Rintsch
    Marc 'BlackJack' Rintsch, Nov 5, 2008
    #11
  12. On Tue, 04 Nov 2008 20:25:09 -0500, Steve Holden wrote:

    > I think there'd be no advantage to a sort method on a generator, since
    > theoretically the last item could be the first required in the sorted
    > sequence, so it's necessary to hold all items in memory to ensure the
    > sort is correct. So there's no point using a generator in the first
    > place.



    You can't sort something lazily.

    Actually, that's not *quite* true: it only holds for comparison sorts.
    You can sort lazily using non-comparison sorts, such as Counting Sort:

    http://en.wikipedia.org/wiki/Counting_sort
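
A sketch of counting sort fed from a generator, assuming Python 3 and small non-negative integers with a known upper bound (`max_value` is a hypothetical parameter). It still has to read the whole input before it can emit the first item, but it stores only the counts, never the items themselves:

```python
def counting_sort(values, max_value):
    """Yield the integers from `values` in ascending order.

    Assumes every value is an integer in range(0, max_value + 1).
    """
    counts = [0] * (max_value + 1)
    for v in values:          # one pass over the input; items are not kept
        counts[v] += 1
    for value, count in enumerate(counts):
        for _ in range(count):
            yield value

result = list(counting_sort((x + 1 for x in (4, 2, 0, 3, 1, 3)), max_value=5))
print(result)
```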

    Arguably, the benefit of giving generators a sort() method would be to
    avoid an explicit call to list. But I think many people would argue that
    was actually a disadvantage, not a benefit, and that the call to list is
    a good thing. I'd agree with them.

    However, sorted() should take a generator argument, and in fact I see it
    does:

    >>> sorted( x+1 for x in (4, 2, 0, 3, 1) )

    [1, 2, 3, 4, 5]



    --
    Steven
    Steven D'Aprano, Nov 5, 2008
    #12
  13. On Wed, 05 Nov 2008 13:18:27 +1100, Ben Finney wrote:

    > Marc 'BlackJack' Rintsch <> writes:
    >
    > Your example shows only that they're important for grouping the
    > expression from surrounding syntax. As I said.
    >
    > They are *not* important for making the expression be a generator
    > expression in the first place. Parentheses are irrelevant for the
    > generator expression syntax.


    Okay, technically correct, but parentheses belong to generator expressions
    because they have to be there to separate them from the surrounding syntax,
    except when there are already enclosing parentheses. So parentheses are
    tied to generator expression syntax.

    Ciao,
    Marc 'BlackJack' Rintsch
    Marc 'BlackJack' Rintsch, Nov 5, 2008
    #13
  14. On Wed, 05 Nov 2008 14:39:36 +1100, Ben Finney wrote:

    > Marc 'BlackJack' Rintsch <> writes:
    >
    >> On Wed, 05 Nov 2008 13:18:27 +1100, Ben Finney wrote:
    >>
    >> > Marc 'BlackJack' Rintsch <> writes:
    >> >
    >> > Your example shows only that they're important for grouping the
    >> > expression from surrounding syntax. As I said.
    >> >
    >> > They are *not* important for making the expression be a generator
    >> > expression in the first place. Parentheses are irrelevant for the
    >> > generator expression syntax.

    >>
    >> Okay, technically correct but parenthesis belong to generator
    >> expressions because they have to be there to separate them from
    >> surrounding syntax with the exception when there are already enclosing
    >> parentheses. So parenthesis are tied to generator expression syntax.

    >
    > No, I think that's factually wrong *and* confusing.
    >
    > >>> list(i + 7 for i in range(10))

    > [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
    >
    > Does this demonstrate that parentheses are “tied to” integer literal
    > syntax? No.


    You can use integer literals without parentheses, like the 7 above, but
    you can't use generator expressions without them. They are always
    there. In that way parentheses are tied to generator expressions.

    If I see the pattern ``f(x) for x in obj if c(x)`` I check whether it is
    enclosed in parentheses or brackets to decide whether it is a generator
    expression or a list comprehension. That may not reflect the formal
    grammar, but it is IMHO the easiest and most pragmatic way to look at this
    as a human programmer.

    Ciao,
    Marc 'BlackJack' Rintsch
    Marc 'BlackJack' Rintsch, Nov 5, 2008
    #14
  15. tmallen

    Jorgen Grahn Guest

    On Tue, 04 Nov 2008 15:36:23 -0600, Larry Bates <> wrote:
    > wrote:
    >> tmallen:
    >>> I'm parsing some text files, and I want to strip blank lines in the
    >>> process. Is there a simpler way to do this than what I have here?
    >>> lines = filter(lambda line: len(line.strip()) > 0, lines)

    ....

    > Or if you want to filter/loop at the same time, or if you don't want all the
    > lines in memory at the same time:


    Or if you want to support potentially infinite input streams, such as
    a pipe or socket. There are many reasons this is my preferred way of
    going through a text file.

    > fp = open(filename, 'r')
    > for line in fp:
    >     if not line.strip():
    >         continue
    >
    >     #
    >     # Do something with the non-blank line:
    >     #
    >
    >
    > fp.close()


    Often, you want to at least rstrip() all lines anyway,
    for other reasons, and then the extra cost is even less:

    line = line.rstrip()
    if not line: continue
    # do something with the rstripped, nonblank lines
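
Put together as a runnable sketch, assuming Python 3. The helper name `nonblank_rstripped` is made up, and io.StringIO stands in for a real file, pipe, or socket file object:

```python
import io

def nonblank_rstripped(stream):
    """Yield each line of `stream` with trailing whitespace removed, skipping blanks."""
    for line in stream:
        line = line.rstrip()
        if not line:
            continue
        yield line

# Hypothetical stand-in for a file, pipe, or socket file object.
stream = io.StringIO("alpha  \n\n  beta\n \t \ngamma")
for line in nonblank_rstripped(stream):
    print(repr(line))
```

Because only one line is held at a time, the same loop works on input that never ends.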

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
    \X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!
    Jorgen Grahn, Nov 5, 2008
    #15
  16. tmallen

    tmallen Guest

    Why do I feel like the coding style in Lutz' "Programming Python" is
    very far from idiomatic Python? The content feels dated, and I find
    that most answers that I get for Python questions use a different
    style from the sort of code I see in this book.

    Thomas

    On Nov 5, 7:15 am, Jorgen Grahn <> wrote:
    > On Tue, 04 Nov 2008 15:36:23 -0600, Larry Bates <> wrote:
    > > wrote:
    > >> tmallen:
    > >>> I'm parsing some text files, and I want to strip blank lines in the
    > >>> process. Is there a simpler way to do this than what I have here?
    > >>> lines = filter(lambda line: len(line.strip()) > 0, lines)

    >
    > ...
    >
    > > Or if you want to filter/loop at the same time, or if you don't want all the
    > > lines in memory at the same time:

    >
    > Or if you want to support potentially infinite input streams, such as
    > a pipe or socket.  There are many reasons this is my preferred way of
    > going through a text file.
    >
    > > fp = open(filename, 'r')
    > > for line in fp:
    > >      if not line.strip():
    > >          continue

    >
    > >      #
    > >      # Do something with the non-blank line:
    > >      #

    >
    > > fp.close()

    >
    > Often, you want to at least rstrip() all lines anyway,
    > for other reasons, and then the extra cost is even less:
    >
    >        line = line.rstrip()
    >        if not line: continue
    >        # do something with the rstripped, nonblank lines
    >
    > /Jorgen
    >
    > --
    >   // Jorgen Grahn <grahn@        Ph'nglui mglw'nafh Cthulhu
    > \X/     snipabacken.se>          R'lyeh wgah'nagl fhtagn!
    tmallen, Nov 5, 2008
    #16
  17. tmallen

    Lie Guest

    On Nov 5, 4:56 pm, Marc 'BlackJack' Rintsch <> wrote:
    > On Wed, 05 Nov 2008 14:39:36 +1100, Ben Finney wrote:
    > > Marc 'BlackJack' Rintsch <> writes:

    >
    > >> On Wed, 05 Nov 2008 13:18:27 +1100, Ben Finney wrote:

    >
    > >> > Marc 'BlackJack' Rintsch <> writes:

    >
    > >> > Your example shows only that they're important for grouping the
    > >> > expression from surrounding syntax. As I said.

    >
    > >> > They are *not* important for making the expression be a generator
    > >> > expression in the first place. Parentheses are irrelevant for the
    > >> > generator expression syntax.

    >
    > >> Okay, technically correct but parenthesis belong to generator
    > >> expressions because they have to be there to separate them from
    > >> surrounding syntax with the exception when there are already enclosing
    > >> parentheses.  So parenthesis are tied to generator expression syntax..

    >
    > > No, I think that's factually wrong *and* confusing.

    >
    > >     >>> list(i + 7 for i in range(10))
    > >     [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

    >
    > > Does this demonstrate that parentheses are “tied to” integer literal
    > > syntax? No.

    >
    > You can use integer literals without parenthesis, like the 7 above, but
    > you can't use generator expressions without them.  They are always
    > there.  In that way parenthesis are tied to generator expressions.
    >
    > If I see the pattern ``f(x) for x in obj if c(x)`` I look if it is
    > enclosed in parenthesis or brackets to decide if it is a list
    > comprehension or a generator expression.  That may not reflect the formal
    > grammar, but it is IMHO the easiest and pragmatic way to look at this as
    > a human programmer.
    >
    > Ciao,
    >         Marc 'BlackJack' Rintsch


    The situation is similar to tuples. What makes a tuple is the commas,
    not the parens.
    What makes a generator expression is "<exp> for <var-or-tuple> in
    <exp>".

    Parentheses are generally required because without them it's almost
    impossible to tell the expression apart from the surrounding syntax. But
    they are not part of the formally required syntax.
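
A small sketch of the tuple analogy and of where a generator expression gets its parentheses, assuming Python 3. For what it's worth, the language reference does write the parentheses into the generator expression rule and allows dropping them only in a call with a single argument:

```python
import types

# Commas make the tuple; the parentheses here are only grouping.
t = 1, 2, 3
grouped = (1)          # still the int 1, not a tuple
single = 1,            # a one-element tuple

# As the sole argument of a call, a generator expression can reuse the
# call's own parentheses.
total = sum(x * x for x in range(4))

# Anywhere else it needs its own pair.
gen = (x * x for x in range(4))
is_gen = isinstance(gen, types.GeneratorType)
```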
    Lie, Nov 5, 2008
    #17
  18. Lie <> writes:
    > What makes a generator expression is "<exp> for <var-or-tuple> in
    > <exp>".
    >
    > Parenthesis is generally required because without it, it's almost
    > impossible to differentiate it with the surrounding. But it is not
    > part of the formally required syntax.


    .... But *every* generator expression is surrounded by parentheses, isn't
    it?

    --
    Arnaud
    Arnaud Delobelle, Nov 5, 2008
    #18
  19. On Wed, 05 Nov 2008 21:23:57 +0000, Arnaud Delobelle wrote:

    > Lie <> writes:
    >> What makes a generator expression is "<exp> for <var-or-tuple> in
    >> <exp>".
    >>
    >> Parenthesis is generally required because without it, it's almost
    >> impossible to differentiate it with the surrounding. But it is not part
    >> of the formally required syntax.

    >
    > ... But *every* generator expression is surrounded by parentheses, isn't
    > it?


    Yes, but sometimes they are there in order to call a function, not to
    form the generator expression.

    I'm surprised that nobody yet has RTFM:

    http://docs.python.org/reference/expressions.html

    Steven D'Aprano, Nov 5, 2008
    #19
  20. tmallen

    Miles Guest

    Ben Finney wrote:
    > Falcolas writes:
    >
    >> Using the surrounding parentheses creates a generator object

    >
    > No. Using the generator expression syntax creates a generator object.
    >
    > Parentheses are irrelevant to whether the expression is a generator
    > expression. The parentheses merely group the expression from
    > surrounding syntax.


    As others have pointed out, the parentheses are part of the generator
    syntax. If not for the parentheses, a list comprehension would be
    indistinguishable from a list literal with a single element, a
    generator object. It's also worth remembering that list
    comprehensions are distinct from generator expressions and don't
    require the creation of a generator object.
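
A quick sketch of that distinction, assuming Python 3: the same text inside square brackets means two different things depending on whether it carries its own parentheses.

```python
import types

# A list comprehension builds the list directly.
squares = [x * x for x in range(3)]

# Parenthesizing the same text inside the brackets gives a one-element
# list whose only item is a generator object.
wrapped = [(x * x for x in range(3))]

print(squares)
print(type(wrapped[0]).__name__)
```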

    -Miles
    Miles, Nov 5, 2008
    #20