Is there a better/simpler way to filter blank lines?

tmallen

I'm parsing some text files, and I want to strip blank lines in the
process. Is there a simpler way to do this than what I have here?

lines = filter(lambda line: len(line.strip()) > 0, lines)

Thomas
 
bearophileHUGS

tmallen:
I'm parsing some text files, and I want to strip blank lines in the
process. Is there a simpler way to do this than what I have here?
lines = filter(lambda line: len(line.strip()) > 0, lines)

xlines = (line for line in open(filename) if line.strip())
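For comparison, here is a small Python 3 sketch that runs both spellings on the same text. An io.StringIO object stands in for the real file, which is an assumption made only so the example runs on its own. In Python 3, filter() also returns a lazy iterator rather than a list, so both results are passed through list() here.

```python
import io

# In-memory stand-in for open(filename); the text content is made up.
fake_file = io.StringIO("first\n\n   \nsecond\n\t\nthird\n")

# Generator expression: yields only lines that contain non-whitespace.
xlines = (line for line in fake_file if line.strip())
nonblank = list(xlines)
print(nonblank)  # ['first\n', 'second\n', 'third\n']

# The original filter() version, run over the same text again.
fake_file.seek(0)
via_filter = list(filter(lambda line: len(line.strip()) > 0, fake_file))
print(via_filter == nonblank)  # True
```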

Bye,
bearophile
 
Larry Bates

tmallen:

xlines = (line for line in open(filename) if line.strip())

Bye,
bearophile

Or if you want to filter/loop at the same time, or if you don't want all the
lines in memory at the same time:

fp = open(filename, 'r')
for line in fp:
    if not line.strip():
        continue

    #
    # Do something with the non-blank line:
    #

fp.close()
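The same skip-blank loop can also be packaged as a generator function, which keeps the one-line-at-a-time behaviour but lets the filtering be reused. This is a Python 3 sketch: the helper name nonblank is hypothetical, io.StringIO stands in for a real file, and the with statement closes the file even if the loop body raises.

```python
import io

def nonblank(fp):
    """Yield each line of fp that is not empty or whitespace-only (hypothetical helper)."""
    for line in fp:
        if not line.strip():
            continue  # skip blank lines, exactly as in the loop above
        yield line

# io.StringIO stands in for open(filename, 'r') in this sketch.
seen = []
with io.StringIO("alpha\n\n  \nbeta\n") as fp:
    for line in nonblank(fp):
        seen.append(line)  # "do something" with each non-blank line

print(seen)  # ['alpha\n', 'beta\n']
```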

-Larry
 
tmallen

tmallen:


xlines = (line for line in open(filename) if line.strip())

Bye,
bearophile

I must be missing something:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'generator' object has no attribute 'sort'

What do you think?

Thomas
 
Steven D'Aprano

I'm parsing some text files, and I want to strip blank lines in the
process. Is there a simpler way to do this than what I have here?

lines = filter(lambda line: len(line.strip()) > 0, lines)

Thomas


lines = filter(lambda line: line.strip(), lines)
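This works because an empty string is false, so the stripped line can serve as the test directly. A short Python 3 check follows. list() is needed there because filter() returns an iterator in Python 3, whereas in the Python 2 of this thread it returned a list for list input. Passing str.strip itself as the predicate is an equivalent spelling, assuming every item is a str.

```python
lines = ["spam\n", "\n", "   \n", "eggs\n"]

# line.strip() is "" (false) for blank lines and non-empty (true) otherwise.
kept = list(filter(lambda line: line.strip(), lines))

# Same test without a lambda: str.strip is called on each line.
kept_too = list(filter(str.strip, lines))

print(kept)              # ['spam\n', 'eggs\n']
print(kept_too == kept)  # True
```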
 
Chris Rebert

I must be missing something:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'generator' object has no attribute 'sort'

What do you think?

xlines is a generator, not a list. If you don't know what a generator
is, see the relevant parts of the Python tutorial/manual (Google is
your friend).
To sort the generator, you can use 'sorted(xlines)', which returns a new list.
If you need it to actually be a list, you can do 'list(xlines)'.
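A Python 3 sketch of both suggestions, using a made-up list in place of a file. One thing worth knowing: a generator can only be consumed once, so after sorted() has read it, a second pass over xlines yields nothing.

```python
lines = ["pear\n", "\n", "apple\n", "  \n", "fig\n"]

xlines = (line for line in lines if line.strip())
ordered = sorted(xlines)   # sorted() accepts any iterable and returns a new list
print(ordered)             # ['apple\n', 'fig\n', 'pear\n']

leftover = list(xlines)    # the generator is already exhausted
print(leftover)            # []

# Building a list first gives you an object with a .sort() method.
as_list = [line for line in lines if line.strip()]
as_list.sort()
print(as_list == ordered)  # True
```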

Cheers,
Chris
 
bearophileHUGS

tmallen:
I must be missing something:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'generator' object has no attribute 'sort'

What do you think?

Congratulations, you have just met your first lazy construct ^_^
That's a generator, it yields nonblank lines one after the other. This
can be really useful.
If you want a real list of items, then you can do this:
lines = list(xlines)
Or use a list comp.:
lines = [line for line in open("new.data") if line.strip()]
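The laziness can be made visible with a source that records each line as it is read. In this Python 3 sketch the source() generator is a hypothetical stand-in for a file: creating xlines reads nothing, and each next() call pulls only as many lines as it needs to find the next non-blank one.

```python
pulled = []  # records every line the source hands out

def source():
    """Hypothetical stand-in for a file object."""
    for line in ["one\n", "\n", "two\n", "three\n"]:
        pulled.append(line)
        yield line

xlines = (line for line in source() if line.strip())
before = list(pulled)       # []: creating xlines reads nothing

first = next(xlines)        # 'one\n'
after_first = list(pulled)  # ['one\n']

second = next(xlines)       # 'two\n', after skipping the blank line
print(pulled)               # ['one\n', '\n', 'two\n']: 'three\n' is still unread
```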

Bye,
bearophile
 
Falcolas

I must be missing something:


<generator object at 0x6b648>
>>> xlines.sort()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'generator' object has no attribute 'sort'

What do you think?

Thomas

Using the surrounding parentheses creates a generator object, whereas
using square brackets would create a list. So, if you want to run list
operations on the resulting object, you'll want to use the list
comprehension instead.

i.e.

list_o_lines = [line for line in open(filename) if line.strip()]

The downside is the increased memory usage and processing time, since you pull
the entire file into memory; if you only plan to do a "for line in
xlines:" loop, it would be faster to use the generator.
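The memory side of this trade-off can be sketched with sys.getsizeof in Python 3. Two caveats: getsizeof reports only the container's own footprint, not the strings it refers to, and the demo keeps its input in a list (a real file would not be held in memory). The point is only that the list comprehension's footprint grows with the number of matching lines while the generator's stays small and fixed.

```python
import sys

# Made-up input: 100,000 non-blank lines held in memory for the demo.
lines = ["line %d\n" % i for i in range(100_000)]

as_list = [line for line in lines if line.strip()]  # stores a reference to every match
as_gen = (line for line in lines if line.strip())   # stores only its iteration state

list_size = sys.getsizeof(as_list)  # grows with len(as_list)
gen_size = sys.getsizeof(as_gen)    # independent of how many lines there are
print(list_size > gen_size)         # True
```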
 
tmallen

Between this info and the tutorial page at
http://www.python.org/doc/2.5.2/tut/node11.html#SECTION00111000000000000000000
I'm starting to understand how I'll use generators (I've seen them
mentioned before, but never used them knowingly).
list_o_lines = [line for line in open(filename) if line.strip()]

+1 for "list_o_lines"

Thanks for the help!
Thomas

 
Steve Holden

tmallen said:
I must be missing something:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'generator' object has no attribute 'sort'

What do you think?
I think there'd be no advantage to a sort method on a generator, since
theoretically the last item could be the first required in the sorted
sequence, so it's necessary to hold all items in memory to ensure the
sort is correct. So there's no point using a generator in the first place.

regards
Steve
 
Marc 'BlackJack' Rintsch

Ben said:
No. Using the generator expression syntax creates a generator object.

Parentheses are irrelevant to whether the expression is a generator
expression. The parentheses merely group the expression from surrounding
syntax.

No, they are important:

In [270]: a = x for x in xrange(10)
------------------------------------------------------------
File "<ipython console>", line 1
a = x for x in xrange(10)
^
<type 'exceptions.SyntaxError'>: invalid syntax


In [271]: a = (x for x in xrange(10))
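The same experiment can be repeated in Python 3 (where xrange is spelled range) without an interactive shell, by handing each statement to the built-in compile(). The compiles helper below is hypothetical and just reports whether the source parses.

```python
def compiles(source):
    """Return True if source is syntactically valid Python, False otherwise."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

bare = compiles("a = x for x in range(10)")       # no enclosing parentheses
grouped = compiles("a = (x for x in range(10))")  # parenthesized generator expression

print(bare, grouped)  # False True
```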

Ciao,
Marc 'BlackJack' Rintsch
 
Steven D'Aprano

I think there'd be no advantage to a sort method on a generator, since
theoretically the last item could be the first required in the sorted
sequence, so it's necessary to hold all items in memory to ensure the
sort is correct. So there's no point using a generator in the first
place.


You can't sort something lazily.

Actually, that's not *quite* true: it only holds for comparison sorts.
You can sort lazily using non-comparison sorts, such as Counting Sort:

http://en.wikipedia.org/wiki/Counting_sort
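A minimal counting sort written as a Python 3 generator shows the sense in which it can be lazy: it stores one counter per possible value rather than the items themselves, and it yields the sorted output one value at a time. It still has to read the entire input before it can yield the first value. The max_value parameter and the restriction to small non-negative integers are assumptions of this sketch.

```python
def counting_sort(items, max_value):
    """Yield the ints in items in ascending order.

    Assumes every item is an int with 0 <= item <= max_value.
    """
    counts = [0] * (max_value + 1)  # one counter per possible value
    for item in items:              # reads the whole input, one item at a time
        counts[item] += 1
    for value, count in enumerate(counts):
        for _ in range(count):      # emit each value as often as it was seen
            yield value

numbers = (n % 5 for n in range(12))  # lazily produced input: 0..4, 0..4, 0, 1
result = list(counting_sort(numbers, max_value=4))
print(result)  # [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 4]
```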

Arguably, the benefit of giving generators a sort() method would be to
avoid an explicit call to list. But I think many people would argue that
was actually a disadvantage, not a benefit, and that the call to list is
a good thing. I'd agree with them.

However, sorted() should take a generator argument, and in fact I see it
does:
[1, 2, 3, 4, 5]
 
Marc 'BlackJack' Rintsch

Your example shows only that they're important for grouping the
expression from surrounding syntax. As I said.

They are *not* important for making the expression be a generator
expression in the first place. Parentheses are irrelevant for the
generator expression syntax.

Okay, technically correct, but parentheses belong to generator expressions
because they have to be there to separate them from surrounding syntax,
except when there are already enclosing parentheses. So
parentheses are tied to generator expression syntax.

Ciao,
Marc 'BlackJack' Rintsch
 
Marc 'BlackJack' Rintsch

Marc 'BlackJack' Rintsch said:
Okay, technically correct, but parentheses belong to generator
expressions because they have to be there to separate them from
surrounding syntax, except when there are already enclosing
parentheses. So parentheses are tied to generator expression syntax.

No, I think that's factually wrong *and* confusing.
>>> list(i + 7 for i in range(10))
[7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

Does this demonstrate that parentheses are “tied to” integer literal
syntax? No.

You can use integer literals without parentheses, like the 7 above, but
you can't use generator expressions without them. They are always
there. In that way parentheses are tied to generator expressions.

If I see the pattern ``f(x) for x in obj if c(x)`` I look at whether it is
enclosed in parentheses or brackets to decide if it is a list
comprehension or a generator expression. That may not reflect the formal
grammar, but it is IMHO the easiest and most pragmatic way to look at this as
a human programmer.
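That reading rule can be checked in Python 3. The f, c and obj names from the pattern are filled in with hypothetical definitions so the example runs; the only difference between the two expressions is the enclosing brackets.

```python
import types

def f(x):  # hypothetical transform
    return x * x

def c(x):  # hypothetical condition: keep even numbers
    return x % 2 == 0

obj = range(10)

as_list = [f(x) for x in obj if c(x)]  # square brackets: a list comprehension
as_gen = (f(x) for x in obj if c(x))   # parentheses: a generator expression

print(type(as_list).__name__, type(as_gen).__name__)  # list generator
print(list(as_gen) == as_list)                         # True
```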

Ciao,
Marc 'BlackJack' Rintsch
 
Jorgen Grahn

....

Or if you want to filter/loop at the same time, or if you don't want all the
lines in memory at the same time:

Or if you want to support potentially infinite input streams, such as
a pipe or socket. There are many reasons this is my preferred way of
going through a text file.
fp = open(filename, 'r')
for line in fp:
    if not line.strip():
        continue

    #
    # Do something with the non-blank line:
    #

fp.close()

Often, you want to at least rstrip() all lines anyway,
for other reasons, and then the extra cost is even less:

line = line.rstrip()
if not line: continue
# do something with the rstripped, nonblank lines
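A self-contained Python 3 version of that loop follows, with io.StringIO standing in for the real file (an assumption for the demo). Because rstrip() removes the trailing newline along with any other trailing whitespace, a blank line is reduced to the empty string and a single test skips it.

```python
import io

# Made-up input with trailing spaces, a tab, and blank lines.
fake_file = io.StringIO("header  \n\n   \nrow 1\t\nrow 2\n")

kept = []
for line in fake_file:
    line = line.rstrip()  # drop trailing whitespace, including the newline
    if not line:
        continue          # a blank or whitespace-only line is now ""
    kept.append(line)     # do something with the rstripped, non-blank line

print(kept)  # ['header', 'row 1', 'row 2']
```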

/Jorgen
 
tmallen

Why do I feel like the coding style in Lutz' "Programming Python" is
very far from idiomatic Python? The content feels dated, and I find
that most answers that I get for Python questions use a different
style from the sort of code I see in this book.

Thomas
 
Lie

No, I think that's factually wrong *and* confusing.
    >>> list(i + 7 for i in range(10))
    [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
Does this demonstrate that parentheses are “tied to” integer literal
syntax? No.

You can use integer literals without parentheses, like the 7 above, but
you can't use generator expressions without them. They are always
there. In that way parentheses are tied to generator expressions.

If I see the pattern ``f(x) for x in obj if c(x)`` I look at whether it is
enclosed in parentheses or brackets to decide if it is a list
comprehension or a generator expression. That may not reflect the formal
grammar, but it is IMHO the easiest and most pragmatic way to look at this as
a human programmer.

Ciao,
        Marc 'BlackJack' Rintsch

The situation is similar to tuples. What makes a tuple is the commas,
not the parens.
What makes a generator expression is "<exp> for <var-or-tuple> in
<exp>".

Parentheses are generally required because without them it's almost
impossible to distinguish the expression from the surrounding syntax. But they
are not part of the formally required syntax.
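The tuple comparison above is easy to check in Python 3: the comma is what builds a tuple, and parentheses on their own only group. The empty tuple () is the one case where the parentheses themselves are the syntax.

```python
pair = 1, 2      # a tuple, built by the comma alone
grouped = (1)    # just the int 1: the parentheses only group
single = 1,      # a one-element tuple, thanks to the trailing comma
empty = ()       # the empty tuple needs the parentheses

print(type(pair).__name__, type(grouped).__name__, single, empty)
# tuple int (1,) ()
```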
 
Arnaud Delobelle

Lie said:
What makes a generator expression is "<exp> for <var-or-tuple> in
<exp>".

Parentheses are generally required because without them it's almost
impossible to distinguish the expression from the surrounding syntax. But they
are not part of the formally required syntax.

.... But *every* generator expression is surrounded by parentheses, isn't
it?
 
Steven D'Aprano

... But *every* generator expression is surrounded by parentheses, isn't
it?

Yes, but sometimes they are there in order to call a function, not to
form the generator expression.

I'm surprised that nobody yet has RTFM:

http://docs.python.org/reference/expressions.html

A generator expression is a compact generator notation in parentheses:

generator_expression ::= "(" expression genexpr_for ")"
genexpr_for ::= "for" target_list "in" or_test [genexpr_iter]
genexpr_iter ::= genexpr_for | genexpr_if
genexpr_if ::= "if" old_expression [genexpr_iter]

....
The parentheses can be omitted on calls with only one argument.
[end quote]

It seems to me that the FM says that the parentheses *are* part of the
syntax for a generator expression, but if some other syntactic construct
(e.g. a function call) provides the parentheses, then you don't need to
supply a second, redundant, pair.
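The quoted rule about calls can be tried out in Python 3. With a single argument, the call's own parentheses are enough; once there is a second argument, the generator expression needs its own pair, otherwise compile() rejects the source. The compiles helper is hypothetical.

```python
def compiles(source):
    """Return True if source is syntactically valid Python, False otherwise."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

total = sum(x * x for x in range(4))             # sole argument: 0 + 1 + 4 + 9 == 14
with_start = sum((x * x for x in range(4)), 10)  # second argument: inner parentheses needed

bare_ok = compiles("sum(x * x for x in range(4), 10)")  # rejected: must be parenthesized
print(total, with_start, bare_ok)  # 14 24 False
```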

I believe that this is the definitive answer, short of somebody reading
the source code and claiming the documentation is wrong.
 
Miles

Ben said:
No. Using the generator expression syntax creates a generator object.

Parentheses are irrelevant to whether the expression is a generator
expression. The parentheses merely group the expression from
surrounding syntax.

As others have pointed out, the parentheses are part of the generator
syntax. If not for the parentheses, a list comprehension would be
indistinguishable from a list literal whose single element is a
generator object. It's also worth remembering that list
comprehensions are distinct from generator expressions and don't
require the creation of a generator object.
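The list-literal point can be seen directly in Python 3: square brackets around a bare comprehension give a list of values, while square brackets around a parenthesized generator expression give a one-element list holding a generator object.

```python
import types

comprehension = [x for x in range(3)]  # list comprehension: [0, 1, 2]
wrapped = [(x for x in range(3))]      # list literal with one generator in it

print(comprehension)                            # [0, 1, 2]
print(len(wrapped), type(wrapped[0]).__name__)  # 1 generator
```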

-Miles
 
