Looking under Python's hood: Will we find a high performance or clunky engine?

Rick Johnson · Jan 22, 2012

What does Python do when presented with this code?

py> [line.strip('\n') for line in f.readlines()]

If Python reads all the file lines first and THEN iterates AGAIN to do
the strip; we are driving a Fred Flintstone mobile. If however Python
strips each line of the lines passed into readlines in one fell swoop,
we made the correct choice.

Which is it Pythonistas? Which is it?

Heiko Wundram · Jan 22, 2012

On 22.01.2012 at 16:50, Rick Johnson wrote:

What does Python do when presented with this code?

py> [line.strip('\n') for line in f.readlines()]

If Python reads all the file lines first and THEN iterates AGAIN to do
the strip; we are driving a Fred Flintstone mobile. If however Python
strips each line of the lines passed into readlines in one fell swoop,
we made the correct choice.

Which is it Pythonistas? Which is it?

You aren't one (considering how vocal you are in arguing for changes to
the language)?

So: shouldn't you be able to answer your own question?

Robert Kern · Jan 22, 2012

What does Python do when presented with this code?

py> [line.strip('\n') for line in f.readlines()]

If Python reads all the file lines first and THEN iterates AGAIN to do
the strip; we are driving a Fred Flintstone mobile. If however Python
strips each line of the lines passed into readlines in one fell swoop,
we made the correct choice.

Which is it Pythonistas? Which is it?

The .readlines() method is an old API that predates the introduction of
iterators to Python. The modern way to do this in one iteration is to use the
file object as an iterator:

[line.strip('\n') for line in f]
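
A minimal sketch of the difference, assuming an in-memory `io.StringIO` stands in for the real file `f` (the sample text is made up for illustration). Both forms produce the same list; the `readlines()` form just builds an intermediate list of raw lines first.

```python
import io

# Hypothetical sample content; a real file object behaves the same way.
text = "alpha\nbeta\ngamma\n"

# Eager: readlines() builds the complete list of lines first,
# then the comprehension walks that list a second time.
f = io.StringIO(text)
eager = [line.strip('\n') for line in f.readlines()]

# Lazy: iterating over the file object yields one line at a time,
# so only the final list of stripped lines is built.
f = io.StringIO(text)
lazy = [line.strip('\n') for line in f]

print(eager == lazy)
```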

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Michael Torrie · Jan 23, 2012

What does Python do when presented with this code?

py> [line.strip('\n') for line in f.readlines()]

If Python reads all the file lines first and THEN iterates AGAIN to do
the strip; we are driving a Fred Flintstone mobile. If however Python
strips each line of the lines passed into readlines in one fell swoop,
we made the correct choice.

Which is it Pythonistas? Which is it?

You're doing it wrong, obviously. I'm actually surprised that an expert
such as yourself would read a file in this way. In any language.
Surely you would iterate over the file object which is the obvious way
to do it.

I guess we'll chalk this up as another python pitfall. Looking forward
to your programming language which will prevent such things while
maintaining the purity and beauty of Python's ideals.

Rick Johnson · Jan 23, 2012

What does Python do when presented with this code?

py> [line.strip('\n') for line in f.readlines()]

If Python reads all the file lines first and THEN iterates AGAIN to do
the strip; we are driving a Fred Flintstone mobile. If however Python
strips each line of the lines passed into readlines in one fell swoop,
we made the correct choice.

Which is it Pythonistas? Which is it?

You're doing it wrong, obviously. I'm actually surprised that an expert
such as yourself would read a file in this way. In any language.
Surely you would iterate over the file object which is the obvious way
to do it.

That's just the point. If an expert such as myself can make a mistake
as simple as this, then one can only expect that the neophytes are going
to suffer greatly. I wonder how many tutorials are out there on the WWW
still teaching old ways of writing Python code? Old ways that have
been superseded by newer versions.

I guess we'll chalk this up as another python pitfall. Looking forward
to your programming language which will prevent such things while
maintaining the purity and beauty of Python's ideals.

Purity is a myth perpetrated by those who are blinded by their own
sanctimonious ideals of self worth. Any language that vows to be
"pure" (in a backwards compatible sense) is doomed to bitrot.

Programming languages and creators have a father and daughter
relationship.

The creator (father) wants his language (daughter) to be pure forever
and ever. You can lock her away in a tower with a titanium chastity
belt and stick your head in the sand but sooner or later a prince
charming (evolution) is going to come along and defile her. Will you
have the strength to realize that your daughter's life is more
important than your feeble attachments to nostalgic purity?

The fact is, Python's purity has been defiled already. And although we
must give GvR the credit he deserves for making these long overdue
changes, he did not go far enough. Essentially he has accepted his
daughter will no longer be pure but he refuses to acknowledge the new
communion. His internal drive to evolve hath waxed cold. He has the
license plate and the job title and that seems to be enough. Where is
the passion?

Michael Torrie · Jan 23, 2012

That's just the point. If an expert such as myself can make a mistake
as simple as this, then one can only expect that the neophytes are going
to suffer greatly. I wonder how many tutorials are out there on the WWW
still teaching old ways of writing Python code? Old ways that have
been superseded by newer versions.

Well that's that then. Everything is good.

Programming languages and creators have a father and daughter
relationship.

The creator (father) wants his language (daughter) to be pure forever
and ever.

Good thing you are so funny, or back into the plonk file you would go.
You're welcome to fork and deflower our pure Python if you want (you
seem to be quite stuck on sex metaphors). Please feel free. Looking
forward to your less-virginal take on the language. Seems like you know
what's best, and only you can save and ravish Python at the same time.

Steven D'Aprano · Jan 23, 2012

What does Python do when presented with this code?

py> [line.strip('\n') for line in f.readlines()]

If Python reads all the file lines first and THEN iterates AGAIN to do
the strip; we are driving a Fred Flintstone mobile.

Nonsense. File-like objects offer two APIs: there is a lazy iterator
approach, using the file-like object itself as an iterator, and an eager
read-it-all-at-once approach, offered by the venerable readlines()
method. readlines *deliberately* reads the entire file, and if you as a
developer do so by accident, you have no-one to blame but yourself. Only
a poor tradesman blames his tools instead of taking responsibility for
learning how to use them himself.

You should use whichever approach is more appropriate for your situation.
You might want to consider reading from the file as quickly as possible,
in one big chunk if you can, so you can close it again and let other
applications have access to it. Or you might not care. The choice is
yours.
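
As a rough sketch of that trade-off, assuming a small temporary file created only so the example runs (the contents and the `path` variable are hypothetical): the eager form lets the file be closed before any processing starts, while the lazy form keeps it open for the duration of the loop.

```python
import os
import tempfile

# Hypothetical sample file, created only so the sketch is runnable.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("one\ntwo\nthree\n")
    path = tmp.name

# Eager: read everything in one go; the file is closed as soon as
# the with-block ends, before any stripping happens.
with open(path) as f:
    lines = f.readlines()
stripped_eager = [line.strip('\n') for line in lines]

# Lazy: the file stays open while each line is read and stripped.
with open(path) as f:
    stripped_lazy = [line.strip('\n') for line in f]

os.remove(path)
```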

For small files, readlines() will probably be faster, although for small
files it won't make much practical difference. Who cares whether it takes
0.01ms or 0.02ms? For medium sized files, say, a few thousand lines, it
could go either way, depending on memory use, the size of the internal
file buffer, and implementation details. Only for files large enough that
allocating memory for all the lines at once becomes significant will lazy
iteration be a clear winner.

But if the file is that big, are you sure that a list comprehension is
the right tool in the first place?

In general, you should not care greatly which of the two you use, unless
profiling your application shows that this is the bottleneck.
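
One way to check this for a particular workload is to time both forms. The sketch below is only an illustration under stated assumptions: the 5,000-line data set is invented, and `io.StringIO` keeps everything in memory, so it measures the iteration overhead alone and ignores disk access entirely. The numbers will vary from machine to machine.

```python
import io
import timeit

# Assumed workload: 5,000 short lines held in memory (no disk involved).
data = "".join("line %d\n" % i for i in range(5000))

def eager():
    # readlines() materialises every line before the comprehension runs.
    return [line.strip('\n') for line in io.StringIO(data).readlines()]

def lazy():
    # Iterating the file-like object hands over one line at a time.
    return [line.strip('\n') for line in io.StringIO(data)]

t_eager = timeit.timeit(eager, number=20)
t_lazy = timeit.timeit(lazy, number=20)
print("readlines(): %.4fs  iteration: %.4fs" % (t_eager, t_lazy))
```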

But it is extremely unlikely that copying even a few thousands lines
around memory will be slower than reading them from disk in the first
place. Unless you expect to be handling truly large files, you've got
more important things to optimize before wasting time caring about this.

Grant Edwards · Jan 23, 2012

What does Python do when presented with this code?

It does what you tell it to. What else would you expect?

88888 Dihedral · Jan 23, 2012

On Monday, January 23, 2012 at 2:01:11 AM UTC+8, Robert Kern wrote:

What does Python do when presented with this code?

py> [line.strip('\n') for line in f.readlines()]

If Python reads all the file lines first and THEN iterates AGAIN to do
the strip; we are driving a Fred Flintstone mobile. If however Python
strips each line of the lines passed into readlines in one fell swoop,
we made the correct choice.

Which is it Pythonistas? Which is it?

The .readlines() method is an old API that predates the introduction of
iterators to Python. The modern way to do this in one iteration is to use the
file object as an iterator:

[line.strip('\n') for line in f]

This is more powerful by turning an object to be iterable.

But the list comprehension violates the basic operating
principle of the iteratee chaining rule in programming.

I know many python programmers just abandon the list comprehension
in non-trivial processes.

88888 Dihedral · Jan 23, 2012

On Monday, January 23, 2012 at 2:01:11 AM UTC+8, Robert Kern wrote:

What does Python do when presented with this code?

py> [line.strip('\n') for line in f.readlines()]

If Python reads all the file lines first and THEN iterates AGAIN to do
the strip; we are driving a Fred Flintstone mobile. If however Python
strips each line of the lines passed into readlines in one fell swoop,
we made the correct choice.

Which is it Pythonistas? Which is it?

The .readlines() method is an old API that predates the introduction of
iterators to Python. The modern way to do this in one iteration is to use the
file object as an iterator:

[line.strip('\n') for line in f]

This is more powerful by turning an object to be iterable.

But the list comprehension violates the basic operating
principle of the iteratee chaining rule in programming.

I know many python programmers just abandon the list comprehension
in non-trivial processes.

alex23 · Jan 24, 2012

On Monday, January 23, 2012 at 2:01:11 AM UTC+8, Robert Kern wrote:

[line.strip('\n') for line in f]

This is more powerful by turning an object to be iterable.
But the list comprehension violates the basic operating
principle of the iteratee chaining rule in programming.

Thankfully, the syntax is almost identical for generators, which are
chain-able:

noEOLs = (line.strip('\n') for line in f)
txtSuffix = (line for line in noEOLs if line.endswith('txt'))
...etc
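
A runnable version of the same idea, as a sketch only: the `io.StringIO` object and its file names are made-up stand-ins for a real file `f`. Nothing is read until the final stage is consumed, and each line passes through both stages one at a time.

```python
import io

# Hypothetical input standing in for an open file.
f = io.StringIO("notes.txt\nimage.png\nreadme.txt\n")

# Each generator expression wraps the previous one; no line is read yet.
no_eols = (line.strip('\n') for line in f)
txt_suffix = (line for line in no_eols if line.endswith('txt'))

# list() pulls lines through both stages, one line at a time.
result = list(txt_suffix)
print(result)  # ['notes.txt', 'readme.txt']
```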

I know many python programmers just abandon the list comprehension
in non-trivial processes.

Really? Observation of the python mailing list indicates the opposite:
people seem inclined to use them no matter what.

Also: PLEASE STOP DOUBLE POSTING.

Chris Angelico · Jan 24, 2012

Really? Observation of the python mailing list indicates the opposite:
people seem inclined to use them no matter what.

You're responding to a bot, although I think a human must be
post-processing and selecting for humour.

ChrisA
