altering an object as you iterate over it?

John Salerno

What is the best way of altering something (in my case, a file) while
you are iterating over it? I've tried this before by accident and got an
error, naturally.

I'm trying to read the lines of a file and remove all the blank ones.
One solution I tried is to open the file and use readlines(), then copy
that list into another variable, but this doesn't seem very efficient to
have two variables representing the file.

Perhaps there's also some better way to do it than this, including using
readlines(), but I'm most interested in just how you edit something as
you are iterating over it.

Thanks.
 
John Salerno

Slightly new question as well. Here's my code:

phonelist = open('file').readlines()
new_phonelist = phonelist

for line in phonelist:
    if line == '\n':
        new_phonelist.remove(line)

import pprint
pprint.pprint(new_phonelist)

But I notice that there are still several lines that print out as '\n',
so why doesn't it work for all lines?
 
bruno at modulix

John said:
What is the best way of altering something (in my case, a file) while
you are iterating over it? I've tried this before by accident and got an
error, naturally.

I'm trying to read the lines of a file and remove all the blank ones.
One solution I tried is to open the file and use readlines(), then copy
that list into another variable, but this doesn't seem very efficient to
have two variables representing the file.

If the file is huge, this can be a problem. But you cannot modify a text
file in place anyway.

For the general case, the best way to go would probably be an iterator:

def iterfilter(fileObj):
    for line in fileObj:
        if line.strip():
            yield line


f = open(path, 'r')
for line in iterfilter(f):
    doSomethingWith(line)

Now if what you want to do is just to rewrite the file without the blank
lines, you need to use a second file:

fin = open(path, 'r')
fout = open(temp, 'w')
for line in fin:
    if line.strip():
        fout.write(line)
fin.close()
fout.close()

then delete path and rename temp, and you're done. And yes, this is
actually the canonical way to do this !-)
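
The delete-and-rename step can be sketched in current Python as follows. This is a minimal sketch rather than bruno's exact code: the function name remove_blank_lines is hypothetical, and it assumes Python 3.3 or later, where os.replace() renames over the existing file in one step, so no separate delete is needed.

```python
import os
import tempfile


def remove_blank_lines(path):
    """Rewrite the file at `path` without its blank lines (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final
    # rename stays on the same filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as fout, open(path, "r") as fin:
            for line in fin:
                if line.strip():        # keep only lines with visible content
                    fout.write(line)
        os.replace(temp_path, path)     # swap the filtered copy into place
    except BaseException:
        os.remove(temp_path)            # do not leave the temp file behind
        raise
```

Calling remove_blank_lines('phones.txt') would leave a single file with the same name and no blank lines. Only one line is held in memory at a time, so the size of the file does not matter.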
 
John Salerno

bruno said:
Now if what you want to do is just to rewrite the file without the blank
lines, you need to use a second file:

fin = open(path, 'r')
fout = open(temp, 'w')
for line in fin:
    if line.strip():
        fout.write(line)
fin.close()
fout.close()

then delete path and rename temp, and you're done. And yes, this is
actually the canonical way to do this !-)

Thanks, that's what I want. Seems a little strange, but at least you
showed me that line.strip() is far better than line == '\n'
 
Paul McGuire

John Salerno said:
Slightly new question as well. Here's my code:

phonelist = open('file').readlines()
new_phonelist = phonelist

for line in phonelist:
    if line == '\n':
        new_phonelist.remove(line)

import pprint
pprint.pprint(new_phonelist)

But I notice that there are still several lines that print out as '\n',
so why doesn't it work for all lines?

Okay, so it looks like you are moving away from modifying a list while
iterating over it. In general this is good practice, that is, it is good
practice to *not* modify a list while iterating over it (although if you
*must* do this, it is possible, just iterate from back-to-front instead of
front to back, so that deletions don't mess up your "next" pointer).
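
The back-to-front approach mentioned in the parentheses can be sketched like this. The function name remove_blank_in_place and the sample data are made up for illustration. Deleting at index i only shifts the items after i, which the loop has already visited, so nothing gets skipped.

```python
def remove_blank_in_place(lines):
    """Delete blank entries from `lines` in place, walking from the end."""
    for i in range(len(lines) - 1, -1, -1):
        if not lines[i].strip():
            del lines[i]    # only shifts items the loop has already checked


phonelist = ["alice\n", "\n", "\n", "bob\n", "\n"]
remove_blank_in_place(phonelist)
print(phonelist)  # ['alice\n', 'bob\n']
```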

Your coding style is a little dated - are you using an old version of
Python? This style is the old-fashioned way:

noblanklines = []
lines = open("filename.dat").readlines()
for line in lines:
    if line != '\n':
        noblanklines.append(line)

1. open("xxx") still works - not sure if it's even deprecated or not - but
the new style is to use the file class
2. the file class is itself an iterator, so no need to invoke readlines
3. no need for such a simple for loop, a list comprehension will do the
trick - or even a generator expression passed to a list constructor.

So this construct collapses down to:

noblanklines = [ line for line in file("filename.dat") if line != '\n' ]


Now to your question about why '\n' lines persist into your new list. The
answer is - you are STILL UPDATING THE LIST YOU ARE ITERATING OVER!!!
Here's your code:

new_phonelist = phonelist

for line in phonelist:
    if line == '\n':
        new_phonelist.remove(line)

phonelist and new_phonelist are just two names bound to the same list! If
you have two consecutive '\n's in the file (say lines 3 and 4), then
removing the first (line 3) shortens the list by one, so that line 4 becomes
the new line 3. Then you advance to the next line, being line 4, and the
second '\n' has been skipped over.
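
The skipping described above can be reproduced with a small trace. This is an illustrative sketch with made-up data. It records which index the loop looks at on each pass, using a single list so that only the skipping effect is on show.

```python
lines = ["a\n", "\n", "\n", "b\n"]
visited = []

for index, line in enumerate(lines):
    visited.append((index, line))
    if line == "\n":
        lines.remove(line)  # shortens the list while the loop is running

print(visited)  # [(0, 'a\n'), (1, '\n'), (2, 'b\n')]
print(lines)    # ['a\n', '\n', 'b\n']
```

The second '\n' slid down into index 1 just after the loop had finished with index 1, so it was never examined and survives in the result.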

Also, don't confuse remove with del. new_phonelist.remove(line) does a
search of new_phonelist for the first matching entry of line. We know line
= '\n' - all this is doing is scanning through new_phonelist and removing
the first occurrence of '\n'. You'd do just as well with:

numEmptyLines = lines.count('\n')
for i in range( numEmptyLines ):
    lines.remove('\n')

You could also write this more compactly:

for i in range( lines.count('\n') ):
    lines.remove('\n')

The argument to range() is evaluated only once, before the loop starts, so
the count is not recomputed on each pass. Either way, every remove() call
rescans the list from the front, so expect sucky performance on long lists!

You might also want to strip whitespace from your lines - I expect while you
are removing blank lines, a line composed of all spaces and/or tabs would be
equally removable. Try this:

lines = map(str.rstrip, file("XYZZY.DAT") )

-- Paul
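
Stripping on its own leaves the now-empty strings in the list, so the two ideas are usually combined: test the stripped line, keep the original. Below is a minimal sketch in Python 3, where the function name non_blank_lines is hypothetical and io.StringIO stands in for a real file.

```python
import io


def non_blank_lines(fileobj):
    """Return the lines of `fileobj` that contain something besides whitespace."""
    return [line for line in fileobj if line.strip()]


sample = io.StringIO("alice\n   \n\t\nbob\n\n")
result = non_blank_lines(sample)
print(result)  # ['alice\n', 'bob\n']
```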
 
John Salerno

Paul said:
Your coding style is a little dated - are you using an old version of
Python? This style is the old-fashioned way:

I'm sure it has more to do with the fact that I'm new to Python, but
what is old-fashioned about open()? Does file() do anything different? I
know they are synonymous, but I like open because it seems like it's
more self-describing than 'file'.

Now to your question about why '\n' lines persist into your new list. The
answer is - you are STILL UPDATING THE LIST YOU ARE ITERATING OVER!!!

Doh! I see that now! :)

You might also want to strip whitespace from your lines

Another good hint. Thanks for the reply!
 
James Stroud

Paul said:
Your coding style is a little dated - are you using an old version of
Python? This style is the old-fashioned way: [clip]
1. open("xxx") still works - not sure if it's even deprecated or not - but
the new style is to use the file class


Python 2.3.4 (#4, Oct 25 2004, 21:40:10)
[GCC 3.3.2 (Mandrake Linux 10.0 3.3.2-6mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> open is file
True

James
--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
 
Paul McGuire

John Salerno said:
I'm sure it has more to do with the fact that I'm new to Python, but
what is old-fashioned about open()? Does file() do anything different? I
know they are synonymous, but I like open because it seems like it's
more self-describing than 'file'.

I think it is just part of the objectification trend - "f =
open('xyzzy.dat')" is sort of a functional/verb concept, so it has to return
something, and it's something non-objecty like a file handle - urk! Instead,
using "f = file('xyzzy.dat')" is more of an object construction concept - "I
am creating a file object around 'xyzzy.dat' that I will interact with." In
practice, yes, they both do the same thing. Note though, the asymmetry of
"f = open('blah')" and "f.close()" - there is no "close(f)". I see now in
the help for "file" this statement:

Note: open() is an alias for file().

Sounds like some global namespace pollution that may be up for review come
the new Python millennium (Py3K, that is).

Doh! I see that now! :)

Sorry about the ALL CAPS... I think I got a little rant-ish in that last
post, didn't mean to shout. :)

Thanks for being a good sport,
-- Paul
 
John Salerno

Paul said:
Sorry about the ALL CAPS... I think I got a little rant-ish in that last
post, didn't mean to shout. :)

Thanks for being a good sport,

Heh heh, actually it was the all caps that kept making me read it over
and over until I really knew what you were saying! :)
 
John Salerno

Paul said:
I think it is just part of the objectification trend - "f =
open('xyzzy.dat')" is sort of a functional/verb concept, so it has to return
something, and it's something non-objecty like a file handle - urk! Instead,
using "f = file('xyzzy.dat')" is more of an object construction concept

I see what you mean, but I think that's why I like using open, because I
like having my functions be verbs instead of nouns.

Note though, the asymmetry of
"f = open('blah')" and "f.close()" - there is no "close(f)".

I'm not sure that's a perfect comparison though, because the counterpart
of close(f) would be open(f), and whether you use file() or open(),
neither is taking f as the parameter like close() does, and you aren't
calling close() on 'blah' above.
 
Marc 'BlackJack' Rintsch

1. open("xxx") still works - not sure if it's even deprecated or not - but
the new style is to use the file class

It's not deprecated and may still be used for opening files. I guess the
main reason for introducing `file` as a synonym was the possibility to
inherit from builtins. Inheriting from `open` looks quite strange.

Ciao,
Marc 'BlackJack' Rintsch
 
Richard Townsend

Paul said:
Your coding style is a little dated - are you using an old version of
Python? This style is the old-fashioned way: [clip]
1. open("xxx") still works - not sure if it's even deprecated or not - but
the new style is to use the file class


Python 2.3.4 (#4, Oct 25 2004, 21:40:10)
[GCC 3.3.2 (Mandrake Linux 10.0 3.3.2-6mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> open is file
True

James

As part of a discussion on Python-Dev in 2004 about using open() or file()
Guido replied:
Then should the following line in the reference be changed?

"The file() constructor is new in Python 2.2. The previous spelling,
open(), is retained for compatibility, and is an alias for file()."

That *strongly* suggests that the preferred spelling is file(), and
that open() shouldn't be used for new code.

Oops, yes. I didn't write that, and it doesn't convey my feelings
about file() vs. open(). Here's a suggestion for better words:

"The file class is new in Python 2.2. It represents the type (class)
of objects returned by the built-in open() function. Its constructor
is an alias for open(), but for future and backwards compatibility,
open() remains preferred."


See: http://mail.python.org/pipermail/python-dev/2004-July/045931.html
 
Ben Finney

John Salerno said:
I see what you mean, but I think that's why I like using open,
because I like having my functions be verbs instead of nouns.

Note though that you're calling a class (in this case, type)
constructor, to return a new object. Do you find int(), dict(), set()
et al to be strange names for what they do?
 
Bruno Desthuilliers

John Salerno said:
Slightly new question as well. here's my code:

phonelist = open('file').readlines()

readlines() reads the whole file into memory. Take care, you may have
problems with huge files.

new_phonelist = phonelist

Whoops! Gotcha! Try adding this:

assert new_phonelist is phonelist

Got it? Python 'variables' are really name/object ref pairs, so here
you just made new_phonelist an alias for phonelist.

for line in phonelist:
    if line == '\n':

Replace this with:

    if not line.strip():

        new_phonelist.remove(line)

And you end up modifying the list in place while iterating over it, which
is usually a very bad idea.

Also, FWIW, you'd get the result you are after with:

phonelist = filter(str.strip, open('file'))

import pprint
pprint.pprint(new_phonelist)

But I notice that there are still several lines that print out as '\n',
so why doesn't it work for all lines?

Apart from the fact that it's usually safer to use line.strip(), the
main problem is that you modify the list in place while iterating over it.
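
The name/object point can be checked directly. In this small sketch (the variable names alias and copy and the sample data are made up), plain assignment gives a second name for the same list, while list() builds a separate one.

```python
phonelist = ["alice\n", "\n", "bob\n"]
alias = phonelist        # another name bound to the same list object
copy = list(phonelist)   # a new list holding the same items

alias.remove("\n")       # changes the one list that both names refer to

print(alias is phonelist)  # True
print(copy is phonelist)   # False
print(phonelist)           # ['alice\n', 'bob\n']
print(copy)                # ['alice\n', '\n', 'bob\n']
```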
 
Bruno Desthuilliers

John Salerno said:
I'm sure it has more to do with the fact that I'm new to Python, but
what is old-fashioned about open()?

At one time it was recommended to use file() instead of open().
Don't worry, open() is OK, and I guess almost everyone uses it.
 
Aahz

Paul said:
1. open("xxx") still works - not sure if it's even deprecated or not - but
the new style is to use the file class

Python 2.3.4 (#4, Oct 25 2004, 21:40:10)
[GCC 3.3.2 (Mandrake Linux 10.0 3.3.2-6mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> open is file
True

Python 2.5a2 (trunk:46052, May 19 2006, 19:54:46)
[GCC 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> open is file
False

Per the other comments in this thread, Guido agreed that making open() a
synonym of file() was a mistake, and my patch to split them was accepted.
Still need to do more doc update (per Uncle Timmy complaint), but that
shouldn't be too hard.
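
For readers on Python 3, where this question no longer comes up: the file builtin was removed and open() is an ordinary function that returns objects from the io module. The check below is a small sketch that assumes Python 3. It opens os.devnull only so that there is something harmless to open.

```python
import builtins
import io
import os

# Python 3 only: there is no file builtin any more.
has_file_builtin = hasattr(builtins, "file")
# The builtin open() and io.open are the same function.
open_is_io_open = open is io.open

# open() decides which class of object to return based on the mode.
with open(os.devnull, "r") as text_file:
    text_type = type(text_file)
with open(os.devnull, "rb") as binary_file:
    binary_type = type(binary_file)

print(has_file_builtin, open_is_io_open)
print(text_type.__name__, binary_type.__name__)
```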
 
Bruno Desthuilliers

bruno at modulix said:
(snip)

(responding to myself)
(but under another identity - now that's a bit schizophrenic, isn't it ?-)

For the general case, the best way to go would probably be an iterator:

def iterfilter(fileObj):
    for line in fileObj:
        if line.strip():
            yield line

f = open(path, 'r')
for line in iterfilter(f):
    doSomethingWith(line)

Which is good as an example of a simple iterator, but pretty useless since
we have itertools:

import itertools
f = open(path, 'r')
for line in itertools.ifilter(lambda l: l.strip(), f):
    doSomethingWith(line)
f.close()
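
On Python 3, itertools.ifilter no longer exists because the built-in filter() is itself lazy, so the same loop needs no import. Here is a minimal sketch under that assumption, with io.StringIO standing in for the opened file and a list named kept standing in for doSomethingWith().

```python
import io

f = io.StringIO("alice\n\n  \nbob\n")   # stand-in for open(path, 'r')
kept = []

# filter() yields lines one at a time; str.strip returns an empty (false)
# string for blank lines, so those are dropped.
for line in filter(str.strip, f):
    kept.append(line)
f.close()

print(kept)  # ['alice\n', 'bob\n']
```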
 
John Salerno

Aahz said:
Python 2.5a2 (trunk:46052, May 19 2006, 19:54:46)
[GCC 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> open is file
False

Per the other comments in this thread, Guido agreed that making open() a
synonym of file() was a mistake, and my patch to split them was accepted.
Still need to do more doc update (per Uncle Timmy complaint), but that
shouldn't be too hard.

Interesting. What is the difference between them now?
 
Paul McGuire

Bruno Desthuilliers said:
bruno at modulix said:
(snip)

(responding to myself)
(but under another identity - now that's a bit schizophrenic, isn't it ?-)

Do you ever flame yourself?

-- Paul
 
