Strange behavior with iterables - is this a bug?

akameswaran

Ok, I am confused about this one. I'm not sure if it's a bug or a
feature.. but
================================ RESTART
f1 = open('word1.txt')
f2 = open('word2.txt')
f3 = open('word3.txt')
print [(i1.strip(),i2.strip(),i3.strip(),) for i1 in f1 for i2 in f2 for i3 in f3]
[('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c')]

l1 = ['a\n','b\n','c\n']
l2 = ['a\n','b\n','c\n']
l3 = ['a\n','b\n','c\n']
print [(i1.strip(),i2.strip(),i3.strip(),) for i1 in l1 for i2 in l2 for i3 in l3]
[('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'a'),
('a', 'b', 'b'), ('a', 'b', 'c'), ('a', 'c', 'a'), ('a', 'c', 'b'),
('a', 'c', 'c'), ('b', 'a', 'a'), ('b', 'a', 'b'), ('b', 'a', 'c'),
('b', 'b', 'a'), ('b', 'b', 'b'), ('b', 'b', 'c'), ('b', 'c', 'a'),
('b', 'c', 'b'), ('b', 'c', 'c'), ('c', 'a', 'a'), ('c', 'a', 'b'),
('c', 'a', 'c'), ('c', 'b', 'a'), ('c', 'b', 'b'), ('c', 'b', 'c'),
('c', 'c', 'a'), ('c', 'c', 'b'), ('c', 'c', 'c')]

Explanation of the code: the files word1.txt, word2.txt and word3.txt are
all identical, containing the letters a, b and c, one letter per line.
I added the "\n" to the list items so that the lists are identical to what
the file objects return, just to eliminate any possible differences.


Notice that when using the file objects I don't get the proper set
of permutations. I was playing around with doing this via recursion,
but nothing was working, so I reduced it to the simplest case of
explicit nesting. Still no go.
Why does this not work with the file objects, or with any other class
I've made which implements __iter__ and next?

Seems like a bug to me, but maybe I am missing something. Seems to
happen in 2.3 and 2.4.
 
Terry Reedy

A file is something like an iterator and something like an iterable. Once
the innermost loop has run to completion the first time, the internal
cursor for f3 points at EOF. To reiterate through the file, you must
rewind it in the inner loops. So try (untested by me)

def initf(f):
    f.seek(0)
    return f

and ...for i2 in initf(f2) for i3 in initf(f3)
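
A minimal, self-contained sketch of the rewind idea above, written for Python 3 (the thread itself uses Python 2 print statements). The helper names `rewound` and `make_word_file`, and the temporary files standing in for word1.txt, word2.txt and word3.txt, are assumptions made for illustration only.

```python
import os
import tempfile


def rewound(f):
    """Rewind an open file to its start and return it so it can be looped over again."""
    f.seek(0)
    return f


def make_word_file():
    """Create a temporary stand-in for a word file holding a, b, c (one per line)."""
    tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
    tmp.write("a\nb\nc\n")
    tmp.close()
    return tmp.name


paths = [make_word_file() for _ in range(3)]
f1, f2, f3 = [open(p) for p in paths]
try:
    # f2 is rewound once per outer item, f3 once per middle item.
    triples = [(i1.strip(), i2.strip(), i3.strip())
               for i1 in f1
               for i2 in rewound(f2)
               for i3 in rewound(f3)]
finally:
    for f in (f1, f2, f3):
        f.close()
    for p in paths:
        os.remove(p)

print(len(triples))  # 27
```

Only the two inner files need rewinding, because the outer file is traversed exactly once.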

Lists are not file objects, though, and adding "\n" to the list items
did not eliminate the crucial difference: reiterability. Try your
experiment with StringIO objects, which are more nearly identical to
file objects.
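
The suggested experiment might look like the sketch below. It assumes Python 3, where `io.StringIO` plays the role of the Python 2 StringIO module mentioned above; the variable names are illustrative.

```python
import io

text = "a\nb\nc\n"

# Three independent in-memory "files"; each one is consumed as it is iterated.
s1, s2, s3 = io.StringIO(text), io.StringIO(text), io.StringIO(text)
from_streams = [(i1.strip(), i2.strip(), i3.strip())
                for i1 in s1 for i2 in s2 for i3 in s3]

# The same lines held in a list, which can be iterated any number of times.
lines = text.splitlines(True)  # True keeps the trailing "\n" on each item
from_lists = [(i1.strip(), i2.strip(), i3.strip())
              for i1 in lines for i2 in lines for i3 in lines]

print(from_streams)     # [('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c')]
print(len(from_lists))  # 27
```

The streams reproduce the truncated result seen with real files, while the list gives all 27 tuples.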

Terry Jan Reedy
 
Inyeol Lee

You're comparing a file, which is an ITERATOR, with a list, which is an
ITERABLE but not an ITERATOR. To get the result you want, use this instead:
print [(i1.strip(),i2.strip(),i3.strip(),)
       for i1 in open('word1.txt')
       for i2 in open('word2.txt')
       for i3 in open('word3.txt')]

FYI, to get the same buggy(?) result using lists, try this instead:
l1 = iter(['a\n','b\n','c\n'])
l2 = iter(['a\n','b\n','c\n'])
l3 = iter(['a\n','b\n','c\n'])
print [(i1.strip(),i2.strip(),i3.strip(),) for i1 in l1 for i2 in l2 for i3 in l3]
[('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c')]
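
The iterator/iterable distinction can be checked directly. The following is a small sketch in Python 3; the class name `Letters` is hypothetical and only illustrates one way a custom class could be made reiterable, by returning a fresh iterator from `__iter__` on every call.

```python
lines = ['a\n', 'b\n', 'c\n']

# A list is an iterable: every iter() call hands back a new, independent iterator.
print(iter(lines) is iter(lines))  # False

# An iterator returns itself from iter(), so a second loop resumes where the first stopped.
it = iter(lines)
print(iter(it) is it)              # True

first_pass = list(it)   # consumes everything
second_pass = list(it)  # nothing left to consume


class Letters:
    """Hypothetical reiterable container: __iter__ returns a fresh iterator each time."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


letters = Letters('abc')
pairs = [(x, y) for x in letters for y in letters]
print(len(pairs))  # 9
```

A class whose `__iter__` returns `self` behaves like `it` above and is exhausted after one pass, which would explain the same symptom with custom iterators.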


-Inyeol Lee
 
Gary Herron

List comprehension is a great shortcut, but when the shortcut starts
causing trouble, it's better to go with the old ways. You need to reopen
each file each time you want to iterate through it. You should be able to
understand the difference between the first two bits of code below.

The first bit opens each file once but uses two of them multiple times.
Reading from a file at EOF returns an empty sequence.

The second bit reopens a file each time it is reused. That works
correctly.

And that suggests the third bit of correctly working code, which uses a
list comprehension.

# Fails because files are opened once but reused
f1 = open('word1.txt')
f2 = open('word2.txt')
f3 = open('word3.txt')
for i1 in f1:
    for i2 in f2:
        for i3 in f3:
            print (i1.strip(),i2.strip(),i3.strip())

and

# Works because files are reopened for each reuse:
f1 = open('word1.txt')
for i1 in f1:
    f2 = open('word2.txt')
    for i2 in f2:
        f3 = open('word3.txt')
        for i3 in f3:
            print (i1.strip(),i2.strip(),i3.strip())

and

# Also works because files are reopened for each use:
print [(i1.strip(),i2.strip(),i3.strip())
       for i1 in open('word1.txt')
       for i2 in open('word2.txt')
       for i3 in open('word3.txt')]

Hope that's clear!

Gary Herron




 
akameswaran

My original problem was with recursion. I explicitly nested it out to
try to understand the behavior, and foolishly looked in the wrong
spot for the problem, which is that a file is not reiterable. In truth I
was never concerned about file objects; the problem was failing with my
own custom iterators (which also were not reiterable), and I switched to
file to eliminate possible code deficiencies on my own part. I was
simply chasing down the wrong problem. As was pointed out to me in
another thread, the cleanest implementation which would allow me to use
one copy of the file (in my example the files are identical) would be
to use a trivial iterator class that opens the file, uses tell to track
position and seek to set position, and returns the appropriate line for
that instance, thus eliminating unnecessary file opens and closes.
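
One way such a class could look is sketched below for Python 3. The class name `SharedFileLines` is hypothetical, and the temporary file stands in for the single word file; this is an illustration of the tell/seek idea, not the code from the other thread. Each call to `__iter__` keeps its own position, so nested loops over the same handle do not disturb one another.

```python
import tempfile


class SharedFileLines:
    """Hypothetical reiterable view over one open file.

    Every iteration remembers its own offset with tell() and restores it
    with seek() before reading, so several loops can share one handle.
    """

    def __init__(self, f):
        self._f = f

    def __iter__(self):
        pos = 0
        while True:
            self._f.seek(pos)          # restore this loop's position
            line = self._f.readline()
            if not line:               # empty string means EOF
                return
            pos = self._f.tell()       # remember where this loop stopped
            yield line


with tempfile.TemporaryFile("w+") as f:
    f.write("a\nb\nc\n")
    lines = SharedFileLines(f)
    triples = [(i1.strip(), i2.strip(), i3.strip())
               for i1 in lines for i2 in lines for i3 in lines]

print(len(triples))  # 27
```

The sketch uses `readline()` rather than the file's own iterator because, in Python 3, `tell()` is disabled on a text file while it is being iterated with `next()`.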
 
Gary Herron

I see.

I wouldn't call "tell" and "seek" clean. Here's another suggestion. Use
l1 = open(...).readlines()
to read the whole file into a (nicely reiterable) list residing in
memory, and then iterate through the list as you wish. Only if your
files are MANY megabytes long would this be a problem with memory
consumption. (But if they were that big, you wouldn't be trying to find
all permutations would you!)
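
A minimal sketch of the readlines approach, in Python 3. The temporary file is an assumed stand-in for the word file, since the real path is not given above.

```python
import os
import tempfile

# Create a small stand-in word file: a, b, c, one letter per line.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("a\nb\nc\n")

# Read the whole file once into a list, which can be iterated repeatedly.
with open(path) as fh:
    l1 = fh.readlines()
os.remove(path)

triples = [(i1.strip(), i2.strip(), i3.strip())
           for i1 in l1 for i2 in l1 for i3 in l1]

print(l1)            # ['a\n', 'b\n', 'c\n']
print(len(triples))  # 27
```

Because the three files in the example are identical, a single list can serve all three loops.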

Gary Herron
 
akameswaran

My original concern, and my reason for going the iterator/generator
route, was exactly very large lists :) It is unnecessary in this example,
but it is exactly what I was exploring. I wouldn't be using a list
comprehension to generate the permutations. All of this came from
creating a generator/iterator to handle very large permutations.
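
A recursive generator along these lines might look like the following Python 3 sketch. The function name `combos` is hypothetical, and it produces one tuple per combination of one item from each source (what this thread calls permutations). It assumes every source after the first is reiterable, which is exactly the property the file objects lacked.

```python
def combos(sources):
    """Lazily yield tuples containing one item from each source, in nested-loop order.

    Assumption: every source after the first can be iterated repeatedly
    (for example a list), because it is looped over once per item of the
    sources before it.
    """
    if not sources:
        yield ()
        return
    first, rest = sources[0], sources[1:]
    for item in first:
        for tail in combos(rest):
            yield (item,) + tail


letters = ['a', 'b', 'c']
gen = combos([letters, letters, letters])

print(next(gen))            # ('a', 'a', 'a')
print(next(gen))            # ('a', 'a', 'b')
print(sum(1 for _ in gen))  # 25 remaining
```

Results are produced one at a time, so the full set of tuples never has to be held in memory.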



 
