Strange behavior with iterables - is this a bug?

Discussion in 'Python' started by akameswaran@gmail.com, May 30, 2006.

  1. Guest

    Ok, I am confused about this one. I'm not sure if it's a bug or a
    feature, but:

    >>> ================================ RESTART
    >>> f1 = open('word1.txt')
    >>> f2 = open('word2.txt')
    >>> f3 = open('word3.txt')
    >>> print [(i1.strip(),i2.strip(),i3.strip(),) for i1 in f1 for i2 in f2 for i3 in f3]

    [('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c')]
    >>> l1 = ['a\n','b\n','c\n']
    >>> l2 = ['a\n','b\n','c\n']
    >>>
    >>> l3 = ['a\n','b\n','c\n']
    >>> print [(i1.strip(),i2.strip(),i3.strip(),) for i1 in l1 for i2 in l2 for i3 in l3]

    [('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'a'),
    ('a', 'b', 'b'), ('a', 'b', 'c'), ('a', 'c', 'a'), ('a', 'c', 'b'),
    ('a', 'c', 'c'), ('b', 'a', 'a'), ('b', 'a', 'b'), ('b', 'a', 'c'),
    ('b', 'b', 'a'), ('b', 'b', 'b'), ('b', 'b', 'c'), ('b', 'c', 'a'),
    ('b', 'c', 'b'), ('b', 'c', 'c'), ('c', 'a', 'a'), ('c', 'a', 'b'),
    ('c', 'a', 'c'), ('c', 'b', 'a'), ('c', 'b', 'b'), ('c', 'b', 'c'),
    ('c', 'c', 'a'), ('c', 'c', 'b'), ('c', 'c', 'c')]

    Explanation of code: the files word1.txt, word2.txt and word3.txt are
    all identical, containing the letters a, b and c, one letter per line.
    To the lists I've added the "\n" so that the lists are identical to what
    is returned by the file objects, just to eliminate any possible
    differences.


    As you can see, when using the file objects I don't get the proper set
    of permutations. I was playing around with doing this via recursion,
    etc., but nothing was working, so I made the simplest nested case.
    Still no go.
    Why does this not work with the file objects, or with any other class
    I've made which implements __iter__ and next?

    Seems like a bug to me, but maybe I am missing something. It seems to
    happen in both 2.3 and 2.4.
     
    , May 30, 2006
    #1

  2. Terry Reedy Guest

    <> wrote in message
    news:...
    > Ok, I am confused about this one. I'm not sure if it's a bug or a
    > feature, but:
    >
    >>>> ================================ RESTART
    >>>> f1 = open('word1.txt')
    >>>> f2 = open('word2.txt')
    >>>> f3 = open('word3.txt')
    >>>> print [(i1.strip(),i2.strip(),i3.strip(),) for i1 in f1 for i2 in f2
    >>>> for i3 in f3]

    > [('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c')]


    A file is something like an iterator and something like an iterable. At
    this point, the internal cursor for f3 points at EOF. To reiterate through
    the file, you must rewind it in the inner loops. So try (untested by me)

        def initf(f):
            f.seek(0)  # rewind to the start of the file
            return f

    and ...for i2 in initf(f2) for i3 in initf(f3)
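    Terry's rewind-in-the-inner-loops idea can be checked with a small sketch in modern Python 3. The helper initf follows his suggestion; the temporary directory and the three generated word files are assumptions made so the example is self-contained. Because the inner for clauses of a comprehension are re-evaluated on every pass of the outer ones, f2 is rewound once per i1 and f3 once per (i1, i2) pair.

```python
import os
import tempfile


def initf(f):
    """Rewind a file object to its first byte and hand it back for iteration."""
    f.seek(0)
    return f


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical test data: three identical files with one letter per line.
    paths = []
    for n in (1, 2, 3):
        path = os.path.join(tmp, "word%d.txt" % n)
        with open(path, "w") as out:
            out.write("a\nb\nc\n")
        paths.append(path)

    f1, f2, f3 = (open(p) for p in paths)
    try:
        # initf(f2) runs once per i1 and initf(f3) once per (i1, i2) pair,
        # so the inner files start from the top on every pass.
        result = [(i1.strip(), i2.strip(), i3.strip())
                  for i1 in f1 for i2 in initf(f2) for i3 in initf(f3)]
    finally:
        for f in (f1, f2, f3):
            f.close()

print(len(result))  # 27
```

    The outer file f1 never needs rewinding, since it is walked exactly once.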


    >>>> l1 = ['a\n','b\n','c\n']
    >>>> l2 = ['a\n','b\n','c\n']
    >>>>
    >>>> l3 = ['a\n','b\n','c\n']
    >>>> print [(i1.strip(),i2.strip(),i3.strip(),) for i1 in l1 for i2 in l2
    >>>> for i3 in l3]

    > [('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'a'),
    > ('a', 'b', 'b'), ('a', 'b', 'c'), ('a', 'c', 'a'), ('a', 'c', 'b'),
    > ('a', 'c', 'c'), ('b', 'a', 'a'), ('b', 'a', 'b'), ('b', 'a', 'c'),
    > ('b', 'b', 'a'), ('b', 'b', 'b'), ('b', 'b', 'c'), ('b', 'c', 'a'),
    > ('b', 'c', 'b'), ('b', 'c', 'c'), ('c', 'a', 'a'), ('c', 'a', 'b'),
    > ('c', 'a', 'c'), ('c', 'b', 'a'), ('c', 'b', 'b'), ('c', 'b', 'c'),
    > ('c', 'c', 'a'), ('c', 'c', 'b'), ('c', 'c', 'c')]
    >
    > Explanation of code: the files word1.txt, word2.txt and word3.txt are
    > all identical, containing the letters a, b and c, one letter per line.
    > To the lists I've added the "\n" so that the lists are identical to what
    > is returned by the file objects, just to eliminate any possible
    > differences.


    But lists are not file objects and you did not eliminate the crucial
    difference in reiterability. Try your experiment with StringIO objects,
    which are more nearly identical to file objects.
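    Terry's StringIO experiment is easy to run in modern Python 3, where io.StringIO plays the role of the old StringIO module. In this sketch (the three-letter data is copied from the original post) the in-memory streams reproduce the three-tuple result, while a plain list produces all 27 combinations because each for clause asks the list for a fresh iterator.

```python
import io

data = "a\nb\nc\n"

# In-memory stand-ins for word1.txt, word2.txt and word3.txt.
s1, s2, s3 = io.StringIO(data), io.StringIO(data), io.StringIO(data)
from_streams = [(i1.strip(), i2.strip(), i3.strip())
                for i1 in s1 for i2 in s2 for i3 in s3]
print(from_streams)  # [('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c')]

# A list hands out a new iterator every time a loop starts over it.
lines = ["a\n", "b\n", "c\n"]
from_lists = [(i1.strip(), i2.strip(), i3.strip())
              for i1 in lines for i2 in lines for i3 in lines]
print(len(from_lists))  # 27
```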

    Terry Jan Reedy
     
    Terry Reedy, May 30, 2006
    #2

  3. Inyeol Lee Guest

    On Tue, May 30, 2006 at 01:11:26PM -0700, wrote:
    [...]
    > >>> ================================ RESTART
    > >>> f1 = open('word1.txt')
    > >>> f2 = open('word2.txt')
    > >>> f3 = open('word3.txt')
    > >>> print [(i1.strip(),i2.strip(),i3.strip(),) for i1 in f1 for i2 in f2 for i3 in f3]

    > [('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c')]
    > >>> l1 = ['a\n','b\n','c\n']
    > >>> l2 = ['a\n','b\n','c\n']
    > >>>
    > >>> l3 = ['a\n','b\n','c\n']
    > >>> print [(i1.strip(),i2.strip(),i3.strip(),) for i1 in l1 for i2 in l2 for i3 in l3]

    > [('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'a'),
    > ('a', 'b', 'b'), ('a', 'b', 'c'), ('a', 'c', 'a'), ('a', 'c', 'b'),
    > ('a', 'c', 'c'), ('b', 'a', 'a'), ('b', 'a', 'b'), ('b', 'a', 'c'),
    > ('b', 'b', 'a'), ('b', 'b', 'b'), ('b', 'b', 'c'), ('b', 'c', 'a'),
    > ('b', 'c', 'b'), ('b', 'c', 'c'), ('c', 'a', 'a'), ('c', 'a', 'b'),
    > ('c', 'a', 'c'), ('c', 'b', 'a'), ('c', 'b', 'b'), ('c', 'b', 'c'),
    > ('c', 'c', 'a'), ('c', 'c', 'b'), ('c', 'c', 'c')]
    >
    > Explanation of code: the files word1.txt, word2.txt and word3.txt are
    > all identical, containing the letters a, b and c, one letter per line.
    > To the lists I've added the "\n" so that the lists are identical to what
    > is returned by the file objects, just to eliminate any possible
    > differences.


    You're comparing a file, which is an ITERATOR, with a list, which is an
    ITERABLE but not an ITERATOR. To get the result you want, use this instead:

    >>> print [(i1.strip(),i2.strip(),i3.strip(),)
    ...        for i1 in open('word1.txt')
    ...        for i2 in open('word2.txt')
    ...        for i3 in open('word3.txt')]

    FYI, to get the same buggy(?) result using lists, try this instead:

    >>> l1 = iter(['a\n','b\n','c\n'])
    >>> l2 = iter(['a\n','b\n','c\n'])
    >>> l3 = iter(['a\n','b\n','c\n'])
    >>> print [(i1.strip(),i2.strip(),i3.strip(),) for i1 in l1 for i2 in l2 for i3 in l3]

    [('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c')]
    >>>
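    Inyeol's distinction can be tested directly. The sketch below (modern Python 3; the helper name is_one_shot is hypothetical) relies on the iterator protocol: an iterator's __iter__ returns the object itself, so iter(obj) is obj is a quick check, not a guarantee, for spotting one-pass objects such as files, while a list returns a brand-new iterator each time.

```python
import io


def is_one_shot(obj):
    """Return True if obj is its own iterator, i.e. it can be walked only once."""
    return iter(obj) is obj


lines = ["a\n", "b\n", "c\n"]
print(is_one_shot(lines))                     # False: a list is a reiterable container
print(is_one_shot(iter(lines)))               # True: a list_iterator is one-pass
print(is_one_shot(io.StringIO("a\nb\nc\n")))  # True: file-like objects iterate themselves
```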



    -Inyeol Lee
     
    Inyeol Lee, May 30, 2006
    #3
  4. Gary Herron Guest

    wrote:

    >Ok, I am confused about this one. I'm not sure if it's a bug or a
    >feature, but:
    >
    >

    List comprehension is a great shortcut, but when the shortcut starts
    causing trouble, it's better to go back to the old ways. You need to
    reopen each file each time you want to iterate through it. You should be
    able to understand the difference between the first two bits of code
    below.

    The first bit opens each file once but uses two of them (f2 and f3)
    multiple times. Reading from a file at EOF returns an empty sequence.

    The second bit reopens the file each time you want to reuse it. That
    works correctly.

    And that suggests the third bit of correctly working code, which uses a
    list comprehension.

    # Fails because files are opened once but reused
    f1 = open('word1.txt')
    f2 = open('word2.txt')
    f3 = open('word3.txt')
    for i1 in f1:
        for i2 in f2:
            for i3 in f3:
                print (i1.strip(),i2.strip(),i3.strip())

    and

    # Works because files are reopened for each reuse:
    f1 = open('word1.txt')
    for i1 in f1:
        f2 = open('word2.txt')
        for i2 in f2:
            f3 = open('word3.txt')
            for i3 in f3:
                print (i1.strip(),i2.strip(),i3.strip())

    and

    # Also works because files are reopened for each use:
    print [(i1.strip(),i2.strip(),i3.strip())
           for i1 in open('word1.txt')
           for i2 in open('word2.txt')
           for i3 in open('word3.txt')]
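    Gary's third version reopens each file on every pass but never closes the handles explicitly, leaving that to the garbage collector. A variant in modern Python 3 (the lines_of generator and the temporary files are assumptions for illustration) wraps the reopening in a generator with a with block, so every handle is closed as soon as its loop finishes.

```python
import os
import tempfile


def lines_of(path):
    """Open path afresh on each call and yield its stripped lines.

    The with block closes the file once the generator is exhausted.
    """
    with open(path) as f:
        for line in f:
            yield line.strip()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical data: three identical files, one letter per line.
    p1, p2, p3 = (os.path.join(tmp, "word%d.txt" % n) for n in (1, 2, 3))
    for p in (p1, p2, p3):
        with open(p, "w") as out:
            out.write("a\nb\nc\n")

    # Each inner for clause calls lines_of() again, so each pass reopens the file.
    combos = [(i1, i2, i3)
              for i1 in lines_of(p1)
              for i2 in lines_of(p2)
              for i3 in lines_of(p3)]

print(len(combos))  # 27
```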

    Hope that's clear!

    Gary Herron
     
    Gary Herron, May 30, 2006
    #4
  5. Guest

    DOH!!

    Thanks a lot. It had to be something stupid on my part.

    Now I get it :)
     
    , May 30, 2006
    #5
  6. Guest

    Gary Herron wrote:
    > List comprehension is a great shortcut, but when the shortcut starts
    > causing trouble, better to go with the old ways. You need to reopen each
    > file each time you want to iterate through it. You should be able to
    > understand the difference between these two bits of code.
    >
    > The first bit opens each file but uses (two of them) multiple times.
    > Reading from a file at EOF returns an empty sequence.



    My original problem was with recursion. I explicitly nested it out to
    try to understand the behavior, and foolishly looked in the wrong spot
    for the problem, namely that a file is not reiterable. In truth I was
    never concerned about file objects; the problem was failing with my own
    custom iterators (which also were not reiterable), and I switched to
    files to eliminate possible deficiencies in my own code. I was simply
    chasing down the wrong problem. As was pointed out to me in another
    thread, the cleanest implementation, which would allow me to use one
    copy of the file (in my example the files are identical), would be a
    trivial iterator class that opens the file, uses tell to track position
    and seek to set position, and returns the appropriate line for that
    instance, thus eliminating unnecessary file opens and closes.
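    The tell/seek design described above can be sketched as a reiterable class in modern Python 3. This is one possible reading of that suggestion, not the implementation from the other thread; the class name SeekingLines is hypothetical. The key point is that __iter__ returns a new generator on each call, so each loop keeps its own offset while all of them share one open stream. It uses readline() rather than "for line in f", because Python 3 text files refuse tell() while they are being iterated with next().

```python
import io


class SeekingLines:
    """Reiterable view of one shared text stream (hypothetical helper).

    Every call to __iter__ starts an independent pass that remembers its own
    position, so nested loops over the same object do not disturb each other.
    """

    def __init__(self, stream):
        self.stream = stream

    def __iter__(self):
        pos = 0  # each new pass starts at the beginning of the stream
        while True:
            self.stream.seek(pos)        # restore this pass's position
            line = self.stream.readline()
            if not line:                 # '' means end of file
                return
            pos = self.stream.tell()     # remember where this pass stopped
            yield line


words = SeekingLines(io.StringIO("a\nb\nc\n"))  # could also wrap open('word1.txt')
combos = [(i1.strip(), i2.strip(), i3.strip())
          for i1 in words for i2 in words for i3 in words]
print(len(combos))  # 27
```

    The same pattern answers the custom-iterator case: a class whose __iter__ returns a fresh iterator (here a generator) is reiterable, while one whose __iter__ returns self is not.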
     
    , May 31, 2006
    #6
  7. Gary Herron Guest

    wrote:

    >
    >
    >My original problem was with recursion. I explicitly nested it out to
    >try to understand the behavior, and foolishly looked in the wrong spot
    >for the problem, namely that a file is not reiterable. In truth I was
    >never concerned about file objects; the problem was failing with my own
    >custom iterators (which also were not reiterable), and I switched to
    >files to eliminate possible deficiencies in my own code. I was simply
    >chasing down the wrong problem. As was pointed out to me in another
    >thread, the cleanest implementation, which would allow me to use one
    >copy of the file (in my example the files are identical), would be a
    >trivial iterator class that opens the file, uses tell to track position
    >and seek to set position, and returns the appropriate line for that
    >instance, thus eliminating unnecessary file opens and closes.
    >
    >
    >

    I see.

    I wouldn't call "tell" and "seek" clean. Here's another suggestion. Use

        l1 = open(...).readlines()

    to read the whole file into a (nicely reiterable) list residing in
    memory, and then iterate through the list as you wish. Only if your
    files are MANY megabytes long would memory consumption be a problem.
    (But if they were that big, you wouldn't be trying to find all
    permutations, would you!)
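    Gary's readlines() suggestion pairs naturally with itertools.product, which was added to the standard library later (Python 2.6) and builds the same Cartesian product the nested loops compute. In this Python 3 sketch, io.StringIO stands in for open('word1.txt'); with a real file the same readlines() call applies.

```python
import io
import itertools

source = io.StringIO("a\nb\nc\n")  # stand-in for open('word1.txt')
words = [line.strip() for line in source.readlines()]  # a reiterable list in memory

# product(words, repeat=3) is equivalent to three nested loops over words.
combos = list(itertools.product(words, repeat=3))
print(len(combos))  # 27
print(combos[:3])   # [('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c')]
```

    Note that itertools.product itself reads each input fully into memory before producing anything, so it carries the same memory trade-off Gary describes.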

    Gary Herron
     
    Gary Herron, May 31, 2006
    #7
  8. Guest

    My original concern, and my reason for going the iterator/generator
    route, was exactly large lists :) It's unnecessary in this example, but
    it's exactly what I was exploring. I wouldn't be using a list
    comprehension for generating the permutations. Where all this came from
    was creating a generator/iterator to handle very large permutations.
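    For very large inputs, one way to keep the lazy, recursive approach described here is to pass each level a zero-argument factory that returns a fresh iterator, rather than a single iterator that gets exhausted after its first pass. The sketch below (Python 3; product_lazy, letters and the factory convention are hypothetical) rebuilds the inner iterators on every pass and never materializes whole inputs, unlike itertools.product.

```python
def product_lazy(factories):
    """Yield the Cartesian product of the iterables produced by factories.

    factories is a sequence of zero-argument callables; each must return a
    fresh iterator every time it is called (for example, a function that
    reopens a file). Only the current item at each level is kept.
    """
    if not factories:
        yield ()
        return
    first, rest = factories[0], factories[1:]
    for item in first():  # a fresh iterator for this level on every pass
        for tail in product_lazy(rest):
            yield (item,) + tail


def letters():
    # Hypothetical stand-in for a function that reopens word1.txt.
    return iter(["a", "b", "c"])


combos = list(product_lazy([letters, letters, letters]))
print(len(combos))  # 27
```

    With real files, a factory could be a small function that opens the file and yields its stripped lines inside a with block, so each pass also closes its handle.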



     
    , May 31, 2006
    #8