Help me understand this iterator

Discussion in 'Python' started by LaundroMat, Oct 31, 2006.

  1. LaundroMat

    LaundroMat Guest

    Hi,

    I've found this script over at effbot
    (http://effbot.org/librarybook/os-path.htm), and I can't get my head
    around its inner workings. Here's the script:

    import os

    class DirectoryWalker:
        # a forward iterator that traverses a directory tree

        def __init__(self, directory):
            self.stack = [directory]
            self.files = []
            self.index = 0

        def __getitem__(self, index):
            while 1:
                try:
                    file = self.files[self.index]
                    self.index = self.index + 1
                except IndexError:
                    # pop next directory from stack
                    self.directory = self.stack.pop()
                    self.files = os.listdir(self.directory)
                    self.index = 0
                else:
                    # got a filename
                    fullname = os.path.join(self.directory, file)
                    if os.path.isdir(fullname) and not os.path.islink(fullname):
                        self.stack.append(fullname)
                    return fullname

    for file in DirectoryWalker("."):
        print file

    Now, if I look at this script step by step, I don't understand:
    - what is being iterated over (what is being called by "file in
    DirectoryWalker()"?);
    - where it gets the "index" value from;
    - where the "while 1:"-loop is quitted.

    Thanks in advance,

    Mathieu
     
    LaundroMat, Oct 31, 2006
    #1

  2. Peter Otten

    Peter Otten Guest

    LaundroMat wrote:

    > Hi,
    >
    > I've found this script over at effbot
    > (http://effbot.org/librarybook/os-path.htm), and I can't get my head
    > around its inner workings. Here's the script:
    >
    > import os
    >
    > class DirectoryWalker:
    >     # a forward iterator that traverses a directory tree
    >
    >     def __init__(self, directory):
    >         self.stack = [directory]
    >         self.files = []
    >         self.index = 0
    >
    >     def __getitem__(self, index):
    >         while 1:
    >             try:
    >                 file = self.files[self.index]
    >                 self.index = self.index + 1
    >             except IndexError:
    >                 # pop next directory from stack
    >                 self.directory = self.stack.pop()
    >                 self.files = os.listdir(self.directory)
    >                 self.index = 0
    >             else:
    >                 # got a filename
    >                 fullname = os.path.join(self.directory, file)
    >                 if os.path.isdir(fullname) and not os.path.islink(fullname):
    >                     self.stack.append(fullname)
    >                 return fullname
    >
    > for file in DirectoryWalker("."):
    >     print file
    >
    > Now, if I look at this script step by step, I don't understand:
    > - what is being iterated over (what is being called by "file in
    > DirectoryWalker()"?);
    > - where it gets the "index" value from;
    > - where the "while 1:"-loop is quitted.


    With

    dw = DirectoryWalker(".")

    the for loop is equivalent to

    index = 0 # internal variable, not visible from Python
    while True:
        try:
            file = dw[index] # invokes dw.__getitem__(index)
        except IndexError:
            break
        print file
        index += 1 # the next pass asks for the next index

    This is an old way of iterating over a sequence which is only used when the
    iterator-based approach

    dwi = iter(dw) # invokes dw.__iter__()
    while True:
        try:
            file = dwi.next()
        except StopIteration:
            break
        print file

    fails.
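
    A minimal sketch of the old protocol in isolation. The class name
    Countdown and its seen_indexes attribute are made up for illustration,
    and the code is written for Python 3 (the thread itself uses Python 2
    syntax); the __getitem__ fallback still works there. It records every
    index the for loop passes in, including the last one that triggers the
    IndexError.

```python
class Countdown:
    """Hypothetical class that supports only the old __getitem__ protocol."""

    def __init__(self, start):
        self.start = start
        self.seen_indexes = []  # every index the for loop passes in

    def __getitem__(self, index):
        self.seen_indexes.append(index)
        if index >= self.start:
            # Raising IndexError is what tells the for loop to stop.
            raise IndexError(index)
        return self.start - index


countdown = Countdown(3)
values = []
for value in countdown:
    values.append(value)

print(values)                  # [3, 2, 1]
print(countdown.seen_indexes)  # [0, 1, 2, 3]
```

    The final index, 3, never produces a value; it only exists so that
    __getitem__ gets the chance to raise IndexError.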

    Peter
     
    Peter Otten, Oct 31, 2006
    #2

  3. LaundroMat wrote:

    > Now, if I look at this script step by step, I don't understand:
    > - what is being iterated over (what is being called by "file in
    > DirectoryWalker()"?);


    as explained in the text above the script, this class emulates a
    sequence. it does this by implementing the __getitem__ method:

    http://effbot.org/pyref/__getitem__

    > - where it gets the "index" value from;


    from the call to __getitem__ done by the for-in loop.

    > - where the "while 1:"-loop is quitted.


    the loop stops when the stack is empty, and pop raises an IndexError
    exception.

    note that this is an old example; code written for newer versions of
    Python would probably use a recursing generator instead (see the source
    code for os.walk in the standard library for an example).
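
    a rough sketch of what such a recursing generator could look like,
    assuming Python 3.3 or later (for "yield from"). the function name
    walk_files and the throwaway directory layout are made up for this
    example, and sorted() is only there to make the order predictable.
    this is not how os.walk itself is written; os.walk yields
    (dirpath, dirnames, filenames) tuples instead of single names.

```python
import os
import tempfile


def walk_files(directory):
    """Yield the full name of every entry below directory, depth first."""
    for name in sorted(os.listdir(directory)):
        fullname = os.path.join(directory, name)
        yield fullname
        if os.path.isdir(fullname) and not os.path.islink(fullname):
            # Recurse into the subdirectory and pass its results along.
            yield from walk_files(fullname)


# Build a small throwaway tree so the example does not depend on the
# contents of the current directory.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    for relative in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, relative), "w") as handle:
            handle.write("demo")
    found = [os.path.relpath(path, root) for path in walk_files(root)]

print(found)  # a.txt, then sub, then b.txt inside sub
```

    the generator needs no stack, no index and no exception handling; the
    position in each directory listing lives in the suspended for loops.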

    </F>
     
    Fredrik Lundh, Oct 31, 2006
    #3
  4. On Tue, 31 Oct 2006 03:36:08 -0800, LaundroMat wrote:

    > Hi,
    >
    > I've found this script over at effbot
    > (http://effbot.org/librarybook/os-path.htm), and I can't get my head
    > around its inner workings.


    [snip code]

    > Now, if I look at this script step by step, I don't understand:
    > - what is being iterated over (what is being called by "file in
    > DirectoryWalker()"?);


    What is being iterated over is the files in the directory tree that
    starts at the current directory. In Unix land (and probably DOS/Windows
    as well) the directory "." means "this directory, right here".


    > - where it gets the "index" value from;


    When Python sees a line like "for x in obj:" it does some special
    magic. First it looks to see if obj has an "__iter__" method; if so, it
    calls that to get an iterator and then calls the iterator's next() method
    repeatedly. That's not the case here --
    DirectoryWalker is an old-style iterator, not one of the fancy new ones.

    Instead, Python tries calling obj[index] starting at 0 and keeps going
    until an IndexError exception is raised, then it halts the for loop.

    So, think of it like this: pretend that Python expands the following code:

    for x in obj:
        block

    into something like this:

    index = 0
    while True: # loop forever
        try:
            x = obj[index]
        except IndexError:
            # catch the exception and escape the while loop
            break
        block # can use x in block
        index = index + 1
    # and now we're done, continue the rest of the program

    That's not exactly what Python does, of course, it is much more efficient,
    but that's a good picture of what happens.
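
    A small sketch of the lookup order, written for Python 3: the for loop
    asks for an iterator via iter(obj), which uses __iter__ when the class
    defines it and only falls back to calling __getitem__ with 0, 1, 2, ...
    when it does not. The class names OldStyle and Both are invented for
    this example.

```python
class OldStyle:
    """Hypothetical class with only __getitem__ (the old protocol)."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return "item %d" % index


class Both(OldStyle):
    """Adds __iter__, which takes precedence over __getitem__."""

    def __iter__(self):
        return iter(["from __iter__"])


old_items = list(OldStyle())
both_items = list(Both())

print(old_items)   # ['item 0', 'item 1', 'item 2']
print(both_items)  # ['from __iter__']
```

    list() uses the same machinery as a for loop, so it is a convenient way
    to see which of the two methods ends up being called.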


    > - where the "while 1:"-loop is quitted.



    The while 1 loop is escaped when the function hits the return statement.



    --
    Steven.
     
    Steven D'Aprano, Oct 31, 2006
    #4
  5. Peter Otten

    Peter Otten Guest

    LaundroMat wrote:

    [me hitting send too soon]

    > Now, if I look at this script step by step, I don't understand:


    > - where the "while 1:"-loop is quitted.


    > class DirectoryWalker:
    >     # a forward iterator that traverses a directory tree
    >
    >     def __init__(self, directory):
    >         self.stack = [directory]
    >         self.files = []
    >         self.index = 0
    >
    >     def __getitem__(self, index):
    >         while 1:
    >             try:
    >                 file = self.files[self.index]
    >                 self.index = self.index + 1
    >             except IndexError:
    >                 # pop next directory from stack
    >                 self.directory = self.stack.pop()


    If self.stack is empty, pop() will raise an IndexError which terminates both
    the 'while 1' loop in __getitem__() and the enclosing 'for file in ...'
    loop.

    >                 self.files = os.listdir(self.directory)
    >                 self.index = 0
    >             else:
    >                 # got a filename
    >                 fullname = os.path.join(self.directory, file)
    >                 if os.path.isdir(fullname) and not os.path.islink(fullname):
    >                     self.stack.append(fullname)
    >                 return fullname


    The return statement feeds the next file to the for loop.
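
    To watch both exits happen, here is a sketch of the same class ported to
    Python 3 and run against a throwaway directory tree. The temporary
    layout and the names found and stack_is_empty are assumptions made for
    this example, and sorted() is added only so the order is predictable;
    the control flow is otherwise that of the original script.

```python
import os
import tempfile


class DirectoryWalker:
    """Python 3 port of the class from the thread (same control flow)."""

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        # The index argument supplied by the for loop is ignored; the
        # walker keeps its own position in self.index.
        while True:
            try:
                file = self.files[self.index]
                self.index += 1
            except IndexError:
                # Current listing exhausted. On an empty stack, pop()
                # raises IndexError, which leaves __getitem__ and ends
                # the enclosing for loop.
                self.directory = self.stack.pop()
                self.files = sorted(os.listdir(self.directory))
                self.index = 0
            else:
                fullname = os.path.join(self.directory, file)
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    self.stack.append(fullname)
                return fullname  # hands one name to the for loop


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    for relative in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, relative), "w") as handle:
            handle.write("demo")
    walker = DirectoryWalker(root)
    found = [os.path.relpath(path, root) for path in walker]
    stack_is_empty = (walker.stack == [])

print(found)           # a.txt, then sub, then b.txt inside sub
print(stack_is_empty)  # True
```

    Each returned name ends one call to __getitem__; the iteration as a
    whole ends only when pop() is called on the empty stack.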

    Peter
     
    Peter Otten, Oct 31, 2006
    #5
  6. LaundroMat

    LaundroMat Guest

    Thanks all, those were some great explanations. It seems I still have
    a long way to go before I grasp the intricacies of this
    language.

    That 'magic index' variable bugs me a little however. It gives me the
    same feeling as when I see hard-coded variables. I suppose the
    generator class has taken care of this with its next() method (although
    - I should have a look - __next__() probably takes self and index as
    its arguments). Although I'm very fond of the language (as a
    non-formally trained hobbyist developer), that "magic" bit is a tad
    disturbing.

    Still, thanks for the quick and complete replies!
     
    LaundroMat, Oct 31, 2006
    #6
  7. LaundroMat

    LaundroMat Guest

    Ack, I get it now. It's not the variable's name ("index") that is
    hard-coded, it's just that the for...in... loop sends an argument by
    default. That's a lot more comforting.
     
    LaundroMat, Oct 31, 2006
    #7
  8. LaundroMat wrote:

    > That 'magic index' variable bugs me a little however. It gives me the
    > same feeling as when I see hard-coded variables.


    what magic index? the variable named "index" is an argument to the
    method it's used in.

    </F>
     
    Fredrik Lundh, Oct 31, 2006
    #8
  9. LaundroMat

    LaundroMat Guest

    On Oct 31, 3:53 pm, Fredrik Lundh wrote:
    > LaundroMat wrote:
    > > That 'magic index' variable bugs me a little however. It gives me the
    > > same feeling as when I see hard-coded variables.
    >
    > what magic index? the variable named "index" is an argument to the
    > method it's used in.


    Yes, I reacted too quickly. Sorry.
     
    LaundroMat, Oct 31, 2006
    #9