Help me understand this iterator

LaundroMat

Hi,

I've found this script over at effbot
(http://effbot.org/librarybook/os-path.htm), and I can't get my head
around its inner workings. Here's the script:

import os

class DirectoryWalker:
    # a forward iterator that traverses a directory tree

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    self.stack.append(fullname)
                return fullname

for file in DirectoryWalker("."):
    print file

Now, if I look at this script step by step, I don't understand:
- what is being iterated over (what is being called by "file in
DirectoryWalker()"?);
- where it gets the "index" value from;
- where the "while 1:"-loop is quitted.

Thanks in advance,

Mathieu
 
Peter Otten

LaundroMat said:
Now, if I look at this script step by step, I don't understand:
- what is being iterated over (what is being called by "file in
DirectoryWalker()"?);
- where it gets the "index" value from;
- where the "while 1:"-loop is quitted.

With

dw = DirectoryWalker(".")

the for loop is equivalent to

index = 0 # internal variable, not visible from Python
while True:
    try:
        file = dw[index] # invokes dw.__getitem__(index)
    except IndexError:
        break
    print file
    index += 1

This is an old way of iterating over a sequence which is only used when the
iterator-based approach

dwi = iter(dw) # invokes dw.__iter__()
while True:
    try:
        file = dwi.next()
    except StopIteration:
        break
    print file

fails, that is, when the object defines no __iter__() method.
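
For anyone who wants to try the equivalence at the prompt, here is a minimal sketch. The class Squares and the helper manual_loop are made up for illustration (they are not from the effbot script), and the code is written so it also runs on Python 3, where the same __getitem__ fallback still applies.

```python
class Squares:
    """Hypothetical sequence-like class with __getitem__ but no __iter__."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        if index >= self.count:
            raise IndexError(index)  # signals the end of the sequence
        return index * index


def manual_loop(seq):
    """Spell out what the for loop does with a __getitem__-only object."""
    collected = []
    index = 0
    while True:
        try:
            item = seq[index]  # invokes seq.__getitem__(index)
        except IndexError:
            break
        collected.append(item)
        index += 1
    return collected


via_for = [item for item in Squares(4)]   # iteration done by Python
via_manual = manual_loop(Squares(4))      # iteration spelled out by hand
print(via_for == via_manual)
```

Both lists come out the same, which is the point: the for loop does the index bookkeeping for you.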

Peter
 
Fredrik Lundh

LaundroMat said:
Now, if I look at this script step by step, I don't understand:
- what is being iterated over (what is being called by "file in
DirectoryWalker()"?);

as explained in the text above the script, this class emulates a
sequence. it does this by implementing the __getitem__ method:

http://effbot.org/pyref/__getitem__
- where it gets the "index" value from;

from the call to __getitem__ done by the for-in loop.
- where the "while 1:"-loop is quitted.

the loop stops when the stack is empty, and pop raises an IndexError
exception.

note that this is an old example; code written for newer versions of
Python would probably use a recursing generator instead (see the source
code for os.walk in the standard library for an example).
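
a rough sketch of what such a recursing generator could look like follows. the name walk_tree is hypothetical, this is not the os.walk source, and entries may come out in a different order than with the class-based version, which pushes subdirectories on a stack and visits them later.

```python
import os


def walk_tree(directory):
    """Yield the full path of every entry below directory (hypothetical helper)."""
    for name in os.listdir(directory):
        fullname = os.path.join(directory, name)
        yield fullname
        # Descend into real subdirectories only, skipping symlinks,
        # which is the same test the DirectoryWalker class applies.
        if os.path.isdir(fullname) and not os.path.islink(fullname):
            for nested in walk_tree(fullname):
                yield nested
```

usage would be along the lines of "for path in walk_tree('.'):" followed by whatever you want to do with each path. no index argument and no IndexError are involved; the generator simply ends when there is nothing left to yield.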

</F>
 
Steven D'Aprano

Hi,

I've found this script over at effbot
(http://effbot.org/librarybook/os-path.htm), and I can't get my head
around its inner workings.

[snip code]
Now, if I look at this script step by step, I don't understand:
- what is being iterated over (what is being called by "file in
DirectoryWalker()"?);

What is being iterated over is the tree of files and directories below the
current directory. In Unix land (and probably DOS/Windows as well) the
directory "." means "this directory, right here".

- where it gets the "index" value from;

When Python sees a line like "for x in obj:" it does some special
magic. First it looks to see if obj has an "__iter__" method; if so, it
calls next() repeatedly on the iterator that method returns. That's not the
case here -- DirectoryWalker uses the old-style sequence protocol, not one
of the fancy new iterators.

Instead, Python tries calling obj[index] starting at 0 and keeps going
until an IndexError exception is raised, then it halts the for loop.

So, think of it like this: pretend that Python expands the following code:

for x in obj:
    block

into something like this:

index = 0
while True: # loop forever
    try:
        x = obj[index]
        block # can use x in block
    except IndexError:
        # catch the exception and escape the while loop
        break
    index = index + 1
# and now we're done, continue the rest of the program

That's not exactly what Python does, of course (it is much more efficient),
but that's a good picture of what happens.
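
If you want to see the index values for yourself, a small experiment along these lines should do it. The class IndexRecorder is invented purely for this demonstration, and the code is written to run on Python 3 as well.

```python
class IndexRecorder:
    """Hypothetical class that logs every index the for loop passes in."""

    def __init__(self, items):
        self.items = list(items)
        self.seen = []

    def __getitem__(self, index):
        self.seen.append(index)    # remember what the loop asked for
        return self.items[index]   # raises IndexError past the end


recorder = IndexRecorder("abc")
letters = [ch for ch in recorder]
print(letters)
print(recorder.seen)
```

The recorded indices include one more value than there are items, because the loop only finds out it is finished when the last request raises IndexError.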

- where the "while 1:"-loop is quitted.


The while 1 loop is escaped when the function hits the return statement.
 
Peter Otten

LaundroMat wrote:

[me hitting send too soon]
Now, if I look at this script step by step, I don't understand:
- where the "while 1:"-loop is quitted.
class DirectoryWalker:
    # a forward iterator that traverses a directory tree

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()

If self.stack is empty, pop() will raise an IndexError which terminates both
the 'while 1' loop in __getitem__() and the enclosing 'for file in ...'
loop.

                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    self.stack.append(fullname)
                return fullname

The return statement feeds the next file to the for loop.
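
The same termination mechanism can be seen in isolation with a toy class. StackDrainer is a made-up name for this sketch; like DirectoryWalker it ignores the index it is handed and relies on pop() raising IndexError once the list is empty. It should run on Python 3 too.

```python
class StackDrainer:
    """Hypothetical class that ends iteration via pop() on an empty list."""

    def __init__(self, items):
        self.stack = list(items)

    def __getitem__(self, index):
        # The index argument is ignored, as in DirectoryWalker.
        # list.pop() raises IndexError when the list is empty, and that
        # exception propagates out of __getitem__ and stops the for loop.
        return self.stack.pop()


drained = [item for item in StackDrainer([1, 2, 3])]
print(drained)
```

Items come back last-in, first-out, and the loop ends without any explicit stop condition in the class.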

Peter
 
LaundroMat

Thanks all, those were some great explanations. It seems I still have
a long way to go before I grasp the intricacies of this
language.

That 'magic index' variable bugs me a little however. It gives me the
same feeling as when I see hard-coded variables. I suppose the
generator class has taken care of this with its next() method (although
- I should have a look - __next__() probably takes self and index as
its arguments). Although I'm very fond of the language (as a
non-formally trained hobbyist developer), that "magic" bit is a tad
disturbing.

Still, thanks for the quick and complete replies!
 
LaundroMat

Ack, I get it now. It's not the variable's name ("index") that is
hard-coded, it's just that the for...in... loop sends an argument by
default. That's a lot more comforting.
 
Fredrik Lundh

LaundroMat said:
That 'magic index' variable bugs me a little however. It gives me the
same feeling as when I see hard-coded variables.

what magic index? the variable named "index" is an argument to the
method it's used in.

</F>
 
