"Maximum recursion depth exceeded"...why?

Thomas Allen

I must not be understanding something. This is a simple recursive
function that prints all HTML files in argv[1] as it scans the
directory's contents. Why do I get a RuntimeError for recursion depth
exceeded?

#!/usr/bin/env python

import os, sys

def main():
    absToRel(sys.argv[1], sys.argv[2])

def absToRel(dir, root):
    for filename in os.listdir(dir):
        if os.path.isdir(filename):
            absToRel(filename, root)
        else:
            if(filename.endswith("html") or filename.endswith("htm")):
                print filename

if __name__ == "__main__":
    main()
 
Martin v. Löwis

Thomas said:
I must not be understanding something. This is a simple recursive
function that prints all HTML files in argv[1] as it scans the
directory's contents. Why do I get a RuntimeError for recursion depth
exceeded?

def main():
    absToRel(sys.argv[1], sys.argv[2])

def absToRel(dir, root):
    for filename in os.listdir(dir):
        if os.path.isdir(filename):

Perhaps you have a symlink somewhere that makes the tree appear to
have infinite depth?

Regards,
Martin
 
Thomas Allen

Please note that I'm not using os.walk(sys.argv[1]) because the
current depth of recursion is relevant to the transformation I'm
attempting. Basically, I'm transforming a live site to a local one and
the live site uses all absolute paths (not my decision...). I planned
on performing the replace like so for each line:

line.replace(root, "../" * depth)

So that a file in the top level would simply remove all instances of
root, one level down would substitute "../", etc.
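
The substitution described above can be sketched as follows. This is a minimal Python 3 illustration (the thread's own code is Python 2); the helper name `relativize` and the example root URL are assumptions made for the example, not part of the original script.

```python
def relativize(line, root, depth):
    """Replace every occurrence of the absolute prefix `root` with
    `depth` levels of "../" (an empty string at the top level)."""
    return line.replace(root, "../" * depth)


# Hypothetical site root, used only for this example.
ROOT = "http://www.example.com/"
LINK = '<a href="http://www.example.com/about.html">'

top_level = relativize(LINK, ROOT, 0)  # file sits in the top-level directory
one_down = relativize(LINK, ROOT, 1)   # file sits one directory below it

print(top_level)
print(one_down)
```

With a depth of 0 the prefix is simply removed, which matches the behaviour described for top-level files.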
 
Peter Otten

Thomas said:
I must not be understanding something. This is a simple recursive
function that prints all HTML files in argv[1] as it scans the
directory's contents. Why do I get a RuntimeError for recursion depth
exceeded?

#!/usr/bin/env python

import os, sys

def main():
    absToRel(sys.argv[1], sys.argv[2])

def absToRel(dir, root):
    for filename in os.listdir(dir):
        # added: turn the bare name into a path under dir
        filename = os.path.join(dir, filename)
        if os.path.isdir(filename):
            absToRel(filename, root)
        else:
            if(filename.endswith("html") or filename.endswith("htm")):
                print filename

if __name__ == "__main__":
    main()

Without the os.path.join() line, consider a directory that contains a
subdirectory of the same name, "dir/dir". os.listdir("dir") has "dir" (the
child) in the result list, but the bare name is resolved relative to the
current working directory, so it triggers an absToRel() call on "dir" (the
parent) ad infinitum.

Peter
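
The effect Peter describes can be reproduced on a throwaway tree. The sketch below is Python 3 and assumes the script is run from the directory that contains the outer "dir"; the temporary directory layout and the variable names are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Build <top>/dir/dir/page.html
    inner = os.path.join(top, "dir", "dir")
    os.makedirs(inner)
    with open(os.path.join(inner, "page.html"), "w"):
        pass

    old_cwd = os.getcwd()
    os.chdir(top)  # pretend the script was started here
    try:
        names = os.listdir("dir")  # bare names only
        # Listing the bare name looks it up in the cwd, i.e. the PARENT again.
        again = os.listdir(names[0])
        # Joining with the directory being scanned reaches the child instead.
        fixed = os.listdir(os.path.join("dir", names[0]))
    finally:
        os.chdir(old_cwd)

print(names, again, fixed)
```

Because `again` is the same listing as `names`, a recursive call on the bare name never gets any deeper, which is consistent with the recursion limit being hit.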
 
Thomas Allen

I have two problems in this case:

1. I don't know how to reliably map the current filename to an
absolute path beyond the top-most directory because my method of doing
so would be to os.path.join(os.getcwd(), filename)

2. For some reason, only one folder in the directory gets marked as a
directory itself when there are about nine others in the top-most
directory. I don't even know where to begin to solve this one.

I'm sure the first is an easy answer, but what do I need to do to
solve the second?
 
Peter Otten

Thomas said:
I have two problems in this case:

1. I don't know how to reliably map the current filename to an
absolute path beyond the top-most directory because my method of doing
so would be to os.path.join(os.getcwd(), filename)

Don't make things more complicated than necessary. If you can do
os.listdir(somedir), you can also do
[os.path.join(somedir, fn) for fn in os.listdir(somedir)].

Thomas said:
2. For some reason, only one folder in the directory gets marked as a
directory itself when there are about nine others in the top-most
directory. I don't even know where to begin to solve this one.

I'm sure the first is an easy answer, but what do I need to do to
solve the second?

If you solve the first properly the second might magically disappear. This
is what my crystal ball tells me because there is no code in sight...

Peter
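
One possible explanation for the second symptom, sketched in Python 3: if bare names are passed to os.path.isdir(), only names that also happen to exist as directories in the current working directory are reported. The folder names and layout below are invented for the illustration, and no code for the real case was posted, so this is a guess at the cause rather than a diagnosis.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as work:
    site = os.path.join(work, "site")
    for name in ("css", "images", "js"):
        os.makedirs(os.path.join(site, name))
    # The working directory happens to contain ONE folder with a matching name.
    os.mkdir(os.path.join(work, "images"))

    old_cwd = os.getcwd()
    os.chdir(work)
    try:
        names = sorted(os.listdir("site"))
        # Bare names are looked up in the cwd, so most of them are "missing".
        bare_hits = [n for n in names if os.path.isdir(n)]
        # Joined paths are looked up inside the directory that was listed.
        joined_hits = [n for n in names
                       if os.path.isdir(os.path.join("site", n))]
    finally:
        os.chdir(old_cwd)

print(bare_hits)
print(joined_hits)
```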
 
Thomas Allen

I'm referring to the same code, but with a print:

for file in os.listdir(dir):
    if os.path.isdir(file):
        print "D", file

in place of the internal call to absToRel...and only one line prints
such a message. I mean, if I can't trust my OS or its Python
implementation (on a Windows box) to recognize a directory, I'm
wasting everyone's time here.

In any case, is this the best way to go about the problem in general?
Or is there already a way to recursively walk a directory, aware of
the current depth?

Thanks,
Thomas
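
For the question of walking a directory while tracking the current depth, a hand-rolled recursive version might look like the Python 3 sketch below. The function name `find_html` and the sample tree are assumptions for the example; skipping symlinked directories is an added precaution against the link cycles mentioned earlier in the thread, not something the original script did.

```python
import os
import tempfile


def find_html(directory, depth=0):
    """Yield (path, depth) for each .html/.htm file under `directory`."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)  # bare name -> usable path
        if os.path.isdir(path):
            # Do not follow symlinked directories, so a cycle cannot recurse.
            if not os.path.islink(path):
                yield from find_html(path, depth + 1)
        elif name.endswith((".html", ".htm")):
            yield path, depth


# Small made-up tree to exercise the function.
with tempfile.TemporaryDirectory() as site:
    os.mkdir(os.path.join(site, "docs"))
    for rel in ("index.html", "docs/guide.htm", "docs/notes.txt"):
        with open(os.path.join(site, *rel.split("/")), "w"):
            pass
    found = [(os.path.relpath(p, site).replace(os.sep, "/"), d)
             for p, d in find_html(site)]

print(found)
```

Passing the depth as a parameter keeps it in step with the recursion, so each file comes back paired with the number of "../" segments it needs.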
 
MRAB

Thomas said:
I must not be understanding something. This is a simple recursive
function that prints all HTML files in argv[1] as it scans the
directory's contents. Why do I get a RuntimeError for recursion depth
exceeded?

#!/usr/bin/env python

import os, sys

def main():
    absToRel(sys.argv[1], sys.argv[2])

def absToRel(dir, root):
    for filename in os.listdir(dir):

os.listdir() returns a list of filenames, not filepaths. Create the
filepath with os.path.join(dir, filename).
 
Thomas Allen

Christian said:
You are under a wrong assumption. You think os.listdir() returns a list
of absolute path elements. In fact it returns just a list of names. You
have to os.path.join(dir, file) to get an absolute path.

Anyway stop reinventing the wheel and use os.walk() as I already
explained. You can easily spot the depth with "directory.count(os.sep)".
os.path.normpath() helps you to sanitize the path before counting the
number of os.sep.

Christian

If you'd read the messages in this thread you'd understand why I'm not
using os.walk(): I'm not using it because I need my code to be aware
of the current recursion depth so that the correct number of "../" are
substituted in.

Also, somebody mentioned wget -R...did you mean wget -r? In any case,
I have all of these files locally already and am trying to replace
absolute paths with relative ones so that a colleague can present some
website content at a location with no internet.

Thomas
 
Thomas Allen

Christian said:
I'm well aware of your messages and your requirements. However, you
either didn't read or didn't understand what I was trying to tell you. You
don't need to know the recursion depth in order to find the correct
number of "../".

base = os.path.normpath(base)
baselevel = base.count(os.sep)

for root, dirs, files in os.walk(base):
    level = root.count(os.sep) - baselevel
    offset = level * "../"
    ...

See?

Christian

Very clever (and now seemingly obvious)! That certainly is one way to
measure directory depth; I hadn't thought of counting the separator.
Sorry that I misunderstood what you meant there.

Thanks,
Thomas
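
The separator-counting idea can be tried out with a small Python 3 sketch. The wrapper name `walk_with_offset` and the sample directory layout are assumptions for the example; a base that is the filesystem root itself would need separate handling, because its normalized path already ends in a separator.

```python
import os
import tempfile


def walk_with_offset(base):
    """Yield (directory, offset), where offset is "../" repeated once per
    level that `directory` sits below `base`."""
    base = os.path.normpath(base)
    baselevel = base.count(os.sep)
    for root, dirs, files in os.walk(base):
        level = root.count(os.sep) - baselevel
        yield root, "../" * level


# Made-up tree: <site>/docs/api
with tempfile.TemporaryDirectory() as site:
    os.makedirs(os.path.join(site, "docs", "api"))
    offsets = {
        os.path.relpath(directory, site).replace(os.sep, "/"): offset
        for directory, offset in walk_with_offset(site)
    }

print(offsets)
```

Each offset can then be used as the replacement string for the site root in every file found in that directory.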
 
alex23

Something wrong with wget -R ?

Did you mean wget -r ?

That will just grab the entire site, though. I'm guessing that Thomas'
function absToRel will eventually replace the print with something
that changes links accordingly so the local version is traversable.
 
rdmurray

alex23 said:
Did you mean wget -r ?

That will just grab the entire site, though. I'm guessing that Thomas'
function absToRel will eventually replace the print with something
that changes links accordingly so the local version is traversable.

Yeah, but wget -r -k will do that bit of it, too.

--RDM
 
Thomas Allen

Wow, nice, I don't know why I never noticed that. Cheers!

Hm...doesn't do that over here. I thought it may have been because of
absolute links (not to site root), but it even leaves things like <a
href="/">. Does it work for you guys?
 
rdmurray

Thomas Allen said:
Hm...doesn't do that over here. I thought it may have been because of
absolute links (not to site root), but it even leaves things like <a
href="/">. Does it work for you guys?

It works for me. The sample pages I just tested on it don't use
any href="/" links, but my 'href="/about.html"' got properly
converted to 'href="../about.html"'. (On the other hand my '/contact.html'
got converted to a full external URL...but that's apparently because the
contact.html file doesn't actually exist :)

--RDM
 
Thomas Allen

Thanks for the help everyone. The idea of counting the slashes was the
linchpin of this little script, and with a little trial and error, I
successfully generated a local copy of the site. I don't think my
colleague knows what went into this, but he seemed appreciative :^)

Thomas
 
