Handling and recursing subdirectories

Kloudnyne

I'm a newcomer to Perl and am teaching myself the language by using
it, but I have run into a problem I can't seem to see any way
around.

I am trying to write a Perl script that will go through a series of
directories and their subdirectories, removing javascript, images,
bots, etc from HTML files in order to provide a text-reader friendly
version of each page. The actual conversion of any given file has been
taken care of, thanks to code heavily borrowed from an existing
script, but I can't seem to work out how I can get it to recurse
through the various subdirs.

The snippet of code I've thrown together for it so far is:

***

sub reading {
do {
opendir (CURRENTFOLDER, $htmdir) || die 'Ay Señor! Los bandidos have
raided that directory!'
while defined($filename = readdir(FOLDER)) = True {
$nesting = directorycheck(); #nesting tells me how deep we are into
subdirectories. a zero value is at the root of the process. only
really intended as a flag for testing
#dircheck checks to see if our victim this cycle is a
subdirectory. If it is, (I hope) we'll launch in to a nested subcycle.
}

closedir(CURRENTFOLDER); # with a little luck, re-opening the
previous folder will have the pointer still at the last position
checked, or else we have uber-recursives
chop ($htmdir); #prepping the string to ensure that the trailing char
is NOT a / (not that it should be anyway)
do {

}
until (chop($htmdir) ne '/');
# now that we've gone back to (and removed) the / nearest to the end
of the handle, we've effectively gone back to the parent directory
$nesting -- ;
}
until $htmdir = $htmroot;
}


sub directorycheck {
if (-d $filename) {
dircheck = $nesting + 1 ;
$htmdir = $filename .=$htmdir;
chdir ($htmdir);
} else {
$txtdir = $htmdir; # sets $txtdir to mirror $htmdir, but in the
/txt/ directory, where we want our output to be.
$txtdir =~s/htdocs/txt/; #(hopefully) changes the file path for
output to the /txt/ equivalent of the current /htdocs/ folder
parsetxt(); # only parses if we've hit a file, rather than a subdir.
}
}

***

where "parsetxt" is the subroutine that handles the actual conversion.
However, I can't even get this to compile, let alone run it to see if
it just dies or recurses away to infinity, or whatever.

The script is intended to run on a Linux box acting as a webserver,
but for purposes of writing/testing I'm using ActivePerl 5.8 on a
Windows 2000 machine.

My question, after all this explanation, is this: Am I barking up the
wrong tree here, or am I just missing one little thing that will make
all this work? If anyone else has a piece of code that will fulfil my
requirements and make my life easier, you will have my undying
gratitude, because at this point I'm seriously starting to reconsider
scripting and just perform the conversions manually.

Thanks for your time.


PS: I apologise for the hideous formatting. It's actually quite
legible on a full-width screen, and I didn't want to disturb the text
for fear of accidentally altering the code.
 
Paul Lalli

Kloudnyne said:
I am trying to write a Perl script that will go through a series of
directories and their subdirectories, removing javascript, images,
bots, etc from HTML files in order to provide a text-reader friendly
version of each page. The actual conversion of any given file has been
taken care of, thanks to code heavily borrowed from an existing
script, but I can't seem to work out how I can get it to recurse
through the various subdirs.

The snippet of code I've thrown together for it so far is:

My question, after all this explanation, is this: Am I barking up the
wrong tree here, or am I just missing one little thing that will make
all this work? If anyone else has a piece of code that will fulfil my
requirements and make my life easier, you will have my undying
gratitude, because at this point I'm seriously starting to reconsider
scripting and just perform the conversions manually.

The standard (that is, included with your Perl distribution) module
File::Find is what you want to use to recurse through directories. Read
about it by typing the command
perldoc File::Find
at your shell prompt. The CPAN modules File::Finder and
File::Find::Rule also exist if you prefer an alternate syntax.
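
For orientation, here is a minimal sketch of what a File::Find traversal can look like. The sub name html_files_under and the file names in the throwaway tree are made up for illustration; only find(), $_ and $File::Find::name come from the module itself.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Return the paths of all .htm/.html files found beneath $root.
# (html_files_under is an illustrative name, not part of File::Find.)
sub html_files_under {
    my ($root) = @_;
    my @found;
    find(
        sub {
            # By default find() chdir()s into each directory, so $_ is
            # the bare file name and $File::Find::name is the path
            # including $root.
            push @found, $File::Find::name if -f $_ && /\.html?\z/i;
        },
        $root,
    );
    @found = sort @found;
    return @found;
}

# Demonstration on a throwaway tree (hypothetical file names).
my $root = tempdir(CLEANUP => 1);
make_path("$root/a/b");
for my $rel ('index.html', 'a/page.htm', 'a/b/deep.html', 'a/notes.txt') {
    open my $fh, '>', "$root/$rel" or die "Cannot create $root/$rel: $!";
    close $fh;
}

print "$_\n" for html_files_under($root);
```

find() does the descending into subdirectories itself, so the callback only has to decide what to do with each entry it is handed.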

In the more general case, whenever you find yourself trying to do
something in Perl that has most likely been done before (surely you don't
think you're the only one who's ever needed to recurse through a
directory structure, do you?), you should always check to see if a
module exists which already does it. Modules are stored and shared on
the CPAN, which you can search at http://search.cpan.org

Give File::Find a shot, and if you have problems with it, feel free to
ask for help.

Paul Lalli
 
Anno Siegel

[...]
script, but I can't seem to work out how I can get it to recurse
through the various subdirs.

You want File::Find (a standard module).

[code snipped]
PS: I apologise for the hideous formatting. It's actually quite
legible on a full-width screen, and I didn't want to disturb the text
for fear of accidentally altering the code.

....so you left the formatting to Usenet, which really messed it up.

Anno
 
Joe Smith

Kloudnyne said:
If anyone else has a piece of code that will fulfil my requirements
and make my life easier, you will have my undying gratitude...

use File::Find;
sub process { print "Found file $_ in $File::Find::dir\n" if -f $_; }
find(\&process,'/tmp');
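
Building on that, one way the original task (an htdocs tree mirrored into a txt tree) might be approached is sketched below. Everything specific here is an assumption: convert_page is a hypothetical stand-in for the poster's parsetxt and only strips tags crudely, mirror_tree is an invented name, and the demo runs against a temporary directory rather than a real web root.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical stand-in for parsetxt(): removes anything that looks like
# a tag on a single line. Not a real HTML-to-text converter.
sub convert_page {
    my ($in, $out) = @_;
    open my $src, '<', $in  or die "Cannot read $in: $!";
    open my $dst, '>', $out or die "Cannot write $out: $!";
    while (my $line = <$src>) {
        $line =~ s/<[^>]*>//g;
        print {$dst} $line;
    }
    close $src;
    close $dst or die "Cannot close $out: $!";
}

# Walk $src_root and write a converted copy of every HTML file to the
# same relative path under $dst_root. Returns the number converted.
sub mirror_tree {
    my ($src_root, $dst_root) = @_;
    my $count = 0;
    find(
        {
            no_chdir => 1,    # $_ is then the same as $File::Find::name
            wanted   => sub {
                return unless -f $_ && /\.html?\z/i;
                # Path relative to $src_root, leading "/" included.
                my $rel = substr($_, length $src_root);
                my $out = $dst_root . $rel;
                make_path(dirname($out));    # create the output directory if missing
                convert_page($_, $out);
                $count++;
            },
        },
        $src_root,
    );
    return $count;
}

# Demonstration on a throwaway tree (hypothetical layout).
my $base = tempdir(CLEANUP => 1);
my ($htdocs, $txt) = ("$base/htdocs", "$base/txt");
make_path("$htdocs/sub");
for my $rel ('index.html', 'sub/page.html') {
    open my $fh, '>', "$htdocs/$rel" or die "Cannot create $htdocs/$rel: $!";
    print {$fh} "<p>Hello from $rel</p>\n";
    close $fh;
}

my $converted = mirror_tree($htdocs, $txt);
print "Converted $converted file(s)\n";
```

Deriving the output path from the relative path, instead of substituting "htdocs" with "txt" anywhere in the string, avoids surprises when a file or directory name happens to contain "htdocs".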

-Joe
 
Kloudnyne

<snip>

Thanks for your help. I apologise again for my blatantly obvious noobness.
 
