How to avoid searching this folder?

geoff · Mar 25, 2011

Hello

I am using Tom Boutell's simple search engine on my website but would
like it to not index the files in a particular folder called archives.

How would I modify the code for this? I have tried and so far failed.

Thanks

Geoff

#!/usr/bin/perl

$path = "/path/public_html";
$webpath = "";
$indexname = "/path/formmail/searchindex.txt";

$nextFd = 0;

open(OUT, ">$indexname");

&update($path, $webpath);

sub update {
    my($path, $webpath) = @_;
    my($dd) = $nextFd++;
    print "Updating in $path\n";
    if (!opendir($dd, $path)) {
        print STDERR "Warning: can't open $path\n";
        return;
    }
    while ($entry = readdir($dd)) {

        if ($entry =~ /^\.$/) {
            next;
        }

        if ($entry =~ /^\.\.$/) {
            next;
        }
        if (-d "$path/$entry") {
            &update("$path/$entry", "$webpath/$entry");
            next;
        }
        if (($entry !~ /.html$/i) && ($entry !~ /.htm$/i)) {
            next;
        }
        my($fd) = $nextFd++;
        if (!open($fd, "$path/$entry")) {
            print STDERR "Warning: can't open $path/$entry\n";
            next;
        }
        my(%words) = ( );
        my($line);
        while ($line = <$fd>) {
            # Support for turning off the search engine
            # indexer for parts of a page. These markers
            # must have a line to themselves. 3/13/00
            if ($line =~ /<\!\-\- SEARCH-ENGINE-OFF -->/)
            {
                while ($line = <$fd>) {
                    if ($line =~ /<\!\-\- SEARCH-ENGINE-ON -->/) {
                        last;
                    }
                }
                next;
            }
            # Simple HTML flusher
            $line =~ s/\<.*?\>//g;
            # Case insensitive
            $line =~ tr/A-Z/a-z/;
            # If it's not a letter, it's whitespace
            $line =~ s/[^a-z]/ /g;
            my(@words) = split(/\s+/, $line);
            my($p);
            for $p (@words) {
                if (length($p)) {
                    $words{$p}++;
                }
            }
        }
        print OUT "$webpath/$entry ";
        my($first) = 1;
        while (($key, $val) = each(%words)) {
            print OUT "$val:$key";
            if ($first) {
                $first = 0;
            } else {
                print OUT " ";
            }
        }
        print OUT "\n";
        close($fd);
    }
    closedir($dd);
}
close(OUT);
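The SEARCH-ENGINE-OFF/ON handling inside the read loop can be tried on its own. The sketch below is not part of Tom Boutell's script: the sub name `visible_lines` is hypothetical, and it assumes the page has already been split into lines. Like the loop above, it drops both marker lines and everything between them.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the lines the indexer would look at,
# dropping a SEARCH-ENGINE-OFF line, everything after it, and the
# matching SEARCH-ENGINE-ON line.
sub visible_lines {
    my @lines = @_;
    my @kept;
    my $skipping = 0;
    for my $line (@lines) {
        if ($skipping) {
            # Stay in skip mode until the ON marker has been seen.
            $skipping = 0 if $line =~ /<!-- SEARCH-ENGINE-ON -->/;
            next;
        }
        if ($line =~ /<!-- SEARCH-ENGINE-OFF -->/) {
            $skipping = 1;
            next;
        }
        push @kept, $line;
    }
    return @kept;
}

my @page = (
    "<p>indexed</p>\n",
    "<!-- SEARCH-ENGINE-OFF -->\n",
    "<p>hidden</p>\n",
    "<!-- SEARCH-ENGINE-ON -->\n",
    "<p>indexed again</p>\n",
);
print visible_lines(@page);
```

As the original comment says, the markers only work when each sits on a line of its own, because the test is applied line by line.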

Josef Moellers · Mar 25, 2011

On 25.3.2011 geoff wrote:

> Hello
>
> I am using Tom Boutell's simple search engine on my website but would
> like it to not index the files in a particular folder called archives.
>
> How would I modify the code for this? I have tried and so far failed.

You're usually expected to explain/show what you've tried so far ...

I'd put in a check immediately at the beginning of the update sub.
Unless ... you tried that and it didn't work!
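One way to read this suggestion is a guard as the very first statement of `update`, so the recursion never opens the folder at all. This is a minimal sketch, assuming the folder to skip is named exactly `archives` and that comparing the last path component is enough; the `%skip` hash and the `@visited` list are illustrative additions, not part of the original script, and the file-indexing part is left out.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Directory names that should never be indexed (assumption: exact,
# case-sensitive match on the last path component).
my %skip = map { $_ => 1 } qw(archives);

my @visited;    # records which directories were actually scanned

sub update {
    my ($path, $webpath) = @_;

    # Guard at the top of the sub: bail out before opendir
    # if this directory is one we want to skip.
    return if $skip{ basename($path) };

    push @visited, $webpath;
    opendir my $dd, $path or do {
        warn "Warning: can't open '$path': $!\n";
        return;
    };
    while (my $entry = readdir $dd) {
        next if $entry =~ /\A\.\.?\z/;
        if (-d "$path/$entry") {
            update("$path/$entry", "$webpath/$entry");
            next;
        }
        # ... index .htm/.html files here as in the original ...
    }
    closedir $dd;
}
```

The advantage of checking at the top is that every call path into `update`, including the first one, goes through the same test.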

Josef

geoff · Mar 25, 2011

> On 25.3.2011 geoff wrote:
>
> You're usually expected to explain/show what you've tried so far ...

Josef,

I tried

if (!-d "$path/archives") {
    next;
}

thinking that the code would continue as long as the archives folder
was not found.

Can !-d be used like this? It seems not!

Cheers

Geoff

geoff · Mar 25, 2011

> On 25.3.2011 geoff wrote:
>
> You're usually expected to explain/show what you've tried so far ...

Josef

I have tried this too

#if (-d "$path/$entry") {
#    &update("$path/$entry", "$webpath/$entry");
#    next;
#}

if (($entry != "archives") && (-d "$path/$entry")) {
    &update("$path/$entry", "$webpath/$entry");
    next;
}

thinking that so long as the $entry is not a directory called archives
the code will continue ... still no go.

Geoff

geoff · Mar 25, 2011

>> if (($entry != "archives") && (-d "$path/$entry")) {
>       ^^^^^^^^^^^^^^^^^^^^
>
> You are comparing them as numbers, so each string is converted
> into a number before the comparison is tested.
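A small demonstration of that point, independent of the indexer (the sub names are hypothetical). Non-numeric strings such as folder names numify to 0, so `!=` sees `0 != 0` whatever the folder is called, while `ne` compares the actual characters:

```perl
use strict;
use warnings;

# != converts both operands to numbers first; "news" and "archives"
# both become 0, so the numeric test always says "not different".
sub differs_numeric { no warnings 'numeric'; return $_[0] != $_[1] ? 1 : 0 }

# ne compares the strings character by character.
sub differs_string  { return $_[0] ne $_[1] ? 1 : 0 }

for my $name (qw(news archives)) {
    printf "%-8s  != says %d,  ne says %d\n",
        $name, differs_numeric($name, 'archives'), differs_string($name, 'archives');
}
```

Both rows print `!= says 0`, so a condition built on `$entry != "archives"` is false for every folder with a non-numeric name, and such folders are never recursed into. Without the `no warnings 'numeric'` line, `use warnings` would also report that the arguments are not numeric, which is a useful hint when this mistake creeps in.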

Thanks Tad. I changed it to ne but it still produces an empty file.

I have tried

unless (($entry !~ /members/i) && (-d "$path/$entry")) {
    &update("$path/$entry", "$webpath/$entry");
    next;
}

but still no go! I guess my logic is wrong?
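The logic can be checked without touching the file system by writing both conditions as small subs (hypothetical names; `$is_dir` stands in for `-d "$path/$entry"`). `unless (A && B)` runs its block when either A or B is false, so the attempt above recurses into `members` and into every plain file, while skipping the directories that should be indexed:

```perl
use strict;
use warnings;

# The condition as written: unless (A && B) runs the block
# when A is false OR B is false.
sub recurses_as_written {
    my ($entry, $is_dir) = @_;
    return !( ($entry !~ /members/i) && $is_dir ) ? 1 : 0;
}

# The intended condition: recurse only into directories
# whose name does not match "members".
sub recurses_as_intended {
    my ($entry, $is_dir) = @_;
    return ( $is_dir && $entry !~ /members/i ) ? 1 : 0;
}

for my $case ( ['news', 1], ['members', 1], ['index.html', 0] ) {
    my ($entry, $is_dir) = @$case;
    printf "%-10s dir=%d  written=%d  intended=%d\n", $entry, $is_dir,
        recurses_as_written($entry, $is_dir), recurses_as_intended($entry, $is_dir);
}
```

Because the recursion branch ends in `next`, in the as-written version every plain file is handed to `update()` (where `opendir` fails with a warning) and then skipped, so no .html file ever reaches the word-counting code. That alone would leave searchindex.txt empty.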

Cheers

Geoff

Josef Moellers · Mar 25, 2011

On 25.3.2011 geoff wrote:

> Josef
>
> I have tried this too
>
> #if (-d "$path/$entry") {
> #    &update("$path/$entry", "$webpath/$entry");
> #    next;
> #}
>
> if (($entry != "archives") && (-d "$path/$entry")) {
>     &update("$path/$entry", "$webpath/$entry");
>     next;
> }
>
> thinking that so long as the $entry is not a directory called archives
> the code will continue ... still no go.

Tad commented about the "!=" vs "ne".

What about case? Maybe the name isn't spelled "archives" but "ARCHIVES"?
Try "if ((lc($entry) ne "archives") && (-d "$path/$entry"))"

Also: have a look at File::Find. You may be re-inventing the wheel.
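For comparison, here is a sketch of the same directory walk using the core File::Find module. Its `wanted` callback can stop descent into a directory by setting `$File::Find::prune`. The sub name is hypothetical, the folder name `archives` is assumed to be an exact match, and only the file collection is shown, not the word counting:

```perl
use strict;
use warnings;
use File::Find qw(find);

# Collect .htm/.html files under $root, pruning any directory named
# "archives" so that File::Find never descends into it.
sub html_files_excluding_archives {
    my ($root) = @_;
    my @found;
    find(
        {
            wanted => sub {
                # By default File::Find chdirs, so $_ is the bare name.
                if (-d $_ && $_ eq 'archives') {
                    $File::Find::prune = 1;    # do not descend further
                    return;
                }
                push @found, $File::Find::name if -f $_ && /\.html?\z/i;
            },
        },
        $root,
    );
    return sort @found;
}
```

It could be called as `html_files_excluding_archives('/path/public_html')`, with each returned path then passed to the existing word-counting code.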

NB you can drop the "&" in front of the function call.
Josef

Jim Gibson · Mar 25, 2011

On Fri, 25 Mar 2011 11:17:13 +0100, Josef Moellers

>> I have tried this too
>>
>> if (($entry != "archives") && (-d "$path/$entry")) {
>>     &update("$path/$entry", "$webpath/$entry");
>>     next;
>> }
>>
>> thinking that so long as the $entry is not a directory called archives
>> the code will continue ... still no go.

Try this:

if ( -d "$path/$entry" ) {
    next if $entry =~ /archive/i;
    update("$path/$entry", "$webpath/$entry");
    next;
}
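One thing to be aware of with this version: `/archive/i` is unanchored, so it skips any directory whose name merely contains "archive", not only `archives`. The sketch below (hypothetical sub names and sample folder names) compares it with an anchored, case-insensitive test:

```perl
use strict;
use warnings;

# Unanchored: true for any name that contains "archive" anywhere.
sub skip_substring { return $_[0] =~ /archive/i ? 1 : 0 }

# Anchored: true only for "archives" itself, in any letter case.
sub skip_exact     { return $_[0] =~ /\Aarchives\z/i ? 1 : 0 }

for my $dir (qw(archives ARCHIVES archive news-archive-2010 news)) {
    printf "%-18s substring=%d exact=%d\n",
        $dir, skip_substring($dir), skip_exact($dir);
}
```

Which one is right depends on whether other folders with "archive" in their names should also be left out of the index.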

geoff · Mar 25, 2011

> Tad commented about the "!=" vs "ne".
>
> What about case? Maybe the name isn't spelled "archives" but "ARCHIVES"?
> Try "if ((lc($entry) ne "archives") && (-d "$path/$entry"))"

Josef,

I tried the above but the code does not work - it produces an empty
searchindex.txt file.

Maybe I have got the wrong approach - could you please look at this
part of the code again and suggest how it should be changed to avoid
indexing the archives folder?

sub update {
    my($path, $webpath) = @_;
    my($dd) = $nextFd++;
    print "Updating in $path\n";
    if (!opendir($dd, $path)) {
        print STDERR "Warning: can't open $path\n";
        return;
    }
    while ($entry = readdir($dd)) {
        if ($entry =~ /^\.$/) {
            next;
        }
        if ($entry =~ /^\.\.$/) {
            next;
        }
        if (-d "$path/$entry") {
            &update("$path/$entry", "$webpath/$entry");
            next;
        }
        if (($entry !~ /.html$/i) && ($entry !~ /.htm$/i)) {
            next;
        }
        my($fd) = $nextFd++;
        if (!open($fd, "$path/$entry")) {
            print STDERR "Warning: can't open $path/$entry\n";
            next;
        }
        my(%words) = ( );
        my($line);

> Also: have a look at File::Find. You may be re-inventing the wheel.

I don't think I would dare do this at the moment!

Cheers

Geoff

geoff · Mar 25, 2011

On Fri, 25 Mar 2011 15:11:58 +0100, Josef Moellers

Josef,

I seem to have got this now!

if ($entry ne "archives" ) {
    if (-d "$path/$entry") {
        &update("$path/$entry", "$webpath/$entry");
        next;
    }
}

The above now avoids the archives folder.

Cheers

Geoff

geoff · Mar 25, 2011

> Try this:
>
> if ( -d "$path/$entry" ) {
>     next if $entry =~ /archive/i;
>     update("$path/$entry", "$webpath/$entry");
>     next;
> }

Thanks Jim - yours is much neater than mine:

if ($entry ne "archives" ) {
    if (-d "$path/$entry") {
        &update("$path/$entry", "$webpath/$entry");
        next;
    }
}

Cheers

Geoff

John W. Krahn · Mar 26, 2011

> Hello
Hello,

> I am using Tom Boutell's simple search engine on my website but would
> like it to not index the files in a particular folder called archives.
>
> How would I modify the code for this? I have tried and so far failed.
>
> Thanks
>
> Geoff
>
> #!/usr/bin/perl

The next two lines should be:

use warnings;
use strict;

> $path = "/path/public_html";
> $webpath = "";
> $indexname = "/path/formmail/searchindex.txt";

my $path = "/path/public_html";
my $webpath = "";
my $indexname = "/path/formmail/searchindex.txt";

> $nextFd = 0;

It looks like you don't really need this variable, so what is it really
supposed to do for your program?

> open(OUT, ">$indexname");

You should *always* verify that the file was opened correctly before
trying to use what may be an invalid filehandle:

open OUT, '>', $indexname or die "Cannot open '$indexname' because: $!";

> &update($path, $webpath);

In modern versions of Perl you don't need to use ampersands on
subroutine calls:

update($path, $webpath);

> sub update {
>     my($path, $webpath) = @_;
>     my($dd) = $nextFd++;

Why are you storing a number in a variable that you are going to use for
a directory handle? That makes no sense.

>     print "Updating in $path\n";
>     if (!opendir($dd, $path)) {
>         print STDERR "Warning: can't open $path\n";
>         return;
>     }

You should declare variables where you first use them and you should
include $! in the error message so you know why it failed:

opendir my $dd, $path or do {
    warn "Warning: can't open '$path' because: $!";
    return;
};

>     while ($entry = readdir($dd)) {

while ( my $entry = readdir $dd ) {

>         if ($entry =~ /^\.$/) {
>             next;
>         }
>
>         if ($entry =~ /^\.\.$/) {
>             next;
>         }

Or simply:

next if $entry =~ /\A\.\.?\z/;
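To see what the combined pattern accepts, here is a quick check (the sub name is hypothetical). `\A` and `\z` anchor the match to the whole name and `\.?` makes the second dot optional, so only `.` and `..` match, while names like `...` or `.htaccess` do not:

```perl
use strict;
use warnings;

# True only for the two special directory entries "." and "..".
sub is_dot_entry { return $_[0] =~ /\A\.\.?\z/ ? 1 : 0 }

for my $entry ('.', '..', '...', '.htaccess', 'index.html') {
    printf "%-10s %d\n", $entry, is_dot_entry($entry);
}
```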

>         if (-d "$path/$entry") {
>             &update("$path/$entry", "$webpath/$entry");
>             next;
>         }
>         if (($entry !~ /.html$/i) && ($entry !~ /.htm$/i)) {
>             next;
>         }

You have to escape the period or it will match any character and you can
combine both regular expressions into one (same as example above):

next if $entry !~ /\.html?$/i;
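A quick comparison of the two tests on a few made-up file names (the sub names are hypothetical). With the unescaped dot, the original accepts names such as `notes_html` or `page.shtml`, which the escaped, combined pattern rejects:

```perl
use strict;
use warnings;

# Original test: "." matches ANY character, so "_html" or "shtml" pass.
sub loose_ext {
    my ($n) = @_;
    my $hit = $n =~ /.html$/i || $n =~ /.htm$/i;
    return $hit ? 1 : 0;
}

# Suggested test: a literal dot, then "htm" with an optional "l".
sub anchored_ext { return $_[0] =~ /\.html?$/i ? 1 : 0 }

for my $name (qw(page.html page.HTM notes_html page.shtml page.txt)) {
    printf "%-12s loose=%d escaped=%d\n", $name, loose_ext($name), anchored_ext($name);
}
```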

>         my($fd) = $nextFd++;

Why are you storing a number in a variable that you are going to use for
a filehandle? That makes no sense.

>         if (!open($fd, "$path/$entry")) {
>             print STDERR "Warning: can't open $path/$entry\n";
>             next;
>         }

You should declare variables where you first use them and you should
include $! in the error message so you know why it failed:

open my $fd, '<', "$path/$entry" or do {
    warn "Warning: can't open '$path/$entry' because: $!";
    next;
};

>         my(%words) = ( );

Or just:

my %words;

>         my($line);
>         while ($line = <$fd>) {

Or just:

while ( my $line = <$fd> ) {

>             # Support for turning off the search engine
>             # indexer for parts of a page. These markers
>             # must have a line to themselves. 3/13/00
>             if ($line =~ /<\!\-\- SEARCH-ENGINE-OFF -->/)
>             {
>                 while ($line = <$fd>) {
>                     if ($line =~ /<\!\-\- SEARCH-ENGINE-ON -->/) {
>                         last;
>                     }
>                 }
>                 next;
>             }
>             # Simple HTML flusher
>             $line =~ s/\<.*?\>//g;
>             # Case insensitive
>             $line =~ tr/A-Z/a-z/;
>             # If it's not a letter, it's whitespace
>             $line =~ s/[^a-z]/ /g;

You could also use tr/// for that:

$line =~ tr/a-z/ /c;

>             my(@words) = split(/\s+/, $line);

That might be better as:

my @words = split ' ', $line;

>             my($p);
>             for $p (@words) {

Better as:

for my $p ( @words ) {

>                 if (length($p)) {

Why would $p have zero length? Probably because you are using /\s+/
instead of ' ' as the first argument to split which will give you a zero
length string if there is leading whitespace in $line.
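A short sketch of the two suggestions above on a sample line (the sample text is made up). `tr/a-z/ /c` turns every non-letter, after lower-casing, into a space, and the two `split` calls differ only in how they treat the leading whitespace:

```perl
use strict;
use warnings;

my $line = "  Hello, World!\n";

# The /c flag complements the search list, so every character that is
# NOT a-z (after lower-casing) is replaced by a space.
(my $cleaned = lc $line) =~ tr/a-z/ /c;

my @with_regex = split /\s+/, $cleaned;   # leading whitespace -> empty first field
my @with_space = split ' ', $cleaned;     # special ' ' pattern skips leading whitespace

printf "split /\\s+/ : %d fields (%s)\n", scalar @with_regex, join('|', @with_regex);
printf "split ' '    : %d fields (%s)\n", scalar @with_space, join('|', @with_space);
```

With `split /\s+/` the first field is an empty string, which is why the original loop needed its `length($p)` test; `split ' '` does not produce it.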

>                     $words{$p}++;
>                 }
>             }
>         }
>         print OUT "$webpath/$entry ";
>         my($first) = 1;

Why are you forcing list context on a scalar assignment?

>         while (($key, $val) = each(%words)) {

Better as:

while ( my ( $key, $val ) = each %words ) {

>             print OUT "$val:$key";
>             if ($first) {
>                 $first = 0;
>             } else {
>                 print OUT " ";
>             }

So you want no space between the first and second "$val:$key" but a
space after every other occurrence of "$val:$key" including at the end
of the line?

>         }
>         print OUT "\n";

It looks like you could probably do that while loop like this instead:

print OUT join( ' ', map "$words{$_}:$_", keys %words ), "\n";
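A sketch reproducing the spacing issue pointed out above, next to the `join`/`map` version. The keys are sorted here only to make the output repeatable; the original loop, like the one-liner, uses hash order. The sub names and the sample `%words` are made up for illustration:

```perl
use strict;
use warnings;

my %words = (perl => 3, search => 1, index => 2);

# Mimics the original loop: no space after the first pair,
# a space after every later pair (including the last one).
sub original_style {
    my %w = @_;
    my ($out, $first) = ('', 1);
    for my $key (sort keys %w) {        # sorted only for repeatable output
        $out .= "$w{$key}:$key";
        if ($first) { $first = 0 } else { $out .= ' ' }
    }
    return $out;
}

# The join/map version: exactly one space between pairs, none at the end.
sub join_style {
    my %w = @_;
    return join ' ', map { "$w{$_}:$_" } sort keys %w;
}

print original_style(%words), "|\n";
print join_style(%words), "|\n";
```

With this sample data, `original_style` returns `2:index3:perl 1:search ` (first two pairs run together, trailing space) and `join_style` returns `2:index 3:perl 1:search`.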

>         close($fd);
>     }
>     closedir($dd);
> }
> close(OUT);

John

geoff · Mar 27, 2011

John,

You have really made a lot of no doubt useful comments, but the code is
not mine - it came from Tom Boutell's site, and my only concern was to
be able to avoid indexing some particular files/folders.

Cheers

Geoff
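For anyone who does want to tidy the script, here is one way the suggestions in this thread might fit together. It is a sketch, not Tom Boutell's code: the sub names `index_tree` and `count_words` and the `%SKIP_DIRS` list are hypothetical, folder names are matched exactly, and keys are sorted so the output is repeatable. The output line format (`webpath count:word ...`) follows the original, with single spaces between pairs.

```perl
use strict;
use warnings;

# Hypothetical list of folder names that must never be indexed (exact match).
my %SKIP_DIRS = map { $_ => 1 } qw(archives);

# Walk $path recursively, writing one line per .htm/.html file to $out:
#   <webpath> <count>:<word> <count>:<word> ...
sub index_tree {
    my ($path, $webpath, $out) = @_;
    opendir my $dd, $path or do {
        warn "Warning: can't open '$path' because: $!\n";
        return;
    };
    while (my $entry = readdir $dd) {
        next if $entry =~ /\A\.\.?\z/;             # skip . and ..
        if (-d "$path/$entry") {
            next if $SKIP_DIRS{$entry};            # e.g. the archives folder
            index_tree("$path/$entry", "$webpath/$entry", $out);
            next;
        }
        next if $entry !~ /\.html?\z/i;            # only .htm and .html files
        my $words = count_words("$path/$entry") or next;
        print {$out} "$webpath/$entry ",
            join(' ', map { "$words->{$_}:$_" } sort keys %$words), "\n";
    }
    closedir $dd;
}

# Return a hash ref of word => count for one HTML file, or nothing on error.
sub count_words {
    my ($file) = @_;
    open my $fd, '<', $file or do {
        warn "Warning: can't open '$file' because: $!\n";
        return;
    };
    my %words;
    while (my $line = <$fd>) {
        # Skip from a SEARCH-ENGINE-OFF line through the next SEARCH-ENGINE-ON line
        if ($line =~ /<!-- SEARCH-ENGINE-OFF -->/) {
            while ($line = <$fd>) {
                last if $line =~ /<!-- SEARCH-ENGINE-ON -->/;
            }
            next;
        }
        $line =~ s/<.*?>//g;                       # simple HTML tag stripper
        $line = lc $line;
        $line =~ tr/a-z/ /c;                       # every non-letter becomes a space
        $words{$_}++ for split ' ', $line;
    }
    close $fd;
    return \%words;
}
```

A caller would open the index file itself and pass the handle in, for example `open my $out, '>', '/path/formmail/searchindex.txt' or die "Cannot open index: $!";` followed by `index_tree('/path/public_html', '', $out);`.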
