free source search engine (simple) ## comments?

Robin

This is a very simple search engine that prints the filenames of the
matching files and a link to each. It's not very advanced, but I used it as
an exercise in writing search engines, and I'm planning on making it into a
more advanced one in the future. I am aware of the race conditions in the
header and footer subs and their suckiness, and am working now to fix them.
I got some advice from someone on the perl beginners yahoo list, but to use
their code I have to understand it, and I don't yet, so once I read up I'll
fix the race condition. Anyway, any comments would be nice.
-Robin

btw, unless I change newsreaders, which I haven't had much success with (I've
downloaded like 5 and none seem to work for me cause they don't work with my
mail server), my indenting is gonna be screwed up, so please bear with me.

#!/usr/bin/perl

use strict;
use warnings;

use Fcntl qw(:flock);

use CGI qw(:all);

$CGI::POST_MAX=1024 * 100; # max 100K posts
$CGI::DISABLE_UPLOADS = 1; # no uploads

$" = '';

$ENV{'PATH'} = '/bin:/usr/bin:/usr/local/bin';

# Change this to the directories you want to have searched.
# Include the slash at the end of the directory.
my @directories = ("./", "../");
my $action = url_param ('action');
my $rootfile = url (relative=>1);
my $headerfile = "searchheader.txt";
my $footerfile = "searchfooter.txt";
my $errorfile = "ERR.txt";
my @head = getheader ($headerfile);
my @foot = getfooter ($footerfile);
my $date = getdate ();
my @errors;
my @finaldirs;
checkerrors ();

if ($action eq "search")
{
search ();
}

else
{
newsearch ();
}

sub search
{
print header;
print (@head);
#code for parsing results
foreach my $dir (@directories)
{
opendir (DIR, $dir);
my @files_from_dir = readdir (DIR);
closedir (DIR);
foreach my $filefromdir (@files_from_dir)
{
if (! -d $filefromdir)
{
push (@finaldirs, "$dir$filefromdir")
}
}
}
my $query = param ('query');
my @finalresults;
foreach my $file (@finaldirs)
{
open (FILE, $file) or push (@errors, "A file open error occurred on $file: $!.");
flock (FILE, LOCK_SH) or push (@errors, "A file lock error occurred on $file: $!.");
checkerrors ();
my @contents;
my $contents;
@contents = <FILE> if (-f $file);
close (FILE);
chomp (@contents);
$contents = join ('', @contents);
$contents =~ s/<.*>//g;
my $result;
if ($contents =~ m/$query/ and $query)
{
$result = "<a href=\"$file\">$file</a><br>";
push (@finalresults, $result);
}
}
print <<END;
<p><strong><em>Infused Search</em></strong>
<br>
<br>
Search Results:
</p>
END
print @finalresults;
print <<END;
<hr size="1">
</body>
</html>
END
}


sub newsearch
{
print header;
print (@head);
print <<END;
<strong><em>Infused Search</em></strong>
<br>
<br>
<hr size="1">
<form name="form1" method="post" action="search.pl?action=search">
<input type="text" name="query">
<input type="submit" name="Submit" value="Submit">
</form>
<hr size="1">
END
print (@foot);
}

sub checkerrors
{
if (@errors)
{
print header;
print "<html><body><center>";
print "There were errors while trying to execute Infused Search. They are listed as follows.<br><br>\n";
foreach my $error (@errors)
{
print ($error, "<br>\n");
}
my $errflag = 0;
if (! (open (ERRORF, ">>$errorfile") and flock (ERRORF, LOCK_EX)))
{
print "There was an error logging the errors: file cannot be locked or opened.<br>";
$errflag = 1;
}
else
{
print ERRORF ("Current date: $date", "\n");
foreach my $error2 (@errors)
{
print ERRORF $error2, "\n";
}
}
close (ERRORF);
if (! $errflag)
{
print "<br>", "Errors have been logged in $errorfile.";
}
print "</body></html>";
exit (0);
}
else
{
return;
}
}

sub getheader
{
my $header_sub = shift;
my (@headertoret);
if (-e $header_sub)
{
open (HEADERF, $header_sub) or push (@errors, "A file open error occurred on $header_sub: $!.");
flock (HEADERF, LOCK_SH) or push (@errors, "A file lock error occurred on $header_sub: $!.");
@headertoret = <HEADERF>;
close (HEADERF);
}
else
{
open (HEADERF, ">$header_sub") or push (@errors, "A file open error occurred on $header_sub: $!.");
flock (HEADERF, LOCK_EX) or push (@errors, "A file lock error occurred on $header_sub: $!.");
@headertoret = <<END;
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Infused Search</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
END
print HEADERF @headertoret;
close (HEADERF);
}
return (@headertoret);
}

sub getfooter
{
my $footer_sub = shift;
my (@footertoret);
if (-e $footer_sub)
{
open (HEADERF, $footer_sub) or push (@errors, "A file open error occurred on $footer_sub: $!.");
flock (HEADERF, LOCK_SH) or push (@errors, "A file lock error occurred on $footer_sub: $!.");
@footertoret = <HEADERF>;
close (HEADERF);
}
else
{
open (HEADERF, ">$footer_sub") or push (@errors, "A file open error occurred on $footer_sub: $!.");
flock (HEADERF, LOCK_EX) or push (@errors, "A file lock error occurred on $footer_sub: $!.");
@footertoret = <<END;
</body></html>
END
print HEADERF @footertoret;
close (HEADERF);
}
return (@footertoret);
}

sub getdate
{
my ($day, $mon, $year)=(localtime)[3,4,5];
$mon++; #month is returned in a 0-11 range
$year +=1900;
my $date = $mon . "/" . $day . "/" . $year;
return $date;
}
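
The race mentioned at the top of this post comes from getheader and getfooter testing -e and then opening the file in a separate step, so two requests can both find the file missing and both write it. One common way around that, sketched here as an assumption and not as the advice from the beginners list, is to let the operating system do the check and the create in a single step with sysopen and O_EXCL. The name read_or_create is hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);   # O_* constants and LOCK_* constants

# Hypothetical helper: return the lines of $path, creating the file
# with $default first if it does not exist yet.
sub read_or_create {
    my ($path, $default) = @_;

    # O_EXCL makes "check and create" one atomic step: sysopen fails
    # with EEXIST if another process created the file first.
    if (sysopen(my $out, $path, O_WRONLY | O_CREAT | O_EXCL)) {
        flock($out, LOCK_EX) or die "Cannot lock $path: $!";
        print {$out} $default;
        close($out) or die "Cannot close $path: $!";
    }
    elsif (!$!{EEXIST}) {
        die "Cannot create $path: $!";
    }

    # Whoever created it, read the file back under a shared lock.
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    flock($in, LOCK_SH) or die "Cannot lock $path: $!";
    my @lines = <$in>;
    close($in);
    return @lines;
}
```

A reader can still slip in between the create and the exclusive lock and see an empty file. Writing the default to a temporary file and renaming it into place would close that window as well.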
 
Sherm Pendley

Robin said:
btw, unless I change newsreaders, which I haven't had much success with
(I've downloaded like 5 and none seem to work for me cause they don't work
with my mail server), my indenting is gonna be screwed up, so please bear
with me.

You mentioned earlier about spaces and tabs getting screwed up. I've seen
that happen many times - the problem is that everyone has their own idea
where the "right" place to put tab stops is. Some folks put them at every 4
columns, some every 8, some every 2, etc. If you use tabs in your file, and
someone with different tab settings views the file, the formatting gets
totally hosed.

The solution is to avoid using tabs. I'm *not* saying don't indent your
code! :) I'm just saying to use spaces to do it, instead of tabs. Most
programmer's editors have an option to insert spaces when you hit the tab
key, so you'll hardly notice a difference when typing.

sherm--
 
Richard Morse

Robin said:
btw, unless I change newsreaders, which I haven't had much success with (I've
downloaded like 5 and none seem to work for me cause they don't work with my
mail server), my indenting is gonna be screwed up, so please bear with me.

One possibility: run the following on your script file before you paste
it into the message:

perl -i.orig -p -e "s/\t/    /g" my_perl_script

where my_perl_script is the file with the code you want to paste.
That's four space characters in the replacement string, although you can
change that if you wish.

HTH,
Ricky
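
The one-liner above replaces every tab with a fixed number of spaces, which is right for leading indentation but shifts columns when a tab follows other text on the line. If that matters, the core Text::Tabs module expands each tab to the next tab stop instead. A minimal sketch, assuming a four-column tab stop; the sample lines are made up for illustration.

```perl
use strict;
use warnings;
use Text::Tabs;   # core module; exports expand() and $tabstop

$tabstop = 4;     # assumed tab width; set this to match your editor

# Made-up sample lines: leading tabs, and tabs after other text.
my @lines    = ("\tif (\$x) {", "\t\tprint;\t# note", "ab\tcd");
my @expanded = expand(@lines);

print "$_\n" for @expanded;
```

Here the tab in the last sample line becomes two spaces, not four, because "ab" already occupies two columns of the four-column stop.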
 
Ala Qumsieh

btw, unless I change newsreaders, which I haven't had much success with (I've
downloaded like 5 and none seem to work for me cause they don't work with my
mail server), my indenting is gonna be screwed up, so please bear with me.

Two suggestions:

1. Download Mozilla from www.mozilla.org.

2. Please, please, please do not post your code for review on the newsgroup.
You can post asking for other people to review your code, but provide a link
of where they can download your code instead. I don't see any benefit of
posting your code as it is generally easier for other people to download it
by clicking a button than by copy/pasting it from their newsreader.

--Ala
 
gnari

robin, robin, robin.

here you go again.

your biggest problem is that you do not really listen
to the advice you are given.
if you do not understand what we are trying to tell you,
it is much better to ask for clarification than make the
exact same mistakes in your next post.

for example, you were told by a friendly soul in a recent post
why your search() will fail for other directories than './',
but this version still has the same problems.

you were also told by a more mischievous soul not to distribute
cgi scripts publicly because they are full of security holes,
and easily abused. this script, for example, can be used to hack
your site (again).

you do have a certain degree of tenacity, and your code is improving
a bit, but the regulars here would be more positive towards you
if you just showed some sign that you are trying to listen and learn.

gnari
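
The earlier explanation of the directory problem is not quoted in this thread, so the following is an assumption about what it was. readdir returns bare names, which means the script's `-d $filefromdir` test is resolved against the current directory and not against $dir, so it only behaves for './'. A sketch that tests the full path instead; list_files is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the plain files in each directory,
# testing the full path rather than the bare name from readdir.
sub list_files {
    my @dirs = @_;
    my @found;
    foreach my $dir (@dirs) {
        opendir(my $dh, $dir) or next;   # skip unreadable directories
        foreach my $name (readdir $dh) {
            my $path = "$dir$name";      # $dir ends in a slash, as in the script
            push @found, $path if -f $path;
        }
        closedir($dh);
    }
    return @found;
}
```

Using -f on the full path also drops '.' and '..' and any subdirectories in one test.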
 
Robin

gnari said:
your biggest problem is that you do not really listen
to the advice you are given.
if you do not understand what we are trying to tell you,
it is much better to ask for clarification than make the
exact same mistakes in your next post.

for example, you were told by a friendly soul in a recent post
why your search() will fail for other directories than './',
but this version still has the same problems.

ok, I finally figured out why it's not getting the other directories...

you were also told by a more mischievous soul not to distribute
cgi scripts publicly because they are full of security holes,
and easily abused. this script, for example, can be used to hack
your site (again).

how can this one be used to hack my site? I'm curious...
 
Robin

Ala Qumsieh said:
Two suggestions:

1. Download Mozilla from www.mozilla.org.

2. Please, please, please do not post your code for review on the newsgroup.
You can post asking for other people to review your code, but provide a link
of where they can download your code instead. I don't see any benefit of
posting your code as it is generally easier for other people to download it
by clicking a button than by copy/pasting it from their newsreader.

Ok, sounds good. Next time I will post the link.

-Robin
 
Robin

Sherm Pendley said:
You mentioned earlier about spaces and tabs getting screwed up. I've seen
that happen many times - the problem is that everyone has their own idea
where the "right" place to put tab stops is. Some folks put them at every 4
columns, some every 8, some every 2, etc. If you use tabs in your file, and
someone with different tab settings views the file, the formatting gets
totally hosed.

The solution is to avoid using tabs. I'm *not* saying don't indent your
code! :) I'm just saying to use spaces to do it, instead of tabs. Most
programmer's editors have an option to insert spaces when you hit the tab
key, so you'll hardly notice a difference when typing.

Spaces, eh? That's a good call.
-Robin
 
Robin

Richard Morse said:
One possibility: run the following on your script file before you paste
it into the message:

perl -i.orig -p -e "s/\t/    /g" my_perl_script

where my_perl_script is the file with the code you want to paste.
That's four space characters in the replacement string, although you can
change that if you wish.

Thanks, that's a very nifty example of code there.
-Robin
 
gnari

Robin said:
how can this one be used to hack my site? I'm curious...

it is a consequence of your habit of keeping security related files
in your web directory, in the same directory where your
'search engine' is reading. do you see the implications of that?

gnari
 
Robin

gnari said:
it is a consequence of your habit of keeping security related files
in your web directory, in the same directory where your
'search engine' is reading. do you see the implications of that?

what's your definition of a security related file? My stuff is mainly just
my personal site and zip files for various scripts and doc files...
-Robin
 
Tassilo v. Parseval

Also sprach Sherm Pendley:
You mentioned earlier about spaces and tabs getting screwed up. I've seen
that happen many times - the problem is that everyone has their own idea
where the "right" place to put tab stops is. Some folks put them at every 4
columns, some every 8, some every 2, etc. If you use tabs in your file, and
someone with different tab settings views the file, the formatting gets
totally hosed.

The solution is to avoid using tabs. I'm *not* saying don't indent your
code! :) I'm just saying to use spaces to do it, instead of tabs. Most
programmer's editors have an option to insert spaces when you hit the tab
key, so you'll hardly notice a difference when typing.

Actually - and I was told about that on perl5-porters - using tabs is
the right thing to do. Or at least partly, as the perl source
conventions are a little more complicated. The first level of
indentation is four spaces (no tab there). The second level would be
eight spaces and is then transformed into one tab. The third level would
then be one tab plus four spaces, and so on.

Any proper editor can be told to follow this convention automatically.
They also have settings for the visual tab width so that your
code will look as though there are only spaces.

In the end, your editor will display these files correctly. But when you
look at it through a pager or so, it will look strange, depending on how
many spaces it uses per tab (I have an alias which maps 'less' to
'less -x4 -r' which would be four spaces per tab).

Tassilo
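
The convention described above amounts to indenting four columns per level and then writing every full run of eight leading columns as one tab. A small sketch of that rule as stated in this post, not checked against the perl source itself; indent_for is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: leading whitespace for a nesting level, using
# four columns per level and one tab for every eight columns.
sub indent_for {
    my ($level) = @_;
    my $columns = 4 * $level;
    return ("\t" x int($columns / 8)) . (' ' x ($columns % 8));
}

# Show the first three levels with tabs made visible as \t.
for my $level (1 .. 3) {
    (my $shown = indent_for($level)) =~ s/\t/\\t/g;
    print "level $level: '$shown'\n";
}
```

Level 1 comes out as four spaces, level 2 as a single tab, and level 3 as a tab followed by four spaces, matching the description.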
 
Anno Siegel

Tassilo v. Parseval said:
Actually - and I was told about that on perl5-porters - using tabs is

Just waitaminute. We're talking about formatting Perl, not the perl
source.

the right thing to do. Or at least partly, as the perl source
conventions are a little more complicated. The first level of
indentation is four spaces (no tab there). The second level would be
eight spaces and is then transformed into one tab. The third level would
then be one tab plus four spaces, and so on.

I have noticed this strange convention. If there is a reason for it, I
don't want to know...

Any proper editor can be told to follow this convention automatically.
They also have settings for the visual tab width so that your
code will look as though there are only spaces.

I guess so.

In the end, your editor will display these files correctly. But when you
look at it through a pager or so, it will look strange, depending on how
many spaces it uses per tab (I have an alias which maps 'less' to
'less -x4 -r' which would be four spaces per tab).

Well, that's fine when the source requires that format, as the Perl source
seems to do. But you're not recommending it for new sources (C, Perl
or otherwise), are you?

Tabs were introduced with mechanical typewriters to save typing time.
In a programming environment they have the additional advantage of saving
space, so early programmers adopted them eagerly. But they never worked
really well, because every bit of hard- or software has its own ideas of
what a tab really means.

These days, editors do the time-saving, and byte-counting has ceased
to be profitable. Tabs might have been useful with more standardization,
but they aren't now.

Anno
 
gnari

Robin said:
what's your definition of a security related file? My stuff is mainly just
my personal site and zip files for various scripts and doc files...

well a password file for your blogger would qualify, as long as you do not
want other people to make unauthorized entries.

do you want a demonstration ?

gnari
 
Tassilo v. Parseval

Also sprach Anno Siegel:
Just waitaminute. We're talking about formatting Perl, not the perl
source.

I was under the impression that we were talking about indentation of
source code in general.

I have noticed this strange convention. If there is a reason for it, I
don't want to know...

Futile to seek a reason when Perl is the topic. :)

I guess so.

Well, that's fine when the source requires that format, as the Perl source
seems to do. But you're not recommending it for new sources (C, Perl
or otherwise), are you?

Not necessarily, right. My main point was just that the principle of
avoiding tabs is maybe not as common among projects as it seems.
Incidentally, I had thought the perl source used spaces instead of tabs
(I had never looked that closely) until someone pointed out to me that
the whitespace in one of my patches was slightly off.

Tabs were introduced with mechanical typewriters to save typing time.
In a programming environment they have the additional advantage of saving
space, so early programmers adopted them eagerly. But they never worked
really well, because every bit of hard- or software has its own ideas of
what a tab really means.

These days, editors do the time-saving, and byte-counting has ceased
to be profitable. Tabs might have been useful with more standardization,
but they aren't now.

So when tabs are still used in perl nowadays, this probably means it is
a left-over from those times when tabbing was still an issue. So this
convention probably remains only for the sake of uniformity.

Tassilo
 
Michele Dondi

This is a very simple search engine that prints the filenames of the
matching files and a link to each. It's not very advanced, but I used it as
an exercise in writing search engines, and I'm planning on making it into a
more advanced one in the future. I am aware of the race conditions in the
header and footer subs and their suckiness, and am working now to fix them.
I got some advice from someone on the perl beginners yahoo list, but to use
their code I have to understand it, and I don't yet, so once I read up I'll
fix the race condition. Anyway, any comments would be nice.

Dear Robin,


I *do* have a comment: is there any good reason why you continue to
post whole projects of yours to this group, occasionally claiming to
make them available publicly, when (i) you've repeatedly been
warned about several different issues with both the code proper and
this "activity", and (ii) even if this may come as a surprise to you,
there are people here who could carry on equivalent projects in a much
more reliable/secure/etc. way (and indeed someone who does that on a
professional basis) *and* they do *not* post tons of code here for
review?

Personally I think that there's nothing wrong with being proud about
our own little successes and new learnings. But this group, for one
thing, is about *Perl*, not about web development. (Even if using
Perl!)

BTW: are you aware that perl is not a CGI-only thing?!?

This group is for discussions about Perl: OTs and asides are welcome
to some extent, but there are precise directions not to waste too
much bandwidth, not to mention people's own time. These are at the same
time quite *reasonable* behavioural rules, and they include, for example,
the request to post only minimal examples, to the extent that this is
possible.

Now what would happen if 5 other Robins popped up, each posting
his/her own new bbs (crappy) code, guestbook (crappy) code, blog
(crappy) code, search engine (crappy) code, (crappy) review of some
product/book, etc.?!?

As I said, there's nothing wrong *IMHO* with, say, sharing some piece
of code and asking for comments; just as a somewhat self-referential
example see this old post of mine:

btw, unless I change newsreaders, which I haven't had much success with (I've
                     ^^^^^^^^^^^
downloaded like 5 and none seem to work for me cause they don't work with my
mail server), my indenting is gonna be screwed up, so please bear with me.
^^^^^^^^^^^

Unless you *do* want to use your newsreader also as a mail client,
which is not unreasonable, but may also not be fundamental for you,
its being unable to "work with your mail server" (*which* mail server,
BTW?) shouldn't make it impossible to use it for reading and writing
usenet articles...

Also, sorry for the intended sarcasm, but isn't it strange that a
self-declared (not Perl but) "win32 expert" can't properly set up at
least one out of *five* different newsreaders?!?


Michele
 
Robin

gnari said:
well a password file for your blogger would qualify, as long as you do not
want other people to make unauthorized entries.

do you want a demonstration ?

true, they'd still have to guess the password though :)
-Robin
 
Robin

Unless you *do* want to use your newsreader also as a mail client,
which is not unreasonable, but may also not be fundamental for you,
its being unable to "work with your mail server" (*which* mail server,
BTW?) shouldn't make it impossible to use it for reading and writing
usenet articles...

Also, sorry for the intended sarcasm, but isn't it strange that a
self-declared (not Perl but) "win32 expert" can't properly set up at
least one out of *five* different newsreaders?!?

None of them offered SMTP authentication; I'll keep looking.
-Robin
 
gnari

[snip discussion about how his script can compromise his site]
true, they'd still have to guess the password though :)

Robin, you are not *listening*.
I was telling you: your search script gave me the password.

look at your blog if you need proof:
http://www.infusedlight.net/robin/blogger.pl

this would not be a big deal, if it was *just* your personal
blog software, but you have been offering your software
to other people without adequate warnings about not
using it for real.

gnari
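
The leak described here follows from the script reading every file in the searched directories, including ones never meant to be served, and from interpolating the query straight into a regex. A hedged sketch of two mitigations, assuming only .html, .htm and .txt files are meant to be searchable; the extension list and the names is_searchable and matches are hypothetical.

```perl
use strict;
use warnings;

# Assumed allowlist: only files with these extensions are ever opened.
my %allowed_ext = map { $_ => 1 } qw(html htm txt);

# Hypothetical helper: true if the file name has an allowed extension.
sub is_searchable {
    my ($path) = @_;
    my ($ext) = $path =~ /\.([^.\/]+)\z/;
    my $ok = defined $ext && $allowed_ext{lc $ext};
    return $ok ? 1 : 0;
}

# Hypothetical helper: literal, case-sensitive substring match.
# \Q...\E escapes regex metacharacters in the user's query.
sub matches {
    my ($contents, $query) = @_;
    return 0 unless defined $query && length $query;
    return $contents =~ /\Q$query\E/ ? 1 : 0;
}
```

Neither check replaces the more basic fix gnari points at, which is keeping password files and similar data outside the web directory altogether.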
 
