Masking by columns for grep

S

Shannon Jacobs

I want to restrict a grep to certain columns. After extensive contemplation
and a lot of study of the camel book (1991 printing), various manuals and
Web pages, and assorted newsgroup posts, I finally indirectly arrived at
this:

@foo2 = grep(/.{50}$form_values{'a_SEARCH_VALUE'}.{6}/,@foo1);

That's how it appears in the actual file, though for the sake of this
question, a reduced form would be:

@foo = grep(/.{50}$theString.{6},@foo);

The "records" are fixed length, and I want to ignore the first 50 characters
and the last 6. This approach mostly works (unless the string has a | in it,
which is probably a different problem), but it seems to have pegged my
nauseous code detector. I'm interested in constructive suggestions or
bemusing feedback. (Or maybe I should just appeal for a reference to a
source to replace the entire grotesque thing, appealing to the camel page
xviii citation of "laziness" as the first great virtue. Right now this thing
feeds off of another database system that goes back over 20 years, and it
would probably be a good thing to port the entire mess to some reasonable
environment...)

For the sake of context, the entire program appears below. It is actually
live on the Web at shanenj.tripod.com/search0.html. (Yes, I know Tripod has
a brain-damaged interface, but returning the encoded error messages in the
URL is kind of unique. I also know I should move to a decent server.
Consider Perl/CGI on Tripod as one of those semi-religious rituals, rather
like flogging oneself with a chain.)

Warning: If you proceed past this point you are at risk of laughing yourself
sick. By any standard, I'm sure this is some really peculiar old code,
containing about 10 years of occasional hacking and tweaking. I think it was
the Lisp influence that did it... Or maybe the parts that came from the
Chinese guy with the Berkeley Ph.D.? (I'm sure he doesn't want a more
explicit citation for what I've done to his code.) Or the Y2K windowing? Or
something. <Imagine a movie of the "Lost in Space" robot wildly waving its
arms and shouting "Danger, Will Robinson! Danger!">

#!/usr/local/bin/perl

print '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">';
print "Content-type: text/html\n\n";

print "<HTML><HEAD><TITLE>Search Result</TITLE>\n";
print '<SCRIPT language="JavaScript" src="../formfunc.js"></SCRIPT>';
print "\n";
print '<SCRIPT language="JavaScript">var inHTMLdate = "';
print scalar localtime;
print '";</SCRIPT>';
print "\n";
print "</HEAD>\n";
print "<BODY bgcolor=\"#00bfbf\">\n";

print "<hr size=8>\n";
print "<center><h3>BookList Search Form Result</H3></center>\n";
print "<hr size=8>";

%form_values = &html_parse;

if ( $form_values{'a_SEARCH_VALUE'} ) {
open(DATAFILE,'titles.txt');
$i = 0;
while (<DATAFILE>) { $foo0[$i++] = $_; }
close(DATAFILE);

open(AUTHFILE,'authors.txt');
$i = 1;
while (<AUTHFILE>) { $authlist[$i++] = $_; }
close(AUTHFILE);

# Time for fishing against authors (Main event of 8JUN2005)
if ($form_values{'AuthorFishing'} eq 'gofish' ) {
@foobar = grep(/$form_values{'a_SEARCH_VALUE'}/i,@authlist);
$form_values{'a_SEARCH_VALUE'} = '';
foreach $GREPPED (@foobar) {
$form_values{'a_SEARCH_VALUE'} =
$form_values{'a_SEARCH_VALUE'}.substr($GREPPED,34,4)."|";
}
$form_values{'a_SEARCH_VALUE'} = $form_values{'a_SEARCH_VALUE'}."90AZ";
}

# Main searches from here
if ($form_values{'sorttype'} eq 'title' ) {
@foo1 = sort ignore3 @foo0
} else { if ($form_values{'sorttype'} eq 'subject') {
@foo1 = sort {substr($a,62,6) cmp substr($b,62,6)} @foo0
} else {
@foo1 = @foo0
}}

if ($form_values{'searchfields'} eq 'authornums' ) {
@foo2 = grep(/.{50}$form_values{'a_SEARCH_VALUE'}.{6}/,@foo1);
# fut } else { if ($form_values{'searchfields'} eq 'dates' ) {
# @foo2 = grep(/$form_values{'a_SEARCH_VALUE'}/,@foo2);
# } else { if ($form_values{'searchfields'} eq 'subjects' ) {
# fut @foo2 = grep(/$form_values{'a_SEARCH_VALUE'}/,@foo2);
} else {
@foo2 = @foo1
}

if ($form_values{'sensecase'} eq 'usecase' ) {
@foo3 = grep(/$form_values{'a_SEARCH_VALUE'}/,@foo2);
} else {
@foo3 = grep(/$form_values{'a_SEARCH_VALUE'}/i,@foo2);
}

$hits = 1+$#foo3;
print ("<center>This search ( $form_values{'a_SEARCH_VALUE'} ) found $hits
books.</center><p>\n");

if ($form_values{'datetype'} eq 'human' ) {
#only load $months once if required
$months[1] = 'January';
$months[2] = 'February';
$months[3] = 'March';
$months[4] = 'April';
$months[5] = 'May';
$months[6] = 'June';
$months[7] = 'July';
$months[8] = 'August';
$months[9] = 'September';
$months[10] = 'October';
$months[11] = 'November';
$months[12] = 'December';
}

print("<center><table border = 1 cell padding = 1>\n");
print("<th>Title of Book</th><th>Pub. Year</th><th>Date Read</th>\n");
print("<th colspan=\"3\">Authors</th><th colspan=\"3\">Subjects</th>\n");
print("<tr></tr><tr></tr>\n");
foreach $GREPPED (@foo3) {
print("<tr>");
print("<td>",substr($GREPPED,0,40),"</td>\n");
print("<td>",substr($GREPPED,40,4),"</td>");

# print("<td>",substr($GREPPED,44,6),"</td>");
print("<td>".nicedate(substr($GREPPED,44,6))."</td>");

# From here store the authors and get names if required
$auth1 = substr($GREPPED,50,4);
$auth2 = substr($GREPPED,54,4);
$auth3 = substr($GREPPED,58,4);

if ( $form_values{'numorname'} eq 'authnames' ) {
print("<td>".substr($authlist[$auth1],0,34)."<br></td>");
print("<td>".substr($authlist[$auth2],0,34)."<br></td>");
print("<td>".substr($authlist[$auth3],0,34)."<br></td>\n");
} else { if ( $form_values{'numorname'} eq 'both' ) {
print("<td>".substr($authlist[$auth1],0,34)."( $auth1 )<br></td>");
print("<td>".substr($authlist[$auth2],0,34)."( $auth2 )<br></td>");
print("<td>".substr($authlist[$auth3],0,34)."( $auth3 )<br></td>\n");
} else {
print("<td>$auth1<br></td>");
print("<td>$auth2<br></td>");
print("<td>$auth3<br></td>\n");
}}

# print("<td>",substr($GREPPED,50,4),"<br></td>");
# print("<td>",substr($GREPPED,54,4),"<br></td>");
# print("<td>",substr($GREPPED,58,4),"<br></td>\n");

print("<td>",substr($GREPPED,62,2),"<br></td>");
print("<td>",substr($GREPPED,64,2),"<br></td>");
print("<td>",substr($GREPPED,66,2),"<br></td>");
print("</tr>\n\n");
}
print("</table></center>\n");
} else {
print ("Nothing to Search For<br>\n");
}
print '<SCRIPT language="JavaScript">';
print "\n";
print 'threefooter(\'Return
to<br>shanen\\\'s<br>Home\',\'/index.html\',\'procbook\');';
print "\n";
print '</SCRIPT>';
print "\n";

# print "<hr size=8>";
# print "<center><P>End of Search Results<P><br>\nClick to go back to my
Index page </center>\n";
# print "<P><center>";

# print "<A http=\"/index.html\"><img src=\"/images/home.gif\"
border=0></a>";
# print "</center><P>";
# print "<hr size=8><P>";

print "</BODY></HTML>";

exit(0);

sub ignore3 {
# local ($acut,$bcut);
$acut = 0;
$bcut = 0;
if (index($a,'A ') == 0) {$acut = 2}
if (index($a,'An ') == 0) {$acut = 3}
if (index($a,'The ') == 0) {$acut = 4}
if (index($b,'A ') == 0) {$bcut = 2}
if (index($b,'An ') == 0) {$bcut = 3}
if (index($b,'The ') == 0) {$bcut = 4}
#print ("$a1 cmp $b1");
substr($a,$acut) cmp substr($b,$bcut)
# $a cmp $b
}

sub nicedate {
if ($form_values{'datetype'} eq 'human' ) {
if (substr(@_[0],4,1) eq '0') {
$foo = $months[substr(@_[0],2,2)].'
'.substr(@_[0],5,1)
} else {
$foo = $months[substr(@_[0],2,2)].'
'.substr(@_[0],4,2)
};

# Y2K window added to 1960-2059, slj 3/20/2000 including above #s
if (substr(@_[0],0,1) gt '5') {
$foo.', 19'.substr(@_[0],0,2)
} else {
$foo.', 20'.substr(@_[0],0,2)
}
# Yeah, I know it isn't pretty but I haven't touched PERL in a LONG time...

} else {
if ($form_values{'datetype'} eq 'usa' ) {
substr(@_[0],2,2).'/'.substr(@_[0],4,2).'/'.substr(@_[0],0,2)
} else {
@_[0]
}
}
}

sub html_parse
{
local($line, $length, $offset, @pairs);

# if ($ENV{"REQUEST_METHOD"} eq "GET") {
# $line = $ENV{"QUERY_STRING"};
# }
# elsif ($ENV{"REQUEST_METHOD"} eq "POST") {
#
# ($ENV{"CONTENT_TYPE"} ne "application/x-www-form-#urlencoded") &&
# &html_fatal("Illegal Content-Type '" . $ENV
# $length = $ENV{"CONTENT_LENGTH"};
# (($length =~ m/^\d+$/) == 0) &&
# &html_fatal("Content-Length variable not found");
#
# (read(STDIN, $line, $length) == $length) ||
# &html_fatal("Could not read form info from stdin: " .
#$length);
# }
# else {
# &html_fatal("Illegal Request-Method '" . $ENV
#{"REQUEST_METHOD"} . "'");
# }
# Probably not allowed to check ENV so just take the query string?
# Just use a GET and the one line?
$line = $ENV{"QUERY_STRING"};

((!defined($line)) || (length($line) == 0)) &&
&html_fatal("No values passed in");

@pairs = split(/[=&]/, $line);
(substr($line, $length - 1, 1) eq "=") &&
($pairs[@pairs] = "");
$offset = 0;
foreach $value (@pairs) {
($offset++ & 1) &&
&html_unescape($value);
}
return @pairs;
}

sub html_unescape
{
local($offset);

$_[0] =~ s/\+/ /g;
$offset = -1;
while (($offset = index($_[0], "%", $offset + 1)) >= 0) {
substr($_[0], $offset, 3) =
pack("c", hex(substr($_[0], $offset + 1, 2)));
}
}


sub html_fatal
{
print "<H3>Error Detected</H3><P>";
print @_[0], "\n";
exit 0;
}
 
J

John Bokma

Shannon said:
For the sake of

our sanity, first many tips:

#!/usr/local/bin/perl

CGI, so add -T to the end of that line

and next:

use strict;
use warnings;

print '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0
Transitional//EN">'; print "Content-type: text/html\n\n";

Ouch! You should print the header first...

And since it's CGI:

use CGI::Carp qw(fatalsToBrowser);
use CGI;

my $cgi = new CGI;

print $cgi->header;

next: here docs, instead of:
print "<HTML><HEAD><TITLE>Search Result</TITLE>\n"; :
:

print "<center><h3>BookList Search Form Result</H3></center>\n";
print "<hr size=8>";

print <<HTML;
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>....
:
:
<hr size="8">
HTML

%form_values = &html_parse;

Don't use & in front of a sub, unless you need it's special effects.

throw away that CGI for Dummies (1996)
if ( $form_values{'a_SEARCH_VALUE'} ) {
open(DATAFILE,'titles.txt');

or die
$i = 0;
while (<DATAFILE>) { $foo0[$i++] = $_; }
close(DATAFILE);

or die
open(AUTHFILE,'authors.txt');

or die
$i = 1;
while (<AUTHFILE>) { $authlist[$i++] = $_; }
close(AUTHFILE);

or die
$hits = 1+$#foo3;

$hits = @foo3;

use clear names.
$months[1] = 'January';
:

$months[12] = 'December';


aargh:

my @months = qw( January February ...... December );

if you really need 1 based, put a dummy before January
foreach $GREPPED (@foo3) {

Don't use ALLCAPS for non-constant scalars

[ snip ]

A *lot* of stuff that could be made more readable with here docs.
# Yeah, I know it isn't pretty but I haven't touched PERL in a LONG
time...

10 years?
sub html_parse

Dump that one
sub html_fatal

With fatals to browser you can just use die
 
S

Shannon Jacobs

John said:
our sanity, first many tips:

Well, thanks, and you show some of the third virtue, hubris, but I'm not
sure how many of these tips apply to the environment of Tripod. To call
Tripod's CGI/Perl interface "twisted" is a charitable description, but
"lobotomized" is probably more accurate. I'm pretty sure I tried a number of
these things, especially in your early recommendations. I'm not even sure
the here docs worked properly in that environment, though it has also been a
long time since I tried those experiments. (That was probably during the
time when I was running the same code on two or three servers, and Tripod
was definitely the LCD. The Tripod version only survived because it's a
reasonably stable system, and because it's hard to argue with the price.) It
would be nice if you would have included a bit of explanation for some of
the less obvious parts. For example, I think there's something non-obvious
about the "or die", since I already knew about it, and I'm reasonably sure
it was used in the original code. There must have been some reason I took it
out here...

<snipping to conserve resources in the reply>
 
J

John Bokma

Shannon said:
Well, thanks, and you show some of the third virtue, hubris,

:-D. Yeah, while on the other news people say I have a low self esteem,
so yup.
but I'm
not sure how many of these tips apply to the environment of Tripod.

Me neither, but lets see:
To
call Tripod's CGI/Perl interface "twisted" is a charitable
description, but "lobotomized" is probably more accurate.

You can try a mini-script, and see what it does:

#!/usr/bin/perl -T

use strict;
use warnings;

use CGI::Carp qw(fatalsToBrowser);
use CGI;

my $cgi = new CGI;
print $cgi->header( 'text/plain' ),
<<"HELLO";
Hello, World!
HELLO

If they don't have CGI.pm installed, which I doubt, their interface is
not that twisted, just incomplete.
I'm pretty
sure I tried a number of these things, especially in your early
recommendations. I'm not even sure the here docs worked properly in
that environment,

I think here docs are in Perl4, no idea about 3, but have *nothing* to
do with CGI.
though it has also been a long time since I tried
those experiments. (That was probably during the time when I was
running the same code on two or three servers, and Tripod was
definitely the LCD. The Tripod version only survived because it's a
reasonably stable system, and because it's hard to argue with the
price.)

I have no idea about the price, but I have webspace which doesn't criple
me, Perl, or CGI at 1 USD a month (excluding domain registration, all in
it comes down to 19.95 USD a year!)
It would be nice if you would have included a bit of
explanation for some of the less obvious parts.

That would come down to a Perl course for beginners :)
For example, I think
there's something non-obvious about the "or die", since I already knew
about it, and I'm reasonably sure it was used in the original code.
There must have been some reason I took it out here...

Your script died? Which means that the open ( or close, etc ) failed.

If you output the value of $! you would know the reason.

Programming is not cooking, even though there are cookbooks. It's not:
ah, it tastes not nice, lets add some salt, and remove some of the
vegetables.

Programming is planning ahead what you want, then shop for the
ingredients, and the result *will* taste good, since you planned and
bought the right ingredients, and used the right amounts.

And not blaming the color of the egg plant, because the result tastes
too salty :)
 
B

Brian McCauley

Jim said:
If you really want to ignore the first 50 columns and ignore the last
6, you should anchor your pattern:

@foo = grep(/^.{50}$theString.{6}$/,@foo);

Actually sinde the OP stated fixed length records you can ignore the
last 6 by, er, simply ignoring them.

@foo = grep(/^.{50}\Q$theString/,@foo);

(Note: \Q inserted as per other branch of this thread as $theString is a
target string not a target regex).
 
A

Anno Siegel

Brian McCauley said:
Actually sinde the OP stated fixed length records you can ignore the
last 6 by, er, simply ignoring them.

@foo = grep(/^.{50}\Q$theString/,@foo);

(Note: \Q inserted as per other branch of this thread as $theString is a
target string not a target regex).

Another way to skip the first 50 (untested):

@foo = grep /\G\Q$theString/, map pos = 50, @foo;

It's longer, but it may be faster (or not) if efficiency matters.

Anno
 
J

John W. Krahn

Anno said:
Another way to skip the first 50 (untested):

@foo = grep /\G\Q$theString/, map pos = 50, @foo;

It's longer, but it may be faster (or not) if efficiency matters.

That won't work because $_ is localized by map() and grep() so the $_ that
map() modifies is not the same $_ that grep() uses.


John
 
A

Anno Siegel

John W. Krahn said:
That won't work because $_ is localized by map() and grep() so the $_ that
map() modifies is not the same $_ that grep() uses.

There's more than that wrong with it. For one, "map pos = 50, @foo"
returns as many 50s as @foo has elements, not the elements of @foo.

Rewriting it

@foo = grep /\G\Q$theString/, map { pos = 50; $_ } @foo;

still suffers from the fact that the elements that map() returns don't
have their pos() set. I admit that I don't quite understand that.
$_ in map should be an alias to the current element of @foo, and
grep should see an alias to that, with its position set.

This works as intended:

pos = 50 for @foo;
@foo = grep /\G\Q$theString/, @foo;

Anno
 
D

Damian James

...
Rewriting it

@foo = grep /\G\Q$theString/, map { pos = 50; $_ } @foo;

still suffers from the fact that the elements that map() returns don't
have their pos() set. I admit that I don't quite understand that.

Me either from my reading of the docs.
$_ in map should be an alias to the current element of @foo, and
grep should see an alias to that, with its position set.

This works as intended:

pos = 50 for @foo;
@foo = grep /\G\Q$theString/, @foo;

Curious. I would find the following more obvious about what it does:

@foo = grep { pos = 50; /\G\Q$theString/ } @foo;

I don't really understand where the position is being kept in your
version (or rather, how map() knows about it).

--damian
 
D

Damian James

...
Rewriting it

@foo = grep /\G\Q$theString/, map { pos = 50; $_ } @foo;

still suffers from the fact that the elements that map() returns don't
have their pos() set. I admit that I don't quite understand that.

Me either from my reading of the docs.
$_ in map should be an alias to the current element of @foo, and
grep should see an alias to that, with its position set.

This works as intended:

pos = 50 for @foo;
@foo = grep /\G\Q$theString/, @foo;

Curious. I would find the following more obvious about what it does:

@foo = grep { pos = 50; /\G\Q$theString/ } @foo;

I don't really understand where the position is being kept in your
version (or rather, how grep() knows about it).

--damian
 
A

Anno Siegel

Damian James said:
Me either from my reading of the docs.


Curious. I would find the following more obvious about what it does:

@foo = grep { pos = 50; /\G\Q$theString/ } @foo;

Of course! Much better.
I don't really understand where the position is being kept in your
version (or rather, how grep() knows about it).

The position is a property of every string and lives as long as
the string lives. Once the position is set for all strings in @foo,
the regex in grep() can use it.

The position is fragile in that it isn't copied with the string:

$x = '123';
pos $x = 2;
$y = $x;

leaves $x with a position of 2, but $y has undefined position.

Anno
 
S

Shannon Jacobs

Anno said:
Of course! Much better.


The position is a property of every string and lives as long as
the string lives. Once the position is set for all strings in @foo,
the regex in grep() can use it.

The position is fragile in that it isn't copied with the string:

$x = '123';
pos $x = 2;
$y = $x;

leaves $x with a position of 2, but $y has undefined position.

Anno

Thanks very much. This has been a typically enlightening discussion. I very
much like that aspect of Perl, though programming in it still makes me feel
a bit queasy sometimes.

(And I still think I should replace the entire kludgy program, but no one
offered any pointers in that direction. I suppose part of what is bothering
me about the approach is the fundamental distortion as you pass through the
CGI--on one side you are thinking in HTML trivialities, but on the other
side you have Perl, where you must think abstractly in a way that is
anything but trivial.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,882
Messages
2,569,948
Members
46,267
Latest member
TECHSCORE

Latest Threads

Top