you *must* be able to shorten this...

G

Ghee

Hello,

I've written a short script to read in a form with HTML-style tags
(form is postscript, can be 0.5MB to 20MB+, tags are alphanumerical
wiht - or _ in), and substitute values into these tags from a separate
data file (pipe delimited, ie data|value), writing out the populated
form as a new file.

The tags will appear in the form enclosed in angled brackets, within a
postscript "text object", which is delimited by rounde brackets - eg:

.........(....<TAG1>...<TAG2>....)......

The following script works, but I've got the feeling that it's not
very efficient. So, can any of you suggest improvements over my
perl-newbie script?

#!/usr/bin/perl

#open output file, tag data, and the form
open (OUTPUT,">OUTPUT.ps") || die "can't open output file";
open (DATA,"DATA.PDF") || die "can't open data file";
open (FORM,"FORM1.ps") || die "can't open form file";

#Read in pipe delimited file (PDF), put data in associative array
@data=<DATA>;
foreach $item (@data) {
@items=split /\|/,$item;
$variable=$items[0];
$value=$items[1];
$$variable=$value; chop $value;
$array{$variable}=$value;
};

#read through form line by line, identifying any tags and replacing
#values from associative array as neccesary

while(<FORM>) {
$stuff = $_;
$done_stuff = "";
while ($stuff =~ /<[0-9a-zA-Z-_]+>/) {
$done_stuff = $done_stuff . $`; #write out pre match
$stuff = $'; #move post match to $stuff to loop again...
$tag_name = $&;
$tag_name =~ s/<|>//g; #take off < and >
if (exists $array{$tag_name}) { #replace if exists in array
$done_stuff = $done_stuff . $array{$tag_name};
} else {
$done_stuff = $done_stuff . "<$tag_name>";
}
}
$done_stuff = $done_stuff . $stuff;
print OUTPUT $done_stuff;
};

close (FORM);
close (DATA);
close (OUTPUT);


Regards,

Ghee
 
J

Jeff 'japhy' Pinyan

[posted & mailed]

I've written a short script to read in a form with HTML-style tags
(form is postscript, can be 0.5MB to 20MB+, tags are alphanumerical
wiht - or _ in), and substitute values into these tags from a separate
data file (pipe delimited, ie data|value), writing out the populated
form as a new file.
#!/usr/bin/perl

I'd suggest

use strict;
use warnings;

The 'strict' will require you to declare your variables.
#open output file, tag data, and the form
open (OUTPUT,">OUTPUT.ps") || die "can't open output file";
open (DATA,"DATA.PDF") || die "can't open data file";

"PDF" is a common Adobe file suffix. It's awfully misleading here.
open (FORM,"FORM1.ps") || die "can't open form file";

It's helpful to include $! in your error messages; it explains why the
system couldn't do what you asked.
#Read in pipe delimited file (PDF), put data in associative array
@data=<DATA>;
foreach $item (@data) {

There is no reason to read a filehandle into an array, just to iterate
over the array. Use a while loop, like you did for FORM.
@items=split /\|/,$item;
$variable=$items[0];
$value=$items[1];

Why use @items? Why not store the values directly into the variables you
want to use?
$$variable=$value; chop $value;

EWW! NO! Not allowed. Do not create a variable out of thin air like
that. If $foo is a string like "name", do NOT create $name by saying

$$foo = "Jeff";

That is evil. It's a symbolic reference. You shouldn't use them.l
$array{$variable}=$value;

.... especially if you're going to store the relationship in a hash anyway!
And %array is a rather poor name for a hash.

my %vars;

while (<DATA>) {
chomp; # remove the newline; chomp() is safer than chop()
my ($varname, $value) = split /\|/;
$vars{$varname} = $value;
}
#read through form line by line, identifying any tags and replacing
#values from associative array as neccesary

while(<FORM>) {
$stuff = $_;

There's no reason not to use $_ in this loop. And if you don't want to
use $_, then why not say

while (my $stuff = said:
$done_stuff = "";
while ($stuff =~ /<[0-9a-zA-Z-_]+>/) {
$done_stuff = $done_stuff . $`; #write out pre match
$stuff = $'; #move post match to $stuff to loop again...

Don't use $`, $&, and $'. They're bad variables. They slow things down,
and there's almost always a way around them. I have a feeling you want to
use a substitution here.
$tag_name = $&;
$tag_name =~ s/<|>//g; #take off < and >
if (exists $array{$tag_name}) { #replace if exists in array
$done_stuff = $done_stuff . $array{$tag_name};
} else {
$done_stuff = $done_stuff . "<$tag_name>";
}
}
$done_stuff = $done_stuff . $stuff;
print OUTPUT $done_stuff;
};

Definitely. You definitely want a substitution.

while (<FORM>) {
s{
< ([-\w]+) > # the ()'s capturing what they match to $1
}{
if (exists $vars{$1}) { return $vars{$1} }
else { return "<$1>" }
}xge;
# s{}{} is just another form of s///, and we're operating on $_
# the /x modifier means the regex has extra whitespace and comments
# the /g modifier means "do this globally for each match"
# the /e modifier means the replacement is a block of code to evaluate

print OUTPUT; # $_ is the default thing to print
}
 
G

Ghee

Jeff said:
[posted & mailed]

I've written a short script to read in a form with HTML-style tags
(form is postscript, can be 0.5MB to 20MB+, tags are alphanumerical
wiht - or _ in), and substitute values into these tags from a separate
data file (pipe delimited, ie data|value), writing out the populated
form as a new file.

#!/usr/bin/perl


I'd suggest

use strict;
use warnings;

The 'strict' will require you to declare your variables.

#open output file, tag data, and the form
open (OUTPUT,">OUTPUT.ps") || die "can't open output file";
open (DATA,"DATA.PDF") || die "can't open data file";


"PDF" is a common Adobe file suffix. It's awfully misleading here.

open (FORM,"FORM1.ps") || die "can't open form file";


It's helpful to include $! in your error messages; it explains why the
system couldn't do what you asked.

#Read in pipe delimited file (PDF), put data in associative array
@data=<DATA>;
foreach $item (@data) {


There is no reason to read a filehandle into an array, just to iterate
over the array. Use a while loop, like you did for FORM.

@items=split /\|/,$item;
$variable=$items[0];
$value=$items[1];


Why use @items? Why not store the values directly into the variables you
want to use?

$$variable=$value; chop $value;


EWW! NO! Not allowed. Do not create a variable out of thin air like
that. If $foo is a string like "name", do NOT create $name by saying

$$foo = "Jeff";

That is evil. It's a symbolic reference. You shouldn't use them.l

$array{$variable}=$value;


... especially if you're going to store the relationship in a hash anyway!
And %array is a rather poor name for a hash.



my %vars;

while (<DATA>) {
chomp; # remove the newline; chomp() is safer than chop()
my ($varname, $value) = split /\|/;
$vars{$varname} = $value;
}

#read through form line by line, identifying any tags and replacing
#values from associative array as neccesary

while(<FORM>) {
$stuff = $_;


There's no reason not to use $_ in this loop. And if you don't want to
use $_, then why not say

$done_stuff = "";
while ($stuff =~ /<[0-9a-zA-Z-_]+>/) {
$done_stuff = $done_stuff . $`; #write out pre match
$stuff = $'; #move post match to $stuff to loop again...


Don't use $`, $&, and $'. They're bad variables. They slow things down,
and there's almost always a way around them. I have a feeling you want to
use a substitution here.

$tag_name = $&;
$tag_name =~ s/<|>//g; #take off < and >
if (exists $array{$tag_name}) { #replace if exists in array
$done_stuff = $done_stuff . $array{$tag_name};
} else {
$done_stuff = $done_stuff . "<$tag_name>";
}
}
$done_stuff = $done_stuff . $stuff;
print OUTPUT $done_stuff;
};


Definitely. You definitely want a substitution.

while (<FORM>) {
s{
< ([-\w]+) > # the ()'s capturing what they match to $1
}{
if (exists $vars{$1}) { return $vars{$1} }
else { return "<$1>" }
}xge;
# s{}{} is just another form of s///, and we're operating on $_
# the /x modifier means the regex has extra whitespace and comments
# the /g modifier means "do this globally for each match"
# the /e modifier means the replacement is a block of code to evaluate

print OUTPUT; # $_ is the default thing to print
}

close (FORM);
close (DATA);
close (OUTPUT);

Jeff,

Thanks for the suggestions - they sparked off two days of great
discussion here (no-one knew about putting code in a regexp with the /e
option). Sorry for the delay in replying...

Ghee
 
U

Uri Guttman

<snip of massive quote>

at least you didn't top post but you should have edited out most/all of
the quote as you didn't comment directly on anything there.

G> Thanks for the suggestions - they sparked off two days of great
G> discussion here (no-one knew about putting code in a regexp with the
G> /e option). Sorry for the delay in replying...

that is putting code in the replacement side of s///. s/// is NOT a
regex but the substitution operator. the left part of it is a regex and
the right part is a double quotish string or code when using the /e
modifier. it helps to understand that properly when discussing it.

uri
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,576
Members
45,054
Latest member
LucyCarper

Latest Threads

Top