you must be able to shorten this...

Ghee · Jul 28, 2003

Hello,

I've written a short script to read in a form with HTML-style tags
(form is postscript, can be 0.5MB to 20MB+, tags are alphanumerical
wiht - or _ in), and substitute values into these tags from a separate
data file (pipe delimited, ie data|value), writing out the populated
form as a new file.

The tags will appear in the form enclosed in angled brackets, within a
postscript "text object", which is delimited by rounde brackets - eg:

.........(....<TAG1>...<TAG2>....)......

The following script works, but I've got the feeling that it's not
very efficient. So, can any of you suggest improvements over my
perl-newbie script?

#!/usr/bin/perl

#open output file, tag data, and the form
open (OUTPUT,">OUTPUT.ps") || die "can't open output file";
open (DATA,"DATA.PDF") || die "can't open data file";
open (FORM,"FORM1.ps") || die "can't open form file";

#Read in pipe delimited file (PDF), put data in associative array
@data=<DATA>;
foreach $item (@data) {
@items=split /\|/,$item;
$variable=$items[0];
$value=$items[1];
$$variable=$value; chop $value;
$array{$variable}=$value;
};

#read through form line by line, identifying any tags and replacing
#values from associative array as neccesary

while(<FORM>) {
$stuff = $_;
$done_stuff = "";
while ($stuff =~ /<[0-9a-zA-Z-_]+>/) {
$done_stuff = $done_stuff . $`; #write out pre match
$stuff = $'; #move post match to $stuff to loop again...
$tag_name = $&;
$tag_name =~ s/<|>//g; #take off < and >
if (exists $array{$tag_name}) { #replace if exists in array
$done_stuff = $done_stuff . $array{$tag_name};
} else {
$done_stuff = $done_stuff . "<$tag_name>";
}
}
$done_stuff = $done_stuff . $stuff;
print OUTPUT $done_stuff;
};

close (FORM);
close (DATA);
close (OUTPUT);

Regards,

Ghee

Jeff 'japhy' Pinyan · Jul 28, 2003

[posted & mailed]

I've written a short script to read in a form with HTML-style tags
(form is postscript, can be 0.5MB to 20MB+, tags are alphanumerical
wiht - or _ in), and substitute values into these tags from a separate
data file (pipe delimited, ie data|value), writing out the populated
form as a new file.

#!/usr/bin/perl

I'd suggest

use strict;
use warnings;

The 'strict' will require you to declare your variables.

#open output file, tag data, and the form
open (OUTPUT,">OUTPUT.ps") || die "can't open output file";
open (DATA,"DATA.PDF") || die "can't open data file";

"PDF" is a common Adobe file suffix. It's awfully misleading here.

open (FORM,"FORM1.ps") || die "can't open form file";

It's helpful to include $! in your error messages; it explains why the
system couldn't do what you asked.

#Read in pipe delimited file (PDF), put data in associative array
@data=<DATA>;
foreach $item (@data) {

There is no reason to read a filehandle into an array, just to iterate
over the array. Use a while loop, like you did for FORM.

@items=split /\|/,$item;
$variable=$items[0];
$value=$items[1];

Why use @items? Why not store the values directly into the variables you
want to use?

$$variable=$value; chop $value;

EWW! NO! Not allowed. Do not create a variable out of thin air like
that. If $foo is a string like "name", do NOT create $name by saying

$$foo = "Jeff";

That is evil. It's a symbolic reference. You shouldn't use them.l

$array{$variable}=$value;

.... especially if you're going to store the relationship in a hash anyway!
And %array is a rather poor name for a hash.

};

my %vars;

while (<DATA>) {
chomp; # remove the newline; chomp() is safer than chop()
my ($varname, $value) = split /\|/;
$vars{$varname} = $value;
}

#read through form line by line, identifying any tags and replacing
#values from associative array as neccesary

while(<FORM>) {
$stuff = $_;

There's no reason not to use $_ in this loop. And if you don't want to
use $_, then why not say

while (my $stuff = said:
$done_stuff = "";
while ($stuff =~ /<[0-9a-zA-Z-_]+>/) {
$done_stuff = $done_stuff . $`; #write out pre match
$stuff = $'; #move post match to $stuff to loop again...

Don't use $`, $&, and $'. They're bad variables. They slow things down,
and there's almost always a way around them. I have a feeling you want to
use a substitution here.

$tag_name = $&;
$tag_name =~ s/<|>//g; #take off < and >
if (exists $array{$tag_name}) { #replace if exists in array
$done_stuff = $done_stuff . $array{$tag_name};
} else {
$done_stuff = $done_stuff . "<$tag_name>";
}
}
$done_stuff = $done_stuff . $stuff;
print OUTPUT $done_stuff;
};

Definitely. You definitely want a substitution.

while (<FORM>) {
s{
< ([-\w]+) > # the ()'s capturing what they match to $1
}{
if (exists $vars{$1}) { return $vars{$1} }
else { return "<$1>" }
}xge;
# s{}{} is just another form of s///, and we're operating on $_
# the /x modifier means the regex has extra whitespace and comments
# the /g modifier means "do this globally for each match"
# the /e modifier means the replacement is a block of code to evaluate

print OUTPUT; # $_ is the default thing to print
}

Ghee · Aug 19, 2003

Jeff said:
[posted & mailed]

I've written a short script to read in a form with HTML-style tags
(form is postscript, can be 0.5MB to 20MB+, tags are alphanumerical
wiht - or _ in), and substitute values into these tags from a separate
data file (pipe delimited, ie data|value), writing out the populated
form as a new file.

#!/usr/bin/perl

Click to expand...

I'd suggest

use strict;
use warnings;

The 'strict' will require you to declare your variables.

#open output file, tag data, and the form
open (OUTPUT,">OUTPUT.ps") || die "can't open output file";
open (DATA,"DATA.PDF") || die "can't open data file";

Click to expand...

"PDF" is a common Adobe file suffix. It's awfully misleading here.

open (FORM,"FORM1.ps") || die "can't open form file";

Click to expand...

It's helpful to include $! in your error messages; it explains why the
system couldn't do what you asked.

#Read in pipe delimited file (PDF), put data in associative array
@data=<DATA>;
foreach $item (@data) {

Click to expand...

There is no reason to read a filehandle into an array, just to iterate
over the array. Use a while loop, like you did for FORM.

@items=split /\|/,$item;
$variable=$items[0];
$value=$items[1];

Click to expand...

Why use @items? Why not store the values directly into the variables you
want to use?

$$variable=$value; chop $value;

Click to expand...

EWW! NO! Not allowed. Do not create a variable out of thin air like
that. If $foo is a string like "name", do NOT create $name by saying

$$foo = "Jeff";

That is evil. It's a symbolic reference. You shouldn't use them.l

$array{$variable}=$value;

Click to expand...

... especially if you're going to store the relationship in a hash anyway!
And %array is a rather poor name for a hash.

};

Click to expand...

my %vars;

while (<DATA>) {
chomp; # remove the newline; chomp() is safer than chop()
my ($varname, $value) = split /\|/;
$vars{$varname} = $value;
}

#read through form line by line, identifying any tags and replacing
#values from associative array as neccesary

while(<FORM>) {
$stuff = $_;

Click to expand...

There's no reason not to use $_ in this loop. And if you don't want to
use $_, then why not say

$done_stuff = "";
while ($stuff =~ /<[0-9a-zA-Z-_]+>/) {
$done_stuff = $done_stuff . $`; #write out pre match
$stuff = $'; #move post match to $stuff to loop again...

Click to expand...

Don't use $`, $&, and $'. They're bad variables. They slow things down,
and there's almost always a way around them. I have a feeling you want to
use a substitution here.

$tag_name = $&;
$tag_name =~ s/<|>//g; #take off < and >
if (exists $array{$tag_name}) { #replace if exists in array
$done_stuff = $done_stuff . $array{$tag_name};
} else {
$done_stuff = $done_stuff . "<$tag_name>";
}
}
$done_stuff = $done_stuff . $stuff;
print OUTPUT $done_stuff;
};

Click to expand...

Definitely. You definitely want a substitution.

while (<FORM>) {
s{
< ([-\w]+) > # the ()'s capturing what they match to $1
}{
if (exists $vars{$1}) { return $vars{$1} }
else { return "<$1>" }
}xge;
# s{}{} is just another form of s///, and we're operating on $_
# the /x modifier means the regex has extra whitespace and comments
# the /g modifier means "do this globally for each match"
# the /e modifier means the replacement is a block of code to evaluate

print OUTPUT; # $_ is the default thing to print
}

close (FORM);
close (DATA);
close (OUTPUT);

Click to expand...

Jeff,

Thanks for the suggestions - they sparked off two days of great
discussion here (no-one knew about putting code in a regexp with the /e
option). Sorry for the delay in replying...

Ghee

Uri Guttman · Aug 19, 2003

<snip of massive quote>

at least you didn't top post but you should have edited out most/all of
the quote as you didn't comment directly on anything there.

G> Thanks for the suggestions - they sparked off two days of great
G> discussion here (no-one knew about putting code in a regexp with the
G> /e option). Sorry for the delay in replying...

that is putting code in the replacement side of s///. s/// is NOT a
regex but the substitution operator. the left part of it is a regex and
the right part is a double quotish string or code when using the /e
modifier. it helps to understand that properly when discussing it.

uri

Why is this WordPress comments form not submitting?	1	Jan 12, 2020
PHP RSS Feed Aggregator changing to todays date everytime feed is aggregated	1	Jan 10, 2022
Looking for someone to take alook at this code and help	2	Mar 9, 2023
Survey details won't go through using php, ajax, Mysql	3	Oct 25, 2023
how to make a tree with randomly selected html tags from an array in python?	0	Mar 10, 2013
Flushing and multiple pipes	2	Jul 20, 2012
How to keep the script from stopping or hanging	5	Apr 7, 2012
Can this script be used to spam?	3	Nov 24, 2003

you must be able to shorten this...

Ghee

Jeff 'japhy' Pinyan

Ghee

Uri Guttman

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads

you *must* be able to shorten this...

Ghee

Jeff 'japhy' Pinyan

Ghee

Uri Guttman

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads

you must be able to shorten this...