C
ccc31807
During the discussion of the 9-11 mosque in NYC, several commentators
mentioned Milestones
by Sayed Qutb. I decided to read it to see that the fuss was about,
and ended up with an ASCII text copy generated from a PDF original.
I could have printed the text directly, but it was pretty mangled, and
after attempting and failing to reformat the document using vi, I
decided to write a simple Perl script to reformat it. I wanted to do
several things, join paragraphs together (every line in the file was
terminated by a "\n"), separate paragraphs by a blank line (block
style), remove repeated periods (dots), remove form feeds (which
marked pages in the original), etc.
I first attempted to munge the file in place, like this:
#FIRST ATTEMPT
open MS, '<', $file;
open OUT, '>', $out;
while (<MS>)
{
#do stuff
print OUT;
}
close MS;
close OUT;
It mostly worked, but I couldn't fine tune it. I then attempted to
munge two lines together, like this:
#SECOND ATTEMPT
open MS, '<', $file;
open OUT, '>', $out;
$line1 = <MS>;
while (<MS>)
{
$line2 = $_;
#do stuff
print OUT;
$line 2 = $line1;
}
close MS;
close OUT;
This worked a little better, but it wasn't perfect. I then tried this
and got perfect formatting:
#THIRD ATTEMPT
{
local $/ = undef;
open MS, '<', $file;
$document = <MS>;
close MS;
}
#series of transformations like this
$document =~ s/\r//;
open OUT, '>', $out;
print OUT $document;
close OUT;
All of the work I have done in the past has munged the lines one by
one, as in the first example. Occasionally, I have had to use the
second style (e.g., where the formatting of each line depends on the
content of the preceding line.) I've never used the third style at
all.
I liked the third way a lot. It seemed quick, easy, and worked
perfectly. I was actually able to open the resulting document in Word,
fancify it a little, and print a nice finished copy. However, I can't
think of any actual uses of the third style in my day to day work.
My question is this: Is the third attempt, slurping the entire
document into memory and transforming the text by regexs, very common,
or is it considered a last resort when nothing else would work?
CC.
mentioned Milestones
by Sayed Qutb. I decided to read it to see that the fuss was about,
and ended up with an ASCII text copy generated from a PDF original.
I could have printed the text directly, but it was pretty mangled, and
after attempting and failing to reformat the document using vi, I
decided to write a simple Perl script to reformat it. I wanted to do
several things, join paragraphs together (every line in the file was
terminated by a "\n"), separate paragraphs by a blank line (block
style), remove repeated periods (dots), remove form feeds (which
marked pages in the original), etc.
I first attempted to munge the file in place, like this:
#FIRST ATTEMPT
open MS, '<', $file;
open OUT, '>', $out;
while (<MS>)
{
#do stuff
print OUT;
}
close MS;
close OUT;
It mostly worked, but I couldn't fine tune it. I then attempted to
munge two lines together, like this:
#SECOND ATTEMPT
open MS, '<', $file;
open OUT, '>', $out;
$line1 = <MS>;
while (<MS>)
{
$line2 = $_;
#do stuff
print OUT;
$line 2 = $line1;
}
close MS;
close OUT;
This worked a little better, but it wasn't perfect. I then tried this
and got perfect formatting:
#THIRD ATTEMPT
{
local $/ = undef;
open MS, '<', $file;
$document = <MS>;
close MS;
}
#series of transformations like this
$document =~ s/\r//;
open OUT, '>', $out;
print OUT $document;
close OUT;
All of the work I have done in the past has munged the lines one by
one, as in the first example. Occasionally, I have had to use the
second style (e.g., where the formatting of each line depends on the
content of the preceding line.) I've never used the third style at
all.
I liked the third way a lot. It seemed quick, easy, and worked
perfectly. I was actually able to open the resulting document in Word,
fancify it a little, and print a nice finished copy. However, I can't
think of any actual uses of the third style in my day to day work.
My question is this: Is the third attempt, slurping the entire
document into memory and transforming the text by regexs, very common,
or is it considered a last resort when nothing else would work?
CC.