cyborg
When I was starting to learn regexes in Perl (2 days ago), I picked up
some books and some websites and read a bunch. When I thought I was
ready to go, I realized none of those sources taught me how to actually
write a Perl program from start to end that would open the file I
wanted to parse and save the parsing results to a second file. That was
a bummer.
Bla bla bla etc etc etc, all that boring stuff everyone hates to read
about other people's lives, bla bla bla.
Okay, I have finally created a template for my regexes that parses a file,
saves the results to another file, and makes the matches work OVER MULTIPLE
LINES. I know this is far from exciting for you Perl hackers, but do
realize that the books I've read don't teach this. (I've got ADHD, so if
they do and I'm just a poor reader, never mind that statement.)
Also, please understand that when I say "I have created" I mean "I,
with the help of loads of other people's work and some people's help"
(because let's face it, it's not that big a file to need help from
loads of people). Of course I don't want credit for this; what I do
want is help. Everything works, but in some parts I don't understand why.
Also, I know there are probably better ways to go about some things;
for example, I think the "or die" idiom is the usual way to make sure
a file was actually opened.
There are also some comments to help beginners understand what each
part is doing and how it contributes to the program (actually they're
there to help me, a beginner too, remember what each line does).
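About the "or die" idea mentioned above: here is a minimal sketch, assuming a file name that doesn't exist (both `no-such-file-hopefully.txt` and the helper `open_for_reading` are made up for the demo). `or` binds more loosely than `||`, so `open(...) or die ...` tests the result of `open` itself; writing `open my $fh, '<', $path || die ...` without parentheses would apply `||` to `$path` instead and never catch a failed open.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading or die with the
# reason the operating system gave (stored in $!).
sub open_for_reading {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    return $fh;
}

# Made-up file name that should not exist in the current directory.
my $missing = "no-such-file-hopefully.txt";

# eval { ... } catches the die so this demo can print the message
# and carry on; in a real script you would let die end the program.
my $fh = eval { open_for_reading($missing) };
print "open failed: $@" unless defined $fh;
```

Unlike a check that only prints a warning, `die` really stops the program, so nothing after it runs with a missing or unreadable file.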
So consider this thread as if I were asking you "how do I match over
multiple lines? could you provide full Perl code?" and you replied
with some code.
Here it is:
#############################################
#*                                         *#
#     TEMPLATE FOR PERL REGEX PROGRAMS      #
#                                           #
# THIS TEMPLATE DOES THE FOLLOWING:         #
#                                           #
#=> reads input file and writes output file #
#=> sets $/ to undef so the whole file is   #
#   read as one string and you can match    #
#   over multiple lines                     #
#                                           #
# > file names can be given on the command  #
#   line; they default to r.txt and r2.txt  #
#                                           #
#*                                         *#
#############################################
#______________________all variables must be declared,
#______________________perl warns us about anything wrong
use strict;
use warnings;
#______________________these are the filenames
# taken from the command line if given, else the defaults
# (// is "defined-or", available since Perl 5.10)
my $source = $ARGV[0] // "r.txt";
my $dest   = $ARGV[1] // "r2.txt";
#______________________open input and output files
# "or die" stops the program with the reason in $! if a file
# can't be opened, which covers a missing file as well as
# missing permissions. Lexical filehandles ($in, $out) and the
# three-argument form of open are safer than SOURCE/DEST.
open(my $in,  '<', $source) or die "Can't read $source: $!\n";
open(my $out, '>', $dest)   or die "Can't write $dest: $!\n";
#______________________read the whole file at once
# $/ is the input record separator, "\n" by default, so <$in>
# normally returns one line per read. "local $/;" sets it to
# undef inside this block only, so <$in> returns the whole file
# as one string ("slurp mode"). The "\n" characters stay in the
# string; they just no longer split it. Without this, the regex
# sees one line at a time and can never match across lines.
my $text = do { local $/; <$in> };
close $in;
#______________________write every <...> match
# /g in scalar context resumes where the last match ended
#    (pos($text)), so "while" visits every match and stops when
#    none is left. "if" tries only once: first match only.
#    Without /g every try starts again at the beginning and finds
#    the same first match, so the "while" never ends.
# /s makes . match "\n" too, so a <...> can span lines.
# .*? is non-greedy: it stops at the first ">" it reaches
#    instead of running on to the last ">" in the file.
while ($text =~ m/<(.*?)>/gs) {
    print $out "----$1----\n";
}
#______________________close the output file
# close can fail too (e.g. disk full), so check it
close $out or die "Can't close $dest: $!\n";
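Why `while` versus `if` and the `/g` flag matter so much: a `m//g` match in scalar context (the condition of `if` or `while`) finds one match and remembers where it stopped in `pos()`; the next `/g` match on the same string resumes from there. A rough sketch on a made-up three-tag string, comparing `if`, `while`, and `while` without `/g` (with a counter added so the demo doesn't really loop forever), plus `/g` in list context:

```perl
use strict;
use warnings;

# Made-up input with three tags.
my $text = "<a> <b> <c>";

# "if" tries the match once: only the first tag is captured.
# /g in scalar context also stores the end of that match in pos().
my @with_if;
if ($text =~ m/<(.*?)>/gs) {
    push @with_if, $1;
}
print "pos after if: ", pos($text), "\n";   # 3, just past "<a>"
pos($text) = undef;    # reset so the next loop starts at the beginning

# "while" repeats the match; each try resumes at pos($text), so the
# loop visits every tag and ends when no match is left (pos resets).
my @with_while;
while ($text =~ m/<(.*?)>/gs) {
    push @with_while, $1;
}

# Without /g, every try starts at position 0 and finds "<a>" again,
# so the condition is always true. The "last" exists only to stop
# this demo after 5 rounds.
my @without_g;
while ($text =~ m/<(.*?)>/s) {
    push @without_g, $1;
    last if @without_g == 5;
}

# /g in list context returns every capture at once, no loop needed.
my @all = $text =~ m/<(.*?)>/gs;

print "if:        @with_if\n";       # a
print "while:     @with_while\n";    # a b c
print "without g: @without_g\n";     # a a a a a
print "list:      @all\n";           # a b c
```

The list-context form is handy when you just want all the captures in an array and don't need to act on each match as it is found.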
Just save r.txt with this to test it:
tag"><1b>word<2/div>
<3div class="okay"><4i>o.
<5/i> notgood,
akdjsf jkdmhf djaf =¨?#$
<6flunk>yes but<7
this is
.. a
.. multiline
.. string, the kind of which my
.. template matches
, yes, > maybe we can
but we should <8be> careful
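To check the multiline behaviour without touching r.txt, here is a rough sketch that reads the sample above from an in-memory filehandle (`open` on a reference to a string) and runs the same regex twice: once with the default `$/ = "\n"` and once in slurp mode. The helper name `collect_matches` is made up for this example. By my count the sample has seven `<...>` pairs that sit on a single line plus one (`<7 ... >`) that spans several lines, so slurp mode should find one more match.

```perl
use strict;
use warnings;

# The sample input from the post, kept in memory instead of r.txt.
my $sample = <<'END';
tag"><1b>word<2/div>
<3div class="okay"><4i>o.
<5/i> notgood,
akdjsf jkdmhf djaf =¨?#$
<6flunk>yes but<7
this is
.. a
.. multiline
.. string, the kind of which my
.. template matches
, yes, > maybe we can
but we should <8be> careful
END

# Hypothetical helper: collect every <...> capture, reading the data
# line by line ($/ = "\n") or all at once (slurp mode, $/ = undef).
sub collect_matches {
    my ($text, $slurp) = @_;
    # An in-memory filehandle reads a string as if it were a file.
    open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
    local $/ = $slurp ? undef : "\n";
    my @found;
    while (my $chunk = <$fh>) {
        while ($chunk =~ m/<(.*?)>/gs) {
            push @found, $1;
        }
    }
    close $fh;
    return @found;
}

my @by_line = collect_matches($sample, 0);
my @slurped = collect_matches($sample, 1);

printf "line by line: %d matches\n", scalar @by_line;   # 7
printf "slurp mode:   %d matches\n", scalar @slurped;   # 8
```

The extra match in slurp mode is the one starting at `<7`, which contains the newlines up to the `>` in ", yes, >".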
Any improvements will be appreciated.