Multiple Line Pattern Match

Chris L.

Can someone please provide some assistance with a multi-line matching
problem? I have a datafile that looks like this:

***************DATAFILE************************
START
foo
START
foo
START
foo
bar
foo
bar
foo
bar
START


I am trying to capture the contents between one START delimiter and the
next, but only if there is more than 1 line in between them.
Specifically, I want to capture the entries with 6 lines in between
START and START, but I want to leave out the entries that have only
1 line between START and START.


Below is what I have so far. However, it captures everything in
between START and START. Again, I'm trying to catch only the 6-line
stretches between START and START, not the 1-line stretches...
----------------------------------------------------------------------

open(FH,"foobar.txt")|| die "Cannot open FHandle: $!";
local $/ = "START\n";
while ( <FH> )
{
    s/.*START\n//;
    print;
}
close FH;
----------------------------------------------------------------------

Is there a way to specify the amount of lines?
Thank you very much for your time.
Chris L.
 
Xicheng Jia

Chris said:
----------------------------------------------------------------------

open(FH,"foobar.txt")|| die "Cannot open FHandle: $!";
local $/ = "START\n";
while ( <FH> )
{
> s/.*START\n//;
This line is useless, because START is already in $/, so $_ does not
contain the string "START". If you want to remove the last START, which
is not followed by a newline, then you may want to use:

s/.*START//;
print;
}
close FH;
----------------------------------------------------------------------
> Is there a way to specify the amount of lines?

Just count the number of newlines in $/, like

my $number_of_lines = tr/\n//;
print "$_\n\n" if $number_of_lines == 6;
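
For reference, here is a minimal sketch of the counting approach as a complete loop. It assumes the sample data from the question, read from an in-memory filehandle rather than foobar.txt, and the helper name chunks_with_lines is made up for illustration. The chomp matters: it strips the trailing "START\n" (the current value of $/), so the count is exactly the number of lines between the delimiters. Without it, the six-line chunk would count 7 newlines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the chunks that have exactly $want lines
# between two START delimiters.
sub chunks_with_lines {
    my ($fh, $want) = @_;
    local $/ = "START\n";                    # read one START-terminated record at a time
    my @found;
    while ( my $chunk = <$fh> ) {
        chomp $chunk;                        # strips the trailing "START\n" (value of $/)
        my $count = ( $chunk =~ tr/\n// );   # counts newlines, leaves $chunk unchanged
        push @found, $chunk if $count == $want;
    }
    return @found;
}

# Sample data from the question, held in a string for the demo.
my $data = "START\nfoo\nSTART\nfoo\nSTART\n"
         . "foo\nbar\nfoo\nbar\nfoo\nbar\nSTART\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my @six = chunks_with_lines($fh, 6);
close $fh;

print "--- chunk ---\n$_" for @six;
```

Passing the wanted line count as a parameter keeps the same loop usable for other chunk sizes.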

Xicheng
 
Tad McClellan

Xicheng Jia said:
Chris L. wrote:

> s/.*START\n//;
This line is useless, because START is already in $/, so $_ does not
contain the string "START",


Yes it does (if the file has the $/ value anywhere in it).

When $/ = "\n", do you get a newline in $_?

Sure you do. Same here.
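
A small sketch that makes this visible. The sample string and the show helper are assumptions for the demo, not part of the thread. Each record returned by readline still ends with the current value of $/, and chomp removes exactly that string.

```perl
use strict;
use warnings;

# Hypothetical helper: render newlines as the two characters \n.
sub show {
    my ($text) = @_;
    ( my $visible = $text ) =~ s/\n/\\n/g;
    return $visible;
}

my $data = "START\nfoo\nSTART\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my ( @raw, @chomped );
{
    local $/ = "START\n";          # record separator is the whole delimiter line
    while ( my $record = <$fh> ) {
        push @raw, $record;        # still ends in "START\n"
        chomp $record;             # removes the trailing value of $/
        push @chomped, $record;
    }
}
close $fh;

print "raw:     '", show($_), "'\n" for @raw;
print "chomped: '", show($_), "'\n" for @chomped;
```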

if you want to remove the last START which is not
followed by a newline, then you may want to use:

s/.*START//;


What is it that keeps the character after the START from
being a newline again?

perl -le 'print "matched" if "START\n" =~ /.*START/'
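
To spell out what the one-liner shows: without the /s modifier, "." does not match a newline, so /.*START/ matches the START on the final line whether or not a newline follows it, and the newline after START is left behind. A minimal sketch comparing the two substitutions on one record (the sample record is an assumption modelled on the question's data):

```perl
use strict;
use warnings;

my $record = "foo\nbar\nSTART\n";

# Without \n in the pattern, the newline after START is left behind.
( my $without_nl = $record ) =~ s/.*START//;

# With \n in the pattern, the whole delimiter line is removed.
( my $with_nl = $record ) =~ s/.*START\n//;

printf "without \\n: %d newlines left\n", ( $without_nl =~ tr/\n// );
printf "with \\n:    %d newlines left\n", ( $with_nl    =~ tr/\n// );
```

The leftover newline throws off any later line count by one, so which substitution is used matters when counting with tr/\n//.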

Just count the number of newlines in $/, like

my $number_of_lines = tr/\n//;


That counts the number of newlines in $_, not in $/.
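
A short sketch of the difference, assuming a sample chunk from the question's data. A bare tr/\n// operates on $_, and counting in any other variable needs an explicit =~ binding. The variable names are made up for the demo, and $separator stands in for the value $/ has in this thread.

```perl
use strict;
use warnings;

my $separator = "START\n";                       # the value $/ has in this thread
my $chunk     = "foo\nbar\nfoo\nbar\nfoo\nbar\n";

my ( $in_topic, $in_separator );
for ($chunk) {                                   # aliases $_ to $chunk
    $in_topic     = tr/\n//;                     # bare tr counts in $_
    $in_separator = ( $separator =~ tr/\n// );   # explicit binding counts elsewhere
}

print "newlines in \$_: $in_topic\n";
print "newlines in the separator: $in_separator\n";
```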
 
Xicheng Jia

Tad McClellan said:
> Yes it does (if the file has the $/ value anywhere in it).
>
> When $/ = "\n", do you get a newline in $_?
>
> Sure you do. Same here.

Yeah, you are right. I always use the -l option on my command line, which
actually chomps off $/, so that's why I thought there was no $/ in
$_. Anyway, the s/// expression there is about the same as chomp. :)

> What is it that keeps the character after the START from
> being a newline again?
>
> perl -le 'print "matched" if "START\n" =~ /.*START/'

> That counts the number of newlines in $_, not in $/

My typo, and thanks for the correction. :)

Regards,
Xicheng
 
Anno Siegel

Chris L. said:
Specifically, I want to capture the entries with 6 lines in between
START and START--
but I want to leave out the entries that are only 1 line between START
and START.

my @big_chunks = do {
    local $/ = "START\n";
    grep tr/\n// > 2, <DATA>;
};
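
The snippet above reads from DATA. A self-contained sketch of the same idea follows, with the question's data in an in-memory filehandle (an assumption for the demo). The chunks are not chomped here, so each one still ends in "START\n". A one-line chunk therefore has 2 newlines and the six-line chunk has 7, which is why the test is > 2.

```perl
use strict;
use warnings;

my $data = "START\nfoo\nSTART\nfoo\nSTART\n"
         . "foo\nbar\nfoo\nbar\nfoo\nbar\nSTART\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @big_chunks = do {
    local $/ = "START\n";              # one record per START-terminated chunk
    grep { tr/\n// > 2 } <$fh>;        # keep records with more than 2 newlines
};
close $fh;

# Each kept chunk still carries its trailing "START\n".
print scalar(@big_chunks), " big chunk(s)\n";
print for @big_chunks;
```

Adding a chomp on each kept chunk would drop the trailing delimiter if only the lines in between are wanted.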

Anno
 
Tad McClellan

Chris L. said:
Specifically, I want to capture the entries with 6 lines in between
START and START--
but I want to leave out the entries that are only 1 line between START
and START.


Why not capture all the chunks, and then filter them based
on how many lines they contain?


----------------------
#!/usr/bin/perl
use warnings;
use strict;

local $/ = "START\n";

while ( <DATA> ) {
    chomp;
    my @lines = split /\n/;
    next unless @lines == 6;

    print "found a 6-line chunk\n";
}

__DATA__
START
foo
START
foo
START
foo
bar
foo
bar
foo
bar
START
 
