Multiple Line Pattern Match

Discussion in 'Perl Misc' started by Chris L., Apr 9, 2006.

  1. Chris L.

    Chris L. Guest

    Can someone please provide some assistance with a multi-line matching
    problem? I have a datafile that looks like this:

    ***************DATAFILE************************
    START
    foo
    START
    foo
    START
    foo
    bar
    foo
    bar
    foo
    bar
    START


    I am trying to capture the contents between the START and START
    delimiters, but only if there is more than 1 line between them.
    Specifically, I want to capture the entries with 6 lines between
    START and START, but leave out the entries with only 1 line between
    START and START.


    Below is what I have so far. However, it captures everything
    between START and START. Again, I'm trying to catch only the 6-line
    stretches between START and START, not the 1-line stretches...
    ----------------------------------------------------------------------

    open(FH,"foobar.txt")|| die "Cannot open FHandle: $!";
    local $/ = "START\n";
    while ( <FH> )
    {
        s/.*START\n//;
        print;
    }
    close FH;
    ----------------------------------------------------------------------

    Is there a way to specify the amount of lines?
    Thank you very much for your time.
    Chris L.
     
    Chris L., Apr 9, 2006
    #1

  2. Xicheng Jia

    Xicheng Jia Guest

    Chris L. wrote:
    > Can someone please provide some assistance with a multi-line matching
    > problem? I have a datafile that looks like this:
    >
    > ***************DATAFILE************************
    > START
    > foo
    > START
    > foo
    > START
    > foo
    > bar
    > foo
    > bar
    > foo
    > bar
    > START
    >
    >
    > I am trying to capture the contents between the START and START
    > delineators. However, only if there are more than 1 line in between
    > them.
    > Specifically, I want to capture the entries with 6 lines in between
    > START and START--
    > but I want to leave out the entries that are only 1 line between START
    > and START.
    >
    >
    > Below is what I have so far-- however, it captures everything in
    > between START and START. Again, I'm trying to catch only the 6 line
    > stretches between START and START not the 1 line stretches...
    > ----------------------------------------------------------------------
    >
    > open(FH,"foobar.txt")|| die "Cannot open FHandle: $!";
    > local $/ = "START\n";
    > while ( <FH> )
    > {

    > s/.*START\n//;

    this line is useless, because START is already in $/, so $_ does not
    contain the string "START". If you want to remove the last START, which
    is not followed by a newline, then you may want to use:

    s/.*START//;

    > print;
    > }
    > close FH;
    > ----------------------------------------------------------------------
    >

    > Is there a way to specify the amount of lines?

    Just count the number of newlines in $/, like

    my $number_of_lines = tr/\n//;
    print "$_\n\n" if $number_of_lines == 6;

    Xicheng

    > Thank you very much for your time.
    > Chris L.
     
    Xicheng Jia, Apr 9, 2006
    #2
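
For readers following along, here is a minimal, self-contained sketch of the newline-counting idea from this post. It is not the poster's exact code: it reads the sample data from an in-memory string instead of foobar.txt, uses chomp to strip the trailing "START\n", and counts newlines in the chunk itself (a named variable standing in for $_). The names $data and @six_line_chunks are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data from the original post, held in a string (an assumption made
# here so the sketch runs without foobar.txt).
my $data = "START\nfoo\nSTART\nfoo\nSTART\nfoo\nbar\nfoo\nbar\nfoo\nbar\nSTART\n";

my @six_line_chunks;
{
    local $/ = "START\n";    # read one START-terminated record at a time
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    while ( my $chunk = <$fh> ) {
        chomp $chunk;        # chomp strips the current $/, i.e. the trailing "START\n"
        my $number_of_lines = ( $chunk =~ tr/\n// );    # count newlines left in the chunk
        push @six_line_chunks, $chunk if $number_of_lines == 6;
    }
    close $fh;
}

print for @six_line_chunks;
```

Because the separator is removed before counting, each remaining newline corresponds to exactly one data line, so the comparison against 6 means what it says.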

  3. Xicheng Jia <> wrote:
    > Chris L. wrote:


    >> ***************DATAFILE************************
    >> START
    >> foo
    >> START
    >> foo
    >> START
    >> foo
    >> bar
    >> foo
    >> bar
    >> foo
    >> bar
    >> START
    >>
    >>
    >> I am trying to capture the contents between the START and START
    >> delineators. However, only if there are more than 1 line in between
    >> them.



    >> open(FH,"foobar.txt")|| die "Cannot open FHandle: $!";
    >> local $/ = "START\n";
    >> while ( <FH> )
    >> {

    >> s/.*START\n//;
    > this line is useless, because START is already in $/, so $_ does not
    > contain the string "START",



    Yes it does (if the file has the $/ value anywhere in it)..

    When $/="\n" do you get a newline in $_ ?

    Sure you do. Same here.


    > if you want to remove the last START which is not
    > followed by a newline, then you may want to use:
    >
    > s/.*START//;



    What is it that keeps the character after the START from
    being a newline again?

    perl -le 'print "matched" if "START\n" =~ /.*START/'


    > Just count the number of newlines in $/, like
    >
    > my $number_of_lines = tr/\n//;



    That counts the number of newlines in $_, not in $/


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Apr 10, 2006
    #3
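
The two points made in this post can be checked with a small sketch. It assumes a made-up three-line input (not the original poster's file) and shows that a record read with a custom $/ still carries the separator until it is chomped, and that /.*START/ also matches when START is followed by a newline. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Made-up sample input, not the original poster's file.
my $data = "foo\nSTART\nbar\n";

my ( @raw, @chomped );
{
    local $/ = "START\n";    # records end at each "START\n"
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    while ( my $record = <$fh> ) {
        push @raw, $record;             # as returned by readline: separator still attached
        chomp( my $copy = $record );    # chomp removes a trailing $/ if one is present
        push @chomped, $copy;
    }
    close $fh;
}

# /.*START/ does not care what follows START, so it also matches "START\n".
my $also_matches = ( "START\n" =~ /.*START/ ) ? 1 : 0;

printf "raw record ends with separator: %s\n", ( $raw[0] =~ /START\n\z/ ? "yes" : "no" );
printf "chomped record contains START:  %s\n", ( $chomped[0] =~ /START/ ? "yes" : "no" );
printf "/.*START/ matches \"START\\n\":    %s\n", ( $also_matches ? "yes" : "no" );
```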
  4. Xicheng Jia

    Xicheng Jia Guest

    Tad McClellan wrote:
    > Xicheng Jia <> wrote:
    > > Chris L. wrote:

    >
    > >> ***************DATAFILE************************
    > >> START
    > >> foo
    > >> START
    > >> foo
    > >> START
    > >> foo
    > >> bar
    > >> foo
    > >> bar
    > >> foo
    > >> bar
    > >> START
    > >>
    > >>
    > >> I am trying to capture the contents between the START and START
    > >> delineators. However, only if there are more than 1 line in between
    > >> them.

    >
    >
    > >> open(FH,"foobar.txt")|| die "Cannot open FHandle: $!";
    > >> local $/ = "START\n";
    > >> while ( <FH> )
    > >> {

    > >> s/.*START\n//;
    > > this line is useless, because START is already in $/, so $_ does not
    > > contain the string "START",


    > Yes it does (if the file has the $/ value anywhere in it)..
    >
    > When $/="\n" do you get a newline in $_ ?
    >
    > Sure you do. Same here.


    Yeah, you are right. I always use the -l option on my command line, which
    actually chomps off $/, so that's why I thought there was no $/ in
    $_... Anyway, the s/// expression there is about the same as chomp. :)

    >
    > > if you want to remove the last START which is not
    > > followed by a newline, then you may want to use:
    > >
    > > s/.*START//;

    >
    >
    > What is it that keeps the character after the START from
    > being a newline again?
    >
    > perl -le 'print "matched" if "START\n" =~ /.*START/'
    >
    >
    > > Just count the number of newlines in $/, like
    > >
    > > my $number_of_lines = tr/\n//;

    >
    >

    > That counts the number of newlines in $_, not in $/

    my typo, and thanks for the correction.. :)

    Regards,
    Xicheng

    >
    > --
    > Tad McClellan SGML consulting
    > Perl programming
    > Fort Worth, Texas
     
    Xicheng Jia, Apr 10, 2006
    #4
  5. Anno Siegel

    Anno Siegel Guest

    Chris L. <> wrote in comp.lang.perl.misc:
    > Can someone please provide some assistance with a multi-line matching
    > problem? I have a datafile that looks like this:
    >
    > ***************DATAFILE************************
    > START
    > foo
    > START
    > foo
    > START
    > foo
    > bar
    > foo
    > bar
    > foo
    > bar
    > START
    >
    >
    > I am trying to capture the contents between the START and START
    > delineators. However, only if there are more than 1 line in between
    > them.
    > Specifically, I want to capture the entries with 6 lines in between
    > START and START--
    > but I want to leave out the entries that are only 1 line between START
    > and START.


    my @big_chunks = do {
        local $/ = "START\n";
        grep tr/\n// > 2, <DATA>;
    };

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Apr 10, 2006
    #5
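
A note on the threshold in the grep approach: the records are not chomped, so each one still ends in its "START\n" line. A one-line entry therefore contains 2 newlines, and "more than 2" selects everything longer. The sketch below wraps the same idea in a hypothetical helper, big_chunks, which takes the text and the threshold as arguments and reads from an in-memory string rather than DATA; both the helper name and its parameters are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the START-terminated records of $text
# that contain more than $threshold newlines. Records are not chomped,
# so the trailing "START\n" counts as one of those newlines.
sub big_chunks {
    my ( $text, $threshold ) = @_;
    local $/ = "START\n";
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @chunks = grep { ( $_ =~ tr/\n// ) > $threshold } <$fh>;
    close $fh;
    return @chunks;
}

# Sample data from the original post.
my $data = "START\nfoo\nSTART\nfoo\nSTART\nfoo\nbar\nfoo\nbar\nfoo\nbar\nSTART\n";

my @found = big_chunks( $data, 2 );
print scalar(@found), " chunk(s) with more than one data line\n";
print for @found;
```

If the trailing START line is unwanted in the results, chomping each record before counting and comparing against 1 instead of 2 should give the same selection.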
  6. Chris L. <> wrote:

    > Specifically, I want to capture the entries with 6 lines in between
    > START and START--
    > but I want to leave out the entries that are only 1 line between START
    > and START.



    Why not capture all the chunks, and then filter them based
    on how many lines they contain?


    ----------------------
    #!/usr/bin/perl
    use warnings;
    use strict;

    local $/ = "START\n";

    while ( <DATA> ) {
        chomp;
        my @lines = split /\n/;
        next unless @lines == 6;

        print "found a 6-line chunk\n";
    }

    __DATA__
    START
    foo
    START
    foo
    START
    foo
    bar
    foo
    bar
    foo
    bar
    START
    ----------------------


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Apr 10, 2006
    #6
