Need RegExp

Discussion in 'Perl Misc' started by Indigo5, Nov 15, 2004.

  1. Indigo5

    Indigo5 Guest

    I need a way of parsing a variable format. Basically the section starts off
    with a Reference keyword and ends with a Comments keyword.

    Reference:
    1. document1
    2 document2
    Comments:

    I was doing this as follows:

    if (/^Reference/ .. /^Comments/) {    # true from "Reference" through "Comments"
        if (/^\d/) {                      # only numbered lines
            chomp;
            s/^\d+\.?\s*//;               # strip the "1. " prefix
            push @references, $_;
        }
    }

    However, I just received a document where Reference 1 wrapped onto another
    line so that the format looks like this:

    Reference:
    1. document 1
    more of document 1 here
    2. document 2
    Comments:

    I cannot change the format of the input I receive, so I have to find a way
    to parse this the way I receive it. Any help would be highly appreciated.
    Thank you.
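    A minimal way to see the failure: the self-contained sketch below (an editor's reconstruction, not code from the thread) runs the flip-flop loop against the wrapped sample through an in-memory filehandle (Perl 5.8 or later). `$input` and `$fh` are illustrative names, and the prefix pattern is loosened to `\d+\.?` on the assumption that the period after the number is optional, as in "2 document2".

```perl
use strict;
use warnings;

# Hypothetical sample input in the wrapped format from the question.
my $input = <<'END';
Reference:
1. document 1
more of document 1 here
2. document 2
Comments:
END

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @references;
while (<$fh>) {
    # The flip-flop is true from the "Reference" line through the "Comments" line.
    if ( /^Reference/ .. /^Comments/ ) {
        # Only lines that start with a digit are kept...
        if (/^\d/) {
            chomp;
            s/^\d+\.?\s*//;    # strip the "1. " prefix
            push @references, $_;
        }
        # ...so "more of document 1 here" is silently dropped.
    }
}
close $fh;

print "$_\n" for @references;
```

    Only the two numbered lines survive; the continuation line never reaches @references, which is exactly the case the question is about.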
    Indigo5, Nov 15, 2004
    #1

  2. Indigo5 wrote:
    > I need a way of parsing a variable format. Basically the section starts off
    > with a Reference keyword and ends with a Comments keyword.



    Have you considered using $/ (the input record separator) instead?

    Maybe $/ = "Comments:\n"; # ??
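    The $/ idea could be fleshed out roughly as follows. This is an editor's sketch, not code from the thread; it assumes the section ends at the first "Comments:" line, that every new entry begins with a digit at the start of a line, and that wrapped lines should be joined with a single space. The sample text and the names `$input`, `$fh` and `$section` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real file may have other text around the section.
my $input = <<'END';
Some header text
Reference:
1. document 1
more of document 1 here
2. document 2
Comments:
Trailing comments
END

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @references;
{
    # With $/ set to "Comments:\n", one readline returns everything
    # up to and including the first "Comments:" line.
    local $/ = "Comments:\n";
    my $record = <$fh>;
    defined $record or die "Empty input\n";
    chomp $record;    # chomp removes the current $/, i.e. the "Comments:\n" marker

    # Keep only the text after the "Reference:" line.
    my ($section) = $record =~ /^Reference:\n(.*)\z/ms
        or die "No Reference: section found\n";

    # Split before every line that starts with a digit; each piece is one entry.
    for my $entry ( split /^(?=\d)/m, $section ) {
        $entry =~ s/^\d+\.?\s*//;    # drop the "1. " prefix
        $entry =~ s/\s*\n\s*/ /g;    # join wrapped lines with a single space
        $entry =~ s/\s+\z//;         # trim trailing whitespace
        push @references, $entry if length $entry;
    }
}
close $fh;

print "$_\n" for @references;
```

    Setting $/ with `local` inside a bare block restores normal line-at-a-time reading afterwards. One caveat: a wrapped line that happens to begin with a digit would be mistaken for a new entry.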


    > Reference:
    > 1. document1
    > 2 document2
    > Comments:
    >
    > I was doing this as follows:
    >
    > if (/^Reference/ .. /^Comments/){
    >
    > if (/^[0-9]/){
    >
    > chomp;
    >
    > $_ =~ s/^[0-9].\s+//g;
    >
    > push @references, $_;
    > }
    > }
    >
    > However, I just received a document where Reference 1 wrapped onto another
    > line so that the format looks like this:
    >
    > Reference:
    > 1. document 1
    > more of document 1 here



    So just tack it onto the end of the previous one then.

    What's the problem?

    if ( /^ +/ ) {               # continuation line
        chomp;
        s/^ +//;                 # drop the indentation before joining
        $references[-1] .= " $_";
    }
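    Folded into the original flip-flop loop, the continuation idea gives something like the following runnable sketch (an editor's reconstruction, not code from the thread). It makes one assumption the snippet above does not: any non-blank line without a leading number counts as a continuation, so the wrapped line need not be indented. The sample input and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample input; note the wrapped line is NOT indented here.
my $input = <<'END';
Reference:
1. document 1
more of document 1 here
2. document 2
Comments:
END

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @references;
while (<$fh>) {
    next unless /^Reference/ .. /^Comments/;    # only look inside the section
    next if /^(?:Reference|Comments):/;          # skip the two marker lines
    chomp;
    if (s/^\d+\.?\s+//) {                        # "1. text" starts a new entry
        push @references, $_;
    }
    elsif ( /\S/ && @references ) {              # anything else continues the last one
        s/^\s+//;
        $references[-1] .= " $_";
    }
}
close $fh;

print "$_\n" for @references;
```

    The `@references` guard avoids a fatal "Modification of non-creatable array value" error if a stray unnumbered line appears before the first entry.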


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Nov 15, 2004
    #2

  3. Indigo5 wrote:
    > I need a way of parsing a variable format. Basically the section starts off
    > with a Reference keyword and ends with a Comments keyword.
    >
    > Reference:
    > 1. document1
    > 2 document2
    > Comments:
    >
    > I was doing this as follows:
    >
    > if (/^Reference/ .. /^Comments/){
    >
    > if (/^[0-9]/){
    >
    > chomp;
    >
    > $_ =~ s/^[0-9].\s+//g;
    >
    > push @references, $_;
    > }
    > }
    >
    > However, I just received a document where Reference 1 wrapped onto another
    > line so that the format looks like this:
    >
    > Reference:
    > 1. document 1
    > more of document 1 here
    > 2. document 2
    > Comments:
    >
    > I cannot change the format of the input I receive, so I have to find a way
    > to parse this the way I receive it. Any help would be highly appreciated.



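    # Skip ahead to the "Reference:" line (also stops cleanly at EOF)
    while ( <FILE> ) { last if /^Reference/ }

    while ( <FILE> ) {
        last if /^Comments/;                 # end of the section
        chomp;
        if ( s/^\d+\.?\s+// ) {              # "1. text" starts a new reference
            push @references, $_;
        }
        elsif ( /\S/ and @references ) {     # other non-blank lines continue it
            s/^\s+//;
            $references[-1] .= " $_";
        }
    }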
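    Since the thread title asks for a regular expression, a single-pattern variant may also be useful: capture the whole section, then let one /msg match pull out each numbered entry together with any lines that follow it, up to the next numbered line or the end of the section. This is an editor's sketch, not code from the thread; it assumes each entry starts with a number at the beginning of a line, that the period after the number is optional, and that the section is delimited by "Reference:" and "Comments:" lines. The sample text is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample input in the wrapped format from the question.
my $text = <<'END';
Reference:
1. document 1
more of document 1 here
2. document 2
Comments:
END

my @references;

# Grab everything between the "Reference:" and "Comments:" lines.
if ( $text =~ /^Reference:\n(.*?)^Comments:/ms ) {
    my $section = $1;

    # Each entry: a number at line start, then text up to the next numbered
    # line (lookahead) or the end of the section.
    while ( $section =~ /^\d+\.?[ \t]+(.*?)(?=^\d+\.?[ \t]|\z)/msg ) {
        ( my $entry = $1 ) =~ s/\s+/ /g;    # fold line breaks into spaces
        $entry =~ s/\s+\z//;                # trim the trailing space
        push @references, $entry;
    }
}

print "$_\n" for @references;
```

    The lookahead (?=^\d+\.?[ \t]|\z) is what lets the lazy (.*?) stop just before the next entry instead of at the end of the first line; /s lets . cross newlines and /m makes ^ match at each line start.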



    John
    --
    use Perl;
    program
    fulfillment
    John W. Krahn, Nov 16, 2004
    #3
