Parsing multi-line text

Discussion in 'Perl Misc' started by keith@bytebrothers.co.uk, Feb 18, 2008.

  1. Guest

    Hi all,

    I have a data file structured something like this:

    ------------------8<-----------------------
    Chunk 01
    NAME: "Alice"
    Description: "Some other string"
    Age: 37
    Chunk 02
    NAME: "Bob"
    Description: "Some other string"
    Age: 28
    Chunk 03
    FIRST: "Carol"
    Description: "Some other string"
    Age: 32
    Chunk 04
    FIRST: "Dave"
    Description: "Some other string"
    Age: 22
    ------------------8<-----------------------

    and I want to extract from it to produce output something like this:

    ------------------8<-----------------------
    01 NAME: Alice -> 37
    02 NAME: Bob -> 28
    03 NAME: Carol -> 32
    04 NAME: Dave -> 22
    ------------------8<-----------------------

    I've read and re-read the section in perlfaq6 (no, really, I have!)
    about multi-line matching, but I can't see how to adapt what is there
    to this.

    Can someone please point me in the right direction?
    Thx!
     
    , Feb 18, 2008
    #1

  2. wrote:
    > I have a data file structured something like this:
    >
    > ------------------8<-----------------------
    > Chunk 01
    > NAME: "Alice"
    > Description: "Some other string"
    > Age: 37
    > Chunk 02
    > NAME: "Bob"
    > Description: "Some other string"
    > Age: 28
    > Chunk 03
    > FIRST: "Carol"
    > Description: "Some other string"
    > Age: 32
    > Chunk 04
    > FIRST: "Dave"
    > Description: "Some other string"
    > Age: 22
    > ------------------8<-----------------------
    >
    > and I want to extract from it to produce output something like this:
    >
    > ------------------8<-----------------------
    > 01 NAME: Alice -> 37
    > 02 NAME: Bob -> 28
    > 03 NAME: Carol -> 32
    > 04 NAME: Dave -> 22
    > ------------------8<-----------------------


    local $/ = 'Chunk';    # read one "Chunk ..." record at a time
    while (<>) {
        if ( /(\d+).+[A-Z]+:\s+"([^"]*)".+Age:\s+(\d+)/s ) {
            printf "%02d NAME: %-10s -> %d\n", $1, $2, $3;
        }
    }
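To see why this loop works, it helps to look at the records Perl hands to the loop body when $/ is 'Chunk'. A minimal sketch, assuming a shortened two-chunk copy of the OP's data and a hypothetical helper, read_records, that reads from an in-memory filehandle instead of <>:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened copy of the OP's data (two chunks only).
my $data = <<'EOS';
Chunk 01
NAME: "Alice"
Description: "Some other string"
Age: 37
Chunk 02
NAME: "Bob"
Description: "Some other string"
Age: 28
EOS

# read_records is a hypothetical helper: it splits a string into records
# the same way <$fh> does with the given input record separator.
sub read_records {
    my ($text, $sep) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = $sep;
    my @records = <$fh>;    # list context: all records at once
    close $fh;
    return @records;
}

my @records = read_records($data, 'Chunk');
for my $i (0 .. $#records) {
    (my $shown = $records[$i]) =~ s/\n/\\n/g;    # make newlines visible
    print "record $i: [$shown]\n";
}
```

The first record is just the word Chunk, which contains no digits, so the if in the loop skips it. Each record after that starts with the chunk number, and all but the last end with the Chunk belonging to the next record, which the pattern simply ignores.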

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Feb 18, 2008
    #2

  3. Guest

    On 18 Feb, 11:18, Gunnar Hjalmarsson <> wrote:
    > wrote:
    > > I have a data file structured something like this:

    >
    > > ------------------8<-----------------------
    > > Chunk 01
    > > NAME: "Alice"
    > > Description: "Some other string"
    > > Age: 37
    > > ------------------8<-----------------------

    >
    > > and I want to extract from it to produce output something like this:

    >
    > > ------------------8<-----------------------
    > > 01 NAME: Alice -> 37
    > > ------------------8<-----------------------

    >
    > local $/ = 'Chunk';
    > while (<>) {
    > if ( /(\d+).+[A-Z]+:\s+"([^"]*)".+Age:\s+(\d+)/s ) {
    > printf "%02d NAME: %-10s -> %d\n", $1, $2, $3;
    > }
    > }


    Gotta love this place - thanks!

    Now, let's see if I can decipher (no point in asking if I don't learn
    from the answer)...

    You make the text 'Chunk' the record delimiter. Then inside each
    record you look for digits (store in $1). Skip anything followed by
    uppercase text followed by colon followed by space followed by double-
    quote. Now grab everything up to next double quote (store in $2).
    Skip double-quote, then anything then the text 'Age:' then spaces,
    then grab digits (store in $3), and we're done.

    Is that close?!
     
    , Feb 18, 2008
    #3
  4. wrote:
    > On 18 Feb, 11:18, Gunnar Hjalmarsson <> wrote:
    >>
    >> local $/ = 'Chunk';
    >> while (<>) {
    >> if ( /(\d+).+[A-Z]+:\s+"([^"]*)".+Age:\s+(\d+)/s ) {
    >> printf "%02d NAME: %-10s -> %d\n", $1, $2, $3;
    >> }
    >> }

    >
    > Gotta love this place - thanks!
    >
    > Now, let's see if I can decipher (no point in asking if I don't learn
    > from the answer)...
    >
    > You make the text 'Chunk' the record delimiter. Then inside each
    > record you look for digits (store in $1). Skip anything followed by
    > uppercase text followed by colon followed by space followed by double-
    > quote. Now grab everything up to next double quote (store in $2).
    > Skip double-quote, then anything then the text 'Age:' then spaces,
    > then grab digits (store in $3), and we're done.
    >
    > Is that close?!


    Yep, that's about it.
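Spelled out with the /x modifier, the pattern reads the same way the OP describes it. A minimal sketch; the single record below is a hand-built copy of what the loop would see for Carol, not something read from a file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Gunnar's pattern, same meaning, but written with /x so that each piece
# can carry a comment. The /s modifier is kept.
my $chunk_re = qr{
    (\d+)            # capture 1: the chunk number, e.g. 01
    .+               # skip anything, newlines included (because of /s)
    [A-Z]+ : \s+     # an all-uppercase key such as NAME or FIRST, a colon, spaces
    " ( [^"]* ) "    # capture 2: everything between the double quotes
    .+               # skip anything again
    Age : \s+        # the literal text Age: followed by whitespace
    (\d+)            # capture 3: the age
}xs;

# One record as the while (<>) loop would see it with $/ = 'Chunk'.
my $record = qq{ 03\nFIRST: "Carol"\nDescription: "Some other string"\nAge: 32\nChunk};

if ($record =~ $chunk_re) {
    printf "%02d NAME: %-10s -> %d\n", $1, $2, $3;
}
```

Note that "Description:" cannot satisfy [A-Z]+: because the letter right before its colon is lowercase, so the quoted name is the only candidate for capture 2.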

    Since each chunk spans multiple lines, the /s modifier is important
    (it makes . match newlines as well).
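A small sketch of what /s changes: the same pattern applied to one multi-line record (hand-built here), once without and once with the modifier:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A single multi-line record, as produced by $/ = 'Chunk'.
my $record = qq{ 01\nNAME: "Alice"\nDescription: "Some other string"\nAge: 37\n};

# Without /s, '.' stops at "\n", so '.+' can never get from the chunk
# number on the first line to the NAME line below it.
my $without_s = $record =~ /(\d+).+[A-Z]+:\s+"([^"]*)".+Age:\s+(\d+)/  ? 1 : 0;

# With /s, '.' also matches "\n", so the pattern can span the whole record.
my $with_s    = $record =~ /(\d+).+[A-Z]+:\s+"([^"]*)".+Age:\s+(\d+)/s ? 1 : 0;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```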

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Feb 18, 2008
    #4
  5. On 2008-02-18 10:42, <> wrote:
    > I have a data file structured something like this:
    >
    > ------------------8<-----------------------
    > Chunk 01
    > NAME: "Alice"
    > Description: "Some other string"
    > Age: 37
    > Chunk 02
    > NAME: "Bob"
    > Description: "Some other string"
    > Age: 28
    > Chunk 03
    > FIRST: "Carol"
    > Description: "Some other string"
    > Age: 32
    > Chunk 04
    > FIRST: "Dave"
    > Description: "Some other string"
    > Age: 22
    > ------------------8<-----------------------
    >
    > and I want to extract from it to produce output something like this:
    >
    > ------------------8<-----------------------
    > 01 NAME: Alice -> 37
    > 02 NAME: Bob -> 28
    > 03 NAME: Carol -> 32
    > 04 NAME: Dave -> 22
    > ------------------8<-----------------------


    Sure about that? In the input you have sometimes "FIRST" and sometimes
    "NAME", but in the output it is always NAME. Assuming this is
    intentional:


    #!/usr/bin/perl
    use strict;
    use warnings;

    my $s = <<EOS;
    Chunk 01
    NAME: "Alice"
    Description: "Some other string"
    Age: 37
    Chunk 02
    NAME: "Bob"
    Description: "Some other string"
    Age: 28
    Chunk 03
    FIRST: "Carol"
    Description: "Some other string"
    Age: 32
    Chunk 04
    FIRST: "Dave"
    Description: "Some other string"
    Age: 22
    EOS

    # \s* (rather than \s+) before each field name, so that both indented
    # and unindented field lines are accepted.
    while ($s =~ m{
            ^Chunk \s (\d+) \n
            \s*(NAME|FIRST): \s "(.*?)" \n
            \s*Description: \s "(.*?)" \n
            \s*Age: \s (\d+) \n
        }xmg
    ) {
        print "$1 NAME: $3 -> $5\n";
    }

    hp
     
    Peter J. Holzer, Mar 1, 2008
    #5
  6. On 2008-02-18 11:18, Gunnar Hjalmarsson <> wrote:
    > wrote:
    >> I have a data file structured something like this:
    >>
    >> ------------------8<-----------------------
    >> Chunk 01
    >> NAME: "Alice"
    >> Description: "Some other string"
    >> Age: 37
    >> Chunk 02
    >> NAME: "Bob"
    >> Description: "Some other string"


    change this line to

    Description: "Some Chunky string"

    >> Age: 28

    ....
    >> ------------------8<-----------------------

    >
    > local $/ = 'Chunk';
    > while (<>) {
    > if ( /(\d+).+[A-Z]+:\s+"([^"]*)".+Age:\s+(\d+)/s ) {
    > printf "%02d NAME: %-10s -> %d\n", $1, $2, $3;
    > }
    > }


    and then run this script again.
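The effect can be reproduced with a small sketch. It wraps Gunnar's loop in a hypothetical helper, run_gunnar, that reads from an in-memory string instead of <>, and uses a shortened copy of the OP's data with Bob's description changed as suggested. Under those assumptions the Chunk inside "Chunky" splits Bob's record in two, neither half matches the pattern, and Bob silently drops out of the output:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Gunnar's loop, unchanged except that it reads from an in-memory string
# and collects the output lines instead of printing them.
sub run_gunnar {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = 'Chunk';
    my @lines;
    while (<$fh>) {
        if ( /(\d+).+[A-Z]+:\s+"([^"]*)".+Age:\s+(\d+)/s ) {
            push @lines, sprintf "%02d NAME: %-10s -> %d", $1, $2, $3;
        }
    }
    close $fh;
    return @lines;
}

# The OP's data (three chunks) with Bob's description changed.
my $data = <<'EOS';
Chunk 01
NAME: "Alice"
Description: "Some other string"
Age: 37
Chunk 02
NAME: "Bob"
Description: "Some Chunky string"
Age: 28
Chunk 03
FIRST: "Carol"
Description: "Some other string"
Age: 32
EOS

print "$_\n" for run_gunnar($data);
```

Bob's first half has no Age: line, and the second half ("y string" ... "Age: 28") has no uppercase key before the quoted text, so no error or warning is produced; the record just disappears.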

    hp
     
    Peter J. Holzer, Mar 1, 2008
    #6
  7. Peter J. Holzer wrote:
    > On 2008-02-18 11:18, Gunnar Hjalmarsson <> wrote:
    >> wrote:
    >>> I have a data file structured something like this:
    >>>
    >>> ------------------8<-----------------------
    >>> Chunk 01
    >>> NAME: "Alice"
    >>> Description: "Some other string"
    >>> Age: 37
    >>> Chunk 02
    >>> NAME: "Bob"
    >>> Description: "Some other string"

    >
    > change this line to
    >
    > Description: "Some Chunky string"
    >
    >>> Age: 28

    > ...
    >>> ------------------8<-----------------------

    >> local $/ = 'Chunk';
    >> while (<>) {
    >> if ( /(\d+).+[A-Z]+:\s+"([^"]*)".+Age:\s+(\d+)/s ) {
    >> printf "%02d NAME: %-10s -> %d\n", $1, $2, $3;
    >> }
    >> }

    >
    > and then run this script again.


    Well, what's the likelihood that that would happen? At least the OP
    didn't object to the idea with 'Chunk' as record separator.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Mar 2, 2008
    #7
  8. On 2008-03-02 09:50, Gunnar Hjalmarsson <> wrote:
    > Peter J. Holzer wrote:
    >> On 2008-02-18 11:18, Gunnar Hjalmarsson <> wrote:
    >>> wrote:
    >>>> I have a data file structured something like this:
    >>>>
    >>>> ------------------8<-----------------------
    >>>> Chunk 01
    >>>> NAME: "Alice"
    >>>> Description: "Some other string"
    >>>> Age: 37
    >>>> Chunk 02
    >>>> NAME: "Bob"
    >>>> Description: "Some other string"

    >>
    >> change this line to
    >>
    >> Description: "Some Chunky string"
    >>
    >>>> Age: 28

    >> ...
    >>>> ------------------8<-----------------------
    >>> local $/ = 'Chunk';
    >>> while (<>) {
    >>> if ( /(\d+).+[A-Z]+:\s+"([^"]*)".+Age:\s+(\d+)/s ) {
    >>> printf "%02d NAME: %-10s -> %d\n", $1, $2, $3;
    >>> }
    >>> }

    >>
    >> and then run this script again.

    >
    > Well, what's the likelihood that that would happen?


    How would I know? The OP didn't say much about the contents of the
    fields. But I'd say it is non-zero. "Chunk" is an English word which
    might occur in a description, and it might even be the first 5
    characters of a name. Finally, we don't know where the data comes from -
    somebody might deliberately try to sabotage the script.
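The anchored pattern from Peter's own solution earlier in this thread is not fooled by a "Chunky" description, because with /m the ^ in ^Chunk only matches at the start of a line. A sketch, assuming \s* instead of \s+ before each field name so that the unindented sample data matches; parse_anchored is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Peter's anchored pattern, assuming \s* (not \s+) before each field name.
sub parse_anchored {
    my ($text) = @_;
    my @lines;
    while ($text =~ m{
            ^Chunk \s (\d+) \n             # Chunk NN must start a line (/m)
            \s*(NAME|FIRST): \s "(.*?)" \n
            \s*Description: \s "(.*?)" \n
            \s*Age: \s (\d+) \n
        }xmg) {
        push @lines, "$1 NAME: $3 -> $5";
    }
    return @lines;
}

# The OP's data with the "Chunky" description in Bob's record.
my $data = <<'EOS';
Chunk 01
NAME: "Alice"
Description: "Some other string"
Age: 37
Chunk 02
NAME: "Bob"
Description: "Some Chunky string"
Age: 28
Chunk 03
FIRST: "Carol"
Description: "Some other string"
Age: 32
Chunk 04
FIRST: "Dave"
Description: "Some other string"
Age: 22
EOS

print "$_\n" for parse_anchored($data);
```

The trade-off is strictness: a chunk whose fields are missing, reordered, or contain a quote would be skipped by this pattern rather than misparsed.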

    > At least the OP didn't object to the idea with 'Chunk' as record
    > separator.


    I was under the impression that he was glad to understand your solution
    at all and wasn't trying to find flaws in it. Far too few people think
    about the edge-cases of possible input.

    A word of warning about the solution I posted in a different message: It
    doesn't handle embedded quotes - that would be quite easy to add, but
    there are different systems of escaping quotes and one would need to
    know which one to use - the OP didn't tell us.
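As an illustration only: assuming backslash escaping (\" for a quote and \\ for a backslash), which is just one of the conventions mentioned above, the quoted-value part of the pattern could look like the sketch below. Doubled quotes ("") as in CSV would need a different pattern. The sample value and the unescape helper are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: a literal double quote inside a value is written as \" and a
# literal backslash as \\ (one of several possible escaping conventions).
my $quoted_re = qr{
    "                          # opening quote
    ( (?: [^"\\] | \\. )* )    # capture: plain chars, or a backslash plus any char
    "                          # closing quote
}xs;

# unescape is a hypothetical helper that undoes the assumed escaping.
sub unescape {
    my ($s) = @_;
    $s =~ s/\\(.)/$1/gs;
    return $s;
}

my $line = q{NAME: "Robert \"Bob\" Tables"};
if ($line =~ /^\s*(NAME|FIRST):\s*$quoted_re/) {
    print unescape($2), "\n";    # the qr// group becomes capture 2 here
}
```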

    hp
     
    Peter J. Holzer, Mar 2, 2008
    #8