Reading in data until I have a full structure

Discussion in 'Perl Misc' started by pwaring@gmail.com, Feb 25, 2007.

  1. Guest

    I've got a text file which is full of questions in a format similar to
    the following:

    QUESTION_ID "QUESTION_META_DATA
    FULL_QUESTION"
    /"SHORT_QUESTION"
    (ANSWER_1,
    ANSWER_2,
    ...
    ANSWER_N)

    At the moment I can parse each individual question into its component
    parts without any problems (it's not the most pleasant regex in the
    world, but it works); however, I'm having trouble turning the whole
    file into an array of questions which I can then parse individually.
    Each question is separated from the next by at least two newlines, but
    unfortunately there are sometimes two newlines between SHORT_QUESTION
    and (ANSWER_1, so I can't assume that two newlines mark the end of
    a question, which is what I've been doing so far.
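The failure described above is easy to reproduce with Perl's paragraph mode ($/ set to the empty string), which treats any run of blank lines as the end of a record. In this minimal sketch the input is made up in the format above, with a blank line before the first answer list. The hypothetical helper paragraphs() returns three records for two questions, the middle one being a bare answer list.

```perl
use strict;
use warnings;

# Made-up sample in the format above; the first question has a blank line
# between SHORT_QUESTION and its answer list.
my $text = qq{Q1 "META\nFULL"\n/"SHORT"\n\n(A1,\nA2)\n\nQ2 "META\nFULL"\n/"SHORT"\n(A3)\n};

# Read records in paragraph mode: one or more blank lines end a record.
sub paragraphs {
    my ($input) = @_;
    open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";
    local $/ = '';
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @records = paragraphs($text);
print scalar(@records), " records\n";    # 3 records for 2 questions
print "second record: $records[1]";       # Q1's answer list, split off on its own
```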

    I was wondering if anyone could point me in the right direction for a
    way around this problem. Basically, I need to read in data until I
    know I've got a full question with its answers (assuming a question
    ends at two newlines often means I get the answers separately, which
    causes problems when I try to split it into smaller parts), parse that
    (which I can already do), save the results somewhere (already done as
    well) and then carry on to read in the next question.

    If anyone has any ideas as to how I can get around this, I'd be very
    grateful.

    Thanks in advance,

    Paul
     
    , Feb 25, 2007
    #1

  2. wrote:
    > I've got a text file which is full of questions in a format similar to
    > the following:
    >
    > QUESTION_ID "QUESTION_META_DATA
    > FULL_QUESTION"
    > /"SHORT_QUESTION"
    > (ANSWER_1,
    > ANSWER_2,
    > ...
    > ANSWER_N)
    >
    > At the moment I can parse each individual question into its component
    > parts without any problems (it's not the most pleasant regex in the
    > world, but it works); however, I'm having trouble turning the whole
    > file into an array of questions which I can then parse individually.
    > Each question is separated from the next by at least two newlines, but
    > unfortunately there are sometimes two newlines between SHORT_QUESTION
    > and (ANSWER_1, so I can't assume that two newlines indicate the end of
    > a question, which is what I've been doing so far.
    >
    > I was wondering if anyone could point me in the right direction for a
    > way to get around this problem - basically I need to read in data
    > until I know I've got a full question with answers (assuming this ends
    > at two newlines often means I get the answers separately, which causes
    > problems when I try to split this into smaller parts), parse that
    > (which I can already do), save the results somewhere (already done as
    > well) and then carry on to read in the next question.
    >


    I'm sure someone here who knows far more about regular expressions than
    I do will come up with a workable solution, but personally I'd be
    tempted to use a lexer instead.

    http://www.perl.com/pub/a/2006/01/05/parsing.html
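A minimal sketch of the lexer idea in core Perl (the \G anchor with the /gc match flags), not the approach taken in the linked article. It assumes IDs and answers are bare \w+ words, that the metadata/full question and the short question are each double-quoted, and that a single answer may appear without brackets. tokenize() and parse_questions() are hypothetical names. Because whitespace is simply skipped, blank lines no longer decide where a question ends; the grammar does.

```perl
use strict;
use warnings;

# Turn the input into [TYPE, VALUE] tokens using \G and the /gc flags.
# Whitespace, including blank lines, is skipped entirely.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length $text) {
        if    ($text =~ /\G\s+/gc)       { next }
        elsif ($text =~ /\G"([^"]*)"/gc) { push @tokens, [ STRING => $1 ] }
        elsif ($text =~ /\G(\w+)/gc)     { push @tokens, [ WORD => $1 ] }
        elsif ($text =~ /\G([\/(),])/gc) { push @tokens, [ $1 => $1 ] }
        else  { die 'Unexpected character at offset ' . pos($text) . "\n" }
    }
    return @tokens;
}

# Assumed grammar:
#   question := WORD STRING '/' STRING answers
#   answers  := '(' WORD (',' WORD)* ')' | WORD
sub parse_questions {
    my @tokens = @_;
    my %questions;

    # Remove the next token, insisting that it has the given type.
    my $expect = sub {
        my ($type) = @_;
        my $tok = shift @tokens;
        die "Expected $type\n" unless $tok && $tok->[0] eq $type;
        return $tok->[1];
    };
    my $peek_is = sub { @tokens && $tokens[0][0] eq $_[0] };

    while (@tokens) {
        my $id    = $expect->('WORD');
        my $text  = $expect->('STRING');    # QUESTION_META_DATA + FULL_QUESTION
        $expect->('/');
        my $short = $expect->('STRING');

        my @answers;
        if ($peek_is->('(')) {
            $expect->('(');
            push @answers, $expect->('WORD');
            while ($peek_is->(',')) {
                $expect->(',');
                push @answers, $expect->('WORD');
            }
            $expect->(')');
        }
        else {
            push @answers, $expect->('WORD');    # single answer, no brackets
        }
        $questions{$id} = { text => $text, short => $short, answers => \@answers };
    }
    return \%questions;
}

my $sample = <<'END';
Q1 "META
FULL"
/"SHORT"

(A1,
A2)

Q2 "META
FULL"
/"SHORT"
A3
END

my $questions = parse_questions(tokenize($sample));
for my $id (sort keys %$questions) {
    print "$id: @{ $questions->{$id}{answers} }\n";
}
```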

    Mark
     
    Mark Clements, Feb 25, 2007
    #2

  3. "" <> wrote in
    news::

    > I've got a text file which is full of questions in a format similar to
    > the following:


    Please read the posting guidelines for this group before posting again.

    > QUESTION_ID "QUESTION_META_DATA
    > FULL_QUESTION"
    > /"SHORT_QUESTION"
    > (ANSWER_1,
    > ANSWER_2,
    > ...
    > ANSWER_N)
    >


    ....

    > I was wondering if anyone could point me in the right direction for a
    > way to get around this problem - basically I need to read in data
    > until I know I've got a full question with answers (assuming this ends
    > at two newlines often means I get the answers separately, which causes
    > problems when I try to split this into smaller parts), parse that
    > (which I can already do), save the results somewhere (already done as
    > well) and then carry on to read in the next question.


    You might want to read perldoc perlvar, especially the entry for $/
    (the input record separator):

    #!/usr/bin/perl

    use strict;
    use warnings;

    # Read one question per record: each record ends with ")\n\n".
    local $/ = ")\n\n";

    my %questions;

    while( my $chunk = <DATA> ) {
        chomp $chunk;    # strips the ")\n\n" separator, if present

        # trim leading and trailing whitespace
        $chunk =~ s/\A\s+//;
        $chunk =~ s/\s+\z//;

        if( $chunk =~ m{
                \A
                \s*
                (\w+)    # QUESTION_ID
                \s+"
                (\w+)    # QUESTION_META_DATA
                \n+\s+
                (\w+)    # FULL_QUESTION
                "\n\s+/"
                (\w+)    # SHORT_QUESTION
                "\n+\s+\(
                (.+)     # ANSWERS
            }xms
        )
        {
            my %q;
            @q{ qw( qmeta qfull qshort ) } = ($2, $3, $4);
            # answers are separated by a comma, a newline and indentation
            $q{ answers } = [ split /,\n\s+/, $5 ];
            $questions{ $1 } = \%q;
        }
    }

    use Data::Dumper;
    print Dumper \%questions;

    __DATA__

    QUESTION_1 "QUESTION_META_DATA
    FULL_QUESTION"
    /"SHORT_QUESTION"
    (ANSWER_1,
    ANSWER_2,

    ANSWER_3,

    ANSWER_4,
    ANSWER_N)


    QUESTION_2 "QUESTION_META_DATA
    FULL_QUESTION"
    /"SHORT_QUESTION"

    (ANSWER_1,
    ANSWER_2,
    ANSWER_N)

    QUESTION_3 "QUESTION_META_DATA
    FULL_QUESTION"
    /"SHORT_QUESTION"
    (ANSWER_1,
    ANSWER_2,
    ANSWER_X,
    ANSWER_N)

    C:\DOCUME~1\asu1\LOCALS~1\Temp\2> t
    $VAR1 = {
              'QUESTION_3' => {
                                'qfull' => 'FULL_QUESTION',
                                'qshort' => 'SHORT_QUESTION',
                                'answers' => [
                                               'ANSWER_1',
                                               'ANSWER_2',
                                               'ANSWER_X',
                                               'ANSWER_N'
                                             ],
                                'qmeta' => 'QUESTION_META_DATA'
                              },
              'QUESTION_1' => {
                                'qfull' => 'FULL_QUESTION',
                                'qshort' => 'SHORT_QUESTION',
                                'answers' => [
                                               'ANSWER_1',
                                               'ANSWER_2',
                                               'ANSWER_3',
                                               'ANSWER_4',
                                               'ANSWER_N'
                                             ],
                                'qmeta' => 'QUESTION_META_DATA'
                              },
              'QUESTION_2' => {
                                'qfull' => 'FULL_QUESTION',
                                'qshort' => 'SHORT_QUESTION',
                                'answers' => [
                                               'ANSWER_1',
                                               'ANSWER_2',
                                               'ANSWER_N'
                                             ],
                                'qmeta' => 'QUESTION_META_DATA'
                              }
            };
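A detail worth knowing about the script above: chomp removes whatever $/ currently holds, so with $/ set to ")\n\n" the closing bracket goes along with the blank line, which is why the answer lists come out without a trailing ")". A record that reaches end of file without the full separator is left untouched. A small sketch with an in-memory filehandle, made-up input and a hypothetical read_records() helper:

```perl
use strict;
use warnings;

# Read records separated by $sep from a string and chomp each one.
sub read_records {
    my ($text, $sep) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = $sep;
    my @records;
    while (my $rec = <$fh>) {
        chomp $rec;    # removes a trailing $/ (here ")\n\n"), if present
        push @records, $rec;
    }
    close $fh;
    return @records;
}

my @records = read_records("(A1,\nA2)\n\n(B1)\n", ")\n\n");
# $records[0] is "(A1,\nA2"  : the ")" went with the separator
# $records[1] is "(B1)\n"    : no full separator at EOF, so nothing is removed
```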
     
    A. Sinan Unur, Feb 25, 2007
    #3
  4. Guest

    On Feb 25, 7:49 pm, "A. Sinan Unur" <> wrote:
    > You might want to read perldoc perlvar, especially about $/ :
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > local $/ = ")\n\n";


    That looks almost like what I want, but I should have mentioned in my
    original post that the brackets are optional if there is only one
    answer, so I don't think that looking for )\n\n would work.
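With optional brackets, one way out is to stop relying on how a question ends and cut the input where the next one begins. A minimal sketch, assuming (hypothetically) that every question starts at the beginning of a line with its ID followed by whitespace and a double quote, and that no other line looks like that; split_questions() is an illustrative name:

```perl
use strict;
use warnings;

# Split the whole input into one chunk per question, cutting just before
# each line that looks like the start of a question (assumed: ID, spaces, ").
sub split_questions {
    my ($text) = @_;
    my @chunks = split /^(?=\w+[ \t]+")/m, $text;
    s/\A\s+|\s+\z//g for @chunks;    # trim surrounding blank lines
    return grep { length } @chunks;
}

# Made-up sample: a blank line before an answer list, and an unbracketed single answer.
my $sample = qq{Q1 "META\nFULL"\n/"SHORT"\n\n(A1,\nA2)\n\nQ2 "META\nFULL"\n/"SHORT"\nA3\n};

my @questions = split_questions($sample);
print scalar(@questions), " questions\n";    # 2 questions
print "---\n$_\n" for @questions;
```

Each chunk can then be handed to the existing per-question regex.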

    Paul
     
    , Feb 25, 2007
    #4
  5. "" <> wrote in
    news::

    > On Feb 25, 7:49 pm, "A. Sinan Unur" <> wrote:
    >> You might want to read perldoc perlvar, especially about $/ :
    >>
    >> #!/usr/bin/perl
    >>
    >> use strict;
    >> use warnings;
    >>
    >> local $/ = ")\n\n";

    >
    > That looks almost like what I want, but I should have mentioned in my
    > original post that the brackets are optional if there is only one
    > answer, so I don't think that looking for )\n\n would work.


    Well, here's your last fish:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my %questions;

    LINE: while( my $line = <DATA> ) {
        # skip anything before the first question
        next LINE unless $line =~ /\AQUESTION/;

        # start a new chunk with the QUESTION line
        NEW_QUESTION: my $chunk = $line;

        # append lines until end of file or the start of the next question
        do {
            $line = <DATA>;

            unless ( defined $line ) {
                parse_chunk( $chunk );    # end of file: parse the last question
                last LINE;
            }

            if ( $line =~ /\AQUESTION/ ) {
                parse_chunk( $chunk );    # the previous question is complete
                goto NEW_QUESTION;
            }

            $chunk .= $line;
        } while ( 1 );
    }

    sub parse_chunk {
        my ($chunk) = @_;

        # trim leading and trailing whitespace
        $chunk =~ s/\A\s+//;
        $chunk =~ s/\s+\z//;

        if( $chunk =~ m{
                \A
                \s*
                (\w+)    # QUESTION_ID
                \s+"
                (\w+)    # QUESTION_META_DATA
                \n+\s+
                (\w+)    # FULL_QUESTION
                "\n\s+/"
                (\w+)    # SHORT_QUESTION
                "\n+\s+\(
                (.+)     # ANSWERS
            }xms
        )
        {
            my %q;
            @q{ qw( qmeta qfull qshort ) } = ($2, $3, $4);
            $q{ answers } = [ split /,\n\s+/, $5 ];
            $questions{ $1 } = \%q;
        }
    }

    use Data::Dumper;
    print Dumper \%questions;

    __DATA__

    QUESTION_1 "QUESTION_META_DATA
    FULL_QUESTION"
    /"SHORT_QUESTION"
    (ANSWER_1,
    ANSWER_2,

    ANSWER_3,

    ANSWER_4,
    ANSWER_N)


    QUESTION_2 "QUESTION_META_DATA
    FULL_QUESTION"
    /"SHORT_QUESTION"

    (ANSWER_1,
    ANSWER_2,
    ANSWER_N)

    QUESTION_3 "QUESTION_META_DATA
    FULL_QUESTION"
    /"SHORT_QUESTION"
    (ANSWER_1,
    ANSWER_2,
    ANSWER_X,
    ANSWER_N)

    $VAR1 = {
              'QUESTION_3' => {
                                'qfull' => 'FULL_QUESTION',
                                'qshort' => 'SHORT_QUESTION',
                                'answers' => [
                                               'ANSWER_1',
                                               'ANSWER_2',
                                               'ANSWER_X',
                                               'ANSWER_N)'
                                             ],
                                'qmeta' => 'QUESTION_META_DATA'
                              },
              'QUESTION_1' => {
                                'qfull' => 'FULL_QUESTION',
                                'qshort' => 'SHORT_QUESTION',
                                'answers' => [
                                               'ANSWER_1',
                                               'ANSWER_2',
                                               'ANSWER_3',
                                               'ANSWER_4',
                                               'ANSWER_N)'
                                             ],
                                'qmeta' => 'QUESTION_META_DATA'
                              },
              'QUESTION_2' => {
                                'qfull' => 'FULL_QUESTION',
                                'qshort' => 'SHORT_QUESTION',
                                'answers' => [
                                               'ANSWER_1',
                                               'ANSWER_2',
                                               'ANSWER_N)'
                                             ],
                                'qmeta' => 'QUESTION_META_DATA'
                              }
            };
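The goto is not strictly needed: the same "flush the current chunk when the next QUESTION line or end of file arrives" logic fits in an ordinary while loop. A sketch with a hypothetical read_chunks() helper, reusing the /\AQUESTION/ test from the script above. It also trims each chunk and drops a trailing closing bracket, which would otherwise end up in the last answer ('ANSWER_N)' in the output above).

```perl
use strict;
use warnings;

# Collect one chunk per question, starting a new chunk at every line that
# begins with "QUESTION" (same test as the script above).
sub read_chunks {
    my ($fh) = @_;
    my (@chunks, $chunk);
    while (my $line = <$fh>) {
        if ($line =~ /\AQUESTION/) {
            push @chunks, $chunk if defined $chunk;    # previous question is complete
            $chunk = $line;
        }
        elsif (defined $chunk) {
            $chunk .= $line;                            # still inside the current question
        }
    }
    push @chunks, $chunk if defined $chunk;            # last question ends at EOF
    for (@chunks) {
        s/\s+\z//;    # drop trailing blank lines
        s/\)\z//;     # drop the closing bracket, if there is one
    }
    return @chunks;
}

# Made-up input: a blank line inside an answer list, and an unbracketed single answer.
my $text = qq{QUESTION_1 "META\nFULL"\n/"SHORT"\n(A1,\n\nA2)\n\nQUESTION_2 "META\nFULL"\n/"SHORT"\nA3\n};
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my @chunks = read_chunks($fh);
close $fh;
print "---\n$_\n" for @chunks;
```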


    Sinan
     
    A. Sinan Unur, Feb 25, 2007
    #5

Similar Threads
  1. Samuel R. Neff
    Replies:
    2
    Views:
    624
    bradley
    Jun 10, 2005
  2. Scott Brady Drummonds

    Reading Until EOF

    Scott Brady Drummonds, Oct 21, 2003, in forum: Python
    Replies:
    7
    Views:
    647
    Donn Cave
    Oct 22, 2003
  3. Ian Bicking

    Re: Reading Until EOF

    Ian Bicking, Oct 21, 2003, in forum: Python
    Replies:
    0
    Views:
    517
    Ian Bicking
    Oct 21, 2003
  4. eblume
    Replies:
    3
    Views:
    222
    Peter Otten
    Jan 12, 2011
  5. Replies:
    1
    Views:
    199
    Ken Bloom
    May 28, 2007