read file with while and then scan lines into array

Discussion in 'Perl Misc' started by Martin Foster, Dec 5, 2003.

  1. Hello

    I'm scanning text files into a database.

    My perl script looks like this:

    # start loop of file to scan for data
    while (defined ($_2 = <INFILE>)){
        # Find cell data
        if ($_2 =~ m/_cell_length_a\s+(-?([0-9]+(\.[0-9]*)?|\.[0-9]+))/){
            $cell[0] = $1;
            print "Found cell parameter a= ", $cell[0], " ";
            print "For str_id number ", $au_id, "\n";
            # Insert data
            $stmt1 = "UPDATE bgb_data SET latpar_a = ? WHERE str_id = ?";
            $sth = $dbh->prepare($stmt1);
            $sth->execute($cell[0], $au_id);
        }

        # get sequences
        if ($_2 =~ m/_Sequence/){
            # start loop to scan in sequences

    So now I've found a tag, and the next few lines are number sequences
    which I want in an array.

    I want to scan those lines in until a blank line appears, and then
    continue scanning for further data in the while loop.

    How can I do this?


    Many thanks for any help!

    Cheers,
    Martin
    Martin Foster, Dec 5, 2003
    #1

  2. Martin Foster

    Jim Keenan Guest

    "Martin Foster" <> wrote in message
    news:...
    > I'm scanning text files into a database.
    >
    > My perl script looks like this:
    >


    You've written your post in such a confusing manner that it is difficult to
    figure out what your problem is.

    > # start loop of file to scan for data
    > while (defined ($_2 = <INFILE>)){
    > # Find cell data
    > if ($_2 =~ m/_cell_length_a\s+(-?([0-9]+(\.[0-9]*)?|\.[0-9]+))/){
    > $cell[0] = $1;


    In the code presented, you don't assign to any element of @cell other than
    $cell[0]. So why use an array at all?

    > print "Found cell parameter a= ", $cell[0], " ";
    > print "For str_id number ", $au_id, "\n";


    Where did $au_id come from?

    > # Insert data
    > $stmt1 = "UPDATE bgb_data SET latpar_a = ? WHERE str_id = ?";
    > $sth = $dbh->prepare($stmt1);
    > $sth->execute($cell[0], $au_id);
    > }
    >
    > # get sequences
    > if ($_2 =~ m/_Sequence/){
    > # start loop to scan in sequences


    This loop is incomplete. Was what you really intended something like this?

    if ($_2 =~ m/_cell_length_a\s+(-?([0-9]+(\.[0-9]*)?|\.[0-9]+))/){
        # process
    } elsif ($_2 =~ m/_Sequence/) {
        # process
    }
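
    For reference, a minimal runnable sketch of such an if/elsif/else chain
    follows. The helper name classify_line, the labels it returns, and the
    sample lines are all hypothetical and are only there to make the control
    flow visible; the two patterns are taken from the code quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify one input line with a single if/elsif/else chain.
# Only the first matching branch runs; later branches are skipped.
# The labels ('cell_a', 'sequence', 'other') are hypothetical.
sub classify_line {
    my ($line) = @_;
    if ($line =~ m/_cell_length_a\s+(-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+))/) {
        return ('cell_a', $1);    # $1 holds the captured number
    }
    elsif ($line =~ m/_Sequence/) {
        return ('sequence');
    }
    else {
        return ('other');
    }
}

# Hypothetical sample lines, one per branch.
for my $line ("_cell_length_a   12.345\n", "data_Sequence\n", "loop_\n") {
    my ($kind, $value) = classify_line($line);
    print $kind, defined $value ? " $value" : '', "\n";
}
```

    Perl spells the keyword elsif (not "else if" or "elseif"), and a chain may
    contain any number of elsif branches before an optional final else.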

    >
    > So now I've found a tag and the next few lines are number sequences
    > which
    > I want in an array.
    >
    > I want to scan in those lines into until a blank line appears and then
    > continue scanning for further data, in the while loop.
    >

    Does that mean that when you are processing a file line-by-line and
    encounter a blank line, you wish to start a new array to hold the sequence
    numbers?

    Can you provide some sample data we could test this with?

    Jim Keenan
    Jim Keenan, Dec 7, 2003
    #2

  3. Here's the data
    ......skipping top part of file
    loop_
    _iza_sc_CoordinationSequence
    1 4 9 17 28 42 60 82 111 149 191 229 262 297 336 384
    1 4 10 19 30 44 63 89 121 155 188 221 258 302 355 415
    1 4 9 18 32 49 68 89 114 144 179 221 267 314 364 417

    loop_
    _iza_sc_VertexSymbols
    4.6.4.6.4.6
    4.4.6.6.6.8_{3}
    4.4.4.6.8.12
    .......skipping bottom part of file.

    I want to scan the number sequences after
    _iza_sc_CoordinationSequence
    into an array and then into MySQL.



    "Jim Keenan" <> wrote in message news:<aSxAb.1503$>...
    > "Martin Foster" <> wrote in message
    > news:...
    > > I'm scanning text files into a database.
    > >
    > > My perl script looks like this:
    > >

    >
    > You've written your post in such a confusing manner that it is difficult to
    > figure out what your problem is.


    I was being a little too brief.

    >
    > > # start loop of file to scan for data
    > > while (defined ($_2 = <INFILE>)){
    > > # Find cell data
    > > if ($_2 =~ m/_cell_length_a\s+(-?([0-9]+(\.[0-9]*)?|\.[0-9]+))/){
    > > $cell[0] = $1;

    >
    > In the code presented, you don't assign to any element of @cell other than
    > $cell[0]. So why use an array at all?
    >

    I do have other data lines I scan in, but yes I could just reuse the
    same variable.

    > > print "Found cell parameter a= ", $cell[0], " ";
    > > print "For str_id number ", $au_id, "\n";

    >
    > Where did $au_id come from?
    >

    $au_id is the auto-increment value from MySQL; I get this earlier in my
    code.

    > > # Insert data
    > > $stmt1 = "UPDATE bgb_data SET latpar_a = ? WHERE str_id = ?";
    > > $sth = $dbh->prepare($stmt1);
    > > $sth->execute($cell[0], $au_id);
    > > }
    > >
    > > # get sequences
    > > if ($_2 =~ m/_Sequence/){
    > > # start loop to scan in sequences

    >
    > This loop is incomplete. Was what you really intended something like this?
    >

    I've got several if statements. Can I use several separate ifs and make
    only the last one an else if, or does it have to be
    if ... else if ... else if ... and so on?

    > if ($_2 =~ m/_cell_length_a\s+(-?([0-9]+(\.[0-9]*)?|\.[0-9]+))/){
    > # process
    > } elsif ($_2 =~ m/_Sequence/) {
    > # process
    > }
    >
    > >
    > > So now I've found a tag and the next few lines are number sequences
    > > which
    > > I want in an array.
    > >
    > > I want to scan in those lines into until a blank line appears and then
    > > continue scanning for further data, in the while loop.
    > >

    > Does that mean that when you are processing a file line-by-line and
    > encounter a blank line, you wish to start a new array to hold the sequence
    > numbers?
    >

    Yes, almost.
    > Can you provide some sample data we could test this with?
    >

    Please see above.
    > Jim Keenan


    Thanks for your help.

    Kind regards,
    Martin Foster.
    Martin Foster, Dec 7, 2003
    #3
  4. Martin Foster

    Jim Keenan Guest

    "Martin Foster" <> wrote in message
    news:...
    > Here's the data
    > .....skipping top part of file
    > loop_
    > _iza_sc_CoordinationSequence
    > 1 4 9 17 28 42 60 82 111 149 191 229 262 297 336 384
    > 1 4 10 19 30 44 63 89 121 155 188 221 258 302 355 415
    > 1 4 9 18 32 49 68 89 114 144 179 221 267 314 364 417
    >
    > loop_
    > _iza_sc_VertexSymbols
    > 4.6.4.6.4.6
    > 4.4.6.6.6.8_{3}
    > 4.4.4.6.8.12
    > ......skipping bottom part of file.
    >
    > I want to scan in the number sequences after
    > _iza_sc_CoordinationSequence
    > into an array and them into mySQL.
    >


    Here is a solution which (a) assumes that the target lines all follow a
    pattern of "unsigned integers separated by a single whitespace" and (b)
    stores the results in a hash of arrays of arrays. I leave to you the task
    of feeding this into MySQL.
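
    As a rough sketch of that last step, a hash of arrays of arrays can be
    flattened into one row per sequence line before it goes to the database.
    The helper name rows_for_insert, the small sample hash, and the table and
    column names in the comment are assumptions made for illustration only;
    the DBI calls are left as comments because no database handle exists here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Flatten a hash of arrays of arrays into one row per sequence line:
# [ chunk index, line index within the chunk, space-joined sequence ].
sub rows_for_insert {
    my ($results) = @_;
    my @rows;
    for my $chunk (sort { $a <=> $b } keys %$results) {
        my $seqs = $results->{$chunk};
        for my $n (0 .. $#$seqs) {
            push @rows, [ $chunk, $n, join(' ', @{ $seqs->[$n] }) ];
        }
    }
    return @rows;
}

# Hypothetical, shortened sample in the same shape as the result hash.
my %results = (0 => [ [1, 4, 9], [1, 4, 10] ], 1 => [ [5, 8, 9] ]);
my @rows = rows_for_insert(\%results);
print join(' | ', @$_), "\n" for @rows;

# With a connected DBI handle $dbh (not created here), the rows could be
# stored through one prepared statement. Table and column names are
# hypothetical:
#   my $sth = $dbh->prepare(
#       'INSERT INTO coord_seq (chunk_no, line_no, seq) VALUES (?, ?, ?)');
#   $sth->execute(@$_) for @rows;
```

    Preparing the statement once and executing it per row keeps the
    placeholder style already used in the UPDATE statement earlier in the
    thread.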

    jimk

    ##### START CODE BLOCK #################
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my (@chunks, %results);
    {
        local $/ = "\n\n"; # slurp data in by 'paragraphs'
        while (<DATA>) {
            # ignore all chunks except ones that contain this string
            next unless /_iza_sc_CoordinationSequence/;
            push (@chunks, $_);
        }
    }

    for (my $i = 0; $i <= $#chunks; $i++) {
        my (@lines, @sequences);
        @lines = split(/\n/, $chunks[$i]);
        foreach my $line (@lines) {
            if ($line =~ /^(\d+\s)+\d+\s*$/) {
                push(@sequences, [ split(/\s/, $line) ]);
            }
        }
        $results{$i} = [@sequences];
    }

    print Dumper(\%results);

    __DATA__
    loop_
    _iza_sc_CoordinationSequence
    1 4 9 17 28 42 60 82 111 149 191 229 262 297 336 384
    1 4 10 19 30 44 63 89 121 155 188 221 258 302 355 415
    1 4 9 18 32 49 68 89 114 144 179 221 267 314 364 417

    loop_
    _iza_sc_VertexSymbols
    4.6.4.6.4.6
    4.4.6.6.6.8_{3}
    4.4.4.6.8.12

    loop_
    _iza_sc_SomethingElse
    3 7 9 17 28 42 60 82 111 149 191 229 262 297 336 384
    3 7 10 19 30 44 63 89 121 155 188 221 258 302 355 415
    3 7 9 18 32 49 68 89 114 144 179 221 267 314 364 417

    loop_
    _iza_sc_CoordinationSequence
    5 8 9 17 28 42 60 82 111 149 191 229 262 297 336 384
    5 8 10 19 30 44 63 89 121 155 188 221 258 302 355 415
    5 8 9 18 32 49 68 89 114 144 179 221 267 314 364 417

    ##### END CODE BLOCK #################

    If we were playing Perl Golf and wanted to trade off readability for
    brevity, we could re-write the 'for' loop as:

    for (my $i = 0; $i <= $#chunks; $i++) {
        my (@sequences);
        foreach (split(/\n/, $chunks[$i])) {
            push(@sequences, [ split(/\s/) ]) if (/^(\d+\s)+\d+\s*$/);
        }
        $results{$i} = [@sequences];
    }
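
    The original question asked for a line-by-line approach: keep the outer
    while loop, and once the tag is found, read the following lines into an
    array until a blank line appears. A minimal sketch of that idea follows.
    The function name read_sequences and the shortened sample data are
    assumptions, and an in-memory filehandle stands in for the real input
    file. Note that m/_Sequence/ would not match the tag
    _iza_sc_CoordinationSequence, because no underscore directly precedes
    "Sequence", so the sketch matches the full tag instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a filehandle line by line. Each time the tag is seen, collect the
# following lines into an array of arrays until a blank line (or end of
# file), then resume scanning with the outer loop.
sub read_sequences {
    my ($fh, $tag) = @_;
    my @blocks;
    while (defined(my $line = <$fh>)) {
        next unless $line =~ /\Q$tag\E/;
        my @rows;
        while (defined(my $seq = <$fh>)) {
            last if $seq =~ /^\s*$/;          # a blank line ends the block
            push @rows, [ split ' ', $seq ];  # split on runs of whitespace
        }
        push @blocks, \@rows;
    }
    return @blocks;
}

# Shortened sample data (hypothetical) in the format posted above.
my $sample = <<'END';
loop_
_iza_sc_CoordinationSequence
1 4 9 17 28 42
1 4 10 19 30 44

loop_
_iza_sc_VertexSymbols
4.6.4.6.4.6
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @blocks = read_sequences($fh, '_iza_sc_CoordinationSequence');
close $fh;

for my $i (0 .. $#blocks) {
    print "block $i: ", scalar @{ $blocks[$i] }, " rows\n";
}
```

    Because both loops read from the same filehandle, the outer loop picks up
    exactly where the inner one stopped, so other tags can still be handled
    by further elsif branches in the outer loop.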
    Jim Keenan, Dec 7, 2003
    #4
  5. Thanks! I'll try this out.

    Martin Foster, Dec 8, 2003
    #5
