read file with while and then scan lines into array

Martin Foster

Hello

I'm scanning text files into a database.

My perl script looks like this:

# start loop of file to scan for data
while (defined ($_2 = <INFILE>)){
# Find cell data
if ($_2 =~ m/_cell_length_a\s+(-?([0-9]+(\.[0-9]*)?|\.[0-9]+))/){
$cell[0] = $1;
print "Found cell parameter a= ", $cell[0], " ";
print "For str_id number ", $au_id, "\n";
# Insert data
$stmt1 = "UPDATE bgb_data SET latpar_a = ? WHERE str_id = ?";
$sth = $dbh->prepare($stmt1);
$sth->execute($cell[0], $au_id);
}

# get sequences
if ($_2 =~ m/_Sequence/){
# start loop to scan in sequences

So now I've found a tag, and the next few lines are number sequences
which I want in an array.

I want to read those lines into an array until a blank line appears, and
then continue scanning for further data in the while loop.

How can I do this?


Many thanks for any help!

Cheers,
Martin
 
Jim Keenan

Martin Foster said:
I'm scanning text files into a database.

My perl script looks like this:

You've written your post in such a confusing manner that it is difficult to
figure out what your problem is.
# start loop of file to scan for data
while (defined ($_2 = <INFILE>)){
# Find cell data
if ($_2 =~ m/_cell_length_a\s+(-?([0-9]+(\.[0-9]*)?|\.[0-9]+))/){
$cell[0] = $1;

In the code presented, you don't assign to any element of @cell other than
$cell[0]. So why use an array at all?
print "Found cell parameter a= ", $cell[0], " ";
print "For str_id number ", $au_id, "\n";

Where did $au_id come from?
# Insert data
$stmt1 = "UPDATE bgb_data SET latpar_a = ? WHERE str_id = ?";
$sth = $dbh->prepare($stmt1);
$sth->execute($cell[0], $au_id);
}

# get sequences
if ($_2 =~ m/_Sequence/){
# start loop to scan in sequences

This loop is incomplete. Was what you really intended something like this?

if ($_2 =~ m/_cell_length_a\s+(-?([0-9]+(\.[0-9]*)?|\.[0-9]+))/){
    # process
} elsif ($_2 =~ m/_Sequence/) {
    # process
}
So now I've found a tag and the next few lines are number sequences
which
I want in an array.

I want to scan in those lines into until a blank line appears and then
continue scanning for further data, in the while loop.
Does that mean that when you are processing a file line-by-line and
encounter a blank line, you wish to start a new array to hold the sequence
numbers?

Can you provide some sample data we could test this with?

Jim Keenan
 
Martin Foster

Here's the data
......skipping top part of file
loop_
_iza_sc_CoordinationSequence
1 4 9 17 28 42 60 82 111 149 191 229 262 297 336 384
1 4 10 19 30 44 63 89 121 155 188 221 258 302 355 415
1 4 9 18 32 49 68 89 114 144 179 221 267 314 364 417

loop_
_iza_sc_VertexSymbols
4.6.4.6.4.6
4.4.6.6.6.8_{3}
4.4.4.6.8.12
.......skipping bottom part of file.

I want to scan the number sequences after
_iza_sc_CoordinationSequence
into an array and then load them into MySQL.



Jim Keenan said:
You've written your post in such a confusing manner that it is difficult to
figure out what your problem is.

I was being a little too brief.
# start loop of file to scan for data
while (defined ($_2 = <INFILE>)){
# Find cell data
if ($_2 =~ m/_cell_length_a\s+(-?([0-9]+(\.[0-9]*)?|\.[0-9]+))/){
$cell[0] = $1;

In the code presented, you don't assign to any element of @cell other than
$cell[0]. So why use an array at all?
I do have other data lines I scan in, but yes I could just reuse the
same variable.
print "Found cell parameter a= ", $cell[0], " ";
print "For str_id number ", $au_id, "\n";

Where did $au_id come from?
$au_id is the auto-increment value from MySQL; I get it earlier in my
code.
# Insert data
$stmt1 = "UPDATE bgb_data SET latpar_a = ? WHERE str_id = ?";
$sth = $dbh->prepare($stmt1);
$sth->execute($cell[0], $au_id);
}

# get sequences
if ($_2 =~ m/_Sequence/){
# start loop to scan in sequences

This loop is incomplete. Was what you really intended something like this?
I've got several if statements... I can do several ifs and then the
last one is else if, right? Or is it if...else if...else if...else if,
etc.?
if ($_2 =~ m/_cell_length_a\s+(-?([0-9]+(\.[0-9]*)?|\.[0-9]+))/){
    # process
} elsif ($_2 =~ m/_Sequence/) {
    # process
}
So now I've found a tag and the next few lines are number sequences
which
I want in an array.

I want to scan in those lines into until a blank line appears and then
continue scanning for further data, in the while loop.
Does that mean that when you are processing a file line-by-line and
encounter a blank line, you wish to start a new array to hold the sequence
numbers?
Yes almost.
Can you provide some sample data we could test this with?
Please see above.
Jim Keenan

Thanks for your help.

Kind regards,
Martin Foster.
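
On the if / else if question: Perl spells the keyword `elsif` (one word, no second "e"), and a chain of if ... elsif ... elsif ... else stops at the first condition that matches, whereas several separate `if` statements are each tested in turn. A minimal sketch of such a chain follows; the sub name `classify_line`, the returned labels, and the sample value 13.675 are made up for illustration and are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return a label for the first pattern that matches.
# Only one branch of the chain runs for any given line.
sub classify_line {
    my ($line) = @_;
    if ($line =~ /_cell_length_a\s+(-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+))/) {
        return "cell_a=$1";          # captured numeric value
    }
    elsif ($line =~ /_CoordinationSequence/) {
        return 'sequence_tag';       # start of a sequence block
    }
    elsif ($line =~ /^\s*$/) {
        return 'blank';              # empty or whitespace-only line
    }
    else {
        return 'other';              # anything else is ignored
    }
}

# Illustrative input lines (the cell length value is invented).
for my $line ("_cell_length_a   13.675", "_iza_sc_CoordinationSequence", "", "loop_") {
    print classify_line($line), "\n";
}
```

If two tags could both match the same line and both need handling, separate `if` statements would be the better fit; the chain is for mutually exclusive cases.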
 
Jim Keenan

Martin Foster said:
Here's the data
.....skipping top part of file
loop_
_iza_sc_CoordinationSequence
1 4 9 17 28 42 60 82 111 149 191 229 262 297 336 384
1 4 10 19 30 44 63 89 121 155 188 221 258 302 355 415
1 4 9 18 32 49 68 89 114 144 179 221 267 314 364 417

loop_
_iza_sc_VertexSymbols
4.6.4.6.4.6
4.4.6.6.6.8_{3}
4.4.4.6.8.12
......skipping bottom part of file.

I want to scan in the number sequences after
_iza_sc_CoordinationSequence
into an array and them into mySQL.

Here is a solution which (a) assumes that the target lines all follow a
pattern of "unsigned integers separated by a single whitespace" and (b)
stores the results in a hash of arrays of arrays. I leave to you the task
of feeding this into MySQL.

jimk

##### START CODE BLOCK #################
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my (@chunks, %results);
{
    local $/ = "\n\n"; # slurp data in by 'paragraphs'
    while (<DATA>) {
        # ignore all chunks except ones that contain this string
        next unless /_iza_sc_CoordinationSequence/;
        push (@chunks, $_);
    }
}

for (my $i = 0; $i <= $#chunks; $i++) {
    my (@lines, @sequences);
    @lines = split(/\n/, $chunks[$i]);
    foreach my $line (@lines) {
        if ($line =~ /^(\d+\s)+\d+\s*$/) {
            push(@sequences, [ split(/\s/, $line) ]);
        }
    }
    $results{$i} = [@sequences];
}

print Dumper(\%results);

__DATA__
loop_
_iza_sc_CoordinationSequence
1 4 9 17 28 42 60 82 111 149 191 229 262 297 336 384
1 4 10 19 30 44 63 89 121 155 188 221 258 302 355 415
1 4 9 18 32 49 68 89 114 144 179 221 267 314 364 417

loop_
_iza_sc_VertexSymbols
4.6.4.6.4.6
4.4.6.6.6.8_{3}
4.4.4.6.8.12

loop_
_iza_sc_SomethingElse
3 7 9 17 28 42 60 82 111 149 191 229 262 297 336 384
3 7 10 19 30 44 63 89 121 155 188 221 258 302 355 415
3 7 9 18 32 49 68 89 114 144 179 221 267 314 364 417

loop_
_iza_sc_CoordinationSequence
5 8 9 17 28 42 60 82 111 149 191 229 262 297 336 384
5 8 10 19 30 44 63 89 121 155 188 221 258 302 355 415
5 8 9 18 32 49 68 89 114 144 179 221 267 314 364 417

##### END CODE BLOCK #################
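
An alternative that stays closer to the original line-by-line while loop, offered as a sketch rather than a drop-in replacement: when the tag is found, an inner loop keeps reading from the same filehandle until it reaches a blank line, and the outer loop then carries on from there. The helper name `read_block` and the in-memory filehandle are assumptions made so the example runs on its own; in the real script the handle would be the already opened input file.

```perl
use strict;
use warnings;

# Hypothetical helper: read rows of whitespace-separated values from $fh
# until a blank line or end of file; returns a reference to an array of rows.
sub read_block {
    my ($fh) = @_;
    my @rows;
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^\s*$/;          # a blank line ends the block
        push @rows, [ split ' ', $line ];  # one array reference per line
    }
    return \@rows;
}

# Sample taken from the data posted in this thread.
my $sample = <<'END';
loop_
_iza_sc_CoordinationSequence
1 4 9 17 28 42 60 82 111 149 191 229 262 297 336 384
1 4 10 19 30 44 63 89 121 155 188 221 258 302 355 415
1 4 9 18 32 49 68 89 114 144 179 221 267 314 364 417

loop_
_iza_sc_VertexSymbols
4.6.4.6.4.6
END

# Read the string through a filehandle, standing in for the real input file.
open(my $in, '<', \$sample) or die "Cannot open in-memory file: $!";

my (@sequences, @tags_seen);
while (defined(my $line = <$in>)) {
    if ($line =~ /_CoordinationSequence/) {
        # The inner loop consumes lines from the same handle, so the
        # outer loop resumes right after the blank line.
        @sequences = @{ read_block($in) };
    }
    elsif ($line =~ /^(_\S+)/) {
        push @tags_seen, $1;   # later tags are still seen by the outer loop
    }
}
close($in);

print scalar(@sequences), " sequence lines read\n";
print "tags seen afterwards: @tags_seen\n";
```

The point of the design is that both loops share one read position: whatever the inner loop consumes is never seen again by the outer loop, so no lines are processed twice.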

If we were playing Perl Golf and wanted to trade off readability for
brevity, we could re-write the 'for' loop as:

for (my $i = 0; $i <= $#chunks; $i++) {
    my (@sequences);
    foreach (split(/\n/, $chunks[$i])) {
        push(@sequences, [ split(/\s/) ]) if (/^(\d+\s)+\d+\s*$/);
    }
    $results{$i} = [@sequences];
}
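
Feeding the result into MySQL is left open above. One possible first step, sketched here under assumptions, is to flatten the hash of arrays of arrays into one row of bind values per sequence line. The sub name `rows_for_insert`, the shortened stand-in data, and the table and column names in the trailing comment are all hypothetical; only `prepare` and `execute` with `?` placeholders are taken from the DBI usage already shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten { chunk => [ [numbers], ... ] } into
# one [chunk, row, "n1 n2 ..."] triple per sequence line.
sub rows_for_insert {
    my ($results) = @_;
    my @rows;
    for my $chunk (sort { $a <=> $b } keys %$results) {
        my $seqs = $results->{$chunk};
        for my $row (0 .. $#$seqs) {
            push @rows, [ $chunk, $row, join(' ', @{ $seqs->[$row] }) ];
        }
    }
    return @rows;
}

# Small stand-in for the %results structure (sequences shortened).
my %results = (
    0 => [ [1, 4, 9], [1, 4, 10] ],
    1 => [ [5, 8, 9] ],
);

my @rows = rows_for_insert(\%results);
print join(' | ', @$_), "\n" for @rows;

# With a connected DBI handle $dbh, and a hypothetical table and columns,
# each row could then be inserted through placeholders:
#   my $sth = $dbh->prepare(
#       'INSERT INTO coord_seq (chunk_no, row_no, seq) VALUES (?, ?, ?)');
#   $sth->execute(@$_) for @rows;
```

Storing each sequence as a single space-joined string is only one choice; a table with one number per row would suit queries on individual values better.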
 
