Extract range of lines from a text file

Amer Neely

This is driving me nuts.

I'm walking through a mailbox file, and want to pull out specific lines
from each message. The body of each message is in a similar format,
having been generated by a script.

I'm doing OK except for one particular block of lines, the customer
address data. There is a blank line before and after this block. Example:

Transaction Time: 18:45:55

Amer Neely
POB 1481 Station Main
North Bay ON
P1B 8K7
CANADA

123-456-7890

I've managed to get the 5 lines into a string using this code:

while (<IN>)
{
    # bunch of other comparisons deleted

    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    {
        $CustData = $_;
        $CustData =~ s/^Transaction Time:.+//; # lose the beginning pattern
        $CustData =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//; # lose the ending pattern
        next if ($CustData =~ m/^$/); # skip the blank lines
        $CustData =~ s/\n//g; # get rid of blank lines. don't think this is working
        print "\t$CustData\n";
    }
}
close IN;
print "\nAll done.\n";

The problem seems to be that $CustData holds all 5 lines. I need to
break out each of the lines into a separate string variable so as to
populate a database field. This is what has me stumped. Sure would
appreciate some light on this.
 
Xicheng Jia

Amer said:
if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)

With the /A/ ... /B/ expression you are still reading one line at a time.
If you want to get all of these lines into $_ and then parse the data, try
setting the input record separator $/ to something like:

local $/ = "Transaction Time:";

Then each read returns a whole block, with records separated by the string
"Transaction Time:" held in $/.

Xicheng
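
A minimal sketch of what reading with a custom $/ looks like. The sample text and the names $sample and @chunks are made up for illustration, and an in-memory filehandle stands in for the real mailbox file:

```perl
use strict;
use warnings;

# Hypothetical sample standing in for part of the mailbox file.
my $sample = <<'END';
Subject: order
Transaction Time: 18:45:55

Amer Neely
POB 1481 Station Main
North Bay ON
P1B 8K7
CANADA

123-456-7890

Thanks for your order.
END

my @chunks;
{
    local $/ = "Transaction Time:";   # records now end at this string
    open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
    while (my $chunk = <$in>) {
        chomp $chunk;                 # chomp strips the current value of $/
        push @chunks, $chunk;
    }
    close $in;
}

# $chunks[0] is everything before the first marker; $chunks[1] starts
# with the time stamp and runs to the end of the sample.
printf "%d chunks read\n", scalar @chunks;
```

Note that the second chunk still carries everything after the phone number, so it would need further trimming before being split into fields.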
 
Amer Neely

Thanks for the quick reply. Still a little foggy though.
If I set the record separator to "Transaction Time:", then I don't need
the 'if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)' loop?

Then set $CustData = $_ ?

But doesn't that leave me in the same position? All 5 lines are now in
$CustData.


--
Amer Neely
Home of Spam Catcher
W: www.softouch.on.ca
E: (e-mail address removed)
Perl | MySQL | CGI programming for all data entry forms.
"We make web sites work!"
 
Xicheng Jia

Amer said:
Thanks for the quick reply. Still a little foggy though.
If I set the record separator to "Transaction Time:", then I don't need
the 'if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)' loop?

Yes, you don't need this "if" test. It is meant for processing the input
one line at a time, which is Perl's default (how much is read at once
actually depends on your $/).

Amer said:
Then set $CustData = $_ ?

But doesn't that leave me in the same position? All 5 lines are now in
$CustData.

Not really. After you do so, you get something like:

$_ = "18:45:55

Amer Neely
POB 1481 Station Main
North Bay ON
P1B 8K7
CANADA

123-456-7890
"

Then split it on "\n", like: my @arr = split "\n";
You get:
$arr[0] = "18:45:55";
$arr[1] = "";
$arr[2] = "Amer Neely";
$arr[3] = "POB 1481 Station Main";
$arr[4] = "North Bay ON";
........

So you can use the following line to collect your data:

my (undef, undef, $var1, $var2, $var3, $var4, $var5, undef, undef) =
    split "\n";

Or you can use a regex to parse whatever data you need from $_. It really
depends on what information you need.

Another way: if you are sure there are 5 lines in each record you want
to keep, then you can read your data in paragraph mode, like:

local $/ = "";

while ( <IN> ) {
    # keep only paragraphs with more than 5 newlines
    # (the trailing blank line counts too, so a 5-line address has 6)
    next unless tr/\n// > 5;
    my ($name, $pob, $add1, $add2, $cont) = split "\n";
    # do something with the above variables
}

then you get:
-------------------------
$name = "Amer Neely"
$pob = "POB 1481 Station Main"
$add1 = "North Bay ON"
$add2 = "P1B 8K7"
$cont = "CANADA"
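
The paragraph-mode idea above can be tried on a small sample. This is only a sketch: the sample string and the names $sample, @paragraphs and $address are invented here, and it picks the address by line count, which assumes no other five-line paragraph is present.

```perl
use strict;
use warnings;

# Hypothetical sample: marker line, five-line address, phone number.
my $sample = "Transaction Time: 18:45:55\n\n"
           . "Amer Neely\nPOB 1481 Station Main\nNorth Bay ON\nP1B 8K7\nCANADA\n\n"
           . "123-456-7890\n\n";

my @paragraphs;
{
    local $/ = "";    # paragraph mode: records are separated by blank lines
    open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
    while (my $para = <$in>) {
        chomp $para;  # in paragraph mode chomp strips all trailing newlines
        push @paragraphs, [ split /\n/, $para ];
    }
    close $in;
}

# The address is the only paragraph with five lines in this sample.
my ($address) = grep { @$_ == 5 } @paragraphs;
my ($name, $pob, $add1, $add2, $cont) = @$address;
print "$name | $pob | $add1 | $add2 | $cont\n";
```

Chomping each paragraph first means the line count is simply the number of array elements, so there is no need to allow for the trailing blank line.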
 
Xicheng Jia

Amer said:
AN > if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)

This keeps your input in line mode: you get one line at a time in $_
from your input file.

AN > $CustData = $_;

On each iteration of your while loop, you get only one line in
$CustData.

AN > $CustData =~ s/^Transaction Time:.+//; # lose the beginning pattern
AN > $CustData =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//; # lose the ending pattern
AN > next if ($CustData =~ m/^$/); # skip the blank lines
AN > $CustData =~ s/\n//g; # get rid of blank lines. don't think this working

This does not get rid of blank lines; it removes the newline character
"\n". In the default line mode that is the same as "chomp".

Xicheng
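
To see the line-by-line behaviour described above, here is a small sketch. The sample text and the names $sample and @seen are made up; the range test is written out against $line instead of $_:

```perl
use strict;
use warnings;

# Hypothetical, shortened message body.
my $sample = "header\nTransaction Time: 18:45:55\n\nAmer Neely\nCANADA\n\n123-456-7890\nfooter\n";

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";

my @seen;
while (my $line = <$in>) {
    # The range operator is evaluated once per line read.
    if ($line =~ /^Transaction Time:/ ... $line =~ /^\d{3}-\d{3}-\d{4}$/) {
        (my $stripped = $line) =~ s/\n//g;   # on a single line, same effect as chomp
        push @seen, $stripped;
    }
}
close $in;

print scalar(@seen), " lines fell inside the range\n";
print "[$_]\n" for @seen;
```

Each element of @seen is one line, so at no point does a single scalar hold the whole address block.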
 
Amer Neely

Xicheng said:
Another way: if you are sure there are 5 lines in each record you want
to keep, then you can read your data in paragraph mode, like:

local $/ = "";

while ( <IN> ) {
    # keep only paragraphs with more than 5 newlines
    # (the trailing blank line counts too, so a 5-line address has 6)
    next unless tr/\n// > 5;
    my ($name, $pob, $add1, $add2, $cont) = split "\n";
    # do something with the above variables
}

then you get:
-------------------------
$name = "Amer Neely"
$pob = "POB 1481 Station Main"
$add1 = "North Bay ON"
$add2 = "P1B 8K7"
$cont = "CANADA"

This is very close. It will work if the input file only consists of
blocks of 5 lines delimited by a blank line. However, I need to pull
these blocks out of the middle of the message body. There are lines
before and after. That's why I was using the 'if (/^Transaction Time:/
... /^\d\d\d-\d\d\d-\d\d\d\d$/)' loop.


 
MSG

Amer said:
    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    {
        $CustData = $_;
        $CustData =~ s/^Transaction Time:.+//; # lose the beginning pattern
        $CustData =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//; # lose the ending pattern
        next if ($CustData =~ m/^$/); # skip the blank lines
        $CustData =~ s/\n//g; # get rid of blank lines. don't think this working
        print "\t$CustData\n";
    }
}

You don't have to process each line inside the loop. Instead, push each
line to an array and then process each array element after the loop.
It can be a lot cleaner and easier. Something like this:

my @records;
while (<IN>){
    chomp;
    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/){
        push @records, $_;
    }
}

Now @records contains lines from "Transaction" to "123-456-7890",
each of which is an element of the array.
 
Xicheng Jia

Amer said:
This is very close. It will work if the input file only consists of
blocks of 5 lines delimited by a blank line. However, I need to pull
these blocks out of the middle of the message body. There are lines
before and after. That's why I was using the 'if (/^Transaction Time:/
... /^\d\d\d-\d\d\d-\d\d\d\d$/)' loop.

Yes, you can still use it here, because each of the two patterns now
matches a whole blank-line-separated paragraph of your input stream (you
have overridden line mode by resetting $/). So:

local $/ = "";

while ( <DATA> ) {
    # \s* lets the end pattern match a paragraph that ends in newlines
    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d\s*$/){
        next unless tr/\n// == 6;
        my ($name, $pob, $add1, $add2, $cont) = split "\n";
        # do something with the above variables
    }
}

This discards every paragraph that is not between these two patterns and
splits only the paragraphs in between.

Best,
Xicheng
 
Xicheng Jia

MSG said:
You don't have to process each line inside the loop. Instead, push each
line to an array and then process each array element after the loop.
It can be a lot cleaner and easier. Something like this:
= my @records;
= while (<IN>){
= chomp;
= if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/){
= push @records, $_;
= }
= }

You might run into trouble if your input file has more than one
/^Transaction Time:/ ... /^telephone$/ block. :)

Xicheng
 
Amer Neely

MSG said:
Now @records contains lines from "Transaction" to "123-456-7890",
each of which is an element of the array.

OK, I see what that does, but I'm not sure it helps me. The goal is to
pull out that address block, on a line-per-line basis, and insert each
line into a database field.

@records contains all the address blocks from the whole file. I'd like
to deal with each address block (line-by-line) as I go through the file
if I can.

Another problem is that some of the addresses have 6 lines, not 5.

 
Amer Neely

Xicheng said:
= my @records;
= while (<IN>){
= chomp;
= if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/){
= push @records, $_;
= }
= }

You might run into trouble if your input file has more than one
/^Transaction Time:/ ... /^telephone$/ block. :)

Yes, no kidding :)
In fact the file I'm working with has 79 messages. So I now have 79
address blocks in @records.


 
MSG

Amer said:
Yes, no kidding :)
In fact the file I'm working with has 79 messages. So I now have 79
address blocks in @records.

That is easy to deal with. Just change the order of the two lines.
Instead of:

my @records;
while (<IN>){

change to:

while (<IN>){
    my @records;
    # and now process each record in your code.

Of course there should always be:

use strict;
use warnings;
 
MSG

Amer said:
@records contains all the address blocks from the whole file. I'd like
to deal with each address block (line-by-line) as I go through the file
if I can.

while (<IN>){
    my @records;
    if ( /^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/){
        push @records, $_;
    }
    # now you get each address block in @records on each loop iteration
    # Do some processing here
}

Amer said:
Another problem is that some of the addresses have 6 lines, not 5.

That is why you don't want to process every line on every iteration. It is
better to first group each address block into its own array.
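
One caveat with grouping per block: an array declared with my inside the while body is created afresh for every line read, so it never holds more than one line. Below is a sketch of one way to collect each block into its own array, keeping the working array outside the loop and emptying it when the end pattern is seen. The sample text and the names $sample, @blocks and @current are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical two-message sample; the second address has six lines.
my $sample = <<'END';
From: shop
Transaction Time: 18:45:55

Amer Neely
POB 1481 Station Main
North Bay ON
P1B 8K7
CANADA

123-456-7890
Thanks
Transaction Time: 18:45:34

Bmer Neely
POB 123
AMS dept
South
ABC 879
USA

800-346-7890
Bye
END

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";

my @blocks;    # one array ref per finished address block
my @current;   # lines of the block being read; declared outside the loop

while (my $line = <$in>) {
    chomp $line;
    # In scalar context the range operator returns a sequence number,
    # with "E0" appended on the line that ends the range.
    my $pos = ($line =~ /^Transaction Time:/ ... $line =~ /^\d{3}-\d{3}-\d{4}$/);
    next unless $pos;
    push @current, $line;
    if ($pos =~ /E0$/) {
        # Drop the marker line, the phone line and the blank lines.
        push @blocks, [ grep { length } @current[1 .. $#current - 1] ];
        @current = ();
    }
}
close $in;

for my $block (@blocks) {
    print scalar(@$block), " lines, name: $block->[0]\n";
}
```

Instead of pushing onto @blocks, the body of the "E0" branch could insert the block into the database straight away, which keeps the processing in step with the file.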
 
Amer Neely

OK, I'm trying that, but it's still giving me grief.

while (<IN>)
{
    my @CustData=();
    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    {
        push @CustData, $_;
    }
    foreach my $line (@CustData)
    {
        $line =~ s/^Transaction Time:.+//;
        $line =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//;
        ($CustName,$Address1,$Address2,$CityProv,$Code,$Country) =
            split(/\n/,$line);
        print "Name: $CustName\n";
        print "Address: $Address1\n";
        print "Address: $Address2\n";
        print "City/Prov: $CityProv\n";
        print "Code: $Code\n";
        print "Country: $Country\n";
    }

} # end while (<IN>)
close IN;

 
Amer Neely

The closest I've gotten so far is with the following code.

if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
{
    $CustData = $_;
    $CustData =~ s/^Transaction Time:.+//;
    $CustData =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//;
    next if ($CustData =~ m/^$/);
    $CustData =~ s/\n//g;
    #print "$CustData\n";
    (@CustData) = split(/\n/,$CustData);

    my $addcounter=0;
    foreach (@CustData)
    {
        $addcounter++;
        print "\t[$addcounter] $_\n";
    }
}

Bear in mind this block is in the middle of a message, so there is more
text before and after this.

But this puts the whole $CustData string (all 5 or 6 lines) into
$CustData[0], so it's ignoring the split.

 
MSG

Amer said:
while (<IN>)
{
    my @CustData=();
    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    {
        push @CustData, $_;
    }
    foreach my $line (@CustData)

So far so good!

Amer said:
    {
        $line =~ s/^Transaction Time:.+//;
        $line =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//;
        ($CustName,$Address1,$Address2,$CityProv,$Code,$Country) =
            split(/\n/,$line);

Unfortunately you are still in the mindset of processing strings.
Please switch gears and treat your @CustData as what it is: an ARRAY.
All the lines have already been separated and put into an array. There
is no need to split any more. What does the data look like in this array?

$CustData[0] : "Transaction Time: ..." # always the first element
$CustData[1] : (blank)
$CustData[2] : (Name)
$CustData[3] : (Address 1)
....
$CustData[$#CustData] : "123-456-7890" # always the last

One way to get to only the name and the address part:

for ( @CustData[2..$#CustData-2] ){
    print $_, "\n";
}
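
The slice above only does something useful once the array really holds a whole block. A small sketch with hand-filled arrays (the contents and the names @address, @too_short and @nothing are made up for illustration):

```perl
use strict;
use warnings;

# A fully populated block, as it would look after collecting every line.
my @CustData = (
    "Transaction Time: 18:45:55", "",
    "Amer Neely", "POB 1481 Station Main", "North Bay ON", "P1B 8K7", "CANADA",
    "", "123-456-7890",
);

# Array slice: elements 2 through the third from the end.
my @address = @CustData[2 .. $#CustData - 2];
print "$_\n" for @address;

# With only one element the range is 2 .. -2, which is empty,
# so the slice returns nothing and a loop over it prints nothing.
my @too_short = ("only one line");
my @nothing   = @too_short[2 .. $#too_short - 2];
print "short array gives ", scalar(@nothing), " elements\n";
```

The leading @ is what makes this a slice returning several elements; with a leading $ the same brackets select a single element.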
 
Amer Neely

My code:
open IN, "<$Infile";
while (<IN>)
{
    my @CustData=();
    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    {
        push @CustData, $_;
    }

    foreach my $line (@CustData)
    {
        my $addcounter=0;
        $line =~ s/^Transaction Time:.+//;
        $line =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//;
        for ( $CustData[2..$#CustData-2] )
        {
            $addcounter++;
            print "[$addcounter] $_";
        }
    }

}
close IN;
print "\nAll done.\n";

All I've changed is to add a counter for each element. Also had to
change your ( @CustData[2 to ( $CustData[2 otherwise I got no output at all.

The output:
[1]
[1] [1] [1] xxxxxxxxxxxx
[1] xxxxxxxxxxxxxxxxxx
[1] SAULT STE MARIE Ontario
[1] P6A 3P4
[1] CANADA
[1]
[1]
[1]
[1]
[1] xxxxxxxxxxxxxx
[1] xxxxxxxxxxxxxxxxxxx
[1] Yellowknife NT
[1] X1A 3N2
[1] CANADA
[1]
[1]
[1]
[1]
[1] xxxxxxxxxxxxn
[1] xxxxxxxxxxxx
[1] Tara ON
[1] N0H 2N0
[1] CANADA
[1]
[1]
[1]
[1]
[1] xxxxxxxxxxxxxxxxx
[1] xxxxxxxxxxxxxxxxxx
[1] Laval Qc
[1] H7E2B4
[1] CANADA
[1]
[1]
[1]
[1]
[1] xxxxxxxxxxx
[1] xxxxxxxxxxxxxxxxxxxx
[1] xxxxxx
[1] sault te. marie ON
[1] P6A 6E9
[1] CANADA
[1]
[1]

All done.


Now I just changed the inner loop to print all elements.

while (<IN>)
{
    my @CustData=();
    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    {
        push @CustData, $_;
    }

    foreach my $line (@CustData)
    {
        my $addcounter=0;
        $line =~ s/^Transaction Time:.+//;
        $line =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//;
        #for ( $CustData[0..$#CustData] )
        #for ( $CustData[2..$#CustData-2] )
        foreach (@CustData)
        {
            $addcounter++;
            print "[$addcounter] $_";
        }
    }

}
close IN;
print "\nAll done.\n";

Here's my output from a subset of the whole file:
[1]
[1]
[1] xxxxxxxxxxxx
[1] xxxxxxxxxxxxxxxxxxxx
[1] SAULT STE MARIE Ontario
[1] P6A 3P4
[1] CANADA
[1]
[1]
[1]
[1]
[1] xxxxxxxxxxxxx
[1] xxxxxxxxxxxxxxxxxxx
[1] Yellowknife NT
[1] X1A 3N2
[1] CANADA
[1]
[1]
[1]
[1]
[1] xxxxxxxxxxxxn
[1] xxxxxxxxxxxx
[1] Tara ON
[1] N0H 2N0
[1] CANADA
[1]
[1]
[1]
[1]
[1] xxxxxxxxxxxxxxxx
[1] xxxxxxxxxxxxxxxxxx
[1] Laval Qc
[1] H7E2B4
[1] CANADA
[1]
[1]
[1]
[1]
[1] xxxxxxxxxxx
[1] xxxxxxxxxxxxxxxxxxxx
[1] xxxxxxx
[1] sault te. marie ON
[1] P6A 6E9
[1] CANADA
[1]
[1]

All done.

So it still seems that the @CustData array only has 1 element in it.
This is what has been driving me nuts.

 
Dr.Ruud

Amer Neely wrote:
I'm walking through a mailbox file, and want to pull out specific
lines from each message. The body of each message is in a similar
format, having been generated by a script.

I'm doing OK except for one particular block of lines, the customer
address data. There is a blank line before and after this block.
Example:

Transaction Time: 18:45:55

Amer Neely
POB 1481 Station Main
North Bay ON
P1B 8K7
CANADA

123-456-7890


Or use a simplified state machine.

my $state = -1;
my $line = -1;

while (<>) {
    chomp; # s/^\s+//; s/\s+$//;

    if (-1 == $state) {
        if (/^Transaction Time:/) {
            ++$state;
        }
    }
    elsif (0 == $state) {
        if (/^$/) {
            ++$state;
            $line = 0;
        }
        else {
            die "$state: <$_>?";
        }
    }
    elsif (1 == $state) { # in address
        if (/^$/) {
            # skip
        }
        elsif (/^\d{3}-\d{3}-\d{4}$/) {
            $state = -1;
            $line = -1;
        }
        else {
            ++$line;
            print "$line: $_\n";
        }
    }
    else {
        die "$state: <$_>?";
    }
}

(untested)
 
Xicheng Jia

Amer said:
OK, I'm trying that, but it's still giving me grief.

Here is some test code that uses paragraph mode to extract the information,
ready to insert into your database (tested under WinXP):
--------------------------
use strict;
use warnings;

local $/ = "";

while ( <DATA> ) {
    if (/^Transaction Time:/ .. /^\d\d\d-\d\d\d-\d\d\d\d\s*$/){
        my $lines = tr/\n//;
        next if $lines < 6;
        my ( $name, $addr1, $addr2, $city, $code, $cont );
        if ( $lines == 6 ) {
            ( $name, $addr1, $city, $code, $cont ) = split "\n";
            $addr2 = "";
        } elsif ( $lines == 7 ) {
            ( $name, $addr1, $addr2, $city, $code, $cont ) = split "\n";
        }
        # to INSERT INTO mytable from mydb.
        #$sth->execute( $name, $addr1, $addr2, $city, $code, $cont );
        print <<TEST;
name = $name
addr1 = $addr1
addr2 = $addr2
city = $city
code = $code
country = $cont

TEST
    }
}

__DATA__
one block
one block
one block
one block
one block

Transaction Time: 18:45:55

Amer Neely
POB 1481 Station Main
AMS dept
North Bay ON
P1B 8K7
CANADA

123-456-7890

some other blocks
some other blocks
some other blocks
some other blocks
some other blocks
some other blocks

Transaction Time: 18:45:34

Bmer Neely
POB 123
South
ABC 879
USA

800-346-7890

another block
another block
another block
another block
another block
another block
another block

Transaction Time: 18:45:55

Amer Neely
POB 1481 Station Main
North Bay ON
P1B 8K7
CANADA

123-456-7890

more blocks
more blocks
more blocks
more blocks
more blocks
more blocks
more blocks
more blocks
more blocks
more blocks
---------------------------------------------------
======print result=======
name = Amer Neely
addr1 = POB 1481 Station Main
addr2 = AMS dept
city = North Bay ON
code = P1B 8K7
country = CANADA

name = Bmer Neely
addr1 = POB 123
addr2 =
city = South
code = ABC 879
country = USA

name = Amer Neely
addr1 = POB 1481 Station Main
addr2 =
city = North Bay ON
code = P1B 8K7
country = CANADA
========================
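
The two branches for 5-line and 6-line addresses can also be folded into one step by peeling fields off both ends of the list. This is only a sketch: parse_address is a hypothetical helper, and it assumes the first line is always the name and the last three lines are always city, postal code and country.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumes: first line = name, last three lines =
# city/province, postal code, country; whatever is left is address lines.
sub parse_address {
    my @lines = @_;
    return unless @lines >= 5;                    # too short to be an address
    my $name = shift @lines;
    my ($city, $code, $country) = splice @lines, -3;
    my $addr1 = shift @lines;
    my $addr2 = join ", ", @lines;                # empty string if no second line
    return {
        name => $name, addr1 => $addr1, addr2 => $addr2,
        city => $city, code  => $code,  country => $country,
    };
}

my $five = parse_address("Amer Neely", "POB 1481 Station Main",
                         "North Bay ON", "P1B 8K7", "CANADA");
my $six  = parse_address("Amer Neely", "POB 1481 Station Main", "AMS dept",
                         "North Bay ON", "P1B 8K7", "CANADA");

printf "%s / %s / [%s]\n", @{$_}{qw(name city addr2)} for $five, $six;
```

The hash fields could then be passed to the commented-out $sth->execute call in the same order as before.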
 
Amer Neely

Xicheng said:
Here is a test code which uses paragraph-mode to extract info and try
to insert into your database (tested under WinXP)..

EXCELLENT!

I modified it to get input from my file, and it still works :)

Thank you, thank you, thank you. Now I can move on.

 
