Extract range of lines from a text file

Discussion in 'Perl Misc' started by Amer Neely, Apr 9, 2006.

  1. Amer Neely

    Amer Neely Guest

    This is driving me nuts.

    I'm walking through a mailbox file, and want to pull out specific lines
    from each message. The body of each message is in a similar format,
    having been generated by a script.

    I'm doing OK except for one particular block of lines, the customer
    address data. There is a blank line before and after this block. Example:

    Transaction Time: 18:45:55

    Amer Neely
    POB 1481 Station Main
    North Bay ON
    P1B 8K7
    CANADA

    123-456-7890

    I've managed to get the 5 lines into a string using this code:

    while (<IN>)
    {

    # bunch of other comparisons deleted

    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    {
    $CustData = $_;
    $CustData =~ s/^Transaction Time:.+//; # lose the beginning pattern
    $CustData =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//; # lose the ending pattern
    next if ($CustData =~ m/^$/); # skip the blank lines
    $CustData =~ s/\n//g; # get rid of blank lines. don't think this working
    print "\t$CustData\n";
    }
    }
    close IN;
    print "\nAll done.\n";

    The problem seems to be that $CustData holds all 5 lines. I need to
    break out each of the lines into a separate string variable so as to
    populate a database field. This is what has me stumped. Sure would
    appreciate some light on this.

    --
    Amer Neely
     
    Amer Neely, Apr 9, 2006
    #1

  2. Amer Neely

    Xicheng Jia Guest

    Amer Neely wrote:
    > This is driving me nuts.
    >
    > I'm walking through a mailbox file, and want to pull out specific lines
    > from each message. The body of each message is in a similar format,
    > having been generated by a script.
    >
    > I'm doing OK except for one particular block of lines, the customer
    > address data. There is a blank line before and after this block. Example:
    >
    > Transaction Time: 18:45:55
    >
    > Amer Neely
    > POB 1481 Station Main
    > North Bay ON
    > P1B 8K7
    > CANADA
    >
    > 123-456-7890
    >
    > I've managed to get the 5 lines into a string using this code:
    >
    > while <IN>
    > {
    >
    > # bunch of other comparisons deleted
    >
    > if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)


    With the /A/ ... /B/ range expression you are still reading one line at a
    time, so $_ only ever holds a single line. If you want the whole block in
    $_ and then parse it, set the input record separator $/ to something like:

    local $/ = "Transaction Time:";

    Each read then returns one record, with records separated by the string
    "Transaction Time:" held in $/.
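
A minimal sketch of the record-separator idea, assuming a mailbox laid out like the sample in the first post. The in-memory filehandle, the "Order:" and "Thank you." lines, the second message and the names $mailbox and @records are invented for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Stand-in for the mailbox file (assumed layout): two messages, each with
# some text before and after the address block.
my $mailbox = <<'END';
Order: 1001
Transaction Time: 18:45:55

Amer Neely
POB 1481 Station Main
North Bay ON
P1B 8K7
CANADA

123-456-7890

Thank you.
Order: 1002
Transaction Time: 19:02:10

Amer Neely
POB 1481 Station Main
North Bay ON
P1B 8K7
CANADA

123-456-7890
END

open my $in, '<', \$mailbox or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "Transaction Time:";   # each read now ends at this string
    while (my $record = <$in>) {
        chomp $record;                # chomp strips the current value of $/
        push @records, $record;
    }
}
close $in;

# The first record is whatever came before the first separator.
printf "%d records\n", scalar @records;
print "First line of second record: [", (split /\n/, $records[1])[0], "]\n";
```

Note that each record runs up to the next "Transaction Time:", so it also carries whatever follows the phone number in that message; the address lines still have to be picked out of it.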

    Xicheng

     
    Xicheng Jia, Apr 9, 2006
    #2

  3. Amer Neely

    Amer Neely Guest

    Xicheng Jia wrote:
    > Amer Neely wrote:
    >> This is driving me nuts.
    >>
    >> I'm walking through a mailbox file, and want to pull out specific lines
    >> from each message. The body of each message is in a similar format,
    >> having been generated by a script.
    >>
    >> I'm doing OK except for one particular block of lines, the customer
    >> address data. There is a blank line before and after this block. Example:
    >>
    >> Transaction Time: 18:45:55
    >>
    >> Amer Neely
    >> POB 1481 Station Main
    >> North Bay ON
    >> P1B 8K7
    >> CANADA
    >>
    >> 123-456-7890
    >>
    >> I've managed to get the 5 lines into a string using this code:
    >>
    >> while <IN>
    >> {
    >>
    >> # bunch of other comparisons deleted
    >>
    >> if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)

    >
    > by using /A/ ... /B/ expression, you are still in single-line mode, if
    > you want to get all these lines in $_, and then parse the data, try to
    > reset the IRS $/ to something like:
    >
    > local $/ = "Transaction Time:";
    >
    > then you can use block-mode which seperates your records by the given
    > string "Transaction Time:" in $/,
    >
    > Xicheng
    >


    Thanks for the quick reply. Still a little foggy though.
    If I set the record separator to "Transaction Time:", then I don't need
    the 'if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)' loop?

    Then set $CustData = $_ ?

    But doesn't that leave me in the same position? All 5 lines are now in
    $CustData.



    --
    Amer Neely
    Home of Spam Catcher
    W: www.softouch.on.ca
    E:
    Perl | MySQL | CGI programming for all data entry forms.
    "We make web sites work!"
     
    Amer Neely, Apr 9, 2006
    #3
  4. Amer Neely

    Xicheng Jia Guest

    Amer Neely wrote:
    > Xicheng Jia wrote:
    > > Amer Neely wrote:
    > >> This is driving me nuts.
    > >>
    > >> I'm walking through a mailbox file, and want to pull out specific lines
    > >> from each message. The body of each message is in a similar format,
    > >> having been generated by a script.
    > >>
    > >> I'm doing OK except for one particular block of lines, the customer
    > >> address data. There is a blank line before and after this block. Example:
    > >>
    > >> Transaction Time: 18:45:55
    > >>
    > >> Amer Neely
    > >> POB 1481 Station Main
    > >> North Bay ON
    > >> P1B 8K7
    > >> CANADA
    > >>
    > >> 123-456-7890
    > >>
    > >> I've managed to get the 5 lines into a string using this code:
    > >>
    > >> while <IN>
    > >> {
    > >>
    > >> # bunch of other comparisons deleted
    > >>
    > >> if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)

    > >
    > > by using /A/ ... /B/ expression, you are still in single-line mode, if
    > > you want to get all these lines in $_, and then parse the data, try to
    > > reset the IRS $/ to something like:
    > >
    > > local $/ = "Transaction Time:";
    > >
    > > then you can use block-mode which seperates your records by the given
    > > string "Transaction Time:" in $/,
    > >
    > > Xicheng
    > >

    >
    > Thanks for the quick reply. Still a little foggy though.
    > If I set the record separator to "Transaction Time:", then I don't need
    > the 'if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)' loop?

    Yes, you don't need the "if" test any more. It was only needed because Perl
    reads one line at a time by default (strictly, one record as defined by
    your $/).

    >
    > Then set $CustData = $_ ?
    >
    > But doesn't that leave me in the same position? All 5 lines are now in
    > $CustData.


    not really, after you do so, you get something like:

    $_ = "18:45:55

    Amer Neely
    POB 1481 Station Main
    North Bay ON
    P1B 8K7
    CANADA

    123-456-7890
    "

    Then split it on "\n", like: my @arr = split "\n";
    and you get:
    $arr[0] = "18:45:55";
    $arr[1] = "";
    $arr[2] = "Amer Neely";
    $arr[3] = "POB 1481 Station Main";
    $arr[4] = "North Bay ON";
    ........

    So you can use the following line to collect your data:

    my (undef, undef, $var1, $var2, $var3, $var4, $var5, undef, undef) =
    split "\n";

    Or you can use a regex to pull whatever data you need out of $_. It really
    depends on what information you need.

    Another way: if you are sure there are 5 lines in each record you want
    to keep, then you can read your data in paragraph mode, like:

    local $/ = "";

    while ( <IN> ) {
        # keep only paragraphs with more than 5 newlines: the 5 address
        # lines plus the blank line that ends the paragraph make 6
        next unless tr/\n// > 5;
        my ($name, $pob, $add1, $add2, $cont) = split "\n";
        # do something with the above variables
    }

    then you get:
    -------------------------
    $name = "Amer Neely"
    $pob = "POB 1481 Station Main"
    $add1 = "North Bay ON"
    $add2 = "P1B 8K7"
    $cont = "CANADA"
    ------------------------
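
The paragraph-mode idea above can be tried out with a sketch like the following. It assumes the address block is the only paragraph with exactly five lines; the in-memory filehandle and the names $text and @fields are placeholders for illustration.

```perl
use strict;
use warnings;

# Sample input copied from the first post.
my $text = <<'END';
Transaction Time: 18:45:55

Amer Neely
POB 1481 Station Main
North Bay ON
P1B 8K7
CANADA

123-456-7890
END

open my $in, '<', \$text or die "Cannot open in-memory file: $!";

my @fields;
{
    local $/ = "";                    # paragraph mode: blank lines end a record
    while (my $para = <$in>) {
        chomp $para;                  # in paragraph mode this strips all trailing newlines
        my @lines = split /\n/, $para;
        next unless @lines == 5;      # assumption: only address blocks have 5 lines
        @fields = @lines;
    }
}
close $in;

my ($name, $pob, $add1, $add2, $cont) = @fields;
print "Name: $name\nCountry: $cont\n";
```

Counting lines after chomp, rather than counting newlines with tr, keeps the test independent of how many blank lines separate the paragraphs.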

    Xicheng

     
    Xicheng Jia, Apr 9, 2006
    #4
  5. Amer Neely

    Xicheng Jia Guest

    Amer Neely wrote:
    > This is driving me nuts.
    >
    > I'm walking through a mailbox file, and want to pull out specific lines
    > from each message. The body of each message is in a similar format,
    > having been generated by a script.
    >
    > I'm doing OK except for one particular block of lines, the customer
    > address data. There is a blank line before and after this block. Example:
    >
    > Transaction Time: 18:45:55
    >
    > Amer Neely
    > POB 1481 Station Main
    > North Bay ON
    > P1B 8K7
    > CANADA
    >
    > 123-456-7890
    >
    > I've managed to get the 5 lines into a string using this code:
    >
    > while <IN>
    > {
    >
    > # bunch of other comparisons deleted


    AN > if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)

    This keeps your input in line mode: each pass through the loop reads one
    line from your input file into $_.

    AN > $CustData = $_;

    So on each iteration of your while loop, $CustData holds only one line.

    AN > $CustData =~ s/^Transaction Time:.+//; # lose the beginning pattern
    AN > $CustData =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//; # lose the ending pattern
    AN > next if ($CustData =~ m/^$/); # skip the blank lines
    AN > $CustData =~ s/\n//g; # get rid of blank lines. don't think this working

    This does not get rid of blank lines; it only removes the newline "\n"
    character. In the default line mode that is the same as "chomp".
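
A small sketch of that point, using one line of the sample address as it would arrive in $_ in line mode. The variable names are made up for the example.

```perl
use strict;
use warnings;

# What one read returns in the default line mode: a single line plus "\n".
my $line = "POB 1481 Station Main\n";

(my $stripped = $line) =~ s/\n//g;   # the substitution from the original post
chomp(my $chomped = $line);          # the usual idiom for the same job

# There is only one line in the string, so split has nothing to separate.
my @parts = split /\n/, $line;

print "s/\\n//g and chomp agree\n" if $stripped eq $chomped;
print scalar(@parts), " element(s) after split\n";
```

This is also why a later split(/\n/, ...) on such a string puts everything in element 0: by then the string holds a single line, not the whole block.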

    Xicheng

    > print "\t$CustData\n";
    > }
    > }
    > close IN;
    > print "\nAll done.\n";
    >
    > The problem seems to be that $CustData holds all 5 lines. I need to
    > break out each of the lines into a separate string variable so as to
    > populate a database field. This is what has me stumped. Sure would
    > appreciate some light on this.
    >
    > --
    > Amer Neely
     
    Xicheng Jia, Apr 9, 2006
    #5
  6. Amer Neely

    Amer Neely Guest

    Xicheng Jia wrote:
    > Amer Neely wrote:
    >> Xicheng Jia wrote:
    >>> Amer Neely wrote:
    >>>> This is driving me nuts.
    >>>>
    >>>> I'm walking through a mailbox file, and want to pull out specific lines
    >>>> from each message. The body of each message is in a similar format,
    >>>> having been generated by a script.
    >>>>
    >>>> I'm doing OK except for one particular block of lines, the customer
    >>>> address data. There is a blank line before and after this block. Example:
    >>>>
    >>>> Transaction Time: 18:45:55
    >>>>
    >>>> Amer Neely
    >>>> POB 1481 Station Main
    >>>> North Bay ON
    >>>> P1B 8K7
    >>>> CANADA
    >>>>
    >>>> 123-456-7890
    >>>>
    >>>> I've managed to get the 5 lines into a string using this code:
    >>>>
    >>>> while <IN>
    >>>> {
    >>>>
    >>>> # bunch of other comparisons deleted
    >>>>
    >>>> if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    >>> by using /A/ ... /B/ expression, you are still in single-line mode, if
    >>> you want to get all these lines in $_, and then parse the data, try to
    >>> reset the IRS $/ to something like:
    >>>
    >>> local $/ = "Transaction Time:";
    >>>
    >>> then you can use block-mode which seperates your records by the given
    >>> string "Transaction Time:" in $/,
    >>>
    >>> Xicheng
    >>>

    >> Thanks for the quick reply. Still a little foggy though.
    >> If I set the record separator to "Transaction Time:", then I don't need
    >> the 'if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)' loop?

    > yes, you dont need this "if" loop coz it invokes perl in line-mode by
    > default (in fact it depends on yout $/)..
    >
    >> Then set $CustData = $_ ?
    >>
    >> But doesn't that leave me in the same position? All 5 lines are now in
    >> $CustData.

    >
    > not really, after you do so, you get something like:
    >
    > $_ = "18:45:55
    >
    > Amer Neely
    > POB 1481 Station Main
    > North Bay ON
    > P1B 8K7
    > CANADA
    >
    > 123-456-7890
    > "
    >
    > then split it with "\n" like: my @arr = split "\n";
    > you get:
    > $arr[0] = "18:45:55";
    > $arr[1] = "";
    > $arr[2] = "Amer Neely";
    > $arr[3] = "POB 1481 Station Main";
    > $arr[4] = "North Bay ON"
    > .......
    >
    > so you use the following line to collect your date..:
    >
    > my (undef, undef, $var1, $var2, $var3, $var4, $var5, undef, undef) =
    > split "\n";
    >
    > or you can use regex to parse whatever data you need from $_. it really
    > depends on what information do you really need.
    >
    > Another way: if you are sure there are 5 lines for each record you want
    > to keep, then you can read your data in paragraph-mode,like:
    >
    > local $/ = "";
    >
    > while ( <IN> ) {
    > next unless tr/\n// > 5; #use paragraph only have more than 5
    > lines(count also a blank line, so you have 6 lines)
    > my ($name, $pob, $add1, $add2, $cont) = split "\n";
    > # do sth on the avobe variables..
    > }
    >
    > then you get:
    > -------------------------
    > $name = "Amer Neely"
    > $pob = "POB 1481 Station Main"
    > $add1 = "North Bay ON"
    > $add2 = "P1B 8K7"
    > $cont = "CANADA"
    > ------------------------
    >
    > Xicheng
    >


    This is very close. It will work if the input file only consists of
    blocks of 5 lines delimited by a blank line. However, I need to pull
    these blocks out of the middle of the message body. There are lines
    before and after. That's why I was using the 'if (/^Transaction Time:/
    ... /^\d\d\d-\d\d\d-\d\d\d\d$/)' loop.



    --
    Amer Neely
    Home of Spam Catcher
    W: www.softouch.on.ca
    E:
    Perl | MySQL | CGI programming for all data entry forms.
    "We make web sites work!"
     
    Amer Neely, Apr 9, 2006
    #6
  7. Amer Neely

    MSG Guest

    Amer Neely wrote:
    > if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    > {
    > $CustData = $_;
    > $CustData =~ s/^Transaction Time:.+//; # lose the beginning pattern
    > $CustData =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//; # lose the ending pattern
    > next if ($CustData =~ m/^$/); # skip the blank lines
    > $CustData =~ s/\n//g; # get rid of blank lines. don't think this working
    > print "\t$CustData\n";
    > }
    > }


    You don't have to process each line inside the loop. Instead, push each
    line to an array and then process each array element after the loop.
    It can be a lot cleaner and easier. Something like this:

    my @records;
    while (<IN>){
    chomp;
    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/){
    push @records, $_;
    }
    }

    Now @records contains lines from "Transaction" to "123-456-7890",
    each of which is an element of the array.
     
    MSG, Apr 9, 2006
    #7
  8. Amer Neely

    Xicheng Jia Guest

    Amer Neely wrote:
    > Xicheng Jia wrote:
    > > Amer Neely wrote:
    > >> Xicheng Jia wrote:
    > >>> Amer Neely wrote:
    > >>>> This is driving me nuts.
    > >>>>
    > >>>> I'm walking through a mailbox file, and want to pull out specific lines
    > >>>> from each message. The body of each message is in a similar format,
    > >>>> having been generated by a script.
    > >>>>
    > >>>> I'm doing OK except for one particular block of lines, the customer
    > >>>> address data. There is a blank line before and after this block. Example:
    > >>>>
    > >>>> Transaction Time: 18:45:55
    > >>>>
    > >>>> Amer Neely
    > >>>> POB 1481 Station Main
    > >>>> North Bay ON
    > >>>> P1B 8K7
    > >>>> CANADA
    > >>>>
    > >>>> 123-456-7890
    > >>>>
    > >>>> I've managed to get the 5 lines into a string using this code:
    > >>>>
    > >>>> while <IN>
    > >>>> {
    > >>>>
    > >>>> # bunch of other comparisons deleted
    > >>>>
    > >>>> if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    > >>> by using /A/ ... /B/ expression, you are still in single-line mode, if
    > >>> you want to get all these lines in $_, and then parse the data, try to
    > >>> reset the IRS $/ to something like:
    > >>>
    > >>> local $/ = "Transaction Time:";
    > >>>
    > >>> then you can use block-mode which seperates your records by the given
    > >>> string "Transaction Time:" in $/,
    > >>>
    > >>> Xicheng
    > >>>
    > >> Thanks for the quick reply. Still a little foggy though.
    > >> If I set the record separator to "Transaction Time:", then I don't need
    > >> the 'if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)' loop?

    > > yes, you dont need this "if" loop coz it invokes perl in line-mode by
    > > default (in fact it depends on yout $/)..
    > >
    > >> Then set $CustData = $_ ?
    > >>
    > >> But doesn't that leave me in the same position? All 5 lines are now in
    > >> $CustData.

    > >
    > > not really, after you do so, you get something like:
    > >
    > > $_ = "18:45:55
    > >
    > > Amer Neely
    > > POB 1481 Station Main
    > > North Bay ON
    > > P1B 8K7
    > > CANADA
    > >
    > > 123-456-7890
    > > "
    > >
    > > then split it with "\n" like: my @arr = split "\n";
    > > you get:
    > > $arr[0] = "18:45:55";
    > > $arr[1] = "";
    > > $arr[2] = "Amer Neely";
    > > $arr[3] = "POB 1481 Station Main";
    > > $arr[4] = "North Bay ON"
    > > .......
    > >
    > > so you use the following line to collect your date..:
    > >
    > > my (undef, undef, $var1, $var2, $var3, $var4, $var5, undef, undef) =
    > > split "\n";
    > >
    > > or you can use regex to parse whatever data you need from $_. it really
    > > depends on what information do you really need.
    > >
    > > Another way: if you are sure there are 5 lines for each record you want
    > > to keep, then you can read your data in paragraph-mode,like:
    > >
    > > local $/ = "";
    > >
    > > while ( <IN> ) {
    > > next unless tr/\n// > 5; #use paragraph only have more than 5
    > > lines(count also a blank line, so you have 6 lines)
    > > my ($name, $pob, $add1, $add2, $cont) = split "\n";
    > > # do sth on the avobe variables..
    > > }
    > >
    > > then you get:
    > > -------------------------
    > > $name = "Amer Neely"
    > > $pob = "POB 1481 Station Main"
    > > $add1 = "North Bay ON"
    > > $add2 = "P1B 8K7"
    > > $cont = "CANADA"
    > > ------------------------
    > >
    > > Xicheng
    > >

    >
    > This is very close. It will work if the input file only consists of
    > blocks of 5 lines delimited by a blank line. However, I need to pull
    > these blocks out of the middle of the message body. There are lines
    > before and after. That's why I was using the 'if (/^Transaction Time:/
    > ... /^\d\d\d-\d\d\d-\d\d\d\d$/)' loop.


    Yeah, you can actually still use it here. Because you have reset $/, each
    read now returns one blank-line-separated paragraph instead of one line,
    so the range test works on paragraphs:

    local $/ = "";

    while ( <DATA> ) {
        # /m lets ^ and $ match at line boundaries inside the paragraph,
        # which still carries its trailing newlines
        if (/^Transaction Time:/m ... /^\d\d\d-\d\d\d-\d\d\d\d$/m){
            next unless tr/\n// == 6;
            my ($name, $pob, $add1, $add2, $cont) = split "\n";
            # do something with the above variables
        }
    }

    This discards every paragraph that is not between the two patterns, and
    splits only the paragraphs in between.

    Best,
    Xicheng

     
    Xicheng Jia, Apr 9, 2006
    #8
  9. Amer Neely

    Xicheng Jia Guest

    MSG wrote:
    > Amer Neely wrote:
    > > if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    > > {
    > > $CustData = $_;
    > > $CustData =~ s/^Transaction Time:.+//; # lose the beginning pattern
    > > $CustData =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//; # lose the ending pattern
    > > next if ($CustData =~ m/^$/); # skip the blank lines
    > > $CustData =~ s/\n//g; # get rid of blank lines. don't think this working
    > > print "\t$CustData\n";
    > > }
    > > }

    >
    > You don't have to process each line inside the loop. Instead, push each
    > line to an array and then process each array element after the loop.
    > It can be a lot cleaner and easier. Something like this:
    >

    = my @records;
    = while (<IN>){
    = chomp;
    = if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/){
    = push @records, $_;
    = }
    = }

    You might run into trouble if you have more than one /Transaction/
    <==> /^telephone$/ block in your input file: they all end up in the same
    array. :)

    Xicheng

    > Now @records contains lines from "Transaction" to "123-456-7890",
    > each of which is an element of the array.
     
    Xicheng Jia, Apr 9, 2006
    #9
  10. Amer Neely

    Amer Neely Guest

    MSG wrote:
    > Amer Neely wrote:
    >> if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    >> {
    >> $CustData = $_;
    >> $CustData =~ s/^Transaction Time:.+//; # lose the beginning pattern
    >> $CustData =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//; # lose the ending pattern
    >> next if ($CustData =~ m/^$/); # skip the blank lines
    >> $CustData =~ s/\n//g; # get rid of blank lines. don't think this working
    >> print "\t$CustData\n";
    >> }
    >> }

    >
    > You don't have to process each line inside the loop. Instead, push each
    > line to an array and then process each array element after the loop.
    > It can be a lot cleaner and easier. Something like this:
    >
    > my @records;
    > while (<IN>){
    > chomp;
    > if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/){
    > push @records, $_;
    > }
    > }
    >
    > Now @records contains lines from "Transaction" to "123-456-7890",
    > each of which is an element of the array.
    >


    OK, I see what that does, but I'm not sure it helps me. The goal is to
    pull out that address block, on a line-per-line basis, and insert each
    line into a database field.

    @records contains all the address blocks from the whole file. I'd like
    to deal with each address block (line-by-line) as I go through the file
    if I can.

    Another problem is that some of the addresses have 6 lines, not 5.

    --
    Amer Neely
    Home of Spam Catcher
    W: www.softouch.on.ca
    E:
    Perl | MySQL | CGI programming for all data entry forms.
    "We make web sites work!"
     
    Amer Neely, Apr 9, 2006
    #10
  11. Amer Neely

    Amer Neely Guest

    Xicheng Jia wrote:
    > MSG wrote:
    >> Amer Neely wrote:
    >>> if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    >>> {
    >>> $CustData = $_;
    >>> $CustData =~ s/^Transaction Time:.+//; # lose the beginning pattern
    >>> $CustData =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//; # lose the ending pattern
    >>> next if ($CustData =~ m/^$/); # skip the blank lines
    >>> $CustData =~ s/\n//g; # get rid of blank lines. don't think this working
    >>> print "\t$CustData\n";
    >>> }
    >>> }

    >> You don't have to process each line inside the loop. Instead, push each
    >> line to an array and then process each array element after the loop.
    >> It can be a lot cleaner and easier. Something like this:
    >>

    > = my @records;
    > = while (<IN>){
    > = chomp;
    > = if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/){
    > = push @records, $_;
    > = }
    > = }
    >
    > you might get some troubles if you have more than one /Transaction/
    > <==> /^telephone$ / blocks in your input file. :)


    Yes, no kidding :)
    In fact the file I'm working with has 79 messages. So I now have 79
    address blocks in @records.

    >
    > Xicheng
    >
    >> Now @records contains lines from "Transaction" to "123-456-7890",
    >> each of which is an element of the array.

    >



    --
    Amer Neely
    Home of Spam Catcher
    W: www.softouch.on.ca
    E:
    Perl | MySQL | CGI programming for all data entry forms.
    "We make web sites work!"
     
    Amer Neely, Apr 9, 2006
    #11
  12. Amer Neely

    MSG Guest

    Amer Neely wrote:
    > Xicheng Jia wrote:
    > > you might get some troubles if you have more than one /Transaction/
    > > <==> /^telephone$ / blocks in your input file. :)

    >
    > Yes, no kidding :)
    > In fact the file I'm working with has 79 messages. So I now have 79
    > address blocks in @records.
    >

    That is easy to deal with. Just change the order of the two lines:
    Instead of
    my @records;
    while (<IN>){
    change to:
    while (<IN>){
    my @records;
    # and now process each record in your code.

    Of course there should always be
    use strict;
    use warnings;
     
    MSG, Apr 9, 2006
    #12
  13. Amer Neely

    MSG Guest

    Amer Neely wrote:
    > @records contains all the address blocks from the whole file. I'd like
    > to deal with each address block (line-by-line) as I go through the file
    > if I can.

    while (<IN>){
    my @records;
    if ( /^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/){
    push @records, $_;
    }
    # now you get each address block in @records on each loop iteration
    # Do some processing here
    }
    >
    > Another problem is that some of the addresses have 6 lines, not 5.

    That is why you don't want to process every line on every iteration. It is
    better to first group each address block into its own array.
     
    MSG, Apr 9, 2006
    #13
  14. Amer Neely

    Amer Neely Guest

    MSG wrote:
    > Amer Neely wrote:
    >> @records contains all the address blocks from the whole file. I'd like
    >> to deal with each address block (line-by-line) as I go through the file
    >> if I can.

    > while (<IN>){
    > my @records;
    > if ( /^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/){
    > push @records, $_;
    > }
    > # now you get each address block in @records on each loop iteration
    > # Do some processing here
    > }
    >> Another problem is that some of the addresses have 6 lines, not 5.

    > That is why you don't want to process every line on every iteration. It
    > is
    > better to first group each address block into its own array.
    >


    OK, I'm trying that, but it's still giving me grief.

    while (<IN>)
    {
    my @CustData=();
    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    {
    push @CustData, $_;
    }
    foreach my $line (@CustData)
    {
    $line =~ s/^Transaction Time:.+//;
    $line =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//;
    ($CustName,$Address1,$Address2,$CityProv,$Code,$Country) =
    split(/\n/,$line);
    print "Name: $CustName\n";
    print "Address: $Address1\n";
    print "Address: $Address2\n";
    print "City/Prov: $CityProv\n";
    print "Code: $Code\n";
    print "Country: $Country\n";
    }

    } # end while (<IN>)
    close IN;
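
One likely reason the attempt above misbehaves: @CustData is declared inside the while loop, so it starts empty on every line and never holds more than one line, and split(/\n/, ...) on a single line returns a single field. A hedged sketch of one way to group each block as the file is read follows. It relies on the range operator's scalar return value, a sequence number whose last value has "E0" appended. The sample mailbox, the "Suite 2" line and the names @blocks and @current are invented for illustration.

```perl
use strict;
use warnings;

# Stand-in mailbox (assumed layout): one 5-line and one 6-line address block.
my $mailbox = <<'END';
Subject: Order
Transaction Time: 18:45:55

Amer Neely
POB 1481 Station Main
North Bay ON
P1B 8K7
CANADA

123-456-7890

Thank you for your order.
Subject: Order
Transaction Time: 19:10:02

Amer Neely
Suite 2
POB 1481 Station Main
North Bay ON
P1B 8K7
CANADA

123-456-7890
END

open my $in, '<', \$mailbox or die "Cannot open in-memory file: $!";

my @blocks;    # one array reference per address block
my @current;   # lines of the block currently being collected

while (my $line = <$in>) {
    chomp $line;

    # False outside the range; inside it, a sequence number starting at 1.
    my $seq = ($line =~ /^Transaction Time:/ ... $line =~ /^\d{3}-\d{3}-\d{4}$/);
    next unless $seq;

    my $is_first = $seq eq '1';       # the "Transaction Time:" line
    my $is_last  = $seq =~ /E0$/;     # the phone-number line

    # Keep only the address lines: skip both delimiters and blank lines.
    push @current, $line unless $is_first or $is_last or $line eq '';

    if ($is_last) {
        push @blocks, [@current];     # store a copy, then start a new block
        @current = ();
    }
}
close $in;

for my $block (@blocks) {
    print scalar(@$block), " lines: ", join(' | ', @$block), "\n";
}
```

Because each block arrives as its own array, 5-line and 6-line addresses can be told apart by scalar(@$block) before the fields are assigned. How the extra line maps to a database column is a separate decision this sketch does not make.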

    --
    Amer Neely
    Home of Spam Catcher
    W: www.softouch.on.ca
    E:
    Perl | MySQL | CGI programming for all data entry forms.
    "We make web sites work!"
     
    Amer Neely, Apr 9, 2006
    #14
  15. Amer Neely

    Amer Neely Guest

    The closest I've gotten so far is with the following code.

    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    {
    $CustData = $_;
    $CustData =~ s/^Transaction Time:.+//;
    $CustData =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//;
    next if ($CustData =~ m/^$/);
    $CustData =~ s/\n//g;
    #print "$CustData\n";
    (@CustData) = split(/\n/,$CustData);

    my $addcounter=0;
    foreach (@CustData)
    {
    $addcounter++;
    print "\t[$addcounter] $_\n";
    }
    }

    Bear in mind this block is in the middle of a message, so there is more
    text before and after this.

    But this puts the whole $CustData string (all 5 or 6 lines) into
    $CustData[0], so it's ignoring the split.
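
A note on why the split appears to be ignored: the while loop hands this block one line at a time, so $CustData never holds more than a single line, and after s/\n//g there is no newline left to split on. A minimal sketch of that behaviour, assuming the sample address from the first post and a hypothetical in-memory filehandle standing in for IN:

```perl
use strict;
use warnings;

# Hypothetical sample input, standing in for the mailbox file.
my $sample = <<'END';
Transaction Time: 18:45:55

Amer Neely
POB 1481 Station Main
North Bay ON
P1B 8K7
CANADA

123-456-7890
END

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";

my @counts;    # number of fields split() produced for each address line
while (<$in>) {
    if (/^Transaction Time:/ ... /^\d{3}-\d{3}-\d{4}$/) {
        my $CustData = $_;                        # exactly one input line
        $CustData =~ s/^Transaction Time:.+//;
        $CustData =~ s/^\d{3}-\d{3}-\d{4}$//;
        next if $CustData =~ m/^$/;
        $CustData =~ s/\n//g;                     # removes the only newline
        my @fields = split /\n/, $CustData;       # nothing left to split on
        push @counts, scalar @fields;
        print scalar(@fields), " field(s): $fields[0]\n";
    }
}
close $in;
```

Each address line is reported as a separate one-field result, which suggests the lines need to be collected across loop iterations rather than split out of a single $_.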

    --
    Amer Neely
    Home of Spam Catcher
    W: www.softouch.on.ca
    E:
    Perl | MySQL | CGI programming for all data entry forms.
    "We make web sites work!"
     
    Amer Neely, Apr 9, 2006
    #15
  16. Amer Neely

    MSG Guest

    Amer Neely wrote:
    > while (<IN>)
    > {
    > my @CustData=();
    > if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    > {
    > push @CustData, $_;
    > }
    > foreach my $line (@CustData)

    So far so good!

    > {
    > $line =~ s/^Transaction Time:.+//;
    > $line =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//;
    > ($CustName,$Address1,$Address2,$CityProv,$Code,$Country) =
    > split(/\n/,$line);

    Unfortunately you are still in the mindset of processing strings.
    Please switch gears and treat your @CustData as what it is: an ARRAY.
    All the lines have already been separated and put into an array, so
    there is no need to split any more. What does the data look like in
    this array?
    $CustData[0] : "Transaction Time: ..." # always the first element
    $CustData[1] : (blank)
    $CustData[2] : (Name)
    $CustData[3] : (Address 1)
    ....
    $CustData[$#CustData]: "123-456-7890" # always the last
    One way to get only the name and the address part:
    for ( @CustData[2..$#CustData-2] ){
    print $_, "\n";
    }
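
The difference between an array slice and a single element is easy to miss. A short sketch, assuming @CustData already holds one complete block laid out as described above (the sample values come from the first post, with newlines already removed):

```perl
use strict;
use warnings;

# One complete block, one line per element (assumed already chomped).
my @CustData = (
    'Transaction Time: 18:45:55',
    '',
    'Amer Neely',
    'POB 1481 Station Main',
    'North Bay ON',
    'P1B 8K7',
    'CANADA',
    '',
    '123-456-7890',
);

# @CustData[LIST] is a slice: it returns several elements at once.
# Indexes 2 .. $#CustData-2 skip the header line, both blank lines
# and the phone number.
my @address = @CustData[ 2 .. $#CustData - 2 ];

print "$_\n" for @address;
```

Writing $CustData[...] with a leading $ asks for a single element instead, so it is not a drop-in replacement for the slice.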

    >
    > } # end while (<IN>)
    > close IN;
     
    MSG, Apr 9, 2006
    #16
  17. Amer Neely

    Amer Neely Guest

    MSG wrote:
    > Amer Neely wrote:
    >> while (<IN>)
    >> {
    >> my @CustData=();
    >> if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    >> {
    >> push @CustData, $_;
    >> }
    >> foreach my $line (@CustData)

    > So far so good!
    >
    >> {
    >> $line =~ s/^Transaction Time:.+//;
    >> $line =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//;
    >> ($CustName,$Address1,$Address2,$CityProv,$Code,$Country) =
    >> split(/\n/,$line);

    > Unfortunately you are still in the mind set of processing strings.
    > Please switch gear and treat your @CustData as what it is- ARRAY.
    > All the lines have already been separated and put into an array. There
    > is no need to split any more. What do data look like in this array?
    > $CustData[0] : "Transaction Time: ..." # always the first element
    > $CustData[1] : (blank)
    > $CustData[2] : (Name)
    > $CustData[3] : (Address 1)
    > ...
    > $CustData[$#CustData]: "123-456-7890" # always the last
    > One way to get to only the name and the address part:
    > for ( @CustData[2..$#CustData-2] ){
    > print $_, "\n";
    > }
    >
    >> } # end while (<IN>)
    >> close IN;


    My code:
    open IN, "<$Infile";
    while (<IN>)
    {
    my @CustData=();
    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    {
    push @CustData, $_;
    }

    foreach my $line (@CustData)
    {
    my $addcounter=0;
    $line =~ s/^Transaction Time:.+//;
    $line =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//;
    for ( $CustData[2..$#CustData-2] )
    {
    $addcounter++;
    print "[$addcounter] $_";
    }
    }

    }
    close IN;
    print "\nAll done.\n";

    All I've changed is to add a counter for each element. I also had to
    change your ( @CustData[2 to ( $CustData[2, otherwise I got no output at all.

    The output:
    [1]
    [1] [1] [1] xxxxxxxxxxxx
    [1] xxxxxxxxxxxxxxxxxx
    [1] SAULT STE MARIE Ontario
    [1] P6A 3P4
    [1] CANADA
    [1]
    [1]
    [1]
    [1]
    [1] xxxxxxxxxxxxxx
    [1] xxxxxxxxxxxxxxxxxxx
    [1] Yellowknife NT
    [1] X1A 3N2
    [1] CANADA
    [1]
    [1]
    [1]
    [1]
    [1] xxxxxxxxxxxxn
    [1] xxxxxxxxxxxx
    [1] Tara ON
    [1] N0H 2N0
    [1] CANADA
    [1]
    [1]
    [1]
    [1]
    [1] xxxxxxxxxxxxxxxxx
    [1] xxxxxxxxxxxxxxxxxx
    [1] Laval Qc
    [1] H7E2B4
    [1] CANADA
    [1]
    [1]
    [1]
    [1]
    [1] xxxxxxxxxxx
    [1] xxxxxxxxxxxxxxxxxxxx
    [1] xxxxxx
    [1] sault te. marie ON
    [1] P6A 6E9
    [1] CANADA
    [1]
    [1]

    All done.


    Now I just changed the inner loop to print all elements.

    while (<IN>)
    {
    my @CustData=();
    if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    {
    push @CustData, $_;
    }

    foreach my $line (@CustData)
    {
    my $addcounter=0;
    $line =~ s/^Transaction Time:.+//;
    $line =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//;
    #for ( $CustData[0..$#CustData] )
    #for ( $CustData[2..$#CustData-2] )
    foreach (@CustData)
    {
    $addcounter++;
    print "[$addcounter] $_";
    }
    }

    }
    close IN;
    print "\nAll done.\n";

    Here's my output from a subset of the whole file:
    [1]
    [1]
    [1] xxxxxxxxxxxx
    [1] xxxxxxxxxxxxxxxxxxxx
    [1] SAULT STE MARIE Ontario
    [1] P6A 3P4
    [1] CANADA
    [1]
    [1]
    [1]
    [1]
    [1] xxxxxxxxxxxxx
    [1] xxxxxxxxxxxxxxxxxxx
    [1] Yellowknife NT
    [1] X1A 3N2
    [1] CANADA
    [1]
    [1]
    [1]
    [1]
    [1] xxxxxxxxxxxxn
    [1] xxxxxxxxxxxx
    [1] Tara ON
    [1] N0H 2N0
    [1] CANADA
    [1]
    [1]
    [1]
    [1]
    [1] xxxxxxxxxxxxxxxx
    [1] xxxxxxxxxxxxxxxxxx
    [1] Laval Qc
    [1] H7E2B4
    [1] CANADA
    [1]
    [1]
    [1]
    [1]
    [1] xxxxxxxxxxx
    [1] xxxxxxxxxxxxxxxxxxxx
    [1] xxxxxxx
    [1] sault te. marie ON
    [1] P6A 6E9
    [1] CANADA
    [1]
    [1]

    All done.

    So it still seems that the @CustData array only has 1 element in it.
    This is what has been driving me nuts.
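
One possible explanation, offered as a sketch rather than a confirmed diagnosis: my @CustData=(); sits inside the while loop, so the array is emptied on every line read and never holds more than the current line. Declaring it before the loop and processing it only when the range closes would let a whole block accumulate. The example below assumes a hypothetical in-memory filehandle in place of IN, and relies on the documented behaviour that the range operator's scalar return value has "E0" appended on the last line of the range.

```perl
use strict;
use warnings;

# Hypothetical sample input with one 5-line and one 6-line address.
my $sample = <<'END';
some header text
Transaction Time: 18:45:55

Amer Neely
POB 1481 Station Main
North Bay ON
P1B 8K7
CANADA

123-456-7890
other text
Transaction Time: 18:45:34

Bmer Neely
POB 123
AMS dept
South
ABC 879
USA

800-346-7890
trailing text
END

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";

my @CustData;    # declared OUTSIDE the loop so lines accumulate
my @blocks;      # one array ref of address lines per message

while (<$in>) {
    # In scalar context the range operator returns a sequence number;
    # on the last line of the range that number has "E0" appended.
    my $in_range = /^Transaction Time:/ ... /^\d{3}-\d{3}-\d{4}$/;
    next unless $in_range;

    chomp;
    push @CustData, $_;

    if ($in_range =~ /E0$/) {
        # Drop header + blank at the front, blank + phone at the back.
        push @blocks, [ @CustData[ 2 .. $#CustData - 2 ] ];
        @CustData = ();    # start fresh for the next block
    }
}
close $in;

for my $block (@blocks) {
    my $n = 0;
    printf "[%d] %s\n", ++$n, $_ for @$block;
    print "\n";
}
```

With the array kept alive across iterations, the counter runs past 1 and the 5-line and 6-line addresses are both handled by the same slice.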

    --
    Amer Neely
    Home of Spam Catcher
    W: www.softouch.on.ca
    E:
    Perl | MySQL | CGI programming for all data entry forms.
    "We make web sites work!"
     
    Amer Neely, Apr 9, 2006
    #17
  18. Amer Neely

    Dr.Ruud Guest

    Amer Neely wrote:

    > I'm walking through a mailbox file, and want to pull out specific
    > lines from each message. The body of each message is in a similar
    > format, having been generated by a script.
    >
    > I'm doing OK except for one particular block of lines, the customer
    > address data. There is a blank line before and after this block.
    > Example:
    >
    > Transaction Time: 18:45:55
    >
    > Amer Neely
    > POB 1481 Station Main
    > North Bay ON
    > P1B 8K7
    > CANADA
    >
    > 123-456-7890



    Or use a simplified state machine.

    my $state = -1;   # -1: outside a block, 0: header seen, 1: in address
    my $line  = -1;   # address line counter within the current block

    while (<>) {
        chomp; # s/^\s+//; s/\s+$//;

        if (-1 == $state) {
            if (/^Transaction Time:/) {
                ++$state;
            }
        }
        elsif (0 == $state) {
            # the header must be followed by a blank line
            if (/^$/) {
                ++$state;
                $line = 0;
            }
            else {
                die "$state: <$_>?";
            }
        }
        elsif (1 == $state) { # in address
            if (/^$/) {
                # skip
            }
            elsif (/^\d{3}-\d{3}-\d{4}$/) {
                # phone number ends the block
                $state = -1;
                $line = -1;
            }
            else {
                ++$line;
                print "$line: $_\n";
            }
        }
        else {
            die "$state: <$_>?";
        }
    }

    (untested)

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Apr 9, 2006
    #18
  19. Amer Neely

    Xicheng Jia Guest

    Amer Neely wrote:
    > MSG wrote:
    > > Amer Neely wrote:
    > >> @records contains all the address blocks from the whole file. I'd like
    > >> to deal with each address block (line-by-line) as I go through the file
    > >> if I can.

    > > while (<IN>){
    > > my @records;
    > > if ( /^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/){
    > > push @records, $_;
    > > }
    > > # now you get each address block in @records on each loop iteration
    > > # Do some processing here
    > > }
    > >> Another problem is that some of the addresses have 6 lines, not 5.

    > > That is why you don't want to process every line on every iteration. It is
    > > better to first group each address block into its own array.
    > >

    >
    > OK, I'm trying that, but it's still giving me grief.
    >
    > while (<IN>)
    > {
    > my @CustData=();
    > if (/^Transaction Time:/ ... /^\d\d\d-\d\d\d-\d\d\d\d$/)
    > {
    > push @CustData, $_;
    > }
    > foreach my $line (@CustData)
    > {
    > $line =~ s/^Transaction Time:.+//;
    > $line =~ s/^\d\d\d-\d\d\d-\d\d\d\d$//;
    > ($CustName,$Address1,$Address2,$CityProv,$Code,$Country) =
    > split(/\n/,$line);
    > print "Name: $CustName\n";
    > print "Address: $Address1\n";
    > print "Address: $Address2\n";
    > print "City/Prov: $CityProv\n";
    > print "Code: $Code\n";
    > print "Country: $Country\n";
    > }
    >
    > } # end while (<IN>)
    > close IN;


    Here is some test code that uses paragraph mode to extract the info and
    insert it into your database (tested under WinXP).
    --------------------------
    use strict;
    use warnings;

    local $/ = "";

    while ( <DATA> ) {
        if (/^Transaction Time:/ .. /^\d\d\d-\d\d\d-\d\d\d\d\s*$/){
            my $lines = tr/\n//;
            next if $lines < 6;
            my ( $name, $addr1, $addr2, $city, $code, $cont );
            if ( $lines == 6 ) {
                ( $name, $addr1, $city, $code, $cont ) = split "\n";
                $addr2 = "";
            } elsif ( $lines == 7 ) {
                ( $name, $addr1, $addr2, $city, $code, $cont ) = split "\n";
            }
            # to INSERT INTO mytable from mydb.
            #$sth->execute( $name, $addr1, $addr2, $city, $code, $cont );
            print <<TEST;
    name = $name
    addr1 = $addr1
    addr2 = $addr2
    city = $city
    code = $code
    country = $cont

    TEST
        }
    }

    __DATA__
    one block
    one block
    one block
    one block
    one block

    Transaction Time: 18:45:55

    Amer Neely
    POB 1481 Station Main
    AMS dept
    North Bay ON
    P1B 8K7
    CANADA

    123-456-7890

    some other blocks
    some other blocks
    some other blocks
    some other blocks
    some other blocks
    some other blocks

    Transaction Time: 18:45:34

    Bmer Neely
    POB 123
    South
    ABC 879
    USA

    800-346-7890

    another block
    another block
    another block
    another block
    another block
    another block
    another block

    Transaction Time: 18:45:55

    Amer Neely
    POB 1481 Station Main
    North Bay ON
    P1B 8K7
    CANADA

    123-456-7890

    more blocks
    more blocks
    more blocks
    more blocks
    more blocks
    more blocks
    more blocks
    more blocks
    more blocks
    more blocks
    ---------------------------------------------------
    ======print result=======
    name = Amer Neely
    addr1 = POB 1481 Station Main
    addr2 = AMS dept
    city = North Bay ON
    code = P1B 8K7
    country = CANADA

    name = Bmer Neely
    addr1 = POB 123
    addr2 =
    city = South
    code = ABC 879
    country = USA

    name = Amer Neely
    addr1 = POB 1481 Station Main
    addr2 =
    city = North Bay ON
    code = P1B 8K7
    country = CANADA
    ========================
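
The fixed 6-or-7 newline counts above could be relaxed. A sketch only, assuming the name is always the first line and that city, code and country are always the last three, with anything in between treated as address lines (that layout is an assumption, not something confirmed in this thread). The in-memory filehandle and the hash keys are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input, standing in for the mailbox file.
my $sample = <<'END';
one block

Transaction Time: 18:45:55

Amer Neely
POB 1481 Station Main
AMS dept
North Bay ON
P1B 8K7
CANADA

123-456-7890

more blocks
END

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";

my @customers;
{
    local $/ = "";    # paragraph mode: one blank-line-separated chunk per read

    while (my $para = <$in>) {
        # True from the "Transaction Time:" paragraph through the phone paragraph.
        next unless $para =~ /^Transaction Time:/ .. $para =~ /^\d{3}-\d{3}-\d{4}\s*$/;

        # Skip the two delimiter paragraphs themselves.
        next if $para =~ /^Transaction Time:/ or $para =~ /^\d{3}-\d{3}-\d{4}\s*$/;

        my @lines = split /\n/, $para;    # trailing empty fields are dropped
        next if @lines < 4;               # need name + city + code + country

        my $name = shift @lines;
        my ($city, $code, $country) = splice @lines, -3;
        push @customers, {
            name    => $name,
            address => [@lines],          # whatever is left: one or more lines
            city    => $city,
            code    => $code,
            country => $country,
        };
    }
}
close $in;

for my $c (@customers) {
    print "name    = $c->{name}\n";
    print "addr    = $_\n" for @{ $c->{address} };
    print "city    = $c->{city}\n";
    print "code    = $c->{code}\n";
    print "country = $c->{country}\n\n";
}
```

Keeping the address lines in a list avoids a separate branch per line count, at the cost of having to decide later how they map onto database columns.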

     
    Xicheng Jia, Apr 9, 2006
    #19
  20. Amer Neely

    Amer Neely Guest

    Xicheng Jia wrote:
    >
    > Here is a test code which uses paragraph-mode to extract info and try
    > to insert into your database (tested under WinXP)..


    EXCELLENT!

    I modified it to get input from my file, and it still works :)

    Thank you, thank you, thank you. Now I can move on.

    --
    Amer Neely
    Home of Spam Catcher
    W: www.softouch.on.ca
    E:
    Perl | MySQL | CGI programming for all data entry forms.
    "We make web sites work!"
     
    Amer Neely, Apr 9, 2006
    #20