Perl Strings vs FileHandle

Discussion in 'Perl Misc' started by shadabh, Sep 6, 2008.

  1. shadabh

    shadabh Guest

    Hi all,

    Just wanted to run this past the Perl gurus to see if my approach is correct.

    I have a file that could possibly be 1GB of variable-length EBCDIC
    data. I will read the file as EBCDIC data and, based on some criteria,
    split it into 100 different files which will add up to the 1GB. This
    way a particular copybook can be applied to each of the split files.

    The approach I am using is a filehandle (IO::FileHandle and
    $Strings), substr, and writing out to 100 different files after
    applying the 'logic'. I will use two routines, one to read and one to
    write. I have tested this with a 100MB file and it works fine. The
    question, though: is there a memory limit to this, since we are using
    strings to break up the files? Or is there an alternative way to do
    this?

    Comments, suggestions, improvements, and alternatives will really help
    in designing the code. Thanks.

    Shadab
    shadabh, Sep 6, 2008
    #1

  2. shadabh wrote:
    > Just wanted to run this past the Perl gurus to see if my approach is
    > correct. [full question quoted above; snipped]


    You have to show us what "this" is first. It is kind of hard to make
    comments, suggestions, improvements and alternatives to something that
    we cannot see.


    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
    John W. Krahn, Sep 7, 2008
    #2

  3. shadabh

    shadabh Guest

    On Sep 6, 6:08 pm, "John W. Krahn" <> wrote:
    > You have to show us what "this" is first. It is kind of hard to make
    > comments, suggestions, improvements and alternatives to something
    > that we cannot see. [quoted text snipped]


    Sure. Here is what I meant by 'this way'. Please comment. Thanks

    while ($raw_data) {
        $var_len = $INIT_LEN + $AC_NUM_LENGTH;
        $val = substr $raw_data, $var_len, 4;
        $asc_string = $val;
        eval '$asc_string =~ tr/\000-\377/' . $cp_037 . '/';
    #   open(OUTF, '>>ebcdic_ID.txt');
    #   print OUTF $var_len, $asc_string, "\n";
    #   close OUTF;
        open(PARM, '<TSY2_PARM.par') || die("Could not open the parameter file $PARM_FILE ! File read failed ..check if file exists");
        $parm_data = <PARM>;
        if (($parm_data =~ m!($asc_string)!g) eq 1) {
            $COPYBOOK_LEN = substr $parm_data, length($`) + 4, 4;
            print $COPYBOOK_LEN;
            close(PARM);
            $OUT_DATAFILE = 'EBCDIC_' . $asc_string . '.txt';
            $RECORD_LEN = $COPYBOOK_LEN + $HEADER_DATA;
            open(OUTF, ">>$OUT_DATAFILE") || die("Could not open file. File read failed ..check if file exists");
            print OUTF substr $raw_data, $INIT_LEN, $RECORD_LEN;
            close OUTF;
            $INIT_LEN = $INIT_LEN + $RECORD_LEN;
            print $INIT_LEN;
            print $var_len;
        }
        else {
            print 'End of file reached or copy book is not a part of the loading process', "\n";
            exit 0;
        }
    }
    shadabh, Sep 17, 2008
    #3
  4. Guest

    shadabh <> wrote:
    > > > The question, though: is there a memory limit to this, since we
    > > > are using strings to break up the files? Or is there an
    > > > alternative way to do this? [earlier quoting snipped]

    If you are loading the entire file in memory as a giant string, then
    you will need enough memory to do so. My favorite machine has enough
    memory to do that with no problem. We don't know how much memory your
    machine has. ...

    Where did $raw_data come from?

    >
    > while ($raw_data) {


    As far as I can tell, $raw_data is never modified, so the loop will
    either be infinite or never iterate at all.

    And is an indent size of 40 characters really a good idea?

    > $var_len=$INIT_LEN+$AC_NUM_LENGTH;
    > $val = substr $raw_data, $var_len,4;


    So maybe do:

    while (1) {
        $var_len = $INIT_LEN + $AC_NUM_LENGTH;
        $val     = substr $raw_data, $var_len, 4;
        last unless 4 == length $val;

    You do seem to have a conditional exit later on, but why wait until then
    when you already know you are at the end?

    The rest of the code is hideously difficult to read. All cap variable
    names make my eyes bleed. So there is only so much time I'm willing to
    spend staring at it.

    Why reread $parm_data each time through the loop? You are reading it from
    the same file each time, so doesn't it have the same value?

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
    Xho, Sep 17, 2008
    #4
  5. shadabh wrote:
    > [original question and John's first reply snipped]
    >
    > Sure. Here is what I meant by 'this way'. Please comment. Thanks
    >
    > while ($raw_data) {


    You don't modify $raw_data inside the loop so there is no point in
    testing it every time through the loop.

    > $var_len=$INIT_LEN+$AC_NUM_LENGTH;


    my $var_len = $INIT_LEN + $AC_NUM_LENGTH;

    > $val = substr $raw_data,$var_len,4;
    > $asc_string = $val;


    my $asc_string = substr $raw_data, $var_len, 4;

    > eval '$asc_string =~ tr/\000-\377/' . $cp_037 . '/';
    > # open(OUTF, '>>ebcdic_ID.txt');
    > # #print OUTF $var_len, $asc_string, "\n";
    > # close OUTF;
    > open(PARM, '<TSY2_PARM.par') || die("Could not open the
    > parameter file $PARM_FILE ! File read failed ..check if file exists");


    You are opening 'TSY2_PARM.par' but your error message says you are
    opening $PARM_FILE. You should include the $! variable in the error
    message so you know *why* it failed to open.

    open my $PARM, '<', 'TSY2_PARM.par'
        or die "Could not open 'TSY2_PARM.par' $!";

    > $parm_data = <PARM>;


    You assign the same data to $parm_data every time so...

    > if (($parm_data =~ m!($asc_string)!g) eq 1) {


    your pattern will always match at the same place every time through the
    loop so if the pattern is present you have an infinite loop. You are
    not using the contents of $1 so the parentheses are superfluous. The
    comparison test to the string '1' is superfluous.

    if ( $parm_data =~ /$asc_string/g ) {

    > $COPYBOOK_LEN = substr $parm_data,length($`)+4,4;


    The use of $` is discouraged as it slows down *all* regular expressions
    in your program.

    my $COPYBOOK_LEN = substr $parm_data, $-[ 0 ] + 4, 4;

    > print $COPYBOOK_LEN;
    > close (PARM);
    > $OUT_DATAFILE = 'EBCDIC_'.$asc_string.'.txt';
    > $RECORD_LEN= $COPYBOOK_LEN+$HEADER_DATA;
    > open(OUTF, ">>$OUT_DATAFILE") || die("Could not open file. File
    > read failed ..check if file exists");


    open my $OUTF, '>>', $OUT_DATAFILE
        or die "Could not open '$OUT_DATAFILE' $!";

    > print OUTF substr $raw_data, $INIT_LEN, $RECORD_LEN;
    > close OUTF;
    > $INIT_LEN = $INIT_LEN + $RECORD_LEN;
    > print $INIT_LEN;
    > print $var_len;
    > }
    > else {
    > print 'End of file reached or copy book is not a part of the
    > loading process', "\n";
    > exit 0;
    > }
    > }


    So, to summarise:

    open my $PARM, '<', 'TSY2_PARM.par'
        or die "Could not open the parameter file 'TSY2_PARM.par' $!";
    my $parm_data = <$PARM>;
    close $PARM;

    while ( 1 ) {

        my $var_len    = $INIT_LEN + $AC_NUM_LENGTH;
        my $asc_string = substr $raw_data, $var_len, 4;
        eval '$asc_string =~ tr/\000-\377/' . $cp_037 . '/';

        if ( $parm_data =~ /\Q$asc_string\E/g ) {

            my $COPYBOOK_LEN = substr $parm_data, $-[ 0 ] + 4, 4;
            my $OUT_DATAFILE = "EBCDIC_$asc_string.txt";
            my $RECORD_LEN   = $COPYBOOK_LEN + $HEADER_DATA;

            open my $OUTF, '>>', $OUT_DATAFILE
                or die "Could not open '$OUT_DATAFILE' $!";
            print $OUTF substr $raw_data, $INIT_LEN, $RECORD_LEN;
            close $OUTF;

            $INIT_LEN += $RECORD_LEN;
            print $COPYBOOK_LEN, $INIT_LEN, $var_len;
            }
        else {
            print "End of file reached or copy book is not a part of the loading process\n";
            last;
            }
        }

    exit 0;

    __END__
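
    As an aside, the eval/tr trick for the EBCDIC-to-ASCII step could also
    be done with the core Encode module. A minimal sketch, assuming the
    data really is IBM code page 037 (which Encode spells 'cp37') and
    $ebcdic_bytes is a placeholder for the raw input:

    use Encode qw( from_to );

    # Convert in place from EBCDIC code page 37 to ASCII octets.
    my $asc_string = $ebcdic_bytes;
    from_to( $asc_string, 'cp37', 'ascii' );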


    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
    John W. Krahn, Sep 17, 2008
    #5
  6. shadabh

    shadabh Guest

    On Sep 17, 12:39 pm, "John W. Krahn" <> wrote:
    > [John's line-by-line review and summary code, quoted above, snipped]

    Thanks for all your suggestions and comments. So what is your opinion
    of this method? Is there an alternative at all? Because $parm_data
    could potentially contain GBs worth of data. Suggestions to improve
    the method are welcome. BTW, all your comments are good and I have
    included them in my code. TY

    Shadab
    shadabh, Sep 29, 2008
    #6
  7. Guest

    shadabh <> wrote:
    >
    > Thanks for all your suggestions and comments. So what is your opinion
    > of this method? Is there an alternative at all?


    Yes, many alternatives. The easiest might be to use Sys::Mmap to make
    the file look like a variable without having it all in memory at the same
    time. That would require no changes to the code you showed us.
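
    A minimal sketch of the mmap route, assuming the CPAN module Sys::Mmap
    and a placeholder input file name:

    use strict;
    use warnings;
    use Sys::Mmap;                          # CPAN module wrapping mmap(2)

    open my $FH, '<', 'big_ebcdic.dat'      # placeholder file name
        or die "Could not open 'big_ebcdic.dat' $!";

    # Length 0 maps the whole file; pages are faulted in on demand, so
    # $raw_data never has to fit in RAM all at once.
    mmap( my $raw_data, 0, PROT_READ, MAP_SHARED, $FH )
        or die "mmap failed: $!";

    my $val = substr $raw_data, 0, 4;       # behaves like a normal string

    munmap( $raw_data ) or die "munmap failed: $!";
    close $FH;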

    Or you could translate all of your substr operations on the variable into
    the equivalent combination of "seek" and "read" operations on a file
    handle.
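
    For instance, the first substr in the posted code might become
    something like this, where $IN is a placeholder handle opened on the
    big file:

    # Equivalent of: $val = substr $raw_data, $var_len, 4;
    binmode $IN;                     # raw bytes, no newline translation
    seek $IN, $var_len, 0            # 0 = SEEK_SET, i.e. an absolute offset
        or die "seek failed: $!";
    read( $IN, my $val, 4 ) == 4
        or die "short read at offset $var_len";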

    Or you could read the file in suitably sized chunks. But I found your
    code so hard to follow (and you didn't show how the variables get
    initialized) that I won't dig into the exact details here.
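
    The general shape, with the format-specific record carving left as a
    comment, would be roughly:

    use constant CHUNK_SIZE => 1024 * 1024;   # 1 MB per read; size is arbitrary

    open my $IN, '<', 'big_ebcdic.dat' or die "Could not open input file $!";
    binmode $IN;

    my $buffer = '';
    while ( read( $IN, my $chunk, CHUNK_SIZE ) ) {
        $buffer .= $chunk;
        # ... peel complete records off the front of $buffer here, keeping
        #     any trailing partial record for the next pass ...
    }
    close $IN;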

    > Because $parm_data could potentially contain GBs worth of data.


    But that doesn't mean anything to us. We don't know what computer you
    are using. Do you have GBs of RAM to go along with it? In other words,
    do you actually have a problem?

    > Suggestions to improve the method are welcome. BTW, all your comments
    > are good and I have included them in my code. TY


    If you don't have enough RAM to use your current method and thus actually
    have a problem, please post your new code with the improvements in place
    and with the indents and such cleaned up so we can follow it, and showing
    how variables get initialized.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
    Xho, Sep 29, 2008
    #7
  8. Tim Greer Guest

    shadabh wrote:
    > [earlier thread snipped]
    > Thanks for all your suggestions and comments. So what is your opinion
    > of this method? Is there an alternative at all? Because $parm_data
    > could potentially contain GBs worth of data.

    You said that "$parm_data could potentially contain GBs worth of data",
    in which case the line below:

    my $parm_data = <$PARM>;

    is going to kill your resources and probably even crash or overload the
    system. You need to do a while loop on the <$PARM> file handle and step
    through it line by line (instead of reading it into a string or array
    first). You can cut out the middle man that way, too. Otherwise you are
    reading a potentially huge amount of data into a string (into memory)
    and then trying to parse it. That is always bad.
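
    A minimal sketch of that shape, reusing the handle name from John's
    version:

    open my $PARM, '<', 'TSY2_PARM.par'
        or die "Could not open 'TSY2_PARM.par' $!";
    while ( my $parm_line = <$PARM> ) {    # one line per iteration
        chomp $parm_line;
        # ... look for $asc_string in $parm_line and pull out the copybook
        #     length here ...
    }
    close $PARM;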
    --
    Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
    Industry's most experienced staff! -- Web Hosting With Muscle!
    Tim Greer, Sep 29, 2008
    #8
  9. Tim Greer wrote:
    >
    > my $parm_data = <$PARM>;
    >
    > is going to kill your resources and probably even crash or overload
    > the system. [snipped]


    In that context the size of the data read is defined by the value of
    $/, which defaults to "\n", and the OP is working with "text" files,
    so each record is probably not that large.
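
    That is, <$PARM> in scalar context returns one record at a time. A
    quick illustration of how $/ controls the record size ($FH is a
    placeholder handle):

    {
        local $/ = "\n";      # the default: readline returns one line
        my $line = <$FH>;

        local $/ = \65536;    # reference to a number: fixed 64KB records
        my $block = <$FH>;

        local $/ = undef;     # slurp mode: the entire remaining file
        my $rest = <$FH>;
    }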


    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
    John W. Krahn, Sep 30, 2008
    #9
  10. Tim Greer Guest

    John W. Krahn wrote:

    > In that context the size of the data read is defined by the value of
    > $/, which defaults to "\n", and the OP is working with "text" files,
    > so each record is probably not that large.

    I've seen some pretty large files that use a newline as a separator.
    Maybe huge logs are being parsed? I've seen Apache log files that
    people want processed easily get up into the 1-2 GB range (of course,
    they have to rotate or clear them once they get past 2 GB in size).
    Anyway, as I said, I am only seeing the tail end of this thread.
    --
    Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
    Industry's most experienced staff! -- Web Hosting With Muscle!
    Tim Greer, Sep 30, 2008
    #10
