Parsing long files by using read

Discussion in 'Perl Misc' started by gavs, Jan 16, 2004.

  1. gavs

    gavs Guest

    Hi,

    I am fairly new to perl and need to split a fairly large file that
    contains no newlines. The records contained in this file are fixed
    length. I have written the following code to split this long record
    into 600-byte records and append a newline to each. After executing
    this program, the file size doubles.

    For example: a record in this file can be split up into 3 records of
    600 byte length; hence the original length of this file is 1800 bytes.

    size = size of the original file.

    while($bytes_read < $size) {
        my $record;
        $bytes_read += read(FIN, $record, $record_len, $offset);
        print "Bytes read # $bytes_read, OFFSET=$offset\n";

        $record .= "\n";

        print FOUT $record;
        $offset += $record_len;
    }

    fclose(FIN);
    fclose(FOUT);

    Viewing the out file with vi generates the following:
    "a" 3 lines, 3603 characters (1800 null characters)

    Where are the extra 1800 bytes coming from? How do I get rid of them?

    Thanks.
    gavs
     
    gavs, Jan 16, 2004
    #1

  2. Uri Guttman

    Uri Guttman Guest

    perldoc perlvar. look for $/ and assigning it a ref to an integer.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
     
    Uri Guttman, Jan 16, 2004
    #2

  3. Ben Morrow

    Ben Morrow Guest

    (gavs) wrote:
    > I am fairly new to perl and need to split a fairly large file that
    > contains no newlines. The records contained in this file are fixed
    > length. I have written the following code to split this long record
    > into 600-byte records and append a newline to each. After executing
    > this program, the file size doubles.
    >
    > For example: a record in this file can be split up into 3 records of
    > 600 byte length; hence the original length of this file is 1800 bytes.
    >
    > size = size of the original file.
    >
    > while($bytes_read < $size) {
    > my $record;
    > $bytes_read += read(FIN, $record, $record_len, $offset);
    > print "Bytes read # $bytes_read, OFFSET=$offset\n";
    >
    > $record .= "\n";
    >
    > print FOUT $record;
    > $offset += $record_len;
    > }
    >
    > fclose(FIN);
    > fclose(FOUT);


    Perl has no fclose function. Please show us your real code.

    >
    > Viewing the out file with vi generates the following:
    > "a" 3 lines, 3603 characters (1800 null characters)
    >
    > Where are the extra 1800 bytes coming from? How do I get rid of them?


    The 'offset' parameter to read() is an offset into the string, not
    into the file. The bytes are read from the file starting wherever the
    last read left off. However, the whole thing looks more like C than
    Perl.
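
    To make the point about OFFSET concrete, here is a minimal sketch that
    reproduces the symptom. It is not the original poster's real code (which
    was never shown); the in-memory filehandle and the dummy A/B/C records
    are assumptions made so the example runs without any files. Because the
    buffer is fresh on every pass, read() pads it with "\0" bytes up to
    OFFSET before appending the data: 0 + 600 + 1200 = 1800 null bytes,
    which matches the count vi reported.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the 1800-byte input: three 600-byte records.
my $data = join '', map { $_ x 600 } qw(A B C);
open my $in, '<', \$data or die "can't open in-memory input: $!";

my ($record_len, $offset, $out) = (600, 0, '');

while (!eof $in) {
    my $record = '';
    # OFFSET is a position inside $record, not inside the file.
    # A buffer shorter than OFFSET is padded with "\0" bytes first.
    defined read($in, $record, $record_len, $offset)
        or die "read failed: $!";
    $out .= "$record\n";
    $offset += $record_len;    # the bug: this grows the padding each pass
}
close $in;

my $nulls = ($out =~ tr/\0//);    # tr/// with no replacement just counts
printf "output: %d bytes, %d of them null\n", length $out, $nulls;
# prints "output: 3603 bytes, 1800 of them null"
```

    Dropping the fourth argument to read() removes the padding, since each
    call already continues from where the previous one stopped.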

    Here's how I'd do it (untested):

    {
        local $/ = \600;  # 600-byte input records
        local $\ = "\n";  # see perldoc perlvar

        open my $IN, ... or die "can't open input: $!";
        open my $OUT, ... or die "can't open output: $!";

        print $OUT $_ while <$IN>;
    }
    # no need for close() as the filehandles are closed when they go out
    # of scope.

    or indeed

    perl -lpe'BEGIN { $/ = \600 }' < in > out
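
    For anyone who wants to try the block form without touching real files,
    here is a self-contained variant. The in-memory filehandles and the dummy
    A/B/C data are assumptions for illustration only; with real files you
    would pass a mode and a file name to open() instead of a scalar
    reference.

```perl
use strict;
use warnings;

# Hypothetical 1800-byte input with no newlines: three 600-byte records.
my $input  = join '', map { $_ x 600 } qw(A B C);
my $output = '';

{
    local $/ = \600;     # readline returns records of at most 600 bytes
    local $\ = "\n";     # print appends a newline after each record

    open my $in,  '<', \$input  or die "can't open input: $!";
    open my $out, '>', \$output or die "can't open output: $!";

    print {$out} $_ while <$in>;

    close $out or die "can't close output: $!";
    close $in;
}

my @lines = split /\n/, $output;
printf "%d lines, %d bytes\n", scalar @lines, length $output;
# prints "3 lines, 1803 bytes"
```

    The output is the 1800 input bytes plus three newlines, with no null
    padding.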

    Ben

    --
    Joy and Woe are woven fine,
    A Clothing for the Soul divine        William Blake
    Under every grief and pine            'Auguries of Innocence'
    Runs a joy with silken twine.
     
    Ben Morrow, Jan 16, 2004
    #3
  4. Walter Roberson

    In article <bu9gku$1jb$>,
    Ben Morrow <> wrote:
    : local $/ = \600; # 600-byte input records

    How does that work, Ben? When I look at the documentation for $/
    there does not appear to be an option for setting a record size.
    And a reference to a scalar looks odd there...
    --
    Rump-Titty-Titty-Tum-TAH-Tee -- Fritz Lieber
     
    Walter Roberson, Jan 16, 2004
    #4
  5. Uri Guttman

    Uri Guttman Guest

    >>>>> "WR" == Walter Roberson <-cnrc.gc.ca> writes:

    WR> In article <bu9gku$1jb$>,
    WR> Ben Morrow <> wrote:
    WR> : local $/ = \600; # 600-byte input records

    WR> How does that work, Ben? When I look at the documentation for $/
    WR> there does not appear to be an option for setting a record size.
    WR> And a reference to a scalar looks odd there...

    what docs are you looking at? perldoc perlvar says this:

    Setting "$/" to a reference to an integer, scalar
    containing an integer, or scalar that's convertible
    to an integer will attempt to read records instead
    of lines, with the maximum record size being the
    referenced integer. So this:

    $/ = \32768; # or \"32768", or \$var_containing_32768
    open(FILE, $myfile);
    $_ = <FILE>;

    will read a record of no more than 32768 bytes from
    FILE. If you're not reading from a record-oriented
    file (or your OS doesn't have record-oriented
    files), then you'll likely get a full chunk of data
    with every read. If a record is larger than the
    record size you've set, you'll get the record back
    in pieces.


    seems to be clearly documented to me.
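
    A small sketch of the documented behaviour, for readers who want to see
    it run. The 10-byte string, the record size of 4 and the in-memory
    filehandle are all assumptions chosen to keep the example short; the
    point is that the final record is simply shorter when the data runs out.

```perl
use strict;
use warnings;

my $data = 'abcdefghij';    # 10 bytes, no newlines (made-up sample data)
open my $fh, '<', \$data or die "can't open in-memory input: $!";

my $size = 4;
my @records;
{
    local $/ = \$size;      # reference to a scalar containing an integer
    @records = <$fh>;       # list context: one element per record
}
close $fh;

print join('|', @records), "\n";
# prints "abcd|efgh|ij"
```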

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
     
    Uri Guttman, Jan 16, 2004
    #5
  6. gnari

    gnari Guest

    "Walter Roberson" <-cnrc.gc.ca> wrote in message
    news:bu9ht4$mh4$...
    > In article <bu9gku$1jb$>,
    > Ben Morrow <> wrote:
    > : local $/ = \600; # 600-byte input records
    >
    > How does that work, Ben? When I look at the documentation for $/
    > there does not appear to be an option for setting a record size.


    see http://perldoc.com/perl5.8.0/pod/perlvar.html
    look for $/, where it says:

    Setting $/ to a reference to an integer, scalar containing an integer,
    or scalar that's convertible to an integer will attempt to read records
    instead of lines, with the maximum record size being the referenced
    integer.

    gnari
     
    gnari, Jan 16, 2004
    #6
