Reading chunks from file?

Discussion in 'Perl Misc' started by Bryan, Jun 10, 2004.

  1. Bryan

    Bryan Guest

    Hi, I'm reading in a file in fasta format:
    >header

    DATADATADATA
    DATADATA

    >header

    DATA

    I have been doing this:
    open (INFILE, "< $filename") or die "Cannot open $filename for read\n\n";
    undef $/;
    my @chunks = split(/>/, <INFILE>);
    $/ = "\n";
    close INFILE;

    This works, but the split drops the '>' from the header part of each
    chunk, and I would rather keep it for identifying header info later. So
    first, why do I lose the '>' on this particular split, and is there
    something I can do to keep it? Second, is there a better way to split
    this file into chunks than the one I am using?

    Thanks,
    Bryan
    Bryan, Jun 10, 2004
    #1

  2. Paul Lalli

    Paul Lalli Guest

    On Thu, 10 Jun 2004, Bryan wrote:

    > Hi, I'm reading in a file in fasta format:
    > >header

    > DATADATADATA
    > DATADATA
    >
    > >header

    > DATA
    >
    > I have been doing this:
    > open (INFILE, "< $filename") or die "Cannot open $filename for read\n\n";
    > undef $/;
    > my @chunks = split(/>/, <INFILE>);
    > $/ = "\n";
    > close INFILE;
    >
    > This works, but this split loses the '>' from the header part of the
    > file, which I would rather keep for identifying header info later. So
    > first, why do I lose the '>' on this particular split, is there
    > something I can do to keep it?


    Have you read the documentation for split? The answer to both questions
    is found within.

    perldoc -f split
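
    As a rough illustration of what that documentation describes: split discards whatever the pattern matches, but if the pattern contains a capturing group, the captured separator comes back as an extra list element. The sample string below is made up and stands in for the slurped file.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the slurped file contents.
my $text = ">seq1\nACGT\nAC\n>seq2\nGG\n";

# Plain pattern: the '>' separators are discarded. Because the string
# starts with a separator, the first field is empty.
my @plain = split />/, $text;

# Capturing group: each '>' is returned as its own element,
# interleaved with the fields.
my @with_sep = split /(>)/, $text;

print scalar(@plain), " fields without the separator\n";
print scalar(@with_sep), " elements when the separator is captured\n";
```

    The captured '>' arrives as a separate element rather than attached to its chunk, so it still has to be glued back on afterwards.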

    > Second, is there a better way to split
    > this file into chunks than I am doing?


    Do you need to store the whole file in memory at once? Might it be a
    better idea to read one record at a time? Rather than undefining the
    input record separator, maybe you want to set that variable to the actual
    string which separates your records, and then read a file in one record at
    a time.

    perldoc perlvar
    for info on $/
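
    A minimal sketch of that record-at-a-time idea, assuming every header starts at the beginning of a line so that "\n>" can serve as the separator. The sample data, the in-memory filehandle, and the %length hash are only there for illustration; a real script would open $filename instead.

```perl
use strict;
use warnings;

# Hypothetical sample data; the in-memory filehandle stands in for the real file.
my $fasta = ">seq1\nACGT\nAC\n>seq2\nGG\n";
open(my $in, '<', \$fasta) or die "Cannot open in-memory file: $!\n";

my %length;    # header => total sequence length
{
    # local restores the previous value of $/ when this block exits.
    local $/ = "\n>";
    while (my $record = <$in>) {
        chomp $record;          # removes the trailing "\n>" if present
        $record =~ s/^>//;      # only the first record still starts with '>'
        my ($header, @lines) = split /\n/, $record;
        $length{$header} = length join '', @lines;
    }
}
close $in;

print "$_: $length{$_}\n" for sort keys %length;
```

    Only one record is held in memory at a time, which is the point of not slurping.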

    Hope this helps,
    Paul Lalli
    Paul Lalli, Jun 10, 2004
    #2

  3. Bryan

    Guest

    Bryan wrote:
    > Hi, I'm reading in a file in fasta format:
    > >header

    > DATADATADATA
    > DATADATA
    >
    > >header

    > DATA
    >
    > I have been doing this:
    > open (INFILE, "< $filename") or die "Cannot open $filename for read\n\n";
    > undef $/;
    > my @chunks = split(/>/, <INFILE>);
    > $/ = "\n";
    > close INFILE;
    >
    > This works, but this split loses the '>' from the header part of the
    > file, which I would rather keep for identifying header info later. So
    > first, why do I lose the '>' on this particular split, is there
    > something I can do to keep it?


    You lose the '>' because that is what split does.

    You could keep it by using a look-ahead assertion.

    split /(?=>)/ , <DATA>

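    If anything precedes the first '>' (blank lines or other whitespace,
    say), that text comes back as the first element. A zero-width match at
    the very start of the string does not produce an empty leading field.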
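
    A small sketch of the look-ahead split, using a made-up string with a leading blank line in place of the slurped file. The grep is just one way to discard anything that precedes the first header.

```perl
use strict;
use warnings;

# Hypothetical sample with a blank line before the first header.
my $text = "\n>seq1\nACGT\nAC\n>seq2\nGG\n";

# (?=>) matches the empty position just before each '>', so nothing
# is consumed and every chunk keeps its leading '>'.
my @chunks = split /(?=>)/, $text;

# Keep only the chunks that really start with a header marker.
my @records = grep { /^>/ } @chunks;

print scalar(@chunks), " chunks, ", scalar(@records), " records\n";
print $records[0];
```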

    > Second, is there a better way to split
    > this file into chunks than I am doing?


    If the file is big, it would probably be better not to slurp it all
    at once. You could set $/ = '>', but then you would have a '>' at the
    end of every record (except the last), and not one at the beginning of
    every record. (You would also have a blank record as the first one read.)
    This is kind of ugly, but what are you gonna do?
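
    For what it's worth, the cleanup is only a few lines. This is a sketch under the assumption that '>' never appears inside a header or in the data. The sample data and the @raw and @fixed arrays are invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical sample data read through an in-memory filehandle.
my $fasta = ">seq1\nACGT\nAC\n>seq2\nGG\n";
open(my $in, '<', \$fasta) or die "Cannot open in-memory file: $!\n";

my (@raw, @fixed);
{
    local $/ = '>';                # restored automatically at block exit
    while (my $record = <$in>) {
        push @raw, $record;        # as read: separator at the end, if any
        chomp $record;             # strip the trailing '>'
        next if $record eq '';     # the blank record before the first header
        push @fixed, ">$record";   # put the '>' back at the front
    }
}
close $in;

print scalar(@raw), " raw reads, ", scalar(@fixed), " usable records\n";
```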

    Xho

    , Jun 10, 2004
    #3
  4. Brian McCauley

    Xho writes:

    > If the file is big, it would probably be better not to slurp it all
    > at once. You could set $/ = '>', but then you would have a '>' at the
    > end of every record (except the last), and not one at the beginning of
    > every record. (You would also have a blank record as the first one read.)
    > This is kind of ugly, but what are you gonna do?


    Perhaps File::Stream would help?

    --
    \\ ( )
    . _\\__[oo
    .__/ \\ /\@
    . l___\\
    # ll l\\
    ###LL LL\\
    Brian McCauley, Jun 10, 2004
    #4
