Reading chunks from file?

Bryan · Jun 10, 2004

Hi, I'm reading in a file in fasta format:

>header
DATADATADATA
DATADATA
>header
DATA

I have been doing this:
open (INFILE, "< $filename") or die "Cannot open [$filename] for read\n\n";
undef $/;
my @chunks = split(/>/, <INFILE>);
$/ = "\n";
close INFILE;

This works, but this split loses the '>' from the header part of the
file, which I would rather keep for identifying header info later. So
first, why do I lose the '>' on this particular split, is there
something I can do to keep it? Second, is there a better way to split
this file into chunks than I am doing?

Thanks,
Bryan

Paul Lalli · Jun 10, 2004

Bryan said:
This works, but this split loses the '>' from the header part of the
file, which I would rather keep for identifying header info later. So
first, why do I lose the '>' on this particular split, is there
something I can do to keep it?

Have you read the documentation for split? The answer to both questions
is found within.

perldoc -f split
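
For readers who do not have the documentation to hand: the relevant behaviour is presumably that split throws away whatever the pattern matches, unless the pattern contains capturing parentheses, in which case the captured text is returned as extra fields. A minimal sketch of that, using made-up sample data in a string rather than a real file, and assuming the data starts with a '>':

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the slurped file contents.
my $text = ">seq1\nACGT\n>seq2\nGGCC\n";

# Parentheses in the pattern make split return each separator as its own field:
# "", ">", "seq1\nACGT\n", ">", "seq2\nGGCC\n"
my @fields = split /(>)/, $text;

# Assumption: the data starts with '>', so the first field is empty. Drop it.
shift @fields;

# Glue each separator back onto the chunk that follows it.
my @chunks;
while (my ($sep, $body) = splice @fields, 0, 2) {
    push @chunks, $sep . $body;
}

print "[$_]\n" for @chunks;
```

This keeps the '>' but needs the extra re-joining step, so it is shown only to illustrate what the capturing form of split returns.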

Bryan said:
Second, is there a better way to split
this file into chunks than I am doing?

Do you need to store the whole file in memory at once? Might it be a
better idea to read one record at a time? Rather than undefining the
input record separator, maybe you want to set that variable to the actual
string which separates your records, and then read a file in one record at
a time.

perldoc perlvar
for info on $/

Hope this helps,
Paul Lalli

ctcgag · Jun 10, 2004

Bryan said:
This works, but this split loses the '>' from the header part of the
file, which I would rather keep for identifying header info later. So
first, why do I lose the '>' on this particular split, is there
something I can do to keep it?

You lose the '>' because that is what split does: it discards whatever
the pattern matches and returns only the text between the matches.

You could keep it by using a look-ahead assertion.

split /(?=>)/ , <DATA>

If anything precedes the first '>' (blank lines, say), it will come back
as the first element. A zero-width match at the very start of the string
does not itself produce an empty leading field.
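
A minimal sketch of the look-ahead split, run against made-up sample data held in a string (the variable names and the sample records are illustrative, not from the original post):

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the slurped file contents.
my $text = ">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n";

# (?=>) matches the empty position just before a '>' without consuming it,
# so split cuts there and every chunk keeps its leading '>'.
my @chunks = split /(?=>)/, $text;

print scalar(@chunks), " chunks\n";
print "[$_]\n" for @chunks;
```

Note that this cuts before every '>' in the input, including one that happens to appear inside a header line, so it assumes '>' occurs only at the start of headers.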

Bryan said:
Second, is there a better way to split
this file into chunks than I am doing?

If the file is big, it would probably be better not to slurp it all
at once. You could set $/ = '>', but then you would have a '>' at the
end of every record (except the last), and not one at the beginning of
every record. (The first record read would also contain nothing but the
'>' itself.) This is kind of ugly, but what you gonna do?
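
One way to live with that, sketched here under some assumptions: set $/ to "\n>" rather than '>' so that only a '>' at the start of a line ends a record, strip the separator with chomp, and put the '>' back by hand. The sample data and the in-memory filehandle are stand-ins for the real file.

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for the real file.
my $text = ">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "\n>";            # restored automatically when the block exits
    while (my $rec = <$fh>) {
        chomp $rec;              # removes a trailing "\n>" if there is one
        $rec =~ s/^>//;          # only the first record still has its own '>'
        $rec =~ s/\s+\z//;       # trim the newline left on the last record
        next unless length $rec; # skip anything empty (e.g. a blank file)
        push @records, ">$rec\n";    # restore the marker on every record
    }
}
close $fh;

print @records;
```

Collecting into @records is only for demonstration; with a big file you would process each record inside the loop instead of storing them all.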

Xho

Brian McCauley · Jun 10, 2004

ctcgag said:
If the file is big, it would probably be better not to slurp it all
at once. You could set $/ ='>', but then you would have a '>' at the
end of every record (except the last), and not one at the beginning of
every record. (You would also have a blank record as the first one read).
This is kind of ugly, but what you gonna do?

Perhaps File::Stream would help?

