file reading by record separator (not line by line)

Lee Sander · May 31, 2007

Dear all,
I would like to read a really huge file that looks like this:

>name1....
line_11
line_12
line_13
....
>name2 ...
line_21
line_22
....
etc

where line_ij is just free-form text on that line.

How can I read the file so that every time I do a "read()" I get exactly
one record, up to the next ">"?

many thanks
Lee

Lee Sander · May 31, 2007

I wanted to also say that this file is really huge, so I cannot
just do a read() and then split on ">" to get a record
thanks
lee

aspineux · May 31, 2007

something like

name = None
lines = []
for line in open('yourfilename.txt'):
    if line.startswith('>'):        # a '>' line starts a new record
        if name is not None:
            print 'Here is the record', name
            print lines
            print
        name = line.rstrip('\r\n')
        lines = []
    else:
        lines.append(line.rstrip('\n'))

Tijs · May 31, 2007

Lee said:
I wanted to also say that this file is really huge, so I cannot
just do a read() and then split on ">" to get a record
thanks
lee

Below is the easy solution. To get even better performance, or if '>' is not
always at the start of the line, you would have to implement the buffering
that is done by readline() yourself (see _fileobject in socket.py in the
standard lib for an example).

def chunkreader(f):
    name = None
    lines = []
    while True:
        line = f.readline()
        if not line: break
        if line[0] == '>':
            if name is not None:
                yield name, lines
            name = line[1:].rstrip()
            lines = []
        else:
            lines.append(line)
    if name is not None:
        yield name, lines

if __name__ == '__main__':
    from StringIO import StringIO
    s = \
"""> name1
line1
line2
line3
> name2
line 4
line 5
line 6"""
    f = StringIO(s)
    for name, lines in chunkreader(f):
        print '***', name
        print ''.join(lines)

$ python test.py
*** name1
line1
line2
line3

*** name2
line 4
line 5
line 6
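
A rough sketch of the buffering idea mentioned above, for the case where '>' may appear anywhere and not only at the start of a line. It is written for Python 3 (the code in this thread is Python 2), and the name read_records and its blocksize parameter are made up for illustration. It assumes that a single record fits comfortably in memory:

```python
import io


def read_records(f, sep='>', blocksize=64 * 1024):
    """Yield the text between occurrences of sep, reading f block by block."""
    buf = ''
    while True:
        block = f.read(blocksize)
        if not block:              # '' means end of file
            break
        buf += block
        parts = buf.split(sep)
        # The last piece may be an unfinished record; keep it for the next round.
        buf = parts.pop()
        for part in parts:
            if part:               # skip empty pieces, e.g. before a leading sep
                yield part
    if buf:                        # whatever is left is the final record
        yield buf


if __name__ == '__main__':
    sample = io.StringIO(">name1\nline1\n>name2\nline 4\n")
    for record in read_records(sample, blocksize=5):
        print(repr(record))
```

Only the current block plus the unfinished tail is held in memory at any time, so the whole file is never loaded at once.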

Tijs · May 31, 2007

aspineux said:
something like

name = None
lines = []
for line in open('yourfilename.txt'):
    if line.startswith('>'):        # a '>' line starts a new record
        if name is not None:
            print 'Here is the record', name
            print lines
            print
        name = line.rstrip('\r\n')
        lines = []
    else:
        lines.append(line.rstrip('\n'))

That would miss the last chunk.
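
To make that concrete: the loop only emits a record when it meets the next '>' line, so whatever is still being collected when the file ends has to be emitted after the loop. A minimal sketch in Python 3, using the hypothetical name iter_named_records, that yields records instead of printing them:

```python
def iter_named_records(lines):
    """Yield (name, body_lines) for each record that starts with a '>' line."""
    name = None
    body = []
    for line in lines:
        if line.startswith('>'):
            if name is not None:
                yield name, body
            name = line.rstrip('\r\n')
            body = []
        else:
            body.append(line.rstrip('\r\n'))
    # Without this final step the last record would be silently dropped.
    if name is not None:
        yield name, body
```

Passing an open file object as lines keeps memory use to one record at a time.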

Marc 'BlackJack' Rintsch · May 31, 2007

Lee Sander said:
Dear all,
I would like to read a really huge file that looks like this:

>name1....
line_11
line_12
line_13
....
>name2 ...
line_21
line_22
...
etc

where line_ij is just free-form text on that line.

How can I read the file so that every time I do a "read()" I get exactly
one record, up to the next ">"?

There was just recently a thread with an `itertools.groupby()` solution.
Something like this:

from itertools import groupby, imap
from operator import itemgetter

def mark_records(lines):
    # Tag every line with the number of the record it belongs to.
    counter = 0
    for line in lines:
        if line.startswith('>'):
            counter += 1
        yield (counter, line)

def iter_records(lines):
    fst = itemgetter(0)
    snd = itemgetter(1)
    # groupby() collects consecutive lines that share a record number.
    for dummy, record_lines in groupby(mark_records(lines), fst):
        yield imap(snd, record_lines)

def main():
    source = """\
>name1....
line_11
line_12
line_13
....
>name2 ...
line_21
line_22
....""".splitlines()

    for record in iter_records(source):
        print 'Start of record...'
        for line in record:
            print ':', line

if __name__ == '__main__':
    main()

Ciao,
Marc 'BlackJack' Rintsch
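
For readers on Python 3, where itertools.imap no longer exists, the same idea might look roughly like this. It is a sketch, not a drop-in replacement: it yields each record as a list, which assumes one record fits in memory, and that sidesteps the groupby() behaviour where a group's iterator becomes unusable once the outer loop moves on.

```python
from itertools import groupby
from operator import itemgetter


def mark_records(lines):
    """Pair every line with the number of the record it belongs to."""
    counter = 0
    for line in lines:
        if line.startswith('>'):
            counter += 1
        yield counter, line


def iter_records(lines):
    """Yield each record as a list of its lines, header line included."""
    for _, group in groupby(mark_records(lines), key=itemgetter(0)):
        yield [line for _, line in group]


if __name__ == '__main__':
    source = ">name1\nline_11\nline_12\n>name2\nline_21".splitlines()
    for record in iter_records(source):
        print('Start of record...')
        for line in record:
            print(':', line)
```

Lines that appear before the first '>' come out as a record of their own (record number 0).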

Hendrik van Rooyen · Jun 1, 2007

Lee Sander said:
I wanted to also say that this file is really huge, so I cannot
just do a read() and then split on ">" to get a record
thanks
lee

I would do something like this (not tested):

def get_a_record(f, sep):
    # Read one character at a time until the separator or end of file.
    ret_rec = ''
    while True:
        char = f.read(1)
        if not char or char == sep:    # '' means end of file
            break
        ret_rec += char
    return ret_rec

- Hendrik
