Question about file objects...

J

Something that came up in class...

When you are pulling data from a file using f.next(), the file is read
one line at a time.

What was explained to us is that Python iterates the file based on a
carriage return as the delimiter.
But what if you have a file that has one line of text, but that one
line has 16,000 items that are comma delimited?

Is there a way to read the file, one item at a time, delimited by
commas, WITHOUT having to read all 16,000 items from that one line and
then split them out into a list or dictionary?

Cheers
Jeff
 
nn

J said:
Is there a way to read the file, one item at a time, delimited by
commas, WITHOUT having to read all 16,000 items from that one line and
then split them out into a list or dictionary?

File iteration is a convenience since it is the most common case. If
everything is on one line, you will have to handle record separators
manually by using the .read(<number_of_bytes>) method on the file
object and searching for the comma. If everything fits in memory, the
straightforward way would be to read the whole file with .read() and
use .split(",") on the returned string. That should give you a nice
list of everything.
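
For the in-memory case, a minimal sketch (assuming the whole file is a
single comma-separated line; 'data.csv' is just a placeholder name):

def read_items(path):
    # Read the entire file into one string, then split on commas.
    # Fine as long as the data fits comfortably in memory.
    f = open(path)
    try:
        return f.read().split(',')
    finally:
        f.close()

items = read_items('data.csv')  # a list of all 16,000 items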
 
J

nn said:
If everything fits in memory, the straightforward way would be to read
the whole file with .read() and use .split(",") on the returned string.

Agreed. The confusion came because the guy teaching said that
iterating the file is delimited by a carriage return character...
which to me sounds like it's an arbitrary thing that can be changed...

I was already thinking that I'd have to read it in small chunks and
search for the delimiter I want... and reading the whole file into a
string and then splitting that would be nice, until the file is so
large that it starts taking up significant amounts of memory.

Anyway, thanks both of you for the explanations... I appreciate the help!

Cheers
Jeff
 
Terry Reedy

J said:
Agreed. The confusion came because the guy teaching said that
iterating the file is delimited by a carriage return character...
which to me sounds like it's an arbitrary thing that can be changed...

If he said exactly that, he is not exactly correct. File iteration looks
for line-ending character(s), which depend on the operating system and
the universal-newlines setting.
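
For instance, in the Python 2.x used in this thread, opening a file in
'rU' mode turns on universal newlines, so '\n', '\r', and '\r\n' are
all treated as line endings during iteration (in Python 3 this is the
default for text mode). 'data.txt' is a placeholder name:

f = open('data.txt', 'rU')
for line in f:
    print line,  # one line per iteration, whatever the line endings
f.close()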

J said:
I was already thinking that I'd have to read it in small chunks and
search for the delimiter I want... and reading the whole file into a
string and then splitting that would be nice, until the file is so
large that it starts taking up significant amounts of memory.

Anyway, thanks both of you for the explanations... I appreciate the help!

I would not be surprised if a generic file chunk generator were posted
somewhere. It would be a good entry for the Python Cookbook, if not
there already.

tjr
 
r0g

J said:
Is there a way to read the file, one item at a time, delimited by
commas, WITHOUT having to read all 16,000 items from that one line and
then split them out into a list or dictionary?


Generators are a good way of dealing with that sort of thing...

http://dalkescientific.com/writings/NBN/generators.html

Have the generator read in large chunks from the file in binary mode,
then use string searching/splitting to dole out records one at a time,
topping up the cache when needed.

Roger.
 
nn

Terry Reedy said:
I would not be surprised if a generic file chunk generator were posted
somewhere. It would be a good entry for the Python Cookbook, if not
there already.

There should be, but writing one isn't too difficult:

def chunker(file_obj):
    # Yield comma-separated fields one at a time, reading the file
    # in 8 KB chunks so the whole line never sits in memory at once.
    parts = ['']
    while True:
        fdata = file_obj.read(8192)
        if not fdata:
            break
        # Prepend the unfinished field carried over from the last
        # chunk, then split; everything but the last piece is a
        # complete field.
        parts = (parts[-1] + fdata).split(',')
        for col in parts[:-1]:
            yield col
    # Whatever is left after EOF is the final field (skip it if
    # empty, e.g. for an empty file or a trailing comma).
    if parts[-1]:
        yield parts[-1]
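
To use it on the one-line file from the original question (again with
'data.csv' as a placeholder name):

f = open('data.csv')
for item in chunker(f):
    print item  # one comma-delimited item at a time
f.close()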
 
