Question about file objects...

J

Something that came up in class...

When you are pulling data from a file using f.next(), the file is read
one line at a time.

What was explained to us is that Python iterates the file based on a
carriage return as the delimiter.
But what if you have a file that has one line of text, but that one
line has 16,000 items that are comma delimited?

Is there a way to read the file, one item at a time, delimited by
commas, WITHOUT having to read all 16,000 items from that one line and
then split them out into a list or dictionary?

Cheers
Jeff
 
nn

J said:
Is there a way to read the file, one item at a time, delimited by
commas, WITHOUT having to read all 16,000 items from that one line and
then split them out into a list or dictionary?

File iteration is a convenience since it is the most common case. If
everything is on one line, you will have to handle record separators
manually by using the .read(<number_of_bytes>) method on the file
object and searching for the comma. If everything fits in memory, the
straightforward way would be to read the whole file with .read() and
use .split(",") on the returned string. That should give you a nice
list of everything.
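
For the in-memory case, a minimal sketch (assuming the whole file is a
single comma-separated line; 'data.csv' is just a placeholder name):

def read_items(path):
    # Read the entire file into one string, then split on commas.
    # Fine as long as the data fits comfortably in memory.
    f = open(path)
    try:
        return f.read().split(',')
    finally:
        f.close()

items = read_items('data.csv')  # a list of all 16,000 items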
 
J

nn said:
If everything fits in memory, the straightforward way would be to read
the whole file with .read() and use .split(",") on the returned string.

Agreed. The confusion came because the guy teaching said that
iterating the file is delimited by a carriage return character...
which to me sounds like it's an arbitrary thing that can be changed...

I was already thinking that I'd have to read it in small chunks and
search for the delimiter I want... and reading the whole file into a
string and then splitting that would be nice, until the file is so
large that it starts taking up significant amounts of memory.

Anyway, thanks both of you for the explanations... I appreciate the help!

Cheers
Jeff
 
Terry Reedy

J said:
Agreed. The confusion came because the guy teaching said that
iterating the file is delimited by a carriage return character...
which to me sounds like it's an arbitrary thing that can be changed...

If he said exactly that, he is not exactly correct. File iteration looks
for line-ending character(s), which depend on the operating system and
the universal-newlines setting.
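
For instance, in the Python 2.x used in this thread, opening a file in
'rU' mode turns on universal newlines, so '\n', '\r', and '\r\n' are
all treated as line endings during iteration (in Python 3 this is the
default for text mode). 'data.txt' is a placeholder name:

f = open('data.txt', 'rU')
for line in f:
    print line,  # one line per iteration, whatever the line endings
f.close()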

J said:
I was already thinking that I'd have to read it in small chunks and
search for the delimiter I want... and reading the whole file into a
string and then splitting that would be nice, until the file is so
large that it starts taking up significant amounts of memory.

Anyway, thanks both of you for the explanations... I appreciate the help!

I would not be surprised if a generic file chunk generator were posted
somewhere. It would be a good entry for the Python Cookbook, if not
there already.

tjr
 
r0g

J said:
Is there a way to read the file, one item at a time, delimited by
commas, WITHOUT having to read all 16,000 items from that one line and
then split them out into a list or dictionary?


Generators are a good way of dealing with that sort of thing...

http://dalkescientific.com/writings/NBN/generators.html

Have the generator read in large chunks from the file in binary mode,
then use string searching/splitting to dole out records one at a time,
topping up the cache when needed.

Roger.
 
nn

Terry Reedy said:
I would not be surprised if a generic file chunk generator were posted
somewhere. It would be a good entry for the Python Cookbook, if not
there already.

There should be, but writing one isn't too difficult:

def chunker(file_obj):
    # Yield comma-separated fields one at a time, reading the file
    # in 8 KB chunks so the whole line never sits in memory at once.
    parts = ['']
    while True:
        fdata = file_obj.read(8192)
        if not fdata:
            break
        # Prepend the unfinished field carried over from the last
        # chunk, then split; everything but the last piece is a
        # complete field.
        parts = (parts[-1] + fdata).split(',')
        for col in parts[:-1]:
            yield col
    # Whatever is left after EOF is the final field (skip it if
    # empty, e.g. for an empty file or a trailing comma).
    if parts[-1]:
        yield parts[-1]
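
To use it on the one-line file from the original question (again with
'data.csv' as a placeholder name):

f = open('data.csv')
for item in chunker(f):
    print item  # one comma-delimited item at a time
f.close()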
 
